Void lookups - just another SPF footgun
I have never been a fan of the Sender Policy Framework (SPF), mostly because it's not even clear what its primary purpose is supposed to be. It's not designed to fight spam in general, as ordinary spam has no need to fake trustworthy origin domains in the first place. That would be a necessary feature of phishing, but SPF is not effective there either: SPF protects the MAIL FROM address of the SMTP session (the envelope sender), while what the user sees is the From: header of the mail document itself, and SMTP does not require the two to match. Others have written extensively about SPF's other shortcomings.
But just when you thought you had bent all your email use cases to match SPF's view of the world, SPF shows what it's really capable of.
Perfectly valid SPF entry, mails getting blocked
I was informed by one of our customers that our emails were being rejected on their side. As the reason, I was given a screenshot of their security appliance's log, which claimed a "DMARC error". Strangely enough, some mails were getting through from time to time while others were outright rejected.
I checked the DMARC record for syntax errors and for missing fields defined as required by RFC 7489. The DMARC record itself was OK, and the DMARC lookup tool on mxtoolbox.com found no issues either. So I assumed a problem with the SPF record.
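The basic DMARC sanity check described above can be approximated in a few lines. This is a minimal sketch, not a full RFC 7489 validator: it only verifies that the record starts with v=DMARC1 and carries a p= tag with one of the three defined policies (none, quarantine, reject). The function name check_dmarc and the example.com report address are illustrative placeholders.

```python
# Minimal DMARC record sanity check (a sketch, not a complete RFC 7489 validator).
VALID_POLICIES = {"none", "quarantine", "reject"}


def check_dmarc(record: str) -> list[str]:
    """Return a list of problems found in a DMARC TXT record (empty list = looks OK)."""
    problems = []
    # Tags are separated by ";"; skip empty segments, e.g. from a trailing semicolon.
    tags = [t.strip() for t in record.split(";") if t.strip()]
    parsed = []
    for tag in tags:
        name, sep, value = tag.partition("=")
        if not sep:
            problems.append(f"malformed tag: {tag!r}")
            continue
        parsed.append((name.strip(), value.strip()))
    # The record has to start with the version tag.
    if not parsed or parsed[0] != ("v", "DMARC1"):
        problems.append("record must start with v=DMARC1")
    # The requested policy (p=) is required for policy records.
    policy = dict(parsed).get("p")
    if policy is None:
        problems.append("missing required p= tag")
    elif policy not in VALID_POLICIES:
        problems.append(f"invalid policy p={policy}")
    return problems
```

For example, check_dmarc("v=DMARC1; p=quarantine; rua=mailto:dmarc@example.com") returns an empty list, while a record that puts p= before v= is flagged.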
The SPF record basically looked like this:
v=spf1 a mx include:spf.protection.outlook.com ~all
Mail is mostly sent via M365; some automated mails are still sent out via our firewall (which the MX record points to, for Exchange hybrid setup reasons) or from our website (hence the a mechanism). Nothing to see here.
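Structurally, an SPF record is just the v=spf1 version tag followed by whitespace-separated terms, each optionally prefixed with a qualifier (+, -, ~ or ?, with + as the default). The following sketch splits a record into those parts. It is an illustration under simplifying assumptions (no macro expansion, no validation beyond the basic term shape), and parse_spf and Term are hypothetical names.

```python
import re
from typing import NamedTuple


class Term(NamedTuple):
    qualifier: str  # "+", "-", "~" or "?"; "+" (pass) is the default
    name: str       # mechanism or modifier name, e.g. "a", "mx", "include", "redirect"
    value: str      # argument, e.g. "spf.protection.outlook.com" or "/24"; "" if none


# optional qualifier, then the name, then whatever argument follows
TERM_RE = re.compile(r"([+\-~?]?)([A-Za-z][A-Za-z0-9_.\-]*)(.*)")


def parse_spf(record: str) -> list[Term]:
    """Split an SPF TXT record into its terms, in evaluation (left-to-right) order."""
    fields = record.split()
    if not fields or fields[0].lower() != "v=spf1":
        raise ValueError("not an SPF version 1 record")
    terms = []
    for field in fields[1:]:
        m = TERM_RE.fullmatch(field)
        if m is None:
            raise ValueError(f"cannot parse term {field!r}")
        qualifier, name, rest = m.groups()
        # ":" (mechanisms) and "=" (modifiers) only separate name and argument;
        # a CIDR suffix such as "/24" is kept as part of the value.
        if rest[:1] in (":", "="):
            rest = rest[1:]
        terms.append(Term(qualifier or "+", name.lower(), rest))
    return terms
```

Applied to our record, this yields four terms: a and mx with the implicit + qualifier, the include pointing at spf.protection.outlook.com, and ~all.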
From the logs provided by the customer's support, I checked the IP address the mail in question had been delivered from. The mail came from M365 and was delivered from an IPv6 address. I compared that address to the ranges authorized by the SPF record of spf.protection.outlook.com and found it to be valid.
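Checking whether a sender address lies within a published ip4/ip6 range is plain prefix matching. The sketch below uses Python's standard ipaddress module; ip_authorized is a hypothetical helper, and the ranges are documentation prefixes standing in for the real ones, since the actual ranges published under spf.protection.outlook.com are not reproduced here.

```python
import ipaddress


def ip_authorized(sender: str, networks: list[str]) -> bool:
    """Return True if `sender` lies inside any of the given ip4/ip6 networks."""
    addr = ipaddress.ip_address(sender)
    for net in networks:
        network = ipaddress.ip_network(net, strict=False)  # tolerate host bits being set
        # Skip networks of the other address family (IPv4 vs. IPv6).
        if addr.version == network.version and addr in network:
            return True
    return False


# Placeholder ranges (documentation prefixes), not Microsoft's real ones.
AUTHORIZED = ["192.0.2.0/24", "2001:db8:1::/48"]
```

With these placeholders, ip_authorized("2001:db8:1::25", AUTHORIZED) is True, whereas an address from 2001:db8:2::/48 is not covered.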
Since there were no issues left I could think of, I contacted the customer's support in an effort to receive more information on the nature of the reported error. Their support had already been in contact with the vendor of said security appliance for clarification and was therefore able to shed some light on the actual issue.
Enter void lookups
RFC 7208, which defines SPF, features a rather extensive Security Considerations section, which mostly deals with the abuse potential of the DNS requests caused by checking SPF records. To prevent such abuse, the SPF specification places a number of limits on different kinds of DNS requests under certain conditions.
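The most commonly cited of these limits caps the number of DNS-querying terms: RFC 7208 (section 4.6.4) counts the include, a, mx, ptr and exists mechanisms and the redirect modifier, and requires implementations to limit them to 10 in total, returning "permerror" if that is exceeded. The sketch below counts those terms in a single record; the helper names are hypothetical, and a real check would also have to follow include and redirect targets, since their terms count toward the same limit. This sketch deliberately skips that.

```python
# Terms that trigger DNS queries and count toward the limit of 10 (RFC 7208, 4.6.4).
DNS_TERMS = {"include", "a", "mx", "ptr", "exists", "redirect"}
LOOKUP_LIMIT = 10


def term_name(field: str) -> str:
    """Strip the qualifier and any argument, leaving the bare mechanism/modifier name."""
    field = field.lstrip("+-~?")
    for sep in (":", "/", "="):
        field = field.split(sep, 1)[0]
    return field.lower()


def count_dns_terms(record: str) -> int:
    """Count DNS-querying terms in one record (included records are NOT followed)."""
    return sum(1 for f in record.split()[1:] if term_name(f) in DNS_TERMS)
```

Our own record contributes three such terms (a, mx and the include), well below LOOKUP_LIMIT on its own.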
One of these limits affects DNS queries that "return either a positive answer (RCODE 0) with an answer count of 0, or a "Name Error" (RCODE 3) answer". The RFC calls these "void lookups" and recommends limiting them to two.
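The quoted definition boils down to a simple predicate over the response code and the number of answers. A minimal sketch (function names are hypothetical) that also tallies a sequence of (rcode, answer count) pairs; note that other errors such as SERVFAIL (RCODE 2) do not fall under this definition.

```python
# DNS response codes used by the definition: 0 = NOERROR, 3 = NXDOMAIN ("Name Error").
NOERROR, NXDOMAIN = 0, 3


def is_void_lookup(rcode: int, answer_count: int) -> bool:
    """RFC 7208 "void lookup": NOERROR with zero answers, or NXDOMAIN."""
    return (rcode == NOERROR and answer_count == 0) or rcode == NXDOMAIN


def count_void_lookups(responses: list[tuple[int, int]]) -> int:
    """Count void lookups in a sequence of (rcode, answer_count) pairs."""
    return sum(is_void_lookup(rcode, count) for rcode, count in responses)
```

For instance, an AAAA query for a name that only has an A record comes back as NOERROR with zero answers, so is_void_lookup(0, 0) is True.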
This is where an otherwise perfectly valid SPF record turns into mail delivery havoc with the spicy extra of an on-off behavior.
The SPF mechanisms (the individual entries the sender IP is checked against) that require DNS resolution, such as a or mx, make no distinction between IPv4 and IPv6: when the record says a, it confusingly means "A" or "AAAA", depending on whether the sender connected via IPv4 or IPv6. Microsoft was sending out some of its mail via IPv6. For those mails, our SPF record makes the recipient MTA first check the sender IP address against an AAAA record of our domain (the a mechanism), which comes back empty because our domain name has no IPv6 address bound to it (strike 1!), and then do the same with the host named in our MX record, which has no IPv6 address either (strike 2!).
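The combination of Microsoft sending via IPv6 and our domain having neither an AAAA record nor an IPv6-capable host in its MX record made the SPF evaluation fail on the customer's appliance. This also explains the on-off behavior: mails that Microsoft happened to deliver via IPv4 triggered A lookups instead, which do return results, so they passed.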
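To make the failure mode concrete, here is a small simulation of the left-to-right evaluation against made-up DNS data. It is a sketch under heavy simplifying assumptions: example.com stands in for our domain, fw.example.com for the firewall in our MX record and 2001:db8:1::/48 for Microsoft's IPv6 ranges; only the a, mx, include, ip4, ip6 and all terms are supported, without CIDR suffixes or macros; and every empty answer is counted as a void lookup. Since RFC 7208 says implementations SHOULD limit void lookups and MAY make the limit configurable, the hypothetical Evaluation class reports the count instead of enforcing a particular threshold.

```python
import ipaddress

# Made-up DNS data (placeholders, not real records): example.com plays our domain,
# fw.example.com the firewall named in our MX record, and 2001:db8:1::/48 the
# IPv6 range published via the include. Missing keys mean "no such record".
DNS = {
    ("example.com", "TXT"): ["v=spf1 a mx include:spf.protection.outlook.com ~all"],
    ("example.com", "A"): ["192.0.2.10"],
    ("example.com", "MX"): ["fw.example.com"],
    ("fw.example.com", "A"): ["192.0.2.1"],
    ("spf.protection.outlook.com", "TXT"): ["v=spf1 ip6:2001:db8:1::/48 -all"],
}
RESULTS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}


class Evaluation:
    """Simplified check_host(): a, mx, include, ip4, ip6 and all only; no CIDR, no macros."""

    def __init__(self, sender_ip: str):
        self.ip = ipaddress.ip_address(sender_ip)
        # a and mx query A records for IPv4 senders and AAAA records for IPv6 senders.
        self.addr_type = "A" if self.ip.version == 4 else "AAAA"
        self.void_lookups = 0

    def lookup(self, name: str, rrtype: str) -> list[str]:
        answers = DNS.get((name, rrtype), [])
        if not answers:  # empty answer (or unknown name): count it as a void lookup
            self.void_lookups += 1
        return answers

    def has_sender(self, addresses: list[str]) -> bool:
        return any(ipaddress.ip_address(a) == self.ip for a in addresses)

    def check_host(self, domain: str) -> str:
        # Simplification: assume the SPF record is the domain's only TXT record.
        record = self.lookup(domain, "TXT")[0]
        for field in record.split()[1:]:
            qualifier = field[0] if field[0] in RESULTS else "+"
            name, _, arg = field.lstrip("+-~?").partition(":")
            if name == "a":
                hit = self.has_sender(self.lookup(arg or domain, self.addr_type))
            elif name == "mx":
                hosts = self.lookup(arg or domain, "MX")
                hit = any(self.has_sender(self.lookup(h, self.addr_type)) for h in hosts)
            elif name == "include":
                hit = self.check_host(arg) == "pass"
            elif name in ("ip4", "ip6"):
                net = ipaddress.ip_network(arg, strict=False)
                hit = self.ip.version == net.version and self.ip in net
            elif name == "all":
                hit = True
            else:
                raise NotImplementedError(f"term {field!r} not supported by this sketch")
            if hit:  # terms are evaluated left to right; the first match decides
                return RESULTS[qualifier]
        return "neutral"  # no match and no redirect: RFC 7208 says "neutral"
```

With these placeholders, Evaluation("2001:db8:1::25").check_host("example.com") still returns "pass", but only after two void lookups (the empty AAAA answers for example.com and fw.example.com). The same check for the firewall's IPv4 address 192.0.2.1 passes without a single void lookup.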
Solution
True to its own tradition (looking at you, SRS), SPF itself offers a rather clunky workaround, suggesting IPv4-mapped IPv6 addresses to compensate for its inability to tell IPv4 and IPv6 apart. The more elegant and reliable method, in my view, is to either replace DNS-bound mechanisms such as a or mx with DNS-independent mechanisms such as ip4 and ip6 where appropriate, or to simply rearrange the mechanisms: the RFC mandates that they are evaluated from left to right and that processing ends at the first match, so with include:spf.protection.outlook.com in front, mail from M365 matches before any AAAA lookup of our own domain happens.
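Both remedies can be expressed as simple record rewrites. The sketch below uses hypothetical helpers: pin replaces a and mx with ip4:/ip6: terms for addresses resolved beforehand, and reorder moves the include term to the front. Pinning trades DNS lookups for maintenance, since the record has to be updated whenever those hosts change their addresses. Reordering is only safe here because the moved term and the terms it overtakes all carry the same implicit + qualifier.

```python
import ipaddress


def pin(record: str, resolved: dict[str, list[str]]) -> str:
    """Replace the terms listed in `resolved` (e.g. "a", "mx") by ip4:/ip6: terms.

    `resolved` maps a term to the addresses it resolves to at the time of writing;
    all other terms are kept unchanged and in place.
    """
    version, *terms = record.split()
    out = [version]
    for term in terms:
        if term in resolved:
            for addr in resolved[term]:
                ip = ipaddress.ip_address(addr)  # validates and normalizes the address
                out.append(f"ip{ip.version}:{ip}")
        else:
            out.append(term)
    return " ".join(out)


def reorder(record: str, prefix: str = "include:") -> str:
    """Move terms starting with `prefix` to the front; other terms keep their order.

    Only safe when the moved terms and the terms they overtake share a qualifier,
    so that a match still yields the same result and only the DNS lookups
    performed before it change.
    """
    version, *terms = record.split()
    moved = [t for t in terms if t.startswith(prefix)]
    rest = [t for t in terms if not t.startswith(prefix)]
    return " ".join([version, *moved, *rest])
```

For our record, reorder gives v=spf1 include:spf.protection.outlook.com a mx ~all. A sender inside Microsoft's ranges then matches at the include, and a and mx are only consulted for the remaining senders, such as the firewall or the web server.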