Monday, March 27, 2023

On DMARC, ARC and DKIM Replays

Introduction

A while ago I was looking at the email headers of a message and noticed something strange in them: several ARC- headers, one of which looked suspiciously like a DKIM-Signature and another which was some permutation of the Authentication-Results header. I was vaguely aware of DMARC as a replacement for ADSP and found its working group trying to move it from Informational to Proposed Standard. ARC had arisen out of that working group for some unknown reason, and no rationale is given for its existence in the charter, so curiouser and curiouser... I decided to sign up to the working group to see what was going on.

DMARC

Since I've written about ARC before, I'll start with DMARC itself. After ADSP was moved to HISTORIC for basically made up reasons and most of all politics, a group of people -- I assume it's from the industry group M3AAWG -- decided to take another bite of that apple. It remains unclear what motivated them and who the players were. Nor is it clear whether the main detractors of ADSP had a part in its creation. It would seem rather surprising that they would have for one main reason: DMARC is basically warmed over ADSP. DMARC did contain some new reporting capabilities for receivers to send reports to the originating domain, which sounds like it might be useful but who knows how well deployed it is in receivers since things that help somebody else's problems are not usually high up on the list of things companies want to deploy. 

The DMARC working group split the reporting off into another draft, which is good because the policy protocol of DMARC has nothing to do with the reporting protocol. Since DMARC was originally submitted as an individual-submission Informational RFC, it was perfectly fine to combine the two, and now that it is going to Proposed Standard it is perfectly fine to separate them.

So what does that leave? As I said, DMARC is basically warmed over ADSP. All of the DKIM related policy is essentially identical from what I can tell, with a bit of wordsmithing on the policy to make it different from ADSP for unknown reasons. The one thing they added was support for coexisting with SPF. When we originally were working on ADSP, there was no reason to get involved with SPF since they had their own policy mechanisms, so why cause a turf war?

I'm not entirely sure what motivated its inclusion, but they did. One of the things DMARC seems to go to great lengths over is the concept of "alignment". Alignment, if I understand this correctly, is where the domain in the 822.From is in alignment with the domain that created the signature or passed the SPF check. As far as I can tell, this doesn't change anything over the wire for either SPF, DKIM or DMARC, so it is not a protocol issue per se. That is to say that it truly is informational for receivers in sort of a BCP kind of way. I don't get the sense that it asks receivers to behave differently (unlike the policy of SPF and ADSP), but more that it gives a clearer definition of the way that DKIM and SPF can coexist in a receiver and derives some different cases to be considered.
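To make the alignment idea concrete, here is a minimal sketch of the check as I understand it from RFC 7489: strict mode wants the From domain and the authenticated domain (the DKIM d= or the SPF-checked domain) to match exactly, while relaxed mode only wants them to share an organizational domain. The function names are my own, and the organizational domain lookup is a loud simplification: a real receiver would consult the public suffix list, whereas this just keeps the last two labels.

```python
def org_domain(domain: str) -> str:
    """Crude stand-in for an organizational-domain lookup.

    ASSUMPTION: keeps only the last two labels. This is wrong for
    suffixes like co.uk; a real check would use the public suffix list.
    """
    labels = domain.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def is_aligned(from_domain: str, auth_domain: str, strict: bool = False) -> bool:
    """Return True if the From domain aligns with the authenticated domain.

    auth_domain is the DKIM d= domain or the domain SPF was checked against.
    """
    f = from_domain.lower().rstrip(".")
    a = auth_domain.lower().rstrip(".")
    if strict:
        # Strict mode: the two domains must be identical.
        return f == a
    # Relaxed mode: sharing an organizational domain is enough.
    return org_domain(f) == org_domain(a)


if __name__ == "__main__":
    print(is_aligned("example.com", "mail.example.com"))
    print(is_aligned("example.com", "mail.example.com", strict=True))
```

Note that nothing here touches the wire format of SPF or DKIM; it is purely a comparison the receiver makes after the fact.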

So removing the reporting leaves us with a document that gives a new operational lexicon for DKIM and SPF coexistence, and a few minor changes to the policy verbs from all I can tell. How exactly is that different from ADSP? Maybe there are some explicit policy protocol ramifications of the new-found embrace of SPF that I missed, but that does not change the fact that the DKIM specific part of the draft is essentially identical to ADSP. If the reasons for ADSP's move to HISTORIC are all still there, why is DMARC OK while ADSP is not? DMARC is not widely deployed, especially with its policy with teeth (i.e., p=reject). DMARC is just as susceptible to being misconfigured too, right?

This leads me to speculate that there is some weird politics going on or some four-dimensional chess that I don't understand. All of the usual suspects who hated ADSP are still active on the working group. It's not clear whether they are trying to sabotage DMARC or not. It's hard to imagine they had a change of heart though. If you think I'm insinuating that they are, you'd be wrong because I truly have no idea.

So in conclusion I'm rather mystified by what's going on there. But given that this group produced ARC, which I'll go into in the next section, it seems like the old saw "never attribute to malice that which is adequately explained by incompetence" might be in operation here. Or something.

ARC

ARC is the original thing that caused me to come back to the DKIM world after close to 15 years of not paying attention to it at all. It comprises three header fields: ARC-Message-Signature, ARC-Authentication-Results, and ARC-Seal. ARC-Message-Signature is a DKIM signature with a minor addition of a new tag. It's not clear why it needed to be its own different header, but it is an experimental RFC so maybe that was part of the motivation. Likewise ARC-Authentication-Results seems like regular old Authentication-Results with the addition of a new tag. ARC-Seal seems to be a signature over the ARC signature and the ARC Authentication Results.
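For anyone who wants to poke at these headers themselves, here is a small sketch that pulls the ARC fields out of a raw message and groups them by their i= instance tag, which as far as I can tell is the new tag that ties the three together (the signature field is the one RFC 8617 names ARC-Message-Signature). The sample message, its domains and its truncated signature values are all made up for illustration, and the regex is a simplification that assumes i= appears as a plain tag rather than a full tag-list parse.

```python
import re
from collections import defaultdict
from email import message_from_string

ARC_FIELDS = {"arc-seal", "arc-message-signature", "arc-authentication-results"}

# Matches an "i=<number>" tag at the start of the value or after a semicolon.
INSTANCE_RE = re.compile(r"(?:^|;)\s*i\s*=\s*(\d+)")


def arc_sets(raw_message: str) -> dict:
    """Group ARC header fields by instance number.

    Returns {instance: {lowercased field name: header value}}.
    """
    msg = message_from_string(raw_message)
    sets = defaultdict(dict)
    for name, value in msg.items():
        if name.lower() not in ARC_FIELDS:
            continue
        match = INSTANCE_RE.search(value)
        if match:
            sets[int(match.group(1))][name.lower()] = value
    return dict(sets)


# Hypothetical message: domains, selector and signature data are invented.
SAMPLE = (
    "ARC-Seal: i=1; a=rsa-sha256; cv=none; d=lists.example.org; s=sel; b=AAAA\n"
    "ARC-Message-Signature: i=1; a=rsa-sha256; d=lists.example.org; s=sel;"
    " h=from:subject; bh=BBBB; b=CCCC\n"
    "ARC-Authentication-Results: i=1; lists.example.org;"
    " dkim=pass header.d=example.com\n"
    "From: alice@example.com\n"
    "Subject: hello\n"
    "\n"
    "body\n"
)

if __name__ == "__main__":
    for instance, fields in sorted(arc_sets(SAMPLE).items()):
        print(instance, sorted(fields))
```

This only groups the fields; it does no cryptographic verification of either the message signature or the seal.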

So the burning question is why? What is this trying to accomplish? From what I can tell -- and it hasn't been easy to get straight answers -- the IESG didn't like that DMARC, and ADSP before it, caused trouble with mailing lists and the like that invalidate DKIM signatures. SPF has always been problematic for intermediaries, which was one area where DKIM had an advantage, so long as the intermediaries didn't change the message and the signature still validated at the end receiver. Mailing lists have long added things like unsubscribe footers and tags in subject lines, which cause the signature to break. ARC was supposed to address this, apparently. The irony is that in the meantime many mailing list managers are now taking DMARC policy into account and using various techniques of their own to avoid triggering a more restrictive policy such as p=reject.

So what it seems to be trying to accomplish is the binding of a DKIM signature to an Authentication-Results that the resigner's infrastructure produced. In the case of a mailing list, that would generally be the domain of the author who wrote the message and its verification status. But ARC seems to suffer from some amnesia that an intermediary has always had the ability to add its own signature and has always had the ability to sign its own authentication results. This was fully the intent for going on 20 years. So that just leaves us with these tweaks to the signature header and the authentication results header. It appears that they are trying to bind the two together.

Why? Why is that important and what does it bring to the table that DKIM and signed authentication results can't adequately address? I tried really, really hard to get an explanation and I was unsuccessful. So they are inventing a completely new protocol and its associated overhead for one feature whose need nobody can explain. That is really suspect.

So ARC basically faithfully recreates DKIM and Authentication-Results with one minor tweak whose need nobody can articulate. How does this solve the mailing list traversal problem? It doesn't, as far as I can tell. Well, it doesn't in any way that DKIM couldn't already do. DKIM can help with mailing list traversal if the mailing list signs using the mailing list domain (or really any domain it has control of). Receivers can develop reputation for that domain just like they can develop reputation for originating domains. But you don't need ARC for that. So it's a complete mystery why it was developed, and especially in a working group like DMARC.

DKIM Replay

DKIM replay is the latest bit of WTF'ery coming out of this corner of the IETF. It's been known for almost two decades that you can replay a DKIM-signed message. This is a feature, not a bug, and was actually a design goal that separated it from the SPF approach. Seemingly some mailbox providers (including enterprise, I assume) have reputation systems to combat spam and phishing, so spammers try to game them by getting reputable sending services to DKIM sign their messages and piggybacking off that reputation. This apparently harms the reputation of the signing domain eventually.

The attack seems to go like this: a spammer signs up to some service that has a good reputation (how do they determine that?) and starts running spam through it to get it signed. If the spam evades both the outbound spam filters of the sending domain and the inbound spam filters of the target (?) receiving domain, it can then be transferred to a server that the spammer controls, which starts blasting out the signed spam to zillions of mailboxes. This in turn causes something that hasn't been described (Bayes? something else?) to happen on the receiver so that it starts to see the message as spam, and the receiver then gives the sending domain a rap on the knuckles by decreasing its reputation. After enough of these it apparently starts to affect the sending domain's deliverability (how do they know? my assumption is that most spam is blackholed, not bounced).
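Since nobody has said what receivers actually do, the following is purely my own guess at one thing a receiver could do: count how many distinct envelope recipients have been handed a message carrying the very same DKIM signature value. The class name, the threshold and the idea of keying on the signature's b= data are all assumptions for illustration, not a description of any provider's system.

```python
import hashlib
from collections import defaultdict


class ReplayCounter:
    """Hypothetical receiver-side counter for identical DKIM signatures."""

    def __init__(self, threshold: int = 3):
        # ASSUMPTION: an arbitrary cutoff; nobody has published real numbers.
        self.threshold = threshold
        self._recipients = defaultdict(set)

    def observe(self, signature_b_value: str, envelope_recipient: str) -> bool:
        """Record one delivery.

        Returns True once the same signature has been seen for more
        distinct envelope recipients than the threshold allows.
        """
        # Strip folding whitespace so the same signature always hashes alike.
        normalized = "".join(signature_b_value.split())
        key = hashlib.sha256(normalized.encode("ascii", "replace")).hexdigest()
        self._recipients[key].add(envelope_recipient.lower())
        return len(self._recipients[key]) > self.threshold


if __name__ == "__main__":
    counter = ReplayCounter(threshold=2)
    for rcpt in ("a@example.net", "b@example.net", "c@example.net"):
        print(rcpt, counter.observe("AbCd EfGh", rcpt))
```

The obvious catch is that a mailing list which leaves the original signature intact produces exactly the same pattern, so a counter like this cannot by itself tell a legitimate fan-out from a spammer's replay.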

This seems to be mainly affecting big bulk email providers, but it could conceivably be affecting mailbox providers too. I assume it isn't much of a problem for enterprise, etc. since they would presumably not be too pleased with an employee using their infrastructure for spamming. But who knows? It's not clear what steps these providers take to mitigate these attacks. At its base, the obvious solution for senders is to not send spam. It seems like there could be a lot of operational things those providers could do, like filtering their outgoing mail, keeping track of accounts whose mail trips their filters, correlating that with account age, and that sort of thing. If the receiver does the sender the favor of bouncing the newly discovered spam, they could correlate the bad behavior with the account that sent it and potentially ban it. Depending on their enforcement, it may be trivial for the spammer to make another account and rinse and repeat though. Ban evasion is obviously an operational issue, but again we don't know how good they are at detecting that.
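As a sketch of the kind of sender-side bookkeeping I have in mind, the following flags accounts that are both young and sending a high fraction of mail that either tripped the outbound filter or came back bounced as spam. The Account fields, the seven day age window and the 5% rate are invented for illustration; I have no idea what real providers track or where they draw the line.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Account:
    name: str
    created: datetime
    sent: int      # total messages sent
    flagged: int   # messages caught by the outbound filter or bounced as spam


def accounts_to_review(accounts, now,
                       max_age=timedelta(days=7), max_flag_rate=0.05):
    """Return names of young accounts with a high flagged-message rate.

    ASSUMPTION: both thresholds are arbitrary placeholders.
    """
    suspects = []
    for account in accounts:
        if account.sent == 0:
            continue  # nothing sent yet, nothing to judge
        is_young = now - account.created <= max_age
        flag_rate = account.flagged / account.sent
        if is_young and flag_rate > max_flag_rate:
            suspects.append(account.name)
    return suspects


if __name__ == "__main__":
    now = datetime(2023, 3, 27)
    accounts = [
        Account("old-and-clean", datetime(2020, 1, 1), sent=5000, flagged=10),
        Account("new-and-noisy", datetime(2023, 3, 25), sent=200, flagged=60),
        Account("new-and-clean", datetime(2023, 3, 26), sent=50, flagged=0),
    ]
    print(accounts_to_review(accounts, now))
```

None of this helps with ban evasion, of course: a spammer who can cheaply open a fresh account just starts the clock over.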

A lot of this is pretty opaque. That's because the mailbox providers are not keen to share what their secret sauce is to combat spam. There is an industry group called M3AAWG which presumably knows more but they are closed and I assume under NDA about what can be shared and what can't. So there are serious structural issues about how the working group can operate when the basic parameters surrounding the operating environment can't be disclosed. The DKIM working group was rechartered with the potential to write or update operational advice, but I'm not sure how that would work given the opaqueness. A BCP needs, after all, to know what is best and common. 

More worrisome is that M3AAWG could (and maybe will) write a BCP, but it seems like they are the ones driving this new effort to get DKIM rechartered, so it's pretty clear they don't know what to do since they are asking the IETF to solve it. You can't write a BCP that solves the problem if you don't know how to solve it, after all. And if they had some protocol solution(s) in mind, the typical thing to do is write a draft and bring it to the IETF to vet it. That has not happened to my knowledge. There are some tentative drafts proposing solutions, but they don't seem to have any consensus within the industry.

All in all this strikes me as a Hail Mary from M3AAWG to the IETF. They don't know what more they can do, and participants on the public IETF group don't know enough of the details to really know what to do either. And we certainly don't have the wherewithal to know if any proposed solution would work in practice. Maybe more information will be forthcoming, but it's not been encouraging to say the least.

As for the proposed solutions, some have been farcical, like having mailbox providers strip out the DKIM signatures -- it's hard to imagine a smaller Maginot Line since the spammer can just send to a domain they control, or to one that doesn't care. Another draft that I don't fully understand seems essentially to require an email flag day to be successful. Others seemingly want to add envelope information to the DKIM signature's signed headers. That seems deeply problematic on a number of levels and it still isn't clear whether it would do any good.

So the working group was rechartered to solve a problem without any particularly clear way forward on how exactly it was supposed to do that. The proposed solutions don't seem like they would work in practice and there haven't been many more proposals to deepen the bench. That is not encouraging. Somebody (I think Scott Kitterman) mentioned that the basic problem boils down to differentiating good uses of replay from bad uses of replay. But that's nothing more than spam and ham, so we are left where we started.

And it's even worse on the BCP front considering that the current participants don't know what the best common practices are beyond what they are currently doing, which they admit they don't consider adequate. BCPs aren't supposed to be speculative, after all. That is the hallmark of a research project, for which the IETF is not a good venue, and I doubt that the IRTF would be much better given the issues surrounding secrecy.

Last of all, there is not a clear definition of what constitutes success in the first place. The spam game is not a matter of absolutes. It's a matter of probabilities and inflicting enough pain on the spammers so they go try some other means of getting their spam delivered. But for obvious reasons there isn't a lot of appetite from receivers to share what that looks like. So we won't even know when we've "solved" it.

And then there is the perpetual problem of working group chairs being passive aggressive. It's bad enough when the cast of characters contains a number of them, but when chairs "Good Morning" you like Bilbo to Gandalf (oh sorry, I mean "submit text" when there is not even a working group draft to submit text to), you know that they are after their own solution or some other agenda. In Bilbo's case, it was to get Gandalf to buzz off. They were especially awful on the DMARC group, and the new chairs (or at least the active one) don't seem like they will disappoint on that front either.

This re-spinning up of the working group was a mistake on many fronts and seems to contain a lot of wishful thinking. Given that the current cast of characters doesn't have a very good track record of delivering solutions that work or do something new and useful (there's a lot of overlap with the DMARC wg, and even the old DKIM wg), that doesn't seem to be a recipe for success. And a research project would be even worse because the tools to do that just don't exist within the larger IETF community.

DKIM itself was at some level speculation when we first created it: the bet was that being able to authenticate a message and tie it to a domain might provide some utility for the receiving domains. That seems to have happened, but in that case we weren't trying to "solve" a particular concrete problem, so it didn't matter that the way it was being used was opaque. The replay problem doesn't have that characteristic. We need to know. That's a problem.

Conclusion

All of the reasons I left the DKIM working group the first time still seem to be in play these days. There is way too much wishful thinking and a lack of any ability to determine success. Also there is clearly some weird politics going on with DMARC itself. Another reason I left. Once I can determine that the working group isn't going to do something actively harmful, I'll leave again. Or maybe they will run out the clock getting something out, like the DMARC working group, and I'll be dead, at which point I won't care again.