Tuesday, June 20, 2023

RFC 8252 (OAUTH BCP) is a complete joke

About 10 years ago I discovered that the IETF was working on OAUTH as a replacement for handing sites your user credentials so they can do things on your behalf, typically at the time for posting stuff to social media, but also as a convenient general login mechanism. I had written a native app and I didn't like having to store user credentials, so that seemed great. When I thought about it, though, there seemed to be nothing to prevent me from still getting the login credentials from the user of my app. Native apps, like phone apps, have complete control of the UI, unlike a web browser, which can be assumed to be a neutral player from the user's standpoint. When an app asks you for your login credentials for, say, Facebook, you have to decide whether you trust the app or not. OAUTH makes it seem like it's safe regardless of whether you trust the app.

It isn't. Since the app, unlike a browser, has complete control of the UI, it completely controls what the user sees. There are an infinite number of ways for a native app to game the user into giving up their credentials while still completing the OAUTH transaction on the user's behalf. I brought this up to the OAUTH wg at the time and was roundly flamed by the working group, and especially by the lead author at the time (who, it seems, flamed out later for apparently unrelated reasons). All that came of it was a line or two in the security considerations, and the end result is, as I predicted, that nobody cares about OAUTH use in native apps and it has become commonplace.

Later I heard that the OAUTH wg had created RFC 8252, which at first I took as vindication after the hostility the wg had shown me. Looking it up again today, though, I found that instead of being an informational "don't do this", it is in fact a BCP. The gist is that native apps should use browsers to do the login. This is tantamount to asking foxes to be nice while guarding the hen house. Or, closer to home, that RFC 3514 and the evil bit should be employed. Native apps intent on stealing your credentials can still steal your credentials no matter what RFC 8252 says, and the user will be none the wiser.

What the BCP should say is that OAUTH should *never* be used for native apps, and that users should *always* be cognizant that an evil app can steal their credentials, just as it could back when I had to store them outright for my app to do things on the user's behalf. How on earth did the IESG let this through? I mean seriously, this is a complete joke. Asking people not to be evil is not security, and it is certainly not a best current practice. This RFC should either be declared historic or rewritten.

Monday, March 27, 2023

On DMARC, ARC and DKIM Replays

Introduction

A while ago I happened to be looking at the email headers of a message and noticed something strange: several ARC- headers, one of which looked suspiciously like a DKIM-Signature and another which was some permutation of the Authentication-Results header. I was vaguely aware of DMARC as a replacement for ADSP and found its working group trying to move it from Informational to Proposed Standard. ARC had arisen out of that working group for some unknown reason, and no rationale for its existence is given in the charter, so curiouser and curiouser... I decided to sign up to the working group to see what was going on.

DMARC

Since I've written about ARC before, I'll start with DMARC itself. After ADSP was moved to HISTORIC for basically made-up reasons and, most of all, politics, a group of people -- I assume from the industry group M3AAWG -- decided to take another bite of that apple. It remains unclear what motivated them and who the players were. Nor is it clear whether the main detractors of ADSP had a part in its creation. It would seem rather surprising that they would have, for one main reason: DMARC is basically warmed-over ADSP. DMARC did add some new reporting capabilities for receivers to send reports back to the originating domain, which sounds like it might be useful, but who knows how well deployed it is in receivers, since things that solve somebody else's problems are not usually high on the list of things companies want to deploy.

The DMARC working group split the reporting off into another draft, which is good because the policy protocol of DMARC has nothing to do with the reporting protocol. Combining the two was fine when it was originally submitted as an individual-submission Informational RFC, and separating them now that it's going to Proposed Standard is fine too.

So what does that leave? As I said, DMARC is basically warmed-over ADSP. All of the DKIM-related policy is essentially identical, from what I can tell, with a bit of wordsmithing to make it different from ADSP for unknown reasons. The one thing they added was support for coexisting with SPF. When we were originally working on ADSP, there was no reason to get involved with SPF since it had its own policy mechanisms, so why cause a turf war?

I'm not entirely sure what motivated its inclusion, but they did it. One thing DMARC goes to great lengths over is the concept of "alignment". Alignment, if I understand it correctly, is where the 822.From domain matches the domain creating the DKIM signature or passing the SPF check. As far as I can tell, this doesn't change anything over the wire for SPF, DKIM or DMARC, so it's not a protocol issue per se. That is to say, it truly is informational for receivers, in sort of a BCP kind of way. I don't get the sense that it asks receivers to behave differently (unlike the policy parts of SPF and ADSP); it's more about giving a clearer definition of how DKIM and SPF can coexist in a receiver and deriving some different cases to be considered.

So removing the reporting leaves us with a document that gives a new operational lexicon for DKIM and SPF coexistence, and a few minor changes to the policy verbs, from what I can tell. How exactly is that different from ADSP? Maybe there are some explicit policy protocol ramifications of the new-found embrace of SPF that I missed, but that doesn't change the fact that the DKIM-specific part of the draft is essentially identical to ADSP. If the reasons for ADSP's move to HISTORIC are all still there, why is DMARC OK while ADSP is not? DMARC is not widely deployed, especially with its policy with teeth (ie, p=reject). DMARC is just as susceptible to being misconfigured too, right?

This leads me to speculate that there is some weird politics going on, or some 4-dimensional chess that I don't understand. All of the usual suspects who hated ADSP are still active in the working group. It's not clear whether they are trying to sabotage DMARC or not. It's hard to imagine they had a change of heart, though. If you think I'm insinuating that they are, you'd be wrong, because I truly have no idea.

So in conclusion, I'm rather mystified by what's going on there. But given that this group produced ARC, which I'll go into in the next section, it seems like the old saw "never attribute to malice that which is adequately explained by incompetence" might be in operation here. Or something.

ARC

ARC is the thing that originally caused me to come back to the DKIM world after close to 15 years of not paying attention to it at all. It is composed of three headers: ARC-Message-Signature, ARC-Authentication-Results, and ARC-Seal. ARC-Message-Signature is a DKIM signature with the minor addition of a new tag. It's not clear why it needed to be its own different header, but it is an experimental RFC so maybe that was part of the motivation. Likewise, ARC-Authentication-Results seems like regular old Authentication-Results with the addition of a new tag. ARC-Seal is a signature over the ARC signature and the ARC Authentication-Results.
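
To make that concrete, here is roughly the shape of the trio after a single ARC hop. The domains, selectors, timestamp and results below are invented for illustration, and the base64 signature data is elided:

    ARC-Seal: i=1; a=rsa-sha256; cv=none; d=forwarder.example; s=arc;
        t=1680000000; b=...
    ARC-Message-Signature: i=1; a=rsa-sha256; d=forwarder.example; s=arc;
        h=from:to:subject:date; bh=...; b=...
    ARC-Authentication-Results: i=1; mx.forwarder.example;
        dkim=pass header.d=author.example; spf=pass smtp.mailfrom=author.example

The i= instance tag is the new tag referred to above; otherwise the first two look an awful lot like DKIM-Signature and Authentication-Results.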

So the burning question is: why? What is this trying to accomplish? From what I can tell -- and it hasn't been easy to get straight answers -- the IESG didn't like that DMARC, and ADSP before it, caused trouble with mailing lists and other intermediaries that invalidate DKIM signatures. SPF has always been problematic for intermediaries, which was one place DKIM had an advantage, so long as the intermediary didn't change the message in a way that breaks signature validation at the end receiver. Mailing lists have long added things like unsubscribe footers and subject-line tags, which break the signature. ARC was supposed to address this, apparently. The irony is that in the meantime many mailing list managers now take DMARC policy into account and act accordingly, using various techniques of their own to avoid triggering a more restrictive policy such as p=reject.

So what it seems to be trying to accomplish is binding a DKIM signature to the Authentication-Results that the resigner's infrastructure produced. In the case of a mailing list, that would generally be the domain of the author who wrote the message, and its verification status. But ARC seems to suffer from amnesia about the fact that an intermediary has always been able to add its own signature, and has always been able to sign its own authentication results. That was fully the intent for going on 20 years. So that just leaves us with these tweaks to the signature header and the authentication results header. It appears that they are trying to bind the two together.

Why? Why is that important, and what does it bring to the table that DKIM and signed authentication results can't adequately address? I tried really, really hard to get an explanation and was unsuccessful. So they are inventing a completely new protocol, and all its associated overhead, for one feature that nobody can explain the need for. That is really suspect.

So ARC basically faithfully recreates DKIM and Authentication-Results with one minor tweak whose purpose nobody can articulate. How does this solve the mailing list traversal problem? It doesn't, as far as I can tell. Well, it doesn't in any way that DKIM couldn't already do. DKIM can help with mailing list traversal if the mailing list signs using the mailing list domain (or really any domain it has control of). Receivers can develop reputation for that domain just like they can develop reputation for originating domains. But you don't need ARC for that. So it's a complete mystery why it was developed, and especially why in a working group like DMARC.

DKIM Replay

DKIM replay is the latest bit of wtf'ery coming out of this corner of the IETF. It's been known for almost two decades that you can replay a DKIM-signed message. This is a feature, not a bug, and was actually a design goal that separated it from the SPF approach. Seemingly some mailbox providers (including enterprise, I assume) have reputation systems to combat spam and phishing, so spammers try to game the reputation of sending services: get them to DKIM-sign messages and piggyback off their reputation. This apparently harms the reputation of the signing domain eventually.

The attack seems to go like this: a spammer signs up with some service that has a good reputation (how do they determine that?) and starts running spam through it to get it signed. If the spam evades both the outbound spam filters of the sending domain and the inbound spam filters of the target (?) receiving domain, the message can then be transferred to a server the spammer controls, which starts blasting out the signed spam to zillions of mailboxes. This in turn causes something on the receiver side that hasn't been described (Bayes? something else?) to start seeing it as spam, and the receiver then gives the sending domain a rap on the knuckles by decreasing its reputation. After enough of these it apparently starts to affect deliverability (how do they know? my assumption is that most spam is blackholed, not bounced).

This seems to mainly affect big bulk email providers, but it could conceivably affect mailbox providers too. I assume it isn't much of a problem for enterprises, etc., since they would presumably take a dim view of an employee using their infrastructure for spamming. But who knows? It's not clear what steps these providers take to mitigate these attacks. At its base, the obvious solution for senders is to not send spam. There seem to be a lot of operational things those providers could do, like filtering their outgoing mail, keeping track of accounts that are sending spam via their filters, correlating that with account age, and that sort of thing. If the receiver does the sender the favor of bouncing the newly discovered spam, they could correlate the bad behavior with the account that sent it and potentially ban them. Depending on the enforcement, it may be trivial for the spammer to make another account and rinse and repeat, though. Ban evasion is obviously an operational issue, but again we don't know how good they are at detecting that.

A lot of this is pretty opaque. That's because the mailbox providers are not keen to share the secret sauce they use to combat spam. There is an industry group called M3AAWG which presumably knows more, but it is closed and, I assume, under NDA about what can be shared and what can't. So there are serious structural issues with how the working group can operate when the basic parameters of the operating environment can't be disclosed. The DKIM working group was rechartered with the potential to write or update operational advice, but I'm not sure how that would work given the opaqueness. A BCP needs, after all, to know what is best and common.

More worrisome is that M3AAWG could (and maybe will) write a BCP, but it seems like they are the ones driving this new effort to get DKIM rechartered, so it's pretty clear they don't know what to do since they are asking the IETF to solve it. You can't write a BCP that solves the problem if you don't know how to solve it, after all. And if they had some protocol solution(s) in mind, the typical thing to do would be to write a draft and bring it to the IETF to vet it. That has not happened, to my knowledge. There are some tentative drafts proposing solutions, but they don't seem to have any consensus within the industry.

All in all this strikes me as a Hail Mary from M3AAWG to the IETF. They don't know what more they can do, and participants on the public IETF group don't know enough of the details to really know what to do either. And we certainly don't have the wherewithal to know if any proposed solution would work in practice. Maybe more information will be forthcoming, but it's not been encouraging to say the least.

As for the proposed solutions, some have been farcical, like having mailbox providers strip out the DKIM signatures -- it's hard to imagine a smaller Maginot Line, since the spammer can just send to a domain they either control or that doesn't care. Another draft that I don't fully understand seems essentially to require an email flag day to succeed. Others seemingly want to add envelope information to the DKIM signature's signed headers. That seems deeply problematic on a number of levels, and it still isn't clear whether it would do any good.

So the working group was rechartered to solve a problem without any particularly clear way forward on how exactly it was supposed to do that. The proposed solutions don't seem like they would work in practice, and there haven't been a bunch more proposals to deepen the bench. That is not encouraging. Somebody (I think Scott Kitterman) mentioned that the basic problem boils down to differentiating good uses of replay from bad uses of replay. But that's nothing more than ham and spam, so we are left where we started.

And it's even worse on the BCP front, considering that the current participants don't know what the best common practices are beyond what they are currently doing, which they admit they don't consider adequate. BCP's aren't supposed to be speculative, after all. That is the hallmark of a research project, for which the IETF is not a good venue, and I doubt the IRTF would be much better given the issues surrounding secrecy.

Last of all, there is not a clear definition of what constitutes success in the first place. The spam game is not a matter of absolutes. It's a matter of probabilities and of inflicting enough pain on the spammers that they go try some other means of getting their spam delivered. But for obvious reasons there isn't a lot of appetite among receivers to share what that looks like. So we won't even know when we've "solved" it.

And then there is the perpetual problem of working group chairs being passive aggressive. It's bad enough when the cast of characters contains a number of them, but when chairs "Good Morning" you like Bilbo to Gandalf (oh sorry, I mean "submit text" when there is not even a working group draft to submit text to), you know that they are after their own solution or some other agenda. In Bilbo's case, it was to get Gandalf to buzz off. They were especially awful on the DMARC group, and the new chairs (or at least the active one) don't seem like they will disappoint on that front either.

This respinning up of the working group was a mistake on many fronts and seems to contain a lot of wishful thinking. Given that the current cast of characters doesn't have a very good track record of delivering solutions that work or that do something new and useful (there's a lot of overlap between the DMARC wg and even the old DKIM wg), it doesn't seem like a recipe for success. And a research project would be even worse, because the tools to do that just don't exist within the larger IETF community.

DKIM itself was at some level speculation when we first created it: that being able to authenticate a message tied to a domain might provide some utility to receiving domains. That seems to have happened, but in that case we weren't trying to "solve" a particular concrete problem, so it didn't matter that the way it was being used was opaque. The replay problem doesn't have that characteristic. We need to know. That's a problem.

Conclusion

All in all, the reasons I left the DKIM working group the first time still seem to be in play these days. There is way too much wishful thinking and no ability to determine success. There is also clearly some weird politics going on with DMARC itself. Another reason I left. Once I can determine that the working group isn't going to do something actively harmful, I'll leave again. Or maybe they will run out the clock getting something out, like the DMARC working group did, and I'll be dead by then, in which case I won't care.













Tuesday, August 3, 2021

Some of the Things You Get Thrown into When First Hired

Drinking from the Fire Hose

 

So you're fresh out of school with a shiny new CS or Software Engineering degree. Congratulations! You are totally unprepared for the reality of being a software engineer! Seriously, you are about to be thrown into a pit of despair and suffer imposter syndrome! You're going to have all of your algorithms at the tip of your tongue, your data structures and graph theory raring to go, and... find out that almost none of it gets used directly because it's all implemented in libraries for the languages you use. Which isn't to say it was useless, because you still need to know what you're looking for. But so disappointing.

This piece is not even comprehensive, so in fact it's much worse than what I paint here. And if you're in some specialized field you'll have to learn that specialization on top of all of this, say if you're doing scientific or robotics programming. But all is not lost: despite this long, long list of things you're going to have to deal with, the vast majority of engineers make it, and even excel, because after the grind of college they're so excited to actually start using their knowledge in their craft.

Development 

Development Environments

Let's start with the physical development environment. Pre-pandemic, the standard-issue space for engineers was to put them at banquet tables in row crops, pretty much shoulder to shoulder. The rationale was to "promote interaction and communication", but the reality is that the first thing people do is get headphones and turn them up high so they don't hear anything and can't be distracted. The rationale never passed the sniff test, which lays bare why HR does this: money. The pandemic has fortunately shown that the shoulder-to-shoulder physical "networking" that never happened in real life is trivially replaced with a longer wire, which is how the interaction actually happened anyway. Chatting across town is just the same as chatting two seats down. This is a win for everybody, but you will have to learn how to build routines when you telecommute and how to deal with human interaction when it's needed. Fortunately, just about everybody is in the same boat figuring this out, so you actually have the advantage of not having any expectations.

Next up: they have probably had you write some code while you were in school. You may already be writing your own code for your own projects. Maybe they specified the development environment, maybe they let you choose your own. If the former, they likely threw you into some fancy integrated development environment (IDE), which are often language- or platform-specific (think Eclipse for Java or Xcode for Apple). IDE's can be nice, but they aren't inevitable, and they can too often become their own ends rather than a means to an end (see the section on Frameworks). Learning them and their idiosyncrasies can be time consuming. My experience is that they are often rife with bugs (looking at you, Xcode) and inexplicable behavior where googling is your only hope. And there are lots of them, making the likelihood of fuckery exponential.

While you're not always going to have a choice because of operating environments or company mandates, it's really good to have a baseline capability to edit code, compile it or do whatever you need to run it, and debug it, where debugging by printf is perfectly valid and good. There are tons of editors out there and which one is best is a religious matter, but vi, emacs, and others are all good choices. Emacs is, of course, the best because my god told me so and I believe Him.

Debugging

The ability to intelligently debug is an essential capability of any software engineer. They probably didn't teach you how to debug, other than mentioning that debuggers exist for whatever languages they were using, and more likely not even that. There are a variety of ways to debug something, from simple use of printf and tailing logs to sophisticated debuggers with breakpoints, watchpoints, the ability to look up variables symbolically with arbitrary expressions, etc. For things like the web, there are built-in application-specific debuggers that make traversing the DOM really easy.

Debugging is much akin to the Scientific Method. First you find out that something is misbehaving, so you make a hypothesis about what might be going on. You then create experiments to test your hypothesis, and rinse and repeat until you have a hypothesis that fits the observations. You can then attempt to fix the problem, which further confirms your hypothesis. Code review is really nothing less than the peer review step of the Scientific Method, where outsiders can throw darts when the fix looks like a Rube Goldberg contraption that fundamentally misses the core of the problem, and what you have is a band-aid, not a fix.

You will learn about one of the deadliest of all bugs: the Heisenbug. Like the Heisenberg Uncertainty Principle, which states that you can't accurately know a particle's position and momentum at the same time, a Heisenbug vanishes when an attempt is made to observe it. It is maddening, and especially common in multi-threaded code with race conditions.

A cousin of the Heisenbug is the Schrödinbug. While hunting for an obscure and intermittent bug you finally find the cause and realize the code should never have worked at all. Unfortunately for you, observing that collapses the bug's wave function: the cat dies, the code stops working, and all hell breaks loose until the patch is applied. Once you detect that the code should never have worked, it's curtains for the kitty.

Learning Languages

It really annoys me, and is a pet peeve, that companies hire for $LANGUAGE programmers. The reality is that languages are tools, and frankly the differences are mostly angels on a pinhead. You will learn languages over your life (see SHINY below) and they will change and evolve whether they need to or not (see FOR ITS OWN SAKE below).

It's not to say that language differences are superficial, but a lot of them are. Far more important, in my opinion, is the richness and consistency of the libraries that you can easily access from them. Some languages get traction with new stuff going on and become the go-to. A current example is machine learning and Python. I haven't checked for sure, but I doubt there is anything inherent in Python that makes it good for ML; it probably just sort of happened. If you have a not-run-of-the-mill problem, libraries and other goodies should be a big consideration in choosing a language.

Last, there are some important considerations for language choice. Memory management with garbage collection or not. Object oriented or not. Raw access to hardware features or not. Typed or not. They all have their tradeoffs, and as always there ain't no such thing as a free lunch. My experience is that most tasks don't require much in the way of resources, so it's far more important to optimize for the speed of coding and maintainability -- not the speed of the code. Even if you end up having a scary inner loop, you can often architect it to drop into low level code which is controlled by a higher level language, either as a native extension or by some other means.
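
As a small, hedged illustration of that last point, here's a Python sketch of driving low-level code from a high-level language using ctypes on a Unix-y system; libc's strlen is just a stand-in for your own compiled hot routine:

    # Call a C routine from Python; strlen stands in for a real inner loop.
    import ctypes
    import ctypes.util

    libc = ctypes.CDLL(ctypes.util.find_library("c"))
    libc.strlen.restype = ctypes.c_size_t   # tell ctypes the return type

    print(libc.strlen(b"hello"))            # 5, computed in C, driven from Python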

API's and Libraries

API's and libraries are the lifeblood of writing code. When you're learning a new language, a lot of your time is going to be spent hunting down how not to reinvent the wheel. No, seriously, you don't need to reinvent malloc (though I have). You need to get the skills to find things so that you can concentrate on whatever the problem at hand is. When you're new to the scene you're probably going to be given a language to work with, which will imprint on you like a baby bird. Mama bird is goodness and never wrong. There can only be one best mama bird and her ways will always be the one true way. Those of us who have been around a long time see mama bird -- and the whole flock behind her that looks, for all intents and purposes, the same.

That's not to say that all API's are of similar quality, of course. The C runtime library is sort of a mess and definitely shows its age, which is why things like PHP and Perl, which slavishly copied it, really missed their chance. But once you get to standardized facilities like, oh, say, hashes, it really doesn't make a lot of difference whether they are called a Dictionary in Python or an Object in Javascript: they all pretty much do the same thing, albeit with different interfaces and/or syntax. Your job is to recognize these patterns and then go hunt for the equivalent in whatever the base API's are for your language.

API's can also be used to define calls over the net for various services. In this case you are not required to make the API represent a Remote Procedure Call (RPC), and frankly those seem to have gone out of fashion (buh-bye SOAP), but it is conceptually the same as pushing parameters onto a runtime stack to call a method. With both, you'll learn that what is needed is protocol agreement. That is, the thing making the call and the thing interpreting the call must agree on what parameters are there, what is optional, and so on.

Network-based API calls may or may not come with language-specific wrappers that hide some of the messiness. You should probably take some time to see what's going on under the hood for at least a few of them, just so it's not so mysterious, because some day you might be called on to design one yourself, as in the next section.
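
If you want to peek under the hood, here's a minimal Python sketch of a network API call with no wrapper at all, just the standard library. The endpoint and parameters are made up; the point is only that both sides have to agree on them:

    import json
    import urllib.parse
    import urllib.request

    # The caller and the server must agree on these parameter names and types.
    params = urllib.parse.urlencode({"user_id": 42, "fields": "name,email"})
    url = "https://api.example.com/v1/users?" + params    # hypothetical endpoint

    with urllib.request.urlopen(url) as resp:             # a plain HTTP GET underneath
        payload = json.loads(resp.read().decode("utf-8"))

    print(payload)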

Writing Libraries and/or API's 

Hopefully this is not one of your first tasks, because designing libraries and API's is as much an art as a piece of technical competence. There are a lot of variables and considerations in designing them. Designing them up front is usually the easy part, because you have some functionality you want to expose that is purpose-built for a given task. Great! That was easy. You most assuredly fucked up.

Requirements change. Features are added. You get more and more users using your API. How easy is it to modify the API? Can you make breaking changes? Can you deprecate things that were fuck-ups? How do you design API's that are more resistant to breaking changes? How do you go about deprecating mistakes and/or obsolete features? How much churn for users is acceptable?

I'm not going to say what makes a great API designer, because I'm not an expert at it, and when I've had the chance I like to beg, borrow or steal from existing API's. But the one piece of advice I'd give is to hold off on exposing a net-facing API for as long as possible, and to make certain that it makes sense from a business standpoint. Maintaining code that is used by thousands of sites, and that started out as "gee, it would be fun for my programmer friends to be able to play with this", will make you very sorry you didn't think it through.

Algorithms

The truth of the matter is that the vast majority of programming is mundane. For any one project, there are likely to be only one or two interesting algorithms. If you're at a place with more senior engineers, that algorithm is not going to have your name on it. I've been lucky to have been able to design some really interesting algorithms at a young age, but I was working at startups, at one of which I was literally the only software engineer. Sometimes the sharks aren't hungry when you're thrown into that shark tank, but these days VC money is usually not naive on that front, and if they are, they view it as a lottery ticket that will get re-engineered if needed.

What you can do is find those key algorithms and find out why they are key. Was it implemented well? What are its advantages? What are its deficiencies? Has it been optimized? Does it really need to be optimized to make an operational difference? If you think it could be improved and it would make a positive difference, should you bring it up with whoever maintains it? These sorts of things are usually somebody's baby, and you're about to call it ugly. You have to learn to be tactful. If you can prove your changes in reality rather than in theory, that bolsters your case. Something like:

"Hey, I've been trying to understand $ALGORITHM and have been playing with it on my own. Here are some things I hacked on and made it $X percent faster is this reasonable or am I missing something?"

Frameworks

OK, I'll be right up front: I am a framework skeptic. See the section on For Its Own Sake for one of the big reasons. Like designing a computer language or writing an operating system, it is often the life goal of every self-appointed hot-shot engineer to design a framework to do something. Every other framework that has come before it is shit and Only I Can Save You. Sorry, you're not, and the chances that your framework is anything beyond mediocre are vanishingly small. Unless your entire existence is wrapped up in your framework and evangelizing it, your chances of it being important are pretty much zero.

As for frameworks themselves, they are far too often Procrustean. The author has a view of the world, and the only way to salvation is to view it that way too. Rails shamelessly advertises itself as having that attitude, but the fact of the matter is that they all have that attitude even if it's not stated. Frameworks get old and creaky, often victims of their own success without acknowledging their shortcomings. New frameworks are a dime a dozen and usually riddled with bugs and poor design, and in the end they don't solve the problem any better than what they are trying to replace.

To keep whipping Rails: people went ooh and aah when a single command could generate a web site with all of the CRUD operations generated from templates connected to database tables. Nobody had the presence of mind to ask who would use such a web site. In the early 80's, Ingres was a relational database that had a front end program called QBF (Query by Form). Rails is essentially QBF 40 years removed. And thus, without asking those basic questions, an entire generation of programmers started using Rails, only to find that that is not how real web sites are designed.

The flip side of frameworks is their users, which, for young engineers, leads us to the next section...

Shiny

Shiny is a subspecies of Fear of Missing Out (FOMO). Young engineers are completely convinced that most senior engineers are complete idiots who are stuck in their ways, and that if only they had youth and vigor they would be able to appreciate the sheer beauty and worth of $SHINY. Of course it's going to revolutionize everything. I mean, they say so themselves! I'm reminded of AWS Lambda when it first came out. What is this, I asked? Investigate a little: oh, new-age batch jobs. Oh, and Docker, what is this? Investigate a little: oh, new-age time sharing. For the most part nothing is new under the sun and it's all been done before. The canonical trap that young engineers spring with Shiny is lock-in. Somebody isn't giving you this wondrous new miracle for the good of humanity. They are far too often trying to lock you into their walled garden. Shiny is almost always the enemy and should always be viewed with extreme suspicion. We didn't get these grey hairs for nothing.

A subsection of Shiny is language-isms. Lots of languages like to grow new and shiny ways to code something up that are completely idiosyncratic to that particular language and impenetrable to somebody not as familiar with it, or even to people who are very familiar but are not caught up in the desire for Shiny. If you can design something with relatively language-independent constructs and it doesn't materially hurt performance goals, that is far preferable from a maintenance standpoint. Not all engineers have the same level of language arcana, and even if they don't need to fix something, they might need to look under the hood at how it works, maybe for a similar problem. Don't be that dick who makes it impenetrable gratuitously.

For Its Own Sake 

All things fill to available capacity. It's the law of the land. Something similar happens with software projects: they don't know when they are done. They can't know when they are done, because that would admit there is actually an end state, which is tantamount to defeat, alongside all of the similar projects that can't know they are done for the same reason.

Like Shiny, newer should not be taken as better on its face. If something is working well for your purposes and has good bug and security patching, there isn't a lot of motivation for upgrading for the sake of upgrading. Upgrades cause churn and either create bugs or expose bugs. The latter is OK, but the former is not worth it unless there is good motivation. 

Interacting with the OS 

At this point the world looks pretty Unix-y. I don't know what the percentages are for servers on the backend, but Linux has to be dominant. For the front end you basically have three choices: Web, iOS (Unix), and Android (Linux). Windows as an OS is not terribly relevant since writing native apps for it is pretty stagnant. Yes, laptops and desktops are still overwhelmingly Windows, but that doesn't mean it has a lot of relevance to you as a new programmer. Linux and its distros are generally free and you are free to pick and choose. Windows is a business model with walled gardens they are enticing you to enter. It's best to stay out. The same goes for other walled gardens like AWS.

So you'll need to learn the basics of how Unix OS calls work. Much of that is abstracted away in higher level languages, where you can open a URL as easily as you can open a local file, but a lot of the API's in the OS parts of languages are patterned after Unix system calls and Berkeley sockets. You don't need to go crazy understanding every system call -- mmap and brk are probably not going to be thrown at you any time soon -- but open/close/read/write/lseek may. Unless you're writing relatively low level code, you're probably not going to need to know much about how signals work, but if they gave you an intro hardware architecture course: they are hardware interrupts translated into user space.
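
To get a feel for the pattern, here's a small Python sketch using the os module, which maps almost one-to-one onto the underlying Unix calls; the file name is arbitrary:

    import os

    # open/write/close, then open/read/lseek/close -- the basic Unix lifecycle
    fd = os.open("/tmp/demo.txt", os.O_CREAT | os.O_WRONLY | os.O_TRUNC, 0o644)
    os.write(fd, b"hello, world\n")
    os.close(fd)

    fd = os.open("/tmp/demo.txt", os.O_RDONLY)
    data = os.read(fd, 1024)       # read up to 1024 bytes
    os.lseek(fd, 0, os.SEEK_SET)   # rewind, just to show lseek
    os.close(fd)
    print(data)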

Mostly you need to get familiar with the basics of the OS itself, but just as importantly you need to get familiar with all of the utilities on the command line. It's OK not to know how to use find(1) off the top of your head, because man(1) is there to help you. Using locate(1) to find files, learning how to redirect output so you can show somebody else that something hosed is going on... all of these things are going to be a daily part of your job. You're going to have to learn them quickly, because they didn't teach you any of this in school.

Networking in Reality

They probably taught you about the OSI network stack. Unless you're a networking geek like me, that's probably about all you need to understand the plumbing that goes on after your bits leave your program. But there really is much more to it than that within your app. The world is pretty much clients and servers, with clients doing CRUD'ful operations and servers serving it up. You also need to learn about things like WebSockets where, unlike the request/response paradigm, servers push data out proactively to clients. Think chat clients. You'll also need an understanding of how to distribute load for server pushes, so you'll end up needing to understand message buses like RabbitMQ.

If you are doing anything remotely web related you have to learn about Ajax calls (aka XMLHttpRequest). They are a fundamental part of creating a modern interactive web app. My personal favorite design pattern is to have a skinny backend which has two purposes: serving up data and performing access control. The front end takes the raw data and builds the UI. Others may like more in the backend, but fundamentally it's going to be a mix at the very least. The days of static web sites are long gone.
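
To sketch what "skinny backend" means in practice -- using Flask purely as an example, with a made-up route and data -- the server's whole job is to return raw data (and, in real life, to check access first):

    from flask import Flask, jsonify

    app = Flask(__name__)

    @app.route("/api/widgets")
    def widgets():
        # A real app would check the caller's session/permissions here.
        return jsonify([{"id": 1, "name": "sprocket"},
                        {"id": 2, "name": "flange"}])

    if __name__ == "__main__":
        app.run()

The front end then fetches /api/widgets with an Ajax call and builds whatever UI it wants from the JSON.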

If you want to add audio and video, and especially conferencing, you're going to have to learn how all of that is done. You don't necessarily need to know the nitty-gritty of RTP transporting the output of codecs, but it is useful to know that you have those tools at your disposal if they are appropriate for an app. You might also need to know that integrating a point-to-point conference is cheap and easy, but if you need audio mixers and video muxes it becomes a more costly endeavor.

Bottom line is that the world is a network and it's the way you deliver data from source to destination.

Optimization

There are 10 types of people in the world: those who have heard Knuth's quote about optimization, and those who haven't. Knuth's quote is:

"The real problem is that programmers have spent far too much time worrying about efficiency in the wrong places and at the wrong times; premature optimization is the root of all evil (or at least most of it) in programming."

If you don't even know who Donald Knuth is, your degree is revoked. While it's good not to write gratuitously gross algorithms, the truth is that most code does not need to scale to anything significant. Your main job is just to get something up and running, not to create a master's thesis. O(n) is just fine for the vast majority of in-memory searches.

My general philosophy is to get something up and running and figure out where the hot spots are later. This is especially relevant with the unending requirement changes and scope creep. The thing you were assigned in the first place may end up looking nothing like what it ultimately needs to do in the end. Any optimization you do is very likely to be wasted time. Instead of spending a lot of time optimizing, spend more time making your code easier to refactor. That is, always hedge your bets that the code might need to change in ways that reflect new requirements. One way to do this is to write pluggable architectures, as sketched below. Don't go crazy though. Only do it for the highest level stuff.
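
Here's a minimal Python sketch of what I mean by pluggable, with invented names: a little registry so that new implementations drop in without touching the call sites:

    import json

    HANDLERS = {}

    def register(name):
        def wrap(fn):
            HANDLERS[name] = fn
            return fn
        return wrap

    @register("csv")
    def export_csv(rows):
        return "\n".join(",".join(map(str, r)) for r in rows)

    @register("json")
    def export_json(rows):
        return json.dumps(rows)

    def export(rows, fmt="csv"):
        # New formats plug in via @register; this call site never changes.
        return HANDLERS[fmt](rows)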

For the core mechanisms of key algorithms some amount of optimization is fine, mainly to prove that they can be optimized further if needed. Optimization follows an exponential curve, with efficiency on one axis and effort on the other. Sometimes companies will pay millions of dollars to squeeze out gains that are asymptotically close to zero, like fintech trading, but those cases are exceedingly rare.

Performance Profiling

So you have finally been forced to face the optimizing you assiduously avoided by reading the previous section and postponing it. Optimizing is something of a science and an art. It's both getting concrete data about what is going on, and then often re-imagining how the same requirements can be met in better ways.

There are really two different kinds of profiling: ad hoc, where you are pretty sure you know what's running hot, and using the built-in profiling software provided by the distro or the language you're using. The former is easier to grok because you know what you're looking for and are trying to confirm or deny that it is part of the problem. The big problem with it is that you can be wrong in your conclusion about the problem and instead just nibble at the edges.

Bring in performance profilers. Profilers usually work by setting up a hardware clock interrupt and sampling what's running to produce a histogram of where the program has been and how much time it's spending there. The classic for C programs is gprof. Various higher level languages have bindings for their own profilers too, which is necessary since something like gprof doesn't understand, say, Python's internal symbol table or anything like that. They are no panacea. Often they can be misleading because they show core functions getting hit, but not who is calling them. Some may have the ability to record farther up the call stack, but usually at first blush it's just the active call that is recorded. And then of course profilers are invasive, so if your code has any kind of real-time quality, you get a Heisenprof to contend with too. Last, for upper level languages, profilers can be fiddly and not terribly well supported. They are often idiosyncratic in my experience and not all that easy to understand.
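
For the scripting-language case, here's a quick sketch using Python's built-in cProfile (a tracing profiler rather than a sampling one like gprof), with a toy function just to have something to measure:

    import cProfile
    import pstats

    def hot_loop(n):
        total = 0
        for i in range(n):
            total += i * i
        return total

    cProfile.run("hot_loop(1_000_000)", "prof.out")   # profile and save stats
    pstats.Stats("prof.out").sort_stats("cumulative").print_stats(5)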

Fortunately, profilers are not needed or even useful for a major source of poor performance: database interactions. The curse of trying to map an object model onto a relational database is the source of all kinds of gigantic fails. Remember that "wow, I pressed enter on 'rails new foo' and I now have a web app"? Uh, yeah. Rails is hardly the only perp here, but it is very representative of the mindset of not thinking about what is happening in the underlying database. Out of sight, out of mind. Except when it matters.

It is essential that you understand how the underlying database works, and how to see whether the ORM (Object-Relational Mapping) is making good decisions about mapping objects to the database. This is true of *any* database type, not just SQL. By way of example, to show that even I, who have been around relational databases for 4 decades, can get snookered too: the simple way with Rails is to create models, which are really just a layer on top of database tables, and let the ORM deal with the joins. I had a small table with the definitions of particular items, which I'd join with another table to fetch those definitions. This saves space in the other table since it's just endlessly repeated data.

The problem is that the other table got really, really big, and it was indexed using a B-tree. B-trees, as you know full well with your shiny new degree, are O(log n). So looking up a key in the big table may involve looking up many, many intermediate nodes in the tree at disk speeds (even with flash, it's still bad). I might be getting exactly how the problem manifested wrong because it's been many years, but the gist of it is that joins are not always your friend. Sometimes data replication is far and away the better choice.
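
Here's a toy sqlite3 sketch of that trade-off, with invented table names: the normalized version pays for a join (and the index lookups behind it) on every query, while the denormalized version repeats data but answers with a single-table lookup:

    import sqlite3

    db = sqlite3.connect(":memory:")
    db.executescript("""
    CREATE TABLE kinds (id INTEGER PRIMARY KEY, label TEXT);
    CREATE TABLE events (id INTEGER PRIMARY KEY, kind_id INTEGER, payload TEXT);
    CREATE TABLE events_denorm (id INTEGER PRIMARY KEY, kind_label TEXT, payload TEXT);
    """)

    # Normalized: every lookup pays for the join and the extra index walk.
    db.execute("""SELECT e.payload, k.label
                  FROM events e JOIN kinds k ON k.id = e.kind_id
                  WHERE e.id = ?""", (1,))

    # Denormalized: repeated data, but a single-table lookup.
    db.execute("SELECT payload, kind_label FROM events_denorm WHERE id = ?", (1,))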

I'll end here by saying that you should always be suspicious of tools that make things "easy" as you scale up. You should also have the humility to say that you have no clue what is causing something to be slow and that it will take some time to investigate. Last, know when to say good is good enough. It's extremely tempting, once you've opened the hood, to tinker with it for a month -- weeks longer than it really needs to be tinkered with. Stop while you're ahead.

Databases

Databases are the beating heart of almost all applications. As shown above, they can also make or break an app when used incorrectly or carelessly. While nobody is saying that you should be a full-blown DBA who knows every nook and cranny of SQL arcana, it is good to understand the basics of what databases do and why they do it.

There has been a somewhat recent backlash against SQL, in the last decade or so, in the form of NoSQL databases. This is by and large SHINY, in my opinion. In the vast majority of cases you do not need Cassandra and its ilk, and using it pretty much shows that you have no clue and are just following trends. Likewise things like MongoDB. If you don't understand ACID and its tradeoffs, you are doomed to repeat every mistake that comes from being ignorant of it. Databases are really simple and fast when you don't support the hard things that slow them down.

Here's the thing though: it's not like relational database vendors are blind and can't see the good parts of current trends. Mongo pioneered using JSON blobs and querying based on them. Postgres saw that and went "great idea, we can do that too!". I don't even know at this point whether Mongo can be ACID compliant (I imagine it is by now), but they had to retrofit it in, while something like Postgres has had it for decades and knows how to optimize when the various aspects of ACID are not needed.

The other thing you need to learn is that databases are pretty much forever. Once you make the decision to use one, you are going to be saddled with it on a project pretty much for life. Consider the problem you'd face: trying to migrate between databases that, even if they are both SQL, have different and incompatible extensions. Multiply this by fail if one or more are different kinds of database altogether. Now consider that you are making this change while your whole system is running, in a world that doesn't sleep.

Lock-in is multiplied by a zillion when it's a database in a hardware walled garden like AWS. Not only are you locked into a database for life, now you're locked into a hardware platform too. You are fucked squared. Don't do that.

Your biggest takeaway should be that you are not a database expert just because you have a new CS degree, and that the generations of people who have studied this vital problem are not idiots. There are places for specialized databases where scaling is extreme, but the likelihood that you have or will have that problem is almost zero. Be extremely conservative.

Threading and Concurrency with Cores

Like a lot of things, parallelization is as much an art as it is a skill. When I was young I found out that I had a somewhat uncanny ability to visualize concurrency, and especially race conditions, such that the engineers at the company we were contracting for were amazed. I don't really know if it's ingrained or not, but it is something you'll need to at least be aware of if you're using threads or other parallel programming.

They no doubt taught you about mutual exclusion, but it's trickier to figure out whether it's needed or not. Mutexes are definitely not free, so you want to avoid them if at all possible, but if you miss one that is needed, prepare to weep because it will probably arrive as a Heisenbug. So you're going to need to be able to visualize the situations where one thread can mash on common data and cause another thread to puke.
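
A small Python sketch of the kind of shared-data race I mean, and the lock that prevents it (with CPython's GIL a simple counter race is less likely to bite than in C, but the pattern is the point):

    import threading

    counter = 0
    lock = threading.Lock()

    def bump(n):
        global counter
        for _ in range(n):
            with lock:          # drop this and two threads can interleave the
                counter += 1    # read-modify-write and lose updates

    t1 = threading.Thread(target=bump, args=(100_000,))
    t2 = threading.Thread(target=bump, args=(100_000,))
    t1.start(); t2.start()
    t1.join(); t2.join()
    print(counter)              # 200000 with the lock; possibly less without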

One last thing, since I don't want to belabor this much: there is an unfortunate misconception that throwing cores at the problem with multiple threads will solve everything. If you are using an interpreted language, think again: you need to understand the Global Interpreter Lock (GIL). With almost all higher level languages, there is lots of code that is not reentrant and thus not thread safe. Languages get around this by locking the interpreter when it needs to execute that code. The net effect is that all of those nice fancy cores you are paying for by the minute cannot be used except when a thread blocks. There are quickly diminishing returns on hardware cores in the face of a GIL. There are some languages that don't have a GIL, but they are few and far between. The answer, for the most part, is to just run a bunch of processes with different interpreter instances. Sorry threading, you are no panacea.
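
A sketch of that usual workaround for CPU-bound work in Python: separate processes, each with its own interpreter (and its own GIL), via multiprocessing:

    from multiprocessing import Pool

    def crunch(n):
        return sum(i * i for i in range(n))

    if __name__ == "__main__":
        with Pool(processes=4) as pool:             # 4 interpreters, 4 cores
            results = pool.map(crunch, [10**6] * 4)
        print(results)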

Top Down/Bottom Up/Middle Out

Now we come to development styles. Different organizations, or even dev groups, are going to have different styles. Sometimes the style will, out of necessity, be strictly enforced. You don't want the coders on your space telescope launched to a Lagrange point haphazardly playing around and fixing bugs as needed. I personally have never understood bottom-up, but there are people who are like that, including a friend of mine, and I found it pretty maddening.

Top down is also necessary when you outsource writing code. If you get into a situation where some suit has the bright idea that they can save money by outsourcing, the first thing you learn is that outsourcers will write exactly what you asked for and nothing more. They won't hedge bets, build in future proofing, or prepare for obvious new features. Nothing. They write to the spec and that is that. If you are bad at writing specs, you're fucked and it's your fault. If requirements change, as always happens, or there are ambiguities in the requirements, you pay. If they are significant, you may pay a lot. I personally would not want to do this, but there may be a time when you have to. The key takeaway is that they are a business trying to get your money.

My personal favorite way to develop is middle-out. I often have an idea that sounds interesting but I'm not sure exactly how it will play out. I don't have all of the requirements and am not entirely sure what it is that I'm going for. So I build quick prototypes and see what happens, not paying much attention to code cleanliness or speed or anything else, just trying to understand the problem space. This lends itself well to rapid prototyping, especially if you need to do a sell job to management to show where your head is at and why what you're working on is useful. If it's not evident, I'm obviously a fan of the 20%-of-your-own-time kind of thing that Google and others have.

Refactoring

Middle-out programming explicitly relies on refactoring as a strategy: you find out that something is worthwhile and then you refactor it to clean it up and make it real. Refactoring is the process of looking at all of the moving parts and seeing how they interact with each other. You often find that parts are reusable in ways you weren't thinking of when you originally designed a piece of code. This isn't a failing, and if there is a failing to be had, it is on the side of over-generalizing things that are not in fact general. In my opinion, the more private methods you have, the better. Methods should only be promoted to public when there is a clear need, because public implies support. If you have a public method, others have the right to bitch if it doesn't work correctly for their use.

Refactoring is just a way of life with writing code. You'll be doing it often and it is inevitable. But there is good refactoring and bad refactoring. Good refactoring is like washing your car to get rid of the gunk that accumulates over time. Bad refactoring is like finally getting around to fixing your car after three wheels have fallen off. Be the former and treat your car well.

Compartmentalization   

Everybody is taught about modularity, but in reality it is a skill you have to learn on the fly. Like most things, there is a happy medium. Lots of young programmers make a zillion and 7 modules or classes that have one or two methods and that's it. Since they most often end up in separate files, it makes it a pain to search for them. Many methods are really purpose-built support methods for a public method. The likelihood of their reuse is minimal, and reuse can be detrimental if you have to hunt down who is using a method when you want to change its functionality.

In my opinion, methods should be private until proven otherwise. DRY (don't repeat yourself) is nice in principle, but it can be taken to extremes where instead of a nice purpose-built helper function you have this rusty Swiss army knife with blood all over it from people trying to use it. If a method has a shitload of mode and flag qualifiers, it's probably that knife you'd sooner avoid.

On top of that, short little functions/methods likely make it harder for things like Clang and GCC to do optimizations like loop unrolling. Maybe they can do this across modules/classes, but they most likely have to be more careful. I don't know how much loop unrolling the various interpreted languages do, but it's probably even harder. The main point here is not to go crazy with DRY as a unifying principle.

Tools

Building tools is an often overlooked essential skill. Lots of programming is repetitive, where you need to monitor something or generate output to be munched on to get stats. Basically anything and everything. When I was a young engineer I was the one software engineer in one company supporting dozens of engineers in another company with a hardware product (a laser printer) they had contracted us to build. They had no experience with embedded systems and were getting a crash course in how you write for and debug them on the fly. This was before email across sites, and I lived an hour away from them, so I didn't go down and visit often.

I had the idea that I needed a debugger for the hardware for my own purposes. I decided to make it pretty fancy so that I could view variables symbolically, set breakpoints, do profiling, etc. I thought that this was pretty neat from a feather-in-cap standpoint, but frankly, in hindsight, it is probably the single most important thing I did to cause the project to succeed. The reason is that it gave our client's engineers a familiar-looking way to run and debug their code in an environment that was otherwise totally alien.

The moral of the story is don't underestimate how hugely important tools can be. It's easy to get wrapped up with tools as their own ends, but most companies are tool-poor not tool-rich. 

Googling and Other Scrounging 

So you freshly have your degree and can spout every algorithm and data structure with pinpoint accuracy to the recruiters in the hiring process. You land your job and proceed to use that knowledge to write your version of all of those algorithms. They then fire you. Why did that happen? Because wheel reinvention is a waste of time, and almost certainly your implementation is going to suck in comparison to somebody else's, who probably wrote a master's thesis on it decades ago. So why do they ask you those questions in interviews? Because you are green and they are lazy.

The reality is that Google-fu and being able to scrounge around on the net are the way actual research is done these days. Stack Overflow is not just a handy site for programmers; it is the expected way you'll find answers to your questions. You have a bizarre error message that makes no sense? Google it verbatim and see who else was stumped. Google-fu is its own skill and you need to learn it. Figuring out the right incantation can be a dark art in many cases, but the more you work on it, the better you'll get.

UI Design

You are not a UI designer, you say. That is for somebody else, and never shall your estimable engineering hands be soiled with such dirty, inconsequential details. You are in for a rude surprise. Nobody says you have to be an ace graphic designer with taste right out of the Italian fashion houses, but everything has input that needs to be ground on and then shipped out. You may not need to do the actual window dressing itself, but you may need to make a first approximation that is so hideous that actual UI designers can't wait to get rid of your affront to their design sensibilities.

Frankly everybody should know the basics of HTML layout and some CSS. Even if that's not your day job, you often come in contact with the need for internal tools. Far too often internal IT is not going to fund one of their folks to do this work and then -- more importantly -- maintain it. And your UI designers aren't wasting their limited time making your tool look better than the turd it is. So it's going to be up to you, and you're going to have to learn this on the fly too.

Security as Actually Practiced 

Interacting with the World 

Network Security Basics

OS Security Basics  

Web Security

Logging In

Role Based Access

Permission Based Access

Figuring Out Security Requirements

Testing

Testing Along the Way

Unit Testing

Integration Testing

Regression Testing

Interacting with DevTest

Working with Teams

The Mythical Man Month

If you haven't read The Mythical Man Month drop everything and read it. Twice. In a nutshell the best way to make a late software project later is to add more people to it. Fred Brooks' (RIP) insights are precious and hard won on his part. It's from the days of mainframe computers in the 70's, but is every bit as applicable today as it was back then. There are many other gems including The Second System Effect and No Silver Bullet.

Interacting with Others

Source Control

Requirement Gathering 

You will find that when it comes to defining problems everybody is a software architect. That's especially true of marketing types, sales engineers, etc. They can make your life miserable because they create an architecture that isn't right for the job or doesn't actually solve the customer problem. You must train them to tell you what they want, not how to design it. Customers are not immune to this either so it can be a delicate dance. It's an important one though, because you are the one who knows the internals, the architecture, and all of the subtle tradeoffs they entail.
 
The other problem with this is that customers often don't have a very good sense of the larger problem behind what they are pointing out. Often there is a kernel of "this is what I want" that wants to be expanded and rationalized rather than patched and hacked.

Bug Management

Feature Management 

Constantly Changing Requirements

Meetings

White Boarding

The Interrupt Stack 

Development Process 


Different Process Types 

Waterfall, Agile

Time Management

Estimating Time

Working at Software Sausage Factories

Working at Startups

What Dev Managers Are For

Requirements and the Debugging Thereof

Deployment

Build Management

Packaging and Release Management

Provision Servers, etc

Monitoring

Scaling Services 

Fire Drills 










Sunday, April 4, 2021

Quic: the Elephant in the Room

[image: "Elephants should be here, not in rooms" -- Status Check for African Elephants | NRDC]

[foreword:  I revised this several times expanding my thoughts and worked on getting packet sizes and counts correct, but it's quite possible I've made some mistakes in the process.]

I was recently thinking about Quic, the combined TLS and transport protocol that Google initially designed to streamline session start up along with a wish list of other improvements to better target web traffic needs. The motivation is mainly latency and the number of round trips needed before the underlying HTTP traffic starts flowing. While Quic certainly does that and is an improvement over the strict layering on top of TCP, looking at this from an outside perspective (I am no TLS expert), the biggest source of latency at startup seems to be sending the server certificates themselves. I looked at my certs and the full chain PEM file (ie, my cert plus the signer's cert) is about 3500 bytes. I tried gzip'ing it and it made some difference, but it was still about 2500 bytes all said and done, and TLS doesn't seem to be compressing them anyway. So that's a minimum of three MTU sized packets just for the credentials and one MTU'ish sized packet for the ClientHello. While the cert packets are sent in parallel subject to the congestion window like TCP, they are still MTU sized packets which carry the latency that Google was trying to get rid of. One curious thing I noticed is that Wireshark seemingly said that it was using IP fragmentation, which if true is really a Bad Thing. I sure hope that Wireshark got that wrong.
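
For what it's worth, the size check is easy to reproduce. A rough sketch in Python -- the fullchain.pem path is just a placeholder for wherever your chain lives, and the packet math is strictly back of the envelope:

    # Rough sketch: compare the raw size of a PEM certificate chain to its
    # gzip'd size. "fullchain.pem" is a placeholder path.
    import gzip

    with open("fullchain.pem", "rb") as f:
        pem = f.read()

    compressed = gzip.compress(pem)
    print(f"raw PEM: {len(pem)} bytes, gzip'd: {len(compressed)} bytes")
    print(f"roughly {(len(pem) + 1459) // 1460} MTU-sized packets before compression")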

If I understand Quic correctly, basically they got rid of the TCP handshake and used the TLS handshake instead since it's a 3 way handshake too. So the flow goes sort of like this:

 

  • DNS A/AAAA Lookup ->
  • DNS Response A/AAAA <-
  • ClientHello+Padding ->
  • ServerHello+QuicHandshake1 (cert) <-
  • QuicHandshake2 (cert cont) <-
  • QuicHandshake3 (cert cont) <-
  • QuicHandshake (finish) ->

So in all the server is sending ~3 MTU sized packets. This is on the assumption they are sending PEM, which might not be a good assumption as they could be sending the straight binary X.509, but from the looks of it on Wireshark they're just sending PEM. I assumed the ClientHello would be small, but I read that there are issues with reflection attacks so it is padded to be relatively large. Assuming I read that right it's about 1200 bytes for the client and server Hello, so all told 4 ~MTU sized packets and a small client finished handshake packet. So in bytes, we have about 1300+1500+1500+1000+100 which is ~5400 bytes.

Getting Rid of Certificates using DNS

What occurs to me is that if they weren't using certificates it could be much more compact. The rule for mitigating the reflection attack is that the server should send no more than 3 times the ClientHello packet size. Suppose instead of using certificates we used something like DANE (RFC 6698) or a DKIM (RFC 4871) selector-like method:

  • DNS A/AAAA Lookup ->
  • [ DNS TLSA Lookup -> ]
  • DNS A/AAAA Response <-
  • ClientHello+Padding ->
  • ServerHello+QuicHandshake <-
  • [ DNS TLSA Response <- ]
  • QuicHandshake (finish) ->
The server QuicHandshake would be relatively small depending on whether you fetch the public key from the DNS or just query DNS as to whether, say, a fingerprint of a sent public key is valid (DANE seems to be doing the latter). In either case, the size of the QuicHandshake is going to be quite a bit less than an MTU, say 600 bytes. That means that the ClientHello only needs to be about 200 bytes or so, so it is a medium sized packet. Thus we've reduced the sizes of the packets considerably. That's 3 small packets and two medium ones. In bytes it's like 200+600+100+300+100, which is about 1300 bytes -- roughly 4x smaller.
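
To make the DANE flavor concrete, here is a rough sketch of what the client-side check boils down to. It assumes the dnspython and cryptography packages, example.com is a placeholder, and real DANE validation (RFC 6698) has more usage/selector cases plus the DNSSec validation I'm hand-waving here:

    # Sketch of a DANE-style check: fetch the TLSA record and compare it to a
    # SHA-256 fingerprint of the server's SubjectPublicKeyInfo.
    import hashlib
    import ssl
    import dns.resolver
    from cryptography import x509
    from cryptography.hazmat.primitives import serialization

    host, port = "example.com", 443

    # The certificate the server presents (fetched out of band here just for
    # the sketch; in Quic/TLS it would come from the handshake itself).
    pem = ssl.get_server_certificate((host, port))
    cert = x509.load_pem_x509_certificate(pem.encode())
    spki = cert.public_key().public_bytes(
        serialization.Encoding.DER,
        serialization.PublicFormat.SubjectPublicKeyInfo)

    # A real client would insist on a DNSSec-validated answer here.
    for rr in dns.resolver.resolve(f"_{port}._tcp.{host}", "TLSA"):
        # selector 1 = SubjectPublicKeyInfo, matching type 1 = SHA-256
        if rr.selector == 1 and rr.mtype == 1:
            if rr.cert == hashlib.sha256(spki).digest():
                print("server public key matches the TLSA record")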
 
But wait, there's more: DNS is cacheable, so it's pretty likely that the DNS response is going to be sitting in cache; then it becomes 3 smallish messages and ~900 bytes, which is about 6x smaller and two fewer messages. It also doesn't have any problem with IP fragmentation, if that's really what's going on. Plus we're back to the traditional 3 packet handshake as with TCP. Note that DNSSec requires additional lookups for DNSKEY and DS RR's, but many of these will end up in caches, especially for high traffic sites.
 
Using DNS this way would obviously require DNSSec to fully reproduce the security properties of certificates, but that shouldn't be an impediment. As with the original Quic from Google, Google owns both browsers and servers so it controls whether they come to agreement or not. All they have to do is sign their DNS repository (which I assume they already do) and have the browser make certain that the DNS response is signed properly. All of this can happen in user space that is completely under their control.

Update: I moved the DNS TLSA lookup to be speculative after the A/AAAA record lookup if it's not in cache. The client could keep track of the domains that have produced TLSA records in the past as a means to cut down useless speculative lookups. A better solution would be to have the TLSA record "stapled" to the A/AAAA lookup, but I'm not sure what the rules for such things are, and of course it would require buy in from the DNS server to add them to the Additional RRset.

DNS Implications

Using DNS as a trust root is a much more natural way to think about authentication: domains are what we are used to either trusting or not. Certificates created an alternative trust anchor and frankly that trust anchor is pretty self-serving for a whole lot of certificate vendors. It would obviate the need for that side channel trust anchor and get it on the authority of the domain itself directly. Gone would be the need to constantly renew certificates with all of the hassle. Gone would be the need to pay for them. Gone would be the issue of having dozens of certificate roots. Gone would be the risk of one of those roots being compromised. Gone would be a business model that was predicated on 40 year old assumptions of the need for offline verification which is obviously not needed for an online transport layer protocol. 

Another implication is wildcards. Certificates have the ability to have wildcards in the name space, so that foo.example.com and bar.example.com can have one certificate with *.example.com. DNS has wildcards too, but whether they would meet the security properties needed is very questionable as I'm pretty sure that there is a lot of agreement that DNS wildcards are messed up to begin with. If they don't, you'd have to enumerate each subdomain's DANE records. I'm willing to bet that DANE addresses this, but haven't seen it specifically in my skim of it.

Another implication is that a lot of clients rely on upstream resolvers, which is a thorny issue when authentication is involved. However, my experience is that browsers either implement their own stub resolver or rely on an OS stub resolver. Given ecommerce, etc, my feeling is that trying to eke out some sort of CPU performance benefit is generally a bad tradeoff and that browsers can and should actually authenticate each transaction before storing it in a local cache. RSA/ECDSA verifies are extremely cheap these days, and besides, browsers are already doing those verifies for certificates.

TLS Implications

 
I am by no means a TLS expert and can barely play one on TV, but my understanding is that TLS allows for naked public keys these days. Update: this is specified in RFC 7250 and uses X.509 syntax, but strips everything out but the public key. I'm not sure how TLS deals with validating the raw public key, but I assume that it just hands it up to the next layer and says "it validates, whether you trust it is now your problem". That takes the DNS/DANE exchange completely out of the hands of TLS so implementers wouldn't need to get buy in from TLS library maintainers.  
 

An Alternative for Certificates

While certificates require 3 packets to transmit, it is not inevitable that they must be sent each time a session is started. A client could in principle send the fingerprint(s) of certificates that it has cached for the domain in the ClientHello and the ServerHello could then reply with the chosen certificate fingerprint if it has possession of its key. That too would cut the exchange down to 3 packets instead of the 5. The downside is that it would require buy in from the TLS community to implement the new protocol extension. Additionally, the ClientHello would still be required to be an MTU'ish sized packet since the client wouldn't necessarily know whether the server supports that extension or not.

Conclusion


I've stressed throughout this that a Google-like company could take this into their own hands and just implement it without buy in from anybody. That was what made Quic possible in the first place, since anything other than that means beating up against an ossified and sclerotic industry. Indeed the Certificate Industrial Complex would completely lose their shit as their gravy train is shut down. Given DANE and DKIM, the use of DNS to authorize public keys for use elsewhere is well understood and should be completely safe given DNSSec, and arguably safer given that there are far fewer middleman CA's involved to screw up. 
 
A real life implementation would go a long way to proving how much latency it would cut out, because my numbers here are all back of the envelope. It remains to be seen what the actual improvement is. But if it did nothing more than break the back of CA's, that would be an improvement in and of itself. Admittedly, this only changes the startup cost, not the per packet cost, which might contribute to some of the gains that Quic sees. Since Quic allows longer lived connections and multiplexing of requests to deal with head of line blocking, it's not clear whether the gains would be significant or not. The business side implications, on the other hand, are clearly significant, though it has to be said that X.509 would need to be supported for a good long time.
















Monday, March 8, 2021

Certificates Confuse Everything

Not the solution to everything

 

I'm fairly certain I had a basic understanding about how certificates for identity worked, though not much about the underlying technology, before 1998. But in 1998 all of that had to change really quickly because I opened my mouth about the security problems for the residential voice over IP project I was working on at Cisco, and in particular the signaling protocol we were using called MGCP (nee SGCP). MGCP is a pretty simple command/response protocol where a server tells a home POTS (eg phone) gateway to, say, go off hook, or ring the ringer, etc. Needless to say having some script kiddie being able to ring the ringer or listen in on the microphone would not be ideal. For opening my mouth I got told to solve it. So there I was having to do a crash course on network security and all of its protocols and really how it worked at all.

My group in particular was tasked with creating the residential gateway which was a box that had a couple of POTS ports and was integrated together with either a cable or DSL modem. These needed to be authenticated both ways so that the service providers could prevent rogue gateways getting access to their telephone network. In this case the gateway is the client device in a client/server relationship. Normally clients use passwords but that doesn't seem especially elegant for a box sitting in the corner, though now that I think about it that is exactly what my router does when connecting using PPPoE to my ISP. There was a requirement that the user wouldn't have access to the gateway so that would have made it more difficult, and especially for manufacturers if they had to pre-provision the secret keys.

So it was time to learn about asymmetric keys. Well, rather the first thing to learn about was certificates because that's how they always got couched. Certificates were these magic identity thingies that through some math voodoo allowed the other side of the conversation to know who they were talking to. Once you had a certificate all of that math voodoo became mostly irrelevant, so I mostly concentrated on them rather than on how asymmetric keys actually work. To give an understanding of how clueless I was at the time, I remember asking another engineer whether we could just RSA sign the MGCP packets or something like that. Looking back that seems like a silly question to ask, but as it turns out it was exactly the right question to ask with signing email for DKIM just a few years later.

So everything was in terms of certificates: how to get them onto the box, what to do with them once they were there, and how this all related to keeping kiddie scripters from ringing my phone in the dead of night. Some of my previous group were working on IPsec so I got up to speed with that and it seemed like a good solution to the crypto needs for our residential gateway security problems. Though TLS (then SSL, I think) was definitely in the air back then, MGCP was a UDP based protocol, and TLS only works on TCP (though it's now integrated with QUIC). I was persistent on this point in the SIP working group too -- SIP could also be run over UDP -- because I thought IPsec in transport mode was a better choice since it dealt with UDP as well. Instead, others went off and designed DTLS to meet the UDP requirement. Oi. The irony now is that SIP is so bloated that it wouldn't even fit in an MTU packet anymore, so we were both "wrong" in that deprecating UDP would have been the better choice.

Now that we had an underlying crypto mechanism it was back to getting those certs on to the gateway and what were these certs anyway? The general idea was to have a root CA which vouched for approved manufacturers (I was by that time participating with Packetcable, Cablelabs' residential voice standardization project). To me this was still all rather mysterious and something of a black box. I finally started to grok the larger picture when I was in a meeting with Van Jacobson and he said "ah, the enrollment problem". We didn't have a certificate problem, we had an enrollment problem. How do you enroll those devices such that the server knows who is who? That is the basic problem going on, and in the race to solve it with certificates nobody asked why they were needed at all.

That's sort of how it always seems to go when people start talking about using certs as if they were some magic incantation and The Way you use asymmetric keys. Nobody asked why we needed to bind a name to a public key in the first place. I finally started to understand the underlying math and how IKE worked and especially how RSA signing and encryption worked. Being able to determine who you're talking to doesn't require having a key bound to a name at all. The public key is unique and can be used directly as an identifier. Certs completely obscure that property. Later when Jim Fenton and I designed IIM, which is one of the precursors of DKIM, we took advantage of that property and just used the public key itself as the identifier. It was DK that had a somewhat gratuitous name/key binding in the form of selectors, but it didn't hurt anything and allowed me to have a selector named "fluffulence".

So why do I like to bag on certificates? Because they confuse getting to the bottom of what you're trying to do. Like I said, since everything is very certificate oriented, nobody asks the obvious question of why you need a name-to-key binding at all. In the Packetcable case I recall us struggling with what exactly the name should be in the cert. That right there says that first principles almost certainly need to be revisited. We didn't have a naming problem, we had an enrollment problem; the name was irrelevant and thus there was no requirement to carry it in an obscure and ossified bag of bits in the form of X.509 and ASN.1. The other part that I dislike about certificates is that they are a business model. It costs nothing to put a public key into the DNS or in some database. It costs enough to support lots of CA vendors' bottom lines for certificates though. There is one use case that certificates handle that is not easy to reproduce in other ways: offline verification. This was an important use case when they first arrived, since being online was a rare beast in the 80's. Today the need for offline verification is niche and the whole world is a connected internet. So we're supporting a billion dollar business model for a feature almost nobody uses.

When our residential voip project was going on over 20 years ago it might have been somewhat justifiable because a whole lot of us were getting a crash course on network security. However I don't really think much has changed on that front. Everybody proceeds from a cert-first mindset as if it were a given, without thinking about what the actual requirements are, then deciding whether a name/key binding is needed at all, and only then determining how that binding is achieved if it is. It's also unfortunate that so many protocols have a built in expectation that certificates must be used, though it's my understanding that TLS and IPsec both allow for naked public key use, obviating the need for actual valid certificates. I'm not sure if they implement it just by sending a self-signed cert where the server just ignores the CA signature, or whether it truly is a means of just sending a naked public key (the latter would certainly be better since the intent is clear). 

In the VoIP/Packetcable use case, client certificates were never needed. The naked public key (or a hash of it) was perfectly serviceable as an identifier for the residential gateway. All that needed to happen was to get it enrolled somehow. There are many ways to do that depending on security requirements. Dispensing with the complex X.509 infrastructure makes the entire problem both easier to administer and much simpler to understand. It should be a dead giveaway for anything that proposes client side certificates to ask why they are needed. In the wild, client side certificates are exceedingly rare, so why is this different?
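
As a toy illustration (my own sketch, not the Packetcable design), enrollment by key hash plus a challenge/response is all it takes; it assumes the Python cryptography package and every name in it is made up:

    # Toy sketch: enroll a device by the SHA-256 hash of its public key and
    # later authenticate it by having it sign a server-supplied nonce.
    # No certificate, no name binding.
    import hashlib
    import os
    from cryptography.hazmat.primitives import hashes, serialization
    from cryptography.hazmat.primitives.asymmetric import ec

    # Device side: generate a keypair; the hash of the public key *is* the identity.
    device_key = ec.generate_private_key(ec.SECP256R1())
    spki = device_key.public_key().public_bytes(
        serialization.Encoding.DER,
        serialization.PublicFormat.SubjectPublicKeyInfo)
    device_id = hashlib.sha256(spki).hexdigest()

    # Enrollment: the server just records the key (hash), by whatever
    # out-of-band process the security requirements call for.
    enrolled = {device_id: spki}

    # Connect time: a challenge/response proves possession of the private key.
    nonce = os.urandom(32)
    signature = device_key.sign(nonce, ec.ECDSA(hashes.SHA256()))

    pub = serialization.load_der_public_key(enrolled[device_id])
    pub.verify(signature, nonce, ec.ECDSA(hashes.SHA256()))  # raises if invalid
    print("device", device_id[:16], "authenticated, no certificate in sight")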

The reason I decided to write this post is because as of the writing I was having a conversation in which I voiced my dislike of certificates and most especially the certificate-centric view that most people have of authentication with asymmetric keys. I had brought up SSH which along with DKIM are two of the most used tools that use asymmetric keys (TLS being the most used), neither of which uses or needs certificate based identity. Somebody pointed out that SSH allows for client certificates, so I looked it up and it seems that they hacked the protocol to get that to work and that apparently it is used as a replacement for the SSH authorized_keys file on servers, which is supposedly better at scale. When I pointed out that it would be easier to just put the SSH public key into the user's profile with an LDAP directory or some such, I got told that it was infinitely easier to create a certificate and put it on the client. Since both have to upload the public key to something, that cancels out. How can putting certificates on a client be easier than doing nothing at all? Magic, I guess. Or confusion. Lots of confusion.
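
For what it's worth, sshd already has a hook for exactly the key-in-a-directory model: AuthorizedKeysCommand runs a program that prints the authorized_keys lines for a user. A minimal sketch, with a local JSON file standing in for LDAP or whatever directory you actually have (the paths and file format are placeholders of my own invention):

    #!/usr/bin/env python3
    # Minimal AuthorizedKeysCommand helper: print the authorized_keys lines
    # for the requested user from some directory. A local JSON file stands in
    # for LDAP here; the path and format are invented for the sketch.
    #
    # sshd_config would reference it with something like:
    #   AuthorizedKeysCommand /usr/local/bin/lookup_keys.py %u
    #   AuthorizedKeysCommandUser nobody
    import json
    import sys

    KEYDB = "/etc/ssh/user_keys.json"  # e.g. {"alice": ["ssh-ed25519 AAAA... alice"]}

    def main():
        user = sys.argv[1]
        with open(KEYDB) as f:
            keys = json.load(f)
        for line in keys.get(user, []):
            print(line)

    if __name__ == "__main__":
        main()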

The moral of this story is to not start with certificates as a given if you are thinking about using asymmetric keys for authentication. That just confuses everything. You need to understand what problems you are trying to solve first and foremost. What are the requirements for authentication? Do those requirements require a key/name binding? Do those requirements need the ability to verify authentication when the verifier is offline? If the answer to both of those is yes, then you should consider using certificates. If the answer to offline is no, then you don't need certificates and it can be designed without them by using naked public keys. Simplicity is always good with security. Certificates are not at all simple and should be used only as necessary.
















Sunday, January 24, 2021

Birthing DKIM


Foreword

This is completely from my perspective needless to say. I really wish Mark Delany in particular would write something similar as it's the other half of the equation and his perspective would be really enlightening. DKIM is a remarkable piece of convergent evolution.

IIM

Tasman Drive


In 2004 Cisco just like everybody else was being inundated by spam. With my personal mail server, Spamassassin couldn't keep up with the permutations. Cisco had no visibility or expertise with email, but we were heavy users of it, so we had an outsider's view that the situation was really bad and didn't seem like it would get better any time soon. So Dave Rossetti assembled myself, Fred Baker, Eliot Lear, Jim Fenton and maybe one other that I'm forgetting to talk about what Cisco could do about the spam problem. The main thing going on at the time was Bayesian filtering, but that was being defeated by image spam. After one of these meetings, I came up with an idea that if the mail servers did nothing more than apply an unanchored digital signature to the mail but with a consistent key, maybe the Bayesian filters could latch onto that as a signal for spam or ham. I remember talking to Eliot after a meeting telling him my idea, and he was interested as I recall, but dubious that a free floating key would work. Some time after I told Jim too, but he had a better idea: why not anchor the key to a domain? And thus the genesis of Identified Internet Mail, IIM. I'm fairly certain Jim came up with the name IIM because if left to me I would have probably tried to make some cutesy tortured acronym ala KINK.

Since we now had a trust anchor (ie, the sending domain) it became obvious that we could possibly also publish a record which said whether the sending domain signed all of their mail or not. If the receiving domain received unverified mail and the sending domain says it signs everything,  it would be a forgery in the eyes of the sending domain. Thus the concept of Sender Signing Policy (SSP) was born. 

So off we went. Jim was still part of his group, and I was still working for Dave Oran at the time, so we were more or less doing this free-form and under the radar. Jim wrote most of the IIM draft, and I wrote the actual IIM code, telling Jim what the syntax of the header was from my running code, and how I implemented the SSP code. IIM had a concept of a key registration server (KRS) that ran on top of HTTP. For discovery, we used a well-known top level SRV record to find the KRS. We were a little nervous about the overhead from HTTP for fetching the key, but we had a means to allow it to be cached, so we figured it was probably acceptable. We were also really nervous about the overhead of the RSA signing operation. But when I wrote the code using a sendmail milter I quickly found out that the signing overhead was drowned out by the overall processing of the message so it wasn't a problem. 
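
If you want to reproduce that observation, timing the signing operation is a one-minute experiment. A rough sketch assuming the Python cryptography package; the numbers will obviously vary by machine:

    # Back-of-the-envelope check of the worry that per-message RSA signing
    # would be expensive: time a batch of RSA-SHA256 signatures.
    import time
    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.asymmetric import padding, rsa

    key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
    message = b"From: a@example.com\r\nSubject: test\r\n\r\nhello, world\r\n" * 20

    n = 200
    start = time.perf_counter()
    for _ in range(n):
        key.sign(message, padding.PKCS1v15(), hashes.SHA256())
    elapsed = time.perf_counter() - start
    print(f"{1000 * elapsed / n:.2f} ms per RSA-2048/SHA-256 signature")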

While this was going on we had heard of some exec at another company falling for a spear phishing attack purportedly from another employee. We didn't think our execs were any brighter or more security savvy -- and frankly, neither were the engineers, since it isn't easy to figure out even if you're looking for it. So with Dave Rossetti we decided that spear phishing was a scary problem for Cisco and created a research group within Cisco charged with dealing with this employee-to-employee spear phishing attack, where I was employee #1 (Jim stayed in his group throughout this). We got some coworkers that we had worked with before including one -- Bailey Szeto -- who had close ties to Cisco IT. The object was to create an IIM signing/verifying MTA and insert it into the mail pipeline to sign and verify signatures. 

While this was going on, we were starting to reach out and socialize the ideas externally. Our co-worker Dan Wing was good friends with Jon Callas then at PGP Corp so we had him over to talk it over to make certain we weren't crazy. I'm not sure if Jon was impressed or not, but he didn't find anything substantially wrong as I recall, so we weren't going to badly embarrass ourselves going to IETF at least. We were making fast progress on actually implementing IIM internally as well while this was happening, and getting buy in from the IT folks to insert my IIM code into the email pipeline. Finally holding our breath we went live with IIM in the mail pipeline. A little at first then a little more until we were signing and verifying signatures for an entire Fortune 100 company. A company that lives and dies by email, I'll add.

Domain Keys

Tasman Adjacent

We kept our feelers outside of Cisco and eventually found out that right down the street, a mile or two away at Yahoo!, Mark Delany was working on something called Domain Keys (DK) and had actually deployed it into their mail pipeline. What was remarkable about DK is how similar it was to IIM. He too was working on an internet draft documenting DK. DK also had a signing policy mechanism, but as I recall Mark saying it was more tentative and maybe aspirational, which makes sense from the perspective of an email provider. When we finally became aware of each other we started meeting in larger groups of interested people, informally called the Email Signing Technical Group, of maybe about a dozen to try to figure out what to do, both with the two I-D's and generally how to standardize something. Barry Leiba was part of that early group, who along with Stephen Farrell would go on to be the DKIM working group chairs. Nothing is simple in the IETF world, and it takes time to agree on the color of the sky even on good days, so it's usually the best plan to have quite a bit of buy in and a coherent front for the inevitable push back and vested interests. Mark's DK worked. Ours worked. It was deployed just like ours was. They were both fundamentally doing the same thing.

The IIM draft was first published on June 3, 2004 and DK was published on June 24th, 2004. As I recall we both had live implementations running when we published our drafts. I don't know when Yahoo! started signing its outgoing mail, but I have always assumed it was before us -- who knows (and if you do, let me know and I'll update this).

The Fusion of DKIM

Mark being at Yahoo! was very service provider oriented. Our situation at Cisco, coming from an enterprise standpoint, was more complex, and the IIM draft laid out a bunch of the use cases that needed to be supported. It wasn't entirely clear whether they could be supported by DK or not. As I recall, we met with Mark at Cisco to see if we could hammer out a combined spec instead of the usual routine at IETF of having two competing drafts and the pissing matches that ensue. The pissing match was already happening in SPF-land with SenderId. There was a real engineering trade off between using DNS and using HTTPS. Security was easy for HTTPS, much more of a stretch for DNS. But DNS lookups are cheap vs. HTTPS and we kept going around on that, though neither of us was dogmatic. I liked DK's header syntax better as mine was a little overwrought. The big deal though was whether DK could do the enterprise-y things that we wanted.

After the meeting I thought about it for several days, reading the DK draft and comparing it to IIM and its use cases, until I convinced myself that it was a product of convergent evolution; DK could just be extended for our needs. I bit the bullet and told our group we should just adopt the DK mechanism and add the things we needed. The lingering concern about HTTP performance outweighed the security concerns about DNS. The irony these days is that DNS over HTTPS (DoH) is now a thing, so we're back to where we started with IIM: we could have used HTTP from a performance standpoint after all. The other part of basing it off of DK was tactical: Yahoo! was a big fish in the email world where Cisco was a barely hatched fry. That said, I think IIM had it right in the long run. DKIM gets knocked all of the time about DNS and the lack of deployment of DNSSec. While I think that is overblown, you can't argue that setting up TLS on an HTTP server was a well known skill even in those days.

At that time we already had IIM deployed throughout Cisco and were starting to gather some stats for our stated goal of dealing with spear phishing. Part of the problem was identifying the sources of email in the company that were not routed through the Cisco mail pipeline; that was daunting and proved something of an Achilles heel, though not entirely. DMARC's reporting facility would have been very helpful, but of course that requires wide deployment from other domains, and we didn't even have a merged protocol yet. Our main problem was with external mailing lists, of which we were painfully aware because that's how IETF did its business. I wrote a bunch of heuristics to recover signatures that went through mailing lists to see if they could be validated. I got tantalizingly close with about 90% recovery, but we had a lot of unsigned email from other sources so we couldn't take action. 

 

Where Eric Allman, Mark Delany, Jim Fenton, Jon Callas, Miles Libbey, and I hammered out DKIM at my place in San Francisco
 

The combined spec was coming together. Eric Allman was given the editor's pen for the combined spec that was hammered out in my dining room in San Francisco with all of the named authors in attendance. When enough of DKIM was cobbled together I got to work converting IIM into DKIM with my implementation. I had found out that Murray Kucherawy at Sendmail had a DK implementation written as a milter as well (it was never clear to me if that's what Yahoo! was using. Edit: Mark says it was Murray's milter). So the race was on. I got done enough that I sent Murray email (signed!) telling him I was ready to interop. Murray was right behind me and the next day we started to debug our implementations. Murray was at a big advantage because the protocol looked on the outside a lot like DK. Our main interop issue was me getting the NFWS body canonicalization correct as I recall. Beyond that I think we had interop possibly that day, but certainly within a few days.

As it turns out, that was a theme with lots of implementations to follow, and most importantly lots of interest across the industry. The next step was to take the combined DKIM draft to IETF. As I mentioned IETF is a painful process, and getting a working group spun up is always extremely difficult because everybody and their brother gets their $.02 worth in. DKIM had the advantage that it was a fully formed spec with a lot of vetting at that point from a lot of eyeballs as well as implementations. If I recall correctly it was at the Paris IETF in 2005 where we had our debutante's ball. There was a lot of sound and fury from the usual attack poodles. Jim got saddled with writing a threats informational RFC, much of it written sitting on the floor in the halls of the Paris IETF venue as I recall. The one thing I do recall out of all of the sound and fury was that Harald Alvestrand (then IETF chair) stood up saying this entire process was ridiculous and should just proceed. Thanks Harald!

I don't recall whether we actually were chartered in Paris, but do remember filling up a friend's tiny restaurant with an assortment of IETF folks with the wonderful food coming from her postage stamp kitchen including a chocolate mousse with cayenne. Everybody loved it. So anyway the working group was chartered, the threats draft was published and work began on what was already a pretty mature draft with a growing number of interoperable implementations. Probably the single biggest change to the original draft was the message body canonicalization. NFWS turned into "relaxed" for reasons I don't really recall. Relaxed seemed better, but not that much better and required us to re-interop. Oh well, something had to change. We did eventually have an in-person interop with probably 20 different implementations hosted by the affable Arvel Hathcock at Altn (now MDaemon) in Dallas. We were treated to a Brazilian restaurant where prodigious amounts of meat was consumed. 

So at this point DKIM was pretty well set and would go on to become a proposed standard RFC 4871 in mid 2007. Believe it or not, that was a good speed for IETF process, but we did have the advantage of an interoperable spec without any competing specs, or in IETF parlance the rough consensus and running code were there before the working group was formed. On the home front I continued to do experiments as we tried with increasing frustration to find all our sources of email.

SSP/ADSP

Early on, I believe after the working group formed, it was decided to split DKIM and SSP apart. That's a fine decision in retrospect -- they are two different on the wire protocols. But SSP elicited, shall we say, fervor from people who disliked it. It still seems to elicit similar fervor in its DMARC instantiation, which makes me wonder why the people who dislike it participate at all. But there was a lot of resistance to SSP, suffice it to say. It was at some point renamed ADSP for reasons lost to me, but for all of the bickering it remained pretty much the same SSP with some tag wordsmithing, I assume so as to justify the name change. One of the authors was even in the resistance crowd, which again makes you wonder why you'd work on something you don't support. To this day, DMARC, which is yet another bite at the apple, is fundamentally the same (modulo the reports) as ADSP. It also added support for SPF to be used as a policy check along with DKIM. As for DMARC, I really don't know why they went off to reinvent ADSP instead of just extending it, but it's possible that the, shall we say, fervent poisoned the well too much. One of them even wrote an article for a tech rag against its existence after participating -- mainly delaying -- in its production. Finally RFC 5617 was made a proposed standard in mid 2009.

That's All Folks

At Cisco we had deployed DKIM into the mail pipeline, but we were also working on a more ambitious project that could take multiple protocols and apply security to the various streams instead of just email. I was most intrigued with SIP because SIP has a lot of the same issues that email does, the inter-domain problem being the biggest. Since I had previously been working on VoIP stuff before DKIM, I still kept tabs on what was going on with SIP. SIP was at that time creating what was called the P-Asserted-Identity header which supposedly told you what caller-id was being asserted. I was a regular Cassandra shouting at the top of my lungs that their assumption that voice would be an old-boys network just like the old days was wrong and this was going to backfire on them since there was no authentication mechanism. I even hacked up a SIP stack and started DKIM-signing SIP INVITEs to prove it could be done with little or maybe no changes to DKIM. More later.

Cisco had decided a ways back that it was in fact interested in getting into the email security business. I did due-diligence for a number of companies including Ironport which was eventually chosen and integrated making our prototyping work redundant (they even participated at the Altn interop too). Both Jim and I had figured that we'd just move over to Ironport. Apparently we were too "Old Cisco" and both of us were rejected with myself at least labeled completely unqualified to write code or something like that. We just won you the fucking startup lottery, assholes. Thanks a fucking lot you ingrates.  Have I mentioned how much I dislike puffed up egos?

Epilog

Off to Ski

My group (one that I was responsible for forming) had decided to go off on some wacky scheme with Skype which I had absolutely no input on and absolutely no interest in. As I was looking around for something new to do at Cisco, I was also fascinated by having taken my Garmin GPS to Kirkwood skiing and dumping all of my points into Google Maps, with all of the possibilities of finding your friends on the mountain, seeing how fast you were going ("It can tell me how fast I'm going?!") and gaming with your friends. Since I didn't find anything interesting at Cisco and was bored, I left to go ski for 5 years in August 2008. Two months later Android came out and my adventure into phone apps began.

DKIM to STD 

While I was off skiing the DKIM working group kept bumping along. While most RFC's stay at the proposed standard level, there is a complicated process to move one from proposed to draft standard and then to a full internet standard. By the time I decided to ski for a living, I had had it with the petty politics, stopped paying attention to the working group altogether, and unsubscribed from the mailing list. I have no idea what happened in the intervening 3 years, but in 2011 DKIM became STD 76. That number alone makes it clear that not many protocols make it to full standard. By the time I left, DKIM was already widely deployed at the major email providers with billions of pieces of email signed and verified every day. 

One of the interesting things that came out of DKIM is that it implements a public key infrastructure (PKI) and is probably the second largest PKI next only to HTTPS/TLS. What I particularly like is that it shows that it is not inevitable that a PKI needs to use certificates. In fact DKIM shows that X.509 is particularly dated and unnecessarily complex with its CA's, ASN.1, blah blah blah. TLS is water under the bridge at this point, of course, but there seems to be some magical thinking that if you use asymmetric keys that certificates are required. DKIM proves that is emphatically wrong and that it can be just as simple as publishing the key/name in DNS or pointing to a web server to fetch it.
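
To drive home how simple that PKI is, fetching a DKIM public key is a single TXT lookup. A sketch assuming dnspython; the selector and domain are placeholders:

    # The entire DKIM "PKI" lookup: the public key is just a TXT record at
    # <selector>._domainkey.<domain>.
    import dns.resolver

    selector, domain = "selector1", "example.com"
    name = f"{selector}._domainkey.{domain}"

    answer = dns.resolver.resolve(name, "TXT")
    record = "".join(part.decode() for part in answer[0].strings)
    tags = dict(t.strip().split("=", 1) for t in record.split(";") if "=" in t)
    print("key type:", tags.get("k", "rsa"))
    print("public key (base64):", tags.get("p", "")[:40] + "...")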

Ah SIP, My Old Friend

As a sad case in point I submit to you STIR/SHAKEN (RFC 8226). My main beef with STIR is that it solves the wrong problem -- they are trying to determine whether somebody is allowed to assert a given telephone number rather than just holding the senders accountable as DKIM did. They also clung to the X.509 world, which made it much less comprehensible in the process. On top of that there are many classes of deployments that STIR can't address at all. The RFC was published in 2018, 10 years after I had shown that they could just reuse DKIM. STIR is so rife with errors and under-specification that I had to stop writing a blog post about it. If it flops -- and there is a good chance it may -- there is always the DKIM route, which has the added benefit of also solving for the non-bellheaded use cases in the From address.

ARC, WTF?

I had vaguely heard that a set of people had created a standard as a successor to ADSP, which at some point was brought to IETF as an informational RFC. I looked it over more carefully and it seems to unify policy with SPF, which is fine -- we didn't care about SPF at the time because they had their own policy mechanism, so why pick needless fights? It also has a reporting mechanism for when signatures fail, etc, which is in reality a completely different protocol than ADSP and gains nothing from being tied to the ADSP-style policy mechanism. 

That said, looking at the headers of a message I happened to see a weird DKIM-like signature called ARC-Message-Signature along with ARC-Seal and ARC-Authentication-Results. I joined the DMARC working group trying to figure out what this was about. There were a lot of fresh, new to me faces on the working group, but also a lot of people who should have known better: ARC brings nothing new to the table beyond plain old DKIM. The main premise is that ARC is supposed to solve the mailing list traversal problem, or more generally intermediaries who invalidate the originating DKIM signature. There is definitely a lot of magical thinking, because when pressed on how ARC will do what DKIM supposedly can't, the answer is that it depends on the receiving domain trusting the ARC signer's domain. Doh. Uh, folks, that reduces to a previously unsolved problem, because intermediaries DKIM sign all of the time these days and there has been absolutely nothing stopping a receiving domain from trusting that domain for the past dozen years. I really can't understand how the IESG let this happen because it is really ill conceived, though at least it is just a (failed, imo) experimental RFC. Through the process, however, I have come to the conclusion we should just ignore the mailing list traversal problem, set p=reject, and let the chips fall where they may. For the vast majority of domains it is unlikely to ever be a problem. I wrote a post here about why.

Conclusion

DKIM is definitely one of the biggest achievements of my life and I'm very proud of it. Starting from a kooky idea about feeding Bayesian filters, to working up a fully fleshed out implementation and internet draft, to finding convergent evolution just down the street and marrying the two off instead of getting into a protracted pissing match, all the way to full internet standard 76. What a trip! 

I recently came across a really interesting study about DKIM, SPF and DMARC showing what effects they have had: TL;DR, not a silver bullet -- nothing is with spam -- but they're having a noticeable effect on the problem. It's an interesting if long read, but worthwhile if you're into email security.