Saturday, July 7, 2012

ASN.1 considered harmful

So I've recently spent a bunch of time playing with Javascript crypto libraries. There are Javascript crypto libraries you ask? Well yes, such as they are. The one that seems most complete for my purposes is jsrsasign, but it's still missing things that I needed, so I had to scrounge the net to cobble then together. It itself is a frankenlib cobbled together from various sources and extended as the author needed.

The one thing that the library didn't have was a PEM public key decoder method. PEM is a base64 encoded  ASN.1 DER (dynamic encoding rules) encoding of the public key's exponent and modulus. That is, two numbers. It also has some meta information about the keys, but I've never had a need to find out what's in there so I can't tell you what it is. The final point is actually symptomatic of why I have such incuriosity: it's part of the ASN.1 train wreck.

So I decided that this can't be too hard so I'll code it up myself. There was an example in the code which decodes the RSA private key from its PEM format, so how hard could this be? Very hard, as it turns out. Ridiculously hard. It's just two fucking numbers. Why is this so hard? If this were JSON encoded, it would have taken 10 seconds tops to write routines which encode and decode those two numbers. In another 15 seconds, I could have written the encoder for the private key too -- hey, it's got several more fields and it takes time to type. With ASN.1 DER encoding? It took literally 2 days of futzing with it. And that's just the decoding of the PEM public key. Had I needed to encode them as well, it would have been even longer.

What's particularly bad about this is that I've actually had experience with SNMP in the ancient past, so both knew about ASN.1, and have coded mibs etc which requires BER (basic encoding rules). Yes, ASN.1 has not just one encoding, not two, but at least 3 different encodings -- the last being PER (packed encoding rules). All binary. All utterly opaque to the uninitiated. Heck, I'll say that they're all utterly opaque to the initiated too.

So why am I ranting about ASN.1? After all, once the library is created nobody will have to deal with the ugliness. But that misses the point in two different areas. As a programmer, there's lots of stuff that's abstracted away so that you don't have to deal with the nitty-gritty details. That's goodness to a point: if you're using something regularly, it's good to at least be somewhat familiar what's under the hood even if you have no reason to tinker with it. In this case, I really had no idea what was in a PEM formatted public key file, and I've dealt with them for years. It's just two fucking numbers. In the abstract I knew that but ASN.1's opaqueness took away all curiosity to actually understand it, and when I saw methods that actually required the two parameters of modulus and exponent, I'd get all panicky since I didn't really know how they related to the magic openssl_get_publickey which decodes the PEM file for you. Seriously. How stupid is that? Had it been a json or XML blob I'd have been able at once to recognize that there's nothing to be afraid of. But it took actually finding an ASN.1 dumper and looking at the contents to realize how silly this was.

That gets to my next point: after I found out that was really just two fucking numbers, it still took me two days to finally slay the PEM public key decoding problem. The problem here is innovation. Maybe there are masochists who would prefer spending time encoding and decoding ASN.1 and they are entitled to their kink, but the vast majority of us want nothing to do with it. Even if there were good ASN.1 encoding and decoding tools -- which there are not in the free software world -- I'd still have to go to the trouble writing things up in their textual language and run it through an ASN.1 compiler. Ugh. It's not just javascript that has this problem, it's everything. ASN.1 lost. It's not supported. It makes people avoid it at almost all costs. That hampers innovation because if you want to add even one field to a structure, you're most likely breaking all kinds of software. Or at least that's what any sane programmer should assume: most of this ASN.1 code is purpose built, and not generalized so you should be very scared that you'll break vast quantities if you added something. End result: stifled innovation.

I write this because I think that a lot of the problems with getting people to understand crypto is tied up with needless distractions (the other is that certificate PKI != public key cryptography, but that's a rant for another day). Crypto libraries are hard to understand generally because let's face it, crypto is hard. But crypto libraries/standards use of ASN.1 makes things much, much more difficult to understand especially when all you're talking about is two fucking numbers. It's all lot of what the problem is in my opinion, and it's a real shame that there doesn't seem any practical way out of this predicament.




2 comments:

  1. The real question is why you thought writing your own ASN.1 parser for one athoc case was a good idea? It's like me trying to write a json parser to read two integers it doesn't really make sense.

    You should instead find or write a good general purpose ASN.1 library.

    ReplyDelete
  2. Yeah, just what I want: load 500kb worth of asn.1 bloatware library. And in this case, I inherited somebody else's asn.1 parser. if it were json instead, it would have taken 2 seconds rather than 2 days.

    ReplyDelete