Tuesday, July 17, 2012

Asymmetric Keying -- after implementation

I wrote a straw proposal for how to use asymmetric keying, and while the idea is a little frightening from the localStorage perspective, I still don't see it as a deal breaker. At least for now. And at least for the types of sites that I'm interested in thinking about, which are Phresheez-like sites. That is, sites where people would ordinarily use low to medium value passwords. Since then, I've shopped the idea around and aside from the expected asshats who dismiss anything unless you're part of their tribe, I've actually been encouraged that this isn't entirely crazy. So I went ahead and implemented it about 3 weeks ago over a weekend.

The main difficulty was frankly getting a crypto library together that had all of the needed pieces. The quality of the crypto library is certainly not above reproach, but that's fundamentally a debugging problem -- all of the Bignum, ASN.1 and other bits and pieces are well specified and while you might think that writing them in javascript is bizarre, it's just another language at the end of the day. Random number generation -- a constant problem in crypto -- is still a problem. But if this proves popular enough, there's nothing stopping browser vendors from exposing things like various openssl library functions up through a crypto object in javascript too, so the PRNG can be seeded with /dev/urandom, say. So I'm just going to ignore those objections for now because they have straightforward longer term solutions.

In any case, there seems to be a library that lots of folks are using for bignum support from a guy at Stanford. There's also a javascript RSA project on sourceforge, though it lacks support for signing. I found another package from Kenji Urushima that extends the RSA package to do signatures. His library was missing a few bits and pieces, but I finally managed to cobble them all together. So I now had something to sign a blob, as I outlined in the strawman post, something to do keygen, and something to extract a naked public key. All that was left was to sit down and write it.

Signed Login

It was surprisingly easy. I decided that the canonicalization that I'd use is just to sign the URL itself in its encoded format. That is, after you run all the parameters through encodeURIComponent (). I extended the library's RSAKey object to add a new method signURL:


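RSAKey.prototype.signURL = function (url) {
    // Append the naked public key and a timestamp so both are covered by the signature.
    url += '&pubkey=' + encodeURIComponent (this.nakedPEM ());
    url += '&curtime=' + new Date ().getTime ();
    // Sign everything built so far (SHA-1 digest), then append the signature last.
    url += '&signature=' + encodeURIComponent (hex2b64 (this.signString (url, 'sha1')));
    return url;
};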

All it does is append a standard set of URL parameters to the query. It doesn't matter if it's a GET or POST, so long as it's application/x-www-form-urlencoded if it's a POST. The actual names of the URL parameters do not need to be standardized: they just need to be agreed upon between the client javascript and the server, and a website controls both. Likewise, there need be no standardized location: I just use:

http://phresheez.com/site/login.php?mode=ajax&uname=Mike&[add the standard sig params here]
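To make the "encoded format" concrete, here is a minimal sketch of building the unsigned URL before it is handed to signURL. The helper name buildUrl is hypothetical and not part of the library; the only real call is the standard encodeURIComponent.

```javascript
// Hypothetical helper (not from the post's library): build the encoded URL
// that would later be passed to signURL().
function buildUrl(base, params) {
  const pairs = [];
  for (const name of Object.keys(params)) {
    // Every name and value goes through encodeURIComponent, so the string
    // that gets signed is exactly the string that goes over the wire.
    pairs.push(encodeURIComponent(name) + '=' + encodeURIComponent(params[name]));
  }
  return base + '?' + pairs.join('&');
}

const loginUrl = buildUrl('http://phresheez.com/site/login.php', {
  mode: 'ajax',
  uname: 'Mike'
});
// loginUrl is 'http://phresheez.com/site/login.php?mode=ajax&uname=Mike'
```

Since signURL appends its parameters with a leading '&', the URL passed to it needs to already carry a query string like this one.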

My current implementation requires signature to be the last parameter, but that's only because I was lazy and didn't feel like making it position independent. That's just an implementation detail though, and should probably be fixed. I should note that nakedPEM returns a base64 encoding of the ASN.1 DER encoding of the public key exponent and modulus, but minus the ----begin/end public key---- stuff and stripped of linebreaks.
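With signature as the last parameter, recovering what was signed is just a string split. A rough sketch, assuming the parameter name signature from signURL; splitSignedUrl is a made-up name, and the real server side is PHP rather than javascript:

```javascript
// Hypothetical helper: separate the signed portion of a URL from its signature.
// Assumes '&signature=' is the final parameter, as in the current implementation.
function splitSignedUrl(url) {
  const marker = '&signature=';
  const at = url.lastIndexOf(marker);
  if (at === -1) {
    return null; // no signature parameter at all
  }
  const encodedSig = url.slice(at + marker.length);
  if (encodedSig.indexOf('&') !== -1) {
    return null; // something follows the signature, so it was not last
  }
  return {
    signedPart: url.slice(0, at),             // exactly the string passed to signString
    signature: decodeURIComponent(encodedSig) // base64 signature
  };
}
```

Making the signature position independent would mean removing it from the middle of the query and agreeing on how the remainder is rejoined, which is why pinning it to the end is the lazy option.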

On the server side, I'm just using the standard off the shelf version of openssl functions in PHP to do the openssl_verify after doing an openssl_get_publickey with the supplied public key in the url itself. Note I've signed the pubkey and the curtime. In my implementation, I'm only signing the script portion of the full url, and not the scheme/host/port. That's mostly an artifact of PHP (what isn't?) not having the full url available in a $_SERVER variable. This is most likely wrong, but I'm mostly after proof of concept here, not the last word on cryptography -- if this is an overall sound idea, the crypto folks will extract their pound of flesh.
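Since nakedPEM strips the armor, the server presumably has to put it back before handing the key to openssl_get_publickey. A minimal sketch of that step, in javascript rather than PHP to match the client code. The name armorPublicKey is hypothetical, and it assumes the standard PUBLIC KEY label with 64-character lines:

```javascript
// Hypothetical helper: turn a "naked" base64 public key back into PEM text.
// Assumes the key is the usual "PUBLIC KEY" (SubjectPublicKeyInfo) structure.
function armorPublicKey(nakedPem) {
  // The value arrives in a URL, so refuse anything that is not plain base64.
  if (!/^[A-Za-z0-9+\/=]+$/.test(nakedPem)) {
    throw new Error('public key is not base64');
  }
  // PEM bodies are conventionally wrapped at 64 characters per line.
  const lines = nakedPem.match(/.{1,64}/g);
  return '-----BEGIN PUBLIC KEY-----\n' +
         lines.join('\n') +
         '\n-----END PUBLIC KEY-----\n';
}
```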

So the server can verify the blob now, as well as create new accounts in much the same manner (you just add an email address). I've implemented a "remember me" feature: if it's not checked, the keys aren't stored after keygen, and any keys already stored get removed.

Enrolling New Devices

On the enroll-from-a-new-device front, the logic is pretty simple: if you type in a user name, and there isn't a key for it in localStorage, it pings the server and asks for it to set up a new temporary password for you to enroll the new browser. This could be the source of a reflection attack, so server implementations should be careful to rate limit such replies. The server just sends mail to the registered account at that point with instructions. The mail has both the temporary password, as well as a URL to complete the login. The first is in case you're reading the email from a different device than the new browser device. The second is the more normal case where you click through to complete the enrollment like a lot of mailing lists use. I should note that at first my random password generator was using a very large alphabet. Bad idea: typing in complicated stuff on a phone is Not Fun At All. Keep it simple, even if it needs to be a little longer. This is just an OTP after all. Once the temporary password is entered, it's just appended to the login URL and signed as usual.
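As an illustration of the "keep it simple" point, here is a sketch of a temporary password generator with a small, phone-friendly alphabet. The alphabet, the names, and the nextRandomByte parameter are all assumptions for the example, not what Phresheez actually runs. The caller would need to back nextRandomByte with a real cryptographic random source.

```javascript
// Lowercase letters and digits only, minus the easily confused l, o, 0 and 1.
// Exactly 32 symbols, so masking a random byte to 5 bits picks one without bias.
const OTP_ALPHABET = 'abcdefghijkmnpqrstuvwxyz23456789';

// Hypothetical generator: nextRandomByte is a caller-supplied function that
// returns an integer from 0 to 255 from a secure random source.
function makeTempPassword(length, nextRandomByte) {
  let out = '';
  for (let i = 0; i < length; i++) {
    out += OTP_ALPHABET[nextRandomByte() & 31];
  }
  return out;
}
```

Each character carries 5 bits, so a 10 character password has 50 bits of entropy. That is the trade the paragraph above describes: a smaller alphabet, made up for with a little more length.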

Replay

I mentioned in the original post on the subject that replay was obviously a concern. For the time being, I've kept this pretty simple in that it has the expectation of synchronization between the browser's and server's clocks. The javascript client just puts the current system time into the URL, and the server side vets it against its system time. Like Kerberos, I also keep a replay cache for within the timeout window. This is done using a mysql table that is keyed off of the signature. If the signature is in the replay table, it gets rejected. If the timestamp in the signed URL is more than, say, an hour old it gets rejected. I haven't quite figured out what to do about timestamps in the future; I believe they just need to be within ± 1 hour or they get rejected. That said, I do have some concerns about synchronization. It's a very NTP world these days, but with the tracking aspect of Phresheez I've seen some very bizarre timestamps. Like, as in, anywhere from an hour or so to years in the future. I can't really be certain if these are GPS subsystem related problems (probably), or something wrong with the system time. It's a reason to be cautious about a timestamp related scheme, and if needed a nonce-based scheme could be introduced. I'm not too worried about that as the crypto-pound-of-flesh folks will surely chime in, and this is definitely not new ground.
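The timestamp-plus-cache check can be sketched in a few lines. This is an in-memory stand-in for the mysql table, with a one hour window in both directions. The function name, the return strings and the Map are assumptions for illustration only.

```javascript
const WINDOW_MS = 60 * 60 * 1000; // accept timestamps within one hour either way

// Hypothetical replay check. `seen` is a Map from signature to the time (ms)
// after which that signature could no longer pass the window anyway.
function checkReplay(seen, signature, curtime, now) {
  // Purge entries that have aged out, so the cache only covers the window.
  for (const [sig, expires] of seen) {
    if (expires < now) {
      seen.delete(sig);
    }
  }
  if (Math.abs(now - curtime) > WINDOW_MS) {
    return 'stale';  // too old, or too far in the future
  }
  if (seen.has(signature)) {
    return 'replay'; // this exact signed URL was already used
  }
  seen.set(signature, curtime + WINDOW_MS);
  return 'ok';
}
```

The cache never has to remember a signature past curtime plus the window, because after that the timestamp check rejects it on its own. A nonce-based scheme would replace the clock comparison but keep the same cache lookup.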


Sessions

I should note that I am not doing anything different on the session front. I'm still using the standard PHP session_start () which inserts a session cookie into the output to the browser, and logout just nukes the session cookie with session_destroy (). Nothing changes here. If you hate session cookies because of hijacking, you'll hate what I've done here too. If it bothers you enough, use TLS. It's not the point of this exercise.

I should note that an interesting extension of this is that you don't actually need sessions per se with this mechanism if you are willing to sign each outgoing URL. While this is not very efficient and is quite cumbersome for markup, it is possible and may be useful in some situations where you have a more casual relationship with a site.

Todo

A lot of this mechanism relies on the notion that the supplied email address is a good one when a user joins. That happens to be an OK assumption with Android based joins since you can get their gmail address which is known good, but other platforms not so much. So for joins, I might want to require a two-step confirm-by-email process. This sort of step decreases usability though. It may be ok to require the email confirmation, but not require it right away -- enable the account right away, but put the account in purgatory if it hasn't been confirmed after a day or two of nagging. I still haven't worked that out. There isn't much new ground here either though, as anything that requires an operational/owned email address confronts the same usability issues, and it's honestly an issue with the way Phresheez currently does passwords.

I haven't tackled much on the shared device kinds of problems either. Those include being able to revoke keys both locally as well as the stored public keys on the server (two separate problems!). Nor have I implemented local passwords to secure the localStorage credentials. They should be part of the solution, but I just haven't got around to it. It's mostly coming up with the right UI, which is never easy.

Oddities

One thing I hadn't quite grokked initially about the same-origin policy is that subdomains do not have access to the same localStorage as parent domains do. It's a strict equality test. For larger/more complex sites this could be an issue that requires enrollment into subdomains with the requisite boos from users. One way to get around this is to use session cookies as they can be used across subdomains. That is, you login using a single well-known domain, and then use session cookies to navigate around your site.
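The strict equality in question is on the origin, that is scheme, host and port together. A tiny sketch using the standard URL class; the function name and the www subdomain are made up for the example:

```javascript
// Hypothetical check: two pages see the same localStorage only if their
// origins (scheme + host + port) are identical.
function sharesLocalStorage(urlA, urlB) {
  return new URL(urlA).origin === new URL(urlB).origin;
}

// Same host, different paths: same storage.
const samePage = sharesLocalStorage('http://phresheez.com/site/login.php',
                                    'http://phresheez.com/other');
// A subdomain is a different origin, so keys stored by one are invisible to the other.
const subdomain = sharesLocalStorage('http://phresheez.com/',
                                     'http://www.phresheez.com/');
```

Here samePage is true and subdomain is false. A scheme change from http to https splits the storage the same way.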

Performance

Actually performance was surprisingly good. I have an ancient version of Chrome on my linux box, and keygen -- the most intensive part of the whole scheme -- is generally less than a couple of seconds. I was quite surprised that on my Android N1 (very old these days), it was quite similar. Firefox seemed to take longest, which is sort of surprising as it's up to date so its js engine should be better. I haven't tried it on a newer IE yet, which is likely to suck given how it usually just sucks. For signing, there is absolutely no perceptible lag. Again, it would be better on many fronts to have the browser expose openssl, say, to javascript, but the point here is that it's not even that bad with current-ish javascript engines.

Conclusion

It really only took me a couple of days to code this all up, and the better part of that was just getting the javascript libraries together. Once I had that, setting up the credential storage in localStorage, modifying the login UI, and signing URLs on the client side was quite easy. Likewise, setting up the table of public keys per UID was simple, and the replay cache code wasn't particularly difficult either. My test bed has been running it for several weeks now, and it seems to work. I still have concerns about UX.

Another interesting thing that's happened is that the LinkedIn debacle has obviously hit a nerve with a lot of us. I found out that there's been an ongoing discussion on SAAG about this, and that Stephen Farrell and Paul Hoffman have a sketch of a draft they call HOBA which is trying to do this at the HTTP layer using a new Authorization: method. I've been talking to Stephen about this, and we're mutually encouraging each other. I hope that their involvement will bring some grounding to this problem space from the absolutist naysayers who like to inhabit security venues and throw darts at the passers-by. If anything happens here to back out of this Chinese finger trap we call password authentication, we've done some good.

Saturday, July 7, 2012

ASN.1 considered harmful

So I've recently spent a bunch of time playing with Javascript crypto libraries. There are Javascript crypto libraries you ask? Well yes, such as they are. The one that seems most complete for my purposes is jsrsasign, but it's still missing things that I needed, so I had to scrounge the net to cobble them together. It itself is a frankenlib cobbled together from various sources and extended as the author needed.

The one thing that the library didn't have was a PEM public key decoder method. PEM is a base64 encoded ASN.1 DER (distinguished encoding rules) encoding of the public key's exponent and modulus. That is, two numbers. It also has some meta information about the keys, but I've never had a need to find out what's in there so I can't tell you what it is. That last point is actually symptomatic of why I have such incuriosity: it's part of the ASN.1 train wreck.

So I decided that this can't be too hard so I'll code it up myself. There was an example in the code which decodes the RSA private key from its PEM format, so how hard could this be? Very hard, as it turns out. Ridiculously hard. It's just two fucking numbers. Why is this so hard? If this were JSON encoded, it would have taken 10 seconds tops to write routines which encode and decode those two numbers. In another 15 seconds, I could have written the encoder for the private key too -- hey, it's got several more fields and it takes time to type. With ASN.1 DER encoding? It took literally 2 days of futzing with it. And that's just the decoding of the PEM public key. Had I needed to encode them as well, it would have been even longer.
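For a sense of what the decoder has to walk through to reach those two numbers, here is a minimal sketch. It is not the code I ended up with. It assumes the key bytes follow the common layout of an outer SEQUENCE, an algorithm identifier, and a BIT STRING wrapping a SEQUENCE of two INTEGERs. All the names are made up, and the sample key is a toy (modulus 3233, exponent 17), not a real one.

```javascript
// Read one DER tag-length-value header starting at `offset`.
function readTlv(bytes, offset) {
  const tag = bytes[offset];
  let length = bytes[offset + 1];
  let start = offset + 2;
  if (length & 0x80) {
    // Long form: the low 7 bits say how many length bytes follow.
    const count = length & 0x7f;
    length = 0;
    for (let i = 0; i < count; i++) {
      length = length * 256 + bytes[start + i];
    }
    start += count;
  }
  return { tag: tag, start: start, end: start + length };
}

function expectTag(tlv, tag) {
  if (tlv.tag !== tag) {
    throw new Error('unexpected DER tag 0x' + tlv.tag.toString(16));
  }
  return tlv;
}

function toHex(bytes, start, end) {
  let hex = '';
  for (let i = start; i < end; i++) {
    hex += bytes[i].toString(16).padStart(2, '0');
  }
  return hex;
}

// Hypothetical decoder: returns the modulus and exponent as hex strings.
function decodeRsaPublicKey(der) {
  const outer = expectTag(readTlv(der, 0), 0x30);               // SEQUENCE
  const algorithm = expectTag(readTlv(der, outer.start), 0x30); // the "meta information"
  const bitString = expectTag(readTlv(der, algorithm.end), 0x03);
  // The first content byte of a BIT STRING counts unused bits; skip it.
  const inner = expectTag(readTlv(der, bitString.start + 1), 0x30);
  const modulus = expectTag(readTlv(der, inner.start), 0x02);   // INTEGER
  const exponent = expectTag(readTlv(der, modulus.end), 0x02);  // INTEGER
  return {
    modulusHex: toHex(der, modulus.start, modulus.end),
    exponentHex: toHex(der, exponent.start, exponent.end)
  };
}

// Toy key bytes, hand-assembled for illustration only.
const toyKey = [
  0x30, 0x1b,
  0x30, 0x0d, 0x06, 0x09, 0x2a, 0x86, 0x48, 0x86, 0xf7, 0x0d, 0x01, 0x01, 0x01, 0x05, 0x00,
  0x03, 0x0a, 0x00,
  0x30, 0x07, 0x02, 0x02, 0x0c, 0xa1, 0x02, 0x01, 0x11
];
const decoded = decodeRsaPublicKey(toyKey);
// decoded is { modulusHex: '0ca1', exponentHex: '11' }
```

Three layers of wrapping, an object identifier, and a bit string that is not really a bit string, all to deliver two integers. The JSON equivalent would be an object with two fields.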

What's particularly bad about this is that I've actually had experience with SNMP in the ancient past, so I both knew about ASN.1 and have coded mibs etc, which require BER (basic encoding rules). Yes, ASN.1 has not just one encoding, not two, but at least 3 different encodings -- the last being PER (packed encoding rules). All binary. All utterly opaque to the uninitiated. Heck, I'll say that they're all utterly opaque to the initiated too.

So why am I ranting about ASN.1? After all, once the library is created nobody will have to deal with the ugliness. But that misses the point in two different areas. As a programmer, there's lots of stuff that's abstracted away so that you don't have to deal with the nitty-gritty details. That's goodness to a point: if you're using something regularly, it's good to at least be somewhat familiar with what's under the hood even if you have no reason to tinker with it. In this case, I really had no idea what was in a PEM formatted public key file, and I've dealt with them for years. It's just two fucking numbers. In the abstract I knew that, but ASN.1's opaqueness took away all curiosity to actually understand it, and when I saw methods that actually required the two parameters of modulus and exponent, I'd get all panicky since I didn't really know how they related to the magic openssl_get_publickey which decodes the PEM file for you. Seriously. How stupid is that? Had it been a json or XML blob I'd have been able at once to recognize that there's nothing to be afraid of. But it took actually finding an ASN.1 dumper and looking at the contents to realize how silly this was.

That gets to my next point: after I found out that it was really just two fucking numbers, it still took me two days to finally slay the PEM public key decoding problem. The problem here is innovation. Maybe there are masochists who would prefer spending time encoding and decoding ASN.1 and they are entitled to their kink, but the vast majority of us want nothing to do with it. Even if there were good ASN.1 encoding and decoding tools -- which there are not in the free software world -- I'd still have to go to the trouble of writing things up in their textual language and running it through an ASN.1 compiler. Ugh. It's not just javascript that has this problem, it's everything. ASN.1 lost. It's not supported. It makes people avoid it at almost all costs. That hampers innovation because if you want to add even one field to a structure, you're most likely breaking all kinds of software. Or at least that's what any sane programmer should assume: most of this ASN.1 code is purpose built, and not generalized, so you should be very scared that you'll break vast quantities if you add something. End result: stifled innovation.

I write this because I think that a lot of the problem with getting people to understand crypto is tied up with needless distractions (the other is that certificate PKI != public key cryptography, but that's a rant for another day). Crypto libraries are hard to understand generally because let's face it, crypto is hard. But the use of ASN.1 in crypto libraries and standards makes things much, much more difficult to understand, especially when all you're talking about is two fucking numbers. It's a lot of what the problem is in my opinion, and it's a real shame that there doesn't seem to be any practical way out of this predicament.