Tuesday, July 17, 2012

Asymmetric Keying -- after implementation

I wrote a straw proposal for how to use asymmetric keying, and while the idea is a little frightening from the localStorage perspective, I still don't see it as a deal breaker. At least for now, and at least for the types of sites that I'm interested in thinking about, which are Phresheez-like sites: that is, sites for which people would ordinarily use low to medium value passwords. Since then, I've shopped the idea around and, aside from the expected asshats who dismiss anything unless you're part of their tribe, I've actually been encouraged that this isn't entirely crazy. So I went ahead and implemented it about 3 weeks ago over a weekend.

The main difficulty was frankly getting a crypto library together that had all of the needed pieces. The quality of the crypto library is certainly not above reproach, but that's fundamentally a debugging problem -- all of the Bignum, ASN.1 and other bits and pieces are well specified, and while you might think that writing them in javascript is bizarre, it's just another language at the end of the day. Random number generation -- a constant problem in crypto -- is still a problem. But if this proves popular enough, there's nothing stopping browser vendors from exposing various openssl library functions through a crypto object in javascript too, so the PRNG could be seeded with /dev/urandom, say. So I'm just going to ignore those objections for now because they have straightforward longer term solutions.

In any case, there's a bignum library that lots of folks are using, from a guy at Stanford. There's also a javascript RSA project on sourceforge, though it lacks support for signing. I found another package from Kenji Urushima that extends the RSA package to do signatures. His library was missing a few bits and pieces, but I finally managed to cobble them all together. So I now have something to sign a blob as I outlined in the strawman post, something to do keygen, and something to extract a naked public key. All that was left was to sit down and write the actual login flow.

Signed Login

It was surprisingly easy. I decided that the canonicalization that I'd use is just to sign the URL itself in its encoded format. That is, after you run all the parameters through encodeURIComponent (). I extended the library's RSAKey object to add a new method signURL:


RSAKey.prototype.signURL = function (url) {
    // append the naked public key and the current time so that both are covered by the signature
    url += '&pubkey=' + encodeURIComponent (this.nakedPEM ());
    url += '&curtime=' + new Date ().getTime ();
    // sign everything accumulated so far and tack the signature on as the final parameter
    url += '&signature=' + encodeURIComponent (hex2b64 (this.signString (url, 'sha1')));
    return url;
};

All it does is append a standard set of url parameters to the query. It doesn't matter if it's a GET or a POST, so long as it's www-form-urlencoded if it's a POST. The actual names of the url parameters do not need to be standardized: they just need to be agreed upon between the client javascript and the server, both of which the website controls. Likewise, there need be no standardized location: I just use:

http://phresheez.com/site/login.php?mode=ajax&uname=Mike&[add the standard sig params here]

My current implementation requires signature to be the last parameter, but that's only because I was lazy and didn't feel like making it position independent. That's just an implementation detail though, and should probably be fixed. I should note that nakedPEM returns a base64 encoding of the ASN.1 DER encoding of the public key exponent and modulus, but minus the -----BEGIN/END PUBLIC KEY----- lines and stripped of linebreaks.
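
For what it's worth, here's roughly how the client side hangs together at login time. This is just a sketch: the localStorage key name and the restoreKeyFromStorage() helper are placeholders of my own, not something from the library.

// look up the stored credential for this username and sign the login URL with it
function ajaxLogin (uname, callback) {
    var stored = localStorage.getItem ('phz-cred-' + uname);    // hypothetical storage key name
    if (!stored) {
        enrollNewDevice (uname);    // no key for this user on this browser: go enroll (see below)
        return;
    }
    var key = restoreKeyFromStorage (stored);    // rebuild an RSAKey from whatever was serialized at keygen
    var url = '/site/login.php?mode=ajax&uname=' + encodeURIComponent (uname);
    url = key.signURL (url);                     // appends pubkey, curtime and signature as above

    var xhr = new XMLHttpRequest ();
    xhr.open ('GET', url);
    xhr.onload = function () { callback (JSON.parse (xhr.responseText)); };
    xhr.send ();
}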

On the server side, I'm just using the standard off-the-shelf openssl functions in PHP: an openssl_get_publickey on the public key supplied in the url itself, followed by an openssl_verify. Note that I've signed the pubkey and the curtime. In my implementation, I'm only signing the script portion of the full url, and not the scheme/host/port. That's mostly an artifact of PHP (what isn't?) not having the full url available in a $_SERVER variable. This is most likely wrong, but I'm mostly after proof of concept here, not the last word on cryptography -- if this is an overall sound idea, the crypto folks will extract their pound of flesh.

So the server can verify the blob now, as well as create new accounts in much the same manner (you just add an email address). I've also implemented a "remember me" feature which doesn't store the keys after keygen if it's not checked, and removes any previously stored keys if they're present.

Enrolling New Devices

On the enroll-from-a-new-device front, the logic is pretty simple: if you type in a user name and there isn't a key for it in localStorage, the app pings the server and asks it to set up a new temporary password to enroll the new browser. This could be the source of a reflection attack, so server implementations should be careful to rate limit such replies. The server then sends mail to the registered account with instructions. The mail has both the temporary password and a URL to complete the login. The first is in case you're reading the email from a different device than the new browser device. The second is the more normal case where you click through to complete the enrollment, much like a lot of mailing list confirmations. I should note that at first my random password generator was using a very large alphabet. Bad idea: typing in complicated stuff on a phone is Not Fun At All. Keep it simple, even if it needs to be a little longer. This is just an OTP after all. Once the temporary password is entered, it's just appended to the login URL and signed as usual.
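
In javascript, the client half of that boils down to something like the following sketch. The endpoint names, parameters, and the serializeKey() helper are stand-ins of my own for whatever your site actually uses, and keygen assumes the library's generate(bits, exponent) style call.

// no key in localStorage for this username: ask the server to mail a temporary password
function enrollNewDevice (uname) {
    var xhr = new XMLHttpRequest ();
    // the server should rate limit these replies to avoid being turned into a mail cannon
    xhr.open ('GET', '/site/enroll.php?mode=ajax&uname=' + encodeURIComponent (uname));
    xhr.send ();
    // ...the UI then prompts for the temporary password from the mail (or the user clicks the mailed URL)
}

// once the temporary password has been typed in: keygen, remember the key, and do a signed login with the OTP
function completeEnrollment (uname, tempPassword) {
    var key = new RSAKey ();
    key.generate (1024, '10001');                                     // keygen: modulus bits, public exponent 0x10001
    localStorage.setItem ('phz-cred-' + uname, serializeKey (key));   // serializeKey: however you persist the pair
    var url = '/site/login.php?mode=ajax&uname=' + encodeURIComponent (uname) +
              '&otp=' + encodeURIComponent (tempPassword);
    url = key.signURL (url);    // the OTP rides along inside the signed URL, as described above
    // ...then send url with XMLHttpRequest exactly as in the login sketch above
}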

Replay

I mentioned in the original post on the subject that replay was obviously a concern. For the time being, I've kept this pretty simple in that it expects the browser's and server's clocks to be synchronized. The javascript client just puts the current system time into the URL, and the server side vets it against its own system time. Like Kerberos, I also keep a replay cache for signatures within the timeout window. This is done using a mysql table keyed off of the signature. If the signature is in the replay table, it gets rejected. If the timestamp is older than, say, an hour it gets rejected. I haven't quite figured out what to do about timestamps in the future; for now they just need to be within ± 1 hour or they get rejected. That said, I do have some concerns about synchronization. It's a very NTP world these days, but with the tracking aspect of Phresheez I've seen some very bizarre timestamps -- anywhere from years in the future to just an hour or so off. I can't really be certain if these are GPS subsystem related problems (probably), or something wrong with the system time. It's a reason to be cautious about a timestamp related scheme, and if needed a nonce-based scheme could be introduced. I'm not too worried about that as the crypto-pound-of-flesh folks will surely chime in, and this is definitely not new ground.


Sessions

I should note that I am not doing anything different on the session front. I'm still using the standard PHP session_start () which inserts a session cookie into the output to the browser, and logout just nukes the session cookie with session_destroy (). Nothing changes here. If you hate session cookies because of hijacking, you'll hate what I've done here too. If it bothers you enough, use TLS. It's not the point of this exercise.

I should note that an interesting extension of this is that you don't actually need sessions per se with this mechanism if you are willing to sign each outgoing URL. While this is not very efficient and is quite cumbersome for markup, it is possible and may be useful in situations where you have a more casual relationship with a site.
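
If you did want to go the session-less route, the client-side change is small: every ajax call just gets run through signURL first. A sketch (the endpoint in the usage line is made up):

// wrap ajax GETs so each request carries a fresh signature instead of relying on a session cookie
function signedGet (key, url, callback) {
    var xhr = new XMLHttpRequest ();
    xhr.open ('GET', key.signURL (url));    // pubkey, curtime and signature appended per request
    xhr.onload = function () { callback (xhr.responseText); };
    xhr.send ();
}

// eg: signedGet (key, '/site/points.php?mode=ajax&uname=Mike&day=20120717', handlePoints);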

Todo

A lot of this mechanism relies on the notion that the supplied email address is a good one when a user joins. That happens to be an OK assumption with Android based joins since you can get their gmail address, which is known good, but on other platforms not so much. So for joins, I might want to require a two step confirm-by-email flow. That sort of step decreases usability though. One compromise would be to require the email confirmation but not require it right away: enable the account immediately, but put it in purgatory if it hasn't been confirmed after a day or two of nagging. I still haven't worked that out. There isn't much new ground here either, though, as anything that requires an operational/owned email address confronts the same usability issues, and it's honestly an issue with the way Phresheez does passwords now.

I haven't tackled much on the shared device front either. That includes being able to revoke keys both locally as well as the stored public keys on the server (two separate problems!). Nor have I implemented local passwords to secure the localStorage credentials. They should be part of the solution, but I just haven't got around to it. It's mostly a matter of coming up with the right UI, which is never easy.

Oddities

One thing I hadn't quite grokked initially about the same-origin policy is that subdomains do not have access to the same localStorage as their parent domains do. It's a strict equality test on the origin. For larger/more complex sites this could be an issue that requires enrollment into subdomains, with the requisite boos from users. One way to get around this is to use session cookies, since they can be scoped across subdomains. That is, you log in using a single well-known domain, and then use session cookies to navigate around your site.
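
The cookie trick works because cookies, unlike localStorage, can be scoped to a parent domain (PHP's session cookie can be configured the same way server side; the value here is obviously made up):

// a cookie set with an explicit parent domain is visible to foo.phresheez.com, bar.phresheez.com, etc,
// whereas localStorage written at phresheez.com is invisible to foo.phresheez.com
document.cookie = 'PHPSESSID=abc123; domain=.phresheez.com; path=/';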

Performance

Actually, performance was surprisingly good. I have an ancient version of Chrome on my linux box, and keygen -- the most intensive part of the whole scheme -- generally takes less than a couple of seconds. I was quite surprised that on my Android N1 (very old these days), it was quite similar. Firefox seemed to take the longest, which is sort of surprising as it's up to date so its js engine should be better. I haven't tried it on a newer IE yet, which is likely to suck given how it usually just sucks. For signing, there is absolutely no perceptible lag. Again, it would be better on many fronts to have the browser expose openssl, say, to javascript, but the point here is that it's not even that bad with current-ish javascript engines.

Conclusion

It really only took me a couple of days to code this all up, and the better part of that was just getting the javascript libraries together. Once I had that, setting up the credential storage in localStorage, modifying the login UI, and signing URLs on the client side was quite easy. Likewise, setting up the per-UID table of public keys was simple, and the replay cache code wasn't particularly difficult either. My test bed has been running it for several weeks now, and it seems to work. I still have concerns about the UX.

Another interesting thing that's happened is that the LinkedIn debacle has obviously hit a nerve with a lot of us. I found out that there's been an ongoing discussion on SAAG about this, and that Stephen Farrell and Paul Hoffman have a sketch of a draft they call HOBA which is trying to do this at the HTTP layer using a new Authorization: method. I've been talking to Stephen about this, and we're mutually encouraging each other. I hope that their involvement will bring some grounding to this problem space from the absolutist naysayers who like to inhabit security venues and throw darts at the passers-by. If anything happens here to back us out of this Chinese finger trap we call password authentication, we've done some good.

Saturday, July 7, 2012

ASN.1 considered harmful

So I've recently spent a bunch of time playing with Javascript crypto libraries. There are Javascript crypto libraries, you ask? Well yes, such as they are. The one that seems most complete for my purposes is jsrsasign, but it's still missing things that I needed, so I had to scrounge the net to cobble them together. It is itself a frankenlib cobbled together from various sources and extended as the author needed.

The one thing the library didn't have was a PEM public key decoder method. PEM is a base64 encoded ASN.1 DER (distinguished encoding rules) encoding of the public key's exponent and modulus. That is, two numbers. It also has some meta information about the key, but I've never had a need to find out what's in there so I can't tell you what it is. That last point is actually symptomatic of why I have such incuriosity: it's part of the ASN.1 train wreck.

So I decided that this can't be too hard so I'll code it up myself. There was an example in the code which decodes the RSA private key from its PEM format, so how hard could this be? Very hard, as it turns out. Ridiculously hard. It's just two fucking numbers. Why is this so hard? If this were JSON encoded, it would have taken 10 seconds tops to write routines which encode and decode those two numbers. In another 15 seconds, I could have written the encoder for the private key too -- hey, it's got several more fields and it takes time to type. With ASN.1 DER encoding? It took literally 2 days of futzing with it. And that's just the decoding of the PEM public key. Had I needed to encode them as well, it would have been even longer.
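
Just to drive the point home, the entire useful content of a PEM public key, written as JSON, could be something as simple as this (the field names are mine and the modulus is made up and truncated):

{"e": 65537, "n": "c0ffee1234...the rest of the modulus as a hex string..."}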

What's particularly bad about this is that I've actually had experience with SNMP in the ancient past, so I both knew about ASN.1 and have coded MIBs and the like, which requires BER (basic encoding rules). Yes, ASN.1 has not just one encoding, not two, but at least 3 different encodings -- the last being PER (packed encoding rules). All binary. All utterly opaque to the uninitiated. Heck, I'll say that they're all utterly opaque to the initiated too.

So why am I ranting about ASN.1? After all, once the library is created nobody will have to deal with the ugliness. But that misses the point in two different ways. As a programmer, there's lots of stuff that's abstracted away so that you don't have to deal with the nitty-gritty details. That's goodness to a point: if you're using something regularly, it's good to at least be somewhat familiar with what's under the hood even if you have no reason to tinker with it. In this case, I really had no idea what was in a PEM formatted public key file, and I've dealt with them for years. It's just two fucking numbers. In the abstract I knew that, but ASN.1's opaqueness took away all curiosity to actually understand it, and when I saw methods that actually required the two parameters of modulus and exponent, I'd get all panicky since I didn't really know how they related to the magic openssl_get_publickey which decodes the PEM file for you. Seriously. How stupid is that? Had it been a json or XML blob I'd have been able to recognize at once that there's nothing to be afraid of. But it took actually finding an ASN.1 dumper and looking at the contents to realize how silly this was.

That gets to my next point: even after I found out that it was really just two fucking numbers, it still took me two days to finally slay the PEM public key decoding problem. The problem here is innovation. Maybe there are masochists who would prefer spending time encoding and decoding ASN.1, and they are entitled to their kink, but the vast majority of us want nothing to do with it. Even if there were good ASN.1 encoding and decoding tools -- which there are not in the free software world -- I'd still have to go to the trouble of writing things up in their textual language and running it through an ASN.1 compiler. Ugh. It's not just javascript that has this problem, it's everything. ASN.1 lost. It's not supported. It makes people avoid it at almost all costs. That hampers innovation, because if you want to add even one field to a structure, you're most likely breaking all kinds of software. Or at least that's what any sane programmer should assume: most of this ASN.1 code is purpose built, not generalized, so you should be very scared that you'll break vast quantities of software if you add something. End result: stifled innovation.

I write this because I think that a lot of the difficulty of getting people to understand crypto is tied up with needless distractions (the other is that certificate PKI != public key cryptography, but that's a rant for another day). Crypto libraries are hard to understand generally because, let's face it, crypto is hard. But the crypto libraries' and standards' use of ASN.1 makes things much, much more difficult to understand, especially when all you're talking about is two fucking numbers. It's a lot of what the problem is, in my opinion, and it's a real shame that there doesn't seem to be any practical way out of this predicament.




Friday, June 29, 2012

localStorage security

In my previous post, I wrote about a scheme to replace symmetric key login with an asymmetric key scheme that keeps its keys in localStorage. I mentioned that the use of localStorage is certainly a questionable proposition, and looking around the web there are definitely a lot of OMG YOU CAN'T DO THAT!!! reactions floating around. So it's worth going over the potential attacks on a scheme which relies so heavily on the relative security of localStorage.

First off, what are the security properties of localStorage itself? Not much. The browser walls localStorage off from other origins so that other sites can't snarf up its contents, but other than that, anything executing within the context of the origin site can grab the entire localStorage object and have at it. That sounds pretty scary for saving rsa private keys. But taking the next step, I can really only see two vectors that could be used to reveal those keys. The first is that somebody or something with access to, say, a javascript console could trivially grab them. The other is XSS and other kinds of injection attacks, where malicious javascript loaded into the browser could gain access to the keys.

Local Access to LocalStorage

So how much of a threat is malware or somebody with console access snooping on keys? Well, it's obviously a big threat. If somebody has unfettered access to your device they can do just about anything they want including going into your browser and opening up the javascript console, for example, and snarfing your keys. So that sounds fatal, right? Well yes it is fatal, but it's not uniquely fatal: they can also snarf up all of the stored browser passwords trivially, install keyboard loggers to grab your credit cards, and generally wreak havoc. So pointing out that bad guys 0wning your computer is bad doesn't say much about this scheme. Does this scheme make things even worse? I don't see it. You might point out that I can lock my browser with a password which assumedly encrypts its stored passwords. Assuming that that password hasn't been compromised (a big assumption), it provides some protection against snoopers. But there's nothing fundamental about my scheme that prevents using a local PIN/password to unlock the key storage as well. Indeed, some apps may very well want to do that to increase the security at the cost of more hassle for the user. Another possibility is that we can do some cat-and-mouse games with the keys. See below.

Evil Code and Key Security

Like most environments, executing evil code is not a particularly good idea. For javascript, once a piece of code has been eval'd, it's just so much byte code in the interpreter, and the browser couldn't care less about its provenance with respect to who can see what within that browser session. So the short answer is that evil code loaded into your site's browser window is going to give complete access to the user's private key. We can maybe make that a little harder for the attacker (see below), but at its heart I cannot see how you can truly avoid disclosure. That said, code injection is a serious threat, period: if evil code is inserted into your browser session, it can snoop on your keyboard strokes to get blackmail material when you're on sites that maybe you ought not be, make ajax calls back to your site in the user's name to, oh say, buy those UGG's the attacker has been drooling over, and generally do, well, any number of evil things. So there is ample reason to protect against code injection attacks, and the web is full of information about what one must do to protect oneself. So I don't see how we're making things worse by keeping private keys in localStorage.

Storing Passwords vs Asymmetric Keys

It's worth noting that we're talking about asymmetric keys and not passwords when evaluating threats. As LinkedIn shows, the real threat of a password breach is not really so much to the site that was breached, it's to all of the sites you used the same password on too. That is, the collateral damage is really what's scary, because you've probably forgotten 90% of the sites that have required you to create an account. So storing a password in localStorage gives a multiplier effect to the attacker: one successful crack gains them potential access to 100's if not 1000's of sites. That's pretty scary, especially considering that the attacker just has to penetrate one site that's clueless about XSS to get the party rolling. Worse: revocation requires the user to diligently go to every site they used that password on and change it. Assuming you remember every site. Assuming the bad guys haven't got there first.

So against the inherent vulnerabilities of storing asymmetric keys in localStorage, it's worth noting that we gain some important security properties, namely the localization of damage from being cracked. If an attacker gains access to a user's private key, they only gain access to the site that the key is associated with, not an entire Internet full of sites that require accounts. And revocation is trivial: all that is required to stanch a breach is to revoke the individual key that was compromised on the individual browser that stored it. It's not even clear that you ought to revoke all of the keys associated with a given compromised user's account, since the attacker would still need access to the individual devices via the two attacks above. But it might be prudent to do so anyway. And it seems very prudent, when storing the public keys associated with a user account, to also store some identifying information like the IP address, browser string, etc. so that users can try to get some clue whether new enrollments have been performed (which would indicate that their email password has been breached too, given the way we do enrollment).

Cat and Mouse 

One last thing: everybody with half a security head will most likely tell you that security through obscurity is worthless. Heck, I've said it myself any number of times. And it's still mostly true. However, it's also true that bad guys are, as a general rule, extremely opportunistic and are much more likely to go after something that's easy to attack than something that's harder -- all things being equal. And if an attack gives you a multiplier effect like getting a widely shared password, even better. So while we can't absolutely thwart bad guys once they're in and have free rein, we can make it a little harder both to find the keys in localStorage and to make use of them once they're found. For one, we could conceivably encrypt the private key in localStorage with a key supplied by the server. Yes, of course the attacker can get that key if they have access to the code, but the object here is to make their life harder. If they have to individually customize their attack scripts, they're going to ask the obvious question of opportunity cost: is it worth their while to go to the effort, especially if the code fingerprint changes often? Most likely they'll go after easier marks.

Likewise, we could change the name of the localStorage key that contains the credentials and set up a lot of decoys in the localStorage dictionary to require more effort from the attacker. And server-side, we could be on the lookout for code that is going through the decoy keys in succession looking for the real key. And so on. The point here is to get the attacker to go try to mug somebody else.
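
To be concrete, the sort of cat and mouse I have in mind is no more than something like this. The names and the randomLookingJunk() filler are placeholders, and none of it survives a determined, customized attack; the point is only to make a generic smash-and-grab script work harder:

// stash the credential under a server-chosen slot name amongst a pile of decoys
function stashCredential (slotName, decoyNames, credentialBlob) {
    for (var i = 0; i < decoyNames.length; i++) {
        localStorage.setItem (decoyNames[i], randomLookingJunk ());    // decoys that look just like the real thing
    }
    localStorage.setItem (slotName, credentialBlob);    // the real one, possibly wrapped with a server-supplied key
}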

So is it Safe?

Security is a notoriously tricky thing so anybody who claims that something is "safe" is really asking for it. Are there more attacks than I've gone over here? It would be foolish to answer "no". But these are the big considerations that I see. Security in the end is about weighing risks. We all know that passwords suck in the biggest possible way, so the real question is whether this scheme is safer. Unless there are some serious other attacks against the localStorage -- or the scheme as a whole -- that I'm unaware of, it sure looks to me that the new risks introduced by this scheme are better than the old risks of massively shared passwords. But part of the reason I've made this public is to shine light on the subject. If you can think of other attack vectors, I'm all ears.

Friday, June 22, 2012

Asymmetric Key Login/Join


Using Asymmetric Keys for Web Join/Login

I've written in the past about how Phresheez does things a little different on the username/password front by auto-generating a password for each user at join time. This has the great property that if Phresheez has a compromise, it doesn't affect zillions of other accounts on the net. However, the password is still a symmetric key which has to be stored encrypted for password recovery. Storing any symmetric key is not ideal.

As I mentioned in the previous post, what we're really doing here is enrolling a new device (browser, phone, etc) to be able to access the Phresheez server resources -- either the first time, when you join, or for subsequent logins. So I've come up with a new method using asymmetric keys which neatly avoids the problem of storing sensitive symmetric key data on the server. The server instead stores public keys, which by definition are... public. So if Phresheez is compromised, the attackers get none of the sensitive credential information, just public keys, which are worth something to Phresheez but worthless to everybody else unless they have the corresponding private key, which is supposedly hard to obtain. This takes the compromise isolation scheme to the next step, and is surprisingly straightforward.

For web use, I'm taking advantage of the new html 5 localStorage feature to store the asymmetric key. localStorage seems sort of frightening, but the browser does enforce a same origin policy to limit who can use it. If we believe that the browser protections are adequate (and that's worth questioning), then we can use it to store the asymmetric key. Note that although these flows are described in terms of web sites, there is nothing that *requires* using web technology. The cool thing is that it works within the *confines* of current web technology, which is sort of a least common denominator.

Join


  1. To join, the app generates a public/private key pair and prompts the user for a username and some recovery fallbacks (eg, email, sms, etc). The asymmetric key is stored in localStorage for later use along with the username it is bound to. The app then signs the join information (see below) using the private key of the key pair, and forms a message with the public key and the signed data (a javascript sketch of this follows the list).
  2. The server receives the message and verifies the signed data using the supplied key. This proves that the app was in possession of the private key for the key pair. The server creates the account and adds the public key to a list of public keys associated with this account.
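
Here's a javascript sketch of the client half of the join, using the message format spelled out below. The canonicalization (here just the raw JSON string), the key size, and the helper names are illustrative rather than normative, and the RSA calls are assumed to come from whatever RSA library you're using:

// join: make a key pair, remember it locally, and send the signed join blob plus the public key
function makeJoinMessage (uname, email) {
    var key = new RSAKey ();
    key.generate (1024, '10001');                                 // keygen: modulus bits, public exponent
    localStorage.setItem ('cred-' + uname, serializeKey (key));   // serializeKey: however you persist the pair

    var body = JSON.stringify ({cmd: 'join', username: uname,
                                timestamp: Math.round (new Date ().getTime () / 1000),
                                email: email});
    // POST the result to the server; the login message is identical minus the keygen and storage steps
    return JSON.stringify ({pubkey: key.nakedPEM (),
                            signature: hex2b64 (key.signString (body, 'sha1')),
                            body: body});
}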

Login


  1. Each time the user needs to log in to the server, the client creates a login message (see below) and does a private key signature over the message using the asymmetric key stored in localStorage. The signed login message, along with the associated public key, is sent to the server.
  2. The server receives the message and verifies the signed data. If the supplied public key is amongst the set of valid public keys for the supplied username, then the login proceeds.  See below for a discussion about replay.

Enrolling a New Display Head


  1. When a user wants to start using a different device, they have two choices: use a currently enrolled device to permit the enrollment, or resort to recovery mode using email, sms, etc.
  2. To enroll a new device using an existing app, the app can prompt the user for a temporary pass phrase on the currently enrolled app. This password is a one-use password and expires in a fixed amount of time (say, 30 minutes). It doesn't need to be an overly fussy password since it's one-time and timed out. The app sends this temporary password to the server with a login message (see below). The server saves the temporary password and timestamps it for deletion -- say in less than one hour. Alternatively, the app can generate a one-time password for the user and send it to the server. Either works.
  3. Alternatively, if an enrolled device is not available, a user can request that a temporary password be mailed, sms'ed, etc to the user. The server stores the temporary and timestamped password as above. The user receives the temporary password and follows steps 4 and 5 to complete the enrollment.
  4. The user then goes to the new device where they are prompted for a username and the temporary password. The new device creates a public/private key pair as in the join flow, and signs a json blob with the username and temporary password (see below). The new key pair is stored in the new device's localStorage.
  5. The server receives the signed json data along with the new public key and checks it against the temp password stored in step 2 or 3. If they match, the temp password is deleted on the server, and the new public key is added to the list of acceptable public keys for this user. Subsequent logins from this device follow the login flow.

Message Formats


Note that there isn't anything sacrosanct about json here. It could be done using GET/POST URL-encoded form data too; I just happen to find json a nice meta language. And by signed, I mean a sha1 or sha256 hash over the data, signed with the private key. I suppose I could sign the pubkey as well, but that's just details, just like I'm not specifying the canonicalization of what goes into the hash.

login/join message


{"pubkey":"--pubkey data--", "signature":"RSA-signature over login-blob", "body":"--signed login/join--"}

signed login


{"cmd":"login", "username":"bob", "timestamp":"unix-timestamp", "optional-temp-password":"otp"}

signed join


{"cmd":"join", "username":"bob", "timestamp":"unix-timestamp", "email":"bob@example.com", "sms":"1.555.1212"}

replies (not exhaustive)


{"sts":200, "comment":"ok"}
{"sts":400, "comment":"database down"}
{"sts":500, "comment":"bad encrypted data"}
{"sts":500, "comment":"timestamp expired"}
{"sts":501, "comment":"username taken"} // joins, but a re-join with a enrolled key is ok

Replay Protection

It's worth discussing replay protection. Here I have a timestamp which would assumedly need to be fairly well synchronized with the server time, and be relatively short lived -- say a few minutes. Alternatively, if it's acceptable to add a round trip, the client can request that the server send a nonce and add the nonce to the signed blobs instead.

In all cases, however, it should be assumed that the entire transaction is sent using TLS so that the server-client communication is private. Subsequent transactions may or may not be sent over tls... session management, etc is out of scope of this idea.

Multiple Accounts on One Device

A shared device with multiple accounts is possible if the username is stored along with the asymmetric key pair, binding them to each other. Multiple entries can be kept, one for each credential, and selected by the current user. This, of course, is fraught with the possibility of abuse, since you're potentially enrolling the device long-term. A couple of things can be done to combat that. First, the user can request that the credential be erased from localStorage. Similarly, in the enrollment phase, a user could request that the key pair only be kept for a certain amount of time, or that it not be stored at all. Last, it's probably best to just not use shared devices at all, since that's never especially safe.

About Public Key Encryption

I'm a little creaky on RSA right now, so forgive me if I get some of the details wrong. I've checked on a newish linux box running Chrome: public key verifies are cheap, while private key signatures from within javascript are more painful (~1s for a 1024 bit rsa key). I doubt that's a deal breaker, and in the long term giving native BIGNUM support to javascript may not be a bad idea. For hybrid apps like Phresheez, the app could even reach out to the native layer to get keys and signatures if it's a big problem. Likewise, generating keys can be slow, but that can possibly be backgrounded while the user is typing in their username, email, etc. And of course, there's the perennial question of the RNG. How good Math.random() is in js is certainly an interesting question. However, we have to keep in perspective what we're really changing from, which is crappy megashared user passwords. A 512 bit RSA key with a not terribly good RNG is most likely still better than the current situation, and there's a pretty darn good chance that we can do better -- maybe even much better.

Conclusion

In conclusion, this mechanism provides a way to finally break the logjam of the pervasive insecure shared secret schemes that are so prevalent not only on the web, but everywhere. The server never needs to keep a long term and potentially sensitive symmetric key, nor does it ever need to store anything that is not fundamentally public (ie, a public key). This wasn't really available for use on the web until we could use localStorage to store credentials in the browser. In conjunction with out of band recovery mechanisms like email and SMS, as well as currently enrolled devices, we can enroll new devices that generate their own credentials, so that a compromise of one device doesn't compromise the other devices you own.

The big question is whether we can make the user experience close to what people's current expectations are, but with a few twists -- like, for example, making clear that "recovery" isn't a moral failing, but the expected way you enroll new devices. UX is a tricky thing, and should not be discounted, but it seems there is at least hope that it could be successful.

Saturday, June 9, 2012

Client vs Server Charts

Charts are a very quick way to view statistical data, and good charting packages can bring a lot of neat ways to slice and dice that data. Since Phresheez started out as a fairly typical server side web site, it was pretty natural to generate the charts server side as well. Back then, javascript was still pretty slow, and html5 canvas support nonexistent. A more serious problem in reality was that I hadn't yet taken the step to process and cache the statistics for a day, so the amount of data that would have to be sent to the browser could be pretty big -- on the order of 100kb typically. So I never really considered it.

Besides, the graphing package I used (jpgraph) is pretty complete and I've really had no complaints about it per se. Its biggest problem honestly is its reliance on an underlying library -- libgd -- which isn't the best. Ok, having done graphics kernels before, it pretty much sucks. In particular, the curve algorithm really sucks, producing jaggies that really bother me (it's almost as if they're using a two direction ellipse step algorithm rather than three directions -- didn't Bresenham figure that out ages ago?). And it can't figure out how to write text on the baseline when it's anything other than horizontal, which makes the graphs look rather amateurish. But it has served me well, and it's definitely still useful because sometimes only an image will work, like when you need to post goodies to Facebook, which doesn't allow arbitrary blobs of html and javascript for pretty obvious reasons.

In the past year, I had made some changes to the server side graphs to freshen them up. This included using a graphic artist's best friend -- gradients. Without getting ratholed on whether that's a good or bad thing, one noticeable effect of using gradients is that the size of the .png (.jpg's look horrible) goes up dramatically. No surprise, but the once 10-20kb graphs were now 30-40kb each. Given that they looked slicker, it seemed a decent tradeoff. A more pernicious issue, however, was caching. People jump between pages with various graphs all of the time. Since people are looking at the graphs, oh say, at lunch when they've been skiing, the images cannot reasonably be cached -- the GPS uploaders are all busy at work for both you and your friends, and you expect that the charts will be kept up to date. So in reality, that 30-40kb is multiplied by the number of charts, your number of friends, and the number of times you look at the app again. While it was certainly server load, I was much more concerned about user experience, since the reception at resorts often sucks and trying to download a 30-40kb image each time seems... slow.

So I had long ago fixed the stat aggregation caching problem for its own obvious benefit. I had been playing around with html5 canvas stuff and was generally impressed with how well it behaved cross platform -- even ie9 does a pretty good job from what I can tell. So I decided on a whim to start looking for a javascript package that does graphing. I'll admit that my research on the subject wasn't the deepest -- in the beginning I was mainly interested in just testing the waters -- but I eventually settled on RGraph. Since the server side graphs had been evolving for years, I was rather worried about how long it would take just to get to the baseline of what I had server side, but I was pleasantly surprised that it only took me a week, maybe two tops, to get to parity. Better still, the rendering on browsers is much better than libgd, so goodbye jaggies. And it can do cute little animations. And since it's client side, I can attach events to the graphs more easily -- yes, I know that it's a hack since it's a Canvas rather than SVG, but it's still easier to contemplate than crufty image maps.

I had been vacillating about whether to make the change for quite some time for one reason: it increases the size of the web widget by about 100kb, which was pretty substantial. What finally won me over was realizing that I was being penny wise, pound foolish: the cost of an image is, say, 30kb, and you might look at 3 of them for yourself in one sitting, several for your friends, and then you may have several sittings as well. This all adds up, and as I mentioned it creates noticeable lag in the user experience. The client side graphs, on the other hand, all use the same cached statistics blob which is about 10k uncompressed, 3-4k compressed over the wire. So where you might be looking at 200-300kb or more of data transfer over a day, doing it client side is probably on the order of 10-20kb, if that. And it appears almost instantaneously, especially if it doesn't need to refresh the stats blob. Compared to the upfront 100kb code investment, which is amortized over the life of the web widget (generally a couple of weeks, maybe longer), it became obvious that this was a no-brainer.

So I've managed to convert everything over and push out a release. Everything seems to be working, but corner cases on graphs are hard to ferret out (thinking labeling, grrr), so I expect there will be some futzing as they crop up. The support at RGraph was very quick and they're receptive to upgrades, though I have a few smallish suggestions that I've dropped the ball on sending them. It's a client side world.

Friday, June 8, 2012

Phresheez Join Passwords

Phresheez requires that you create an account because in order to do anything interesting it needs a place to send points to on the backend. However, we had quite a bit of evidence that users abandon the app before ever signing up. There's probably a variety of reasons for this, but it probably boils down to one of two: either they just don't want yet another thing that requires a username and password, or they find it too onerous to type all of the necessary information in. I read a very interesting piece on iPad usability which mostly applies to phones too, and one not very surprising observation is that people really dislike typing on their phones. For Phresheez that's probably even worse because new users are likely finding out about us through friends and are in a hurry to get out skiing.

So I asked myself, what can I do to lower that energy barrier? Starting out with a naked form is the least friendly, and auto-generating a user account is the best. So I started looking on Android and lo and behold, there is a way to get the user's gmail address. Groovy. Since the left hand side of a gmail address is very likely to be globally unique, I can then use that as the seed to create a unique user id. For the email address, it's a no brainer since we already have the email address. That just leaves the password.

When I started thinking about this, it occurred to me that I could just auto-create a good strong password for them. The app stores the password so it doesn't have to be something they need to remember. Well, that's almost true: Phresheez is both an app and a web site, so they may want to know the password to see their stuff on the site as well. I fretted about this quite a bit, but ultimately I settled on a compromise: I'd auto-generate their password, but leave it visible in clear text until they decided to lock it, which gave them the opportunity to type the password into the web site. That, and there's always password recovery. That's where things currently stand.

In doing this I realized that this method has a very interesting security property: since Phresheez generates the password for you, any compromise of Phresheez will not compromise other sites where you might otherwise use the same/similar password. Yes we all know that it is bad to use the same password on multiple sites, but it is the reality of the world that people do this. And why wouldn't they? People are required to join probably hundreds if not thousands of sites for various reasons. Are we really to expect that they create a unique and hard to guess password for each site? Of course not, that's complete idiocy and anybody who spouts such a thing  should be flayed alive.

The LinkedIn fuckup got me thinking about this again, though. In my annoyance, I posted to NANOG about what I thought was so completely wrong with the blog post's posturing toward st00pid lusers. Many people chimed in that anybody who isn't using a password vault thingamajig deserves what's coming to them. But that really misses the point: putting the onus on users to protect themselves is first of all a provably losing proposition, and it also obscures the fact that we have been putting them in a completely untenable situation. The current username/password scheme is nearly 50 years old and it really shows. Everybody knows it sucks, so scolding users for being human is not the answer to what is really an engineering failure.

What occurred to me is that the real security advantage of the way Phresheez does things is that it puts security in the hands of Phresheez rather than users who don't have any clue. They don't have to know to download and use some password vault thingy. Apps can already store your credentials, and all browsers have password rememberers. And even if the browser doesn't have a rememberer, you can almost certainly use html5 localStorage to remember it. As for the need for cross-device passwords that vexed me? Well, now that I consider it, the real answer is password recovery. Every site needs the ability to recover usernames and/or passwords, and it is done via your supplied email address. This is just a fact, and is completely orthogonal to password generation. If password recovery has to be there anyway, why not use it as a feature rather than a necessary evil? Since it is a necessity, we shouldn't make password recovery a semi-shameful thing that you "forgot", but rather the normal way of enrolling a new display head to the site. Maybe we should put a positive spin on it: not something you "forgot", but something that allows you to add a new device to see your goodies on. It's the *normal* and expected way to see stuff on multiple display heads, not a failure of character.

In conclusion, I started down this road because auto-generating passwords was more user friendly, but it has turned out to be a much more secure way of enrolling users as well. And it puts the onus for better security on developers rather than end users. Snicker all you like about that, but at least there's a chance that developers can be beaten into doing the right thing, especially since this isn't all that hard to do.

Sunday, March 18, 2012

Phresheez Has a Yard Sale

Kaboom

I wouldn't have said that Phresheez had an ironclad disaster recovery plan, but at least we had a plan. We do mysql database replication back to my server on mtcc.com and do daily backups of the entire database. We have backups of the server setup, and more importantly a step-by-step build-the-server-from-the-distro document along with config files checked into svn. We only have one active production server, so that implicitly accepts that significant downtime is possible. On the other hand, we have been running non-stop since 2009 with exactly one glitch, where a misconfigured Apache ate all its VM and wedged. That lasted about an hour or so -- hey, we were skiing at the time, so all in all not terrible for a shoestring budget.

Our downtime doesn't take into account routine maintenance, and I had been needing to do a schema update on our largest table, the GPS point database. So when I happened to wake up at 2am, I decided to use the opportunity to make the change. Nothing complicated -- just take the site offline and make a duplicate table with the new schema. That's when the fun began. Each of the several times I tried, the master side gave up, complaining about something going wrong with the old table. I then tried to do a repair table on it, and that bombed too. Strange. After the fact -- a mistake, but it didn't matter as it turns out -- I decided to do a file system copy of the database table file. Death. Dmesg was definitely not happy about the disk either. I tried to see if the index file was ok: same problem. I tried other large tables, and they seemed ok. A mysql utility confirmed that it was just the GPS point database, though that was pretty bad by itself.

So... I was pretty much hosed. Something had blown a hole into the file system and torched my biggest table -- some 15 Gig big. Fsck with some prodding discovered and repaired the file system, but couldn't salvage the files themselves. So it's restore from backup time. Ugh.

There were two options at this point: do a backup of the slave or just copy over the slave's data file. I wasn't entirely coherent (it was early), and decided to give the first a try. Here's where the first gigantic hole in our strategy came in: either method required that a huge file be copied from the slave server to the master. Except the slave is a machine on a home DSL uplink getting about 100 KB/sec throughput: scp was saying about 12 hours transfer time. Oops.

The long and short is that after about 12 hours, the file was copied over, the index was regenerated and Phresheez was up again no worse for wear as far as I can tell. A very long day for me, and a bunch of unhappy Phresheez users.

Post Mortem

So here's what I learned out of all of this.

  • First and foremost, the speed of recovery was completely dependent on the speed of copying a backup to the production server. This needs to be dealt with somehow. One option is copying the backups to usb flash and finding somebody with a fast upstream to be able to copy stuff to the production machine. Better would be to spend more money per month and put the slave on its own server in the cloud. But that costs money.
  • Large tables are not so good. I've heard this over and over, and have been uncomfortable about the GPS point table's size (~300M rows), but had been thinking about it more from a performance standpoint than a disaster standpoint. I've had a plan to shard that table, but wasn't planning on doing anything until the summer low season. However, since the downtime was purely a function of the size of the damaged table, this is really worth doing.

Disaster on the Cheap

The long and short of this is that when you have single points of failure, you get single points of failure. Duh. The real question is how to finesse this on the cheap. The first thing is that getting access to copy the backup over the net quickly would have cut the downtime by about an order of magnitude in this case. Sharding would have also cut the downtime significantly, and for that table it really needs to be done anyway.

However, this is really just nibbling at the edges of what a "real" system should be. Had the disk been completely cratered, it would have required a complete rebuild of the server and its contents and it would have still been hours, though maybe not the 12 hours of downtime we suffered. Throwing some money at the problem could significantly reduce the downtime though. Moving the replication to another server in the cloud instead of on home DSL would help quite a bit because the net copy would take minutes at most.

A better solution would be to set up two identical systems where you can switch the slave to being a master on a moment's notice. The nominal cost is 2x-3x or more because of the cost of storing the daily backups -- disk space costs on servers. The slave could be scaled down for CPU/RAM, but that only reduces cost to a point. Another strategy could be to keep the current situation where I replicate to cheap storage at home and keep the long term backups there, but keep a second live replication on another server in the cloud. The advantage of this is that it's likely that a meltdown on the master doesn't affect the slave (as was the case above), so a quick shutdown of the cloud slave to get a backup, or switching it over to be a master would lead to much better uptime. Keeping the long term backups on mtcc.com just becomes the third part of triple redundancy and is only for complete nightmare scenarios.

Is it worth it? I'm not sure. It may be that just getting a fast way to upload backups is acceptable at this point. One thing to be said is that introducing complexity makes the system more prone to errors, and even catastrophic ones. I use replication because it would be unacceptable to have 2 hours of nightly downtime to do backups. However, mysql replication is, shall we say, sort of brittle and it still makes me nervous. Likewise, adding a bunch of automated complexity to the system increases the chances of a giant clusterfuck at the worst possible time. So I'm cautious for now: what's the smallest and safest thing I can do to get my post-disaster uptime into an acceptable zone? For now, that's finding a way to get backups onto that server pronto, and I'll think about the other costs/complexities before I rush headlong into anything else.

So What Happened to the File System Anyway?

At some level, shit just happens. It's not whether something will fail, but when it will fail and how quickly you can recover. But this is the second time in about a month that I've had problems that required fsck to come to the rescue. My provider had recently moved me to a new SAN because the previous one was oversubscribed. Did something happen in the xfer? Or is their SAN gear buggy, leading to corruption? I dunno. All I know is that I haven't had any reboots of any kind since I moved to the new SAN, so there shouldn't have been a problem unless it goes back before the SAN move. I sort of doubt that the underlying Linux file system is the cause -- there's so much mileage on that software I'd be surprised. However, after fighting with my provider about horrible performance (100kb/second transfers for days on end) with their SANs, and now this... I'm thinking very seriously about the options.

Thursday, February 2, 2012

The First Two Minutes

I had an interesting email exchange with my friend J. at Twitter. Twitter had just rolled out a new version that was being hyped as new and improved, and was specifically trying to target "new" users by making Twitter more understandable to the uninitiated. I fit that bill to a tee as I really don't "get" Twitter, so I headed over to their web site to take a look, expecting to be instantly gratified. It looked the same as the last time I looked, though. I couldn't identify even one thing that looked different, and it was as dry and lifeless as before. I even loaded up their app. Same. So after a couple of minutes of searching for what the hype was about, I gave up and left.

Later, I sent J. some mail about this and after several exchanges he told me they don't normally care what people have to say until they've used it for about a week or so. Thud. That's the sound of my jaw dropping in disbelief. A week? You lost me in the first few minutes! There was no week for me to get used to anything because I -- like lots of people purportedly -- abandoned it. Again.

Twitter is hardly unique, of course. Every app probably has about a minute or two to engage you. If it doesn't, it'll be abandoned -- either outright deleted, or ignored until the user does a sweep of the deadwood cluttering up their home pages. Maybe you get one more chance, but that's probably it; some day I should try to grind those stats out for Phresheez. Since I was probably harsh on J., I decided to look at Phresheez from that perspective as well. It, of course, completely failed too. Phresheez is a particularly hard sell: people download it because they think it has something to do with skiing, but since Phresheez is a tracking app, unless you're already skiing at the time your gratification is not instant. From what I can tell from the data, lots of people load it either far in advance (ie, in the months leading up to ski season), or days in advance (ie, leading up to a trip). Both require a pretty big leap of faith on the part of our users: that we'll entertain them if they use it. I don't doubt that quite a bit of our abandonment rate is nothing more than people forgetting about the app in their excitement to get on the mountain.

I don't think that explains all of it though. In fact, I suspect that the reason for abandonment of Phresheez is probably much like Twitter's. With Twitter, you are invited after joining to write something. So you do. Nothing happens. Why did I do that? This app sucks. So I never see the vast amount of content that Twitter actual has. Phresheez is sort of similar: you load the app and after you join it tells you to go outside. So you go outside and it gets a GPS fix. In pressing buttons, you'll see yourself on a map, and maybe some not terribly exciting charts and stats. Ok, maybe there's some there there, but you don't see all of the gaming, awards, cool charts, etc, etc, in full action. Maybe you see some potential and don't delete it outright, but there's a good chance you'll forget about it when the time comes.

So we're both guilty of the same sin: we failed to instantly gratify. Strangely, I have a much better idea about what Twitter should be doing than what Phresheez should be doing. Twitter has a huge amount of content, and it probably wouldn't take too much to coax out some idea of the kind of content I'd like to see: Twitter is amazingly good at finding out about fires in SF in real time, for example. So I -- like lots of people -- am likely to be interested by default in what's trending in my home town. Why didn't they entertain me with that in the first few minutes? And why don't they try to figure out what might interest me, either explicitly like the Netflix suggestion machine, or by making it obvious that search is one of the most important things in those first few minutes? Instead they make me feel stupid for typing in a tweet that I know nobody will read. This is a huge advantage that Twitter has: it's not difficult to be passively entertained. Yet Twitter squanders that advantage. I don't get it.

As for Phresheez, it's still a tough problem. People do seem to respond to notification meatballs, so I added some pre-populated notifications upon join with cool days from our users. That seems to get some response. In an unreleased version, the app will also have a top level option to view a video which shows what the app can do. I'm not as optimistic about that because it's not all that different from the screen shots in the app store, but you never know. What I'd really like is for people who load it at home to just go out, take a bike ride, a walk, etc, and then check in again, but that seems to be asking for too much. Aside from the blinders of "it's a ski app!", asking for that violates the two minute rule.

One thing that really did make a big difference in the abandonment rate, however, is streamlining the join process. If you don't need a join process, or can postpone it, you're already ahead of the game: people hate typing, and are rightfully suspicious of giving up information. For Phresheez, we need a username, password and email address. As it happens, you can get the gmail address on Android, so I use that for email and use the local part to guess at a unique username. For the password, I just auto-generate it so that the only thing that the new user has to do is press the join button. There is no equivalent on iPhone, but I recently did a similar thing using Facebook to auto-populate these fields -- a "Join with Facebook" flow. For Android it was pretty obvious that it made a big difference, though I haven't quantified it. The "Join with Facebook" feature hasn't been released yet, but it will most likely help on iPhone.
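
For the curious, the guessing part is trivial. Here's roughly what it boils down to in javascript, assuming the native side hands us the account email -- the function name and character set are made up, and the server still has to enforce username uniqueness:

function makeJoinDefaults (email) {
    // Guess a username from the local part of the email address; the server
    // still checks it for uniqueness and may tack on digits.
    var uname = email.split('@')[0].replace(/[^A-Za-z0-9_]/g, '');

    // Auto-generate a throwaway password so the user never has to type one.
    // Math.random() is not cryptographically strong -- fine for a low-value
    // password the user never sees, but don't use this for anything serious.
    var chars = 'ABCDEFGHJKLMNPQRSTUVWXYZabcdefghjkmnpqrstuvwxyz23456789';
    var pass = '';
    for (var i = 0; i < 10; i++) {
        pass += chars.charAt(Math.floor(Math.random() * chars.length));
    }
    return { uname: uname, password: pass, email: email };
}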

In conclusion, my scolding of J. only managed to show my own inadequacies with Phresheez. This is a really tough problem for a Phresheez kind of app, or any sort of app that demands delayed gratification. The first step, however, is realizing how important the problem really is. So I grok that now. But still it's tough.

Wednesday, February 1, 2012

The Trouble with Templates

Templates have always perplexed me. They sure sound like a good idea: who likes the idea of all kinds of layout and content wrapped up in one big ball of code that plops it out? Wouldn't it be nicer to have a clean layer that separates layout, content, and code? Here's a piece of html layout, some json content blob and poof! A page pops out. Nothing can be easy though, so your majick needs to know how to bind the content to the layout.

So the first thing that comes to my mind is, well, howse about something that just binds values to particular places in some html template via some regex incantation? But that majick is weak and needs to know where to bind things. And JSON itself is just a way to marshal bits onto the wire, not a formatting transmogrifier. So it would be great if a couple of regex's were all it really took, but that's not how it works out in the normal day to day world from what I've seen. Just take, for example, 1 vs. 2 of something. Assuming your JSON blob just has a number in it, you often want to pluralize the word around the number itself. That requires either the ick factor of pluralizing the word server-side, or embedded logic to do the same. And that's just one very small example... there are legions.
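
To make that concrete, here's a sketch of the naive approach and exactly where it falls over. The placeholder syntax and function names are invented for illustration:

function renderNaive (template, data) {
    // Bind values into {{key}} placeholders with a single regex pass.
    return template.replace(/\{\{(\w+)\}\}/g, function (m, key) {
        return data[key] !== undefined ? String(data[key]) : '';
    });
}

renderNaive('<span>{{count}} new messages</span>', { count: 1 });
// -> "<span>1 new messages</span>" -- oops

// So either the server sends a pre-pluralized string, or logic like this
// creeps in next to (or into) the template anyway.
function pluralize (n, singular, plural) {
    return n + ' ' + (n === 1 ? singular : plural);
}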

So what is a template anyway? I'd say that it's data, possibly with embedded control logic. What's a normal script? Control logic, possibly with embedded data. It seems to me that you pick your poison: do you want the HTML layout to drive the overall structure of the module, or do you want the programming structure to drive the layout of the page? All of the templating solutions I've seen are HTML with embedded control logic, usually with some obscure control language that somebody hacked together. With phpBB, which I use for the Phresheez forums, the templating is even weirder: they have built their own control language, one of whose controls is a PHP escape itself! So you have a PHP script interpreting a template which can then eval PHP. This is better than just doing an include() with embedded PHP itself? At the very least, the implicit eval in include() is going to be faster than any PHP parsing through the template (and then possibly calling eval again). So I just don't get it.

One thing seems pretty clear to me: for all of its virtues, HTML as a structured way of looking at programmatic source leaves a lot to be desired. Maybe it's me, but I find that I figure out what's going on in a module a lot faster if I look at a real source module rather than a bunch of isolated programming in a sea of markup. But then again, a bunch of html += '' statements are pretty heinous too. Neither method really provides a barrier if you want a separation of layout monkeys from code monkeys. They're both likely to have their fingers in the same modules, and they're certainly going to have to coordinate no matter what.

It seems possible, however, to have templates that are nothing more than markup. If the pertinent nodes in the DOM are assigned ID's, js code can hunt them down, add the needed listeners, insert the formatted data from the JSON blob, etc. How practical this is I don't know. For one, you're likely to still be generating some markup in code -- a table with a dynamic number of rows, for example. Honestly the thing that frightens me the most about that sort of mechanism is how you create, coordinate, and maintain the ID namespace between the layout monkeys and the code monkeys. That really seems like a nightmare.
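
Here's roughly what I have in mind, sketched with made-up ID's and JSON fields: the "template" ships as plain markup carrying nothing but ID's, and the code does all of the binding --

function bindDayStats (stats) {
    // Hunt down the nodes the layout provides and fill them in.
    document.getElementById('day-title').textContent = stats.title;
    document.getElementById('day-refresh').addEventListener('click', function () {
        // re-fetch the JSON blob and re-bind...
    }, false);

    // The part that still ends up as markup-in-code: a dynamic number of rows.
    var tbody = document.getElementById('run-rows');
    tbody.innerHTML = '';
    for (var i = 0; i < stats.runs.length; i++) {
        var tr = document.createElement('tr');
        tr.innerHTML = '<td>' + stats.runs[i].name + '</td><td>' +
                       stats.runs[i].vertical + ' ft</td>';
        tbody.appendChild(tr);
    }
}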

I'm not really sure what the solution is, but maybe this might be a start of a conversation: don't shoehorn mock programming logic into html. Even on its best day, it's an orc-like counterfeit of a real programming language, even in comparison to the decidedly low-elf javascript. Maybe instead what would be good is to actually just *split* the HTML layout duties if that's how your shop works. That is, the layout monkey checks in a real piece of html -- without ID's as above -- and the code monkey translates that into working code that can deal with all of the subtleties that cause so much consternation with an inadequately expressive templating language. Sure it takes two steps, but that's sort of a good thing in my view: each piece of code is owned by the proper stakeholder. Diff is our friend: when changes happen to the layout, you just diff the html source and figure out how that affects the actual layout generating code.

So I guess the long and short of this is that I'm just not terribly convinced that templates solve more problems than they create. They sound good, but they always seem to keep coming up short from what I can see. Maybe it's yet another case of TANSTAAFL. In a new client-centric world, maybe it's time to step back a bit and go back to first principles for a while.

Monday, January 23, 2012

Call Me Rip Van Webble

Before I Fell Asleep

As best as I can tell, I fell asleep in the winter of 1998. The dot com craze was full of irrational exuberance, and why shouldn’t it have been? In a few short years, the entire world went from being mainly disconnected with a few islands of lameness, to a global Internet where everybody was rushing to put up an online presence even if they weren’t entirely sure why that was a good thing. Clearly there was some there there though: through the magic of HTML forms you could click one button and get pizza, pet food and tickets to ride. Sure they looked a little janky and the web certainly wasn’t as friendly as a native app, but none of that really mattered. The choice wasn’t between native apps or the web, it was the choice between the web or nothing at all. HTML at first was a pretty simple beast. You could pick up the jist of how to do things in a matter of hours. But things were changing: Microsoft -- surprising for a big company -- sensed an existential threat and came out with Internet Explorer as its answer to Netscape. So ensued the browser wars, and with them the quest to one up each other. Back in the early days, CSS was barely being talked about and the web world wasn’t quite sure whether java plugins were the way to go, or something else. There was this funky thing called javascript, but it wasn’t what respectable people used because it was riddled with incompatibilities, was slower than molasses, and was generally useless. No, you did what nature intended: you built your pages on the web server. Duh. And you went to great pains to deal with the incompatibilities between browsers due to the immaturity of the web and of course the politics of world domination. You also tried as hard as you could to make sure what got sent over the wire wasn’t too piggish: dial up modems were the norm even if some crazies were talking about using the forlorn dregs of the cable spectrum for something called broadband. Again javascript, like flash, was the target for disdain as being useless bloat. Life was simple in those days because expectations were so low: the amazing thing wasn’t that the pig sang well, it’s that the pig sang at all. The last thing I remember before nodding off to sleep was something about “push” being the Next Big Thing. It was that bit of hype, I’m convinced, that caused my eyes to glaze and then shut...

After I Awoke

When I awoke from my long sleep the web world had changed quite a lot. Many things were as familiar as ever, but many were different. My first introduction was through my project Phresheez. Phresheez is my app and site which does all kinds of fancy things to visualize your skiing/boarding days in new and innovative ways. Phresheez started as a trip to Kirkwood to ski in April of ‘08 with a Garmin hand held GPS in my pocket. When I got home, I uploaded all of those points and hacked together a crude bread crumb map using the Google Maps API. At the time I was rather confused as to exactly who should be creating the necessary javascript code: should it be in the context of the server PHP generating the javascript code or what? In retrospect that’s completely obvious, but it shows where I was before I went to sleep -- the server side did and drove everything. In any case, after seeing the Google map of my day I was hooked. This was seriously cool, and it could be done on any old browser! Wow! So off I went playing around with Phresheez, mining the data sets from my last few days of skiing for all kinds of interesting goodies. At first, Phresheez was nothing more than a web site that required a rather onerous way to upload points via GPX files. But it was a start. Sites require all kinds of supporting logic for user management, admin, etc, etc, so I went about designing those. In the time since I had fallen asleep, lots of things had been designed to make that kind of drudgery slightly less of a drudge. But I didn’t know about them, and when I asked around, the general consensus -- much to my surprise -- was that “it’s best to roll your own.” That turns out to be not *entirely* correct, but I didn’t know any better. So to nobody’s surprise my first pages were ultra static server produced blobs that were as unwieldy as they were unbeautiful. If you took my pages to the Antiques Road Show, they would have correctly dated them circa 1995. There was a paradox on the site though: the map visualization page was quite alive in comparison, and I was constantly improving it. Be able to see your avatar move around the resort? Sure! Be able to see your friends at the same time? No prob! Be able to add geotagged pics that pop up when your animated avatar gets there in time? Wow, that’s cool! So it was that the seeds were sown for a new way to think about this. And it was all done because of, not in spite of, javascript. While I was asleep, on the other hand, I had recurring nightmares of the horrors of browser incompatibility, buggy javascript, and above all else that javascript was dog slow at doing just about anything. But now that I was awake and was actually playing around, it was remarkable to me how compatible everything actually was. Well, there was IE6 but who cares about an ancient piece of M$ crap anyway. So even with Firefox 1.5 things actually worked pretty well, and when I upgraded to Firefox 2.x the performance of javascript was noticeably faster. Very cool.

Smart Phones and Hybrid Apps

The other thing that happened around 2008 is that iPhone apps started really hitting the shelves, and it was obvious that this wasn’t a passing fad. And then Google announced Android, which had very similar characteristics to the iPhone, and it became pretty obvious to me at least that since M$ and RIM both had their heads up their butts, the players in the inevitable mobile death match were likely set. For Phresheez, iPhone & Android were a godsend because they both had GPS’s integrated in them: no more clunky uploading of GPX files; just send the points directly from the phone to the server like magic and enjoy. So off Phresheez went. At first the mobile apps were nothing more than glorified point uploaders with a bare bones set of things they would show you: server-side created charts and static maps with your tracks and a few other things. It became clear after using the app the winter of ‘08/’09 that the app had the potential for much more. The first foray into this was a friend finder. At first I thought about writing it using the native UI, but with both iPhone and Android to support that seemed like a fool’s errand. So I learned about embedded web browser objects. After some fiddling around it turned out that not only were the embedded web browsers acceptable, they were pretty state of the art on top of it. So it was that the first bits of my hybrid web/native app were born: completely out of necessity, but a virtue as it turns out as well.

Just Using Javascript

At the same time, the features kept rolling in: being able to see your animations on the phone, being able to friend people and see what they were doing, etc, etc. Since I stole quite a few of the ideas, if not the code, from the main site’s javascript heavy Google Maps based pages, it became completely natural to just write all of the layout in javascript: why did the server side need to worry about that anyway? In fact, the server side was really sort of in the way: all of the data was demand loaded using AJAX, so its role in the display layout was pretty limited by necessity. The other thing that drove this is that since it was being built as the in-app user interface, you don’t have or want any of the normal browser chrome, so the web portion of the app needed to be both self-contained and long lived. At first I was very cautious about this because I had no idea just exactly how much javascript would cause the app to roll over and die. As it turns out, it must be quite a lot because I don’t think I’ve ever run into that limitation. So it was that the server side on most user facing pages was becoming nothing more than a boot loader for the javascript, CSS etc. And this was a Good Thing, though not any sort of virtue in my mind. Another curious thing was also happening: as I became more familiar with javascript, I found myself moving more and more layout from the server side into the javascript side generally. It was nice to add interactivity to the dowdy static pages. As I became more familiar, it was easier to envision just coding the content layer up in javascript rather than server-side. I’ll say that even to this day I still feel more comfortable starting out a new page as a server generated page. Why? Two reasons: 1) old patterns are hard to get past and 2) I still don’t have a framework where I can cut and paste a javascript/AJAX method template easier than print’ing ‘<html></html>’ server side. Both of these are my own failings though, not anything inherent.
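
The basic shape of those pages is dead simple: the server boots the page, and everything else is a JSON fetch plus client-side layout. A sketch, with the endpoint and field names made up (the real Phresheez URLs obviously differ):

function loadFriends () {
    var xhr = new XMLHttpRequest();
    xhr.open('GET', '/site/friends.php?mode=ajax', true);
    xhr.onreadystatechange = function () {
        if (xhr.readyState !== 4 || xhr.status !== 200) return;
        // The server’s only job here is to cough up a JSON blob...
        var friends = JSON.parse(xhr.responseText);
        // ...and the layout happens entirely client side.
        var list = document.getElementById('friend-list');
        list.innerHTML = '';
        for (var i = 0; i < friends.length; i++) {
            var row = document.createElement('div');
            row.textContent = friends[i].uname + ' @ ' + friends[i].resort;
            list.appendChild(row);
        }
    };
    xhr.send(null);
}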

HTML 5 Widgets

The main problem with a web app within a native app, however, is that it unsurprisingly needs to be loaded off the net if you want it to be demand loaded as opposed to statically included from the resources in the app. We wanted the dynamic load aspect because of our complete and justified aversion to dealing with the Apple App submission process and its twisty mazes of Fail. But start up times are seriously unhappy, especially with crappy data service like you find in many ski resorts and most of San Francisco. Back then, we also had to contend with the iPhone 3G, which didn’t have the ability to background apps, so each time you fired up the app, it would have to load its UI off the net. Fail. Something needed to be done about this. So on we go to the next season and I got wind of a thing that was floating around the HTML5 world: widgets. I really knew nothing about HTML5, but the widget idea was a simple one: package up all of the parts of a web page into a zip ball, write a manifest for it, and provide an interface to fetch it. Here was the solution to the nettlesome problem of not wanting to reload the UI fresh from the server each time (realize that browser caching on phones sucks for the most part). The other important part of the widget approach is that it buys a huge amount of agility. Going through the app approval process with Apple is nothing less than a Gothic horror. And even with Android, you still have to contend with the fact that if you put a new version out, a sizable percentage of the user population won’t bother to upgrade. So the widget approach neatly solves these problems in a webby like way, but without a lot of the webby downsides. The other thing that was happening around this time is we discovered that what we were really building wasn’t a web app, per se. It was a replacement for the UI front end of a native app, with a bidirectional RPC mechanism to effect change in both the web and native parts of the app. At first this was very tentative, but it has evolved to the point that almost the entire UI is dealt with in the webview, which reaches out for native functionality as needed. It’s not a perfect RPC interface by any stretch of the imagination, so if you have something like a game or other needs that require real-ish time components, I really doubt that this webview architecture is appropriate. For now at least. But there are a huge number of perfectly interesting applications whose requirements are actually quite modest, for which this architecture works just fine.
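
For the curious, the javascript half of that kind of RPC bridge can be as simple as the sketch below. This isn’t necessarily how Phresheez wires it up -- the scheme name and callback are invented -- but it’s the garden variety webview trick: js reaches native by navigating to a custom URL scheme the native side intercepts, and native reaches js by evaluating a well-known callback in the webview.

function callNative (method, args) {
    // Encode the request into a URL on a made-up scheme the native side traps.
    var url = 'phz-rpc://' + method + '?' +
              encodeURIComponent(JSON.stringify(args || {}));
    // Use a throwaway hidden iframe so we don't clobber the page's own location.
    var frame = document.createElement('iframe');
    frame.style.display = 'none';
    frame.src = url;
    document.body.appendChild(frame);
    setTimeout(function () { document.body.removeChild(frame); }, 0);
}

// The native side answers by evaluating something like
// onNativeResult('getFix', '{"lat":38.68,"lon":-120.06}') in the webview.
window.onNativeResult = function (method, resultJSON) {
    var result = JSON.parse(resultJSON);
    // dispatch to whoever asked for it...
};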

Where the World Actually Was

So I really had not much clue where the web world actually was. I had heard about things like Ruby on Rails but really didn’t understand how it would benefit me: it always seemed like a mental disconnect. Part of the reason I never investigated my disconnect was some good advice from Dan Scheinman at Cisco: Ruby programmers expensive, PHP programmers cheap. Yes, I knew that Rails-like frameworks exist for Python, PHP and probably most other languages these days, but it didn’t occur to me to wonder whether I should care. So I didn’t. As it turns out, maybe I should have cared, but not in an embrace “care” kind of way. The reason why it gave me such a disconnect is because I wasn’t approaching the Phresheez problem even remotely the same way as the server-centric content generation crowd. Ruby and all of the other things like it grew up when javascript sucked and was something to use mostly at your peril, and then mostly to do flashy things that, if they broke, left the site still operable. To some people, the latter was and probably still is a big driver, but I didn’t care: get a semi-recent browser or get lost. With my mobile web app it was even less of a consideration because I knew what the capabilities and deficiencies were of the embedded browsers I had to deal with: webkit I <3 you. So where most of the world was dealing with server-side content layout generation, I was at this point doing nearly all of my content generation client side. Disconnect detected at least.

The World Has Changed

When I started thinking about this, it finally became obvious why the world was the way it was: it was largely still operating on the assumptions from 7 or 8 years ago, when javascript was an untrustworthy lout to be avoided if possible. This is hardly surprising, as it takes a long time for new technology to percolate through and for people to grok that a shift has happened. I didn’t realize that it had happened because I was asleep, so I merrily went ahead using the web world as I found it; I expected change, even if it took me a while to get it. But now I think I do. The old server-side layout pattern is a dead man walking. The speed of javascript has been increasing at an amazing clip, and the new goodies of HTML5, including the canvas tag’s vector graphics, mean that you can do a tremendous amount of work using somebody else’s CPU cycles: namely the client device. The server side in the new world is for two main purposes:
  1. Create skeletons of a page’s static programs and data. This is like the web equivalent of a boot loader
  2. Serve up dynamic raw content in the form of json blobs from AJAX callbacks
The net of this is that the world is coming back to something that is familiar and foreign at the same time: using javascript as the base programming language isn’t entirely different than using, oh say, Python with GTK. But it’s not really the same, because there are all kinds of web and mobility considerations. In particular there is the nettlesome issue of startup performance, which is especially acute for mobile apps. What occurs to me is that the widget approach is an instantiation of a particular kind of web based boot loader. We did this originally because it was pretty straightforward: zipping up a page with wget and unzipping it in an app is not terribly hard. It does require a native app context to store the elements of the page, so the widget approach is practically limited to apps, typically of the mobile kind. That’s nothing to sneeze at, but HTML 5 widgets aren’t the only approach.

A New Approach to Local JS Modules

One of the nifty new features of HTML 5 which happens to be implemented on both the iPhone and Android (and most likely any other webkit based engine) is localStorage and openDatabase. So it seems quite possible that instead of creating a monolithic “webapp” zip ball that requires native app support to store and load, we can decompose the individual components of the web page and store them locally in the context of the browser itself. HTML 5 localStorage is a very simple key/value interface, so any sort of complexity will require management of its namespace and limitations. It could be used to store javascript, CSS, templates, etc, etc, in the context of the app/site. The problem is that there’s meta data that needs to be stored with each file, so the client stub of the web boot loader would need something along the lines of the old Apple resource and data forks in order to store both the resource (eg, javascript, CSS, etc) and the metadata associated with it (eg, current revision). On the other hand, the openDatabase interface -- which is nothing more than a very thin veneer over SQLITE as far as I can tell -- happily stores all of the metadata in the same row as the file blob. Normally I’d be cautious about such a thing, but SQLITE has been around for a long, long time now, so hacking up the javascript shim should have been a pretty straightforward exercise for the browser guys without huge exposure to weird API bugs, incompatibilities, and other random fails. In the New Approach, the server-side boot loader serves up the skeleton of the page not with actual direct references to the files to load (eg <script>), but with an abstract notion of what the files are, the meta data including versioning information, and the javascript client stub of the loader. The client loader then just looks through its database to see if it needs to upgrade the particular files referenced in the manifest, or whether the local versions are fresh. This was a quite straightforward exercise. The beauty of this is that the client boot loader stub is very simple, and the server side should not need to change very much from its normal job of serving up versioned, packed and zipped page components. Image data -- which is a benefit of the Widget zip ball -- is problematic though: instantiating an image from local data is kind of possible using the data: URI scheme with the Image DOM object, but doing so would not be nearly as natural because you’d be forced to use the javascript/DOM mechanism to generate content rather than the HTML <img> tag. Or so it seems at this point. So maybe there really is a case for both the HTML 5 widget mechanism, as well as the New Approach/SQL: they each have their strengths and weaknesses.
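
To make the shape of that client stub concrete, here’s a minimal sketch using openDatabase (Web SQL). Everything here -- the table layout, the manifest format, the function names -- is invented for illustration, and a real loader also has to handle CSS, error paths, packing and so on:

// The boot page embeds something like:
//   var manifest = [ { name: 'phz.js', ver: 17, url: '/js/phz-17.js' }, ... ];
var db = openDatabase('bootldr', '1.0', 'page component cache', 2 * 1024 * 1024);
db.transaction(function (tx) {
    tx.executeSql('CREATE TABLE IF NOT EXISTS files ' +
                  '(name TEXT PRIMARY KEY, ver INTEGER, body TEXT)');
});

function loadComponent (entry) {
    db.transaction(function (tx) {
        tx.executeSql('SELECT ver, body FROM files WHERE name = ?', [entry.name],
            function (tx, res) {
                if (res.rows.length && res.rows.item(0).ver >= entry.ver) {
                    injectJS(res.rows.item(0).body);    // local copy is fresh
                } else {
                    fetchAndStore(entry);               // stale or missing: hit the net
                }
            });
    });
}

function fetchAndStore (entry) {
    var xhr = new XMLHttpRequest();
    xhr.open('GET', entry.url, true);
    xhr.onreadystatechange = function () {
        if (xhr.readyState !== 4 || xhr.status !== 200) return;
        db.transaction(function (tx) {
            tx.executeSql('INSERT OR REPLACE INTO files (name, ver, body) VALUES (?,?,?)',
                          [entry.name, entry.ver, xhr.responseText]);
        });
        injectJS(xhr.responseText);
    };
    xhr.send(null);
}

function injectJS (src) {
    var s = document.createElement('script');
    s.text = src;
    document.getElementsByTagName('head')[0].appendChild(s);
}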

Why it Matters

When I woke up, I had expected that there would be standard web sdk’s for drawing all of the standard widgets that one normally needs to do... just about anything. Slider bars, dialogs, windows and all of that sort of thing -- a la Motif or something like that from back before I went to sleep. There was not. Or more correctly, there wasn’t anything even remotely dominant. That really surprised me, but it started to make sense: downloading a boatload of javascript to form that sdk is expensive: no one wants to download 1MB of js bloat for a 5k nugget of web goodness. So everybody rolls their own. Given the boot loader I outlined above, however, we can start thinking about the way that operating systems approach this problem with massive graphical/windowing libraries: you share them and demand page them into memory as needed. In the web world we probably aren’t going to see the moral equivalent of a demand paged VM anytime soon, but demand paging is a solution to a specific problem: how to efficiently use RAM. The browser equivalent is not wanting to transfer and execute huge amounts of javascript. We can approximate that by hand coding a manifest of the files needed by the page and letting the client loader deal with them if they’re not currently loaded. This gives the ability to contemplate rather large and standardized libraries in web pages served by a site. Woopie! For uncomplicated libraries like, oh say, jquery, a hand coded manifest is pretty straightforward. However, modules in real libraries have all kinds of dependencies, and they may not be obvious. Operating systems deal with these sorts of dependencies as one of their functions -- run ldd on a binary on Linux, for example. I think we’re going to need an equivalent automated dependency graph generator to form a “real” javascript library loader. Here our memory “pages” are javascript modules (really they’re more like the old idea of segmented VM). So somehow -- through some in-file declarations, or actually compiling and generating -- we want to suss the dependencies out and send them to the web page in an automated fashion. One of the implications here for engineers is Trees Good, Cyclic References Bad. That is, we really want the path to, say, the slide bar widget to be a linear path of modules from root to leaf rather than finding a cascade of dependencies that pull in half the massive library. I expect this is going to require some clever thought about how to factor (and refactor) modules. It may also necessitate module duplication to keep the tree property. This may sound like heresy, but it need not be: we’re talking about the tree structure as it appears to the client, not whether a module is strategically symlinked as needed. That said, this needs some real life validation. How will a really large library fare? What are the practical limits? How serious are cyclic references? Do we need a module GC to round up dead references in the boot loader client database (think of the effect of refactoring)? These libraries are shared, but only within the confines of a site. Should they be shared wider? Are there going to be site-wide collisions for large, heterogeneous sites? How would that be dealt with?
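
Until such a dependency generator exists, a hand coded manifest plus a trivial resolver gets the idea across. A sketch -- module names, versions and URLs are all made up -- whose output could feed the same loadComponent() stub sketched earlier:

var manifest = {
    'core.js':   { ver: 3, url: '/lib/core-3.js',   deps: [] },
    'widget.js': { ver: 5, url: '/lib/widget-5.js', deps: ['core.js'] },
    'slider.js': { ver: 2, url: '/lib/slider-2.js', deps: ['widget.js'] }
};

// Flatten the dependency tree into a linear load order, root first.
// Cyclic references simply get skipped here -- which is exactly why
// they're the thing to avoid when factoring the library.
function resolve (name, seen, out) {
    seen = seen || {};
    out = out || [];
    if (seen[name]) return out;
    seen[name] = true;
    var entry = manifest[name];
    for (var i = 0; i < entry.deps.length; i++) {
        resolve(entry.deps[i], seen, out);
    }
    out.push({ name: name, ver: entry.ver, url: entry.url });
    return out;
}

// resolve('slider.js') -> core.js, widget.js, slider.js, in load order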

What’s Left for the New Web World

The larger observation here is that the world is shifting from server-side layout generation to client-side layout and that is going to shake the web world up quite a bit. This paper has mostly focused on what the new world looks like at a low level, and the basic problem of booting a web app up that works well in a mobile environment. Much, much is left unknown here. For example:
  • What is the solution to templating in the new world?
  • The boot loader solution is a win with hybrid mobile web apps, but is it a viable approach generally?
  • Could you get better performance still if browsers allowed javascript to be byte compiled and stored locally instead of the javascript source?
  • Should this entire idea of boot loading become standardized in browsers, so that the browser itself handles this as a browser “OS” level primitive and can deal with multi-site sharing (ie, not have a jquery instance loaded into every site in the universe)?
  • There is a wealth of server side content authoring software that basically doesn’t exist for a client-side approach. What is actually needed? This is particularly acute with HTML 5 canvas vector graphics, but is really a general deficiency too.
  • What about SEO? I’ve read Google’s guidelines, but they are clearly written with 1998 in mind. You can fake some of this, but it’s still “fake” and they might notice that. Ultimately this is Google’s problem too.
I’m sure there’s plenty more but this missive isn’t intended to be exhaustive.

Conclusion

I’m sure that a lot of people will take exception to probably each and every thing I’ve written here: that’s cool, the world is a messy place and I’m sure there are a lot of very clueful people who get this and wonder why I think any of this is new, or those who think I’m just plain wrong. Nor is the point of writing this to chronicle general history: it’s to chronicle my own naivete and evolution. While I was asleep there was essentially a complete generation of web engineering methodology that I didn’t see. Unsurprisingly, the engineering tools that were just starting to come about speciated and matured in the 10 years I slept. Things like Rails, Cold Fusion, and probably lots of tools that I’m still completely clueless about rule the day. But some forward thinking people also decided to take javascript and CSS at their word that they were both real and relatively incompatibility free, and go for it. They were, undoubtedly, very brave. I on the other hand only witnessed their handiwork, which made it manifestly obvious that it was not only possible, but an easier way to think of things. Why bother with the awkward situation of server content generation alongside AJAX callbacks and their co-necessary manipulation of the DOM? Why not just move the content layout generation to the client altogether? So that’s what I did, and I haven’t suffered any lightning bolts from an angry Old Testament God for my audacity.