Wednesday, March 29, 2017

Recursively Recursing the Burstorm App

Where I've Been

It's been 3 years since I've written to this blog, but I haven't been asleep again. In fact, I've been working furiously on a web app for a startup called Burstorm. The Burstorm app implements a CAD-like graphical interface for designing IT infrastructure, both for traditional datacenter providers and, most especially, for the new-world cloud providers like AWS, Google, Azure, etc. The idea of the app is that you can specify in fairly abstract terms what your requirements are for some deployment (e.g., I want 2 Linux boxes with 4 cores, 16GB RAM, and 200GB SSD), and the app will provide you with insight into what your choices are for taking your model and actually spinning it up.

The app has lots and lots of goodies like collaborative editing, importing of cloud provider billing files, the ability to do traditional provider quoting, as well as an entire provider-centric side for creating and managing products both programmatically and through the UI.

This post is mainly about me eating my own dog food and telling the world that it was, in fact, tasty. So here's the story. About 3 months ago, I got word that we needed to move providers because a customer was insisting that we run in an ISO 27001 compliant datacenter, and our then-current provider wasn't compliant. When I stopped being annoyed, I decided to use this as an opportunity to do something with our app that I'd wanted to do for a long time: use our app to model, build and deploy itself on other cloud providers. Happily, for other reasons I had almost all of the pieces built in the app to do just that, and this provided the perfect excuse to do the last bits and dregs of code so that I could recursively recurse the Burstorm App.

IT Lifecycle

The Burstorm app models the life-cycle of an IT project. 
  • Models: when you want to just create a new design of some infrastructure and figure out who can provide it where and for how much, we call that modeling. Here, you lay out what your requirements and constraints are for your infrastructure, and you can then use the Burstorm Designer to get recommendations of providers that meet your requirements, what the cost is, and for many cloud providers even get price/performance specs on their various VMs through our performance benchmarking initiative.
  • Builds: when you see a provider's offering in the Designer that looks interesting, you can create a Build from it. A Build in the Burstorm lifecycle is much more specific: it includes pricing from a particular location for a particular provider. And for cloud providers that we support, it even includes information about how your Build can be spun up.
  • Contracts: you can also tell the app about infrastructure that you already have deployed with its pricing information, contract terms, etc. The app can give you help to figure out when it might be better to jump ship to another provider, as well as lots of other insights. Additionally, you can copy your current infrastructure into a Model, and start the whole process over again to redesign your current infrastructure with various tweaks you might want to make.

Our App Setup

Enough background, onward! The Burstorm app is a fairly standard web-like app using Ruby-on-Rails, Postgres and Linux on a bunch of different servers: some that are dedicated like our PG master, some that have dual roles like our postgres-slave/webhead, and some that are just webheads. We also use hosted Chef to provision all of our servers, using Chef roles to determine what pieces of config/packages need to go on what servers.

"Burstorm App" Model

So in order to achieve our goal of getting the Burstorm app to clone itself, we start by creating a model in the Burstorm app. This illustration shows a simplified view of our infrastructure.

Fig 1. Development and App model

In it, you can see a Design Scenario called "Main App" in the Model which contains the new app. You'll also notice that there's a separate Design Scenario called "Devbox" -- this models a development environment we spin up for our devs. 

By clicking on the Design Scenario "Main App", you can see that we've set its constraints to be within 5000 miles of Dallas, with month-to-month terms, and with flexibility about upfront non-recurring charges (NRC).
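A distance constraint like "within 5000 miles of Dallas" can be pictured as a great-circle filter over candidate locations. The following is only a minimal sketch of that idea, not how the Burstorm app actually evaluates constraints: the location names, the approximate coordinates, and the function names are all assumptions made for illustration.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def great_circle_miles(lat1, lon1, lat2, lon2):
    """Haversine distance between two (latitude, longitude) points, in miles."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


# Approximate coordinates; the candidate locations are hypothetical examples.
DALLAS = (32.78, -96.80)
LOCATIONS = {
    "ashburn": (39.04, -77.49),
    "london": (51.51, -0.13),
    "singapore": (1.35, 103.82),
}


def within_radius(center, locations, max_miles):
    """Return the names of locations no farther than max_miles from center."""
    return sorted(
        name
        for name, (lat, lon) in locations.items()
        if great_circle_miles(center[0], center[1], lat, lon) <= max_miles
    )


nearby = within_radius(DALLAS, LOCATIONS, 5000)
```

With these sample coordinates, Ashburn and London fall inside the 5000-mile radius and Singapore does not.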




We can now inspect one of our servers. You can see for pgs (our Postgres Slave) that it is set up to be a 2-core, 4GB ram, 200GB storage VM.



Note also that you can see a bunch of tags that are associated with this Objective. In fact, those are the Chef roles that will be used to provision this server should we decide to spin it up!


Here you can see a complete list of provisioning tags which correspond to all of the Chef roles we use for all of our app infrastructure. 

"Main App" Designer

The Burstorm Designer allows you to drill down on a particular design scenario and get all kinds of interesting information, including costs for various providers and how/where they'd deliver it, price/performance information, various charts that show you provider information, histograms, etc. For the Burstorm App cloning the Burstorm App, this is the meat: we're going to use the Designer to figure out which providers we are interested in.



In this view, you can see that AWS and Google look pretty favorable. If we drill down on the details, we can see for both AWS and Google what VM instances they would use to deliver our servers, and how they differ from what we asked for: Google's instances, for example, provide quite a bit more RAM than we asked for, so it's not just the base cost that needs to be considered.
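To make the "how they differ from what we asked for" point concrete, here is a small sketch of picking the smallest instance shape that satisfies a requirement and reporting the surplus. It is an illustration only: the catalog entries, the `Shape` type, and the selection rule (fewest cores, then least RAM) are assumptions, not the Designer's actual matching logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shape:
    name: str
    cores: int
    ram_gb: float


def smallest_fit(required, catalog):
    """Return the smallest catalog shape meeting the requirement, or None."""
    candidates = [
        s for s in catalog
        if s.cores >= required.cores and s.ram_gb >= required.ram_gb
    ]
    if not candidates:
        return None
    # Assumed ordering: prefer fewer cores, then less RAM.
    return min(candidates, key=lambda s: (s.cores, s.ram_gb))


def surplus(required, chosen):
    """How much more the chosen shape provides than was asked for."""
    return {
        "cores": chosen.cores - required.cores,
        "ram_gb": chosen.ram_gb - required.ram_gb,
    }


# Hypothetical provider catalog; real instance names and sizes will differ.
CATALOG = [
    Shape("small", 1, 2.0),
    Shape("medium", 2, 7.5),
    Shape("large", 4, 15.0),
]

wanted = Shape("pgs", 2, 4.0)  # the 2-core, 4GB server from the model
chosen = smallest_fit(wanted, CATALOG)
extra = surplus(wanted, chosen)
```

In this made-up catalog the 2-core, 4GB request lands on a shape with 3.5GB more RAM than requested, which is the kind of difference worth weighing alongside the base price.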



Note also that we're comparing these by Price/Performance. That is the BCU column for each provider. What's cool is that we can change that to sort things by Price alone, CPU performance, IO performance, etc., so that we can really get an idea of how these VMs compare on other axes.
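Switching the sort axis can be sketched as ranking the same offers under different keys. The numbers below are invented, and the price/performance key (CPU score per dollar) is an assumption for illustration; this post does not spell out how the BCU figure is calculated.

```python
# Hypothetical offers: monthly price in USD and made-up benchmark scores.
OFFERS = [
    {"provider": "A", "price": 40.0, "cpu": 100.0, "io": 80.0},
    {"provider": "B", "price": 30.0, "cpu": 60.0, "io": 90.0},
    {"provider": "C", "price": 50.0, "cpu": 150.0, "io": 70.0},
]


def rank(offers, metric):
    """Return provider names best-first for the chosen comparison axis."""
    if metric == "price":
        key, reverse = (lambda o: o["price"]), False  # cheapest first
    elif metric in ("cpu", "io"):
        key, reverse = (lambda o: o[metric]), True  # highest score first
    elif metric == "price_performance":
        # Assumed definition: CPU score per dollar, higher is better.
        key, reverse = (lambda o: o["cpu"] / o["price"]), True
    else:
        raise ValueError(f"unknown metric: {metric}")
    return [o["provider"] for o in sorted(offers, key=key, reverse=reverse)]


by_price = rank(OFFERS, "price")
by_value = rank(OFFERS, "price_performance")
```

The cheapest offer is not necessarily the best value: here provider B wins on price while provider C wins on score per dollar.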

Now that we've figured it all out, we can move to the next step by copying these solutions to a Build.



Building Our App

We've now created a set of Build Specs which we can use to communicate with the various providers. In this example, we've chosen Amazon and Google, but when I was working on this I actually spun up clones of our app on Azure, Linode, and Digital Ocean too.


We do need a little setup here to tell the app the credentials to use, how to log into our VM once it's spun up, what provisioning system to use (if any), etc. Here we're spinning up an AWS t2.medium instance, using Chef to bootstrap the box and the 'benchmark' SSH key to log in.

Spin-Up

So now we're ready to spin up the Burstorm app from within the Burstorm app. Each server objective contains a Run Spec which tells the app all of the relevant parameters, like what the instance name is, the provider's location spec, ssh keys to use, and whether to run Chef after the instance is created. Here we will spin up our AWS Build Spec, and clone the entire setup.


What the app actually does now is hand off a JSON blob describing the Build and Run specs to a remote web server, which would be running in your infrastructure. This implementation doesn't require us to have access to your AWS keys, since they'd stay on your web server. Other arrangements could be envisioned to make this more seamless, though. Either way is viable, and we're still thinking about what the best way to support this would be (if not both).
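As a rough picture of that hand-off, the sketch below serializes a Build spec and a Run spec into one JSON document. Every field name here is hypothetical (the real blob's schema isn't shown in this post), as is the location value; the point is only that the payload names an SSH key and carries no provider credentials, which stay with the receiving server.

```python
import json

# Field names and values are illustrative assumptions, not the app's schema.
build_spec = {
    "provider": "aws",
    "instance_type": "t2.medium",
    "location": "us-east-1",  # hypothetical location spec
}

run_spec = {
    "instance_name": "pgs",
    "ssh_key": "benchmark",  # key name only; no secret material
    "provisioner": "chef",
    "run_chef": True,
}


def build_payload(build, run):
    """Serialize the two specs into a single JSON blob for the spinner."""
    return json.dumps({"build": build, "run": run}, sort_keys=True)


payload = build_payload(build_spec, run_spec)
decoded = json.loads(payload)
```

The receiving server would decode the blob and combine it with credentials it already holds locally.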

Our general strategy has been to define an API endpoint for the Burstorm App that allows programmatic access to our models, builds, product sets, etc. Our Perfrun repo takes that approach, and is the basis of the code to do the actual instance-creation/provisioning. Looking at the repo, you will see that it's just using our API like any other API consumer. In this case, it's creating VM instances, and optionally running our performance benchmark code on them.


If you squint real hard, you can see that we are doing a normal Chef bootstrap. The key is that the Spinner-upper takes the Chef roles we set up with the Burstorm tags above and uses them to provision the servers with each specific environment. This all happens in parallel, by the way, so it only takes 5 minutes or so until it's complete.
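The tag-to-role step and the parallel fan-out can be sketched roughly as follows. This is not the Spinner-upper's code: the server and role names are made up, and `provision` is a stand-in that only computes a Chef run list (in the standard `role[name]` form) where a real version would create the instance and bootstrap it.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical servers and provisioning tags; each tag names a Chef role.
SERVERS = {
    "pgm": ["base", "postgres-master"],
    "pgs": ["base", "postgres-slave", "webhead"],
    "web1": ["base", "webhead"],
    "web2": ["base", "webhead"],
}


def run_list(tags):
    """Turn provisioning tags into a comma-separated Chef run list."""
    return ",".join(f"role[{tag}]" for tag in tags)


def provision(name, tags):
    """Stand-in for the real work: create the VM, then bootstrap with Chef."""
    return name, run_list(tags)


def provision_all(servers):
    """Provision every server concurrently and collect the results."""
    with ThreadPoolExecutor(max_workers=max(1, len(servers))) as pool:
        futures = [pool.submit(provision, n, t) for n, t in servers.items()]
        return dict(f.result() for f in futures)


results = provision_all(SERVERS)
```

Threads suit this kind of job because each server's provisioning mostly waits on the network, so total time is close to that of the slowest server rather than the sum of all of them.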

Todo

There's *tons* more to do. We have the ability to import AWS and other providers' billing files, but it would be interesting to correlate them with what we spun up. It might be fun to map our utilization against the actual workload too. There are all kinds of places this can go, but this is, after all, just a prototype which is intended to spark those kinds of discussions.

C'est fini

Once this is done, we have 4 servers up and running. In the subclassed version I use of the Spinner-upper code, I actually have it communicate with our DNS provider to set up the A and AAAA records, and thus can directly access each server by name. In reality, there are a few more bits and pieces I had to do -- like loading the database -- but almost everything was handled by our Chef.
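For the DNS step, the only real decision is which record type each address gets: IPv4 addresses go in A records and IPv6 addresses in AAAA records. The sketch below shows just that classification; the hostname and addresses are documentation-range placeholders, and the call to an actual DNS provider's API is deliberately left out since it depends on the provider.

```python
import ipaddress


def dns_records(hostname, addresses):
    """Return (hostname, record_type, address) tuples for each address."""
    records = []
    for addr in addresses:
        ip = ipaddress.ip_address(addr)  # raises ValueError on bad input
        record_type = "A" if ip.version == 4 else "AAAA"
        records.append((hostname, record_type, str(ip)))
    return records


# Placeholder hostname and documentation-range addresses.
records = dns_records("pgs.example.com", ["192.0.2.10", "2001:db8::10"])
```

A subclass of the spinner could build this list once the instance reports its addresses and then submit each tuple to whatever DNS API is in use.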

So there you have it: I used our Burstorm app to clone itself. The actual process of doing this start to finish probably took an hour or so, if that -- way less time than it took to write this over-long post!