
20 years on AWS and never not my job



I created my first AWS account at 10:31 PM on April 10th, 2006. I had seen the announcement of Amazon S3 and had been thinking vaguely about the problem of secure backups — even though I didn't start Tarsnap until several months later — and the idea of an online storage service appealed to me. The fact that it was a web service made it even more appealing; I had been building web services since 1998, when I decided that coordinating a world-record-setting computation of Pi over HTTP would be easier than doing it over email.

While I created my AWS account because I was interested in Amazon S3, that was not in fact immediately available to me: In the early days of AWS, you had to specifically ask for each new service to be enabled for your account. My new AWS account did come with two services enabled by default, though — Amazon Simple Queue Service, which most people know as "the first AWS service", and Amazon E-Commerce Service, an API which allowed Amazon affiliates to access Amazon.com's product catalogue — which was the real first AWS service, but which most people have never heard of and which has been quietly scrubbed from AWS history.

It didn't take long before I started complaining about things. By this point I was the FreeBSD Security Officer, so my first interest in anything cloud-related was security. AWS requests are signed with API keys providing both authentication and integrity protection — confirming not only that the user was authorized, but also that the request hadn't been tampered with. There is, however, no corresponding signature on AWS responses — and at this time it was still very common to make AWS requests over HTTP rather than HTTPS, so the possibility of response tampering was very real. I don't recall if anyone from Amazon showed any interest when I posted about this on the (long-disappeared) AWS Developer Forums, but I still think it would be a good thing to have: With requests going over TLS it is obviously less critical now, but end-to-end signing is always going to be better than transport-layer security.
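To illustrate the asymmetry being complained about: request signing is typically an HMAC over the request computed with a shared secret, and the proposal is simply that responses get the same treatment. This is a minimal sketch of that idea using HMAC-SHA256 — not AWS's actual signing algorithm (modern AWS uses Signature Version 4, which canonicalizes requests in a specific way), and the key and messages here are hypothetical:

```python
import hashlib
import hmac

def sign(secret_key: bytes, message: bytes) -> str:
    """Compute an HMAC-SHA256 signature over a message."""
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()

def verify(secret_key: bytes, message: bytes, signature: str) -> bool:
    """Constant-time check that a signature matches the message."""
    return hmac.compare_digest(sign(secret_key, message), signature)

# Hypothetical shared secret and request, for illustration only.
key = b"example-secret-key"
request = b"GET /bucket/object"
req_sig = sign(key, request)           # the client signs its request...

# ...and under the proposed scheme the service would sign its
# response the same way, so the client can verify it end-to-end
# regardless of what happens at the transport layer.
response = b"200 OK: object-data"
resp_sig = sign(key, response)
assert verify(key, response, resp_sig)        # untampered: passes
assert not verify(key, b"evil", resp_sig)     # tampered: fails
```

The point of the sketch is that the verification step is symmetric and cheap; what TLS cannot give you is a signature the client can check against the same credential it already holds for signing requests.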

Of course, as soon as Amazon EC2 launched I had a new target: I wanted to run FreeBSD on it! I reached out to Jeff Barr via his blog and he put me in touch with people inside Amazon, and in early 2007 I had my first Amazon NDA. (Funny story, in 2007 Amazon was still using fax machines — but I didn't have a fax machine, so my first briefing was delayed while I snail-mailed a wet-ink signature down to Seattle.) Among the features I was briefed on was "Custom Kernels"; much like how AWS Lambda works today, Amazon EC2 launched without any "bring your own kernel" support. Obviously, to bring FreeBSD support to EC2 I was going to need to use this functionality, and it launched in November 2007 when Amazon EC2 gained the ability to run Red Hat; soon after that announcement went out, my FreeBSD account was allowlisted for the internal "publish Amazon Kernel Images" API.

But I didn't wait for this functionality to be offered before providing more feedback about Amazon EC2. In March 2007 I expressed concerns to an Amazonian about the security of Xen — it was at the time still quite a new system and Amazon was the first to be deploying it in truly hostile environments — and encouraged them to hire someone to do a thorough security audit of the code. When the Amazonian I was speaking to admitted that they didn't know who to engage for this, I thought about the people I had worked with in my time as FreeBSD Security Officer and recommended Tavis Ormandy to them. Later that year, Tavis was credited with reporting two vulnerabilities in Xen (CVE-2007-1320 and CVE-2007-1321); whether there is any connection between those events, I do not know.

I also mentioned — in fact in one of Jeff Barr's AWS user meetups in Second Life — that I wanted a way for an EC2 instance to be launched with a read-only root disk and a guaranteed state wipe of all memory on reboot, in order to allow an instance to be "reset" into a known-good state; my intended use case for this was building FreeBSD packages, which inherently involves running untrusted (or at least not-very-trusted) code. The initial response from Amazonians was a bit confused (why not just mount the filesystem read-only?), but when I explained that my concern was about defending against attackers who had local kernel exploits, they understood the use case. I was very excited when EC2 Instance Attestation launched 18 years later.

I ended 2007 with a blog post which I was told was quite widely read within Amazon: Amazon, Web Services, and Sesame Street. In that post, I complained about the problem of Eventual Consistency and argued for a marginally stronger model: Eventually Known Consistency, which still takes the "A" route out of the CAP theorem, but exposes enough internal state that users can also get "C" in the happy path. Amazon S3 eventually flipped from being optimized for Availability to being optimized for Consistency (while still having extremely high Availability), and of course DynamoDB is famous for giving users the choice between Eventual or Strongly consistent reads; but I still think the model of Eventually Known Consistency is the better theoretical model even if it is harder for users to reason about.
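The distinction argued for in that post can be made concrete: an eventually consistent store always answers reads (the "A" route out of CAP) but tells you nothing about staleness, whereas an *eventually known* consistent store additionally exposes enough internal replication state for the client to learn whether the value it got is current. This is a toy single-key sketch of that idea — my own illustration, not anything from the original post or any AWS API — where asynchronous replication is simulated with a time delay:

```python
import time

class EventuallyKnownStore:
    """Toy single-key store. Reads always return (availability), but the
    store also reports whether the returned value is known to be fully
    propagated -- the 'eventually known' part: internal replication
    state is exposed to the client instead of hidden."""

    def __init__(self, propagation_delay: float = 0.1):
        self.committed = None      # value visible to reads
        self.pending = None        # latest write, not yet propagated
        self.pending_at = 0.0
        self.delay = propagation_delay

    def write(self, value) -> None:
        self.pending = value
        self.pending_at = time.monotonic()

    def _propagate(self) -> None:
        # Simulate asynchronous replication completing after a delay.
        if self.pending is not None and \
                time.monotonic() - self.pending_at >= self.delay:
            self.committed = self.pending
            self.pending = None

    def read(self):
        """Return (value, known_consistent). The value is always served;
        the flag tells the caller whether it is guaranteed current, so
        the happy path gets 'C' without giving up 'A'."""
        self._propagate()
        return self.committed, self.pending is None

store = EventuallyKnownStore(propagation_delay=0.05)
store.write("v1")
_, known = store.read()     # immediately after the write: still propagating
print(known)                # False
time.sleep(0.06)
value, known = store.read() # after propagation: value is known current
print(value, known)         # v1 True
```

A plain eventually consistent read is the same call with the flag thrown away; the extra bit is what lets a careful client wait for (or retry until) a known-consistent answer, at the cost of a model that is admittedly harder to reason about.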

In early 2008, Kip Macy got FreeBSD working on Xen with PAE — while FreeBSD was one of the first operating systems to run on Xen, it didn't support PAE and I was at the time not competent to write such low-level kernel code, so despite being the driving force behind FreeBSD/EC2 efforts I had to rely on more experienced developers to write the kernel code at the time. I was perfectly comfortable with userland code though — so when Amazon sent me internal "AMI tools" code (necessary for using non-public APIs), I spent a couple weeks porting it to run on FreeBSD. Protip: While I'm generally a tools-not-policy guy, if you find yourself writing Ruby scripts which construct and run bash scripts, you might want to reconsider your choice of languages.

Unfortunately even once I got FreeBSD packaged up into an AKI (Amazon Kernel Image) and AMI (Amazon Machine Image) it wouldn't boot in EC2; after exchanging dozens of emails with Cape Town, we determined that this was due to EC2 using Xen 3.0, which had a bug preventing it from supporting recursive page tables — a cute optimization that FreeBSD's VM code used. The problem was fixed in Xen 3.1, but Xen didn't have stable ABIs at that point, so upgrading EC2 to run on Xen 3.1 would have broken existing AMIs; while it was unfortunate for FreeBSD, Amazon made the obvious choice here by sticking with Xen 3.0 in order to support existing customers.
