Back in November last year, I started a new job at Intercom, and one of the first projects I got to work on was improving the Intercom monolith CI with some of my new colleagues.
Interestingly, I never got around to talking about CI on this blog, even though I consider it to be one of my main areas of expertise. That topic goes way beyond what I'd like to talk about here, but just to give a bit of context, a key driver of CI performance and user experience is how fast you can get a Ruby process ready to run tests.
When working with very large test suites, it becomes essential to run tests in parallel. If you have a test suite that runs in, say, 1 hour, on paper you can run it in 15 minutes on 4 workers, in 6 minutes on 10 workers, and in 1 minute on 60 workers.
But that's a bit too simplistic: in practice, a CI test runner has two phases.
First, a setup phase that all runners have to go through, which includes fetching the source code, getting backing services like the database ready, and booting the application. Once the setup phase is done, the workers can start doing the actually useful work of running tests.
So the same 1-hour test suite, now with a 1-minute setup phase, takes 16 minutes on 4 workers but 2 minutes on 60 parallel workers. At 60 workers that's twice the ideal duration, which is a much worse user experience, and it also means half of your compute isn't spent doing the actual work, likely increasing your costs.
All this to say that parallelizing test suites has diminishing returns that are entirely tied to how costly setting up a worker is. The worker setup time is like a fixed-cost toll, so reducing it both improves user experience and reduces cost.
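To make the arithmetic above concrete, here is a minimal Python model of it. It assumes tests split perfectly evenly across workers and that the setup phase is the only fixed cost; real runners also lose time to uneven test distribution. The function names are hypothetical, for illustration only.

```python
def wall_clock_minutes(suite_minutes: float, setup_minutes: float, workers: int) -> float:
    # Every worker pays the setup cost once, then the suite is split evenly
    # across workers (an idealized assumption).
    return setup_minutes + suite_minutes / workers


def setup_share_of_compute(suite_minutes: float, setup_minutes: float, workers: int) -> float:
    # Fraction of total compute (summed over all workers) spent on setup
    # rather than on running tests.
    total_setup = setup_minutes * workers
    return total_setup / (total_setup + suite_minutes)


if __name__ == "__main__":
    # The 1-hour suite with a 1-minute setup: the setup term stays fixed while
    # the test term shrinks, hence the diminishing returns.
    for n in (4, 10, 60):
        print(n, wall_clock_minutes(60, 1, n), setup_share_of_compute(60, 1, n))
```

With no setup (`setup_minutes=0`) the model reproduces the "on paper" numbers; with a 1-minute setup, 60 workers take 2 minutes and spend half of the total compute on setup.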
Given that the Intercom monolith CI runs with 1350 parallel workers by default, one second optimized out of the setup time has 1350 times more impact than a second optimized out of a particular test, and saves over 20 minutes of compute per build.
Hence, while the team also worked on speeding up various slow tests and factories, I personally was very focused on reducing the setup time, shaving off every second, or even fraction of a second, I could find.
As part of this effort, I looked into speeding up the application boot time, and if you’re a Rubyist, you probably know about Bootsnap.