Teaching a new way to prevent outages at Google

Published on: 2025-06-08 02:36:16

By Garrett Holthaus, Technical Writer

From a young age, I enjoyed the detective work of diagnosing and fixing a broken system (electronics, in my case). There was something fulfilling about taking a silent radio and getting it playing again, sometimes with only a few dollars' worth of replacement parts.

So, it wasn't a stretch to shift from post-failure analysis to pre-failure analysis in my first job out of college, as a microprocessor validation engineer. I ran tests on a simulator to find hardware bugs before the chip went into production and the cost to fix problems increased exponentially. I'll always remember the senior engineer who told me to put on my "evil" hat and try to break the chip by throwing the unexpected at it.

But how do you come up with the unexpected? Better still, how do you know where to even start looking for possible issues?

Now I'm at Google, where the system complexity is even greater, and Site Reliability Engin ...