In order to succeed at production ownership, a team needs a roadmap for developing the necessary skills to run production systems. We don’t just need production ownership; we also need production excellence. Production excellence is a teachable set of skills that teams can use to adapt to changing circumstances with confidence. It requires changes to people, culture, and process rather than only tooling.
Even a perfect set of SLOs and instrumentation for observability do not necessarily result in a sustainable system. People are required to debug and run systems. Nobody is born knowing how to debug, so every engineer must learn that at some point. As systems and techniques evolve, everyone needs to continually update with new insights.
Standardizing technology is a powerful way to create leverage: improve tooling a bit and every engineer will get more productive. Adopting superior technology is, in the long run, an even more powerful force, with successes compounding over time. The tradeoffs and timing between standardizing on what works, exploring for superior technology, and supporting adoption of superior technology are at the core of engineering strategy.
An effective approach is to prioritize standardization, while explicitly pursuing a bounded number of explorations that are pre-validated to offer a minimum of an order of magnitude improvement over the current standard.
Ideas are funny things. It can take hours or days or months of noodling on a concept before you’re even able to start putting your thoughts into a shape that others will understand. And by then, you’ve explored the contours of the problem space enough that the end result of your noodling doesn’t seem interesting anymore: it seems obvious.
But as you get into more senior-type engineering roles, your most valuable contributions start to take the form not of concrete labor, but of conceptual labor. You’re able to draw on a rich mental library of abstractions, synthesizing and analyzing concepts in a way that only someone with your experience can do.
At The Economist, we take data visualisation seriously. Every week we publish around 40 charts across print, the website and our apps. With every single one, we try our best to visualise the numbers accurately and in a way that best supports the story. But sometimes we get it wrong. We can do better in future if we learn from our mistakes — and other people may be able to learn from them, too.
Once you’ve learned enough that there’s a certain distance between the current version of your product and the best version of that product you can imagine, then the right approach is not to replace your software with a new version, but to build something new next to it — without throwing away what you have.
This book surveys data storage and distributed systems and is a fantastic primer for all software developers.
It starts with naive approaches to storing data, quickly builds up to how transactions work, and works up to the complexities of building distributed systems.
I particularly enjoyed the chapter on stream processing and event sourcing. It contrasts stream processing to batch processing and highlights many of the challenges of these approaches and explores options for addressing them.
We should identify the error kernel. The error kernel of a system is that part which must be correct. That’s what the error kernel is. All the other code can be incorrect, it doesn’t matter. The error kernel is the part of the system that must be correct. If it’s incorrect, then all bets are off. The error kernel must be correct.
The rule of three gives a quick and dirty way to estimate these kinds of probabilities. It says that if you’ve tested N cases and haven’t found what you’re looking for, a reasonable estimate is that the probability is less than 3/N. So in our proofreading example, if you haven’t found any typos in 20 pages, you could estimate that the probability of a page having a typo is less than 15%.
I recently moved the hosting of my various blogs and websites off my own server to Netlify.
I was originally going to set up an S3 bucket and Cloudfront distribution for each of my sites but Netlify provides me the CDN and hosting features I need all bundled up already. You can upload files directly for serving or hook your site up to run a static site generator when you push to a branch of a Github repository.
In short, I’m not longer paying hosting costs and they handle all of the SSL certificate renewal from Let’s Encrypt for me.
Next up I plan to clean up the tooling I use for some of my sites and tweak things on here so I have more variety in my posts.