What do unicorns, flying pigs, and bug-free software have in common? It's simple: they're all mythic fantasies.
What's amazing is that we all know this, and yet almost no one takes it into account when evaluating or building systems. What happens when a bug is deployed to production? Can your system recover, and if so, how long will recovery take?
At the beginning of this talk, a seemingly straightforward problem will be presented. However, as will quickly become apparent, the solution to this problem is not so easy, at least not when you shackle yourself to traditional techniques for architecting systems. Those traditional techniques, most readily exemplified by relational databases, lead to fundamental, unavoidable complexities. These complexities make bugs much more likely and also make systems incredibly brittle in the face of human error. And contrary to popular belief, the NoSQL movement has largely been more of the same and has not addressed any of the real complexities.
To make software human fault-tolerant, you have to take a simpler approach: one that both makes bugs less likely and creates a clear path to recovery when human error does occur. You'll see that the primary cause of complexity is trying to make one system do too many things. The only solution is to introduce fundamental simplicity into the architecture, that is, to separate functions that were previously inextricably intertwined. By making these functions independent, you avoid the inevitable conflicts that make solutions difficult or impossible to build. And you will often find that the solution with more moving parts, the one that looks more complicated, is actually simpler, more robust, more performant, easier to maintain, and more bug-free*.
*But never completely bug-free.