# Jay Kreps - Getting Real About Distributed System Reliability (Highlights)

![rw-book-cover|256](https://readwise-assets.s3.amazonaws.com/static/images/article1.be68295a7e40.png)

## Metadata
**Review**:: [readwise.io](https://readwise.io/bookreview/42853846)
**Source**:: #from/readwise #from/reader
**Zettel**:: #zettel/fleeting
**Status**:: #x
**Authors**:: [[Jay Kreps]]
**Full Title**:: Getting Real About Distributed System Reliability
**Category**:: #articles #readwise/articles
**Category Icon**:: 📰
**URL**:: [blog.empathybox.com](https://blog.empathybox.com/post/19574936361/getting-real-about-distributed-system-reliability)
**Host**:: [[blog.empathybox.com]]
**Highlighted**:: [[2024-08-03]]
**Created**:: [[2024-08-03]]

## Highlights
- the problem is the assumption that failures are [independent](https://href.li/?http://en.wikipedia.org/wiki/Independence_%28probability_theory%29) ([View Highlight](https://read.readwise.io/read/01j4aw4pf3syyehc6qqt941x4b)) ^753645723
- $P^N$ is an upper bound on reliability but one that you could never, never approach in practice. ([View Highlight](https://read.readwise.io/read/01j4aw5mz6t73pw158bb36pqry)) ^753645797
- The actual reliability of your system depends largely on how bug free it is, how good you are at monitoring it, and how well you have protected against the myriad issues and problems it has. This isn’t any different from traditional systems, except that the new software is far less mature. ([View Highlight](https://read.readwise.io/read/01j4aw6a321qwnj68d5wngzmyw)) ^753645811
- These kind of “semi-failures” are common and very hard to deal with. Correctly testing these kinds of issues in a realistic setting is brutally hard and the newer generation of software doesn’t have anything like the QA processes its more mature predecessors had. ([View Highlight](https://read.readwise.io/read/01j4aw8wwkb5qj37fb6kjf5d1h)) ^753646072
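
The first two highlights can be made concrete with a little arithmetic. The sketch below is a rough illustration, not something taken from the article: it assumes P is the probability that one replica fails and N is the number of replicas, and it compares the independent-failure figure with a toy common-cause model. The function names, the parameter `c`, and the sample values are all hypothetical.

```python
def loss_probability_independent(p: float, n: int) -> float:
    """Probability that all n replicas fail if failures are independent."""
    return p ** n


def loss_probability_common_cause(p: float, n: int, c: float) -> float:
    """Probability that all n replicas fail under a toy common-cause model.

    Assumption (not from the article): with probability c a shared event
    (for example a bad deploy, a config error, or a software bug) takes
    down every replica at once; otherwise each replica fails independently
    with probability p.
    """
    return c + (1.0 - c) * p ** n


if __name__ == "__main__":
    p = 0.01   # hypothetical per-replica failure probability
    c = 0.001  # hypothetical probability of a shared, correlated failure
    for n in (1, 2, 3, 5):
        independent = loss_probability_independent(p, n)
        correlated = loss_probability_common_cause(p, n, c)
        print(f"N={n}: independent={independent:.3e}, common-cause={correlated:.3e}")
```

Under these assumptions the common-cause figure never drops below `c`, however many replicas are added, which is one way to read the remark that the independent-failure number is a bound you never approach in practice.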