- Ideas that have stood the test of time
- Ideas that were not feasible before but are now possible thanks to hardware improvements.
So, what are your recommendations for books and papers on these topics?
## Blogs:
- http://muratbuffalo.blogspot.com/
- https://bartoszsypytkowski.com/
- https://decentralizedthoughts.github.io/
- https://www.the-paper-trail.org/
- https://pathelland.substack.com/
## Other web resources
- https://aws.amazon.com/builders-library/ - set of resources from Amazon about building distributed systems
- https://www.youtube.com/playlist?list=PLeKd45zvjcDFUEv_ohr_H... - lecture series from Cambridge
## Books
- https://www.cl.cam.ac.uk/teaching/1213/PrincComm/mfcn.pdf - A great book on the maths of networking (probability, queuing theory, etc.)
My former manager recommended it to me when I first started working in distributed systems and I found that it unlocked a huge variety of topics despite its simplicity. (Thanks Steve!)
https://www.microsoft.com/en-us/research/publication/time-cl...
It's a great book that covers pretty much all of the commonly used strategies for scaling data-intensive applications. It doesn't go incredibly deep on any of them, but it gives you a great overview of the entire space. For each component, there are usually references to places where you can read and study more.
Making reliable distributed systems in the presence of software errors
Leslie Lamport's paper, "Time, Clocks, and the Ordering of Events in a Distributed System", is still considered a serious read after 40 years.
https://www.youtube.com/playlist?list=PLNPUF5QyWU8O0Wd8QDh9K...
- Concise
- Approachable
- Entertaining
- Insightful
- Timeless
http://homepage.divms.uiowa.edu/~ghosh/ssDijkstra.pdf
Enjoy :)
Communicating Sequential Processes (CSP) by Tony Hoare[0] has had a strong influence on Go and Clojure. He also published or contributed to other interesting and influential books and papers.
Making reliable distributed systems in the presence of software errors by Joe Armstrong[1] (Erlang, BEAM). An implementation of the actor model and functional programming to optimize for reliability.
Conflict-free Replicated Data Types (CRDTs) by Marc Shapiro, Nuno Preguiça, Carlos Baquero, and Marek Zawirski[2]. They enable strong eventual consistency, which is typically useful in (and implemented by) databases, p2p (chat) applications and other distributed systems.
[0] https://www.cs.cmu.edu/~crary/819-f09/Hoare78.pdf
[1] https://www.cs.otago.ac.nz/coursework/cosc461/armstrong_thes...
[2] https://hal.inria.fr/hal-00932836/file/CRDTs_SSS-2011.pdf
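To make "strong eventual consistency" a bit more concrete, here is a minimal sketch of a grow-only counter (G-Counter), one of the simplest CRDTs. The class name, method names and replica ids are mine and purely illustrative, not taken from the paper's pseudocode. The idea is that each replica only increments its own slot and merging takes the per-slot maximum, so merges can happen in any order, any number of times, and replicas still converge.

```python
class GCounter:
    """Grow-only counter CRDT (illustrative sketch, hypothetical names)."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}  # replica id -> increments observed from that replica

    def increment(self, n=1):
        if n < 0:
            raise ValueError("a grow-only counter cannot be decremented")
        # A replica only ever writes to its own slot.
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

    def value(self):
        # The counter's value is the sum over all replicas' slots.
        return sum(self.counts.values())

    def merge(self, other):
        # Per-slot max is commutative, associative and idempotent,
        # which is what lets replicas converge without coordination.
        for rid, count in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), count)


a, b = GCounter("a"), GCounter("b")
a.increment(2)
b.increment(3)
a.merge(b)
b.merge(a)
a.merge(b)  # merging the same state again changes nothing
print(a.value(), b.value())
```

Both replicas end up with the same value regardless of which one merged first.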
Introduction to Reliable and Secure Distributed Programming (https://www.amazon.de/-/en/Christian-Cachin/dp/3642152597).
I took a class with Luis Rodrigues (one of the authors), the book introduces the fundamentals of distributed systems. For example, you would build leader election from first principles.
This was the first article that really made it all click for me
Distributed systems fans of HN, why are you reading about distributed systems?
Erlang isn't theoretical. It's practical engineering. It works because message passing is what distributed systems have to do, and at scale, portions of a distributed system will become unavailable.
There are very specific problems that require more detailed engineering, like Lamport clocks and the Raft consensus protocol, but that is not the general case. The general case is "being good enough", as is the nature of engineering.
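For anyone who hasn't seen one, a Lamport clock is small enough to sketch in a few lines. This is a rough illustration with names I made up (`LamportClock`, `tick`, `send`, `receive`), not code from the paper: every local event bumps a counter, every message carries the sender's counter, and the receiver jumps to the larger of the two values plus one.

```python
class LamportClock:
    """Logical clock sketch (hypothetical names, single process view)."""

    def __init__(self):
        self.time = 0

    def tick(self):
        # Any local event advances the clock by one.
        self.time += 1
        return self.time

    def send(self):
        # Sending is itself an event; the returned timestamp travels with the message.
        return self.tick()

    def receive(self, msg_time):
        # Move past both our own history and the sender's.
        self.time = max(self.time, msg_time) + 1
        return self.time


p, q = LamportClock(), LamportClock()
p.tick()
stamp = p.send()
q.receive(stamp)
print(stamp, q.time)
```

The receive always ends up with a larger timestamp than the matching send, which is the "happened before" ordering the clock is meant to preserve.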
In Search of an Understandable Consensus Algorithm
https://raft.github.io/raft.pdf
I think Raft has stood the test of time so far. A very popular implementation of Raft is etcd, which is used as Kubernetes' backing store for all cluster data.[0]
[0] https://kubernetes.io/docs/concepts/overview/components/#etc...
Paul Baran's research on Distributed Communications that led to the Internet:
"Paul Baran and the Origins of the Internet" - https://www.rand.org/about/history/baran.html
"On Distributed Communications" - 1964 https://www.rand.org/pubs/research_memoranda/RM3767.html
imho the most useful book you can read.
[0] https://www.youtube.com/watch?v=cQP8WApzIQQ&list=PLrw6a1wE39...
"Distributed Operating Systems" and "Distributed Systems: Principles and Paradigms"
https://vadosware.io/post/paxosmon-gotta-concensus-them-all/
https://riak.com/category/technical/ - Riak blog
https://www.allthingsdistributed.com/files/amazon-dynamo-sos... - Dynamo paper that Riak was in part based on
https://static.googleusercontent.com/media/research.google.c...