Literature Insights

Posted on Jan 23, 2023
tl;dr: One-sentence summaries and takeaways from papers, essays, and talks by people much smarter than me.

Updated Jan 31, 2023.

This is an “evergreen” page that I’m backfilling from reading notes and will keep updating going forward. For each piece of literature, I distill the #1 thing I learned or took away into a single sentence (books are excluded, but feel free to check out my reading pipeline). It’s far from perfect: there are run-on sentences galore, squeezed to fit the arbitrary one-sentence restriction. Nonetheless, I hope this piques your interest and encourages you to read the source material. Italicized items are some of my favorites.

There are no affiliate links here… but there may be broken ones. Let me know!


Articles and Essays



Things (mostly papers) in my queue.

  • A comprehensive study of Convergent and Commutative Replicated Data Types (Shapiro 2011)
  • A few billion lines of code later: using static analysis to find bugs in the real world (Bessey 2010)
  • Availability in Globally Distributed Storage Systems (Ford 2010)
  • Basic Local Alignment Search Tool (Altschul 1990)
  • BeyondCorp: Design to Deployment at Google (Osborn 2016)
  • Big Ball of Mud (Foote 1999)
  • Bigtable: A Distributed Storage System for Structured Data (Chang 2006)
  • Borg, Omega, and Kubernetes (Burns 2016)
  • C-Store: A Column-oriented DBMS (Stonebraker 2005)
  • Canopy: An End-to-End Performance Tracing And Analysis System (Kaldor 2017)
  • CRDTs: Consistency without concurrency control (Letia 2009)
  • Dapper, a Large-Scale Distributed Systems Tracing Infrastructure (Sigelman 2010)
  • Design patterns for container-based distributed systems (Burns 2016)
  • Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks (Isard 2007)
  • Dynamo: Amazon’s Highly Available Key-value Store (DeCandia 2007)
  • Google-Wide Profiling: A Continuous Profiling Infrastructure for Data Centers (Ren 2010)
  • Hints for Computer System Design (Lampson 1983)
  • In Search of an Understandable Consensus Algorithm (Ongaro 2014)
  • Large-Scale Automated Refactoring Using ClangMR (Wright 2013)
  • Large-scale cluster management at Google with Borg (Verma 2015)
  • Life beyond Distributed Transactions: an Apostate’s Opinion (Helland 2007)
  • Logic and Lattices for Distributed Programming (Conway 2012)
  • MapReduce: Simplified Data Processing on Large Clusters (Dean 2008)
  • Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center (Hindman 2011)
  • Omega: flexible, scalable schedulers for large compute clusters (Schwarzkopf 2013)
  • On Designing and Deploying Internet-Scale Services (Hamilton 2007)
  • Online, Asynchronous Schema Change in F1 (Rae 2013)
  • Out of the Tar Pit (Moseley 2006)
  • Paxos Made Live - An Engineering Perspective (Chandra 2007)
  • Paxos Made Simple (Lamport 2001)
  • Profiling a warehouse-scale computer (Kanev 2015)
  • Rules of Machine Learning: Best Practices for ML Engineering (Zinkevich)
  • Searching for Build Debt: Experiences Managing Technical Debt at Google (Morgenthaler 2012)
  • Security Keys: Practical Cryptographic Second Factors for the Modern Web (Lang 2016)
  • Source Code Rejuvenation is not Refactoring (Pirkelbauer 2009)
  • Spanner: Google’s Globally-Distributed Database (Corbett 2012)
  • Still All on One Server: Perforce at Scale (Bloch 2011)
  • SWIM: Scalable Weakly-consistent Infection-style Process Group Membership (Das 2002)
  • The Chubby lock service for loosely-coupled distributed systems (Burrows 2006)
  • The Google File System (Ghemawat 2003)
  • The UNIX Time-Sharing System (Ritchie 1974)
  • Towards a Solution to the Red Wedding Problem (Meiklejohn 2018)
  • Unreliable Failure Detectors for Reliable Distributed Systems (Chandra 1996)
  • Wormhole: Reliable Pub-Sub to Support Geo-replicated Internet Services (Sharma 2015)
  • Zab: High-performance broadcast for primary-backup systems (Junqueira 2011)


Sources where I pick up things to consume.