About

These are my notes on Roberto Vitillo’s Understanding Distributed Systems. The book’s website can be found here.

Introduction

A distributed system is one in which the failure of a computer you didn’t even know existed can render your own computer unusable.

— Leslie Lamport

  • Motivations for building distributed systems include:
    • High availability: resilience to single-node failures
    • Large workloads that are too big to fit on a single node
    • Performance requirements (e.g. high resolution & low latency for video streaming)

Part I: Communication

  • We can derive the maximum theoretical bandwidth of a network link by dividing the size of the congestion window (WinSize) by the round-trip time (RTT):
\[ \text{Bandwidth} = \frac{\text{WinSize}}{\text{RTT}} \]
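As a quick sanity check of the formula, here is a minimal Python sketch. The function name and the example values (a 64 KiB window over a 100 ms round trip) are my own assumptions, not taken from the book.

```python
def max_bandwidth_bps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound on throughput in bits per second: window size / RTT."""
    if rtt_seconds <= 0:
        raise ValueError("rtt_seconds must be positive")
    # Convert bytes to bits, then divide by the round-trip time.
    return window_bytes * 8 / rtt_seconds


# Assumed example values: 64 KiB window, 100 ms round trip.
bps = max_bandwidth_bps(64 * 1024, 0.100)
print(f"{bps / 1e6:.2f} Mbit/s")  # prints "5.24 Mbit/s"
```

Halving the RTT or doubling the window doubles the bound. This is why latency caps throughput on long-distance links, however fast the link itself is.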
  • A secure communication link must make 3 guarantees:
    1. Encryption: asymmetric encryption and symmetric encryption (via TLS) are used to ensure that data can only be read by the communicating processes
    2. Authentication: the server and client should each verify that the other is who it claims to be, via certificates issued by certificate authorities (CAs)
    3. Integrity: TLS verifies the integrity of the data by calculating a message digest using a secure hash function
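To make the integrity guarantee concrete, the sketch below computes and checks a keyed message digest (HMAC-SHA256) with Python's standard library. It only illustrates the general idea and is not TLS's actual record protection. The `sign` and `verify` helpers, the key, and the messages are all made up for the example.

```python
import hashlib
import hmac


def sign(key: bytes, message: bytes) -> bytes:
    """Return a keyed SHA-256 digest (HMAC) of the message."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(key: bytes, message: bytes, tag: bytes) -> bool:
    """Recompute the digest and compare it in constant time."""
    return hmac.compare_digest(sign(key, message), tag)


# Hypothetical shared secret and payload.
key = b"shared-secret"
tag = sign(key, b"transfer 10 coins")

print(verify(key, b"transfer 10 coins", tag))    # True: message is intact
print(verify(key, b"transfer 1000 coins", tag))  # False: message was altered
```

Any change to the message produces a different digest, so the receiver can detect tampering or corruption in transit.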

Chapter 4: Discovery

  • The Domain Name System (DNS) is a distributed, hierarchical, and eventually consistent key-value store
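The "eventually consistent" part can be illustrated with a toy model. In the sketch below, a caching resolver serves answers from its cache until a time-to-live (TTL) expires, so an update at the source only becomes visible later. The `CachingResolver` class, the TTL value, and the addresses are hypothetical; the hierarchy of real DNS servers is not modelled.

```python
from dataclasses import dataclass


@dataclass
class Record:
    value: str
    expires_at: float


class CachingResolver:
    """Toy resolver: caches answers from an authoritative table for `ttl` seconds."""

    def __init__(self, authoritative: dict[str, str], ttl: float):
        self.authoritative = authoritative
        self.ttl = ttl
        self.cache: dict[str, Record] = {}

    def resolve(self, name: str, now: float) -> str:
        rec = self.cache.get(name)
        if rec is not None and now < rec.expires_at:
            return rec.value  # served from cache, possibly stale
        value = self.authoritative[name]  # cache miss or expired entry
        self.cache[name] = Record(value, now + self.ttl)
        return value


zone = {"example.com": "192.0.2.1"}
resolver = CachingResolver(zone, ttl=60)

first = resolver.resolve("example.com", now=0)   # cache miss, fetches 192.0.2.1
zone["example.com"] = "192.0.2.2"                # record changes at the source
stale = resolver.resolve("example.com", now=30)  # still the old cached value
fresh = resolver.resolve("example.com", now=61)  # TTL expired, sees the update
```

The clock is passed in explicitly so the behaviour is deterministic. A longer TTL means fewer lookups against the source, but clients see stale data for longer.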

Chapter 5: APIs

  • A textual format like JSON is self-describing and human-readable, at the expense of increased verbosity and parsing overhead
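The trade-off can be seen by encoding the same record twice: once as JSON and once in a fixed binary layout via Python's `struct` module. This is a rough sketch; the record fields and the binary layout are assumptions chosen for illustration.

```python
import json
import struct

# Hypothetical record to serialize.
reading = {"sensor_id": 42, "temperature": 21.5}

# JSON carries the field names with the data, so it is self-describing.
as_json = json.dumps(reading).encode("utf-8")

# Binary layout: little-endian unsigned 32-bit int followed by a 64-bit float.
# The field names and types live in the format string, not in the payload.
LAYOUT = "<Id"
as_binary = struct.pack(LAYOUT, reading["sensor_id"], reading["temperature"])

print(len(as_json), len(as_binary))

# Decoding the binary form requires knowing the layout in advance.
sensor_id, temperature = struct.unpack(LAYOUT, as_binary)
```

The binary payload is smaller and cheaper to parse, but it is unreadable without the layout, whereas the JSON payload can be inspected by eye.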