About

These are my notes on Roberto Vitillo’s Understanding Distributed Systems. The book’s website can be found here.

Introduction

A distributed system is one in which the failure of a computer you didn’t even know existed can render your own computer unusable.

— Leslie Lamport

Motivations for building distributed systems include:
- High availability: resilience to single-node failures
- Large workloads that are too big to fit on a single node
- Performance requirements (e.g. high resolution & low latency for video streaming)

We can derive the maximum theoretical bandwidth of a network link by dividing the size of the congestion window by the round trip time:

\[ \text{Bandwidth} = \frac{\text{WinSize}}{\text{RTT}} \]
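As a quick sanity check of the formula above, here is a minimal sketch in Python. The function name and the sample numbers (a 64 KiB window, a 100 ms round trip) are my own illustrative assumptions, not figures from the book.

```python
def max_bandwidth_bps(win_size_bytes: int, rtt_seconds: float) -> float:
    """Upper bound on throughput in bits per second: WinSize / RTT."""
    if rtt_seconds <= 0:
        raise ValueError("RTT must be positive")
    # Convert the window from bytes to bits, then divide by the round trip time.
    return win_size_bytes * 8 / rtt_seconds


# Illustrative numbers only: a 64 KiB window over a 100 ms round trip.
bps = max_bandwidth_bps(64 * 1024, 0.100)
print(f"{bps / 1e6:.2f} Mbit/s")  # prints 5.24 Mbit/s
```

Halving the round trip time doubles the bound, which is one reason to place servers close to clients.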

A secure communication link must make 3 guarantees:
1. Encryption: TLS uses asymmetric encryption to agree on a shared secret and symmetric encryption for the data itself, so that data can only be read by the communicating processes
2. Authentication: the server and client should each authenticate that the other is who they claim to be, via certificates issued by certificate authorities (CAs)
3. Integrity: TLS verifies the integrity of the data by calculating a message digest using a secure hash function
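To make the integrity guarantee concrete, here is a rough sketch using a keyed digest (HMAC with SHA-256) from Python's standard library. This is not how TLS is implemented: the exact mechanism depends on the TLS version and cipher suite. The key, the messages, and the `sign`/`verify` helper names are hypothetical.

```python
import hashlib
import hmac


def sign(key: bytes, message: bytes) -> bytes:
    """Compute a keyed digest of the message with HMAC-SHA256."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(key: bytes, message: bytes, tag: bytes) -> bool:
    """Recompute the digest and compare it in constant time."""
    return hmac.compare_digest(sign(key, message), tag)


# Hypothetical shared key; in TLS it would come out of the handshake.
key = b"shared-secret-from-handshake"
message = b"transfer 10 coins"
tag = sign(key, message)

print(verify(key, message, tag))               # untampered message
print(verify(key, b"transfer 99 coins", tag))  # tampered message
```

The receiver recomputes the digest over what it received. A message altered in transit no longer matches the tag, so the tampering is detected.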

The Domain Name System (DNS) is a distributed, hierarchical, and eventually consistent key-value store.
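The toy model below tries to show both properties at once: resolution walks a hierarchy (root, then TLD, then authoritative server), and a cache with a time-to-live can serve stale answers until the entry expires, which is where the eventual consistency comes from. The zone tables, server names, and the `CachingResolver` class are all hypothetical. The addresses come from the reserved documentation range, and no real DNS queries are made.

```python
import time

# Hypothetical zone data: each level only knows whom to ask next.
ROOT = {"com": "tld-com"}
TLD = {"tld-com": {"example.com": "ns-example"}}
AUTHORITATIVE = {"ns-example": {"www.example.com": "192.0.2.10"}}


class CachingResolver:
    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.cache = {}  # name -> (address, expiry time)

    def resolve(self, name: str) -> str:
        hit = self.cache.get(name)
        if hit and hit[1] > self.clock():
            return hit[0]  # cached answer, possibly stale
        labels = name.split(".")
        tld_server = ROOT[labels[-1]]                       # ask the root
        name_server = TLD[tld_server][".".join(labels[-2:])]  # ask the TLD
        address = AUTHORITATIVE[name_server][name]          # ask the authority
        self.cache[name] = (address, self.clock() + self.ttl)
        return address


# A fake clock makes the expiry deterministic for the demonstration.
now = [0.0]
resolver = CachingResolver(ttl_seconds=60, clock=lambda: now[0])

first = resolver.resolve("www.example.com")
AUTHORITATIVE["ns-example"]["www.example.com"] = "192.0.2.20"  # record changes
stale = resolver.resolve("www.example.com")  # still served from the cache
now[0] = 61.0                                # the TTL has elapsed
fresh = resolver.resolve("www.example.com")  # re-resolved through the hierarchy
print(first, stale, fresh)
```

A shorter TTL shrinks the window of staleness at the cost of more lookups against the upstream servers.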

A textual format like JSON is self-describing and human-readable, at the expense of increased verbosity and parsing overhead.
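A small comparison illustrates the verbosity trade-off. The sketch encodes the same made-up record once as JSON and once as a fixed binary layout with Python's `struct` module. The record and its field layout are assumptions for illustration. Note that the binary form can only be decoded by a reader that already knows the layout.

```python
import json
import struct

# Hypothetical record, used only for illustration.
reading = {"sensor_id": 42, "temperature": 21.5, "ok": True}

# Textual encoding: field names travel with every message.
text = json.dumps(reading).encode("utf-8")

# Binary encoding: little-endian unsigned int, double, bool, with no padding.
# The layout "<Id?" is a schema both sides must agree on out of band.
LAYOUT = "<Id?"
binary = struct.pack(
    LAYOUT, reading["sensor_id"], reading["temperature"], reading["ok"]
)

print(len(text), "bytes as JSON")
print(len(binary), "bytes as packed binary")

# Decoding the binary form requires knowing the layout in advance.
sensor_id, temperature, ok = struct.unpack(LAYOUT, binary)
```

The JSON bytes can be read and debugged by eye, while the packed form is compact but opaque without its schema.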