The most expensive bugs are often design flaws, especially in the complex concurrent and distributed systems we build today.
This presentation introduces TLA+, a formal specification language created by Leslie Lamport and designed to find these flaws before a single line of code is written. Learn how the teams behind Amazon S3, Azure Cosmos DB, and Apache Kafka have used formal methods to validate their protocols, prevent outages, and save months of debugging.
The talk covers the fundamentals of temporal logic, explaining how TLA+ models a system as a state machine whose actions define the allowed state transitions. Through simple examples such as a traffic light and a mutex, you will understand key concepts such as primed variables, stuttering steps, and the crucial difference between safety ("something bad never happens") and liveness ("something good eventually happens") properties. We'll see how the TLC model checker exhaustively explores the reachable states of a finite model to verify these properties and find bugs that traditional testing often misses.
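To give a feel for what exhaustive state exploration means, here is a minimal sketch in Python rather than TLA+. It is not how TLC is implemented and not an example from the talk. The two-process lock model, the state layout, and every name (`next_states`, `buggy_next_states`, `mutual_exclusion`, `check`) are assumptions made up for illustration. It checks only a safety invariant; stuttering steps and liveness are not modelled.

```python
from collections import deque

# Hypothetical model: a state is (pc0, pc1, lock_held).
# Each pc is "idle", "waiting", or "critical".
INIT = ("idle", "idle", False)


def _successors(state, check_lock):
    """Yield every state reachable in one step by one process.

    The yielded tuple plays the role of the primed variables in TLA+:
    it describes the values after the action has been taken.
    """
    pcs, lock = [state[0], state[1]], state[2]
    for i in (0, 1):
        new = pcs.copy()
        if pcs[i] == "idle":
            new[i] = "waiting"                # request the lock
            yield (new[0], new[1], lock)
        elif pcs[i] == "waiting" and (not lock or not check_lock):
            new[i] = "critical"               # acquire the lock
            yield (new[0], new[1], True)
        elif pcs[i] == "critical":
            new[i] = "idle"                   # release the lock
            yield (new[0], new[1], False)


def next_states(state):
    """Correct model: a process enters only when the lock is free."""
    return _successors(state, check_lock=True)


def buggy_next_states(state):
    """Deliberately broken model: the lock is never checked."""
    return _successors(state, check_lock=False)


def mutual_exclusion(state):
    """Safety invariant: both processes are never critical together."""
    return not (state[0] == "critical" and state[1] == "critical")


def check(init, next_fn, invariant):
    """Breadth-first search over all reachable states.

    Returns (ok, states_seen, trace). On a violation, trace is a
    shortest path of states from init to the offending state.
    """
    parent = {init: None}
    queue = deque([init])
    while queue:
        state = queue.popleft()
        if not invariant(state):
            trace = []
            while state is not None:
                trace.append(state)
                state = parent[state]
            return False, len(parent), trace[::-1]
        for succ in next_fn(state):
            if succ not in parent:
                parent[succ] = state
                queue.append(succ)
    return True, len(parent), None
```

Calling `check(INIT, next_states, mutual_exclusion)` visits every reachable state of the correct model and reports no violation, while `check(INIT, buggy_next_states, mutual_exclusion)` returns a step-by-step trace that ends with both processes in the critical section. That trace is the toy counterpart of the counterexample a model checker reports when an invariant fails.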
Beyond the theory, this session provides practical guidance on when to use TLA+ (critical systems, early design phase) and how to integrate it into your development workflow, from specification writing to running checks in a CI pipeline. This third edition of the talk includes new content on the latest "Clarke" release of TLA+ tools, notes on the current capabilities of GenAI for writing specifications, and hard-won lessons from the speaker's personal experience.