Slide 1

Slide 1 text

The (beauty of) math behind distributed systems Verónica López @maria_fibonacci SCALECONF COLOMBIA 2018

Slide 2

Slide 2 text

whoami • Software Engineer @ Red Hat • Formerly: physics • Academia + Industry • Trying to make computers and science converge

Slide 3

Slide 3 text

Agenda • Distributed Systems • Graph Theory • Topology

Slide 4

Slide 4 text

People ask: What are the benefits of math? What if I’m not good at math?

Slide 5

Slide 5 text

Math is very useful for programming and systems design, but that might not mean what you think it means.

Slide 6

Slide 6 text

Math in CS • Basic: e.g. arithmetic, geometry (frontend), logic • Specific: e.g. linear algebra (machine learning, big data), calculus (simulations) • General: e.g. graph theory, topology

Slide 7

Slide 7 text

Math in CS • Basic: e.g. arithmetic, geometry (frontend), logic • Specific: e.g. linear algebra (machine learning, big data), calculus (simulations) • General: e.g. graph theory, topology

Slide 8

Slide 8 text

All these concepts have connectivity in common

Slide 9

Slide 9 text

Agenda • Distributed Systems • Graph Theory • Topology

Slide 10

Slide 10 text

Famous -and overused- quote about distsys…

Slide 11

Slide 11 text

“A distributed system is one in which the failure of a computer you didn’t even know existed can render your own computer unusable” Leslie Lamport

Slide 12

Slide 12 text

Ideal Distributed Systems • Fault-Tolerant • Highly available • Recoverable • Consistent • Scalable • Predictable Performance • Secure • Etc…

Slide 13

Slide 13 text

Design for Failure We can’t have it all (at the same time), so…

Slide 14

Slide 14 text

Design for Failure Probability Connectivity

Slide 15

Slide 15 text

Probability • If the probability of something happening is one in 10^13, how often would it happen? • “Real life”: Never • Physics: All the time. • Think about servers (infrastructure) at scale.
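
The arithmetic behind "all the time" can be sketched in a few lines. Only the 1-in-10^13 probability comes from the slide; the fleet size and per-server operation rate below are hypothetical numbers picked for illustration.

```python
import math

def expected_events(p, trials):
    """Expected number of occurrences of an event with probability p over `trials` independent trials."""
    return p * trials

def prob_at_least_one(p, trials):
    """P(at least one occurrence) = 1 - (1 - p)^trials, computed stably for tiny p."""
    return -math.expm1(trials * math.log1p(-p))

P_EVENT = 1e-13                    # from the slide: one in 10^13
OPS_PER_SERVER_PER_SEC = 1e6       # hypothetical workload
SERVERS = 10_000                   # hypothetical fleet size
SECONDS_PER_DAY = 86_400

ops_per_day = OPS_PER_SERVER_PER_SEC * SERVERS * SECONDS_PER_DAY

print("expected events per day:", expected_events(P_EVENT, ops_per_day))
print("P(at least one per day):", prob_at_least_one(P_EVENT, ops_per_day))
```

Under these assumed numbers the "never" event is expected roughly 86 times a day, which is why rare failures have to be designed for at scale.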

Slide 16

Slide 16 text

Connectivity

Slide 17

Slide 17 text

Agenda • Distributed Systems • Graph Theory • Topology

Slide 18

Slide 18 text

Graph Theory • The study of graphs: the mathematical structures used to model pairwise relations between objects. • Two concepts: nodes (vertices) & lines (edges) • Might be directed or undirected
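
As a minimal sketch of these definitions, a graph can be stored as a mapping from each node to the set of its neighbours. The `build_graph` helper and the server names are hypothetical, introduced only for illustration.

```python
from collections import defaultdict

def build_graph(edges, directed=False):
    """Return an adjacency mapping {node: set of neighbours} from (u, v) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v]  # touch v so it exists as a node even with no outgoing edges
        if not directed:
            adj[v].add(u)  # an undirected edge is usable in both directions
    return dict(adj)

links = [("web", "api"), ("api", "db")]  # hypothetical services

undirected = build_graph(links)
directed = build_graph(links, directed=True)

print(undirected)
print(directed)
```

In the directed version `db` has no outgoing edges, which is the difference between "these two machines can talk" and "this machine calls that one".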

Slide 19

Slide 19 text

The paper on the Seven Bridges of Königsberg (1736, Leonhard Euler) is considered the first paper in the history of graph theory.

Slide 20

Slide 20 text

Graph Theory in Distsys • Design system’s connectivity • k-connectedness: how many nodes we need to remove to disconnect a graph • Verify points of failure • Rearrange stuff
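
One way to make k-connectedness concrete is a brute-force search for the smallest set of nodes whose removal disconnects the graph. This is a sketch for small graphs only (it tries every subset), and the function names and example topologies are assumptions, not something from the talk.

```python
from itertools import combinations

def is_connected(adj, removed=frozenset()):
    """True if the nodes left after dropping `removed` can all reach each other."""
    nodes = [n for n in adj if n not in removed]
    if not nodes:
        return True
    seen, stack = {nodes[0]}, [nodes[0]]
    while stack:
        cur = stack.pop()
        for nxt in adj[cur]:
            if nxt not in removed and nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return len(seen) == len(nodes)

def min_vertex_cut(adj):
    """Smallest set of nodes whose removal disconnects the graph, or None if there is none."""
    nodes = list(adj)
    for k in range(0, len(nodes) - 1):  # always leave at least two nodes
        for cut in combinations(nodes, k):
            if not is_connected(adj, frozenset(cut)):
                return set(cut)
    return None

# Hypothetical topologies: a star (one hub) and a ring of four nodes.
star = {"hub": {"a", "b", "c"}, "a": {"hub"}, "b": {"hub"}, "c": {"hub"}}
ring = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c", "a"}}

print(min_vertex_cut(star))
print(len(min_vertex_cut(ring)))
```

A cut of size one is a single point of failure: the star loses connectivity when the hub dies, while the ring survives any single node failure and needs two failures to split.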

Slide 21

Slide 21 text

No content

Slide 22

Slide 22 text

No content

Slide 23

Slide 23 text

No content

Slide 24

Slide 24 text

Agenda • Distributed Systems • Graph Theory • Topology

Slide 25

Slide 25 text

Topology • The study of geometric properties and spatial relations unaffected by the continuous change of shape or size of figures (formal) • Interrelation (informal)

Slide 26

Slide 26 text

Topological Properties • Properties that remain invariant under continuous stretching and bending of the object • E.g. path connectivity, higher-dimensional analogs

Slide 27

Slide 27 text

The paper on the Seven Bridges of Königsberg (1736, Leonhard Euler) is considered the first paper in the history of topology (too!).

Slide 28

Slide 28 text

Topologically identical objects

Slide 29

Slide 29 text

Pyramids -> Division (triangulation) -> Sphere

Slide 30

Slide 30 text

Combinatorial (Algebraic) Topology • Studies spaces that can be constructed from discrete pieces • Lets us have all the (system) perspectives (of a node) available at the same time • Perspectives evolve with communication

Slide 31

Slide 31 text

Agenda • Distributed Systems • Graph Theory • Topology

Slide 32

Slide 32 text

Formal Verification of a Distributed System • List of methods that prove properties about a system. • Very hard to get! (Expensive and/or slow) • Only a few: AWS, Verdi, etc. • Talk & Articles: check the ACM

Slide 33

Slide 33 text

Verification of a Distributed System • Formal Specification Languages (definition & correctness) <- AWS • Model Checking (execute all paths) • Composition (more than 1 method)
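
"Execute all paths" can be illustrated with a toy checker that runs every interleaving of two processes and tests an invariant on each. This is a hedged sketch, not how any real model checker is implemented; the two-step read/write increment and all names are assumptions chosen for illustration.

```python
from itertools import combinations

def interleavings(n_a, n_b):
    """Yield every schedule of n_a steps of 'A' and n_b steps of 'B', keeping each process's own order."""
    total = n_a + n_b
    for a_slots in combinations(range(total), n_a):
        slots = set(a_slots)
        yield ["A" if i in slots else "B" for i in range(total)]

def run(schedule):
    """Each process increments a shared counter non-atomically: first read, then write."""
    shared = 0
    local = {"A": None, "B": None}
    for pid in schedule:
        if local[pid] is None:
            local[pid] = shared        # step 1: read the shared value
        else:
            shared = local[pid] + 1    # step 2: write back value + 1
    return shared

def check():
    """Return every schedule that violates the invariant 'final counter == 2'."""
    return [s for s in interleavings(2, 2) if run(s) != 2]

all_schedules = list(interleavings(2, 2))
violations = check()
print(len(all_schedules), "schedules explored,", len(violations), "violate the invariant")
for schedule in violations:
    print("lost update with schedule:", "".join(schedule))
```

Even this tiny system has schedules where an update is lost, and the number of schedules grows combinatorially with processes and steps, which hints at why exhaustive checking is expensive.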

Slide 34

Slide 34 text

Even after being formally verified, many systems still have bugs! Although not as many as unverified ones. Also: different types of bugs. See: An Empirical Study on the Correctness of Formally Verified Distributed Systems (Fonseca et al.)

Slide 35

Slide 35 text

If it’s so hard, then why are there so many successful Distributed Systems, Verónica?!

Slide 36

Slide 36 text

“Verification” IRL • Monitoring & Observability tools: Prometheus, New Relic, etc. • On-Call Engineers & paging tools • Chaos Engineering: breaking things on purpose before they break for real • Modern testing approaches: thorough e2e, production

Slide 37

Slide 37 text

No content

Slide 38

Slide 38 text

Combinatorial Topology could be used to (help) formally verify a distributed system (through algebraic expressions)

Slide 39

Slide 39 text

No content

Slide 40

Slide 40 text

Representing our distributed systems as topological objects allows us to describe them visually and study their connections by sections or “worlds”

Slide 41

Slide 41 text

Combinatorial Topology • Visibility of every partition • Placing together all the views = frozen representation of all possible interleavings and failure scenarios
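
The idea of placing all views together can be sketched with a toy model: two processes exchange one message each, and at most one message may be lost. Each vertex is a (process, view) pair and each possible execution contributes one edge joining the two views that occur together. The model, the failure assumption, and every name below are illustrative assumptions, not the construction used in the talk.

```python
from itertools import product

def one_round_complex():
    """Return one edge per possible execution of a single communication round."""
    edges = set()
    for lost_to_p, lost_to_q in product((False, True), repeat=2):
        if lost_to_p and lost_to_q:
            continue  # assumption: at most one message is lost per round
        view_p = ("P", "own input only" if lost_to_p else "both inputs")
        view_q = ("Q", "own input only" if lost_to_q else "both inputs")
        edges.add(frozenset((view_p, view_q)))
    return edges

def is_connected(edges):
    """True if every vertex can be reached from any other by walking along edges."""
    vertices = set().union(*edges)
    start = next(iter(vertices))
    seen, stack = {start}, [start]
    while stack:
        cur = stack.pop()
        for edge in edges:
            if cur in edge:
                for other in edge - {cur}:
                    if other not in seen:
                        seen.add(other)
                        stack.append(other)
    return seen == vertices

edges = one_round_complex()
vertices = set().union(*edges)
print(len(vertices), "views,", len(edges), "executions, connected:", is_connected(edges))
```

The resulting object is a small path: every failure scenario of the round is present at once, and a property such as connectedness can then be checked on the whole structure instead of one execution at a time.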

Slide 42

Slide 42 text

Combinatorial Topology • Invariants: connections preserved as computation unfolds. • Topological objects are subject to proofs. • Representing systems as theorems and proving them = verified systems

Slide 43

Slide 43 text

There’s a book! Distributed Computing Through Combinatorial Topology

Slide 44

Slide 44 text

Extra resources • Algebraic Topology and Distributed Computing, A Primer http://cs.brown.edu/~mph/HerlihyR96/sv.pdf • The Topology of Distributed Adversaries https://link.springer.com/article/10.1007/s00446-013-0189-9 • The Topological Structure of Asynchronous Computability http://cs.brown.edu/~mph/HerlihyS99/p858-herlihy.pdf • The Verification of a Distributed System http://queue.acm.org/detail.cfm?id=2889274