Slide 1

Slide 1 text

Deterministic Parallel and Distributed Programming with Clojure: Quick Intro
Alexey Kachayev, 2014

Slide 2

Slide 2 text

About me
• CTO at Attendify.com
• Clojure, Erlang, Go, Haskell
• Fn.py library author
• CPython & Storm contributor

Slide 3

Slide 3 text

Find me
• @kachayev
• github.com/kachayev
• kachayev <$> gmail.com

Slide 4

Slide 4 text

Topic

Slide 5

Slide 5 text

Will talk about
• Parallel & Distributed
• Determinism: why & when
• Models and approaches

Slide 6

Slide 6 text

Clojure & Concurrency

Slide 7

Slide 7 text

Atom

Slide 8

Slide 8 text

Agent

Slide 9

Slide 9 text

STM

Slide 10

Slide 10 text

core.async

Slide 11

Slide 11 text

Deterministic

Slide 12

Slide 12 text

Non-Deterministic

Slide 13

Slide 13 text

Why Determinism?
• easy to reason about
• easy to maintain
• fewer bugs
• fewer bugs that you can’t reproduce on your machine
• less data loss (no data loss?)
• provable correctness

Slide 14

Slide 14 text

Why Determinism?
• you should know why determinism is good if you listen to Clojure conference talks
• we will talk about "ordering non-determinism" only (there are many other sources of non-determinism, however)

Slide 15

Slide 15 text

Parallel & Distributed

Slide 16

Slide 16 text

Parallel
• more than one independent worker (actors?)
• loose coordination
• great opportunities
• … at a high price

Slide 17

Slide 17 text

Distribution
• More parallelism (!)
• For lower latency
• For storage replication
• For HA
• And more
… but more non-determinism factors
• Unpredictable latencies
• "Random" worker crashes

Slide 18

Slide 18 text

Deterministic distributed program? Let’s think…

Slide 19

Slide 19 text

Use consensus, Luke!

Slide 20

Slide 20 text

Consensus
• Non-distributed: locks, semaphores, etc.
• Distributed: two-phase commit, Paxos, Zab, Raft
• … but this price is too high in many cases!
• … you’re trading away availability in most cases

Slide 21

Slide 21 text

Is there any other way?

Slide 22

Slide 22 text

Monotonicity

Slide 23

Slide 23 text

Theory
• "Logic and Lattices for Distributed Programming" http://goo.gl/5q7CJF
• "CRDTs: Consistency without concurrency control" http://goo.gl/Ouu4sc
• "A comprehensive study of Convergent and Commutative Replicated Data Types" http://goo.gl/I1alMi
• "A Lattice-Theoretical Approach to Deterministic Parallelism with Shared State" http://goo.gl/cdv1UK

Slide 24

Slide 24 text

Theory
• Monotonic logic
• Bounded Join Semi-Lattices
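
A bounded join semilattice is a set with a least element ("bottom") and a join operation that is associative, commutative, and idempotent. The sketch below is not from the talk; it assumes two toy lattices (non-negative integers under max, sets under union) and hypothetical helper names, and checks that folding the same updates in any order, duplicates included, gives one result.

```python
from functools import reduce
from itertools import permutations

# Two tiny bounded join semilattices (illustrative assumptions):
#   non-negative integers with join = max, bottom = 0
#   frozensets with join = union, bottom = empty set
def join_max(a, b):
    return max(a, b)

def join_set(a, b):
    return a | b

def merge_all(join, bottom, updates):
    """Fold updates into a state with join, starting from bottom."""
    return reduce(join, updates, bottom)

updates = [3, 7, 7, 1]  # note the duplicated update
results = {merge_all(join_max, 0, p) for p in permutations(updates)}

# Every delivery order converges to the same value.
assert results == {7}
```

Because the join ignores order and repetition, reordered or re-delivered messages cannot change the outcome, which is the property the rest of the talk builds on.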

Slide 25

Slide 25 text

Practice
• mobile application
• chat message stream
• k-ordered message IDs
• "latest viewed message" mark
• offline-mode support
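
One plausible reading of this example: the "latest viewed message" mark only ever moves forward, so replicas can merge it by taking the larger message ID. The sketch below assumes IDs are integers that are close enough to time order for max to be meaningful; the class and variable names are hypothetical and not from the talk.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ViewedMark:
    """Highest message ID a user has seen (hypothetical name)."""
    message_id: int = 0  # assumption: roughly time-ordered integer IDs

    def merge(self, other: "ViewedMark") -> "ViewedMark":
        # Join = max, so the mark never moves backwards.
        return ViewedMark(max(self.message_id, other.message_id))

server = ViewedMark(100)
phone = ViewedMark(140)   # advanced while the phone was offline
tablet = ViewedMark(120)

# Devices may sync in any order, and the same update may arrive twice.
a = server.merge(phone).merge(tablet)
b = server.merge(tablet).merge(phone).merge(phone)
assert a == b == ViewedMark(140)
```

No device needs to know the global order in which messages were read; it only needs the merge to be monotonic.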

Slide 26

Slide 26 text

Practice

Slide 27

Slide 27 text

Practice

Slide 28

Slide 28 text

BTW… we don’t know the global ordering of events in practice :(

Slide 29

Slide 29 text

Monotonicity!

Slide 30

Slide 30 text

LVar
• Haskell library lvish
• Monotonically growing, lattice-based data structures
• determinism vs. quasi-determinism
• threshold reads
• freezing variables
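
To make "monotonically growing" and "threshold reads" concrete, here is a toy Python sketch. It is not the lvish API: the class name is hypothetical, the lattice is assumed to be integers under max, and freezing is not modelled. A write can only move the state up, and a read blocks until the state reaches a threshold and then returns the threshold rather than the exact state.

```python
import threading

class MaxLVar:
    """Toy LVar over integers ordered by <=, with join = max."""

    def __init__(self):
        self._value = 0
        self._cond = threading.Condition()

    def put(self, v):
        # Writes can only move the state up the lattice.
        with self._cond:
            self._value = max(self._value, v)
            self._cond.notify_all()

    def get(self, threshold):
        # Block until the state reaches the threshold, then return the
        # threshold itself, never the exact (timing-dependent) state.
        with self._cond:
            self._cond.wait_for(lambda: self._value >= threshold)
            return threshold

lv = MaxLVar()
writers = [threading.Thread(target=lv.put, args=(n,)) for n in (2, 9, 5)]
for t in writers:
    t.start()
seen = lv.get(5)  # same answer whichever writer runs first
for t in writers:
    t.join()
assert seen == 5
```

Returning the threshold instead of the current value is what hides scheduling from the reader: the caller cannot tell whether the state was 5 or 9 when the read unblocked.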

Slide 31

Slide 31 text

LVar

Slide 32

Slide 32 text

CRDT(s)

Slide 33

Slide 33 text

CRDT
• Conflict-Free Replicated Data Type
• Convergent Replicated Data Type
• Commutative Replicated Data Type

Slide 34

Slide 34 text

CRDT: The Idea

Slide 35

Slide 35 text

CRDT
• Counters (G-Counter, PN-Counter)
• Registers (LWW-Register, MV-Register)
• Sets (G-Set, 2P-Set, PN-Set, OR-Set)
• Graphs
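
As an illustration of the first entry, here is a minimal state-based sketch of a G-Counter and a PN-Counter built from two of them. It is written for this transcript rather than taken from the talk or from any library; replica names are arbitrary strings.

```python
class GCounter:
    """Grow-only counter: one non-negative slot per replica."""

    def __init__(self, counts=None):
        self.counts = dict(counts or {})

    def increment(self, replica, n=1):
        self.counts[replica] = self.counts.get(replica, 0) + n

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        # Join = per-replica maximum.
        keys = self.counts.keys() | other.counts.keys()
        return GCounter({k: max(self.counts.get(k, 0),
                                other.counts.get(k, 0)) for k in keys})

class PNCounter:
    """Counter with decrements: increments and decrements kept apart."""

    def __init__(self, p=None, n=None):
        self.p = p if p is not None else GCounter()
        self.n = n if n is not None else GCounter()

    def increment(self, replica, k=1):
        self.p.increment(replica, k)

    def decrement(self, replica, k=1):
        self.n.increment(replica, k)

    def value(self):
        return self.p.value() - self.n.value()

    def merge(self, other):
        return PNCounter(self.p.merge(other.p), self.n.merge(other.n))

a, b = PNCounter(), PNCounter()
a.increment("a", 3)
b.increment("b", 2)
b.decrement("b", 1)

# Merge order and repeated merges do not change the result.
assert a.merge(b).value() == b.merge(a).value() == 4
assert a.merge(b).merge(b).value() == 4
```

Each replica only writes its own slot, so the per-replica maximum never loses an update and replicas converge without coordination.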

Slide 36

Slide 36 text

Knockbox

Slide 37

Slide 37 text

Dataflow programming

Slide 38

Slide 38 text

Bloom
• disorderly programming
• state represented with lattices (few built-in) and collections (table, scratch, channel)
• runtime implementation as Ruby DSL
• static analysis tools (points of order)
• visualisation tools

Slide 39

Slide 39 text

Bloom

Slide 40

Slide 40 text

Derflow
• Deterministic dataflow programming
• Growing set of single-assignment variables
• Operations: declare, bind, read/wait
• Streams: produce/consume
• Erlang implementation: http://goo.gl/lnnfVd
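
The declare / bind / read operations can be sketched with a single-assignment variable. This is a Python illustration with hypothetical names, not Derflow's Erlang API; it assumes that re-binding to an equal value is allowed and that binding to a different value is an error.

```python
import threading

class DataflowVar:
    """Single-assignment variable: bound at most once, reads wait."""

    def __init__(self):  # "declare"
        self._bound = threading.Event()
        self._lock = threading.Lock()
        self._value = None

    def bind(self, value):
        with self._lock:
            if self._bound.is_set():
                # Assumption: binding the same value again is harmless.
                if self._value != value:
                    raise ValueError("already bound to a different value")
                return
            self._value = value
            self._bound.set()

    def read(self):
        # Block until some producer has bound the variable.
        self._bound.wait()
        return self._value

x, y = DataflowVar(), DataflowVar()
consumer = threading.Thread(target=lambda: y.bind(x.read() + 1))
consumer.start()  # may start before x is bound; it simply waits
x.bind(41)
consumer.join()
assert y.read() == 42
```

Because a variable goes from unbound to bound exactly once, the value a reader sees does not depend on how the threads are scheduled.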

Slide 41

Slide 41 text

Summary

Slide 42

Slide 42 text

Why Clojure?
• strong concurrency primitives (atom)
• immutable data types
• CRDT library "knockbox" (dead?)
• not that much done for distributed computing (riak_core in Erlang)
• one can use Akka/Pulsar
• aphyr/jepsen for testing partition tolerance
• a big room for experiments

Slide 43

Slide 43 text

Links
• "Eventually Consistent Data Structures" http://goo.gl/HgtIzY
• "Knockbox, an Eventual Consistency Toolkit" http://goo.gl/r5XxRH
• "LVars: lattice-based data structures for deterministic parallelism" http://goo.gl/IJljEQ
• "MVar, IVar, and LVar programs in Haskell" http://goo.gl/dF36k4
• "Distributed deterministic dataflow programming for Erlang" http://goo.gl/y2oH0P
• "Sync Free" http://goo.gl/qXZHnb

Slide 44

Slide 44 text

Learn Clojure For Great Good

Slide 45

Slide 45 text

Learn Haskell For Great Good

Slide 46

Slide 46 text

Q/A
Thanks for your attention!