Slide 1

Slide 1 text

Building massively distributed systems with OSS · Mateusz ‘Serafin’ Gajewski · allegro.tech meeting v8 · 2015

Slide 2

Slide 2 text

No content

Slide 3

Slide 3 text

Web Scale source: reactivemanifesto.org

Slide 4

Slide 4 text

Distributed top-down: architecture, computing, messaging, databases (NoSQL/NewSQL), data processing, file systems, resource management, infrastructure.

Slide 5

Slide 5 text

Distributed toolbox: dynamic flow control, rate limiting, exponential back-offs, automatic failover, hinted handoffs, data scrubbing, CRDTs, backpressure, circuit breakers, bulkheads, vector clocks, two-phase commit, consensus algorithms, gossip protocols, leader election, distributed coordination, eventual consistency, data replication, OCC, MVCC...
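To make one item from this toolbox concrete, here is a minimal vector clock sketch in Scala. It is not from the talk; the node names and the demo scenario are invented for illustration, but it shows how per-node counters are ticked, merged, and compared to detect causal ordering.

```scala
// Hypothetical, minimal vector clock: one logical counter per node.
case class VectorClock(counters: Map[String, Long] = Map.empty) {

  // Record a local event on `node` by bumping its counter.
  def tick(node: String): VectorClock =
    copy(counters.updated(node, counters.getOrElse(node, 0L) + 1))

  // Merge with a clock received from another replica (pointwise max).
  def merge(other: VectorClock): VectorClock =
    VectorClock(
      (counters.keySet ++ other.counters.keySet).map { node =>
        node -> math.max(counters.getOrElse(node, 0L), other.counters.getOrElse(node, 0L))
      }.toMap
    )

  // `this` happened before `other` iff every counter is <= and the clocks differ.
  def happenedBefore(other: VectorClock): Boolean =
    counters.forall { case (node, n) => n <= other.counters.getOrElse(node, 0L) } &&
      counters != other.counters
}

object VectorClockDemo extends App {
  val a = VectorClock().tick("node-a")          // write on node A
  val b = a.merge(VectorClock()).tick("node-b") // A's state replicated to B, then B writes
  println(a.happenedBefore(b)) // true: A's write is causally before B's
  println(b.happenedBefore(a)) // false
}
```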

Slide 6

Slide 6 text

Thesis: building correct distributed systems is very hard. Proof: Jepsen :)

Slide 7

Slide 7 text

Thesis: most of our problems and needs can be addressed with existing Open Source Software. Proof: many companies, e.g. Allegro ;)

Slide 8

Slide 8 text

Just four OSS examples and the concepts behind them

Slide 9

Slide 9 text

Apache Cassandra · 2008

Slide 10

Slide 10 text

Architecture

Slide 11

Slide 11 text

SSTable

Slide 12

Slide 12 text

Read/write path

Slide 13

Slide 13 text

Will it scale?

Slide 14

Slide 14 text

Yes it will!

Slide 15

Slide 15 text

Apache Kafka · 2011

Slide 16

Slide 16 text

Architecture

Slide 17

Slide 17 text

Partition structure source: kafka.apache.org

Slide 18

Slide 18 text

Will it scale?

Slide 19

Slide 19 text

Apache Spark · 2009

Slide 20

Slide 20 text

Components source: spark.apache.org

Slide 21

Slide 21 text

RDD abstraction

Slide 22

Slide 22 text

Architecture source: spark.apache.org

Slide 23

Slide 23 text

Does it scale? source: databricks.com

Slide 24

Slide 24 text

Apache Mesos · 2009

Slide 25

Slide 25 text

Mesos architecture source: mesos.apache.org

Slide 26

Slide 26 text

Offers source: mesos.apache.org

Slide 27

Slide 27 text

Mesos ecosystem source: mesosphere.com

Slide 28

Slide 28 text

Does it scale?

Slide 29

Slide 29 text

All you need is... Scalable system = Cassandra as data storage + Spark as data processing engine + Mesos as resource scheduler + Kafka as core messaging.

Slide 30

Slide 30 text

Good news: we use it all!

Slide 31

Slide 31 text

but... OSS cons & pros for your consideration

Slide 32

Slide 32 text

OSS cons ● immature (not production-ready), ● bugs, ● poor or misleading documentation, ● learning curve, ● few or no experts on the market, ● slow adoption rate, ● dependencies on other OSS, ● (sometimes) lack of support

Slide 33

Slide 33 text

OSS pros ● “there is OSS for that” ;) ● licensing, ● access to sources, ● speeds up time-to-market, ● helps recruiting.

Slide 34

Slide 34 text

OSS tips ● stay up-to-date, ● don’t trust docs - deep dive instead, ● engage with community, ● remove OSS barriers - contribute back, ● release your software - share, ● grow experts in your company - educate, ● evaluate-hold-adopt cycle - experiment, ● know your hardware & OS - tune, ● be patient ;)

Slide 35

Slide 35 text

Q/A?

Slide 36

Slide 36 text

Thank you!

Slide 37

Slide 37 text

Key facts ● partitioned, nested, sorted map, ● AP system (with tunable C), ● masterless (p2p) architecture with gossip protocol, ● multi-DC (a)synchronous replication, ● consistent hashing (with virtual nodes), ● supports CQL (a query language similar to SQL), ● modeled after Dynamo and BigTable.
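These Cassandra key facts, especially the first bullet, can be pictured roughly as the Scala structure below. This is only a conceptual sketch of the "partitioned, nested, sorted map" data model; the table, keys, and values are invented, and none of this is Cassandra's actual storage code.

```scala
import scala.collection.immutable.SortedMap

// Conceptual model only: a Cassandra table behaves roughly like a map from
// partition key to a sorted map of clustering key -> columns.
object CassandraAsMap {
  type PartitionKey  = String              // e.g. user id, hashed onto the ring
  type ClusteringKey = Long                // e.g. event timestamp, kept sorted within a partition
  type Columns       = Map[String, String] // column name -> value

  type Table = Map[PartitionKey, SortedMap[ClusteringKey, Columns]]

  val events: Table = Map(
    "user-42" -> SortedMap(
      1L -> Map("action" -> "login"),
      2L -> Map("action" -> "purchase")
    )
  )

  // A "query" pins the partition key and scans a clustering-key range, which is
  // why reads within a single partition are cheap and come back ordered.
  def slice(table: Table, pk: PartitionKey, from: ClusteringKey, until: ClusteringKey) =
    table.getOrElse(pk, SortedMap.empty[ClusteringKey, Columns]).range(from, until)
}
```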

Slide 38

Slide 38 text

Key facts ● general-purpose, distributed data-processing engine, ● extends the Map/Reduce & Dryad data-flow programming models, ● fault tolerance via RDDs, ● supports iterative algorithms, map/reduce, stream processing, relational queries & hybrid models, ● partial DAG execution.
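To make the RDD abstraction behind these Spark facts concrete, here is a minimal, self-contained word-count sketch using the core RDD API. The local master URL and input path are assumptions for illustration, not from the slides.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Minimal RDD example: the lineage (textFile -> flatMap -> map -> reduceByKey)
// is what gives Spark fault tolerance - lost partitions are simply recomputed.
object WordCount extends App {
  val conf = new SparkConf().setAppName("word-count").setMaster("local[*]") // assumed local run
  val sc   = new SparkContext(conf)

  val counts = sc.textFile("input.txt")        // assumed input path
    .flatMap(line => line.split("\\s+"))
    .map(word => (word, 1))
    .reduceByKey(_ + _)                        // shuffle + per-word aggregation

  counts.take(10).foreach(println)
  sc.stop()
}
```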

Slide 39

Slide 39 text

Key facts ● distributed, fault-tolerant resource scheduler, ● provides performance isolation, ● leader election with ZooKeeper, ● master maintains soft state.
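The resource-offer mechanism at the heart of these Mesos facts can be sketched conceptually as below. This is plain Scala illustrating the two-level scheduling idea; the Offer/Task types and numbers are invented and this is not the real org.apache.mesos framework API.

```scala
// Conceptual sketch of Mesos-style two-level scheduling: the master offers
// resources, the framework decides what (if anything) to launch on them.
final case class Offer(agentId: String, cpus: Double, memMb: Int)
final case class Task(name: String, cpus: Double, memMb: Int)

object FrameworkScheduler {
  // The framework's half of the protocol: accept offers that fit a pending
  // task, decline the rest so the master can offer them to other frameworks.
  def resourceOffers(offers: Seq[Offer], pending: List[Task]): (List[(Offer, Task)], List[Offer]) =
    offers.foldLeft((List.empty[(Offer, Task)], List.empty[Offer], pending)) {
      case ((accepted, declined, queue), offer) =>
        queue.find(t => t.cpus <= offer.cpus && t.memMb <= offer.memMb) match {
          case Some(task) => ((offer, task) :: accepted, declined, queue.filterNot(_ == task))
          case None       => (accepted, offer :: declined, queue)
        }
    } match { case (accepted, declined, _) => (accepted, declined) }
}

object OffersDemo extends App {
  val offers  = Seq(Offer("agent-1", cpus = 4.0, memMb = 8192), Offer("agent-2", cpus = 0.5, memMb = 512))
  val pending = List(Task("web", cpus = 2.0, memMb = 2048))
  val (launch, decline) = FrameworkScheduler.resourceOffers(offers, pending)
  println(s"launch: $launch")   // the web task fits agent-1's offer
  println(s"decline: $decline") // agent-2's small offer is declined
}
```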

Slide 40

Slide 40 text

Key facts ● partitioned, immutable, linearizable append-only log, ● CA system (can lose data during a partition), ● (a)synchronous replication (tunable), ● at-least-once delivery semantics, ● ZooKeeper for partition leader election, ● ISR (in-sync replica set) concept, ● relies heavily on OS caches.
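A minimal producer sketch against the Kafka Java client (the "new" producer API available in the 0.8.2-era clients) shows how the tunable replication above surfaces to applications through the acks setting. The broker address, topic name, key, and value are assumptions for illustration.

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

// Minimal sketch, assuming the kafka-clients producer:
// acks=all waits for the in-sync replica set (ISR), acks=1 only for the leader.
object ProducerSketch extends App {
  val props = new Properties()
  props.put("bootstrap.servers", "localhost:9092") // assumed broker address
  props.put("acks", "all")                         // durability vs. latency trade-off
  props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
  props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

  val producer = new KafkaProducer[String, String](props)
  // Records with the same key land in the same partition, preserving per-key order.
  producer.send(new ProducerRecord[String, String]("events", "user-42", "page-view")) // assumed topic
  producer.close() // flushes any buffered records before shutting down
}
```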