Data storage affects the way systems are modelled. This slide set gives a short justification of why the data storage and processing layer needs an overhaul to overcome its current limitations.
of immutable & absolute entities • have a fixed beginning and ending • music sheet = music essence; “Music’s NoSQL DB”
Music Making • is a process. You can record but not copy music. • flows with the rhythm • lives by the interactions • is uniquely determined in space and time: the Music’s context
of immutable & absolute entities • have a fixed beginning and ending • music sheet = music essence; “Music’s NoSQL DB”
Data Stores • store facts • facts are fixed and absolute • facts are uniquely determined by key / ID • Data Stores are the source of “truth” • contain what has happened
a Data Store gets de-contextualized. • You don’t get to know the origin of data but just the fact itself. • irrecoverable information loss! • There is a severe social impedance mismatch
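The information loss described above can be illustrated with a small sketch. It is not taken from the slides: the names `KeyValueStore`, `Event`, and `EventLog`, and the sample fields `origin` and `timestamp`, are hypothetical. The key-value store keeps only the latest fact per key, while the append-only log keeps every event together with its context.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class KeyValueStore:
    """Hypothetical data store: one fact per key, overwritten in place."""
    facts: dict = field(default_factory=dict)

    def put(self, key: str, value: Any) -> None:
        # The previous value and where it came from are discarded here.
        self.facts[key] = value


@dataclass(frozen=True)
class Event:
    """A fact plus its context (illustrative fields only)."""
    key: str
    value: Any
    origin: str
    timestamp: float


@dataclass
class EventLog:
    """Hypothetical append-only log: nothing is overwritten."""
    events: list = field(default_factory=list)

    def append(self, event: Event) -> None:
        self.events.append(event)

    def history(self, key: str) -> list:
        # All events for a key, in arrival order, with context intact.
        return [e for e in self.events if e.key == key]
```

With the store, two successive `put` calls on the same key leave a single value behind. With the log, `history(key)` still returns both events, including the `origin` of each.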
(Nimbus) • Workers run distributed & are supervised • Online State is persisted into ZooKeeper • Every component may fail
[Diagram labels: Nimbus, ZK (×3), Worker (×2)]
in-memory using Bolts • Continuously persist state into stable storage • Towards real-time context to every request
[Diagram labels: Spout, Consolidated Event-Stream, User (×3), Recommender, Trending Stuff / global stats, Anti-Spam]
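The pattern on this slide, keeping state in memory and continuously persisting it to stable storage, can be sketched in plain Python. This is not Storm's API (Storm bolts are typically written against its Java interfaces). The class `CountingBolt`, its `flush_every` parameter, and the JSON checkpoint file are assumptions made for illustration only.

```python
import json
import os
import tempfile
from collections import Counter


class CountingBolt:
    """Hypothetical bolt-like component: counts items in memory and
    periodically checkpoints the counts to a JSON file."""

    def __init__(self, path, flush_every=3):
        self.path = path
        self.flush_every = flush_every
        self.counts = Counter()
        self._since_flush = 0
        # Recover state from the last checkpoint, if one exists.
        if os.path.exists(path):
            with open(path) as f:
                self.counts.update(json.load(f))

    def process(self, item):
        # Update in-memory state first; persist every `flush_every` items.
        self.counts[item] += 1
        self._since_flush += 1
        if self._since_flush >= self.flush_every:
            self.checkpoint()

    def checkpoint(self):
        # Write to a temporary file, then rename it over the old checkpoint,
        # so a crash mid-write does not corrupt the previous state.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as f:
            json.dump(self.counts, f)
        os.replace(tmp, self.path)
        self._since_flush = 0

    def top(self, n=1):
        # Answer requests from memory, without touching storage.
        return self.counts.most_common(n)
```

A restarted instance pointed at the same file resumes from the last checkpoint. Items processed after that checkpoint are lost in this sketch, which is the trade-off that `flush_every` controls.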
Storm is a petri dish for real-time computation and coordination tasks • Topology changes require a stop-start cycle • There is no Pig Latin / Hive for Storm • Advanced topics are added with every release (e.g. transactional semantics)
data store won’t help you. • You have to add some magic to your stack. • Storm has the potential to become the Next Big Thing after Hadoop • Use Storm to fix the Social Impedance Mismatch Issue
Domain | Volume | Velocity
Queries | Pull, Run Once | Push, Run Continuously
Data | Historic | Live
Focus | Retrieval & Storage Format Efficiency | Throughput & Latency
Dataset Size | 10^9 | 10^6
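The difference between a pull query that runs once and a push query that runs continuously can be shown with a minimal sketch. The function `pull_count` and the class `ContinuousCount` are hypothetical names, and the record layout is assumed for illustration.

```python
def pull_count(dataset, predicate):
    """Pull / run once: scan a stored, historic dataset and return one answer."""
    return sum(1 for record in dataset if predicate(record))


class ContinuousCount:
    """Push / run continuously: the query is registered once, and every
    matching live event updates the result and notifies subscribers."""

    def __init__(self, predicate):
        self.predicate = predicate
        self.count = 0
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def on_event(self, record):
        if self.predicate(record):
            self.count += 1
            # Push the updated result instead of waiting to be asked.
            for callback in self.subscribers:
                callback(self.count)
```

In the pull case the caller decides when to ask and pays for a full scan each time. In the push case the result is maintained incrementally as events arrive, which is why throughput and latency become the main concerns.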