after 9 days, 6 Servers, 50 Drawings per second
• Week 6
  – 11M users, 90 Servers, 3K Drawings per second, 2B total drawings
  – Sold to Zynga for an estimated $210 million US dollars
• Guesstimate: 42 TB of drawing data after 6 weeks, or ~580GB per server
  – Assumption: 10KB average size of drawing data
Big Mobile Data – Big as in (ELASTIC?) SCALE
Source: http://www.mactalk.com.au/content/how-social-gaming-app-draw-something-blew-up-infographic-2245/
1ms
scale? “Big data is data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or doesn’t fit the strictures of your database architectures. To gain value from this data, you must choose an alternative way to process it.” Source: Edd Dumbill, Forbes - http://onforb.es/A6YE5o
round trip time rules out High Frequency Trading applications. Not on the critical path! Source: Me, former life @StreamBase • http://slidesha.re/guZOVe
million Black-Scholes-Merton put and call option prices
• GPU – calc 1M put & 1M call prices in a single batch, on a single host thread
• CPU – calc 1 put & 1 call in a single step on a single thread
• GPU wins w.r.t. Throughput hands down. CPU wins w.r.t. Latency hands down.
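For reference, the CPU case (1 put & 1 call in a single step) can be sketched with the textbook closed-form Black-Scholes-Merton formula for a European option with no dividends. This is a minimal illustration, not the benchmark code behind the slide; the function names and the sample inputs are assumptions made up for the example.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bsm_prices(spot: float, strike: float, rate: float, vol: float, expiry: float):
    """Return (call, put) for a European option, no dividends.

    rate and vol are annualised, expiry is in years.
    """
    sqrt_t = math.sqrt(expiry)
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol * vol) * expiry) / (vol * sqrt_t)
    d2 = d1 - vol * sqrt_t
    discount = math.exp(-rate * expiry)
    call = spot * norm_cdf(d1) - strike * discount * norm_cdf(d2)
    put = strike * discount * norm_cdf(-d2) - spot * norm_cdf(-d1)
    return call, put


# Illustrative inputs only (not taken from the slides).
call, put = bsm_prices(spot=100.0, strike=100.0, rate=0.05, vol=0.2, expiry=1.0)
print(f"call={call:.4f} put={put:.4f}")
```

One call prices one pair, which is the low-latency path. The GPU batch case evaluates the same arithmetic over a million input rows at once, trading latency for throughput.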
is a new technology for extracting information from distributed message-based systems”
Dr. David Luckham & Brian Frasca, Program Analysis and Verification Group, Computer Systems Lab, Stanford University, August 18, 1998
Source: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.56.876
4 aspects:
1. A DSL
2. Continuous Query
3. Windows
4. Combinators
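Three of the four aspects (continuous query, windows, combinators) can be sketched in a few lines of plain Python. This is only an illustration: a real CEP engine would express the same thing in its own DSL, and every name here (`sliding_window`, `where`, `moving_average_alerts`) is hypothetical.

```python
from collections import deque
from typing import Callable, Iterable, Iterator, List


def sliding_window(events: Iterable[float], size: int) -> Iterator[List[float]]:
    """Window: emit the last `size` events each time a new event arrives."""
    window = deque(maxlen=size)
    for event in events:
        window.append(event)
        if len(window) == size:
            yield list(window)


def where(predicate: Callable[[float], bool], stream: Iterable[float]) -> Iterator[float]:
    """Combinator: pass through only the items that satisfy the predicate."""
    return (item for item in stream if predicate(item))


def moving_average_alerts(prices: Iterable[float], size: int, threshold: float) -> Iterator[float]:
    """Continuous query: moving average over a sliding window, filtered by a threshold."""
    averages = (sum(w) / len(w) for w in sliding_window(prices, size))
    return where(lambda avg: avg > threshold, averages)


# Made-up input stream; results are produced lazily as events arrive.
alerts = list(moving_average_alerts([1.0, 2.0, 3.0, 4.0, 5.0], size=3, threshold=2.5))
print(alerts)
```

The query is registered once and evaluated on every arriving event, rather than being run on demand against stored data.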
Diagram labels: B, C, D; Data Sources; Publisher Services; Edge Publishers; Endpoints
Topics:
/Services/VideoOnDemand
/Services/LiveBroadcasts
/Services/NewsPress
/Services/NewsSocial
/Services/MarketData
/Services/MarketAnalytics
/Services/AnalystReports
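The topic paths above form a hierarchy, so a subscriber to a parent path can receive everything published beneath it. A minimal sketch of that idea follows. `TopicBroker` and its methods are hypothetical and are not the API of Diffusion or any other product.

```python
from typing import Callable, Dict, List

Callback = Callable[[str, str], None]


class TopicBroker:
    """Toy publish/subscribe broker over a hierarchic topic space."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callback]] = {}

    def subscribe(self, prefix: str, callback: Callback) -> None:
        """Subscribe to a topic and everything below it."""
        self._subscribers.setdefault(prefix.rstrip("/"), []).append(callback)

    def publish(self, topic: str, message: str) -> int:
        """Deliver to subscribers of the topic and of each ancestor; return the delivery count."""
        delivered = 0
        parts = topic.strip("/").split("/")
        for depth in range(1, len(parts) + 1):
            prefix = "/" + "/".join(parts[:depth])
            for callback in self._subscribers.get(prefix, []):
                callback(topic, message)
                delivered += 1
        return delivered


broker = TopicBroker()
received = []
broker.subscribe("/Services", lambda t, m: received.append(("all", t, m)))
broker.subscribe("/Services/MarketData", lambda t, m: received.append(("md", t, m)))

n_market = broker.publish("/Services/MarketData", "EURUSD tick")  # made-up payload
n_news = broker.publish("/Services/NewsPress", "headline")        # made-up payload
```

Matching on whole path segments rather than raw string prefixes keeps `/Services/MarketData` from accidentally matching `/Services/MarketAnalytics`.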
Pool
Security: Authorisation & Permissions
Client Management, Topic Management, Message Management, State Management
Publishers: System and user defined
System Monitoring, Web Service, User Services, Media Service
QoS, SLA, Telemetry, Compression, Throttling
Publish/Subscribe, Topic Aliasing, Hierarchic Topic Space, Snapshot, Delta
+ with: bang for the byte
- without: blah blah bloat
Legend: optional, essential
Groups: Runtime, Facilities, Features
FATty Skinny Big Data
• FAT data
  – Why not just send a patch?
• Slim data
  – Diffusion is data agnostic
  – JSON
  – Binary
  – Structured
    • Records & Fields
  – Unstructured
  – Video/Audio on Demand
  – Live Video/Audio
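"Why not just send a patch?" in code: send a full snapshot once, then only the fields that changed. The sketch below works on flat records (fields and values) and is an assumption-laden illustration, not Diffusion's wire format. `make_delta`, `apply_delta`, and the sample quote values are hypothetical.

```python
import json


def make_delta(old: dict, new: dict) -> dict:
    """Describe how to turn `old` into `new`: fields to set and fields to delete."""
    changed = {k: v for k, v in new.items() if k not in old or old[k] != v}
    removed = [k for k in old if k not in new]
    return {"set": changed, "del": removed}


def apply_delta(snapshot: dict, delta: dict) -> dict:
    """Return a new record with the delta applied; the input snapshot is not mutated."""
    result = dict(snapshot)
    result.update(delta["set"])
    for key in delta["del"]:
        result.pop(key, None)
    return result


# Made-up quote record: only the bid changes between the two states.
snapshot = {"symbol": "EURUSD", "bid": 1.3105, "ask": 1.3107, "bid_size": 5, "ask_size": 3}
latest = {"symbol": "EURUSD", "bid": 1.3106, "ask": 1.3107, "bid_size": 5, "ask_size": 3}

delta = make_delta(snapshot, latest)
rebuilt = apply_delta(snapshot, delta)

full_bytes = len(json.dumps(latest))
delta_bytes = len(json.dumps(delta))
print(f"full={full_bytes}B delta={delta_bytes}B")
```

The saving grows with record size and shrinks as more fields change per update, so a delta is worth sending only when it is actually smaller than the snapshot.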
Schemaless, KV
  – Integrity
  – MVCC based, ACID
  – Memcache (caching)
• Asynchronous persistence
• Working Set > Cache? Yes
• Tx? Not in the Encina XA sense.
• Clustering? Based on TAP protocol
• TAP can be (ab)used to stream CUD (create/update/delete) events
(BBO)
Diagram: Store & Forward MQ distribution of EURUSD FX BBO (best bid and offer).
Diagram labels: FX Provider; EURUSD FX Tiers, Tier 1 to Tier xN; hub (MQ: store, fwd) x3; Dist FX EURUSD; App Dist (www, mobi, int); Portal Dist; Internal Dist; clients: AJAX HTTP, iOS/Android Native, Java C++ .NET Native; fan-out x30, x30, x1
(BBO)
Diagram: KV Store / Data Grid + Continuous Query distribution of EURUSD FX BBO (best bid and offer). Flat ‘namespace’.
Diagram labels: FX Provider; EURUSD FX Tiers, Tier 1 to Tier xN; hop x3; Dist FX EURUSD; App Dist (www, mobi, int); Portal Dist; Internal Dist; clients: AJAX HTTP, iOS/Android Native, Java C++ .NET Native; fan-out x30, x30, x1
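A KV store with continuous query over a flat namespace can be sketched as follows: keys are plain strings, a query is a predicate over key and value, and matching writes are pushed to the subscriber. `ContinuousQueryStore`, the key naming scheme, and the prices are assumptions for illustration and do not correspond to any specific data grid product.

```python
from typing import Any, Callable, Dict, List, Tuple

Predicate = Callable[[str, Any], bool]
Listener = Callable[[str, Any], None]


class ContinuousQueryStore:
    """Toy key-value store that pushes matching writes to registered queries."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}
        self._queries: List[Tuple[Predicate, Listener]] = []

    def register(self, predicate: Predicate, listener: Listener) -> None:
        """Deliver current matches first (snapshot), then every future matching write."""
        for key, value in self._data.items():
            if predicate(key, value):
                listener(key, value)
        self._queries.append((predicate, listener))

    def put(self, key: str, value: Any) -> None:
        """Store the value and notify every query whose predicate matches."""
        self._data[key] = value
        for predicate, listener in self._queries:
            if predicate(key, value):
                listener(key, value)


store = ContinuousQueryStore()
store.put("FX.EURUSD.BBO", (1.3105, 1.3107))  # made-up bid/ask

seen = []
store.register(lambda k, v: k.startswith("FX.EURUSD"), lambda k, v: seen.append((k, v)))

store.put("FX.GBPUSD.BBO", (1.6001, 1.6003))  # does not match the query
store.put("FX.EURUSD.BBO", (1.3106, 1.3108))  # matches, pushed to the listener
```

Unlike the store-and-forward variant, the consumer here gets the current state on registration and does not depend on having seen every earlier message.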
Data smarts is all nuance/tradeoffs
Use REST for Resources
Stop RPCing Streams
Observe (Monitor)
Orient (Measure)
Decide (Just Do It)
Act (+Telemetry)
Smarts is about weighing the nuances, tradeoffs, advantages, and disadvantages of Bigness, Structure, Mobility …