users to become the publishers of their own daily newspapers.” Canonical use case: daily newspaper on an interest Feeds on a wide variety of “social media” sources @pyr Lead architect at Smallrivers, I like to build big stuff Long time involvement in distributed systems, scalability Recent FP and big data convert
Indices Articles found in sources, ordered by time of appearance Logs and events Events happening on a source Analytics View hits, contributor appearances
values use a serializer UTF-8 String UUID (TimeUUID) Long Composite BYO Keyspaces The equivalent of databases Column Families The equivalent of tables no fixed amount of columns in rows (wide rows) column metadata can exist
Property ONE Response from the closest replica QUORUM (Replication Factor / 2 ) + 1 replicas must agree ALL All replicas must agree Write consistency Property ANY The write reached one node ONE The write must have been succesfully performed on at least one of the replicas QUORUM (Replication Factor / 2 ) + 1 replicas must have must have succesfully performed the write ALL All replicas must have performed the write
the data you are handling Storing entities is sensitive Ensure a high consistency level, e.g: read and write at QUORUM Regular CF snapshots Storing events can sustain consistency mishaps writing at ONE should be sufficient
(peak) On average 200M per day associated social counters updated for analytics associated log event storage for scheduler input more than 3000 articles computed per second (peak) 600k paper editions per day each pulling from wide rows to filter, rank and output an edition