Microsoft Azure Meetup

Intro to Cassandra and deploying DataStax on Azure

Joel Jacobson

December 02, 2015

Transcript

  1. History of Cassandra

    • User defined functions, JSON, commit log compression, materialised views
    • Version 1.1: mixed SSD/spinning disk, self-tuning row cache, row-level isolation
    • Version 0.7/0.8/1.0: Cassandra Query Language, secondary indexes, live schema changes, compression, levelled compaction
    • Version 0.6: top-level Apache project, integrated caching, Hadoop M/R
    • 2008: developed at Facebook, used for the inbox search feature
    [Timeline markers: 2015, 2013, 2011, 2010, 2008]
  2. Geo replication

    • Multi datacenter
    • Highly available
    • No SPOF
    [Diagram: nodes n1 to n10 across two data centers, London and New York]
  3. Horizontal scale

    • More storage?
    • Compute power?
    [Diagram: nodes n1 to n10]
    [Chart: linear scaling, ops/s (up to 800k) against number of nodes (0 to 150)]
  4. Tuneable consistency: consistency vs latency

    [Diagram: write request, nodes n1 to n5]
    • Consistency level = Quorum
    • Replication factor = 3
    • Consistency levels: One, All, Quorum, Local Quorum, Each Quorum
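
The quorum setting on this slide can be worked through numerically. The sketch below assumes the usual majority rule (a quorum is more than half of the replicas) and a single data center. The function names are hypothetical and are not part of any Cassandra driver.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must acknowledge a QUORUM request: a strict majority."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return replication_factor // 2 + 1


def required_acks(level: str, replication_factor: int) -> int:
    """Acknowledgements needed for a single-data-center request at `level`."""
    levels = {
        "ONE": 1,
        "QUORUM": quorum(replication_factor),
        "ALL": replication_factor,
    }
    return levels[level.upper()]


def reads_see_latest_write(write_acks: int, read_acks: int,
                           replication_factor: int) -> bool:
    """True when every read replica set must overlap every write replica set."""
    return write_acks + read_acks > replication_factor


if __name__ == "__main__":
    rf = 3  # replication factor from the slide
    for level in ("ONE", "QUORUM", "ALL"):
        print(level, required_acks(level, rf))
```

With a replication factor of 3, a quorum write waits for 2 of the 3 replicas. Waiting for more acknowledgements gives stronger consistency at the cost of latency, which is the trade-off the slide names.
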
  5. Tuneable consistency: consistency vs availability

    [Diagram: read request, nodes n1 to n5]
    • Consistency level = Quorum
    • Replication factor = 3
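
The availability side of the trade-off can be illustrated by counting how many replicas may be down before a request at a given level can no longer be served. This is a minimal sketch for a single data center. `can_satisfy` and `tolerated_failures` are hypothetical helper names, not driver APIs.

```python
def quorum(replication_factor: int) -> int:
    """Strict majority of the replicas."""
    return replication_factor // 2 + 1


def can_satisfy(level: str, replication_factor: int, replicas_up: int) -> bool:
    """Whether enough replicas are alive to answer a request at `level`."""
    needed = {
        "ONE": 1,
        "QUORUM": quorum(replication_factor),
        "ALL": replication_factor,
    }[level.upper()]
    return replicas_up >= needed


def tolerated_failures(level: str, replication_factor: int) -> int:
    """Largest number of replicas that can be down with requests still served."""
    return max(
        down
        for down in range(replication_factor + 1)
        if can_satisfy(level, replication_factor, replication_factor - down)
    )


if __name__ == "__main__":
    for level in ("ONE", "QUORUM", "ALL"):
        print(level, "tolerates", tolerated_failures(level, 3), "replica(s) down")
```

For the slide's settings (quorum read, replication factor 3), one replica can be unavailable and the read still succeeds.
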
  6. Tuneable consistency: consistency vs availability

    [Diagram: nodes n1 to n10 across two data centers, London and New York]
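
With two data centers, the choice between Quorum, Local Quorum and Each Quorum decides whether requests survive the loss of a whole site. The sketch below assumes three replicas per data center (the figure used in the keyspace example later in the deck) and models only replica counts. `request_succeeds` is a hypothetical helper, not a driver call.

```python
from typing import Dict


def quorum(replicas: int) -> int:
    """Strict majority of the given replica count."""
    return replicas // 2 + 1


def request_succeeds(level: str, local_dc: str,
                     replication: Dict[str, int],
                     replicas_up: Dict[str, int]) -> bool:
    """Check whether enough replicas are alive for a multi-DC consistency level."""
    level = level.upper()
    if level == "LOCAL_QUORUM":
        # Majority of the replicas in the coordinator's own data center.
        return replicas_up.get(local_dc, 0) >= quorum(replication[local_dc])
    if level == "EACH_QUORUM":
        # Majority in every data center.
        return all(replicas_up.get(dc, 0) >= quorum(rf)
                   for dc, rf in replication.items())
    if level == "QUORUM":
        # Majority of all replicas, counted across data centers.
        alive = sum(replicas_up.get(dc, 0) for dc in replication)
        return alive >= quorum(sum(replication.values()))
    raise ValueError(f"unsupported consistency level: {level}")


if __name__ == "__main__":
    replication = {"London": 3, "New York": 3}
    new_york_down = {"London": 3, "New York": 0}
    for level in ("LOCAL_QUORUM", "QUORUM", "EACH_QUORUM"):
        print(level, request_succeeds(level, "London", replication, new_york_down))
```

In this model, losing New York entirely leaves Local Quorum requests in London working, while Quorum (4 of 6 replicas needed) and Each Quorum both fail.
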
  7. Cassandra keyspace

    • Tables reside in a keyspace
    • Replication
    • Single / multi DC

    CREATE KEYSPACE demo WITH REPLICATION = {'class': 'NetworkTopologyStrategy', 'London': 3, 'New York': 3};

    [Diagram: nodes n1 to n10 across two data centers, London and New York]
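
When the number of data centers varies between environments, the keyspace statement can be assembled from a mapping of data center name to replica count. The helper below is a hypothetical illustration that only builds the CQL string. It does not connect to a cluster, and the name validation is a simplifying assumption.

```python
import re
from typing import Dict


def build_create_keyspace(name: str, replicas_per_dc: Dict[str, int]) -> str:
    """Build a CREATE KEYSPACE statement using NetworkTopologyStrategy."""
    # Assumption: only plain unquoted identifiers are accepted as keyspace names.
    if not re.fullmatch(r"[A-Za-z][A-Za-z0-9_]*", name):
        raise ValueError(f"invalid keyspace name: {name!r}")
    if not replicas_per_dc:
        raise ValueError("at least one data center is required")

    options = ["'class': 'NetworkTopologyStrategy'"]
    for dc, replicas in replicas_per_dc.items():
        if replicas < 1:
            raise ValueError(f"replica count for {dc!r} must be at least 1")
        escaped = dc.replace("'", "''")  # escape single quotes inside the literal
        options.append(f"'{escaped}': {replicas}")

    return f"CREATE KEYSPACE {name} WITH REPLICATION = {{{', '.join(options)}}};"


if __name__ == "__main__":
    print(build_create_keyspace("demo", {"London": 3, "New York": 3}))
```

Called with the slide's values, the helper produces the same statement as the slide: three replicas in London and three in New York.
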
  8. Cassandra table

    • Denormalized
    • Query driven
    • Data types

    CREATE TABLE monthly_transactions_by_customer (
      cust_id text,
      date text,
      type text,
      time timestamp,
      cust_name text,
      location text,
      amount double,
      PRIMARY KEY ((cust_id, date), type, time)
    ) WITH CLUSTERING ORDER BY (type asc, time desc);
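
The primary key in this table does two jobs: `(cust_id, date)` selects the partition, and `type, time` set the row order inside it. The sketch below is an in-memory model of that layout, not how Cassandra stores data. `partition_rows` is a hypothetical helper, and the `YYYY-MM` format for `date` is an assumption suggested by the table name.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, List, Tuple

Row = Dict[str, object]


def partition_rows(rows: List[Row]) -> Dict[Tuple[str, str], List[Row]]:
    """Group rows by (cust_id, date), ordered by type ascending, time descending."""
    partitions: Dict[Tuple[str, str], List[Row]] = defaultdict(list)
    for row in rows:
        partitions[(row["cust_id"], row["date"])].append(row)
    for part in partitions.values():
        # Two stable sorts: secondary key first (time desc), then primary (type asc).
        part.sort(key=lambda r: r["time"], reverse=True)
        part.sort(key=lambda r: r["type"])
    return dict(partitions)


if __name__ == "__main__":
    rows = [
        {"cust_id": "c1", "date": "2015-12", "type": "debit",
         "time": datetime(2015, 12, 1, 10, 0), "amount": 20.0},
        {"cust_id": "c1", "date": "2015-12", "type": "credit",
         "time": datetime(2015, 12, 1, 11, 0), "amount": 50.0},
        {"cust_id": "c1", "date": "2015-12", "type": "debit",
         "time": datetime(2015, 12, 1, 12, 0), "amount": 5.0},
        {"cust_id": "c1", "date": "2015-11", "type": "debit",
         "time": datetime(2015, 11, 30, 9, 0), "amount": 7.5},
    ]
    for key, part in partition_rows(rows).items():
        print(key, [(r["type"], r["time"].isoformat()) for r in part])
```

A query for one customer and one month reads a single partition whose rows are already in the order the application wants, which is what "query driven" means for this table.
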
  9. Playlists & personalisation

    • Leading streaming music provider with 40M+ active monthly users
    • Over 1 billion playlists created and managed in real time
    • More than 40,000 requests/second
    • 500+ nodes in 4 data centers
  10. Profiles & recommendations

    • 95% of all Netflix data stored in DSE
    • Does 1 trillion transactions/day with DSE
    • Replaced Oracle in six data centers
  11. Internet of things

    • Connected Homes, a new business unit, handles IoT-based customer systems
    • Provides remote control over thermostats and boilers via smart phones/tablets
    • Delivers analytics on energy usage to customers
    • Will be using predictive analysis to forecast things like boiler failures
    • DSE for transactional data consumption and real-time analytics
  12. Cassandra in banking

    • Real time banking systems
    • Highly available architecture
    • No SPOF
    • Predictable scaling
    • Performance
  13. Operational cloud services

    • 100,000+ nodes
    • 10s of petabytes
    • Million ops/s
    • Largest single cluster 1000+
  14. Doesn't Microsoft sell a database?

    MS SQL Server
    • 1-10 nodes
    • Active / passive
    • High availability
    • Disaster recovery
    • Enterprise apps, Dynamics, etc.

    DataStax Enterprise
    • 10-10,000 nodes
    • Active / active
    • Geo-distributed
    • Continuous availability
    • Disaster avoidance
    • Web, mobile, IoT
  15. DataStax Enterprise: supported, certified, multi-workload

    [Diagram: DataStax Enterprise platform]
    • Offline application, operational application
    • OpsCenter services: monitoring, operations
    • Real time search, real time analytics, batch analytics
    • RDBMS, analytics transformations
    • Certified Apache Cassandra: no single point of failure, linear scalability, disaster avoidance
    • Security
    • In-memory