Cabs, Cassandra, and Hailo (at Cassandra EU 2013)

My talk from #CassandraEU covering Hailo's use of Cassandra including insight from developers, operations and management, plus lessons learned.

Dave Gardner

October 17, 2013

Transcript

  1. #CASSANDRAEU
    Cabs, Cassandra, and Hailo
    David Gardner, Architect at Hailo
    CASSANDRASUMMITEU

  4. 0.6 to 1.2
    •  1,352 changed files with 235,413 additions and 47,487 deletions
    •  7,429 commits
    •  1,653 tickets completed
    https://github.com/apache/cassandra/compare/cassandra-0.6.0...cassandra-1.2
    https://github.com/apache/cassandra/blob/trunk/CHANGES.txt

  5. What this talk is about
    Cassandra adoption at Hailo from three perspectives:
    1.  Development
    2.  Operational
    3.  Management

  6. What is Hailo?
    Hailo is The Taxi Magnet. Use Hailo to get a cab wherever you are, whenever you want.

  10. What is Hailo?
    •  The world’s highest-rated taxi app – over 11,000 five-star reviews
    •  Over 500,000 registered passengers
    •  A Hailo hail is accepted around the world every 4 seconds
    •  Hailo operates in 15 cities on 3 continents from Tokyo to Toronto in
    nearly 2 years of operation

  11. Hailo is growing
    •  Hailo is a marketplace that facilitates over $100M in run-rate
    transactions and is making the world a better place for passengers
    and drivers
    •  Hailo has raised over $50M in financing from the world's best
    investors including Union Square Ventures, Accel, the founder of
    Skype (via Atomico), Wellington Partners (Spotify), Sir Richard
    Branson, and our CEO's mother, Janice

  12. The history
    The story behind Cassandra adoption at Hailo

  13. Hailo launched in London in November 2011
    •  Launched on AWS
    •  Two PHP/MySQL web apps plus a Java backend
    •  Mostly built by a team of 3 or 4 backend engineers
    •  MySQL multi-master for single AZ resilience

  14. Why Cassandra?
    •  A desire for greater resilience – “become a utility”
    Cassandra is designed for high availability
    •  Plans for international expansion around a single consumer app
    Cassandra is good at global replication
    •  Expected growth
    Cassandra scales linearly for both reads and writes
    •  Prior experience
    I had experience with Cassandra and could recommend it

  15. The path to adoption
    •  Largely unilateral decision by developers – a result of a startup
    culture
    •  Replacement of key consumer app functionality, splitting up the
    PHP/MySQL web app into a mixture of global PHP/Java services
    backed by a Cassandra data store
    •  Launched into production in September 2012 – originally just
    powering North American expansion, before gradually switching
    over Dublin and London

  16. One year on...
    •  Further breakdown of functionality into Go/Java SOA
    •  Migrating all online databases to Cassandra

  17. Development perspective

  18. “Cassandra just works”
    Dom W, Senior Engineer

  19. Use cases
    1.  Entity storage
    2.  Time series data

  20. CF = customers
    126007613634425612:
      createdTimestamp: 1370465412
      email: [email protected]
      givenName: Dave
      familyName: Gardner
      locale: en_GB
      phone: +447911111111

  21. Considerations for entity storage
    •  Do not read the entire entity, update one property and then write
    back a mutation containing every column
    •  Only mutate columns that have been set
    •  This avoids read-before-write race conditions
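
One way to picture the entity-storage advice is an object that remembers which properties were actually set and emits a mutation containing only those columns. This is a minimal sketch in plain Python, not Hailo's code: the `Entity` class and its `set` and `mutation` methods are hypothetical names, and no Cassandra client is involved.

```python
class Entity:
    """Tracks which columns were set so a write touches only those columns."""

    def __init__(self, row_key, loaded=None):
        self.row_key = row_key
        self._columns = dict(loaded or {})  # values as last read from the store
        self._dirty = set()                 # names of columns set since loading

    def set(self, name, value):
        self._columns[name] = value
        self._dirty.add(name)

    def mutation(self):
        """Return (row_key, columns) containing only the columns that were set."""
        changed = {name: self._columns[name] for name in sorted(self._dirty)}
        return self.row_key, changed


# Row key and column names are taken from the customers example; values are illustrative.
customer = Entity("126007613634425612", {"givenName": "Dave", "locale": "en_GB"})
customer.set("locale", "en_US")
row_key, columns = customer.mutation()
print(row_key, columns)
```

Because the write carries only `locale`, a concurrent update to another column by a different process is not overwritten with a stale value.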

  23. CF = stats_db
    2013-06-01:
      55374fa0-ce2b-11e2-8b8b-0800200c9a66: {"action":"…
      a48bd800-ce2b-11e2-8b8b-0800200c9a66: {"action":"…
      b0e15850-ce2b-11e2-8b8b-0800200c9a66: {"action":"…
      bfac6c80-ce2b-11e2-8b8b-0800200c9a66: {"action":"…

  24. CF = stats_db
    LON123456:
      13b247f0-ce2c-11e2-8b8b-0800200c9a66: {"action":"…
      20f70a40-ce2c-11e2-8b8b-0800200c9a66: {"action":"…
      2b44d3b0-ce2c-11e2-8b8b-0800200c9a66: {"action":"…
      338a22f0-ce2c-11e2-8b8b-0800200c9a66: {"action":"…

  26. Considerations for time series storage
    •  Choose row key carefully, since this partitions the records
    •  Think about how many records you want in a single row
    •  Denormalise on write into many indexes
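
The time series considerations can be sketched as a function that turns one event into several writes, one per index row. This is an illustration under assumptions: the two row keys mirror the `stats_db` examples (a day bucket and a driver ID), the helper names `index_rows` and `writes_for` are hypothetical, and the JSON payload is a placeholder because the transcript elides the real one.

```python
import uuid
from datetime import datetime, timezone


def index_rows(event, when):
    """Return the row keys that one event is written under (one per index)."""
    day_bucket = when.strftime("%Y-%m-%d")  # one row per day bounds the row size
    return [day_bucket, event["driver"]]    # second row: all events for one driver


def writes_for(event, when, payload):
    """Build (row_key, column_name, value) writes, denormalised across indexes."""
    # uuid1() is time-based but uses the current clock; a real client would
    # derive the column name from the event time instead.
    column_name = uuid.uuid1()
    return [(row_key, column_name, payload) for row_key in index_rows(event, when)]


when = datetime(2013, 6, 1, 12, 0, tzinfo=timezone.utc)
writes = writes_for({"driver": "LON123456"}, when, '{"action": "example"}')
for row_key, column_name, value in writes:
    print(row_key, column_name, value)
```

The same column is written twice, so reads by day and reads by driver each hit a single row, at the cost of extra writes.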

  27. Client libraries
    •  Gossie (Go)
    •  Astyanax (Java)
    •  phpcassa (PHP)

  28. Analytics
    •  With Cassandra we lost the ability to carry out analytics,
    e.g. COUNT, SUM, AVG, GROUP BY
    •  We use Acunu Analytics to give us this ability in real time, for
    pre-planned query templates
    •  It is backed by Cassandra and therefore highly available, resilient
    and globally distributed
    •  Integration is straightforward

  29. Diagram: events → NSQ → Acunu → C*

  30. AQL
    SELECT
    SUM(accepted),
    SUM(ignored),
    SUM(declined),
    SUM(withdrawn)
    FROM Allocations
    WHERE timestamp BETWEEN '1 week ago' AND 'now'
    AND driver='LON123456789'
    GROUP BY timestamp(day)
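
To make the shape of that AQL query concrete, the sketch below computes the same kind of per-day sums over a small in-memory list. It is not how Acunu evaluates the query: the event dictionaries, the second driver ID and the `daily_sums` helper are assumptions for illustration, and the one-week time filter is left out for brevity.

```python
from collections import defaultdict
from datetime import datetime, timezone

OUTCOMES = ("accepted", "ignored", "declined", "withdrawn")


def daily_sums(events, driver):
    """Sum each outcome counter per UTC day for one driver."""
    totals = defaultdict(lambda: dict.fromkeys(OUTCOMES, 0))
    for event in events:
        if event["driver"] != driver:
            continue  # equivalent of the driver = '...' predicate
        moment = datetime.fromtimestamp(event["timestamp"], tz=timezone.utc)
        day = moment.date().isoformat()  # equivalent of GROUP BY timestamp(day)
        for outcome in OUTCOMES:
            totals[day][outcome] += event.get(outcome, 0)
    return dict(totals)


events = [
    {"driver": "LON123456789", "timestamp": 1370465412, "accepted": 1},
    {"driver": "LON123456789", "timestamp": 1370469012, "declined": 1},
    {"driver": "LON000000001", "timestamp": 1370465412, "accepted": 1},
]
result = daily_sums(events, "LON123456789")
print(result)
```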

  32. Operational perspective

  33. “Allows a team of 2 to achieve things they wouldn’t
    have considered before Cassandra existed”
    Chris H, Operations Engineer

  35. 3 clusters
    6 machines per region
    3 regions
    (stats cluster is a long story)
    Operational Cluster: ap-southeast-1, us-east-1, eu-west-1
    Stats Cluster: us-east-1, eu-west-1

  36. Diagram (node placement by region and AZ)
    eu-west-1:       AZ1 AZ1 | AZ2 AZ2 | AZ3 AZ3
    us-east-1:       AZ1 AZ1 | AZ2 AZ2 | AZ3 AZ3
    ap-southeast-1:  AZ1 AZ1 | AZ2 AZ2 | AZ3 AZ3

  37. AWS VPCs with OpenVPN links
    3 AZs per region
    m1.large machines
    Provisioned IOPS EBS
    Operational Cluster
    Stats Cluster
    ~ 1TB/node
    ~ 200GB/node

  38. Backups
    •  SSTable snapshot
    •  Used to upload to S3, but this was taking >6 hours and consuming
    all our network bandwidth
    •  Now take EBS snapshot of the data volumes

  39. Encryption
    •  Requirement for NYC launch
    •  We use dm-crypt to encrypt the entire EBS volume
    •  Chose dm-crypt because it is uncomplicated
    •  Our tests show a 1% hit in disk performance, which
    concurs with what Amazon suggest

  40. DataStax OpsCenter is a quick win

  41. Multi DC
    •  Something that Cassandra makes trivial
    •  Would have been very difficult to accomplish active-active inter-DC
    replication with a team of 2 without Cassandra
    •  Rolling repair needed to make it safe (we use LOCAL_QUORUM)
    •  We schedule “narrow repairs” on different nodes in our cluster each
    night
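
Two pieces of the multi-DC setup lend themselves to a small sketch: how many replicas LOCAL_QUORUM waits for, and how a nightly rotation can spread repairs across nodes. The transcript does not say how a “narrow repair” is invoked or what replication factor is used, so the code only decides whose turn it is. The node names, the start date and the replication factor of 3 are assumptions.

```python
from datetime import date


def local_quorum(replicas_in_dc):
    """Replicas that must respond in the local data centre: floor(RF / 2) + 1."""
    return replicas_in_dc // 2 + 1


def node_for_night(nodes, night):
    """Pick one node per night, cycling through the list in order."""
    return nodes[night.toordinal() % len(nodes)]


# Six machines per region; the names are hypothetical.
nodes = ["node-%d" % i for i in range(1, 7)]
first = date(2013, 10, 17)
schedule = [
    node_for_night(nodes, date.fromordinal(first.toordinal() + offset))
    for offset in range(6)
]
print(local_quorum(3), schedule)
```

Over any six consecutive nights every node is picked exactly once, so no single night carries the repair load for the whole cluster.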

  42. Compression
    •  Our stats cluster was running at ~1.5TB per node
    •  We didn’t want to add more nodes
    •  With compression, we are now back to ~600GB
    •  Easy to accomplish
    •  `nodetool upgradesstables` on a rolling schedule
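
As a quick check on the compression figures, the arithmetic below works out the implied ratio. It assumes 1TB = 1000GB and treats both sizes as the approximate values quoted, so the result is indicative only.

```python
before_gb = 1500  # ~1.5TB per node before compression
after_gb = 600    # ~600GB per node after compression

ratio = after_gb / before_gb  # fraction of the original size that remains
saving = 1 - ratio            # fraction of disk space recovered
print(f"compressed size is {ratio:.0%} of the original, a {saving:.0%} saving")
```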

  43. Management perspective

  44. “The days of the quick and dirty are over”
    Simon V, EVP Operations

  45. Technically, everything is fine…
    •  Our COO feels that C* is “technically good and beautiful”, a
    “perfectly good option”
    •  Our EVPO says that C* reminds him of a time series database in
    use at Goldman Sachs that had “very good performance”
    …but there are concerns

  46. Diagram: “People who can attempt to query MySQL” vs. “People who can attempt to query Cassandra”

  48. Lessons learned

  49. There might be a gulf in experience

  50. Chart: average years of experience per team member, MySQL vs. Cassandra (value shown: 10)

  51. Lessons learned
    •  Have an advocate – get someone who will sell the vision internally
    •  Learn the theory – teach each team member the fundamentals
    •  Make an effort to get everyone on board

  52. Things can drift into failure

  58. Lessons learned
    •  Be proactive with Cassandra, even if it seems to be running
    smoothly
    •  Peer-review data models, take time to think about them
    •  Big rows are bad – use cfstats to look for them
    •  Mixed workloads can cause problems – use cfhistograms and look
    out for signs of data modelling problems
    •  Think about the compaction strategy for each CF

  59. EBS is terrible

  60. Lessons learned
    •  EBS is nearly always the cause of Amazon outages
    •  EBS is a single point of failure (it will fail everywhere in your cluster)
    •  EBS is slow
    •  EBS is expensive
    •  EBS is unnecessary!

  61. Management need to know the trade-offs

  62. Lessons learned
    •  Keep the business informed – explain the trade-offs in simple terms
    •  Sing from the same hymn sheet
    •  Make sure there are solutions in place for every use case from the
    beginning

  63. Diagram: “People who can attempt to query MySQL” vs. “People who can attempt to query Cassandra”

  64. Conclusions

  65. We like Cassandra
    •  Solid design
    •  HA characteristics
    •  Easy multi-DC setup
    •  Simplicity of operation

  66. Lessons for successful adoption
    •  Have an advocate, sell the dream
    •  Learn the fundamentals, get the best out of Cassandra
    •  Invest in tools to make life easier
    •  Keep management in the loop, explain the trade-offs

  67. The future
    •  We will continue to invest in Cassandra as we expand globally
    •  We will hire people with experience running Cassandra
    •  We will focus on expanding our reporting facilities
    •  We aspire to extend our network (1M consumer installs, wallet)
    beyond cabs
    •  We will continue to hire the best engineers in London, NYC and Asia

  68. Questions?