Cabs, Cassandra, and Hailo (at Cassandra EU 2013)

Cabs, Cassandra, and Hailo
David Gardner, Architect at Hailo
#CASSANDRAEU CASSANDRASUMMITEU

0.6 to 1.2
•  1,352 changed files with 235,413 additions and 47,487 deletions
•  7,429 commits
•  1,653 tickets completed
https://github.com/apache/cassandra/compare/cassandra-0.6.0...cassandra-1.2
https://github.com/apache/cassandra/blob/trunk/CHANGES.txt

What this talk is about
Cassandra adoption at Hailo from three perspectives:
1.  Development
2.  Operational
3.  Management

What is Hailo?
Hailo is The Taxi Magnet. Use Hailo to get a cab wherever you are, whenever you want.

What is Hailo?
•  The world’s highest-rated taxi app – over 11,000 five-star reviews
•  Over 500,000 registered passengers
•  A Hailo hail is accepted around the world every 4 seconds
•  Hailo operates in 15 cities on 3 continents from Tokyo to Toronto in nearly 2 years of operation

Hailo is growing
•  Hailo is a marketplace that facilitates over $100M in run-rate transactions and is making the world a better place for passengers and drivers
•  Hailo has raised over $50M in financing from the world's best investors including Union Square Ventures, Accel, the founder of Skype (via Atomico), Wellington Partners (Spotify), Sir Richard Branson, and our CEO's mother, Janice

The history
The story behind Cassandra adoption at Hailo

Hailo launched in London in November 2011
•  Launched on AWS
•  Two PHP/MySQL web apps plus a Java backend
•  Mostly built by a team of 3 or 4 backend engineers
•  MySQL multi-master for single AZ resilience

Why Cassandra?
•  A desire for greater resilience – “become a utility”
   Cassandra is designed for high availability
•  Plans for international expansion around a single consumer app
   Cassandra is good at global replication
•  Expected growth
   Cassandra scales linearly for both reads and writes
•  Prior experience
   I had experience with Cassandra and could recommend it

The path to adoption
•  Largely unilateral decision by developers – a result of a startup culture
•  Replacement of key consumer app functionality, splitting up the PHP/MySQL web app into a mixture of global PHP/Java services backed by a Cassandra data store
•  Launched into production in September 2012 – originally just powering North American expansion, before gradually switching over Dublin and London

One year on...
•  Further breakdown of functionality into Go/Java SOA
•  Migrating all online databases to Cassandra

Development perspective

“Cassandra just works”
Dom W, Senior Engineer

Use cases
1.  Entity storage
2.  Time series data

CF = customers
126007613634425612:
    createdTimestamp: 1370465412
    email: [email protected]
    givenName: Dave
    familyName: Gardner
    locale: en_GB
    phone: +447911111111

Considerations for entity storage
•  Do not read the entire entity, update one property and then write back a mutation containing every column
•  Only mutate columns that have been set
•  This avoids read-before-write race conditions
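
The entity-storage advice can be illustrated with a minimal in-memory sketch. It uses a plain Python dict as a stand-in for one row of the customers CF; the helper `write_columns` and the two writers are hypothetical and do not come from the talk or from any Cassandra client library. It only shows why writing back every column from a stale read loses updates, while writing just the columns that were set does not.

```python
# In-memory stand-in for one row of the customers CF (column name -> value).
ROW = {"givenName": "Dave", "familyName": "Gardner", "locale": "en_GB"}


def write_columns(stored, mutation):
    """Apply a mutation: only the columns named in it are overwritten."""
    stored.update(mutation)


# Anti-pattern: both writers read the whole entity, change one property,
# then write every column back.
full = dict(ROW)
read_a = dict(full)              # writer A reads the entity
read_b = dict(full)              # writer B reads the same (soon stale) entity
read_a["locale"] = "en_US"
read_b["givenName"] = "David"
write_columns(full, read_a)
write_columns(full, read_b)      # B's stale locale overwrites A's change

# Preferred: each writer sends only the columns it has set, with no read.
partial = dict(ROW)
write_columns(partial, {"locale": "en_US"})
write_columns(partial, {"givenName": "David"})

print("full write-back:", full)
print("partial mutation:", partial)
```

In the first case writer A's locale change is silently lost; in the second both changes survive because neither mutation touches the other's column.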

CF = stats_db
2013-06-01:
    55374fa0-ce2b-11e2-8b8b-0800200c9a66: {"action":"…
    a48bd800-ce2b-11e2-8b8b-0800200c9a66: {"action":"…
    b0e15850-ce2b-11e2-8b8b-0800200c9a66: {"action":"…
    bfac6c80-ce2b-11e2-8b8b-0800200c9a66: {"action":"…

CF = stats_db
LON123456:
    13b247f0-ce2c-11e2-8b8b-0800200c9a66: {"action":"…
    20f70a40-ce2c-11e2-8b8b-0800200c9a66: {"action":"…
    2b44d3b0-ce2c-11e2-8b8b-0800200c9a66: {"action":"…
    338a22f0-ce2c-11e2-8b8b-0800200c9a66: {"action":"…

Considerations for time series storage
•  Choose row key carefully, since this partitions the records
•  Think about how many records you want in a single row
•  Denormalise on write into many indexes
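
A rough sketch of "denormalise on write" follows, assuming the two row-key schemes seen in the stats_db examples: one row per calendar day and one row per driver. The function names, the tuple layout and the choice of UTC day buckets are illustrative assumptions, not Hailo's actual code. Each event is turned into one write per index row, with the event's time-based UUID as the column name.

```python
import uuid
from datetime import datetime, timezone


def index_row_keys(event_time, driver_id):
    """Return the row keys an event is filed under.

    Assumption: one index row per calendar day (UTC) and one per driver.
    Finer buckets (for example per hour) would keep rows smaller.
    """
    day_bucket = event_time.astimezone(timezone.utc).strftime("%Y-%m-%d")
    return [day_bucket, driver_id]


def denormalised_writes(event_id, event_time, driver_id, payload):
    """Build one (row key, column name, value) write per index row."""
    return [
        (row_key, event_id, payload)
        for row_key in index_row_keys(event_time, driver_id)
    ]


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    # uuid1 is time-based, like the column names on the slides.
    for write in denormalised_writes(str(uuid.uuid1()), now, "LON123456", "{}"):
        print(write)
```

The bucket size is the lever for "how many records you want in a single row": a busy system filing everything under one day key may want a narrower bucket.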

Client libraries
•  Gossie (Go)
•  Astyanax (Java)
•  phpcassa (PHP)

Analytics
•  With Cassandra we lost the ability to carry out analytics, e.g. COUNT, SUM, AVG, GROUP BY
•  We use Acunu Analytics to give us this ability in real time, for pre-planned query templates
•  It is backed by Cassandra and therefore highly available, resilient and globally distributed
•  Integration is straightforward

[Diagram: events, NSQ, Acunu, C*]

AQL
SELECT SUM(accepted), SUM(ignored), SUM(declined), SUM(withdrawn)
FROM Allocations
WHERE timestamp BETWEEN '1 week ago' AND 'now'
  AND driver='LON123456789'
GROUP BY timestamp(day)
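
To make clear what this query asks for, here is an in-memory Python equivalent. It is only a sketch of the aggregation's meaning, not of how Acunu Analytics computes it: the event dictionaries, the function name `sum_by_day` and the inclusive treatment of the time range are assumptions made for illustration. The field names are taken from the query.

```python
from collections import defaultdict
from datetime import datetime, timezone

OUTCOMES = ("accepted", "ignored", "declined", "withdrawn")


def sum_by_day(events, driver, start, end):
    """Sum each outcome per calendar day for one driver within [start, end]."""
    totals = defaultdict(lambda: dict.fromkeys(OUTCOMES, 0))
    for event in events:
        if event["driver"] != driver:
            continue
        if not (start <= event["timestamp"] <= end):
            continue
        day = event["timestamp"].date().isoformat()
        for name in OUTCOMES:
            # A missing outcome counts as zero for that event.
            totals[day][name] += event.get(name, 0)
    return dict(totals)
```

Scanning raw events like this on every request is exactly what a pre-planned query template avoids: the totals are maintained as events arrive, so a read becomes a lookup.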

Operational perspective

“Allows a team of 2 to achieve things they wouldn’t have considered before Cassandra existed”
Chris H, Operations Engineer

3 clusters
6 machines per region
3 regions (stats cluster is a long story)
Operational Cluster: ap-southeast-1, us-east-1, eu-west-1
Stats Cluster: us-east-1, eu-west-1

[Diagram: eu-west-1, us-east-1 and ap-southeast-1, each with two nodes in each of AZ1, AZ2 and AZ3]

AWS VPCs with Open VPN links
3 AZs per region
m1.large machines
Provisioned IOPS EBS
Operational Cluster   Stats Cluster   ~ 1TB/node   ~ 200GB/node

Backups
•  SSTable snapshot
•  Used to upload to S3, but this was taking >6 hours and consuming all our network bandwidth
•  Now take EBS snapshot of the data volumes

Encryption
•  Requirement for NYC launch
•  We use dmcrypt to encrypt the entire EBS volume
•  Chose dmcrypt because it is uncomplicated
•  Our tests show a 1% performance hit in disk performance, which concurs with what Amazon suggest

Datastax Ops Centre is a quick win

Multi DC
•  Something that Cassandra makes trivial
•  Would have been very difficult to accomplish active-active inter-DC replication with a team of 2 without Cassandra
•  Rolling repair needed to make it safe (we use LOCAL_QUORUM)
•  We schedule “narrow repairs” on different nodes in our cluster each night
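
One way a nightly rotation like this could be scripted is sketched below. The talk does not say how Hailo schedules its repairs or exactly what "narrow" means; this sketch assumes it refers to primary-range repair (`nodetool repair -pr`) and that one node is repaired per night. The node names and both helper functions are hypothetical.

```python
from datetime import date


def node_for_night(nodes, night):
    """Pick one node per night, cycling through the list in order."""
    return nodes[night.toordinal() % len(nodes)]


def repair_command(host):
    """Build the repair command for one node.

    Assumption: "narrow repair" is read here as primary-range repair (-pr),
    so each node repairs only the token range it is primary for.
    """
    return ["nodetool", "-h", host, "repair", "-pr"]


if __name__ == "__main__":
    # Hypothetical host names for a six-node region.
    nodes = ["node-a", "node-b", "node-c", "node-d", "node-e", "node-f"]
    tonight = node_for_night(nodes, date.today())
    print(" ".join(repair_command(tonight)))
```

With six nodes, every node is visited once every six nights, which is worth checking against the cluster's gc_grace_seconds when choosing how many nodes to repair per night.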

Compression
•  Our stats cluster was running at ~1.5TB per node
•  We didn’t want to add more nodes
•  With compression, we are now back to ~600GB
•  Easy to accomplish
•  `nodetool upgradesstables` on a rolling schedule

Management perspective

“The days of the quick and dirty are over”
Simon V, EVP Operations

Technically, everything is fine…
•  Our COO feels that C* is “technically good and beautiful”, a “perfectly good option”
•  Our EVPO says that C* reminds him of a time series database in use at Goldman Sachs that had “very good performance”
…but there are concerns

[Diagram: people who can attempt to query MySQL vs. people who can attempt to query Cassandra]

Lessons learned

There might be a gulf in experience

[Chart: average years' experience per team member, MySQL vs. Cassandra; axis label: 10]

Lesson learned
•  Have an advocate - get someone who will sell the vision internally
•  Learn the theory - teach each team member the fundamentals
•  Make an effort to get everyone on board

Things can drift into failure

Lesson learned
•  Be pro-active with Cassandra, even if it seems to be running smoothly
•  Peer-review data models, take time to think about them
•  Big rows are bad - use cfstats to look for them
•  Mixed workloads can cause problems - use cfhistograms and look out for signs of data modeling problems
•  Think about the compaction strategy for each CF

EBS is terrible

Lessons learned
•  EBS is nearly always the cause of Amazon outages
•  EBS is a single point of failure (it will fail everywhere in your cluster)
•  EBS is slow
•  EBS is expensive
•  EBS is unnecessary!

Management need to know the trade offs

Lessons learned
•  Keep the business informed – explain the tradeoffs in simple terms
•  Sing from the same hymn sheet
•  Make sure there are solutions in place for every use case from the beginning

[Diagram: people who can attempt to query MySQL vs. people who can attempt to query Cassandra]

Conclusions

We like Cassandra
•  Solid design
•  HA characteristics
•  Easy multi-DC setup
•  Simplicity of operation

Lessons for successful adoption
•  Have an advocate, sell the dream
•  Learn the fundamentals, get the best out of Cassandra
•  Invest in tools to make life easier
•  Keep management in the loop, explain the trade offs

The future
•  We will continue to invest in Cassandra as we expand globally
•  We will hire people with experience running Cassandra
•  We will focus on expanding our reporting facilities
•  We aspire to extend our network (1M consumer installs, wallet) beyond cabs
•  We will continue to hire the best engineers in London, NYC and Asia

Questions?
