What is Hailo?
11,000 five-star reviews • Over 500,000 registered passengers • A Hailo hail is accepted around the world every 4 seconds • Hailo operates in 15 cities on 3 continents from Tokyo to Toronto in nearly 2 years of operation
Hailo is growing
$100M in run-rate transactions and is making the world a better place for passengers and drivers • Hailo has raised over $50M in financing from the world's best investors including Union Square Ventures, Accel, the founder of Skype (via Atomico), Wellington Partners (Spotify), Sir Richard Branson, and our CEO's mother, Janice
Launched on AWS • Two PHP/MySQL web apps plus a Java backend • Mostly built by a team of 3 or 4 backend engineers • MySQL multi-master for single AZ resilience
– “become a utility”: Cassandra is designed for high availability • Plans for international expansion around a single consumer app: Cassandra is good at global replication • Expected growth: Cassandra scales linearly for both reads and writes • Prior experience: I had experience with Cassandra and could recommend it
by developers – a result of a startup culture • Replacement of key consumer app functionality, splitting up the PHP/MySQL web app into a mixture of global PHP/Java services backed by a Cassandra data store • Launched into production in September 2012 – originally just powering North American expansion, before gradually switching over Dublin and London
the entire entity, update one property and then write back a mutation containing every column • Only mutate columns that have been set • This avoids read-before-write race conditions
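The "only mutate columns that have been set" point can be sketched as a small dirty-tracking entity. This is a minimal illustration in plain Python, not Hailo's actual code and not a Cassandra client API: the `Entity` class, its method names, and the sample row key are all hypothetical.

```python
class Entity:
    """Hypothetical entity that remembers which columns were explicitly set."""

    def __init__(self, row_key, columns=None):
        self.row_key = row_key
        self._columns = dict(columns or {})  # last known column values
        self._dirty = set()                  # names of columns set since the last save

    def set(self, name, value):
        # Record the new value and mark the column as changed.
        self._columns[name] = value
        self._dirty.add(name)

    def mutation(self):
        # Build a mutation containing only the columns that were set,
        # instead of writing back every column that was read earlier.
        return {name: self._columns[name] for name in sorted(self._dirty)}

    def mark_saved(self):
        # Call after the write succeeds so the next mutation starts empty.
        self._dirty.clear()


# Usage: two columns were loaded, but only one is written back.
customer = Entity("customer:42", {"name": "Ann", "email": "ann@example.com"})
customer.set("email", "ann@new.example.com")
partial = customer.mutation()
```

Because the mutation leaves `name` out, a concurrent writer that changed `name` in the meantime is not overwritten with a stale value.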
to carry out analytics, e.g. COUNT, SUM, AVG, GROUP BY • We use Acunu Analytics to give us this ability in real time, for pre-planned query templates • It is backed by Cassandra and therefore highly available, resilient and globally distributed • Integration is straightforward
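One way to picture a pre-planned query template is as an aggregate that is updated as each event arrives, so that COUNT, SUM and AVG per group are already available at read time. The sketch below is a generic in-memory illustration of that idea only; it does not use or represent the Acunu Analytics API, and the class, field names and sample events are assumptions.

```python
from collections import defaultdict


class PreplannedAggregate:
    """Hypothetical running aggregate for one fixed GROUP BY template."""

    def __init__(self, group_by, value_field):
        self.group_by = group_by        # event field to group on
        self.value_field = value_field  # numeric event field to aggregate
        self._count = defaultdict(int)
        self._sum = defaultdict(float)

    def ingest(self, event):
        # Update the running totals at write time.
        key = event[self.group_by]
        self._count[key] += 1
        self._sum[key] += event[self.value_field]

    def result(self, key):
        # Reads are lookups; AVG is derived from the stored SUM and COUNT.
        count = self._count.get(key, 0)
        total = self._sum.get(key, 0.0)
        avg = total / count if count else None
        return {"count": count, "sum": total, "avg": avg}


# Usage with made-up events: fares grouped by city.
fares_by_city = PreplannedAggregate(group_by="city", value_field="fare")
for event in [
    {"city": "London", "fare": 10.0},
    {"city": "London", "fare": 20.0},
    {"city": "Dublin", "fare": 8.0},
]:
    fares_by_city.ingest(event)
london = fares_by_city.result("London")
```

The trade-off is that only queries planned in advance can be answered this way, which matches the "pre-planned query templates" wording above.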
use dmcrypt to encrypt the entire EBS volume • Chose dmcrypt because it is uncomplicated • Our tests show a 1% performance hit in disk performance, which concurs with what Amazon suggest
• Would have been very difficult to accomplish active-active inter-DC replication with a team of 2 without Cassandra • Rolling repair needed to make it safe (we use LOCAL_QUORUM) • We schedule “narrow repairs” on different nodes in our cluster each night
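Scheduling a repair on a different node each night can be sketched as a simple round-robin over the node list. This is an assumed scheduling scheme for illustration, not a description of Hailo's tooling: the node names, the `epoch` date and the function name are hypothetical, and the repair command itself is deliberately left out.

```python
from datetime import date


def node_for_night(nodes, night, epoch=date(2013, 1, 1)):
    """Pick the node to repair on a given night, cycling through all nodes.

    `epoch` is an arbitrary reference date (an assumption) on which the
    first node in the list is repaired.
    """
    days_since_epoch = (night - epoch).days
    return nodes[days_since_epoch % len(nodes)]


# Usage with made-up node names: a three-node cluster is fully covered every three nights.
cluster = ["node-a", "node-b", "node-c"]
first_night = node_for_night(cluster, date(2013, 1, 1))
second_night = node_for_night(cluster, date(2013, 1, 2))
fourth_night = node_for_night(cluster, date(2013, 1, 4))
```

With a rotation like this, the time for a full pass over the cluster grows with the number of nodes, which is worth checking against the tombstone garbage-collection window of each column family.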
~1.5TB per node • We didn’t want to add more nodes • With compression, we are now back to ~600GB • Easy to accomplish • `nodetool upgradesstables` on a rolling schedule
that C* is “technically good and beautiful”, a “perfectly good option” • Our EVPO says that C* reminds him of a time series database in use at Goldman Sachs that had “very good performance” …but there are concerns
if it seems to be running smoothly • Peer-review data models, take time to think about them • Big rows are bad - use cfstats to look for them • Mixed workloads can cause problems - use cfhistograms and look out for signs of data modeling problems • Think about the compaction strategy for each CF
cause of Amazon outages • EBS is a single point of failure (it will fail everywhere in your cluster) • EBS is slow • EBS is expensive • EBS is unnecessary!
sell the dream • Learn the fundamentals, get the best out of Cassandra • Invest in tools to make life easier • Keep management in the loop, explain the trade-offs
in Cassandra as we expand globally • We will hire people with experience running Cassandra • We will focus on expanding our reporting facilities • We aspire to extend our network (1M consumer installs, wallet) beyond cabs • We will continue to hire the best engineers in London, NYC and Asia