How Hailo fuels its growth using NoSQL Storage and Analytics

Hailo is building the world's best taxi app -- we're already in 9 cities worldwide, have 300,000 registered passengers, and are growing (30%+) every month. Of course, that presents a serious infrastructure challenge.
I'll explain how we've built our service around tools that share three key NoSQL characteristics -- they're all distributed, resilient and operationally simple. The goals we set ourselves were to make it easy to replicate our architecture as we launch in new cities and to scale as we grow in each city, all while coordinating that setup in a straightforward way.

Dave Gardner

August 22, 2013


  1. How Hailo fuels its growth using NoSQL Storage and Analytics

    David Gardner, Architect @ Hailo #NoSQLNow
  2. What is Hailo?

    •  The world’s highest-rated taxi app – over 10,000 five-star reviews
    •  Over 500,000 registered passengers
    •  A Hailo e-hail is accepted by a driver every four seconds around the world
    •  Hailo operates in ten cities from Tokyo to Toronto in just over eighteen months of operation
  3. Hailo is growing

    •  Hailo is a marketplace that facilitates over $100M in run-rate transactions and is making the world a better place for passengers and drivers
    •  Hailo has raised over $50M in financing from the world's best investors including Union Square Ventures, Accel, the founder of Skype (via Atomico), Wellington Partners (Spotify), Sir Richard Branson, and our CEO's mother, Janice
  4. What this talk is about

    •  Why Hailo are using NoSQL
    •  How we use Cassandra
    •  How we use Acunu Analytics
    •  Challenges of NoSQL
  5. “NoSQL DBs trade off traditional features to better support new and emerging use cases”

    Andy Gross, Riak
    http://www.slideshare.net/argv0/riak-use-cases-dissecting-the-solutions-to-hard-problems
  6. What are we trading off?

    •  More widely used, tested and documented software
    •  Ad-hoc querying
    •  Talent pool with direct experience
  7. Hailo launched in London in November 2011

    •  Launched on AWS
    •  Two PHP/MySQL web apps plus a Java backend
    •  Mostly built by a team of 3 or 4 backend engineers
    •  MySQL multi-master for single AZ resilience
  8. Why Cassandra?

    •  A desire for greater resilience – “become a utility”
       Cassandra is designed for high availability
    •  Plans for international expansion around a single consumer app
       Cassandra is good at global replication
    •  Expected growth
       Cassandra scales linearly for both reads and writes
    •  Prior experience
       I had experience with Cassandra and could recommend it
  9. The path to adoption

    •  Largely unilateral decision by developers – a result of a startup culture
    •  Replacement of key consumer app functionality, splitting up the PHP/MySQL web app into a mixture of global PHP/Java services backed by a Cassandra data store
    •  Launched into production in September 2012 – originally just powering North American expansion, before gradually switching over Dublin and London
  10. Considerations for entity storage

    •  Do not read the entire entity, update one property and then write back a mutation containing every column
    •  Only mutate columns that have been set
    •  This avoids read-before-write race conditions
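
A minimal sketch of the "only mutate columns that have been set" idea, in plain Python with an in-memory dict standing in for the data store. The `Entity` class, the `apply` helper and the column names are hypothetical illustrations, not Hailo's code or a Cassandra client API.

```python
class Entity:
    """Records only the columns that were explicitly set (hypothetical helper)."""

    def __init__(self, key):
        self.key = key
        self._dirty = {}  # column name -> new value, for changed columns only

    def set(self, column, value):
        self._dirty[column] = value

    def mutation(self):
        """Return (key, columns) holding only the columns set since the last call."""
        columns = dict(self._dirty)
        self._dirty.clear()
        return self.key, columns


def apply(store, key, columns):
    """Stand-in for a column-level write: touches only the given columns."""
    store.setdefault(key, {}).update(columns)


store = {"customer:42": {"name": "Ada", "phone": "old-phone", "email": "old@example.com"}}

# Two writers change different properties of the same entity without reading it first.
writer_a = Entity("customer:42")
writer_b = Entity("customer:42")
writer_a.set("phone", "new-phone")
writer_b.set("email", "new@example.com")

apply(store, *writer_a.mutation())
apply(store, *writer_b.mutation())
```

Because each mutation carries only the changed column, neither writer overwrites the other's update with a stale copy of the whole entity.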
  11. Considerations for time series storage

    •  Choose row key carefully, since this partitions the records
    •  Think about how many records you want in a single row
    •  Denormalise on write into many indexes
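
One way to picture these three points is a row key that combines a dimension with a time bucket, written once per index. The sketch below is an assumption-laden illustration: the daily bucket, the key layout (`driver:<id>:<day>`), the event fields and the function names are all hypothetical and are not taken from the talk.

```python
from datetime import datetime, timezone


def day_bucket(ts: datetime) -> str:
    """Bucket by UTC day, which bounds how many records land in one row."""
    return ts.astimezone(timezone.utc).strftime("%Y%m%d")


def index_writes(event: dict) -> list:
    """Return one (row_key, column_name, value) write per index for this event."""
    ts = event["timestamp"]
    bucket = day_bucket(ts)
    # Column name sorts by time within the row; the id keeps it unique.
    column = f"{ts.isoformat()}:{event['id']}"
    return [
        (f"driver:{event['driver']}:{bucket}", column, event["id"]),
        (f"city:{event['city']}:{bucket}", column, event["id"]),
        (f"all:{bucket}", column, event["id"]),
    ]


event = {
    "id": "job-1",
    "driver": "LON123456789",
    "city": "LON",
    "timestamp": datetime(2013, 8, 22, 10, 30, tzinfo=timezone.utc),
}
writes = index_writes(event)
```

A coarser bucket (for example a month) means fewer, wider rows; a finer one means more, narrower rows. Each extra index costs one more write per event in exchange for a read that touches a single row.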
  12. 2 clusters, 6 machines per region, 3 regions (stats cluster pending addition of third DC)

    Operational Cluster: ap-southeast-1, us-east-1, eu-west-1
    Stats Cluster: us-east-1, eu-west-1
  13. AWS VPCs with OpenVPN links

    3 AZs per region
    m1.large machines
    Provisioned IOPS EBS
    Operational Cluster: ~ 600GB/node
    Stats Cluster: ~ 100GB/node
  14. Multi DC

    •  Something that Cassandra makes trivial
    •  Would have been very difficult to accomplish active-active inter-DC replication with a team of 2 without Cassandra
    •  Rolling repair needed to make it safe (we use LOCAL_QUORUM)
    •  We schedule “narrow repairs” on different nodes in our cluster each night
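
LOCAL_QUORUM waits only for a majority of replicas in the local data centre, so replicas in other regions can lag behind, which is why regular repair matters. The sketch below shows the quorum arithmetic and a simple nightly rotation over nodes. It assumes a replication factor of 3 per data centre (not stated in the talk), uses hypothetical node names, and only chooses which node to repair; it does not model what range a "narrow repair" covers.

```python
from datetime import date, timedelta


def local_quorum(replication_factor: int) -> int:
    """Replicas in the local DC that must acknowledge: a strict majority."""
    return replication_factor // 2 + 1


def node_for_night(nodes: list, night: date) -> str:
    """Pick one node per night, cycling through the list in order."""
    return nodes[night.toordinal() % len(nodes)]


# Hypothetical node names for one six-machine region.
nodes = [f"cassandra-{i}" for i in range(1, 7)]

start = date(2013, 8, 22)
schedule = [node_for_night(nodes, start + timedelta(days=n)) for n in range(len(nodes))]
```

With six nodes, any six consecutive nights visit every node exactly once, so each node is repaired on a fixed cycle.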
  15. Analytics

    •  With Cassandra we lost the ability to carry out analytics, e.g. COUNT, SUM, AVG, GROUP BY
    •  We use Acunu Analytics to give us this ability in real time, for pre-planned query templates
    •  It is backed by Cassandra and therefore highly available, resilient and globally distributed
    •  Integration is straightforward
  16. AQL

    SELECT SUM(accepted), SUM(ignored), SUM(declined), SUM(withdrawn)
    FROM Allocations
    WHERE timestamp BETWEEN '1 week ago' AND 'now' AND driver='LON123456789'
    GROUP BY timestamp(day)
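
To make the shape of this query concrete, here is a rough Python sketch of per-driver, per-day counters that are updated as events arrive and then read back for a date range. It illustrates the general idea of answering a pre-planned query from counters maintained on write; it is not Acunu Analytics' implementation, and the class and method names are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import date, datetime, timezone

OUTCOMES = ("accepted", "ignored", "declined", "withdrawn")


class AllocationCounters:
    """Keeps one counter set per (driver, UTC day), updated on every event."""

    def __init__(self):
        self._counts = defaultdict(Counter)

    def record(self, driver: str, ts: datetime, outcome: str) -> None:
        if outcome not in OUTCOMES:
            raise ValueError(f"unknown outcome: {outcome}")
        day = ts.astimezone(timezone.utc).date()
        self._counts[(driver, day)][outcome] += 1

    def daily_sums(self, driver: str, start: date, end: date) -> dict:
        """Return {day: {outcome: count}} for start <= day <= end, sorted by day."""
        rows = {
            day: {o: counter[o] for o in OUTCOMES}
            for (d, day), counter in self._counts.items()
            if d == driver and start <= day <= end
        }
        return dict(sorted(rows.items()))


counters = AllocationCounters()
counters.record("LON123456789", datetime(2013, 8, 20, 9, 0, tzinfo=timezone.utc), "accepted")
counters.record("LON123456789", datetime(2013, 8, 20, 9, 5, tzinfo=timezone.utc), "ignored")
counters.record("LON123456789", datetime(2013, 8, 21, 18, 0, tzinfo=timezone.utc), "accepted")

week = counters.daily_sums("LON123456789", date(2013, 8, 15), date(2013, 8, 22))
```

The read path only looks up counters that already exist, which is what makes this kind of query cheap at request time and also why the query templates have to be planned in advance.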
  17. Lessons learned

    •  Have an advocate - get someone who will sell the vision internally
    •  Teach team members the fundamentals of how the solution works
    •  Don’t cause yourself a “big data” problem unnecessarily
    •  Explain trade-offs in choosing NoSQL to all parts of the business
    •  Provide solutions!
  18. We like Cassandra

    •  Solid design
    •  HA characteristics
    •  Easy multi-DC setup
    •  Simplicity of operation
  19. The future

    •  We will continue to invest in Cassandra as we expand globally
    •  We will hire people with experience running Cassandra
    •  We will focus on expanding our reporting facilities
    •  We aspire to extend our network (1M consumer installs, wallet) beyond cabs
    •  We will continue to hire the best engineers in London, NYC and Asia