How Hailo fuels its growth using NoSQL Storage and Analytics

Hailo is building the world's best taxi app: we're already in 9 cities worldwide, have 300,000 registered passengers, and are growing by more than 30% every month. Of course, that presents a serious infrastructure challenge.
I'll explain how we've built our service around tools that share three key NoSQL characteristics: they're all distributed, resilient and operationally simple. Our specific goals were to make it easy to replicate our architecture as we launch in new cities and to scale as we grow in each city, while always being able to coordinate that setup in a straightforward way.

Dave Gardner

August 22, 2013

Transcript

  1. How Hailo fuels its growth using NoSQL Storage and Analytics
    David Gardner, Architect @ Hailo
    #NoSQLNow

  5. What is Hailo?
    •  The world’s highest-rated taxi app – over 10,000 five-star reviews
    •  Over 500,000 registered passengers
    •  A Hailo e-hail is accepted by a driver every four seconds around the world
    •  Hailo has launched in ten cities, from Tokyo to Toronto, in just over eighteen months of operation

  6. Hailo is growing
    •  Hailo is a marketplace that facilitates over $100M in run-rate transactions and is making the world a better place for passengers and drivers
    •  Hailo has raised over $50M in financing from the world's best investors, including Union Square Ventures, Accel, the founder of Skype (via Atomico), Wellington Partners (Spotify), Sir Richard Branson, and our CEO's mother, Janice

  7. What this talk is about
    •  Why Hailo are using NoSQL
    •  How we use Cassandra
    •  How we use Acunu Analytics
    •  Challenges of NoSQL

  8. Why choose NoSQL?

  9. “NoSQL DBs trade off traditional features to better support new and emerging use cases”
    Andy Gross, Riak
    http://www.slideshare.net/argv0/riak-use-cases-dissecting-the-solutions-to-hard-problems

  10. What are we trading off?
    •  More widely used, tested and documented software
    •  Ad-hoc querying
    •  Talent pool with direct experience

  11. What do we get back in return?
    •  High availability
    •  Scalability
    •  Operational simplicity

  12. The path to adoption at Hailo

  13. Hailo launched in London in November 2011
    •  Launched on AWS
    •  Two PHP/MySQL web apps plus a Java backend
    •  Mostly built by a team of 3 or 4 backend engineers
    •  MySQL multi-master for single AZ resilience

  14. Why Cassandra?
    •  A desire for greater resilience – “become a utility”
       Cassandra is designed for high availability
    •  Plans for international expansion around a single consumer app
       Cassandra is good at global replication
    •  Expected growth
       Cassandra scales linearly for both reads and writes
    •  Prior experience
       I had experience with Cassandra and could recommend it

  15. The path to adoption
    •  Largely unilateral decision by developers – a result of a startup culture
    •  Replacement of key consumer app functionality, splitting up the PHP/MySQL web app into a mixture of global PHP/Java services backed by a Cassandra data store
    •  Launched into production in September 2012 – originally just powering North American expansion, before gradually switching over Dublin and London

  16. Cassandra at Hailo

  17. “Cassandra just works”
    Dom W, Senior Engineer

  18. Use cases
    1.  Entity storage
    2.  Time series data

  19. CF = customers
    126007613634425612:
        createdTimestamp: 1370465412
        email: [email protected]
        givenName: Dave
        familyName: Gardner
        locale: en_GB
        phone: +447911111111

  20. Considerations for entity storage
    •  Do not read the entire entity, update one property and then write back a mutation containing every column
    •  Only mutate columns that have been set
    •  This avoids read-before-write race conditions
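
The entity-storage rule above can be sketched in Python. This is a minimal illustration under stated assumptions, not Hailo's code: a plain dict stands in for the `customers` column family, and `write_columns` models a Cassandra write as a per-column upsert (the last write to each column wins, and columns not named in the mutation are left alone). The helper names and the second phone number are hypothetical.

```python
from typing import Dict, Optional

# Hypothetical in-memory stand-in for a column family: row key -> {column: value}.
# Modelled as a per-column upsert: only the columns named in a mutation change.
ColumnFamily = Dict[str, Dict[str, str]]


def write_columns(cf: ColumnFamily, row_key: str, columns: Dict[str, str]) -> None:
    """Upsert the given columns into a row, leaving every other column untouched."""
    cf.setdefault(row_key, {}).update(columns)


def build_mutation(changes: Dict[str, Optional[str]]) -> Dict[str, str]:
    """Keep only the properties the caller actually set (None means 'not set')."""
    return {col: val for col, val in changes.items() if val is not None}


def full_row_mutation(snapshot: Dict[str, str], column: str, value: str) -> Dict[str, str]:
    """Anti-pattern: every column from an earlier read, plus one changed property."""
    mutation = dict(snapshot)
    mutation[column] = value
    return mutation


ROW = "126007613634425612"


def demo_read_before_write() -> Dict[str, str]:
    cf: ColumnFamily = {ROW: {"locale": "en_GB", "phone": "+447911111111"}}
    # Two clients read the same row before either of them writes.
    snap_a = dict(cf[ROW])
    snap_b = dict(cf[ROW])
    write_columns(cf, ROW, full_row_mutation(snap_a, "locale", "en_US"))
    # Client B's stale copy of "locale" silently undoes client A's change.
    write_columns(cf, ROW, full_row_mutation(snap_b, "phone", "+447922222222"))
    return cf[ROW]


def demo_partial_mutations() -> Dict[str, str]:
    cf: ColumnFamily = {ROW: {"locale": "en_GB", "phone": "+447911111111"}}
    write_columns(cf, ROW, build_mutation({"locale": "en_US", "phone": None}))
    write_columns(cf, ROW, build_mutation({"locale": None, "phone": "+447922222222"}))
    return cf[ROW]


print(demo_read_before_write())  # locale has reverted to en_GB
print(demo_partial_mutations())  # both changes are kept
```

Sending only the columns a caller actually changed means concurrent updates to different properties of the same entity cannot undo each other, and no read is needed before the write.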

  21. CF = comms
    2013-06-01:
        55374fa0-ce2b-11e2-8b8b-0800200c9a66: {"to":"dave@c…
        a48bd800-ce2b-11e2-8b8b-0800200c9a66: {"to":"foo@ex…
        b0e15850-ce2b-11e2-8b8b-0800200c9a66: {"to":"bar@ho…
        bfac6c80-ce2b-11e2-8b8b-0800200c9a66: {"to":"baz@fo…
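
The column names in the `comms` rows are version 1 (time-based) UUIDs; the `1` at the start of the third group marks the version. A column family using Cassandra's `TimeUUIDType` comparator orders such names by their embedded timestamp, which is not the same as plain string order because the low-order time bits come first in the text form. The Python sketch below decodes that timestamp with the standard `uuid` module; the one-row-per-UTC-day rule in `day_row_key` is an assumption for illustration.

```python
import uuid
from datetime import datetime, timedelta, timezone

# 100 ns ticks between the UUID v1 epoch (1582-10-15) and the Unix epoch (1970-01-01).
UUID_EPOCH_OFFSET = 0x01B21DD213814000


def uuid1_to_datetime(u: uuid.UUID) -> datetime:
    """Recover the UTC timestamp embedded in a version 1 (time-based) UUID."""
    if u.version != 1:
        raise ValueError("not a time-based UUID")
    ticks = u.time - UUID_EPOCH_OFFSET  # 100 ns intervals since the Unix epoch
    return datetime(1970, 1, 1, tzinfo=timezone.utc) + timedelta(microseconds=ticks // 10)


def day_row_key(u: uuid.UUID) -> str:
    """Hypothetical bucketing rule: one row per UTC day, formatted like '2013-06-01'."""
    return uuid1_to_datetime(u).strftime("%Y-%m-%d")


# Column names copied from the two comms slides.
names = [
    uuid.UUID("13b247f0-ce2c-11e2-8b8b-0800200c9a66"),
    uuid.UUID("bfac6c80-ce2b-11e2-8b8b-0800200c9a66"),
    uuid.UUID("55374fa0-ce2b-11e2-8b8b-0800200c9a66"),
]

# Text order compares the time_low field first, so it is not chronological.
by_string = sorted(names, key=str)
# Chronological order compares the full 60-bit timestamp.
by_time = sorted(names, key=lambda u: u.time)

for u in by_time:
    print(u, uuid1_to_datetime(u).isoformat())
```

Sorting by string puts `13b247f0-ce2c-…` first even though it is the latest of the three, which is why the comparator, rather than the text form, has to define column order.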

  22. CF = comms
    [email protected]:
        13b247f0-ce2c-11e2-8b8b-0800200c9a66: {"to":"dave@c…
        20f70a40-ce2c-11e2-8b8b-0800200c9a66: {"to":"dave@c…
        2b44d3b0-ce2c-11e2-8b8b-0800200c9a66: {"to":"dave@c…
        338a22f0-ce2c-11e2-8b8b-0800200c9a66: {"to":"dave@c…

  23. Considerations for time series storage
    •  Choose the row key carefully, since it determines how records are partitioned
    •  Think about how many records you want in a single row
    •  Denormalise on write into many indexes
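
The three considerations above can be combined in one write path. The sketch below is a hypothetical in-memory model that goes no further than the two `comms` slides show: each message is written under a UTC-day row key (which bounds how many records land in one row) and again under the recipient's address, with the same time-based UUID as the column name in both rows. The email addresses, the `subject` field and the function names are placeholders.

```python
import json
import uuid
from collections import defaultdict
from datetime import datetime, timezone
from typing import DefaultDict, Dict, List

# Hypothetical stand-in for the comms column family:
# row key -> {time-based UUID column name -> JSON payload}.
Rows = DefaultDict[str, Dict[uuid.UUID, str]]


def write_comm(comms: Rows, comm_id: uuid.UUID, to: str, subject: str, sent_at: datetime) -> None:
    """Denormalise on write: store one message under every row key we query by."""
    payload = json.dumps({"to": to, "subject": subject})
    day_key = sent_at.astimezone(timezone.utc).strftime("%Y-%m-%d")  # one row per day
    for row_key in (day_key, to):  # by-day index and by-recipient index
        comms[row_key][comm_id] = payload


def read_row(comms: Rows, row_key: str) -> List[dict]:
    """Return a row's records in timestamp order of their UUID column names."""
    row = comms.get(row_key, {})
    return [json.loads(row[c]) for c in sorted(row, key=lambda u: u.time)]


comms: Rows = defaultdict(dict)
june_1 = datetime(2013, 6, 1, 9, 30, tzinfo=timezone.utc)
write_comm(comms, uuid.uuid1(), "dave@example.com", "Your receipt", june_1)
write_comm(comms, uuid.uuid1(), "foo@example.com", "Your receipt", june_1)

print(read_row(comms, "2013-06-01"))        # both messages
print(read_row(comms, "dave@example.com"))  # only Dave's message
```

Each read path is then a single-row lookup; the cost is paid at write time, once per index.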

  24. Client libraries
    •  Astyanax (Java)
    •  phpcassa (PHP)
    •  github.com/carloscm/gossie (Go)

  26. 2 clusters, 6 machines per region, 3 regions
    •  Operational cluster: ap-southeast-1, us-east-1, eu-west-1
    •  Stats cluster: us-east-1, eu-west-1 (pending addition of a third DC)

  27. AWS VPCs with OpenVPN links
    •  3 AZs per region
    •  m1.large machines
    •  Provisioned IOPS EBS
    •  Operational cluster: ~600GB/node
    •  Stats cluster: ~100GB/node

  28. Multi DC
    •  Something that Cassandra makes trivial
    •  Active-active inter-DC replication would have been very difficult to accomplish with a team of 2 without Cassandra
    •  Rolling repair is needed to make it safe (we use LOCAL_QUORUM)
    •  We schedule “narrow repairs” on different nodes in our cluster each night
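
Two pieces of arithmetic sit behind this slide. LOCAL_QUORUM needs a majority of the replicas in the coordinator's own data centre, floor(RF / 2) + 1, so LOCAL_QUORUM reads and writes overlap within a DC; replicas that miss a write, including those in other DCs, are only guaranteed to catch up through repair. Repair should also complete on every node within `gc_grace_seconds` (10 days by default) so that deleted data is not resurrected. The Python sketch below assumes a replication factor of 3 per DC and six hypothetically named nodes in each of the three regions; the round-robin rotation is one simple way to spread nightly repairs, not necessarily Hailo's schedule.

```python
from typing import Dict, List, Set

GC_GRACE_DAYS = 10  # Cassandra's default gc_grace_seconds is 864000 (10 days)


def local_quorum(local_rf: int) -> int:
    """Replicas in the local DC that must acknowledge: floor(RF / 2) + 1."""
    return local_rf // 2 + 1


def repair_rotation(nodes: List[str], nights: int, per_night: int) -> Dict[int, List[str]]:
    """Hypothetical round-robin: repair `per_night` different nodes on each night."""
    return {
        night: [nodes[(night * per_night + i) % len(nodes)] for i in range(per_night)]
        for night in range(nights)
    }


def nodes_covered(schedule: Dict[int, List[str]]) -> Set[str]:
    """Every node that appears somewhere in the schedule."""
    return {node for batch in schedule.values() for node in batch}


# Assumed layout: 6 nodes in each of the three regions listed on slide 26.
regions = ["ap-southeast-1", "us-east-1", "eu-west-1"]
nodes = [f"{region}-node{i}" for region in regions for i in range(1, 7)]

rf = 3  # assumed replicas per DC
print(local_quorum(rf))          # 2, so one replica per DC can be down
print(2 * local_quorum(rf) > rf)  # True: LOCAL_QUORUM reads and writes overlap

schedule = repair_rotation(nodes, nights=GC_GRACE_DAYS, per_night=2)
print(nodes_covered(schedule) == set(nodes))  # True: every node repaired within 10 nights
```

Repairing a couple of nodes per night keeps the extra load small while still covering all 18 assumed nodes inside the default grace period.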

  30. Acunu Analytics at Hailo

  31. Analytics
    •  With Cassandra we lost the ability to carry out analytics, e.g. COUNT, SUM, AVG, GROUP BY
    •  We use Acunu Analytics to give us this ability in real time, for pre-planned query templates
    •  It is backed by Cassandra and is therefore highly available, resilient and globally distributed
    •  Integration is straightforward
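
The deck does not describe Acunu's internals, so the following is only a generic sketch of why pre-planned query templates can be answered in real time: when the shape of a query is known in advance, each incoming event can update that template's aggregates at write time, and the query becomes a lookup rather than a scan. The event fields (driver ID, timestamp, allocation outcome) mirror the AQL example on slide 33 but are assumptions, as is the class name.

```python
from collections import Counter, defaultdict
from datetime import date, datetime, timezone
from typing import DefaultDict, Dict, Iterable, Tuple

OUTCOMES = ("accepted", "ignored", "declined", "withdrawn")


class AllocationRollup:
    """Hypothetical write-time rollup for one template: outcome counts per driver per day."""

    def __init__(self) -> None:
        self._cells: DefaultDict[Tuple[str, date], Counter] = defaultdict(Counter)

    def ingest(self, driver: str, ts: datetime, outcome: str) -> None:
        """Constant work per event: bump one counter in the matching (driver, day) cell."""
        if outcome not in OUTCOMES:
            raise ValueError(f"unknown outcome: {outcome}")
        day = ts.astimezone(timezone.utc).date()
        self._cells[(driver, day)][outcome] += 1

    def daily_sums(self, driver: str, days: Iterable[date]) -> Dict[date, Dict[str, int]]:
        """Answer the template by reading pre-computed cells; no event scan."""
        result = {}
        for day in days:
            cell = self._cells.get((driver, day), Counter())
            result[day] = {o: cell[o] for o in OUTCOMES}
        return result


rollup = AllocationRollup()
rollup.ingest("LON123456789", datetime(2013, 8, 20, 8, 0, tzinfo=timezone.utc), "accepted")
rollup.ingest("LON123456789", datetime(2013, 8, 20, 9, 0, tzinfo=timezone.utc), "declined")
rollup.ingest("LON123456789", datetime(2013, 8, 21, 9, 0, tzinfo=timezone.utc), "accepted")
print(rollup.daily_sums("LON123456789", [date(2013, 8, 20), date(2013, 8, 21)]))
```

The trade-off matches the slide: queries that fit a template are cheap, while a question nobody planned for has no rollup to read.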

  32. Events pipeline: NSQ → Acunu → C*

  33. AQL
    SELECT
      SUM(accepted),
      SUM(ignored),
      SUM(declined),
      SUM(withdrawn)
    FROM Allocations
    WHERE timestamp BETWEEN '1 week ago' AND 'now'
      AND driver='LON123456789'
    GROUP BY timestamp(day)
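
To make the query's semantics concrete, here is a brute-force Python equivalent over a hypothetical list of raw allocation events, each carrying a driver ID, a timestamp and 0/1 values for the four outcome columns (the event shape is an assumption; the deck only shows the AQL). It keeps the last seven days for one driver, groups by UTC day and sums each column. Scanning every event like this is the work that a pre-planned template avoids at query time.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone
from typing import Dict, List

FIELDS = ("accepted", "ignored", "declined", "withdrawn")


def allocations_by_day(events: List[dict], driver: str, now: datetime) -> Dict[str, Dict[str, int]]:
    """SUM(...) WHERE timestamp BETWEEN '1 week ago' AND 'now' AND driver=... GROUP BY day."""
    start = now - timedelta(weeks=1)
    totals: Dict[str, Dict[str, int]] = defaultdict(lambda: dict.fromkeys(FIELDS, 0))
    for event in events:
        if event["driver"] != driver:
            continue
        if not start <= event["timestamp"] <= now:  # BETWEEN is inclusive at both ends
            continue
        day = event["timestamp"].astimezone(timezone.utc).strftime("%Y-%m-%d")
        for field in FIELDS:
            totals[day][field] += event.get(field, 0)  # missing outcome counts as 0
    return dict(sorted(totals.items()))


now = datetime(2013, 8, 22, 12, 0, tzinfo=timezone.utc)
events = [
    {"driver": "LON123456789", "timestamp": now - timedelta(days=1), "accepted": 1},
    {"driver": "LON123456789", "timestamp": now - timedelta(days=1, hours=1), "declined": 1},
    {"driver": "LON123456789", "timestamp": now - timedelta(days=9), "accepted": 1},  # outside window
    {"driver": "LON999999999", "timestamp": now, "ignored": 1},  # different driver
]
print(allocations_by_day(events, "LON123456789", now))
```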

  36. Challenges

  37. [Chart: average years of experience per team member, MySQL vs Cassandra]

  38. [Chart: people who can attempt to query MySQL vs people who can attempt to query Cassandra]

  40. Lessons learned
    •  Have an advocate: get someone who will sell the vision internally
    •  Teach team members the fundamentals of how the solution works
    •  Don’t cause yourself a “big data” problem unnecessarily
    •  Explain the trade-offs of choosing NoSQL to all parts of the business
    •  Provide solutions!

  41. [Chart: people who can attempt to query MySQL vs people who can attempt to query Cassandra]

  42. Conclusion

  43. We like Cassandra
    •  Solid design
    •  HA characteristics
    •  Easy multi-DC setup
    •  Simplicity of operation

  44. The future
    •  We will continue to invest in Cassandra as we expand globally
    •  We will hire people with experience running Cassandra
    •  We will focus on expanding our reporting facilities
    •  We aspire to extend our network (1M consumer installs, wallet) beyond cabs
    •  We will continue to hire the best engineers in London, NYC and Asia

  45. Thank you
    Come and work with NoSQL full time: jobs.hailocab.com