
Hailo Tech Platform


A technical deep dive presented to prospective investors in New York in July 2014. It covered the entire Hailo technology stack and explained how it had evolved.

Dave Gardner

July 01, 2014

Transcript

  1. What this talk will cover
     1. Philosophy and inspiration behind Hailo’s technology
     2. High level architecture overview
     3. Web, mobile and backend technology stacks and tools
     4. Microservice architecture and infrastructure stack
     5. Build and release process
     6. Automated testing
  2. Let people build things, quickly
     • Anti-fragile - embrace failure
     • Cloud native
     • Safety net - testing automation
     • Freedom and responsibility
  3. • Independent self-organising teams covering a range of skills
     • Embedded QA and data science
     • Scrum process with 2 week sprints
     • Focus on speed to market followed by data-driven iteration
     [Diagram: team roles - Product Owner, Scrum Master, Android Engineer, iOS Engineer, Web Engineer, Backend Engineer, Data Analyst]
  4. Hailo architecture
     • Native iOS and Android clients for drivers and passengers (4 apps)
     • H2 platform for running backend services and desktop/mobile web apps
     • Fully cloud-based hosting on AWS
     • “Microservices” architecture
     • RabbitMQ message bus transporting protobuf encoded messages
  5. [Diagram: two AWS regions, us-east-1 and eu-west-1. Each region shows an ELB, the Go “Thin” API, a RabbitMQ Message Bus (federated clusters per AZ), Go and Java services, and Cassandra (C*) nodes. Callouts 1, 2 and 3 mark the parts detailed on the following three slides.]
  6. 1. ELB and Go “Thin” API
     [Diagram labels: ELB; Go “Thin” API; H1 Driver API; H2 “API” Service; H2 Orch. Service; H2 Core Service; v1-api-driver-london.elasticride.com; api-driver-london.elasticride.com; http; Message Bus. Legend: RMQ = Hailo’s RabbitMQ Message Bus]
     • Elastic Load Balancer terminates SSL connections and balances between instances in a region
     • Rule-based router built into “thin API” can send traffic to old and new backends
     • “API” Service acts as a translation layer between legacy interfaces and newer protobuf-defined interfaces
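The rule-based routing described on this slide could be sketched roughly as below. This is a minimal illustration, not Hailo's router: the `Rule` and `Router` types, the first-match-wins policy, the backend names and the `/v1/driver/...` paths are all assumptions made for the example. Only the hostname comes from the slide.

```go
package main

import (
	"fmt"
	"strings"
)

// Backend identifies where the thin API forwards a request (names are hypothetical).
type Backend string

const (
	LegacyHTTP Backend = "h1-http"     // old backend reached over HTTP
	MessageBus Backend = "h2-rabbitmq" // new backend reached over the message bus
)

// Rule matches a request by host and path prefix. Empty fields match anything.
type Rule struct {
	Host       string
	PathPrefix string
	Target     Backend
}

// Router evaluates rules in order; the first matching rule wins.
type Router struct {
	Rules   []Rule
	Default Backend
}

// Route returns the backend for a request, falling back to Default.
func (r Router) Route(host, path string) Backend {
	for _, rule := range r.Rules {
		if rule.Host != "" && rule.Host != host {
			continue
		}
		if !strings.HasPrefix(path, rule.PathPrefix) {
			continue
		}
		return rule.Target
	}
	return r.Default
}

func main() {
	router := Router{
		Rules: []Rule{
			// Hypothetical rule: one endpoint has already been migrated to H2.
			{Host: "api-driver-london.elasticride.com", PathPrefix: "/v1/driver/login", Target: MessageBus},
		},
		Default: LegacyHTTP,
	}
	fmt.Println(router.Route("api-driver-london.elasticride.com", "/v1/driver/login"))
	fmt.Println(router.Route("api-driver-london.elasticride.com", "/v1/driver/jobs"))
}
```

Keeping the default on the legacy backend means an endpoint only moves to the new platform when a rule is added for it, which is one way to migrate traffic incrementally.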
  7. 2. RabbitMQ message bus
     [Diagram: a Service connecting through a local haproxy to RMQ clusters, one cluster of two RMQ nodes per AZ]
     • Services always connect to localhost.
     • HAProxy sends to the same AZ, unless that AZ is down, in which case it “fails over” to a different AZ.
     • RMQ runs in clusters of 2, within each AZ
     • Each exchange is federated to the other AZs
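The failover rule on this slide (prefer the local AZ, otherwise use another one) can be expressed as a small selection function. The sketch below only illustrates the decision. In the deck the behaviour is implemented by HAProxy itself, and the `Cluster` type, the `pick` function and the health flags are assumptions for the example.

```go
package main

import "fmt"

// Cluster is a RabbitMQ cluster in one availability zone.
type Cluster struct {
	AZ      string
	Healthy bool
}

// pick returns the cluster in localAZ when it is healthy,
// otherwise the first healthy cluster in any other AZ.
// The boolean is false when no cluster is healthy.
func pick(localAZ string, clusters []Cluster) (Cluster, bool) {
	for _, c := range clusters {
		if c.AZ == localAZ && c.Healthy {
			return c, true
		}
	}
	for _, c := range clusters {
		if c.Healthy {
			return c, true
		}
	}
	return Cluster{}, false
}

func main() {
	clusters := []Cluster{
		{AZ: "eu-west-1a", Healthy: false}, // local AZ is down in this example
		{AZ: "eu-west-1b", Healthy: true},
		{AZ: "eu-west-1c", Healthy: true},
	}
	chosen, ok := pick("eu-west-1a", clusters)
	fmt.Println(chosen.AZ, ok)
}
```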
  8. 3. Services
     [Diagram: a service made of Handler, Logic and Storage, built on go-platform-layer and go-service-layer, with self-configuring external service adapters]
     Library for building services that talk via RMQ. Services get for free:
     • Service discovery
     • Monitoring
     • Authentication/authorisation
     • Provisioning
     • AB testing
     • Self-configuring connectivity to third-party services
  9. Web technologies and languages
     • JavaScript
     • Node, Angular, React, Backbone, Require, Grunt, Bower, Mocha, QUnit, Phantom (plus many more client libs)
     • Ruby
     • SASS, Jekyll
     • Hailo Web Platform
     • Fully integrated with the H2 build and deployment system via Jenkins CI
  10. Hailo Web Platform
      • RPC over HTTP web API, WebSocket API for event streaming
      • JS library to authenticate with and use the Hailo APIs
      • Fully mobile custom UI framework using SASS
      • Web modules and components libraries for UI widgets, maps and graphing
      • Automated CI build and test, plus one-click deploy to any environment
  11. Hailo Web Platform, continued
      • Internationalization framework, integrated with Crowdin
      • Client error logging
      • User browser tracking
      • A/B testing and reporting framework
      • Webapp manifests for cross-app deep linking
      • Homescreen for webapp discoverability
  12. • Hailo Web UI is our version of “bootstrap” and makes it easy to build web projects with a common look and feel
      • Hailo web toolkit provides common libraries for making API calls and managing session state
      • Designed to be reactive - scaling up from mobile to desktop clients
  13. Mobile stack
      • Java for Android
      • Objective-C and C for iOS
      • Some components built in C++ and shared between platforms
      • Eclipse and Xcode for software development
      • Cucumber and Calabash for testing
      • Integrated with Jenkins CI for packaging and beta-deploy
  14. Backend stack
      • Mainly Go with some use of Java
      • Various open source middleware for distributed storage, coordination, search and caching
      • Sublime Text for software development
      • Fully integrated with the H2 build and deployment system via Jenkins CI
  15. Backend open source middleware
      • Apache Cassandra: multi-region distributed database
      • Apache ZooKeeper: per-region distributed coordination
      • RabbitMQ: per-region message bus
      • NSQ: distributed durable message queue
      • Memcached: per-region in-memory KV store
      • Elasticsearch: per-region distributed search index
  16. [Diagram: the /v1/customer/neardrivers request passing through three tiers: API TIER, ORCHESTRATION TIER and CORE TIER. Services shown: ETA Service, Routing Service, Phone Service, Profile Service, State Service, Charge Service, Near Drivers Service, Tow Truck Service, Restaurant Service, Place Service.]
  17. High level infrastructure architecture
      • Everything runs out of 2 AWS regions (EU-WEST-1 and US-EAST-1)
      • META VPC in each region which hosts shared services, and terminates our client VPNs
      • Each "environment" also has its own VPC (LVE, STG, TST). These are peered to the META VPCs
      • Each VPC has 3 sets of everything, in line with the idea of "QUORUM" - we could lose one of anything (instance, subnet, AZ) but we'd still have more than 50% of our full capacity available
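The "QUORUM" claim is simple arithmetic: with three equal sets, losing one leaves two thirds of capacity, which is still a majority. A small sketch of that check follows. It assumes the three sets are equally sized, which the slide implies but does not state, and the function names are invented for the example.

```go
package main

import "fmt"

// remainingFraction returns the share of capacity left after losing
// `lost` of `total` equally sized units (instances, subnets or AZs).
func remainingFraction(total, lost int) float64 {
	return float64(total-lost) / float64(total)
}

// hasMajority reports whether strictly more than half of the units remain.
func hasMajority(total, lost int) bool {
	return 2*(total-lost) > total
}

func main() {
	for lost := 0; lost <= 3; lost++ {
		fmt.Printf("lost %d of 3: %.0f%% capacity, majority: %v\n",
			lost, 100*remainingFraction(3, lost), hasMajority(3, lost))
	}
}
```

With three units the majority survives exactly one loss. Two units would not survive any loss, which is a common reason to deploy in threes.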
  18. What makes up a VPC
      • NAT gateway
      • "External" subnet with 512 IP addresses available (used for things that require an external IP address)
      • "Internal" subnet with 512 IP addresses available
      • "Secure" subnet with 512 addresses available (used for things that need to communicate over the site-to-site VPN)
      • We've also got a spare 512 addresses in each AZ in case we need it, and the ability to allocate up to 8192 addresses per VPC
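The address counts on this slide map onto IPv4 CIDR sizes: a block of 512 addresses is a /23 and a block of 8192 is a /19. The sketch below works that out and totals the allocation. It assumes three AZs per VPC, each with its own external, internal, secure and spare block. That reading is consistent with "3 sets of everything" on the previous slide but is not spelled out in the deck.

```go
package main

import (
	"fmt"
	"math/bits"
)

// prefixLen returns the IPv4 CIDR prefix length of a block holding n addresses.
// n must be a power of two.
func prefixLen(n uint32) int {
	return 32 - bits.TrailingZeros32(n)
}

func main() {
	const (
		azs         = 3    // assumption: one set of subnets per AZ
		blocksPerAZ = 4    // external, internal, secure, spare
		blockSize   = 512  // addresses per block, from the slide
		vpcSize     = 8192 // addresses per VPC, from the slide
	)
	used := azs * blocksPerAZ * blockSize
	fmt.Printf("each block is a /%d, the VPC is a /%d\n", prefixLen(blockSize), prefixLen(vpcSize))
	fmt.Printf("allocated %d of %d addresses, %d unallocated\n", used, vpcSize, vpcSize-used)
}
```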
  19. Build and release steps
      1. Branch
      2. Write code + tests
      3. Push code (Jenkins automatically builds branch)
      4. Create Pull Request (status from CI fed back to PR UI)
      5. Review + merge
      6. Deploy to staging
      7. Automated QA (app UAT, robomon, load testing)
      8. Deploy to production
      9. Monitor status
  20. Testing
      • Automated User Acceptance Testing (UAT) for each app build
      • Automated test suites for backend services
      • Autonomous agents for testing backend services under load (robot drivers and passengers)
      • Integrated failure testing to assert anti-fragility
  21. Backend testing at scale, Thursday 24th July
      • 50,000 completed jobs
      • 15,000 jobs/hour peak
      • 20,000,000 driver location updates
      • 1,600 updates/second
      • 12,000 drivers on shift
      This is a quiet day
  22. Original ambitions
      • Provide a simple framework for us to build an efficient, resilient, second generation Hailo
      • Allow Hailo to scale the business along three axes: adding features to our current business, adding cities, and brand new stuff
      • Solve pain points in our current architecture
      • Be productive
  23. [Diagram: legacy (pre-H2) architecture across eu-west-1 and us-east-1. Components shown: PHP Cust API, PHP Driver API, Java Hailo Engine, MySQL, ELBs, Cassandra (C*) nodes, PHP Cust service, PHP Credits service, Java Pay service.]
      Annotation: Servers needed per-city (+ cities)
  24. [Diagram: same legacy architecture as slide 23.]
      Annotation: City configuration in many places (+ cities). Configuration formats labelled on the diagram: PHP array, YAML, Config service, plists built into app, XML
  25. [Diagram: same legacy architecture as slide 23.]
      Annotation: Coordination changing three apps (+ features)
  26. [Diagram: same legacy architecture as slide 23.]
      Annotation: Unclear responsibilities (+ features). Eg: payment or cancellation
  27. [Diagram: same legacy architecture as slide 23.]
      Annotation: Broad but inflexible services (+ features, + brand new)
  28. [Diagram: same legacy architecture as slide 23.]
      Annotation: Separate push deployment models (+ productive). Deployment tools labelled on the diagram: rsync, conan/cap
  29. [Diagram: same legacy architecture as slide 23.]
      Annotation: Different auth models (- pain). Auth labels on the diagram, in transcript order: none; none; IP whitelist plus token turned off; IP whitelist plus login service
  30. [Diagram: same legacy architecture as slide 23.]
      Annotation: SPOFs (- pain)
  31. [Diagram: legacy load balancing across us-east-1 and eu-west-1, showing ELBs and HAProxy (HAP) instances in front of PHP Cust API, PHP Cust, PHP Credits, Java Pay, Phone, API and other services.]
      Annotation: Load balancing broken and complex (- pain)
  32. Key features
      • Lose PHP, adopt Go - gains in efficiency of developer time and compute resource
      • Eliminate all SPOFs - adopt a cloud native approach to build a working whole out of ephemeral and often broken parts
      • Scale engineering output in line with additional resource - services with few, clearly defined responsibilities reduce friction
      • Increase reusability - develop features by composing fine-grained services that are agnostic to Hailo’s current operation
  33. Discovery Service
      [Slide header lists four platform services: Discovery Service, Binding Service, Config Service, Login Service]
      • Keeps track of every running instance of a service within a single region
      • Stores this information as ephemeral nodes in ZooKeeper, keeping a watch to ensure strong consistency between all instances of the discovery service within a region
      • Sends heartbeats to services periodically via RMQ and removes dead instances
      • Self-healing system because instances that don’t receive heartbeats will try to reconnect and, failing that, die
  34. Binding Service
      • Creates bindings within RabbitMQ for all running services, leveraging information in the discovery service
      • Reacts to services coming up and down by creating and destroying bindings
      • Bindings establish a connection between an exchange and a queue
      • Stores and manages control plane data in order to provide advanced bindings such as “send 10% of traffic to this particular version of the service”
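An advanced binding such as "send 10% of traffic to this particular version" amounts to a weighted choice between service versions. The sketch below shows one way to make that choice. It does not use any RabbitMQ API and does not claim to mirror how the binding service implements it. The `Binding` type, the `choose` function and the version labels are hypothetical.

```go
package main

import (
	"fmt"
	"math/rand"
)

// Binding gives one service version a share of traffic.
type Binding struct {
	Version string
	Weight  int // relative share, e.g. 90 and 10 for a 90/10 split
}

// totalWeight sums the weights of all bindings.
func totalWeight(bindings []Binding) int {
	sum := 0
	for _, b := range bindings {
		sum += b.Weight
	}
	return sum
}

// choose maps a roll in [0, totalWeight) to a version.
// It returns "" when the roll is out of range.
func choose(bindings []Binding, roll int) string {
	if roll < 0 {
		return ""
	}
	for _, b := range bindings {
		if roll < b.Weight {
			return b.Version
		}
		roll -= b.Weight
	}
	return ""
}

func main() {
	bindings := []Binding{
		{Version: "v1", Weight: 90},
		{Version: "v2", Weight: 10}, // the version receiving 10% of traffic
	}
	counts := map[string]int{}
	total := totalWeight(bindings)
	for i := 0; i < 10000; i++ {
		counts[choose(bindings, rand.Intn(total))]++
	}
	fmt.Println(counts) // roughly a 90/10 split; exact counts vary per run
}
```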
  35. Config Service
      • Stores application configuration data as JSON
      • Able to store JSON under any arbitrary key
      • Can combine many keys, on request, to serve up “compiled” config
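Combining several keys into a “compiled” config could look like the sketch below. The merge rule used here (a shallow merge of top-level JSON fields where later keys override earlier ones) is an assumption, since the slide does not say how conflicts are resolved. The `Store` type and the key names are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Store maps an arbitrary key to a JSON object document.
type Store map[string]string

// Compile merges the JSON objects stored under keys, in order.
// Assumption: the merge is shallow and later keys override earlier ones.
func (s Store) Compile(keys ...string) (map[string]interface{}, error) {
	out := map[string]interface{}{}
	for _, k := range keys {
		doc, ok := s[k]
		if !ok {
			return nil, fmt.Errorf("unknown config key %q", k)
		}
		var obj map[string]interface{}
		if err := json.Unmarshal([]byte(doc), &obj); err != nil {
			return nil, fmt.Errorf("key %q: %w", k, err)
		}
		for field, v := range obj {
			out[field] = v
		}
	}
	return out, nil
}

func main() {
	store := Store{
		"base":        `{"currency": "USD", "units": "miles"}`,
		"city:london": `{"currency": "GBP"}`,
	}
	cfg, err := store.Compile("base", "city:london")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	compiled, _ := json.Marshal(cfg)
	fmt.Println(string(compiled))
}
```

Ordering the keys from general to specific lets a city-level document override only the fields it needs to change.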
  36. Login Service
      • Credential and session/token store for all applications
      • The only thing that is able to issue and sign (with private key) tokens
      • Applications can exchange a session ID for a token, which they can then use to establish authorisation