A technical deep dive presented to prospective investors in New York in July 2014. It covered the entire Hailo technology stack and explained how it had evolved.
What this talk will cover
1. Philosophy and inspiration behind Hailo’s technology
2. High level architecture overview
3. Web, mobile and backend technology stacks and tools
4. Microservice architecture and infrastructure stack
5. Build and release process
6. Automated testing
• Independent self-organising teams covering a range of skills
• Embedded QA and data science
• Scrum process with two-week sprints
• Focus on speed to market followed by data-driven iteration
Team roles shown: Product Owner, Scrum Master, Android Engineer, iOS Engineer, Web Engineer, Backend Engineer, Data Analyst
[Diagram: two regions, us-east-1 and eu-west-1, with the same components in each: ELB, Go “Thin” API, RabbitMQ Message Bus (federated clusters per AZ), two Go Services and a Java Service, and three C* nodes. Callouts 1, 2 and 3 mark the parts expanded on the next three slides.]
1. ELB and Go “Thin” API
[Diagram: components shown are ELB, Go “Thin” API, H1 Driver API, H2 “API” Service, H2 Orch. Service and H2 Core Service. Hostnames shown: v1-api-driver-london.elasticride.com, api-driver-london.elasticride.com. Link labels: http, Message Bus. Legend: RMQ = Hailo’s RabbitMQ Message Bus.]
• Elastic Load Balancer terminates SSL connections and balances between instances in a region
• Rule-based router built into “thin API” can send traffic to old and new backends
• “API” Service acts as a translation layer between legacy interfaces and newer protobuf-defined interfaces
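The rule-based routing described above can be sketched as a first-match lookup over path prefixes. This is a minimal illustration only: the `Rule` type, the `Route` function, the backend labels and the choice of matching on path are all assumptions, since the slides do not show how the real router is configured.

```go
package main

import (
	"fmt"
	"strings"
)

// Rule sends requests whose path starts with PathPrefix to Backend.
// The type and its fields are hypothetical, not Hailo's actual router.
type Rule struct {
	PathPrefix string
	Backend    string
}

// Route returns the backend of the first matching rule, or fallback
// when no rule matches.
func Route(rules []Rule, path, fallback string) string {
	for _, r := range rules {
		if strings.HasPrefix(path, r.PathPrefix) {
			return r.Backend
		}
	}
	return fallback
}

func main() {
	// Assumed migration state: one endpoint has moved to the new backend,
	// everything else still goes to the legacy one.
	rules := []Rule{{PathPrefix: "/v1/customer/neardrivers", Backend: "h2-api-service"}}
	for _, p := range []string{"/v1/customer/neardrivers", "/v1/customer/profile"} {
		fmt.Println(p, "->", Route(rules, p, "h1-legacy-api"))
	}
}
```

Keeping the legacy backend as the fallback means an endpoint only moves when a rule is added for it, which suits an incremental migration.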
2. RabbitMQ message bus
[Diagram: a Service connecting through a local haproxy to three RMQ clusters, each made up of two RMQ nodes.]
• Services always connect to localhost.
• HAProxy sends to the same AZ, unless that AZ is down, in which case it “fails over” to a different AZ.
• RMQ runs in clusters of 2, within each AZ.
• Each exchange is federated to the other AZs.
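The failover rule above (prefer the cluster in the local AZ, otherwise use a healthy cluster elsewhere) can be written down as a small selection function. This sketch only models the decision; the `Cluster` type and `Pick` function are illustrative names, and the deck does not show the actual HAProxy configuration. In HAProxy itself, one common way to get this behaviour is to mark the other-AZ servers with the `backup` keyword.

```go
package main

import "fmt"

// Cluster describes one RabbitMQ cluster. Field names are illustrative.
type Cluster struct {
	AZ      string
	Healthy bool
}

// Pick returns the healthy cluster in localAZ if there is one, otherwise
// the first healthy cluster in any other AZ. The boolean is false when
// no cluster is healthy.
func Pick(clusters []Cluster, localAZ string) (Cluster, bool) {
	var fallback *Cluster
	for i := range clusters {
		c := &clusters[i]
		if !c.Healthy {
			continue
		}
		if c.AZ == localAZ {
			return *c, true
		}
		if fallback == nil {
			fallback = c
		}
	}
	if fallback != nil {
		return *fallback, true
	}
	return Cluster{}, false
}

func main() {
	clusters := []Cluster{
		{AZ: "eu-west-1a", Healthy: false}, // local AZ is down
		{AZ: "eu-west-1b", Healthy: true},
		{AZ: "eu-west-1c", Healthy: true},
	}
	c, ok := Pick(clusters, "eu-west-1a")
	fmt.Println(c.AZ, ok)
}
```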
3. Services
[Diagram: a service made up of Handler, Logic and Storage, built on go-platform-layer and go-service-layer.]
• go-platform-layer: library for building services that talk via RMQ
• go-service-layer: self-configuring external service adapters
Services get for free:
• Service discovery
• Monitoring
• Authentication/authorisation
• Provisioning
• AB testing
• Self-configuring connectivity to third-party services
Web technologies and languages
• JavaScript: Node, Angular, React, Backbone, Require, Grunt, Bower, Mocha, QUnit, Phantom (plus many more client libs)
• Ruby: SASS, Jekyll
• Hailo Web Platform: fully integrated with the H2 build and deployment system via Jenkins CI
Hailo Web Platform
• RPC over HTTP web API, Websocket API for event streaming
• JS library to authenticate with and use the Hailo APIs
• Fully mobile custom UI framework using SASS
• Web modules and components libraries for UI widgets, maps and graphing
• Automated CI build and test, plus one-click deploy to any environment
Hailo Web Platform, continued
• Internationalization framework, integrated with Crowdin
• Client error logging
• User browser tracking
• A/B testing and reporting framework
• Webapp manifests for cross-app deep linking
• Homescreen for webapp discoverability
• Hailo Web UI is our version of “bootstrap” and makes it easy to build web projects with a common look and feel
• Hailo web toolkit provides common libraries for making API calls and managing session state
• Designed to be reactive – scaling up from mobile to desktop clients
Mobile stack
• Java for Android
• Objective-C and C for iOS
• Some components built in C++ and shared between platforms
• Eclipse and Xcode for software development
• Cucumber and Calabash for testing
• Integrated with Jenkins CI for packaging and beta-deploy
Backend stack
• Mainly Go with some use of Java
• Various open source middleware for distributed storage, coordination, search and caching
• Sublime Text for software development
• Fully integrated with the H2 build and deployment system via Jenkins CI
[Diagram: the endpoint /v1/customer/neardrivers shown against three tiers: API tier, orchestration tier and core tier. Services shown: ETA Service, Routing Service, Phone Service, Profile Service, State Service, Charge Service, Near Drivers Service, Tow Truck Service, Restaurant Service, Place Service.]
High level infrastructure architecture
• Everything runs out of 2 AWS regions (EU-WEST-1 and US-EAST-1)
• META VPC in each region, which hosts shared services and terminates our client VPNs
• Each "environment" also has its own VPC (LVE, STG, TST). These are peered to the META VPCs
• Each VPC has 3 sets of everything, in line with the idea of "QUORUM": we could lose one of anything (instance, subnet, AZ) and still have more than 50% of our full capacity available
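The "QUORUM" point is simple arithmetic: with three equal replicas, losing one leaves two thirds of capacity, which is more than half; with only two replicas, losing one leaves exactly half, which is not. A small sketch, assuming replicas of equal capacity (the function name is made up for illustration):

```go
package main

import "fmt"

// SurvivesLoss reports whether strictly more than half of n equal
// replicas remain after `lost` of them fail.
func SurvivesLoss(n, lost int) bool {
	return 2*(n-lost) > n
}

func main() {
	for _, n := range []int{2, 3} {
		fmt.Printf("%d replicas, lose 1: majority remains = %v\n", n, SurvivesLoss(n, 1))
	}
}
```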
What makes up a VPC
• NAT gateway
• "External" subnet with 512 IP addresses available (used for things that require an external IP address)
• "Internal" subnet with 512 IP addresses available
• "Secure" subnet with 512 addresses available (used for things that need to communicate over the site-to-site VPN)
• We've also got a spare 512 addresses in each AZ in case we need it, and the ability to allocate up to 8192 addresses per VPC
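The address budget above can be checked with a few lines of arithmetic. The sketch assumes power-of-two IPv4 blocks, three AZs per VPC (taken from the "3 sets of everything" point) and four 512-address blocks per AZ (external, internal, secure and spare); the actual CIDR ranges are not given in the deck. Note that AWS reserves five addresses in every subnet, so slightly fewer than 512 are usable in practice.

```go
package main

import (
	"fmt"
	"math/bits"
)

// PrefixLen returns the IPv4 prefix length of a block holding n
// addresses. n must be a power of two.
func PrefixLen(n uint32) int {
	return 32 - bits.TrailingZeros32(n)
}

func main() {
	const subnet, vpc = 512, 8192
	// Assumed layout: external, internal, secure and spare blocks in
	// each of three AZs.
	const blocksPerAZ, azs = 4, 3
	used := subnet * blocksPerAZ * azs

	fmt.Printf("subnet /%d, VPC /%d\n", PrefixLen(subnet), PrefixLen(vpc))
	fmt.Printf("allocated %d of %d addresses, %d unallocated\n", used, vpc, vpc-used)
}
```

Under these assumptions each subnet is a /23, the VPC is a /19, and a quarter of the VPC's address space is still unallocated.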
Testing
• Automated User Acceptance Testing (UAT) for each app build
• Automated test suites for backend services
• Autonomous agents for testing backend services under load (robot drivers and passengers)
• Integrated failure testing to assert anti-fragility
Backend testing at scale, Thursday 24th July
• 50,000 completed jobs
• 15,000 jobs/hour peak
• 20,000,000 driver location updates
• 1,600 updates/second
• 12,000 drivers on shift
This is a quiet day.
Original ambitions
• Provide a simple framework for us to build an efficient, resilient, second generation Hailo
• Allow Hailo to scale the business along three axes: adding features to our current business, adding cities, and adding brand new stuff
• Solve pain points in our current architecture
• Be productive
[Diagram repeated on eight slides: ELBs, PHP Cust API, PHP Driver API, Java Hailo Engine and MySQL instances, together with PHP Cust service, PHP Credits service, Java Pay service and C* nodes, across eu-west-1 and us-east-1. Each slide adds one caption and tag, listed below.]
• Servers needed per-city (+ cities)
• City configuration in many places (+ cities). Labels on components: PHP array, YAML, Config service, plists built into app, XML
• Coordination changing three apps (+ features)
• Unclear responsibilities (+ features). Eg: payment or cancellation
• Broad but inflexible services (+ features, + brand new)
• Separate push deployment models (+ productive). Labels on components: rsync, conan/cap
• Different auth models (- pain). Labels on components: none, IP whitelist plus token, turned off, IP whitelist plus login service
• SPOFs (- pain)
Key features
• Lose PHP, adopt Go – gains in efficiency of developer time and compute resource
• Eliminate all SPOFs – adopt a cloud native approach to build a working whole out of ephemeral and often broken parts
• Scale engineering output in line with additional resource – services with few, clearly defined responsibilities reduce friction
• Increase reusability – develop features by composing fine-grained services that are agnostic to Hailo’s current operation
Platform services: Discovery Service, Binding Service, Config Service, Login Service
Discovery Service
• Keeps track of every running instance of a service within a single region
• Stores this information as ephemeral nodes in Zookeeper, keeping a watch to ensure strong consistency between all instances of the discovery service within a region
• Sends heartbeats to services periodically via RMQ and removes dead instances
• Self-healing system, because instances that don’t receive heartbeats will try to reconnect and, failing that, die
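The heartbeat bookkeeping described above can be sketched as a registry that drops an instance once it has missed a number of consecutive heartbeat rounds. This is a simplified stand-in: it uses an in-memory map rather than ZooKeeper ephemeral nodes, is not safe for concurrent use, and the `Registry` type, its methods and the `maxMissed` threshold are all hypothetical.

```go
package main

import "fmt"

// Registry tracks running instances and how many consecutive heartbeat
// rounds each one has missed. Names and threshold are illustrative.
type Registry struct {
	maxMissed int
	missed    map[string]int // instance ID -> consecutive missed rounds
}

func NewRegistry(maxMissed int) *Registry {
	return &Registry{maxMissed: maxMissed, missed: map[string]int{}}
}

// Register adds an instance, or resets its missed count if it reconnects.
func (r *Registry) Register(id string) { r.missed[id] = 0 }

// Round records one heartbeat round. responded holds the instances that
// acknowledged it. Instances reaching maxMissed are removed and returned.
func (r *Registry) Round(responded map[string]bool) []string {
	var dead []string
	for id := range r.missed {
		if responded[id] {
			r.missed[id] = 0
			continue
		}
		r.missed[id]++
		if r.missed[id] >= r.maxMissed {
			dead = append(dead, id)
			delete(r.missed, id)
		}
	}
	return dead
}

// Alive returns the number of instances currently registered.
func (r *Registry) Alive() int { return len(r.missed) }

func main() {
	reg := NewRegistry(2)
	reg.Register("eta-1")
	reg.Register("eta-2")
	reg.Round(map[string]bool{"eta-1": true})         // eta-2 misses once
	dead := reg.Round(map[string]bool{"eta-1": true}) // eta-2 misses twice
	fmt.Println("removed:", dead, "alive:", reg.Alive())
}
```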
Binding Service
• Creates bindings within RabbitMQ for all running services, leveraging information in the discovery service
• Reacts to services coming up and down by creating and destroying bindings
• Bindings establish a connection between an exchange and a queue
• Stores and manages control plane data in order to provide advanced bindings such as “send 10% of traffic to this particular version of the service”
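A rule such as “send 10% of traffic to this particular version” comes down to a weighted choice between versions. The sketch below shows only that selection logic; how the binding service expresses it in RabbitMQ bindings is not described in the deck, and the `Weighted` type, `Total` and `Choose` are hypothetical names.

```go
package main

import (
	"fmt"
	"math/rand"
)

// Weighted pairs a service version with its share of traffic.
type Weighted struct {
	Version string
	Weight  int
}

// Total returns the sum of all weights.
func Total(ws []Weighted) int {
	sum := 0
	for _, w := range ws {
		sum += w.Weight
	}
	return sum
}

// Choose maps n, which must be in [0, Total(ws)), to a version so that
// each version receives a share of values proportional to its weight.
func Choose(ws []Weighted, n int) string {
	for _, w := range ws {
		if n < w.Weight {
			return w.Version
		}
		n -= w.Weight
	}
	return ""
}

func main() {
	ws := []Weighted{{Version: "stable", Weight: 90}, {Version: "canary", Weight: 10}}
	counts := map[string]int{}
	for i := 0; i < 10000; i++ {
		counts[Choose(ws, rand.Intn(Total(ws)))]++
	}
	fmt.Println(counts) // roughly 9000 stable, 1000 canary
}
```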
Config Service
• Stores application configuration data as JSON
• Able to store JSON under any arbitrary key
• Can combine many keys, on request, to serve up “compiled” config
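One way to picture “compiled” config is as a merge of the JSON objects stored under several keys. The sketch below assumes a shallow merge in which later keys override earlier ones; the deck does not state the real merge rules, and the `Compile` function, the key names and the sample values are made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Compile merges the JSON objects stored under keys, in order. Later
// keys override earlier ones at the top level (assumed semantics).
// Keys missing from the store are skipped.
func Compile(store map[string]string, keys ...string) (map[string]any, error) {
	out := map[string]any{}
	for _, k := range keys {
		raw, ok := store[k]
		if !ok {
			continue
		}
		var obj map[string]any
		if err := json.Unmarshal([]byte(raw), &obj); err != nil {
			return nil, fmt.Errorf("key %q: %w", k, err)
		}
		for name, v := range obj {
			out[name] = v
		}
	}
	return out, nil
}

func main() {
	// Hypothetical keys and values.
	store := map[string]string{
		"base":        `{"currency": "USD", "units": "km"}`,
		"city:london": `{"currency": "GBP"}`,
	}
	cfg, err := Compile(store, "base", "city:london")
	if err != nil {
		panic(err)
	}
	fmt.Println(cfg["currency"], cfg["units"])
}
```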
Login Service
• Credential and session/token store for all applications
• The only thing that is able to issue and sign (with private key) tokens
• Applications can exchange a session ID for a token, which they can then use to establish authorisation
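The property that only the login service can issue tokens, while any service can check them, follows from signing with a private key and verifying with the matching public key. The sketch below illustrates this with Ed25519 from the Go standard library. The signature algorithm, the token layout (base64url claims, a dot, base64url signature) and the function names are assumptions; the deck does not describe Hailo's actual token format.

```go
package main

import (
	"crypto/ed25519"
	"crypto/rand"
	"encoding/base64"
	"fmt"
	"strings"
)

// NewKeys generates a key pair. In this sketch only the login service
// would hold the private key.
func NewKeys() (ed25519.PublicKey, ed25519.PrivateKey) {
	pub, priv, err := ed25519.GenerateKey(rand.Reader)
	if err != nil {
		panic(err)
	}
	return pub, priv
}

// Issue signs claims and returns "<claims>.<signature>", both base64url.
func Issue(priv ed25519.PrivateKey, claims string) string {
	enc := base64.RawURLEncoding
	sig := ed25519.Sign(priv, []byte(claims))
	return enc.EncodeToString([]byte(claims)) + "." + enc.EncodeToString(sig)
}

// Verify checks the signature with the public key and returns the claims.
func Verify(pub ed25519.PublicKey, token string) (string, bool) {
	parts := strings.Split(token, ".")
	if len(parts) != 2 {
		return "", false
	}
	enc := base64.RawURLEncoding
	claims, err1 := enc.DecodeString(parts[0])
	sig, err2 := enc.DecodeString(parts[1])
	if err1 != nil || err2 != nil || !ed25519.Verify(pub, claims, sig) {
		return "", false
	}
	return string(claims), true
}

func main() {
	pub, priv := NewKeys()
	token := Issue(priv, "user=alice;role=driver") // hypothetical claims
	claims, ok := Verify(pub, token)
	fmt.Println(claims, ok)
}
```

A real token would also carry an expiry and be issued in exchange for a valid session ID; both are left out here to keep the sketch short.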