Twilio Cloud: Continuous Delivery under High Availability constraints

Twilio opens up the black box of telecom for developers,
making it easy to build apps that communicate via voice or SMS.
Because telecom demands very strict availability, every change can
affect uptime, which limits how fast Twilio can ship code to
production. Learn how Twilio addresses this problem by following a set
of engineering principles for building fault-tolerant, highly
available services in the cloud.

Bulat Shakirzyanov

April 20, 2012

Transcript

  1. Twilio Cloud: Continuous Delivery under High Availability constraints

    BULAT SHAKIRZYANOV, INFRASTRUCTURE ENGINEER
  2. Twilio

  3. Overview: Web service APIs to automate Voice and SMS communications

    • Voice: Inbound Calls, Outbound Calls, Mobile/Browser VoIP
    • SMS: Send To/From Phone Numbers, Short Codes
    • Phone Numbers: Dynamically Purchase Phone Numbers
    • Diagram labels: Developer, End User, Carriers
  4. Toolbox

    • Control the telecom network with any web language
    • Access the power of our platform with 5 easy verbs
    • Add VoIP to web or mobile apps with 5 lines of code
  5. Some examples from our customers

    • Business Process Automation
    • Click-to-Call
    • Interactive Voice Response
    • Social Networking
    • Contact Center
    • Marketing Campaigns
    • Call-Tracking/Lead-Gen
  6. Globally available 24x7

  7. Engineering principles

    • Contain failures
    • Fail fast
    • Retry on failure
    • Be idempotent
    • Be stateless
    • Relax consistency
  8. Contain failures

  9. Contain failures (diagram: web server, queue, master, host)

  10. Contain failures (diagram: POST /data, web server, queue, master, host)

  11. Contain failures (diagram: web server, load balancer, three slaves, host)

  12. Contain failures (diagram: GET /data, web server, load balancer, three slaves, host)
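One possible reading of the "Contain failures" diagrams is that writes pass through a queue in front of the master while reads are spread across slaves behind a load balancer, so that one failed node does not take the whole request path down. The sketch below illustrates that reading only. `Store`, `QueuedWriter`, and `ReadBalancer` are hypothetical in-memory stand-ins, not Twilio's components.

```ruby
# Hypothetical in-memory stand-in for a master or slave database node.
class Store
  attr_accessor :healthy

  def initialize(data = {})
    @data = data
    @healthy = true
  end

  def get(key)
    raise IOError, "store is down" unless healthy
    @data[key]
  end

  def put(key, value)
    raise IOError, "store is down" unless healthy
    @data[key] = value
  end
end

# Writes are queued first, so a master outage delays them instead of failing them.
class QueuedWriter
  attr_reader :pending

  def initialize(master)
    @master = master
    @pending = []
  end

  def post(key, value)
    @pending << [key, value]
    drain
    :accepted
  end

  # Flush queued writes, in order, for as long as the master is healthy.
  def drain
    while @master.healthy && (item = @pending.first)
      @master.put(*item)
      @pending.shift
    end
  end
end

# Reads try each slave in turn, so one failed slave does not fail the read.
class ReadBalancer
  def initialize(slaves)
    @slaves = slaves
  end

  def get(key)
    @slaves.each do |slave|
      begin
        return slave.get(key)
      rescue IOError
        next
      end
    end
    raise IOError, "no healthy slaves"
  end
end

master = Store.new
writer = QueuedWriter.new(master)
master.healthy = false
writer.post("greeting", "hello") # accepted while the master is down
master.healthy = true
writer.drain                     # the queued write reaches the master

slaves = [Store.new({ "k" => "v" }), Store.new({ "k" => "v" })]
slaves.first.healthy = false
ReadBalancer.new(slaves).get("k") # served by the second slave
```

In both paths the failure is contained at the node that failed: the caller of `post` or `get` does not see it unless every node is down.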
  13. Fail fast

  14. Fail fast ASSERT(expected == actual);

  15. Fail fast

        class Foo
          def initialize(stream)
            @stream = stream
          end

          def print
            @stream.puts(inspect)
          end
        end
  16. Fail fast

        f = Foo.new(nil)
        f.print

  17. Fail fast

        f = Foo.new(nil)
        f.print

        $ ruby fail_fast.rb
        foo.rb:7:in `print': undefined method `puts' for nil:NilClass (NoMethodError)
                from fail_fast.rb:2:in `<main>'
  18. Fail fast

        f = Foo.new(nil)
        f.print

        $ ruby fail_fast.rb
        foo.rb:7:in `print': undefined method `puts' for nil:NilClass (NoMethodError)
                from fail_fast.rb:2:in `<main>'

    Problem caused
  19. Fail fast

        class Foo
          def initialize(stream)
            @stream = stream
          end

          def print
            @stream.puts(inspect)
          end
        end

    Problem surfaced
  20. Fail fast

        class Foo
          def initialize(io)
            @stream = io
            Assert.stream(@stream)
          end

          def print
            @stream.puts(inspect)
          end
        end

    Surface the cause
  21. Fail fast

        module Assert
          extend self

          def stream(io)
            unless io.kind_of?(IO)
              raise ArgumentError, "#{io.inspect} is not IO", caller
            end
          end
        end
  22. Fail fast

        f = Foo.new(nil)
        f.print

        $ ruby fail_fast.rb
        foo.rb:4:in `initialize': nil is not IO (ArgumentError)
                from fail_fast.rb:1:in `new'
                from fail_fast.rb:1:in `<main>'
  23. Fail fast

        f = Foo.new(nil)
        f.print

        $ ruby fail_fast.rb
        foo.rb:4:in `initialize': nil is not IO (ArgumentError)
                from fail_fast.rb:1:in `new'
                from fail_fast.rb:1:in `<main>'

    Problem caused and surfaced
  24. Retry on failure

  25. Retry on failure

        Failed, Retrying in 3 seconds...
        Failed, Retrying in 6 seconds...
        Failed, Retrying in 12 seconds...
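The delays on the retry slide double each time (3, 6, 12 seconds), which suggests exponential backoff. The following is a minimal sketch of that pattern, not code from the talk. The method name `with_retries` and its parameters are assumptions, and the `sleeper` argument exists only so the example can run without actually waiting.

```ruby
# Retry a block with exponential backoff: 3s, 6s, 12s, ...
# `with_retries`, `attempts`, `base_delay`, and `sleeper` are hypothetical names.
def with_retries(attempts: 4, base_delay: 3, sleeper: ->(seconds) { sleep(seconds) })
  delay = base_delay
  tries = 0
  begin
    tries += 1
    yield tries
  rescue StandardError
    raise if tries >= attempts # out of attempts: surface the last error
    puts "Failed, Retrying in #{delay} seconds..."
    sleeper.call(delay)
    delay *= 2
    retry
  end
end

# Usage: a fake operation that fails three times, then succeeds.
waits = []
result = with_retries(sleeper: ->(seconds) { waits << seconds }) do |attempt|
  raise IOError, "flaky" if attempt < 4
  "ok on attempt #{attempt}"
end
# waits  => [3, 6, 12]
# result => "ok on attempt 4"
```

Retrying like this is only safe when the retried operation can be repeated without side effects piling up, which is where idempotency comes in.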
  26. Be idempotent

  27. Be idempotent + 1 + 1 = 1
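The "+ 1 + 1 = 1" slide can be read as: applying the same operation twice has the same effect as applying it once. One common way to get there, shown here as a sketch under that assumption, is to tag each request with an ID and ignore IDs that were already applied. `IdempotentCounter` and the request IDs are hypothetical.

```ruby
require "set"

# Hypothetical counter that applies each request at most once.
class IdempotentCounter
  attr_reader :value

  def initialize
    @value = 0
    @seen = Set.new
  end

  # Set#add? returns nil when the ID is already present,
  # so a repeated request leaves the value unchanged.
  def increment(request_id)
    @value += 1 if @seen.add?(request_id)
    @value
  end
end

counter = IdempotentCounter.new
counter.increment("req-1")
counter.increment("req-1") # a retry of the same request
counter.value              # => 1
```

With this property, a client that never received a response can resend the request without risking a double increment.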

  28. Be stateless

  29. Be stateless (diagram: browser, three web servers, session database)
  30. Be stateless (diagram: browser, three web servers, session database; arrows labeled session id, session id, session data, response)
  31. Be stateless (diagram: browser, three web servers; arrows labeled session data, response)
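The last "Be stateless" diagram drops the session database and has the browser send the session data itself, so any web server can handle any request. One way to do that safely is a signed cookie, sketched below. This is an illustration under that assumption, not the talk's implementation: `SECRET`, `encode_session`, and `decode_session` are hypothetical, and a real deployment would also need secret management and expiry.

```ruby
require "openssl"
require "json"

# Placeholder secret; every web server would share the real one.
SECRET = "change-me"

def sign(payload)
  OpenSSL::HMAC.hexdigest(OpenSSL::Digest.new("SHA256"), SECRET, payload)
end

# Compare two strings without exiting early on the first difference.
def secure_compare(a, b)
  return false unless a.bytesize == b.bytesize
  a.bytes.zip(b.bytes).reduce(0) { |acc, (x, y)| acc | (x ^ y) }.zero?
end

# Serialize the session into a cookie value of the form "<base64>--<hmac>".
def encode_session(session)
  payload = [JSON.generate(session)].pack("m0") # strict Base64, no newlines
  "#{payload}--#{sign(payload)}"
end

# Return the session hash, or nil if the cookie is malformed or tampered with.
def decode_session(cookie)
  payload, signature = cookie.to_s.split("--", 2)
  return nil if payload.nil? || signature.nil?
  return nil unless secure_compare(signature, sign(payload))
  JSON.parse(payload.unpack1("m0"))
end

cookie = encode_session({ "user_id" => 42 })
decode_session(cookie) # => {"user_id"=>42}
```

Because the server only verifies the signature, no server holds per-user state and hosts can be added or removed behind the load balancer freely.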
  32. Relax consistency

  33. Relax consistency (diagram: one database server, three database clients)

  34. Relax consistency (diagram: three database servers, three database clients)
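The "Relax consistency" diagrams go from one database server to three. A plausible reading is that clients accept briefly stale reads in exchange for not depending on a single server. The toy model below illustrates that trade-off only. `EventuallyConsistentStore` is hypothetical, and real replication is asynchronous and far more involved.

```ruby
# Hypothetical eventually consistent store: a write lands on one node
# and is copied to the other nodes later.
class EventuallyConsistentStore
  def initialize(node_count)
    @nodes = Array.new(node_count) { {} }
    @backlog = [] # writes not yet copied to every node
  end

  def write(key, value)
    @nodes.first[key] = value
    @backlog << [key, value]
  end

  # Read from one specific node; the value may be stale or missing.
  def read(key, node:)
    @nodes.fetch(node)[key]
  end

  # Copy pending writes to the remaining nodes, in order.
  def replicate
    @backlog.each do |key, value|
      @nodes.drop(1).each { |node| node[key] = value }
    end
    @backlog.clear
  end
end

store = EventuallyConsistentStore.new(3)
store.write("status", "up")
store.read("status", node: 0) # => "up"
store.read("status", node: 2) # => nil (not replicated yet)
store.replicate
store.read("status", node: 2) # => "up"
```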
  35. Automation

    • Build and test
    • Configuration management
    • Monitoring
    • Orchestration
    • Scaling
  36. None
  37. Deployment Frequency (Risk), log scale: 4 buckets

    • 1000x: Website Content (CMS)
    • 100x: Website Code (PHP/Ruby etc.)
    • 10x: REST API (Python/Java etc.)
    • 1x: Big DB Schema (SQL)
  38. BoxConfig

    • Build and deployment system - boot entire Twilio stack with one key press
    • Host configuration - versioned code & config
    • Host orchestration - load balancing
    • Monitoring and alerting - Nagios
    • Multi-datacenter deployment & analytics
  39. Configuration host

  40. Configuration (diagram: host built from base AMI and latest build; two roles, each with four services)
  41. Configuration version Configuration

  42. Monitoring Realtime stats

  43. Manage traffic Orchestration

  44. Orchestration (diagram: incoming traffic, load balancer, three hosts)

  45. Orchestration (diagram: incoming traffic, load balancer, four hosts)
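The two orchestration diagrams differ by one host behind the load balancer, which suggests hosts being added to the pool while traffic keeps flowing. A minimal round-robin sketch of that idea follows. `RoundRobinBalancer` and the host names are hypothetical, and the slides do not say which balancing strategy Twilio uses.

```ruby
# Hypothetical round-robin balancer whose host pool can change at runtime.
class RoundRobinBalancer
  def initialize(hosts)
    @hosts = hosts.dup
    @next = 0
  end

  def add_host(host)
    @hosts << host
  end

  def remove_host(host)
    @hosts.delete(host)
  end

  # Pick the next host in rotation.
  def route
    raise "no hosts available" if @hosts.empty?
    host = @hosts[@next % @hosts.size]
    @next += 1
    host
  end
end

lb = RoundRobinBalancer.new(%w[host-a host-b host-c])
Array.new(3) { lb.route } # => ["host-a", "host-b", "host-c"]
lb.add_host("host-d")     # a fourth host joins the pool
Array.new(4) { lb.route } # => ["host-d", "host-a", "host-b", "host-c"]
```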

  46. Bulat Shakirzyanov @avalanche123 twilio