
Inside the End-to-End Provider Testing Suite

An overview of PagerDuty's testing system for alert routing decisions, which uses a lab full of cell phones to measure alert deliverability across cellular carriers and telephony providers. Data from this system drives routing decisions such as "switch to an alternate SMS provider because the current one is slow for this carrier".

Evan Gilman

March 10, 2015

Transcript

  1. 4/17/15 Reliability INSIDE THE END-TO-END PROVIDER TESTING SUITE STATUS PAGES

    …
      "url" => "http://status.twilio.com/api/v1/services/rest-api/events/ahNzfnR3aWxpby1zdGF0dXMtaHJkchILEgVFdmVudBiAgIDAwviwCgw",
      "timestamp" => "Mon, 09 Mar 2015 23:02:33 GMT",
      "sid" => "ahNzfnR3aWxpby1zdGF0dXMtaHJkchILEgVFdmVudBiAgIDAwviwCgw",
      "message" => "We have identified messages that are reporting incorrect statuses and sent-dates in the API for some accounts. The process of correcting these fields is now in progress. The API is operating normally at this time.",
      "informational" => false
    },
    …
  2. Reliability - ERROR HANDLING

    • Retries enqueued immediately
    • Successful callback no-ops enqueued retries
    • Status page updates evaluated by a human
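The retry behaviour on this slide can be sketched roughly as follows. This is a minimal illustration, not PagerDuty's implementation: the `RetryQueue` class, the `Retry` record, the 30-second spacing and the idea that retries go to the next provider in a list are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Retry:
    message_id: str
    provider: str
    run_at: float  # time (in seconds) at which the retry becomes due


class RetryQueue:
    """Hypothetical queue: retries are enqueued at send time, not on failure."""

    def __init__(self, retry_delay=30.0):
        self.retry_delay = retry_delay  # assumed spacing between attempts
        self.pending = []               # retries waiting to become due
        self.delivered = set()          # message ids confirmed by a callback
        self.sent = []                  # log of (message_id, provider) attempts

    def send(self, message_id, providers, now):
        # First attempt goes out immediately via the first provider.
        self.sent.append((message_id, providers[0]))
        # Retries via the remaining providers are enqueued right away.
        for attempt, provider in enumerate(providers[1:], start=1):
            run_at = now + attempt * self.retry_delay
            self.pending.append(Retry(message_id, provider, run_at))

    def on_success_callback(self, message_id):
        # A delivery callback turns every enqueued retry for this id into a no-op.
        self.delivered.add(message_id)

    def run_due(self, now):
        due = [r for r in self.pending if r.run_at <= now]
        self.pending = [r for r in self.pending if r.run_at > now]
        for retry in due:
            if retry.message_id in self.delivered:
                continue  # already delivered: the retry does nothing
            self.sent.append((retry.message_id, retry.provider))
```

Enqueuing up front means a lost or late callback never blocks the fallback path; the cost is that every retry has to check the delivered set before sending.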
  3. E2EPT - ANDROID SOFTWARE

    • Timestamped ruok/imok
    • Out-of-band result transmission
    • Statsd emissions
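As a rough illustration of how a timestamped probe can become a latency metric, the sketch below parses a probe body, computes the delivery delay, and formats a StatsD timer line. The probe body format (`ruok <unix-seconds>`), the function names and the metric naming scheme are hypothetical; only the `name:value|ms` timer syntax comes from StatsD itself.

```python
def parse_probe(body):
    """Return the send time embedded in a hypothetical 'ruok <unix-seconds>' body."""
    keyword, sent_at = body.split()
    if keyword != "ruok":
        raise ValueError(f"not a probe message: {body!r}")
    return float(sent_at)


def delivery_latency_ms(body, received_at):
    """Milliseconds between the embedded send time and the phone's receive time."""
    return round((received_at - parse_probe(body)) * 1000)


def statsd_timing(carrier, provider, latency_ms):
    """Format a StatsD timer line; the metric name layout is an assumption."""
    return f"e2ept.{carrier}.{provider}.latency:{latency_ms}|ms"
```

A phone that received `ruok 1000.0` at time 1004.5 would report a latency of 4500 ms. Note that this only works if the sender's and the phone's clocks are synchronized closely enough for the latencies being measured.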
  4. E2EPT - LOTS OF PATHS TO THE CUSTOMER

    • Four cellular carriers
    • Three SMS providers
    • Two short codes - every two minutes
    • 15+ long codes - every fifteen minutes
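The slide does not say how sender codes map onto carriers, so the following is only a back-of-the-envelope volume estimate. It assumes that every code is probed against every one of the four carriers at the stated interval, and it treats "15+" as exactly 15, which makes the result a lower bound.

```python
CARRIERS = 4

# (number of codes, probe interval in minutes), taken from the slide
SHORT_CODES = (2, 2)
LONG_CODES = (15, 15)  # "15+" on the slide, so 15 is a lower bound


def probes_per_hour(codes, interval_min, carriers=CARRIERS):
    """Probes per hour, assuming each code is tested against each carrier."""
    return codes * carriers * (60 // interval_min)


short_rate = probes_per_hour(*SHORT_CODES)  # 2 codes * 4 carriers * 30 runs
long_rate = probes_per_hour(*LONG_CODES)    # 15 codes * 4 carriers * 4 runs
total_rate = short_rate + long_rate
```

Under these assumptions the short codes and the long codes each account for 240 test messages per hour, or at least 480 per hour in total.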
  5. E2EPT - FAILURE MODES

    • Carrier failure
      • Phones of a single carrier experience delays/failures across all providers
    • Provider failure
      • All phones experience delays/failures from a single provider
    • Provider-Carrier failure
      • Phones of a single carrier experience delays/failures from a single provider
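The three failure modes can be read as patterns in a carrier-by-provider grid of test results. The sketch below shows one way to classify a set of failing (carrier, provider) pairs along those lines; the function name, the input shape and the all-or-nothing thresholds are assumptions for illustration, and a real system would likely work with delay distributions rather than a simple pass/fail set.

```python
def classify_failures(failing, carriers, providers):
    """Classify failing (carrier, provider) pairs into the three failure modes."""
    failing = set(failing)
    findings = []
    explained = set()

    # Carrier failure: one carrier fails across all providers.
    for carrier in carriers:
        row = {(carrier, p) for p in providers}
        if row <= failing:
            findings.append(("carrier", carrier))
            explained |= row

    # Provider failure: one provider fails for all carriers.
    for provider in providers:
        column = {(c, provider) for c in carriers}
        if column <= failing:
            findings.append(("provider", provider))
            explained |= column

    # Provider-carrier failure: whatever is left is specific to a single pair.
    for pair in sorted(failing - explained):
        findings.append(("provider-carrier", pair))

    return findings
```

For example, with carriers `c1` to `c4` and providers `p1` to `p3`, a failure on every `c1` pair is reported as a carrier failure, while a lone `("c3", "p1")` failure is reported as a provider-carrier failure. This distinction matters for routing: a provider-carrier failure can be worked around by switching provider for that carrier only.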
  6. E2EPT - THE NUMBERS: FAILURE FREQUENCY

    • Semi-monthly provider events
    • Yearly catastrophic provider outages