Taming the rising complexity of event-driven APIs

This talk was presented by Matthew O'Riordan at API World in San Francisco, Oct 2019.

This talk covers what an event-driven API is, why it adds complexity compared to more traditional REST APIs, and why developer demand means we need to tackle event-driven APIs in spite of that complexity. It then looks at five common areas of unexpected but significant complexity to consider when publishing event-driven APIs. The goal of the talk was to inform users about the complexity that exists, and how to navigate it once you understand it.

Matthew O'Riordan

October 09, 2019

Transcript

  1. Taming the rising complexity of event-driven APIs

    Useful insights for API publishers
  2. My journey with event-driven APIs

    Technical co-founder of Ably. Built and scaled event-driven APIs for developers. Board member, OpenAPI Initiative.
  3. What is an event-driven API?

    Ably - Serious realtime infrastructure
    Delivery person Logistics provider Retailer Parcel arrived! (on device) Customer Ping
  4. Pull (request/response) is stateless & simpler

    Data source Worker Server Consumers
  5. Pull (request/response) is stateless & simpler

    Data source Worker Server Consumers
  6. Event-driven is more demanding on producers

    Worker hot spots Additional complexity: Backoffice scheduling & management Short value window Delivery logistics Acme Fresh Food
  7. Complexity inversion with event-driven APIs

    Pull (REST) API Event-driven API Consumer Producer Trigger requests and maintain state API Gateway / CDN Fan out push, maintain state, and fault tolerant. Complexity Wait for updates Stateless response
  8. If complexity is evil - can we avoid it?

    Probably not. Event-driven data exceeds all data produced in 2019 (circa 40 ZB)
  9. Event-driven data demand is here, now

    By 2021, cumulative event-driven data exceeds all previous event-driven data produced and consumed, ever
  10. Criticality of this data

    Compound Annual Growth Rate: All data 30%, Potentially Critical 37%, Critical 39%, Hyper-Critical 54%
  11. Source of new event-driven data

  12. Terminology: The Realtime API family

    Realtime APIs: Pub/sub (messaging pattern), Streaming (consumer pattern), Event-driven (architectural pattern), Push (producer pattern)
  13. Publishing a realtime API

  14. Three distinct functions when publishing a realtime API

    Infrastructure Publishing Documentation / Spec Distribute Deploy Authentication Pull & push protocols Onboarding + dev tooling Scaling & performance Manage Access control Instrumentation Billing Monetization
  15. Three distinct functions when publishing a realtime API

    Distribute Deploy Authentication Pull & push protocols Onboarding + dev tooling Scaling & performance Publishing Infrastructure Documentation / Spec Manage Access control Instrumentation Billing Monetization
  16. Three distinct functions when publishing a realtime API

    Distribute Deploy Authentication Pull & push protocols Onboarding + dev tooling Scaling & performance Publishing Infrastructure Documentation / Spec Manage Access control Instrumentation Billing Monetization
  17. Distribution layer complexity

    Five commonly unforeseen but important areas of complexity in event-driven API delivery.
    1. Integrity vs Latency
    2. Throughput
    3. Push vs Pull subscriptions
    4. Downstream reliability
    5. Durability
  18. Distribution layer complexity #1 Integrity vs Latency

  19. Integrity vs Latency

    Happy Path Your servers message bus API Interface 15 updates a minute
  20. Integrity vs Latency: Congestion & connectivity issues

    Backpressure Your servers message bus Less Happy Path
  21. Solution #1 - Backpressure control

    Chart over time (0s to 80s): Backpressure control, Bandwidth limited
  22. Managing backpressure

    Server-side
    • TCP buffer size with high watermark (network layer)
    • ACK from subscribing clients (application layer)
    Client-side
    • Stream polling
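
As an illustration of the application-layer approach above, here is a minimal sketch of a per-subscriber buffer with a high watermark that is drained by client ACKs. The class name `SubscriberBuffer` and its methods are hypothetical and not taken from the talk or from any Ably API; it only shows the general idea of refusing new messages while too many remain unacknowledged.

```python
from collections import deque


class SubscriberBuffer:
    """Hypothetical per-subscriber outbound buffer (illustrative only)."""

    def __init__(self, high_watermark):
        self.high_watermark = high_watermark
        self.unacked = deque()  # messages sent but not yet ACKed by the client

    def offer(self, message):
        """Accept a message, or return False if the subscriber is at the watermark."""
        if len(self.unacked) >= self.high_watermark:
            return False  # caller decides: pause the producer, drop, or conflate
        self.unacked.append(message)
        return True

    def ack(self, count=1):
        """The client confirmed receipt of the oldest `count` messages."""
        for _ in range(min(count, len(self.unacked))):
            self.unacked.popleft()


buffer = SubscriberBuffer(high_watermark=3)
accepted = [buffer.offer(n) for n in range(5)]  # the last two are refused
buffer.ack(2)                                   # client catches up
accepted_after_ack = buffer.offer(5)            # accepted again
```

What happens when `offer` returns False is the real design decision: a latency-focused stream might drop or conflate, while an integrity-focused stream would need to pause or persist.
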
  23. Integrity vs Latency: Conflation

    Great for: Pricing, Scores, GPS locations (timeline: 1s, 2s, 3s, 4s)
  24. Simplified example of conflation on a stream

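
The slide's own example did not survive in this transcript, so the following is a separate sketch of one common form of conflation: keeping only the latest value per key within each time window. The function name `conflate`, the `(timestamp, key, value)` tuple layout, and the sample prices are assumptions made for illustration.

```python
def conflate(updates, window_seconds):
    """Keep only the latest value per key in each time window.

    `updates` is an iterable of (timestamp, key, value) tuples in time order.
    Returns a list of (window_start, key, value) tuples.
    """
    windows = {}
    for timestamp, key, value in updates:
        window_start = int(timestamp // window_seconds) * window_seconds
        # Later updates for the same key overwrite earlier ones in the window.
        windows.setdefault(window_start, {})[key] = value
    return [
        (start, key, value)
        for start in sorted(windows)
        for key, value in windows[start].items()
    ]


ticks = [
    (0.1, "ACME", 10.0),
    (0.4, "ACME", 10.2),
    (0.5, "BETA", 5.0),
    (0.9, "ACME", 10.1),
    (1.2, "ACME", 10.5),
    (1.8, "ACME", 10.4),
]
conflated = conflate(ticks, window_seconds=1)
```

Six updates become three, one per key per second, which is why conflation suits prices, scores and GPS locations where only the most recent value matters.
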
  25. Considerations when latency takes priority

    • Ordering of streams is typically not required. Less complexity.
    • Backpressure management or conflation preferred. Additional complexity.
    • Capacity planning critical. Additional operational complexity.
    • Persistent subscriber transport preferred. Improved latencies as reduced round-trip overhead.
  26. Integrity vs Latency: Integrity prioritized

    Ordered message stream Auditors & Legal message bus
  27. Integrity vs Latency: Integrity prioritized

    Latency focussed Integrity focussed Latency & Integrity agnostic Ordered message stream Auditors & Legal message bus
  28. Considerations when integrity takes priority

    • Ordering of streams is important. Significantly increased complexity in a stateful design (serial numbers).
    • Backpressure management still needed. Buffers cannot build up indefinitely. Some increased complexity.
    • Persistent subscriber transport preferred. Removes complexity in the subscriber as TCP can be relied upon for integrity.
    • Reliable publishing. ACKs, persistent connections & idempotency.
  29. Publisher integrity with idempotent publishing

    Post A Cloud Service Connection Disconnected Response No Response, Retry Post A Response Duplicate A A
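
The retry scenario on this slide can be sketched as follows, assuming the publisher attaches a unique ID to each message and the service remembers IDs it has already accepted. `CloudService` and its `publish` method are hypothetical stand-ins, not a real client library.

```python
import uuid


class CloudService:
    """Hypothetical service that de-duplicates publishes by message ID."""

    def __init__(self):
        self.stream = []       # messages actually delivered to subscribers
        self._seen_ids = set()  # IDs already accepted

    def publish(self, message_id, payload):
        if message_id in self._seen_ids:
            return "duplicate-ignored"  # a retry of an already accepted message
        self._seen_ids.add(message_id)
        self.stream.append(payload)
        return "accepted"


service = CloudService()
message_id = str(uuid.uuid4())  # generated once by the publisher, reused on retry

first = service.publish(message_id, "A")  # imagine this response is lost in transit
retry = service.publish(message_id, "A")  # publisher retries with the same ID
```

Because the ID is generated before the first attempt, the publisher can retry safely without knowing whether the original request reached the service. A real service would also need to bound how long it remembers IDs.
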
  30. Integrity vs Latency in summary

    • Consumers decide on integrity vs latency
    • Producers of data can choose both latency and integrity
      ◦ To gain integrity, idempotency and persistent connections preferred
    • Latency over integrity is generally simpler
    • Backpressure always needs to be handled
  31. Distribution layer complexity #2 Throughput

  32. Throughput - narrow pipe

    Multiple producers Fat message bus Single stream Consumer 500 msgs p/s Max 1,000 msgs p/s per stream
  33. Throughput: #1 solved using queues

    Advantages: Relatively simple. Scales easily.
    Disadvantages: No ordering. At-least-once delivery. Potentially two design patterns in play - pub/sub and queueing.
    Consumer workers Queue 500 msgs p/s Fat message bus Multiple producers 500 msgs p/s
  34. Throughput: #2 solved with sharding

    Advantages: Data stream integrity (ordering). Single pattern pub/sub. Exactly-once delivery.
    Disadvantages: Sharding is complex. Expanding or contracting shards is hard. Consumer state may be needed.
    Shard consumers Consumer group 625 msg p/s 1 2 4 3 Throughput sharding Fat message bus Multiple producers
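
A minimal sketch of key-based sharding, assuming each message carries a partition key and shards are chosen with a stable hash modulo the shard count. The function names and the parcel keys are illustrative; the talk does not specify a hashing scheme.

```python
import zlib


def shard_for(key, shard_count):
    """Map a partition key to a shard using a stable hash (CRC32)."""
    return zlib.crc32(key.encode("utf-8")) % shard_count


def route(messages, shard_count):
    """Distribute (key, payload) messages across shards, preserving arrival order."""
    shards = [[] for _ in range(shard_count)]
    for key, payload in messages:
        shards[shard_for(key, shard_count)].append((key, payload))
    return shards


messages = [
    ("parcel-1", "picked up"),
    ("parcel-2", "picked up"),
    ("parcel-1", "in transit"),
    ("parcel-1", "delivered"),
    ("parcel-2", "delivered"),
]
shards = route(messages, shard_count=4)

# All events for one key land on one shard, so their relative order is kept.
parcel_1_shard = shards[shard_for("parcel-1", 4)]
parcel_1_events = [payload for key, payload in parcel_1_shard if key == "parcel-1"]
```

With plain modulo hashing, changing `shard_count` remaps most keys to different shards, which is one reason expanding or contracting shards is hard.
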
  35. Throughput in summary

    • Everything has limits - distribution is your friend. Practical limits in terms of what a single server, connection or shard can sustain.
    • Keep complexity away from your consumers. It’s your job to keep it simple for consumers.
    • Queues are simple. Yet lack ordering and exactly-once delivery guarantees.
    • Shards are hard. Yet provide ordering and exactly-once delivery guarantees.
  36. Distribution layer complexity #3 Push vs Pull Subscriptions

  37. Push vs Pull subscriptions

    AMQP API Gateway Push service Client initiated Server initiated Pull subscription protocols Push subscription protocols SSE Websockets MQTT HTTP WebSub Your message bus
  38. Examples of Push subscription protocols

    Stateless over HTTP Queue based Stream based Webhooks AMQP Kafka WebSub AWS SQS AWS Kinesis Serverless function invocation MQTT Integrations (Zapier etc)
  39. When to use push vs pull subscriptions?

    Push subscriptions
    • High throughput. Target can be load balanced.
    • Reduced consumer complexity. Creates complexity for producer, and you need to address durability and downstream failures.
    • Always online. Unsuitable for devices that are not always online.
    • Unintentional DoS risk. Control rate of downstream requests.
    Pull subscriptions
    • On demand. Generally better suited for devices such as mobiles, desktops, browsers where data is needed on demand.
    • Simple. Simple for consumers and producers.
    • Low throughput per subscriber. Not suitable for high throughput, without sharding or queueing.
    • Capacity planning harder. Unpredictable load.
  40. Distribution layer complexity #4 Downstream reliability (for push subscriptions)

  41. Downstream reliability - push subscriptions

    Your message bus Push service Concurrent HTTP requests Ordered stream Unordered stream HTTP Kafka AMQP
  42. Downstream reliability - coping with failure

    Poison message 50x Faulty connection Your message bus Push service Concurrent HTTP requests Ordered stream Unordered stream HTTP Kafka AMQP
  43. Dead letter queues

    Dead letter queue Failed messages Your message bus Concurrent HTTP requests Ordered stream Unordered stream HTTP Kafka AMQP
  44. Downstream reliability considerations

    • Manage throughput to avoid unintentional DoS attacks. Avoid unintentional floods of requests to downstream endpoints. Use rate limiting, and incremental back-offs.
    • Push subscriptions may prefer latency over integrity. Latency prioritized traffic may allow messages to be discarded when there is a failure.
    • Dead letter queue poison / bad messages. Don’t delete data.
    • Tooling and alerts. Ensure downstream providers have necessary tooling and alerts to manage failures.
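
The incremental back-off and dead letter queue ideas above could be combined roughly as in this sketch. `deliver_with_retries`, `DeliveryError`, the doubling delay schedule, and the in-memory `dead_letters` list are all assumptions for illustration; a production push service would use a durable queue and would likely add jitter.

```python
import time


class DeliveryError(Exception):
    """Hypothetical error raised when a downstream endpoint rejects a message."""


def deliver_with_retries(message, send, dead_letters,
                         max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Try to deliver a message, backing off between attempts.

    After `max_attempts` failures the message is moved to `dead_letters`
    rather than deleted, so it can be inspected or replayed later.
    """
    for attempt in range(max_attempts):
        try:
            send(message)
            return True
        except DeliveryError:
            if attempt < max_attempts - 1:
                sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...
    dead_letters.append(message)
    return False


def always_failing_endpoint(message):
    raise DeliveryError("503 from downstream")


delays = []        # record the back-off delays instead of really sleeping
dead_letters = []
delivered = deliver_with_retries(
    "poison message", always_failing_endpoint, dead_letters, sleep=delays.append
)
```

Passing `sleep` as a parameter keeps the example fast to run and makes the back-off schedule easy to inspect.
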
  45. Distribution layer complexity #5 Durability

  46. Durability

    Persistent ordered index log stream (messages 10 to 1). Resume from #1 (-2 hours). Resume from #6 (-1 minute).
  47. Durability considerations

    • Use a dedicated log stream storage solution. Trying to get traditional databases to store log streams is hard. There are many solutions that provide efficient append-only stream storage.
    • Complexity of adding storage is probably worth it. Freedom to later improve reliability, reduce backpressure issues, better continuity of streams.
    • Latency and financial cost. The financial impact can be managed to some degree by modifying TTLs for storage.
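
To make the resume and TTL ideas concrete, here is a small in-memory sketch of an append-only log indexed by serial number. `LogStream` and its methods are hypothetical; a real deployment would use dedicated log stream storage as the slide recommends, not a Python list.

```python
class LogStream:
    """Hypothetical append-only log with serial numbers and TTL-based expiry."""

    def __init__(self):
        self._entries = []  # (serial, timestamp, payload), in append order
        self._next_serial = 1

    def append(self, payload, timestamp):
        serial = self._next_serial
        self._entries.append((serial, timestamp, payload))
        self._next_serial += 1
        return serial

    def resume_from(self, serial):
        """Return every retained (serial, payload) at or after `serial`."""
        return [(s, p) for s, _, p in self._entries if s >= serial]

    def expire(self, now, ttl_seconds):
        """Drop entries older than the TTL to bound storage cost."""
        self._entries = [e for e in self._entries if now - e[1] <= ttl_seconds]


log = LogStream()
for n in range(1, 11):
    log.append(f"msg-{n}", timestamp=n * 60)  # one message per minute

recent = log.resume_from(6)                # a consumer that was briefly offline
log.expire(now=600, ttl_seconds=120)       # keep only the last two minutes
after_expiry = log.resume_from(1)          # older history is no longer available
```

A shorter TTL lowers storage cost but narrows how far back a reconnecting consumer can resume.
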
  48. Event-driven APIs do introduce complexity

    But it’s not rocket science :)
  49. What next? #1 Give developers what they want - better event-driven integrations

    Event-driven integration frustration up 2% this year. Source: State of API Integration, Cloud Elements
  50. What next? #2 Avoid complexity

    • Focus on necessary complexity
    • Delay complexity that is not needed today, but plan for it
    • Open source and cloud solutions exist - don’t reinvent the wheel
  51. What next? #3 Focus on developer experience

    • Don’t push complexity downstream to consumers
    • Let consumers choose push or pull protocols, integrity or latency
    • Documentation and developer portals are important
    • An API is a contract and commitment to your consumers
  52. What next? #4 Help build an open realtime connected world

    • Provide event-driven APIs
    • Set your data free
  53. Thank you

    Taming the rising complexity of event-driven APIs
    www.ably.io
    Me: @mattheworiordan @ablyrealtime
    Shameless plugs: go.ably.io/open-data-streams go.ably.io/api-management
    Illustrations by Leonie Wharton - leoniewharton.com