Slide 1

Slide 1 text

Taming the rising complexity of event-driven APIs
Useful insights for API publishers

Slide 2

Slide 2 text

My journey with event-driven APIs
● Technical co-founder of Ably
● Built and scaled event-driven APIs for developers
● Board member, OpenAPI Initiative

Slide 3

Slide 3 text

Page 3 Ably - Serious realtime infrastructure
What is an event-driven API?
[Diagram: Delivery person, Logistics provider, Retailer and Customer; a "Parcel arrived!" event (on device) is passed along and the customer receives a ping]

Slide 4

Slide 4 text

Pull (request/response) is stateless & simpler
[Diagram: Data source, Worker, Server, Consumers]

Slide 6

Slide 6 text

Event-driven is more demanding on producers
[Diagram: Acme Fresh Food delivery logistics, showing worker hot spots and a short value window]
Additional complexity: back-office scheduling & management

Slide 7

Slide 7 text

Complexity inversion with event-driven APIs
● Pull (REST) API: the consumer triggers requests and maintains state; the producer returns a stateless response, served via an API gateway / CDN.
● Event-driven API: the consumer waits for updates; the producer must fan out pushes, maintain state, and be fault tolerant.
The complexity moves from the consumer to the producer.

Slide 8

Slide 8 text

If complexity is evil, can we avoid it?
Probably not. Event-driven data exceeds all data produced in 2019 (circa 40 ZB).

Slide 9

Slide 9 text

Event-driven data demand is here, now
By 2021, cumulative event-driven data exceeds all previous event-driven data ever produced and consumed.

Slide 10

Slide 10 text

Criticality of this data: compound annual growth rate (CAGR)
● All data: 30%
● Potentially critical: 37%
● Critical: 39%
● Hyper-critical: 54%

Slide 11

Slide 11 text

Source of new event-driven data

Slide 12

Slide 12 text

Terminology: the realtime API family
● Pub/sub: messaging pattern
● Streaming: consumer pattern
● Event-driven: architectural pattern
● Push: producer pattern
Together, these make up realtime APIs.

Slide 13

Slide 13 text

Publishing a realtime API

Slide 14

Slide 14 text

Three distinct functions when publishing a realtime API
[Diagram labels: Distribute, Publishing, Manage, Infrastructure, Documentation / Spec, Deploy, Authentication, Pull & push protocols, Onboarding + dev tooling, Scaling & performance, Access control, Instrumentation, Billing, Monetization]

Slide 17

Slide 17 text

Distribution layer complexity
Five commonly unforeseen but important areas of complexity in event-driven API delivery:
1. Integrity vs latency
2. Throughput
3. Push vs pull subscriptions
4. Downstream reliability
5. Durability

Slide 18

Slide 18 text

Distribution layer complexity #1 Integrity vs Latency

Slide 19

Slide 19 text

Integrity vs Latency: happy path
[Diagram: your servers → message bus → API interface, 15 updates a minute]

Slide 20

Slide 20 text

Integrity vs Latency: congestion & connectivity issues (less happy path)
[Diagram: your servers → message bus, with backpressure building up]

Slide 21

Slide 21 text

Solution #1: backpressure control
[Chart: delivery over time (0 to 80 s) with backpressure control when bandwidth is limited]

Slide 22

Slide 22 text

Managing backpressure
Server-side
● TCP buffer size with high watermark (network layer)
● ACKs from subscribing clients (application layer)
Client-side
● Stream polling
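As an illustration of the server-side high watermark idea, here is a minimal Python sketch assuming a single subscriber and a simulated network. `SubscriberBuffer`, `enqueue` and `drain` are hypothetical names, not part of any Ably or OS API; a real server would tie draining to socket writability (for example the kernel TCP send buffer) rather than an explicit byte budget.

```python
from __future__ import annotations

from collections import deque


class SubscriberBuffer:
    """Per-subscriber outbound buffer with high/low watermarks (hypothetical sketch)."""

    def __init__(self, high_watermark: int, low_watermark: int) -> None:
        if not 0 <= low_watermark < high_watermark:
            raise ValueError("need 0 <= low_watermark < high_watermark")
        self.high = high_watermark
        self.low = low_watermark
        self.queue: deque[bytes] = deque()
        self.buffered = 0      # bytes accepted but not yet written to the socket
        self.paused = False    # True => stop reading from the message bus for this subscriber

    def enqueue(self, msg: bytes) -> bool:
        """Buffer a message; return False once the producer should pause."""
        self.queue.append(msg)
        self.buffered += len(msg)
        if self.buffered >= self.high:
            self.paused = True
        return not self.paused

    def drain(self, budget: int) -> list[bytes]:
        """Simulate the network accepting up to `budget` bytes of whole messages."""
        sent: list[bytes] = []
        while self.queue and len(self.queue[0]) <= budget:
            msg = self.queue.popleft()
            budget -= len(msg)
            self.buffered -= len(msg)
            sent.append(msg)
        if self.paused and self.buffered <= self.low:
            self.paused = False  # resume only once well below the high watermark
        return sent
```

The message that crosses the high watermark is still accepted; the `False` return is the signal to stop reading from the message bus until draining brings the buffer back under the low watermark. The gap between the two watermarks avoids flapping between paused and resumed.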

Slide 23

Slide 23 text

Integrity vs Latency: conflation
Great for: pricing, scores, GPS locations
[Chart: a stream of updates conflated over time, 1 s to 4 s]

Slide 24

Slide 24 text

Simplified example of conflation on a stream
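A simplified conflation step can be sketched as follows, assuming time-stamped, keyed updates (for example prices, scores or GPS positions) that arrive in timestamp order. Within each fixed window only the latest value per key is forwarded; `Update` and `conflate` are illustrative names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Update:
    key: str      # e.g. an instrument symbol or a vehicle ID
    value: float
    ts: float     # seconds since the stream started


def conflate(updates: list[Update], window: float) -> list[Update]:
    """Keep only the latest update per key within each fixed time window.

    Assumes `updates` is ordered by timestamp.
    """
    out: list[Update] = []
    latest: dict[str, Update] = {}
    window_end = window
    for u in updates:
        while u.ts >= window_end:     # close every window that has elapsed
            out.extend(latest.values())
            latest.clear()
            window_end += window
        latest[u.key] = u             # a newer value replaces the older one
    out.extend(latest.values())       # flush the final partial window
    return out
```

For example, with a 1-second window, `conflate([Update("BTC", 100, 0.1), Update("BTC", 101, 0.5), Update("ETH", 50, 0.7), Update("BTC", 102, 1.2)], 1.0)` forwards BTC 101 and ETH 50 for the first window and BTC 102 for the second. Intermediate values are lost, which is why conflation suits latency-focused consumers rather than integrity-focused ones.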

Slide 25

Slide 25 text

Considerations when latency takes priority
● Ordering of streams is typically not required
  Less complexity.
● Backpressure management or conflation preferred
  Additional complexity.
● Capacity planning is critical
  Additional operational complexity.
● Persistent subscriber transport preferred
  Improved latency, as round-trip overhead is reduced.

Slide 26

Slide 26 text

Integrity vs Latency: integrity prioritized
[Diagram: message bus → ordered message stream → auditors & legal]

Slide 27

Slide 27 text

Integrity vs Latency: integrity prioritized
[Diagram: message bus → ordered message stream → auditors & legal; consumers labelled latency focused, integrity focused, or latency & integrity agnostic]

Slide 28

Slide 28 text

Considerations when integrity takes priority
● Ordering of streams is important
  Significantly increased complexity in a stateful design (serial numbers).
● Backpressure management is still needed
  Buffers cannot build up indefinitely. Some increased complexity.
● Persistent subscriber transport preferred
  Removes complexity in the subscriber, as TCP can be relied upon for integrity.
● Reliable publishing
  ACKs, persistent connections & idempotency.
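The "stateful design (serial numbers)" point can be illustrated with a subscriber that releases messages strictly in serial order. This is a minimal sketch assuming the producer stamps each message with a consecutive serial starting at 1; `OrderedReceiver` and its methods are hypothetical.

```python
from __future__ import annotations


class OrderedReceiver:
    """Release messages strictly in serial order, buffering any that arrive early."""

    def __init__(self, next_serial: int = 1) -> None:
        self.next_serial = next_serial
        self.pending: dict[int, str] = {}   # early arrivals, keyed by serial

    def receive(self, serial: int, payload: str) -> list[str]:
        """Accept one message; return whatever can now be delivered in order."""
        if serial < self.next_serial or serial in self.pending:
            return []                        # duplicate: already delivered or buffered
        self.pending[serial] = payload
        released: list[str] = []
        while self.next_serial in self.pending:
            released.append(self.pending.pop(self.next_serial))
            self.next_serial += 1
        return released

    def missing(self) -> list[int]:
        """Serials to re-request: gaps below the highest buffered serial."""
        if not self.pending:
            return []
        return [s for s in range(self.next_serial, max(self.pending)) if s not in self.pending]
```

Receiving serials 1, 3, 2 releases `a`, then nothing (3 is held back and `missing()` reports `[2]`), then `b` and `c` together. Every subscriber must keep this buffer and the next expected serial, which is the added state and complexity the slide refers to.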

Slide 29

Slide 29 text

Publisher integrity with idempotent publishing
[Diagram: the publisher sends POST A to a cloud service; the connection is disconnected before the response arrives, so the publisher retries POST A and gets a response, leaving a duplicate: A A]
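The retry scenario in the diagram can be sketched in Python as follows. The assumption, not taken from any specific product, is that the publisher generates a unique message ID once and the service remembers IDs it has already accepted; `CloudService` and `publish_with_retry` are hypothetical.

```python
from __future__ import annotations

import uuid


class CloudService:
    """Hypothetical service that de-duplicates publishes by a client-supplied message ID."""

    def __init__(self) -> None:
        self.log: list[str] = []
        self.seen_ids: set[str] = set()   # unbounded here; real systems keep a time-limited window
        self.fail_next_response = False    # simulate the connection dropping after the write

    def publish(self, msg_id: str, payload: str) -> str:
        if msg_id in self.seen_ids:
            return "ack (duplicate ignored)"
        self.seen_ids.add(msg_id)
        self.log.append(payload)
        if self.fail_next_response:
            self.fail_next_response = False
            raise ConnectionError("disconnected before the response arrived")
        return "ack"


def publish_with_retry(service: CloudService, payload: str, attempts: int = 3) -> str:
    msg_id = str(uuid.uuid4())  # generated once and reused on every retry
    last_error: ConnectionError | None = None
    for _ in range(attempts):
        try:
            return service.publish(msg_id, payload)
        except ConnectionError as exc:
            # No response: the publish may or may not have landed, so retry with the same ID.
            last_error = exc
    raise RuntimeError(f"publish failed after {attempts} attempts") from last_error
```

With `fail_next_response = True`, the first attempt is written but the response is lost; the retry reuses the same ID, so the service acknowledges it without storing a second copy of A. Without the ID, the retry would produce the duplicate `A A` shown in the diagram.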

Slide 30

Slide 30 text

Integrity vs Latency in summary
● Consumers decide on integrity vs latency
● Producers of data can choose both latency and integrity
  ○ To gain integrity, idempotency and persistent connections are preferred
● Latency over integrity is generally simpler
● Backpressure always needs to be handled

Slide 31

Slide 31 text

Distribution layer complexity #2 Throughput

Slide 32

Slide 32 text

Throughput: narrow pipe
[Diagram: multiple producers → fat message bus → single stream → consumer; labels: 500 msgs p/s, max 1,000 msgs p/s per stream]

Slide 33

Slide 33 text

Throughput #1: solved using queues
Advantages:
● Relatively simple
● Scales easily
Disadvantages:
● No ordering
● At-least-once delivery
● Potentially two design patterns in play: pub/sub and queueing
[Diagram: multiple producers → fat message bus → queue → consumer workers; labels: 500 msgs p/s]
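The two disadvantages of queues listed above (no ordering, at-least-once delivery) follow from how unacknowledged work is redelivered. A minimal sketch, assuming a single in-memory queue where a crashed worker's messages are put back at the end of the queue; `AtLeastOnceQueue` is a hypothetical class, not a specific broker's API.

```python
from __future__ import annotations

from collections import deque


class AtLeastOnceQueue:
    """Hypothetical work queue: a message is redelivered unless a worker acknowledges it."""

    def __init__(self) -> None:
        self.ready: deque[tuple[int, str]] = deque()
        self.in_flight: dict[int, str] = {}   # delivered to a worker, not yet acknowledged
        self._next_id = 0

    def put(self, payload: str) -> None:
        self.ready.append((self._next_id, payload))
        self._next_id += 1

    def get(self) -> tuple[int, str] | None:
        """Hand the next message to a worker (competing consumers share one queue)."""
        if not self.ready:
            return None
        msg_id, payload = self.ready.popleft()
        self.in_flight[msg_id] = payload
        return msg_id, payload

    def ack(self, msg_id: int) -> None:
        self.in_flight.pop(msg_id, None)

    def requeue_unacked(self) -> None:
        """E.g. after a worker crash or visibility timeout: redeliver at the back of the queue."""
        for msg_id, payload in sorted(self.in_flight.items()):
            self.ready.append((msg_id, payload))
        self.in_flight.clear()
```

If worker A takes `m1` and crashes before acknowledging it while worker B processes `m2`, the requeued `m1` is processed after `m3`, so the consumer sees `m2, m3, m1`. Had worker A finished `m1` before crashing, `m1` would also be processed twice.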

Slide 34

Slide 34 text

Throughput #2: solved with sharding
Advantages:
● Data stream integrity (ordering)
● Single pattern: pub/sub
● Exactly-once delivery
Disadvantages:
● Sharding is complex
● Expanding or contracting shards is hard
● Consumer state may be needed
[Diagram: multiple producers → fat message bus → throughput sharding into shards 1 to 4 → shard consumers in a consumer group; label: 625 msg p/s]
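Sharding keeps per-key ordering by routing every message with the same partition key to the same shard. A minimal sketch, assuming SHA-256 of the key modulo the shard count (a common but not the only scheme); `shard_for` and `route` are illustrative names.

```python
from __future__ import annotations

import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a partition key to a shard with a stable hash.

    Python's built-in hash() is randomized per process for strings, so a
    cryptographic digest is used instead to keep routing stable across restarts.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def route(messages: list[tuple[str, str]], num_shards: int) -> list[list[tuple[str, str]]]:
    """Append each (key, payload) to its shard; order is kept per key, not across shards."""
    shards: list[list[tuple[str, str]]] = [[] for _ in range(num_shards)]
    for key, payload in messages:
        shards[shard_for(key, num_shards)].append((key, payload))
    return shards
```

Messages for one key always land on the same shard and keep their relative order, but there is no global order across shards. Changing `num_shards` from 4 to 5 remaps most keys (roughly four in five under uniform hashing), which is why expanding or contracting shards is hard; schemes such as consistent hashing reduce, but do not remove, that movement.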

Slide 35

Slide 35 text

Throughput in summary
● Everything has limits: distribution is your friend
  There are practical limits to what a single server, connection or shard can sustain.
● Keep complexity away from your consumers
  It’s your job to keep it simple for consumers.
● Queues are simple
  Yet they lack ordering and exactly-once delivery guarantees.
● Shards are hard
  Yet they provide ordering and exactly-once delivery guarantees.

Slide 36

Slide 36 text

Distribution layer complexity #3 Push vs Pull Subscriptions

Slide 37

Slide 37 text

Push vs Pull subscriptions
[Diagram: your message bus → API gateway (client-initiated pull subscription protocols) and push service (server-initiated push subscription protocols); protocols shown: HTTP, SSE, WebSockets, MQTT, AMQP, WebSub]

Slide 38

Slide 38 text

Examples of push subscription protocols
● Stateless over HTTP: Webhooks, WebSub, serverless function invocation, integrations (Zapier etc.)
● Queue based: AMQP, AWS SQS, MQTT
● Stream based: Kafka, AWS Kinesis

Slide 39

Slide 39 text

When to use push vs pull subscriptions?
Push subscriptions
● High throughput
  Target can be load balanced.
● Reduced consumer complexity
  Creates complexity for the producer, and you need to address durability and downstream failures.
● Always online
  Unsuitable for devices that are not always online.
● Unintentional DoS risk
  Control the rate of downstream requests.
Pull subscriptions
● On demand
  Generally better suited for devices such as mobiles, desktops and browsers, where data is needed on demand.
● Simple
  Simple for consumers and producers.
● Low throughput per subscriber
  Not suitable for high throughput without sharding or queueing.
● Capacity planning is harder
  Load is unpredictable.

Slide 40

Slide 40 text

Distribution layer complexity #4 Downstream reliability (for push subscriptions)

Slide 41

Slide 41 text

Downstream reliability: push subscriptions
[Diagram: your message bus → push service → concurrent HTTP requests, ordered streams and unordered streams (HTTP, Kafka, AMQP)]

Slide 42

Slide 42 text

Downstream reliability: coping with failure
[Diagram: the same push topology, with failures: a poison message, HTTP 50x errors, and a faulty connection]

Slide 43

Slide 43 text

Dead letter queues
[Diagram: failed messages from the push topology (concurrent HTTP requests, ordered and unordered streams over HTTP, Kafka, AMQP) are routed from your message bus to a dead letter queue]
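A push service can keep poison or undeliverable messages instead of deleting them by routing them to a dead letter queue. The sketch below assumes a hypothetical `PermanentError` for messages the endpoint rejects outright (a poison message), and uses Python's built-in `ConnectionError` to stand in for transient failures such as 50x responses or a faulty connection. `deliver_all` is illustrative and omits the delay between retries.

```python
from __future__ import annotations

from collections.abc import Callable


class PermanentError(Exception):
    """Hypothetical: the endpoint rejected this specific message (e.g. a poison message)."""


def deliver_all(
    messages: list[str],
    send: Callable[[str], None],
    max_attempts: int = 3,
) -> tuple[list[str], list[tuple[str, str]]]:
    """Push each message downstream; park failures in a dead letter queue instead of deleting them."""
    delivered: list[str] = []
    dead_letters: list[tuple[str, str]] = []  # (message, reason), kept for inspection and replay
    for msg in messages:
        for attempt in range(1, max_attempts + 1):
            try:
                send(msg)
            except PermanentError as exc:
                # Retrying a poison message cannot succeed, so dead-letter it immediately.
                dead_letters.append((msg, f"rejected: {exc}"))
                break
            except ConnectionError as exc:
                # Transient failure (e.g. a 50x or a faulty connection): retry, then give up.
                if attempt == max_attempts:
                    dead_letters.append((msg, f"gave up after {attempt} attempts: {exc}"))
            else:
                delivered.append(msg)
                break
    return delivered, dead_letters
```

Poison messages go straight to the dead letter queue because retrying cannot help; transient failures are retried up to `max_attempts` times first.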

Slide 44

Slide 44 text

Downstream reliability considerations
● Manage throughput to avoid unintentional DoS attacks
  Avoid unintentional floods of requests to downstream endpoints. Use rate limiting and incremental back-offs.
● Push subscriptions may prefer latency over integrity
  Latency-prioritized traffic may allow messages to be discarded when there is a failure.
● Dead letter queue poison / bad messages
  Don’t delete data.
● Tooling and alerts
  Ensure downstream providers have the necessary tooling and alerts to manage failures.
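Rate limiting and incremental back-offs can be sketched with a token bucket and exponential back-off with jitter. Both are standard techniques, shown here as a minimal sketch with hypothetical names; `now` is passed in explicitly (in practice it would come from `time.monotonic()`) so the behaviour is deterministic.

```python
from __future__ import annotations

import random


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float, now: float = 0.0) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = now

    def allow(self, now: float) -> bool:
        """Return True if a request to the downstream endpoint may be sent at time `now`."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = max(self.last, now)
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Exponential back-off with full jitter: a random delay in [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0.0, min(cap, base * 2 ** attempt))
```

A bucket with `rate=10` and `capacity=5` lets a burst of five requests through at once and then about ten per second. The jitter spreads retries from many subscribers over time, so a recovering endpoint is not hit by a synchronized wave of requests.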

Slide 45

Slide 45 text

Distribution layer complexity #5 Durability

Slide 46

Slide 46 text

Durability
[Diagram: a persistent, ordered, indexed log stream holding messages 1 to 10; one consumer resumes from #1 (2 hours ago), another from #6 (1 minute ago)]
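The resume behaviour in the diagram can be sketched with an append-only log indexed by serial number and timestamp. This is an in-memory illustration assuming timestamps are appended in non-decreasing order and serials are assigned by the log; `LogStream` and its methods are hypothetical, and a real deployment would use dedicated log stream storage.

```python
from __future__ import annotations

import bisect
from dataclasses import dataclass, field


@dataclass
class LogStream:
    """Hypothetical in-memory append-only log; real systems persist this durably."""

    ttl: float                      # seconds to retain messages
    next_serial: int = 1            # never reused, even after messages expire
    serials: list[int] = field(default_factory=list)
    timestamps: list[float] = field(default_factory=list)
    payloads: list[str] = field(default_factory=list)

    def append(self, ts: float, payload: str) -> int:
        serial = self.next_serial
        self.next_serial += 1
        self.serials.append(serial)
        self.timestamps.append(ts)
        self.payloads.append(payload)
        return serial

    def resume_from(self, serial: int) -> list[tuple[int, str]]:
        """Replay everything from `serial` onwards, e.g. after a reconnect."""
        i = bisect.bisect_left(self.serials, serial)
        return list(zip(self.serials[i:], self.payloads[i:]))

    def resume_since(self, ts: float) -> list[tuple[int, str]]:
        """Replay everything published at or after time `ts`, e.g. 'rewind 2 hours'."""
        i = bisect.bisect_left(self.timestamps, ts)
        return list(zip(self.serials[i:], self.payloads[i:]))

    def expire(self, now: float) -> None:
        """Drop messages older than the TTL; a shorter TTL means lower storage cost."""
        i = bisect.bisect_left(self.timestamps, now - self.ttl)
        del self.serials[:i], self.timestamps[:i], self.payloads[:i]
```

A consumer that reconnects after a minute calls `resume_from(6)`; one that wants the last two hours calls `resume_since(now - 7200)`. `expire` applies the TTL, trading history for storage cost; once messages expire, a resume request from before the retained range starts at the oldest remaining message, and a production API should tell the consumer that data was lost.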

Slide 47

Slide 47 text

Durability considerations
● Use a dedicated log stream storage solution
  Getting traditional databases to store log streams is hard. There are many solutions that provide efficient append-only stream storage.
● The complexity of adding storage is probably worth it
  It gives you the freedom to later improve reliability, reduce backpressure issues, and provide better continuity of streams.
● Latency and financial cost
  The financial impact can be managed to some degree by modifying TTLs for storage.

Slide 48

Slide 48 text

Event-driven APIs do introduce complexity
But it’s not rocket science :)

Slide 49

Slide 49 text

What next?
#1 Give developers what they want: better event-driven integrations
Event-driven integration frustration is up 2% this year.
Source: State of API Integration, Cloud Elements

Slide 50

Slide 50 text

What next?
#2 Avoid complexity
● Focus on necessary complexity
● Delay complexity that is not needed today, but plan for it
● Open source and cloud solutions exist: don’t reinvent the wheel

Slide 51

Slide 51 text

What next?
#3 Focus on developer experience
● Don’t push complexity downstream to consumers
● Let consumers choose push or pull protocols, integrity or latency
● Documentation and developer portals are important
● An API is a contract and commitment to your consumers

Slide 52

Slide 52 text

What next?
#4 Help build an open, realtime, connected world
● Provide event-driven APIs
● Set your data free

Slide 53

Slide 53 text

Thank you
Taming the rising complexity of event-driven APIs
www.ably.io
Me: @mattheworiordan, @ablyrealtime
Shameless plugs: go.ably.io/open-data-streams, go.ably.io/api-management
Illustrations by Leonie Wharton (leoniewharton.com)