Slide 1

Slide 1 text

Data as a Service: APIs for Big Data Applications

Alexander Günsche
Senior Solutions Architect
lxg@amazon.com

APIDAYS PARIS · 3/4/5 DECEMBER 2024

© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.

Slide 2

Slide 2 text

What is special about Big Data APIs?

Slide 3

Slide 3 text

Unprecedented Growth of Data

There is more data and more diversity of data than people think.

Data growth: >10x every 5 years

Data platform needs:
- To live for 15+ years
- To scale 1,000x

Source: IDC, “Data Age 2025”

Slide 4

Slide 4 text

Data Recency

Recency scale: Real time, Seconds, Minutes, Hours, Days, Months
Use cases along the scale: Time-critical decisions ... Business Intelligence
Nature of the data: Predictive, Actionable, Reactive, Historical

Slide 5

Slide 5 text

Data Lake fundamentals

- Data sources (e.g. OLTP, IoT)
- Ingestion
- Storage
- Data access
- Consumers (e.g. BI, AI/ML)
- Infrastructure
- Metadata/Governance

Slide 6

Slide 6 text

Cost drivers

- Compute
- Storage
- Data Transfer

With terabytes of total volume and gigabytes per transaction, be aware of typical cost drivers …

Slide 7

Slide 7 text

API Paradigms

Slide 8

Slide 8 text

Everything HTTP!

- Verbs
- Resource locators
- Extensible (e.g. WebDAV)
- Proxy support
- Multipart responses
- Content negotiation
- Streaming (WebSockets)
- Multiplexing
- Caching
- Status codes
- Compression
- Broad adoption
- HTTP+UDP (QUIC)
- Encryption
- Custom headers

Slide 9

Slide 9 text

The challenge

“Get orders from Berlin in March 2023 or later”

    {
      "id": "123",
      "user": {
        "id": "987",
        "name": "Jack Johnson",
        "email": "jj@example.com",
        "city": "Aachen"
      },
      "items": [
        { "p_id": "456", "qty": 1, "price": 37.81 },
        { "p_id": "567", "qty": 2, "price": 42.35 }
      ],
      "datetime": "2024-05-23T18:23:18Z"
    }
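To make the challenge concrete, here is a minimal sketch of the slide's question as a client-side filter over order documents shaped like the sample above. The sample list, the cutoff constant, and the helper names `parse_ts` and `orders_from` are assumptions for illustration and do not come from the talk.

```python
from datetime import datetime, timezone

# Hypothetical sample data, shaped like the order document on the slide.
orders = [
    {"id": "123", "user": {"id": "987", "city": "Aachen"},
     "datetime": "2024-05-23T18:23:18Z"},
    {"id": "124", "user": {"id": "988", "city": "Berlin"},
     "datetime": "2024-05-24T03:48:10Z"},
    {"id": "125", "user": {"id": "989", "city": "Berlin"},
     "datetime": "2023-02-27T09:00:00Z"},
]

# "March 2023 or later" expressed as an inclusive lower bound in UTC.
CUTOFF = datetime(2023, 3, 1, tzinfo=timezone.utc)


def parse_ts(value):
    # Swap the trailing "Z" for an explicit offset so that
    # datetime.fromisoformat also accepts it on older Python versions.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def orders_from(city, since, records):
    """Return orders whose nested user.city matches and whose timestamp is >= since."""
    return [
        o for o in records
        if o["user"]["city"] == city and parse_ts(o["datetime"]) >= since
    ]


berlin_ids = [o["id"] for o in orders_from("Berlin", CUTOFF, orders)]
print(berlin_ids)
```

The filter has to reach into a nested field (`user.city`) and apply a range condition on a timestamp, which is the kind of query that simple resource-oriented endpoints tend to express poorly.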

Slide 10

Slide 10 text

The usual suspects

Paradigms: GraphQL, Prompt, REST, gRPC/Protobuf, OData

Drawbacks named on the slide:
- Expensive NLP
- Non-deterministic
- Error-prone
- Low adoption/maturity
- Expensive object mapping
- Over-/underfetching
- Inflexible queries
- Complex translation logic

Slide 11

Slide 11 text

SQL as a data access layer

    SELECT c.*, o.*, oi.*
    FROM customers c
    JOIN orders o ON c.id = o.id
    JOIN items oi ON o.id = oi.id
    WHERE c.city = 'Berlin'
      AND o.date >= DATE '2023-03-01'

PROs
- Extremely flexible query format incl. filters, ranges, pagination
- Single fetch even for complex data structures

CONs
- Very permissive by default
- Caching hardly possible
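The "single fetch" point can be tried out locally. The following is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in for a real query engine. The schema is an assumption: the foreign-key columns `customer_id` and `order_id` are hypothetical and do not appear on the slide, and dates are stored as ISO-8601 text because SQLite has no `DATE` literal.

```python
import sqlite3

# In-memory database as a stand-in for the data lake tables (assumed schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, date TEXT);
CREATE TABLE items (id INTEGER PRIMARY KEY, order_id INTEGER,
                    p_id TEXT, qty INTEGER, price REAL);
INSERT INTO customers VALUES (987, 'Jack Johnson', 'Aachen'),
                             (988, 'John Jackson', 'Berlin');
INSERT INTO orders VALUES (123, 987, '2024-05-23'),
                          (124, 988, '2024-05-24'),
                          (125, 988, '2023-02-27');
INSERT INTO items VALUES (1, 123, '456', 1, 37.81),
                         (2, 123, '567', 2, 42.35),
                         (3, 124, '678', 3, 43.19),
                         (4, 124, '789', 25, 2.88),
                         (5, 125, '456', 1, 37.81);
""")

# One query returns customer, order, and item data together.
# Parameters are bound with placeholders rather than string concatenation.
rows = conn.execute(
    """
    SELECT c.name, o.id, oi.p_id, oi.qty, oi.price
    FROM customers c
    JOIN orders o ON o.customer_id = c.id
    JOIN items oi ON oi.order_id = o.id
    WHERE c.city = ? AND o.date >= ?
    ORDER BY oi.id
    """,
    ("Berlin", "2023-03-01"),
).fetchall()

for row in rows:
    print(row)
```

Binding the city and date as parameters also hints at the "very permissive by default" concern: an API that accepts raw SQL from clients needs its own guardrails around what may be queried.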

Slide 12

Slide 12 text

Response formats

Slide 13

Slide 13 text

The usual suspects

Formats: JSON, XML, Graph, Protobuf, CSV

Drawbacks named on the slide:
- Expensive response parsing
- Only scalar values
- Expensive object mapping
- Expensive resolution/transformation

Slide 14

Slide 14 text

JSONL

    { "id": "123", "user": { "id": "987", "name": "Jack Johnson", "email": "jj@example.com", "city": "Aachen" }, "items": [ { "p_id": "456", "qty": 1, "price": 37.81 }, { "p_id": "567", "qty": 2, "price": 42.35 } ], "datetime": "2024-05-23T18:23:18Z" }
    { "id": "124", "user": { "id": "988", "name": "John Jackson", "email": "jj@example.net", "city": "Berlin" }, "items": [ { "p_id": "678", "qty": 3, "price": 43.19 }, { "p_id": "789", "qty": 25, "price": 2.88 } ], "time": "2024-05-24T03:48:10Z" }
    …
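Because JSONL puts one complete JSON document on each line, a consumer can process a response record by record instead of parsing one large array. The sketch below illustrates this with a shortened, hypothetical payload based on the two orders above; `iter_records` and `order_total` are illustrative helper names, not part of any library.

```python
import io
import json

# Shortened payload: one JSON document per line, as in JSONL.
jsonl_payload = (
    '{"id": "123", "items": [{"p_id": "456", "qty": 1, "price": 37.81}, '
    '{"p_id": "567", "qty": 2, "price": 42.35}]}\n'
    '{"id": "124", "items": [{"p_id": "678", "qty": 3, "price": 43.19}, '
    '{"p_id": "789", "qty": 25, "price": 2.88}]}\n'
)


def iter_records(stream):
    """Yield one parsed record per non-empty line, without loading the whole stream."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


def order_total(order):
    """Sum qty * price over the order's items, rounded to two decimals."""
    return round(sum(i["qty"] * i["price"] for i in order["items"]), 2)


# io.StringIO stands in for a streamed HTTP response body.
totals = {
    rec["id"]: order_total(rec)
    for rec in iter_records(io.StringIO(jsonl_payload))
}
print(totals)
```

With a real HTTP client the same generator could wrap a line iterator over the response, so memory use stays roughly constant regardless of how many records are returned.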

Slide 15

Slide 15 text

Flat database (binary)

- Row-based
- Column-based
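The difference between the two layouts can be shown with plain Python structures. This is only a conceptual sketch with hypothetical data; real binary formats add encoding, compression, and metadata on top of the basic layout.

```python
# Row-based layout: each record is stored together.
rows = [
    {"id": "123", "city": "Aachen", "total": 122.51},
    {"id": "124", "city": "Berlin", "total": 201.57},
    {"id": "125", "city": "Berlin", "total": 37.81},
]


def to_columns(records):
    """Pivot a list of records into one list of values per field."""
    columns = {}
    for rec in records:
        for key, value in rec.items():
            columns.setdefault(key, []).append(value)
    return columns


# Column-based layout: all values of one field are stored together.
columns = to_columns(rows)

# Fetching one complete record is a single lookup in the row layout ...
record = rows[1]

# ... while an aggregate over one field only touches that column.
total_sum = sum(columns["total"])

print(record)
print(columns["city"], total_sum)
```

Which layout fits better depends on the access pattern: whole-record reads favor rows, while scans and aggregates over a few fields favor columns.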

Slide 16

Slide 16 text

Implementation

Slide 17

Slide 17 text

Query engine

    SELECT c.*, o.*, oi.*
    FROM customers c
    JOIN orders o ON c.id = o.id
    JOIN items oi ON o.id = oi.id
    WHERE c.city = 'Berlin'
      AND o.date >= DATE '2023-03-01'

Query engine: Amazon Athena

Slide 18

Slide 18 text

Data API Security

Slide 19

Slide 19 text

Column-level Access

Slide 20

Slide 20 text

Row-level Access
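Column-level and row-level access can be pictured as two filters applied before data leaves the API: one decides which fields a caller may see, the other decides which records. The sketch below is a conceptual illustration only. The roles, the `POLICIES` table, and `apply_policy` are hypothetical, and a managed governance layer would enforce this inside the query engine rather than in application code.

```python
# Hypothetical records as they might come back from the query engine.
records = [
    {"id": "123", "name": "Jack Johnson", "city": "Aachen", "total": 122.51},
    {"id": "124", "name": "John Jackson", "city": "Berlin", "total": 201.57},
]

# Hypothetical policies: visible columns plus a row predicate per role.
POLICIES = {
    # Analysts see every row, but no personal names (column-level access).
    "analyst": {
        "columns": {"id", "city", "total"},
        "row_filter": lambda r: True,
    },
    # Berlin sales staff see all columns, but only Berlin rows (row-level access).
    "berlin_sales": {
        "columns": {"id", "name", "city", "total"},
        "row_filter": lambda r: r["city"] == "Berlin",
    },
}


def apply_policy(role, rows):
    """Drop rows the role may not see, then drop columns the role may not see."""
    policy = POLICIES[role]
    return [
        {k: v for k, v in row.items() if k in policy["columns"]}
        for row in rows
        if policy["row_filter"](row)
    ]


analyst_view = apply_policy("analyst", records)
sales_view = apply_policy("berlin_sales", records)
print(analyst_view)
print(sales_view)
```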

Slide 21

Slide 21 text

Example: Native Integration

Components: Data Catalog, Data Lake, Query Engine, Business Intelligence, IAM

Slide 22

Slide 22 text

Example: Sync API

Components: API Gateway, IAM, FaaS, Data Catalog, Data Lake, Query Engine

Slide 23

Slide 23 text

Example: Async API

Components: API Gateway, IAM, FaaS, Data Catalog, Data Lake, Query Engine, Queue, Temp Storage, Notification
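The async variant adds a queue, temporary storage, and a notification to the sync design. One plausible reading of how these pieces interact is sketched below with in-memory stand-ins. Every name here (`submit_query`, `worker_step`, `get_result`, `run_query`, the status strings) is hypothetical, and the flow is an assumption rather than the architecture shown in the talk.

```python
import queue
import uuid

job_queue = queue.Queue()   # stands in for the Queue component
temp_storage = {}           # stands in for Temp Storage (results by job id)
job_status = {}             # job id -> "QUEUED" | "RUNNING" | "SUCCEEDED"
notifications = []          # stands in for the Notification component


def submit_query(sql):
    """API handler: enqueue the query and return a job id immediately."""
    job_id = str(uuid.uuid4())
    job_status[job_id] = "QUEUED"
    job_queue.put((job_id, sql))
    return job_id


def run_query(sql):
    # Placeholder for the query engine call; returns a dummy result set.
    return [{"query": sql, "row_count": 0}]


def worker_step():
    """Worker: take one job from the queue, run it, store the result, notify."""
    job_id, sql = job_queue.get_nowait()
    job_status[job_id] = "RUNNING"
    temp_storage[job_id] = run_query(sql)
    job_status[job_id] = "SUCCEEDED"
    notifications.append(job_id)


def get_result(job_id):
    """API handler: return the stored result, or None while the job is not finished."""
    if job_status.get(job_id) != "SUCCEEDED":
        return None
    return temp_storage[job_id]


job = submit_query("SELECT 1")
pending = get_result(job)    # job is still queued at this point
worker_step()
finished = get_result(job)   # result is now available in temp storage
print(pending, finished)
```

Decoupling submission from execution lets the API respond quickly even when a query runs longer than a typical HTTP request timeout. The client then either polls for the result or waits for the notification.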

Slide 24

Slide 24 text

Takeaways

- Everything is HTTP, but there’s more than REST.
- Understand the nature of your data and the needs of your customers.
- Be aware of cost drivers and performance killers.

Slide 25

Slide 25 text

Thank you!

Alexander Günsche
Senior Solutions Architect
lxg@amazon.com