Slide 1

Slide 1 text

Building Scalable and Flexible API by Leveraging GraphQL and BigTable
Andi N. Dirgantara
Lead Data Engineer at Traveloka

Slide 2

Slide 2 text

My Profile
● More than 6 years as a software engineer
● Last 4 years focused on data engineering (big data)
● Lead Data Engineer at Traveloka
● Lead of Facebook Developer Circles Malang
● Working remotely from Malang
● Urban and Regional Planning graduate
● Owner of The Bros Coffee and Coworking Space (@thebros_co)
● Owner of Cahayu Aesthetic and Slimming Center (@cahayu.clinic)

Slide 3

Slide 3 text

The Problems

Slide 4

Slide 4 text

We had MySQL installed, exposed by our application via a REST API
Everything went well until…
● We faced 5,000 rows written per second (18 million rows per hour)
● Storage consumption exceeded 100 GB each day
● A single query could take more than 5 hours
● A single row could have up to 1,000 columns
● The system was hit by 10,000 RPS
What should we do?
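As a rough back-of-the-envelope check of the figures above, the sketch below converts the write rate into hourly and daily row counts and estimates an average row size. It assumes the 5,000 rows per second rate is sustained around the clock, that the 100 GB per day comes entirely from those rows, and that GB means 10^9 bytes. None of these assumptions is stated on the slide.

```python
# Back-of-the-envelope arithmetic for the workload on the slide.
# Assumptions (not from the slide): the write rate is sustained all day,
# all storage growth comes from these rows, and 1 GB = 10**9 bytes.
ROWS_PER_SECOND = 5_000
STORAGE_PER_DAY_GB = 100

rows_per_hour = ROWS_PER_SECOND * 3_600    # seconds in an hour
rows_per_day = ROWS_PER_SECOND * 86_400    # seconds in a day
avg_bytes_per_row = STORAGE_PER_DAY_GB * 10**9 / rows_per_day

print(f"rows per hour: {rows_per_hour:,}")
print(f"rows per day:  {rows_per_day:,}")
print(f"average bytes per row: {avg_bytes_per_row:.0f}")
```

Under these assumptions the table grows by hundreds of millions of rows per day, which is the scale at which a single MySQL instance becomes hard to operate.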

Slide 5

Slide 5 text

Breaking Down the Problems
● 5,000 rows written per second
● 100 GB each day
● Queries take >5 hours
● 1,000 columns
● 10,000 RPS
These fall into two groups: storage (database) problems and API problems
We need scalable storage and a flexible API

Slide 6

Slide 6 text

Solutions

Slide 7

Slide 7 text

Use a distributed system to leverage horizontal scalability

Slide 8

Slide 8 text

We Chose BigTable for Distributed Storage
● Low-latency distributed storage
● Eventually consistent, to gain high throughput and high availability through replication (it can be configured for strong consistency too if we want)
● Wide-column storage that can hold millions of columns
For more information, go to https://cloud.google.com/bigtable/
[Diagram: writes and reads are spread across Machine 1 (Data 1), Machine 2 (Data 2), ..., Machine n (Data n)]
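The diagram's idea of spreading reads and writes across machines can be illustrated with a small in-memory sketch. Bigtable splits a table into contiguous row-key ranges, and the toy class below imitates that with fixed split points. The class name, the split points, and the sample data are hypothetical, and this is not how Bigtable is implemented internally (there is no rebalancing, replication, or persistence here).

```python
from bisect import bisect_right


class RangePartitionedStore:
    """Toy model: each 'machine' owns one contiguous range of row keys."""

    def __init__(self, split_points):
        # N split points produce N + 1 key ranges, one per machine.
        self.split_points = sorted(split_points)
        self.machines = [{} for _ in range(len(self.split_points) + 1)]

    def machine_for(self, row_key):
        # Binary search finds the range that contains the row key.
        return bisect_right(self.split_points, row_key)

    def write(self, row_key, column, value):
        index = self.machine_for(row_key)
        self.machines[index].setdefault(row_key, {})[column] = value
        return index

    def read(self, row_key):
        # A read touches only the one machine that owns the key.
        return self.machines[self.machine_for(row_key)].get(row_key, {})


# Hypothetical split points and sample rows, for illustration only.
store = RangePartitionedStore(split_points=["4", "8"])
store.write("123456", "hotel name", "Example Hotel")
store.write("500000", "hotel name", "Another Hotel")
store.write("900000", "flight origin", "MLG")

print(store.read("123456"))
print([len(machine) for machine in store.machines])
```

Adding machines means adding split points, so each machine holds a smaller slice of the key space. This is also why row-key design matters: keys that cluster in one range send most traffic to one machine.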

Slide 9

Slide 9 text

Now our system is scalable, but:
● We have 30 products
● Each product has at least 10 columns
● Some products have 50 columns
● How many REST endpoints should we provide?
● Who will maintain each endpoint?
● What if a system needs to consume more than 5 endpoints?
We need a queryable API...

Slide 10

Slide 10 text

GraphQL Comes to the Rescue
● Model the business case as a graph
● Query just what you need
● Has a dashboard playground
● Reduces requests for multiple “endpoints” to a single network request
For more information, visit https://graphql.org/

Slide 11

Slide 11 text

GraphQL + BigTable = Profit!

query {
  customer(profileId: "123456") {
    hotel {
      edges {
        node {
          name
          address
          checkInTime
        }
      }
    }
    flight {
      edges {
        node {
          bookingId
          origin
          destination
        }
      }
    }
  }
}

RowKey: 123456
Columns: hotel name, hotel address, hotel checkInTime, flight bookingId, flight origin, flight destination

Only the necessary data is scanned and delivered
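One way to picture how the query above maps onto a single row is the sketch below. It turns the requested fields into a list of columns and reads only those columns from the row with key 123456. The table is a plain in-memory dictionary, the function names and stored values are hypothetical, and the selection is written out by hand rather than parsed from GraphQL. A real service would use a GraphQL server library and a Bigtable client, which this sketch does not attempt to reproduce.

```python
def columns_for_selection(selection):
    """Flatten {group: [field, ...]} into a list of (group, field) columns."""
    return [(group, field) for group, fields in selection.items() for field in fields]


def read_row(table, row_key, columns):
    """Return only the requested columns of one row, grouped as in the query."""
    row = table.get(row_key, {})
    result = {}
    for group, field in columns:
        if (group, field) in row:
            result.setdefault(group, {})[field] = row[(group, field)]
    return result


# Hypothetical stored row: it holds more columns than the query asks for.
TABLE = {
    "123456": {
        ("hotel", "name"): "Example Hotel",
        ("hotel", "address"): "Example Street 1",
        ("hotel", "checkInTime"): "14:00",
        ("hotel", "roomType"): "Deluxe",
        ("flight", "bookingId"): "BK-001",
        ("flight", "origin"): "MLG",
        ("flight", "destination"): "CGK",
        ("flight", "seat"): "12A",
    }
}

# Hand-written equivalent of the fields requested in the GraphQL query.
selection = {
    "hotel": ["name", "address", "checkInTime"],
    "flight": ["bookingId", "origin", "destination"],
}

columns = columns_for_selection(selection)
customer = read_row(TABLE, "123456", columns)
print(customer)
```

The unrequested columns (roomType and seat in this made-up row) never leave the storage layer, which is the point of the slide: one row lookup, limited to the columns the client asked for.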

Slide 12

Slide 12 text

What’s Next?

Slide 13

Slide 13 text

The Tradeoffs and Room to Improve

Tradeoffs
● The learning curve is steep
● It is hard to find talent experienced in our tech stack (GoLang, GraphQL, BigTable, etc.)
● A BigTable cluster is relatively expensive, so proper data modelling is necessary to avoid wasting resources

Room to Improve
● Optimizing or leveraging optional data storage for more complex use cases (CockroachDB, Spanner, CitusDB, etc.)
● Access control system per column
● … suggestions? ...

Slide 14

Slide 14 text

Thank you and let’s keep in touch!
● fb.me/andi.n.dirgantara
● andi_dirgantara
● hellowin
Do you want to propose a better solution? Our team is hiring...