[Dmytro Timofeev] BigTable as a data source for massively scalable computations

Presentation from GDG DevFest Ukraine 2017, the biggest community-driven Google tech conference in Central and Eastern Europe.

Learn more at: https://devfest.gdg.org.ua

Google Developers Group Lviv

October 14, 2017

Transcript

  1. Google Cloud Bigtable: petabyte-scale NoSQL database for massively
     scalable computation. Dima Timofeev, Tools Specialist, Google
  2. Motivation
     A variety of structured and semi-structured data:
     • URL contents and associated metadata (crawl metadata, links, anchors, PageRank, etc.)
     • Geographic locations (physical entities such as shops and restaurants; roads, satellite image data, user annotations, etc.)
     A scalable system for high volume and high velocity:
     • Concurrent support for services serving billions of users
     • Petabyte-scale data sizes
  3. What is Bigtable?
     • NoSQL database
     • Large datasets
     • High throughput
  4. How does Cloud Bigtable work?
     [Diagram: clients talk to a processing layer of Bigtable nodes; the data itself lives in the Colossus file system]
  5. Cloud Bigtable learns access patterns...
     [Diagram: tablets A–E distributed unevenly across the Bigtable nodes on top of Colossus]
  6. … and rebalances data accordingly
     [Diagram: the same tablets A–E reassigned evenly across the Bigtable nodes]
  7. Throughput can be controlled by node count
     [Chart: QPS vs. Bigtable nodes; up to 80,000 QPS across 0–6 nodes]
  8. Throughput can be controlled by node count
     [Chart: QPS vs. Bigtable nodes; up to 400,000 QPS across 0–20 nodes]
  9. Throughput can be controlled by node count
     [Chart: QPS vs. Bigtable nodes; up to 4,000,000 QPS across 0–400 nodes]
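     Taken together, the three charts show roughly linear scaling, which works out to about 10,000 QPS per node: 400 nodes × 10,000 QPS/node ≈ 4,000,000 QPS, matching the top of the last chart. (The per-node figure is inferred from the chart axes, not stated on the slides.)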
  10. Entry point: https://cloud.google.com/bigtable/
  11. Libraries
      • Java client (compatible with the Apache HBase API)
      • Go client
      • Python client
      • cbt command-line tool
  12. Python connection example

      from google.cloud import bigtable

      # Connect to a Cloud Bigtable instance; admin=True enables
      # table-management operations such as table.create()
      client = bigtable.Client(project=project_id, admin=True)
      instance = client.instance(instance_id)

      # Create a table with a single column family
      table = instance.table(table_id)
      table.create()
      column_family_id = 'cf1'
      cf1 = table.column_family(column_family_id)
      cf1.create()
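      The slide stops after creating the table; a natural complement is a write and a read. A minimal sketch against the same Python client, assuming the table and column family created above (the row key and cell contents are illustrative):

      # Write one row (row key and cell contents are made up for illustration)
      row = table.row(b'greeting-0')
      row.set_cell(column_family_id, b'message', b'hello world')
      row.commit()

      # Read it back; cells are keyed by column family, then qualifier
      result = table.read_row(b'greeting-0')
      cell = result.cells[column_family_id][b'message'][0]
      print(cell.value)  # b'hello world'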
  13. Cloud Bigtable and Cloud Dataflow
      https://cloud.google.com/bigtable/docs/dataflow-hbase
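      The connector linked on the slide is the Java, HBase-based one. As a hedged illustration of the same pattern in Python, later Apache Beam SDKs ship a WriteToBigTable transform; the project/instance/table identifiers below are placeholders, and the transform postdates the tooling shown in the talk:

      import apache_beam as beam
      from apache_beam.io.gcp.bigtableio import WriteToBigTable
      from google.cloud.bigtable.row import DirectRow

      def to_mutation(kv):
          # Turn a (key, value) pair into a Bigtable row mutation
          key, value = kv
          row = DirectRow(row_key=key.encode('utf-8'))
          row.set_cell('cf1', b'message', value.encode('utf-8'))
          return row

      with beam.Pipeline() as pipeline:
          (pipeline
           | beam.Create([('greeting-0', 'hello'), ('greeting-1', 'world')])
           | beam.Map(to_mutation)
           | WriteToBigTable(project_id=project_id,
                             instance_id=instance_id,
                             table_id=table_id))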
  14. Cloud Bigtable and Cloud Dataproc
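      The deck ends on this title. On Dataproc, jobs typically reach Bigtable through its HBase-compatible surface; one hedged way to sketch that from Python is the google-cloud-happybase package, which mirrors the HappyBase/HBase client API. Identifiers are placeholders, and the exact Connection signature is an assumption:

      from google.cloud import bigtable
      from google.cloud import happybase

      # HappyBase-style connection backed by Bigtable; the same code shape
      # works from a Dataproc job as from any other Python process
      client = bigtable.Client(project=project_id)
      instance = client.instance(instance_id)
      connection = happybase.Connection(instance=instance)  # assumed signature

      table = connection.table(table_id)
      for key, data in table.scan(limit=10):
          print(key, data)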