[Dmytro Timofeev] BigTable as a data source for massively scalable computations

Presentation from GDG DevFest Ukraine 2017, the biggest community-driven Google tech conference in Central and Eastern Europe.

Learn more at: https://devfest.gdg.org.ua

Google Developers Group Lviv

October 14, 2017

Transcript

  1. Google Cloud Bigtable: petabyte-scale NoSQL database for massively
     scalable computation. Dima Timofeev, Tools Specialist, Google
  2. Motivation
     A variety of structured and semi-structured data:
     • URL contents and associated metadata (crawl metadata, links, anchors, PageRank, etc.)
     • Geographic locations (physical entities such as shops and restaurants; roads, satellite image data, user annotations, etc.)
     A scalable system for high volume and high velocity:
     • Concurrent support for services serving billions of users
     • Petabyte-scale data sizes
  3. What is Bigtable?
     • NoSQL database
     • Large datasets
     • High throughput
  4. How does Cloud Bigtable work?
     [Diagram: clients talk to a processing layer of Bigtable nodes; the data itself lives in the Colossus file system]
  5. Cloud Bigtable learns access patterns...
     [Diagram: tablets A–E distributed unevenly across the Bigtable nodes on top of Colossus]
  6. … and rebalances data accordingly
     [Diagram: the same tablets A–E reassigned evenly across the Bigtable nodes]
  7. Throughput can be controlled by node count
     [Chart: QPS vs. Bigtable nodes; up to 80,000 QPS across 0–6 nodes]
  8. Throughput can be controlled by node count
     [Chart: QPS vs. Bigtable nodes; up to 400,000 QPS across 0–20 nodes]
  9. Throughput can be controlled by node count
     [Chart: QPS vs. Bigtable nodes; up to 4,000,000 QPS across 0–400 nodes]
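     Taken together, the three charts show roughly linear scaling, which works out to about 10,000 QPS per node: 400 nodes × 10,000 QPS/node ≈ 4,000,000 QPS, matching the top of the last chart. (The per-node figure is inferred from the chart axes, not stated on the slides.)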
  10. Entry point: https://cloud.google.com/bigtable/
  11. Libraries
      • Java client (compatible with the Apache HBase API)
      • Go client
      • Python client
      • cbt command-line tool
  12. Python connection example

      from google.cloud import bigtable

      # Connect to a Cloud Bigtable instance; admin=True enables
      # table-management operations such as table.create()
      client = bigtable.Client(project=project_id, admin=True)
      instance = client.instance(instance_id)

      # Create a table with a single column family
      table = instance.table(table_id)
      table.create()
      column_family_id = 'cf1'
      cf1 = table.column_family(column_family_id)
      cf1.create()
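      The slide stops after creating the table; a natural complement is a write and a read. A minimal sketch against the same Python client, assuming the table and column family created above (the row key and cell contents are illustrative):

      # Write one row (row key and cell contents are made up for illustration)
      row = table.row(b'greeting-0')
      row.set_cell(column_family_id, b'message', b'hello world')
      row.commit()

      # Read it back; cells are keyed by column family, then qualifier
      result = table.read_row(b'greeting-0')
      cell = result.cells[column_family_id][b'message'][0]
      print(cell.value)  # b'hello world'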
  13. Cloud Bigtable and Cloud Dataflow
      https://cloud.google.com/bigtable/docs/dataflow-hbase
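      The connector linked on the slide is the Java, HBase-based one. As a hedged illustration of the same pattern in Python, later Apache Beam SDKs ship a WriteToBigTable transform; the project/instance/table identifiers below are placeholders, and the transform postdates the tooling shown in the talk:

      import apache_beam as beam
      from apache_beam.io.gcp.bigtableio import WriteToBigTable
      from google.cloud.bigtable.row import DirectRow

      def to_mutation(kv):
          # Turn a (key, value) pair into a Bigtable row mutation
          key, value = kv
          row = DirectRow(row_key=key.encode('utf-8'))
          row.set_cell('cf1', b'message', value.encode('utf-8'))
          return row

      with beam.Pipeline() as pipeline:
          (pipeline
           | beam.Create([('greeting-0', 'hello'), ('greeting-1', 'world')])
           | beam.Map(to_mutation)
           | WriteToBigTable(project_id=project_id,
                             instance_id=instance_id,
                             table_id=table_id))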
  14. Cloud Bigtable and Cloud Dataproc
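      The deck ends on this title. On Dataproc, jobs typically reach Bigtable through its HBase-compatible surface; one hedged way to sketch that from Python is the google-cloud-happybase package, which mirrors the HappyBase/HBase client API. Identifiers are placeholders, and the exact Connection signature is an assumption:

      from google.cloud import bigtable
      from google.cloud import happybase

      # HappyBase-style connection backed by Bigtable; the same code shape
      # works from a Dataproc job as from any other Python process
      client = bigtable.Client(project=project_id)
      instance = client.instance(instance_id)
      connection = happybase.Connection(instance=instance)  # assumed signature

      table = connection.table(table_id)
      for key, data in table.scan(limit=10):
          print(key, data)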