
Cloud Bigtable @ GCPUG Taipei #2



Ian Lewis

June 06, 2015



  1. Google Cloud Bigtable Ian Lewis, Developer Advocate, Google

  2. Agenda

    1. Bigtable and HBase - 15 minutes
    2. Google Cloud Bigtable - 15 minutes
  3. Cloud Bigtable

  4. “Organize the world’s information and make it universally accessible and useful.” - Google’s Mission Statement
  5. To organize big data... you need a BIG database.

  6. Thus The White Paper in 2006

    • Jeff Dean and Sanjay Ghemawat set out to figure out what this database looks like
    • And came up with...
  7. Other Google Innovations

    [Timeline figure, 2002 to 2014: GFS, MapReduce, Bigtable, Dremel, Colossus, Spanner, Dataflow, Kubernetes]
  8. Bigtable as Inspiration and Applications within Google

    Bigtable Plus Hundreds of Internal Services

    Google is not affiliated with or endorsed by any of these companies. Apache HBase, Apache Cassandra and Apache Accumulo are trademarks of The Apache Software Foundation. Hypertable is a trademark of Hypertable Inc.
  9. The Bigtable Data Model

    Bigtable (and HBase)...
    • is a NoSQL (no-join) distributed key-value store, designed to scale out
    • has only one index (the row key)
    • supports atomic single-row transactions
  10. Basic Functional Usage

    [Diagram labels: Low Latency; High Throughput; Put, Increment, Append; Gets, Short Scan + Filters; Bulk Import; Full Scan, MapReduce + Filters; Bigtable Replication]
  11. 3 Generations - #1: Original Bigtable

    • Jeff and Sanjay decided to build a database service that could scale linearly across thousands and thousands of commodity servers
      ◦ Systems will fail; retain performance at scale
    • Leave the traditional relational model to achieve goals
    • The first generation was about:
      ◦ Prototyping and building the service to do its first scaling
      ◦ Migrating initial applications to Bigtable
      ◦ Inventing replication and the first multi-tenant version of Bigtable
      ◦ Painful rediscovery
  12. 3 Generations - #2: Bigtable Stabilized

    • Not only analytics - now web serving as well
      ◦ Making it very low latency and bringing in the 99th percentile of requests [this is a hard problem]
    • Perfecting the Bigtable service
      ◦ What is that: a multi-tenant shared service model for a single database on a common set of resources [this is a hard problem]
        ▪ Spikes in CPU happen quickly, and reacting to abusive usage is difficult to do effectively
        ▪ Hard-capping leaves resources on the table, and you lose the agility and efficiency you were looking for
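
To make the 99th-percentile goal concrete, here is a minimal sketch of how a tail latency can be computed from request samples. The latency numbers are invented for illustration, the `percentile` helper is hypothetical, and nearest-rank is only one of several common percentile definitions.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# 100 made-up request latencies in milliseconds: mostly fast, a slow tail.
latencies_ms = [5] * 95 + [20] * 3 + [250, 900]

median_ms = percentile(latencies_ms, 50)
p99_ms = percentile(latencies_ms, 99)
```

With these made-up samples the median stays at 5 ms while the 99th percentile is 250 ms, which is why a serving workload is judged on its tail rather than on its typical request.
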
  13. Other Neat Bigtable Innovations

    • Memory-heavy clusters, especially if we think we can get a pretty high cache hit rate with a modest increase in memory
    • Mixed-media clusters - a mixture of SSD + HDD storage and an ability to specify an affinity
    • Tabletserver failure - target is recovery in 1 second or less rather than tens of seconds or minutes, so it appears to the customer as latency, if at all
    • Effortless Bigtable replication, either in multiple zones for higher availability or across the world for better latency
  14. 3 Generations - #3: Google Cloud Bigtable

    • Offered as a fully-managed service, simplifying operations and management of applications
    • Cloud Bigtable allows developers to quickly build applications against an industry-standard API with no need to focus on infrastructure
    • Simple pricing model with serving resources and storage resources separated
    • High performance, low latency, low cost, and little to no configuration
  15. Review Typical Access Patterns

    • Data API: Data can be read from and written to Cloud Bigtable through a RESTful or RPC-based data service layer. Typically this will be to serve data to applications, dashboards and other microservices.
    • Streaming: Data can be streamed in (written event by event) through a variety of popular stream processing frameworks.
    • Batch Processing: Data can be read from and written to Cloud Bigtable through batch processing systems (either MapReduce based or analytical). Often, summarized or newly calculated data is written back to Cloud Bigtable or to a downstream database.
  16. Interface/API: Standardized

    • Cloud Bigtable is compatible with the HBase 1.0+ API/Client
    • While HBase is a separate system from Bigtable, we have close ties to the community
    • We like the community - lots of voices, moving together, reps from many major tech giants, very widely adopted
    • Semantics and operations are very similar
      ◦ Want it to be easy to understand, transition to, develop against
    • Release tools that work with Cloud Bigtable and HBase and vice versa
      ◦ Grow the whole community so that all benefit
  17. Pricing Model: Simple

    • In Cloud Bigtable you can provision and change the serving resources with a single button, at a single per-hour price
      ◦ What are Bigtable nodes?
      ◦ This is just the raw compute power that makes up the serve path - separate from the persisted storage tier
    • You’re billed separately for the amount of storage you use of whatever medium you choose (SSD or HDD)
    • This makes it super simple to plan for your workload and understand what your costs are
  18. Pricing Model

    Bigtable nodes: each node will deliver up to 10,000 QPS and 10 MB/s of throughput
    • Cost per hour: $0.65
    • Minimum number of nodes per cluster: 3

    Storage
    • SSD storage (GB/mo): $0.17
    • HDD storage (GB/mo) (coming soon): $0.026

    On creation of a Bigtable cluster, customers provision throughput for their workload in the form of Bigtable nodes. Storage is charged on a per-use basis.
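
The figures on this slide are enough for a back-of-the-envelope estimate. The sketch below rests on several assumptions that the slide does not state: that the $0.65 hourly cost is per node, that a month is about 730 hours, and that nodes deliver their full 10,000 QPS and 10 MB/s. The function names and the example workload are hypothetical.

```python
import math

# Figures from the slide (2015 pricing).
NODE_MAX_QPS = 10_000
NODE_MAX_MB_PER_S = 10
NODE_PRICE_PER_HOUR = 0.65      # assumption: the hourly cost is per node
MIN_NODES_PER_CLUSTER = 3
SSD_PRICE_PER_GB_MONTH = 0.17

HOURS_PER_MONTH = 730           # assumption: average month length


def nodes_needed(target_qps, target_mb_per_s=0.0):
    """Smallest node count covering both targets, never below the minimum."""
    by_qps = math.ceil(target_qps / NODE_MAX_QPS)
    by_throughput = math.ceil(target_mb_per_s / NODE_MAX_MB_PER_S)
    return max(MIN_NODES_PER_CLUSTER, by_qps, by_throughput)


def monthly_cost(target_qps, target_mb_per_s, ssd_gb):
    """Return (nodes, serving cost, storage cost) for one month, in dollars."""
    nodes = nodes_needed(target_qps, target_mb_per_s)
    serving = nodes * NODE_PRICE_PER_HOUR * HOURS_PER_MONTH
    storage = ssd_gb * SSD_PRICE_PER_GB_MONTH
    return nodes, serving, storage


# Hypothetical workload: 45,000 QPS, 30 MB/s, 2 TB on SSD.
nodes, serving, storage = monthly_cost(45_000, 30, 2_000)
print(f"{nodes} nodes: ${serving:,.2f} serving + ${storage:,.2f} storage per month")
```

Under these assumptions the example workload needs 5 nodes, for roughly $2,372.50 of serving and $340 of storage per month. Serving and storage are computed independently, mirroring the separation of the two resources in the pricing model.
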
  19. Create/Configure UI: Easy

  20. Management: Easy

    • Who in the audience has used HBase before?
    • Things you will not see in Cloud Bigtable:
      ◦ Compactions
      ◦ Pre-splitting
      ◦ Lots of configuration settings
      ◦ 1 minute regionserver outages
      ◦ Coprocessors (for now)
  21. Cloud Bigtable Use Cases

    • Financial Services: Faster risk analysis, credit card fraud/abuse
    • Marketing/Digital Media: User engagement, clickstream analysis, real-time adaptive content
    • Internet of Things: Sensor data dashboards and anomaly detection
    • Telecommunications: Sampled traffic patterns, metric collection and reporting
    • Energy: Oil well sensors, anomaly detection, predictive modeling
    • Biomedical: Genomics sequencing data analysis
  22. TLDR: Serious Machinery

  23. Cloud Bigtable Roadmap

    • Integrations, Integrations, Integrations!
    • HDD Bigtable (at $0.026 per GB-month)
    • Configurable automatic/manual replication
    • Additional clients
    • Throughput auto-scaling
    • Snapshots and restores
    • Report card
      ◦ No you’re not in trouble… well you may be.
  24. Thank you! Ian Lewis, Developer Advocate, Google Cloud Platform, ianlewis@google.com