time working on repeatable architectural patterns and guidance, in the form of papers, code, and architectures, for people interested in using Google Cloud Platform. I typically spend time talking about Big Data and containers. Before Google, I was at MongoDB, Ravel, 21CT, Affinegy, and Apple. I’ve been in Austin for ~12 years, so I get to complain about everything. Find me on Twitter: @crcsmnky
Google Research in Data Technologies
[Timeline figure; surviving axis label: 2010] GFS, MapReduce, Bigtable, Colossus, Dremel, Flume, Megastore, Spanner, MillWheel, PubSub, F1
Google Research publications referenced are available here: http://research.google.com/pubs/papers.html
The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines, 2009: http://research.google.com/pubs/pub35290.html
Open Source Ecosystem
[Timeline figure; surviving axis label: 2010] GFS, MapReduce, Bigtable, Colossus, Dremel, Flume, Megastore, Spanner, MillWheel, PubSub, F1
Cloud Platform Data Infrastructure
[Timeline figure; axis: 2002, 2004, 2006, 2008, 2010, 2012, 2013] Cloud Storage, Dataproc, Bigtable, Cloud Storage, BigQuery, Dataflow, Datastore, Spanner, Dataflow, PubSub, F1
Cloud Pub/Sub
• Batched read/write
• Custom labels
• Push & Pull
• Auto expiration
[Diagram: publishers Pub A, Pub B, Pub C; topics Topic A, Topic B, Topic C; subscriptions Sub A, Sub B, Sub C1, Sub C2; subscribers Subscriber X, Subscriber Y, Subscriber Z; messages Message 1, Message 2, Message 3]
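The topic and subscription layout on this slide can be illustrated with a small in-memory model. This is only a sketch of the fan-out idea, assuming each subscription on a topic receives its own copy of every message and that subscribers pull from a subscription; the class ToyPubSub and its method names are hypothetical and are not the Cloud Pub/Sub client API.

```python
from collections import defaultdict, deque


class ToyPubSub:
    """In-memory illustration of topic/subscription fan-out.

    Hypothetical helper for this example only; not the Cloud Pub/Sub API.
    """

    def __init__(self):
        # topic name -> names of the subscriptions attached to it
        self._topic_subs = defaultdict(list)
        # subscription name -> queue of messages not yet pulled
        self._queues = {}

    def create_subscription(self, topic, subscription):
        self._topic_subs[topic].append(subscription)
        self._queues[subscription] = deque()

    def publish(self, topic, message):
        # Fan-out: every subscription on the topic gets its own copy.
        for subscription in self._topic_subs[topic]:
            self._queues[subscription].append(message)

    def pull(self, subscription, max_messages=10):
        # Pull delivery: the subscriber asks for up to max_messages.
        queue = self._queues[subscription]
        batch = []
        while queue and len(batch) < max_messages:
            batch.append(queue.popleft())
        return batch


bus = ToyPubSub()
bus.create_subscription("topic-c", "sub-c1")
bus.create_subscription("topic-c", "sub-c2")
bus.publish("topic-c", "message 3")

first = bus.pull("sub-c1")   # sub-c1 receives its copy
second = bus.pull("sub-c2")  # sub-c2 receives an independent copy
again = bus.pull("sub-c1")   # nothing left for sub-c1
```

Both subscriptions see "message 3" once, which matches the repeated "Message 3" boxes in the diagram; a real deployment would also have to handle acknowledgements and redelivery, which this sketch leaves out.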
• Preemptible VMs are 70% cheaper
• Spin up clusters of any size in 90 seconds
• Separation of storage and compute
• Run clusters segregated by job or function
• Per-minute billing
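To make the pricing points concrete, here is a rough cost sketch. It assumes only what the slide states (preemptible VMs cost 70% less, and billing is per minute); the hourly rate is a made-up placeholder, and real bills include other charges and minimums that are ignored here.

```python
PREEMPTIBLE_DISCOUNT = 0.70  # "70% cheaper", as stated on the slide


def cluster_cost(standard_vms, preemptible_vms, minutes, hourly_rate):
    """Estimate the VM cost of a short-lived cluster with per-minute billing.

    hourly_rate is a hypothetical on-demand price per VM per hour.
    """
    per_minute = hourly_rate / 60.0
    standard = standard_vms * minutes * per_minute
    preemptible = (
        preemptible_vms * minutes * per_minute * (1 - PREEMPTIBLE_DISCOUNT)
    )
    return standard + preemptible


# A 25-minute job on 10 VMs, with a placeholder rate of 0.10 per VM-hour.
all_standard = cluster_cost(10, 0, 25, hourly_rate=0.10)
mixed = cluster_cost(2, 8, 25, hourly_rate=0.10)
print(f"all standard: {all_standard:.4f}, mostly preemptible: {mixed:.4f}")
```

Because billing is per minute, the 25-minute job is charged for 25 minutes rather than a full hour, and swapping eight of the ten workers for preemptible VMs cuts the estimate by more than half.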
• of PBs
• Low-latency and high-throughput
• Massively scalable NoSQL database
• SSDs or HDDs depending on need
• Easy to integrate with Dataflow, Dataproc
Value Blob SQL
• Memcache. Good for: Web/mobile applications, gaming. Such as: Game state, user sessions.
• Cloud Datastore. Good for: Getting started, App Engine, serve use cases. Such as: User profiles, product catalog.
• Cloud Bigtable. Good for: Heavy read + write, events, and analytical data. Such as: AdTech, Financial and IoT data.
• Cloud Storage. Good for: Structured and unstructured binary or object data. Such as: Images, large media files, backups.
• Cloud SQL. Good for: Web frameworks, existing applications. Such as: User credentials, customer orders.
Google BigQuery
• Fast and scales automatically
• No setup or administration
• Stream up to 100,000 rows/sec
• Integrates with third-party software like Tableau
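The streaming figure invites some back-of-the-envelope arithmetic. The sketch below takes the 100,000 rows/sec number from the slide at face value; the helper names min_stream_seconds and batches are hypothetical, and no BigQuery client call is made.

```python
import math

MAX_ROWS_PER_SECOND = 100_000  # figure quoted on the slide


def min_stream_seconds(total_rows, rows_per_second=MAX_ROWS_PER_SECOND):
    """Lower bound on the time needed to stream total_rows at the quoted rate."""
    return math.ceil(total_rows / rows_per_second)


def batches(rows, batch_size):
    """Split a list of rows into consecutive batches of at most batch_size."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


# One million rows need at least this many seconds at the quoted rate.
seconds_needed = min_stream_seconds(1_000_000)

# Grouping rows before sending keeps the number of requests down.
example_batches = list(batches(list(range(5)), batch_size=2))
```

In practice the achievable rate depends on row size and quotas, so this is a floor for planning rather than a prediction.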
Notebooks: transform and process data on Google Cloud Platform or locally.
Fully Integrated: It leverages the power of Cloud Storage, BigQuery, Cloud Datastore and Cloud SQL for analyses.
Built on Jupyter: Built on IPython/Jupyter, which already has a thriving ecosystem of modules and a huge knowledge base.
Choose your language: Write code in multiple languages: Python, SQL and JavaScript.
tools and availability of third-party libraries.
Explore and analyze data with ad hoc queries and visualizations.
Explore, transform and process data collaboratively or publish data as reports, dashboards or APIs.
Makes Google’s Big Data capabilities easier to use and therefore more accessible across the company.
[Slide headings; pairing with the text above is not recoverable: Collaboration, Reach, Increase Productivity, Simplicity]
• Beam: incubating at Apache, open source SDKs, open source runners for Spark and Flink
• Cloud Datalab: built on Jupyter
• Cloud Bigtable: supports HBase 1.0 API
• Cloud Dataproc: open source Hadoop and Spark
• Kubernetes: completely open source
A Few Solutions
• Maps
• Scalable Geolocation Telemetry using Cloud
• Machine Learning with Financial Time Series
• Cloud Bigtable Schema Design for Time Series Data
• Analyzing Financial Time Series using BigQuery
• Processing Logs at Scale using Cloud Dataflow
• Reliable Task Scheduling on Google Compute Engine
• Real-Time Inventory using Google Cloud
• Distributed Load Testing Using Kubernetes
• Deploying Microservices on Google App Engine
• Automated Image Builds with Jenkins, Packer, and Kubernetes
• Real-time data analysis with Kubernetes, Redis, and BigQuery