From Warehouses to Lakes: The Value of Streams

Every business has a wealth of data, but getting value from it is hard. We've tried Data Warehouses and Data Lakes, and while both give us the insights we are after, they present their own challenges. Perhaps most challenging of all is making decisions based on yesterday's data. In this talk we'll look at how you can start using your data to make decisions as events happen in your business, and how you can even make predictions too. Best of all, we can populate our Data Lakes and Data Warehouses at the same time, keeping all the historic analytics in place.
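
As a rough illustration of populating a lake and a warehouse from the same stream while also acting on events as they happen, here is a minimal in-memory sketch. It is not taken from the talk: the names `fan_out`, `Lake`, `Warehouse` and `LiveDecisions`, the event fields (`order_id`, `customer`, `amount`) and the threshold are all hypothetical, and a real pipeline would read from a service such as Kinesis or Kafka rather than a Python list.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical event shape: {"order_id": str, "customer": str, "amount": float}
Event = Dict[str, object]
Sink = Callable[[Event], None]


def fan_out(events: Iterable[Event], sinks: List[Sink]) -> int:
    """Deliver each event to every sink as it arrives; return the event count."""
    delivered = 0
    for event in events:
        for sink in sinks:
            sink(event)
        delivered += 1
    return delivered


class Lake:
    """Stand-in for a data lake: keeps every raw event unchanged."""

    def __init__(self) -> None:
        self.raw: List[Event] = []

    def write(self, event: Event) -> None:
        self.raw.append(dict(event))  # store a copy of the raw event


class Warehouse:
    """Stand-in for a warehouse: keeps a running total per customer."""

    def __init__(self) -> None:
        self.totals: Dict[str, float] = {}

    def write(self, event: Event) -> None:
        customer = str(event["customer"])
        amount = float(event["amount"])
        self.totals[customer] = self.totals.get(customer, 0.0) + amount


class LiveDecisions:
    """Stand-in for real-time processing: flags large orders immediately."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold  # assumed business rule, for illustration only
        self.flagged: List[object] = []

    def write(self, event: Event) -> None:
        if float(event["amount"]) >= self.threshold:
            self.flagged.append(event["order_id"])
```

Passing `[lake.write, warehouse.write, live.write]` as the sinks means one pass over the stream feeds all three, so the historic stores stay populated while decisions are made per event instead of on yesterday's batch.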

Mike Fowler

March 05, 2020
Transcript

  1. @mlfowler_ @Claranet From Warehouses to Lakes: The Value of Streams

    Mike Fowler - Principal Data Engineer, March 5th 2020
  2. Data is the New Oil

    mattbuck (category) / CC BY-SA (https://creativecommons.org/licenses/by-sa/2.0)
    Source: https://twitter.com/TheEconomist/status/860135249552003073?s=20
  3. The Data Warehouse

    mattbuck (category) / CC BY-SA (https://creativecommons.org/licenses/by-sa/2.0)
  4. Data Streams

    Source: https://trimmtravels.com/best-time-to-visit-lake-louise/
    Source: https://techcrunch.com/2017/02/04/drain-the-swamp/
    Source: By Bjørn Christian Tørrissen - Own work by uploader, http://bjornfree.com/galleries.html, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=17476156
  5. Ingest

    Kinesis Data Streams (Streaming), SQS (Messaging), Glue (Integration)
    Cloud Dataflow (Streaming), Pub/Sub (Messaging), Cloud Fusion (Integration)
    Event Hubs (Streaming), Queue Storage (Messaging), Data Factory (Integration)
    Kafka (Streaming), ActiveMQ (Messaging), Hive (Integration)
  6. Ingest

    Kinesis Data Streams (Streaming), SQS (Messaging), Glue (Integration)
    Cloud Dataflow (Streaming), Pub/Sub (Messaging), Cloud Fusion (Integration)
    Event Hubs (Streaming), Queue Storage (Messaging), Data Factory (Integration)
    Kafka (Streaming), ActiveMQ (Messaging), Hive (Integration)
    Categories: Messaging, Streaming, Integration
  7. Ingest, Store

    HDFS (Unstructured), HBase (Semi-Structured), PostgreSQL (Structured)
    Data Lake Storage (Unstructured), Cosmos DB (Semi-Structured), SQL Data Warehouse (Structured)
    S3 (Unstructured), DynamoDB (Semi-Structured), Redshift (Structured)
    Cloud Storage (Unstructured), Cloud Bigtable (Semi-Structured), BigQuery (Structured)
  8. Store

    HDFS (Unstructured), HBase (Semi-Structured), PostgreSQL (Structured)
    Data Lake Storage (Unstructured), Cosmos DB (Semi-Structured), SQL Data Warehouse (Structured)
    S3 (Unstructured), DynamoDB (Semi-Structured), Redshift (Structured)
    Cloud Storage (Unstructured), Cloud Bigtable (Semi-Structured), BigQuery (Structured)
    Categories: Unstructured, Semi-Structured, Structured
  9. Ingest, Process & Analyse

    EMR (Batch), Kinesis Data Analytics (Streaming), SageMaker (Modelling)
    Databricks (Batch), Stream Analytics (Streaming), Machine Learning Service (Modelling)
    Spark (Batch), Beam (Streaming), Jupyter Notebooks (Modelling)
    Cloud Dataproc (Batch), Cloud Dataflow (Streaming), Cloud Datalab (Modelling)
  10. Process & Analyse

    EMR (Batch), Kinesis Data Analytics (Streaming), SageMaker (Modelling)
    Databricks (Batch), Stream Analytics (Streaming), Machine Learning Service (Modelling)
    Spark (Batch), Beam (Streaming), Jupyter Notebooks (Modelling)
    Cloud Dataproc (Batch), Cloud Dataflow (Streaming), Cloud Datalab (Modelling)
    Categories: Batch, Streaming, Modelling