From Warehouses to Lakes: The Value of Streams

Every business has a wealth of data, but getting value from it is hard. We've tried Data Warehouses and Data Lakes, and while both give us the insights we are after, they present their own challenges. Perhaps the most challenging of all is making decisions based on yesterday's data. In this talk we'll look at how you can start using your data to make decisions as events happen in your business, and how you can even make predictions. Best of all, we can populate our Data Lakes and Data Warehouses at the same time, keeping all the historic analytics in place.

Mike Fowler

March 05, 2020

Transcript

  1. @mlfowler_ @Claranet From Warehouses to Lakes: The Value of Streams

    Mike Fowler - Principal Data Engineer, March 5th 2020
  2. About Me

  3. I Know Data

  4. A Quick Poll Source: https://www.18stripes.com/2018-preseason-prediction-poll-time/

  5. A Quick Poll

  6. A Quick Poll

  7. A Quick Poll

  8. A Quick Poll

  9. A Quick Poll

  10. Data is the New Oil mattbuck (category) / CC BY-SA (https://creativecommons.org/licenses/by-sa/2.0) Source: https://twitter.com/TheEconomist/status/860135249552003073?s=20

  11. The Data Warehouse mattbuck (category) / CC BY-SA (https://creativecommons.org/licenses/by-sa/2.0)

  12. Data Sources

  13. Data Sources - Extract

  14. Data Sources - Extract - Transform

  15. Data Sources - Extract - Transform - Load
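The Extract - Transform - Load sequence on slides 12-15 can be sketched in a few lines of Python. This is a minimal illustration, not code from the talk: the source rows, the `orders` schema and the use of an in-memory SQLite database as a stand-in warehouse are all assumptions. The point it shows is that the transform happens before the load, so the warehouse only ever holds data in the shape that was modelled up front.

```python
import sqlite3

# Hypothetical rows from an operational source system.
SOURCE_ROWS = [
    {"order_id": 1, "amount_pence": 1250, "country": "gb"},
    {"order_id": 2, "amount_pence": 899, "country": "ie"},
    {"order_id": 3, "amount_pence": 4000, "country": "gb"},
]


def extract():
    """Extract: read every row from the (hypothetical) source."""
    return list(SOURCE_ROWS)


def transform(rows):
    """Transform: reshape rows into the warehouse model before loading."""
    return [
        (r["order_id"], r["amount_pence"] / 100.0, r["country"].upper())
        for r in rows
    ]


def load(conn, rows):
    """Load: insert the already-modelled rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, amount_gbp REAL, country TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(conn):
    load(conn, transform(extract()))


if __name__ == "__main__":
    warehouse = sqlite3.connect(":memory:")  # stand-in for a real warehouse
    run_etl(warehouse)
    query = "SELECT country, SUM(amount_gbp) FROM orders GROUP BY country"
    for row in warehouse.execute(query):
        print(row)
```

Any question the warehouse can answer later is limited by the choices made in `transform`, which is the "Model First" and "Change is Expensive" problem the following slides describe.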

  16. Dashboards! Reports! Stiegenaufgang [MIT (http://opensource.org/licenses/mit-license.php) or BSD (http://opensource.org/licenses/bsd-license.php)]

  17. Problem: Pulling too hard on the source

  18. Problem: Lag / Timeliness Source: https://www.mentalfloss.com/article/62725/15-things-you-didnt-know-about-persistence-memory

  19. Problem: Flooding the Warehouse Source: https://www.shetnews.co.uk/2011/12/08/little-weather-damage-so-far/

  20. Problem: Model First Source: https://www.tamr.com/blog/stop-putting-the-ai-cart-before-the-data-horse/

  21. Problem: Change is Expensive Source: The Dark Knight, 2008
  22. The Data Lake Source: https://trimmtravels.com/best-time-to-visit-lake-louise/

  23. Data Sources

  24. Data Sources - Extract

  25. Data Sources - Extract - Load

  26. Data Sources - Extract - Load - Transform
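By contrast, the Extract - Load - Transform order on slides 23-26 lands the raw data first and models it later. In this hypothetical sketch a Python list of JSON strings stands in for object storage in the lake, and the event fields (`date`, `customer`, `amount`) are illustrative assumptions. Because the raw records are kept unchanged, a new model can be built from the same data at any time without going back to the source.

```python
import json
from collections import defaultdict


def extract_and_load(lake, raw_records):
    """Extract + Load: append raw JSON records to the lake unchanged."""
    lake.extend(raw_records)


def transform(lake, key):
    """Transform on demand: total `amount` grouped by any field in the raw data."""
    totals = defaultdict(float)
    for raw in lake:
        event = json.loads(raw)
        totals[event[key]] += event["amount"]
    return dict(totals)


if __name__ == "__main__":
    lake = []  # stand-in for object storage such as S3 or Cloud Storage
    extract_and_load(lake, [
        '{"date": "2020-03-04", "customer": "a", "amount": 10.0}',
        '{"date": "2020-03-05", "customer": "b", "amount": 5.5}',
        '{"date": "2020-03-05", "customer": "a", "amount": 2.0}',
    ])
    print(transform(lake, "date"))      # the model needed today
    print(transform(lake, "customer"))  # a different model, same raw data
```

This is the "Model on Demand" benefit on slide 29; the flip side is that nothing forces the raw records to stay understandable, which is how a lake drifts into the "Data Swamp" on slide 32.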

  27. Dashboards! Reports! Stiegenaufgang [MIT (http://opensource.org/licenses/mit-license.php) or BSD (http://opensource.org/licenses/bsd-license.php)]

  28. Benefit: Storage Independent of Compute Source: https://scifi.stackexchange.com/questions/178793/which-way-is-up-in-a-borg-cube

  29. Benefit: Model on Demand Source: https://mythcreants.com/blog/implications-of-replicator-technology/

  30. Problem: Pulling too hard on the source

  31. Problem: Lag / Timeliness Source: https://www.mentalfloss.com/article/62725/15-things-you-didnt-know-about-persistence-memory

  32. Problem: The Data Swamp Source: https://trimmtravels.com/best-time-to-visit-lake-louise/ Source: https://techcrunch.com/2017/02/04/drain-the-swamp/

  33. Data Streams Source: https://trimmtravels.com/best-time-to-visit-lake-louise/ Source: https://techcrunch.com/2017/02/04/drain-the-swamp/ Source: By Bjørn Christian Tørrissen - Own work by uploader, http://bjornfree.com/galleries.html, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=17476156
  34. Data Sources

  35. Data Sources - Ingest

  36. Data Sources - Ingest - Store

  37. Data Sources - Ingest - Store - Process

  38. Seriously?

  39. Data is the New Water

  40. Sources - Ingest - Store - Process - Repeat
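The Ingest - Store - Process - Repeat loop can be pictured as a consumer that handles each event as it arrives. The sketch below is a minimal, hypothetical illustration using only the Python standard library: a `queue.Queue` stands in for a streaming service such as Kafka or Kinesis Data Streams, one list stands in for the Data Lake (raw events) and another for the Data Warehouse (modelled rows), so both are populated from the same stream at the same time, as the abstract describes. The event fields `service` and `latency_ms` are assumptions.

```python
import json
import queue


def ingest(stream, events):
    """Ingest: publish each event onto the stream as a JSON string."""
    for event in events:
        stream.put(json.dumps(event))


def consume(stream, lake, warehouse, live_counts):
    """Store and Process each event as it arrives, then repeat for the next one."""
    while True:
        try:
            raw = stream.get_nowait()
        except queue.Empty:
            break  # a real consumer would block here and keep repeating
        lake.append(raw)  # Store: raw copy in the lake for future models
        event = json.loads(raw)
        warehouse.append((event["service"], event["latency_ms"]))  # modelled row
        # Process: update a live metric now rather than in tomorrow's batch
        live_counts[event["service"]] = live_counts.get(event["service"], 0) + 1


if __name__ == "__main__":
    stream = queue.Queue()
    lake, warehouse, live_counts = [], [], {}
    ingest(stream, [
        {"service": "api", "latency_ms": 120},
        {"service": "api", "latency_ms": 80},
        {"service": "db", "latency_ms": 5},
    ])
    consume(stream, lake, warehouse, live_counts)
    print(live_counts)
```

Keeping the raw event in the lake preserves the ELT benefit of modelling later, while the warehouse rows and live counts are up to date as soon as each event is consumed.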
  41. Ingest
    AWS: Kinesis Data Streams (Streaming), SQS (Messaging), Glue (Integration)
    Google Cloud: Cloud Dataflow (Streaming), Pub/Sub (Messaging), Cloud Fusion (Integration)
    Azure: Event Hubs (Streaming), Queue Storage (Messaging), Data Factory (Integration)
    Open source: Kafka (Streaming), ActiveMQ (Messaging), Hive (Integration)
  42. Ingest
    Messaging: SQS, Pub/Sub, Queue Storage, ActiveMQ
    Streaming: Kinesis Data Streams, Cloud Dataflow, Event Hubs, Kafka
    Integration: Glue, Cloud Fusion, Data Factory, Hive
  43. Store
    Open source: HDFS (Unstructured), HBase (Semi-Structured), PostgreSQL (Structured)
    Azure: Data Lake Storage (Unstructured), Cosmos DB (Semi-Structured), SQL Data Warehouse (Structured)
    AWS: S3 (Unstructured), DynamoDB (Semi-Structured), Redshift (Structured)
    Google Cloud: Cloud Storage (Unstructured), Cloud Bigtable (Semi-Structured), BigQuery (Structured)
  44. Store
    Unstructured: HDFS, Data Lake Storage, S3, Cloud Storage
    Semi-Structured: HBase, Cosmos DB, DynamoDB, Cloud Bigtable
    Structured: PostgreSQL, SQL Data Warehouse, Redshift, BigQuery
  45. Process & Analyse
    AWS: EMR (Batch), Kinesis Data Analytics (Streaming), SageMaker (Modelling)
    Azure: Databricks (Batch), Stream Analytics (Streaming), Machine Learning Service (Modelling)
    Open source: Spark (Batch), Beam (Streaming), Jupyter Notebooks (Modelling)
    Google Cloud: Cloud Dataproc (Batch), Cloud Dataflow (Streaming), Cloud Datalab (Modelling)
  46. Process & Analyse
    Batch: EMR, Databricks, Spark, Cloud Dataproc
    Streaming: Kinesis Data Analytics, Stream Analytics, Beam, Cloud Dataflow
    Modelling: SageMaker, Machine Learning Service, Jupyter Notebooks, Cloud Datalab
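Slide 46 separates batch from streaming processing. One way to see the difference, sketched here with made-up inputs rather than taken from the talk, is to compute the same per-minute event count twice: once in a single pass over a complete dataset (batch), and once incrementally, emitting each one-minute tumbling window as soon as an event for a later window arrives (streaming). The sketch assumes events arrive in timestamp order; engines such as Beam or Kinesis Data Analytics also deal with late and out-of-order data, which is deliberately left out.

```python
from collections import Counter

WINDOW_SECONDS = 60  # one-minute tumbling windows


def batch_counts(timestamps):
    """Batch: wait for the full dataset, then count events per window."""
    return dict(Counter(ts // WINDOW_SECONDS for ts in timestamps))


def streaming_counts(timestamps):
    """Streaming: yield (window, count) as soon as each window closes.

    Assumes in-order timestamps: a window closes when an event for a
    later window arrives, and the last window is flushed at end of input.
    """
    current, count = None, 0
    for ts in timestamps:
        window = ts // WINDOW_SECONDS
        if current is not None and window != current:
            yield current, count  # earlier window is complete: emit it now
            count = 0
        current = window
        count += 1
    if current is not None:
        yield current, count


if __name__ == "__main__":
    timestamps = [0, 10, 59, 60, 61, 130]
    print(batch_counts(timestamps))
    for window, count in streaming_counts(timestamps):
        print(window, count)  # printed as each window closes
```

Both functions arrive at the same totals; the streaming version simply makes each result available one window later instead of after the whole day's data has landed.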
  47. Putting it Together Source: https://nerdist.com/article/star-trek-picard-data-where-he-is-now/

  48. Scenario: Being On Call Source: https://www.silicon.co.uk/wp-content/uploads/2017/02/Pager.jpg

  49. Our engineer rests peacefully Source: https://i.pinimg.com/originals/cb/32/5f/cb325f9c268bf2135125f512d957f8e6.jpg

  50. 04:03 - Prod is Down!!!! Source: https://vignette.wikia.nocookie.net/memoryalpha/images/6/6b/RedAlert.jpg/revision/latest?cb=20100117050244&path-prefix=en

  51. 04:04 - All Clear! Source: https://www.lakelouiseinn.com/wp-content/uploads/2019/01/LakeLouise2-1.jpg

  52.

  53. The Problem: Many PagerDuty incidents resolve before I respond, disrupting my sleep needlessly
  54. Introducing Mr Data

  55. Introducing Mr Data

  56. Introducing Mr Data

  57. A Streaming Solution

  58. A Streaming Solution

  59. A Streaming Solution

  60. A Streaming Solution

  61. A Streaming Solution

  62. A Streaming Solution

  63. A Streaming Solution

  64. A Streaming Solution

  65. A Streaming Solution
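The transcript does not spell out how Mr Data decides which incidents deserve a page, so the following is only one hypothetical streaming approach to the problem on slide 53, not the talk's implementation: hold each `triggered` event for a grace period, and page the engineer only if no matching `resolved` event arrives within it. The `(timestamp, incident_id, status)` event layout, the 5-minute grace period and the function names are all assumptions, and no PagerDuty API is used.

```python
GRACE_SECONDS = 300  # hypothetical grace period: 5 minutes


def _page_expired(pending, paged, now, grace_seconds):
    """Move incidents whose grace period has run out from `pending` to `paged`."""
    for incident_id, triggered_at in list(pending.items()):
        if now - triggered_at >= grace_seconds:
            paged.append(incident_id)
            del pending[incident_id]


def incidents_to_page(events, now, grace_seconds=GRACE_SECONDS):
    """Return ids of incidents still unresolved `grace_seconds` after triggering.

    `events` is an iterable of (timestamp_seconds, incident_id, status) tuples
    in timestamp order, with status "triggered" or "resolved". `now` is the
    time at which the stream is evaluated.
    """
    pending = {}  # incident_id -> time it was first triggered
    paged = []
    for ts, incident_id, status in events:
        _page_expired(pending, paged, ts, grace_seconds)
        if status == "triggered":
            pending.setdefault(incident_id, ts)
        elif status == "resolved":
            pending.pop(incident_id, None)  # drop it if still pending: no page
    _page_expired(pending, paged, now, grace_seconds)
    return paged


if __name__ == "__main__":
    events = [
        (0, "A", "triggered"), (60, "A", "resolved"),    # self-heals in 1 minute
        (100, "B", "triggered"),                          # never resolves
        (500, "C", "triggered"), (900, "C", "resolved"),  # takes over 6 minutes
    ]
    print(incidents_to_page(events, now=1000))
```

A fixed grace period is the simplest possible rule; the abstract's mention of predictions suggests a model could instead estimate whether a given incident is likely to resolve itself, which this sketch does not attempt.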

  66. A Google Cloud Architecture

  67. Solving the Problems

  68. Solving the Problems

  69. Solving the Problems

  70. Solving the Problems

  71. Fin

  72. Questions? Mike Fowler mlfowler gh-mlfowler @mlfowler_

  73. None