[Igor Khrol] Data Warehouse in Google Cloud

Presentation from GDG DevFest Ukraine 2017 - the biggest community-driven Google tech conference in the CEE.

Learn more at: https://devfest.gdg.org.ua

Google Developers Group Lviv

October 13, 2017

Transcript

  1. 1.

    Data Warehouse in Google Cloud

    Igor Khrol, Minsk, Belarus
    Hire the top 3% of freelance talent (www.toptal.com)
  2. 2.

    Who am I?

    • Igor Khrol
    • Team Lead / QA Engineer at Toptal Analytics department
    • >10 years in IT
    • Engineer, Team Lead, Manager, Architect, Trainer, Consultant
    • Python, Scala, Ruby, Java, SQL, etc.
    • www.khroliz.com
  3. 3.

    Why am I here?

    • Share Google Cloud usage experience
    • Discuss Data Warehouse as a solution
    • Talk about both good and bad
  4. 4.

    • Freelancing platform
    • Freelancers are screened
    • Clients are also filtered
    • Focus on automation and quality
  5. 5.

    Analytics Department

    • Help business in decision making
    • Business process control
    • Reports, charts, dashboards
  6. 6.

    Why Warehouse?
  7. 8.

    Analytics 1.0

    Ruby-on-Rails application
    Diagram labels: Web, Ruby on Rails, Database
  8. 9.

    Analytics 2.0

    Ruby-on-Rails application + Scala/Spark
    Diagram labels: Web, Ruby on Rails, DB, Spark, Scala
  9. 10.

    Analytics 1.0/2.0 problems

    • No execution history
    • Too monolithic
    • SQL code is hard to reuse
    • Creating new charts/dashboards takes too much time...
  10. 12.

    Data Warehouse

    A large store of data accumulated from a wide range of sources within a company and used to guide management decisions.
  11. 13.

    Diagram labels: Datasource (×3), ETL (Extract, Transform, Load), Warehouse, Explore, Charts, Reports
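The Extract, Transform, Load flow in this diagram can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two source functions, the `developers` table and its columns are hypothetical, and an in-memory SQLite database stands in for the warehouse, which is not what the talk's pipeline uses.

```python
import sqlite3


def crm_source():
    # Hypothetical datasource: rows arrive with inconsistent formatting.
    return [{"name": " alice ", "country": "ua"}]


def billing_source():
    # Second hypothetical datasource with the same row shape.
    return [{"name": "bob", "country": "by"}]


def extract(sources):
    """Extract: pull raw rows from every datasource."""
    for source in sources:
        yield from source()


def transform(rows):
    """Transform: normalize names and country codes into tuples."""
    for row in rows:
        yield (row["name"].strip().title(), row["country"].upper())


def load(connection, rows):
    """Load: write the cleaned rows into the warehouse table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS developers (name TEXT, country TEXT)"
    )
    connection.executemany("INSERT INTO developers VALUES (?, ?)", rows)
    connection.commit()


def run_etl(connection, sources):
    load(connection, transform(extract(sources)))
```

Charts and reports would then read from the loaded table rather than from the original datasources.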
  12. 14.

    Diagram labels: Datasource (×3), ETL, Warehouse, Explore, Charts, Reports
  13. 15.

    • Stores historical data
    • Used by ETL to reconstruct change history
    • Stores current data
    • Analytics “UI”
    • Used by stakeholders as self-service analytics
    • Tomorrow at 14:50 by Márton Kodok
  14. 16.

    Diagram labels: Datasource (×3), ETL, Warehouse, Explore, Charts, Reports, Apache Avro
  15. 17.

    Apache Avro

    A remote procedure call and data serialization framework developed within Apache's Hadoop project.

    • Serializes data in a compact binary format
    • Data types
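To make "compact binary format" concrete, here is a hand-rolled sketch of how Avro's binary encoding lays out a small record: `long` values are zigzag-encoded and written as variable-length integers, strings are a length followed by UTF-8 bytes, and record fields are concatenated in schema order with no field names in the payload. The `Developer` schema and its fields are hypothetical, and only two primitive types are handled; real code would use an Avro library instead.

```python
def encode_long(value):
    """Encode a 64-bit integer as a zigzag, variable-length integer."""
    zigzag = (value << 1) ^ (value >> 63)  # small magnitudes become small numbers
    out = bytearray()
    while zigzag & ~0x7F:
        out.append((zigzag & 0x7F) | 0x80)  # low 7 bits, continuation bit set
        zigzag >>= 7
    out.append(zigzag)
    return bytes(out)


def encode_string(text):
    """Encode a string as its byte length followed by UTF-8 bytes."""
    data = text.encode("utf-8")
    return encode_long(len(data)) + data


ENCODERS = {"long": encode_long, "int": encode_long, "string": encode_string}

# Hypothetical schema, written as the Python equivalent of an Avro JSON schema.
DEVELOPER_SCHEMA = {
    "type": "record",
    "name": "Developer",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "projects", "type": "long"},
    ],
}


def encode_record(schema, record):
    """Concatenate the encoded fields in the order the schema declares them."""
    return b"".join(
        ENCODERS[field["type"]](record[field["name"]])
        for field in schema["fields"]
    )
```

For example, `encode_record(DEVELOPER_SCHEMA, {"name": "foo", "projects": 64})` produces six bytes: `06 66 6f 6f 80 01`. The reader needs the schema to interpret them, which is why the schema's data types matter.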
  16. 18.

    Diagram labels: Datasource (×3), ETL, Warehouse, Explore, Charts, Reports
  17. 19.

    Luigi

    Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization, etc.
  18. 20.

    Or, the simple way...

    • Luigi has tasks
    • Tasks have targets and requirements
    • If the target is absent, the task is executed
    • Before a task runs, its required tasks must be completed

    Diagram labels: ETL Task, Developers, Countries, Country Statistics
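The scheduling rule on this slide can be mimicked in plain Python. The sketch below does not use Luigi itself: the `Task` base class, `build` function, file names and sample rows are all hypothetical stand-ins. Only the task names (Developers, Countries, Country Statistics) come from the slide, and Luigi's real tasks expose similarly named `requires()`, `output()` and `run()` methods.

```python
import csv
import os
import tempfile
from collections import Counter


class Task:
    """Minimal stand-in for a pipeline task; not Luigi's real class."""

    def __init__(self, workdir):
        self.workdir = workdir

    def requires(self):
        return []

    def target(self):
        raise NotImplementedError

    def run(self):
        raise NotImplementedError

    def complete(self):
        # A task counts as done when its target file exists.
        return os.path.exists(self.target())


class Developers(Task):
    def target(self):
        return os.path.join(self.workdir, "developers.csv")

    def run(self):
        rows = [("alice", "UA"), ("bob", "BY"), ("carol", "UA")]  # sample data
        with open(self.target(), "w", newline="") as f:
            csv.writer(f).writerows(rows)


class Countries(Task):
    def target(self):
        return os.path.join(self.workdir, "countries.csv")

    def run(self):
        rows = [("UA", "Ukraine"), ("BY", "Belarus")]  # sample data
        with open(self.target(), "w", newline="") as f:
            csv.writer(f).writerows(rows)


class CountryStatistics(Task):
    def requires(self):
        return [Developers(self.workdir), Countries(self.workdir)]

    def target(self):
        return os.path.join(self.workdir, "country_statistics.csv")

    def run(self):
        developers, countries = self.requires()
        with open(countries.target(), newline="") as f:
            names = dict(csv.reader(f))  # country code -> country name
        with open(developers.target(), newline="") as f:
            counts = Counter(names[code] for _, code in csv.reader(f))
        with open(self.target(), "w", newline="") as f:
            csv.writer(f).writerows(sorted(counts.items()))


def build(task, executed=None):
    """Run required tasks first, then the task, skipping finished ones."""
    executed = [] if executed is None else executed
    if not task.complete():
        for dependency in task.requires():
            build(dependency, executed)
        task.run()
        executed.append(type(task).__name__)
    return executed
```

Calling `build(CountryStatistics(tempfile.mkdtemp()))` runs all three tasks in dependency order; calling it again on the same directory runs nothing, because every target already exists.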
  19. 25.

    Microservices in GAE

    • Quick and easy to start
    • Seamless integration with BigQuery
      ◦ Data should be cached for quick access
    • Examples:
      ◦ Machine Learning services
      ◦ Monitoring dashboards
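One way to read "data should be cached for quick access" is a small time-to-live cache placed in front of the slow warehouse query. The sketch below is an assumption about how such caching might look, not the talk's implementation: `TTLCache` is a hypothetical helper, and the `fetch` callable stands in for whatever function actually queries BigQuery.

```python
import time


class TTLCache:
    """Cache results of a slow lookup for a fixed number of seconds."""

    def __init__(self, fetch, ttl_seconds, clock=time.monotonic):
        self._fetch = fetch      # slow function, e.g. a warehouse query
        self._ttl = ttl_seconds
        self._clock = clock      # injectable so the cache is easy to test
        self._entries = {}       # key -> (time stored, value)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]      # still fresh: skip the slow call
        value = self._fetch(key)
        self._entries[key] = (now, value)
        return value
```

A service would build one cache per query function, for example `TTLCache(run_query, ttl_seconds=300)` with a hypothetical `run_query`, and serve requests from `cache.get(sql)`, so only the first request in each window pays the query latency.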
  20. 26.

    Lessons learned...
  21. 27.

    Benefits

    • MVP in 1-2 months
    • Easy to integrate with Google Accounts at Toptal
    • BigQuery clients are everywhere
    • Scalable
    • Reliable
  22. 28.

    Drawbacks

    • High request latency
    • Incomplete Avro protocol support
    • Flakiness in Google infrastructure
    • BigQuery permission model is not flexible enough
  23. 29.

    Thank you! Questions?

    Igor Khrol
    igor.khrol@toptal.com
    khroliz@gmail.com
    skype: igor.khrol
    https://github.com/Khrol/luigi_google_demo