[Igor Khrol] Data Warehouse in Google Cloud

Presentation from GDG DevFest Ukraine 2017 - the biggest community-driven Google tech conference in the CEE.

Learn more at: https://devfest.gdg.org.ua

Google Developers Group Lviv

October 13, 2017

Transcript

  1. Data Warehouse in Google Cloud

    Igor Khrol, Minsk, Belarus. www.toptal.com: Hire the top 3% of freelance talent
  2. Who am I?

    • Igor Khrol
    • Team Lead / QA Engineer at Toptal Analytics department
    • >10 years in IT
    • Engineer, Team Lead, Manager, Architect, Trainer, Consultant
    • Python, Scala, Ruby, Java, SQL etc.
    • www.khroliz.com
  3. Why am I here?

    • Share Google Cloud usage experience
    • Discuss Data Warehouse as a solution
    • Talk about both good and bad
  4. • Freelancing platform
    • Freelancers are screened
    • Clients are also filtered
    • Focus on automation and quality
  5. Analytics Department

    • Help business in decision making
    • Business process control
    • Reports, charts, dashboards
  6. Why Warehouse?
  7. Analytics 1.0

    Ruby-on-Rails application
    [Diagram: Web, Ruby on Rails, Database]
  8. Analytics 2.0

    Ruby-on-Rails application + Scala/Spark
    [Diagram: Web, Ruby on Rails, DB, Spark, Scala]
  9. Analytics 1.0/2.0 problems

    • No execution history
    • Too monolithic
    • SQL code is hard to reuse
    • Creating new charts/dashboards takes too much time...
  10. Data Warehouse

    A large store of data accumulated from a wide range of sources within a company and used to guide management decisions.
  11. ETL, Warehouse

    [Diagram: Datasource ×3, ETL (Extract, Transform, Load), Warehouse, Explore, Charts, Reports]
  12. ETL, Warehouse

    [Diagram: Datasource ×3, ETL, Warehouse, Explore, Charts, Reports]
  13. • Stores historical data
    • Used by ETL to reconstruct change history
    • Stores current data
    • Analytics “UI”
    • Used by stakeholders as self-service analytics
    • Tomorrow at 14:50 by Márton Kodok
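The slide does not say how the ETL reconstructs change history from the stored historical data. One plausible reading, sketched here under the assumption that the history is kept as periodic full snapshots keyed by a record id, is to diff consecutive snapshots. The function name, record shape, and sample data are hypothetical.

```python
from typing import Any, Dict, List, Optional, Tuple

Change = Tuple[str, int, Optional[Any], Optional[Any]]


def diff_snapshots(old: Dict[int, Any], new: Dict[int, Any]) -> List[Change]:
    """Return (kind, record_id, old_value, new_value) for every difference."""
    changes: List[Change] = []
    for key in sorted(old.keys() | new.keys()):
        if key not in old:
            changes.append(("added", key, None, new[key]))
        elif key not in new:
            changes.append(("removed", key, old[key], None))
        elif old[key] != new[key]:
            changes.append(("changed", key, old[key], new[key]))
    return changes


# Hypothetical snapshots of a "developers" table on two consecutive days.
monday = {1: {"country": "BY"}, 2: {"country": "UA"}}
tuesday = {1: {"country": "PL"}, 3: {"country": "UA"}}
history = diff_snapshots(monday, tuesday)
```

Running the diff over each adjacent pair of snapshots would yield an ordered change log that the current-data tables can be rebuilt from.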
  14. ETL, Warehouse

    [Diagram: Datasource ×3, ETL, Warehouse, Explore, Charts, Reports, Apache Avro]
  15. Apache Avro

    A remote procedure call and data serialization framework developed within Apache's Hadoop project.
    • Serializes data in a compact binary format
    • Data types
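To give a feel for what "compact binary format" means, here is a minimal sketch of how the Avro specification encodes `long` and `string` values: integers use variable-length zig-zag coding, and a string is its byte length (as a long) followed by UTF-8 bytes. This is a hand-rolled illustration, not the Avro library; the function names are made up for the example.

```python
def zigzag(n: int) -> int:
    """Map a signed 64-bit integer to unsigned: 0, -1, 1, -2 -> 0, 1, 2, 3."""
    return (n << 1) ^ (n >> 63)


def encode_long(n: int) -> bytes:
    """Encode a signed integer as a zig-zag varint, 7 payload bits per byte."""
    value = zigzag(n)
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_string(text: str) -> bytes:
    """Encode a string as its UTF-8 length (a long) followed by the bytes."""
    data = text.encode("utf-8")
    return encode_long(len(data)) + data
```

Small values take a single byte, so `encode_long(1)` is one byte and `encode_string("foo")` is four, with no field names or delimiters in the payload. The schema is what tells a reader how to interpret the bytes.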
  16. ETL, Warehouse

    [Diagram: Datasource ×3, ETL, Warehouse, Explore, Charts, Reports]
  17. Luigi

    Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization etc.
  18. Or, the simple way...

    • Luigi has tasks
    • Tasks have targets and requirements
    • If a task's target is absent, the task is executed
    • Before a task runs, its required tasks must be completed
    [Diagram: ETL Task, Developers, Countries, Country Statistics]
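The four rules above can be illustrated with a toy scheduler in plain Python. This is deliberately not Luigi's real API, only a sketch of the model: the `Task` class, the `build` function, and the in-memory `store` standing in for targets are all hypothetical. The task names come from the slide's diagram, and the dependency shape (Country Statistics requiring Developers and Countries) is an assumption.

```python
from typing import Dict, List, Tuple, Type


class Task:
    """Toy task: its target is a key in a dict rather than a file or table."""
    requires: Tuple[Type["Task"], ...] = ()

    def target(self) -> str:
        return type(self).__name__

    def run(self, store: Dict[str, str]) -> None:
        store[self.target()] = f"output of {self.target()}"


class Developers(Task):
    pass


class Countries(Task):
    pass


class CountryStatistics(Task):
    requires = (Developers, Countries)  # assumed dependency shape


def build(task_cls: Type[Task], store: Dict[str, str], log: List[str]) -> None:
    task = task_cls()
    if task.target() in store:
        return  # target present: nothing to do
    for dependency in task.requires:
        build(dependency, store, log)  # requirements complete first
    task.run(store)
    log.append(task.target())


store = {"Countries": "already loaded"}  # pretend this target exists
log: List[str] = []
build(CountryStatistics, store, log)
```

Because the `Countries` target already exists, only `Developers` and then `CountryStatistics` run. Calling `build` a second time runs nothing, which is what makes re-running a pipeline after a partial failure cheap.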
  19. Microservices in GAE

    • Quick and easy to start
    • Seamless integration with BigQuery
      ◦ Data should be cached for quick access
    • Examples:
      ◦ Machine Learning services
      ◦ Monitoring dashboards
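The advice to cache warehouse data for quick access could look roughly like the time-to-live cache below. It is a generic sketch, not the speaker's implementation: `TTLCache`, `slow_query`, and the 60-second TTL are hypothetical, and `slow_query` merely stands in for a real warehouse call. The clock is injectable so the expiry behaviour can be demonstrated without sleeping.

```python
import time
from typing import Any, Callable, Dict, List, Tuple


class TTLCache:
    """Cache results of a slow fetch function for a fixed number of seconds."""

    def __init__(self, fetch: Callable[[str], Any], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # still fresh: skip the slow call
        value = self._fetch(key)
        self._entries[key] = (now, value)
        return value


calls: List[str] = []


def slow_query(sql: str) -> str:
    """Hypothetical stand-in for a high-latency warehouse query."""
    calls.append(sql)
    return f"rows for {sql}"


fake_now = [0.0]
cache = TTLCache(slow_query, ttl_seconds=60, clock=lambda: fake_now[0])
first = cache.get("SELECT 1")
second = cache.get("SELECT 1")  # served from the cache
fake_now[0] = 61.0
third = cache.get("SELECT 1")   # entry expired, fetched again
```

Three reads trigger only two underlying queries here. On a dashboard the TTL would be chosen to match how often the warehouse tables are refreshed.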
  20. Lessons learned...
  21. Benefits

    • MVP in 1-2 months
    • Easy to integrate with Google Accounts at Toptal
    • BigQuery clients are everywhere
    • Scalable
    • Reliable
  22. Drawbacks

    • High request latency
    • Incomplete Avro protocol support
    • Flakiness in Google infrastructure
    • BigQuery permission model is not flexible enough
  23. Thank you!

    Questions? Igor Khrol, [email protected], [email protected], skype: igor.khrol, https://github.com/Khrol/luigi_google_demo