Build Data Application with Dagster

Weera Kasetsin (Ball)
LINE Thailand Common Engineering Office Head
https://linedevday.linecorp.com/jp/2019/sessions/S1-04

LINE DevDay 2019

November 20, 2019

Transcript

  1. 2019 DevDay Build Data Application With Dagster > Weera Kasetsin (Ball) > LINE Thailand Common Engineering Office Head
  2. The story

  3. I spent 20% of my time building my web/app, and 80% of my time fighting the browser.
  4. I spent 80% of my time cleaning the data, and 20% of my time doing my job.
  5. Data Cleaning
     • Rolling our own custom infrastructure
     • Maintaining unreliable processes built atop untested software
     • Doing repetitive work that should not be necessary
     • And much more…
  6. Engineering / Data Science: Current vs. Goal (diagram)

  7. “I do believe in software engineering”

  8. Data Application

  9. What is a data application? A data application is a graph of functional computations that consume and produce data assets.
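
The definition on slide 9 can be illustrated without any framework. The following is a minimal sketch in plain Python, not Dagster code: the node functions, the `GRAPH` layout, the `run_graph` helper, and the sample CSV text are all assumptions made up for illustration.

```python
import csv
import io

# Hypothetical in-memory data asset standing in for a CSV file.
RAW_CSV = "name,calories\nAll-Bran,70\nCheerios,110\nTrix,110\n"


def read_cereals(raw):
    """Source step: produce a list of row dicts from raw CSV text."""
    return list(csv.DictReader(io.StringIO(raw)))


def parse_calories(rows):
    """Return new rows with calories converted to integers."""
    return [{**row, "calories": int(row["calories"])} for row in rows]


def sort_by_calories(rows):
    """Return a new list of rows ordered by ascending calories."""
    return sorted(rows, key=lambda row: row["calories"])


def lowest_calorie_name(rows):
    """Return the name of the first row of an already sorted list."""
    return rows[0]["name"]


# The graph: each node names the function to run and the assets it consumes.
GRAPH = {
    "parsed": (parse_calories, ["raw_rows"]),
    "sorted": (sort_by_calories, ["parsed"]),
    "lowest": (lowest_calorie_name, ["sorted"]),
}


def run_graph(graph, assets):
    """Run every node once its inputs exist; return all produced assets."""
    assets = dict(assets)
    pending = dict(graph)
    while pending:
        ready = [
            name
            for name, (_, deps) in pending.items()
            if all(dep in assets for dep in deps)
        ]
        if not ready:
            raise ValueError("unsatisfiable dependencies: %s" % sorted(pending))
        for name in ready:
            fn, deps = pending.pop(name)
            assets[name] = fn(*[assets[dep] for dep in deps])
    return assets


results = run_graph(GRAPH, {"raw_rows": read_cereals(RAW_CSV)})
```

Each function takes data in and returns new data without side effects, so every intermediate asset in `results` can be inspected or tested on its own.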
  10. Dagster: building modern data applications

  11. Multiple Output

  12. None
  13. def split_cereals(context, cereals):
          # Emit each named output only when its config flag is set.
          if context.solid_config["process_hot"]:
              hot_cereals = DataFrame(
                  [cereal for cereal in cereals if cereal["type"] == "H"]
              )
              yield Output(hot_cereals, "hot_cereals")
          if context.solid_config["process_cold"]:
              cold_cereals = DataFrame(
                  [cereal for cereal in cereals if cereal["type"] == "C"]
              )
              yield Output(cold_cereals, "cold_cereals")
  14. @pipeline
      def multiple_output_pipeline():
          hot_cereals, cold_cereals = split_cereals(read_csv())
          sort_hot_cereals_by_calories(hot_cereals)
          sort_cold_cereals_by_calories(cold_cereals)
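
The multiple-output idea on slides 13 and 14 can be tried outside Dagster. This is a rough sketch in plain Python: the `Output` namedtuple here is a hypothetical stand-in, not Dagster's `Output` class, and `collect_outputs` and the sample rows are likewise assumptions for illustration.

```python
from collections import namedtuple

# Hypothetical stand-in for a named output; this is NOT Dagster's Output class.
Output = namedtuple("Output", ["value", "name"])


def split_cereals(config, cereals):
    """Yield one named output per enabled branch, mirroring the slide."""
    if config["process_hot"]:
        hot_cereals = [c for c in cereals if c["type"] == "H"]
        yield Output(hot_cereals, "hot_cereals")
    if config["process_cold"]:
        cold_cereals = [c for c in cereals if c["type"] == "C"]
        yield Output(cold_cereals, "cold_cereals")


def collect_outputs(outputs):
    """Gather the yielded outputs into a dict keyed by output name."""
    return {output.name: output.value for output in outputs}


# Illustrative sample rows, not data from the talk.
cereals = [
    {"name": "Maypo", "type": "H", "calories": 100},
    {"name": "Trix", "type": "C", "calories": 110},
    {"name": "Quaker Oatmeal", "type": "H", "calories": 100},
]

both = collect_outputs(
    split_cereals({"process_hot": True, "process_cold": True}, cereals)
)
hot_only = collect_outputs(
    split_cereals({"process_hot": True, "process_cold": False}, cereals)
)
```

Because each output is yielded only when its flag is set, `hot_only` has no `cold_cereals` key, which is the signal a downstream step would need in order to skip the cold branch.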

  15. Pipeline Modes

  16. None
  17. @pipeline(
          mode_defs=[
              ModeDefinition(
                  name='unittest',
                  resource_defs={'warehouse': local_sqlite_warehouse_resource},
              ),
              ModeDefinition(
                  name='dev',
                  resource_defs={
                      'warehouse': sqlalchemy_postgres_warehouse_resource
                  },
              ),
          ]
      )
      def modes_pipeline():
          normalize_calories(read_csv())
  18. $ dagster pipeline execute -f modes.py -n modes_pipeline -d dev
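
The point of pipeline modes is that the same pipeline body runs against different resources depending on the selected mode. Below is a minimal plain-Python sketch of that idea, not Dagster's implementation: `SqliteWarehouse`, `RecordingWarehouse`, `MODES`, and `run_pipeline` are hypothetical names, and the behavior of `normalize_calories` (scaling by the maximum) is an assumption, since the slides do not show its body.

```python
import sqlite3


class SqliteWarehouse:
    """Hypothetical local warehouse backed by an in-memory SQLite database."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE cereals (name TEXT, calories REAL)")

    def insert(self, rows):
        self.conn.executemany("INSERT INTO cereals VALUES (?, ?)", rows)
        self.conn.commit()

    def count(self):
        return self.conn.execute("SELECT COUNT(*) FROM cereals").fetchone()[0]


class RecordingWarehouse:
    """Hypothetical stand-in for a remote warehouse; it only records rows."""

    def __init__(self):
        self.rows = []

    def insert(self, rows):
        self.rows.extend(rows)

    def count(self):
        return len(self.rows)


# Each mode maps a resource key to the factory that builds that resource.
MODES = {
    "unittest": {"warehouse": SqliteWarehouse},
    "dev": {"warehouse": RecordingWarehouse},
}


def normalize_calories(rows, warehouse):
    """Assumed behavior: scale calories by the maximum, then store the rows."""
    top = max(calories for _, calories in rows)
    warehouse.insert([(name, calories / top) for name, calories in rows])


def run_pipeline(mode, rows):
    """Build the resources for the chosen mode, then run the same logic."""
    resources = {key: factory() for key, factory in MODES[mode].items()}
    normalize_calories(rows, resources["warehouse"])
    return resources["warehouse"]


rows = [("All-Bran", 70), ("Trix", 110)]
unittest_warehouse = run_pipeline("unittest", rows)
dev_warehouse = run_pipeline("dev", rows)
```

The business logic in `normalize_calories` never checks which mode is active; it only depends on the `insert` interface, which is what makes it easy to test against a local database.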

  19. Type-Checking

  20. None
  21. None
  22. @ballweera WEERA KASETSIN https://dev.to/ballweera