Build Data Application with Dagster

Weera Kasetsin (Ball)
LINE Thailand Common Engineering Office Head
https://linedevday.linecorp.com/jp/2019/sessions/S1-04

LINE DevDay 2019

November 20, 2019

Transcript

  1. 2019 DevDay > Build Data Application With Dagster > Weera Kasetsin (Ball) > LINE Thailand Common Engineering Office Head
  2. I spent 20% of my time building my web/app, and 80% of my time fighting the browser.
  3. I spent 80% of my time cleaning the data, and 20% of my time doing my job.
  4. Data Cleaning
     • Rolling your own custom infrastructure
     • Maintaining unreliable processes built atop untested software
     • Doing repetitive work that should not be necessary
     • And much more…
  5. What is a data application? A data application is a graph of functional computations that consume and produce data assets.
  6. def split_cereals(context, cereals):
         # Emit each output only when the solid config enables it.
         if context.solid_config["process_hot"]:
             hot_cereals = DataFrame(
                 [cereal for cereal in cereals if cereal["type"] == "H"]
             )
             yield Output(hot_cereals, "hot_cereals")
         if context.solid_config["process_cold"]:
             cold_cereals = DataFrame(
                 [cereal for cereal in cereals if cereal["type"] == "C"]
             )
             yield Output(cold_cereals, "cold_cereals")
  7. @pipeline(
         mode_defs=[
             # Each mode binds the same 'warehouse' resource key to a different backend.
             ModeDefinition(
                 name='unittest',
                 resource_defs={'warehouse': local_sqlite_warehouse_resource},
             ),
             ModeDefinition(
                 name='dev',
                 resource_defs={'warehouse': sqlalchemy_postgres_warehouse_resource},
             ),
         ]
     )
     def modes_pipeline():
         normalize_calories(read_csv())
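
The ideas on slides 5 to 7 (a graph of functions, outputs emitted only when configured, and per-mode resources) can be sketched in plain Python without Dagster. This is an illustrative sketch only, not Dagster's API. The sample data, `SqliteWarehouse`, `ListWarehouse`, `MODES`, and `run_pipeline` are hypothetical names chosen for this example, and `ListWarehouse` merely stands in for a real database-backed resource.

```python
import csv
import io
import sqlite3

# Hypothetical sample data; the real cereal dataset is not in the slides.
CSV_TEXT = "name,calories,type\nCorn Flakes,100,C\nOatmeal,150,H\n"


def read_csv(text):
    """Source node: parse CSV text into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def split_cereals(cereals, config):
    """Yield (output_name, rows) pairs for the branches enabled in config."""
    if config.get("process_hot", True):
        yield "hot_cereals", [c for c in cereals if c["type"] == "H"]
    if config.get("process_cold", True):
        yield "cold_cereals", [c for c in cereals if c["type"] == "C"]


class SqliteWarehouse:
    """In-memory SQLite stand-in for a local test warehouse (assumption)."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE cereals (name TEXT, calories INTEGER)")

    def save(self, rows):
        self.conn.executemany(
            "INSERT INTO cereals VALUES (?, ?)",
            [(r["name"], int(r["calories"])) for r in rows],
        )

    def count(self):
        return self.conn.execute("SELECT COUNT(*) FROM cereals").fetchone()[0]


class ListWarehouse:
    """Placeholder for a real database-backed warehouse (assumption)."""

    def __init__(self):
        self.rows = []

    def save(self, rows):
        self.rows.extend(rows)

    def count(self):
        return len(self.rows)


# Each mode maps a resource name to a factory, so the same graph runs anywhere.
MODES = {
    "unittest": {"warehouse": SqliteWarehouse},
    "dev": {"warehouse": ListWarehouse},
}


def run_pipeline(mode, config):
    """Wire the graph: read_csv -> split_cereals -> warehouse."""
    warehouse = MODES[mode]["warehouse"]()
    outputs = dict(split_cereals(read_csv(CSV_TEXT), config))
    for rows in outputs.values():
        warehouse.save(rows)
    return outputs, warehouse
```

Calling `run_pipeline("unittest", {"process_cold": False})` would run only the hot branch against the in-memory SQLite stand-in, while `run_pipeline("dev", {})` runs both branches against the placeholder. The computation functions never mention a concrete backend, which is the point of keeping resources behind a mode.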