Aiming for Batch Processing That Is as Close to Streaming as Possible / #DPCT 20190416

kyontan
April 16, 2019

Presentation slides for Data Pipeline Casual Talk Vol.2 (a meetup for casually sharing knowledge about data pipelines).
https://dpct.connpass.com/event/121371


Transcript

  10. SELECT * EXCEPT(rn)
      FROM (
        SELECT *, row_number() OVER (PARTITION BY id ORDER BY updated_at DESC) AS rn
        FROM (
          SELECT * FROM dataset.diff    -- diff
          UNION ALL
          SELECT * FROM dataset.master  -- destination
        )
      )
      WHERE rn = 1
  11. bq query --destination_table=dataset.master "
      SELECT * EXCEPT(rn)
      FROM (
        SELECT *, row_number() OVER (PARTITION BY id ORDER BY updated_at DESC) AS rn
        FROM (
          SELECT * FROM dataset.diff    -- diff
          UNION ALL
          SELECT * FROM dataset.master  -- destination
        )
      )
      WHERE rn = 1
      "
      ...
  12. MERGE dataset.master T
      USING (
        SELECT * EXCEPT(rn)
        FROM (
          SELECT *, row_number() OVER (PARTITION BY id ORDER BY updated_at DESC) AS rn
          FROM dataset.diff
        )
        WHERE rn = 1
      ) S
      ON T.id = S.id
      WHEN MATCHED AND T.updated_at < S.updated_at THEN
        UPDATE SET id = S.id, ..., updated_at = S.updated_at
      WHEN NOT MATCHED THEN
        INSERT (id, ..., updated_at) VALUES (id, ..., updated_at)
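
The two SQL approaches in the transcript (rebuilding the master table from `diff UNION ALL master`, and applying `MERGE` with a deduplicated diff) can be compared with a small in-memory sketch. This is only an illustration in Python, not the speaker's implementation: the function names, the `name` column, and the sample rows are all hypothetical, and it assumes `id` is unique in the master table and that no two rows for the same `id` share an `updated_at`.

```python
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]


def latest_per_id(rows: Iterable[Row]) -> List[Row]:
    """Keep the newest row per id, like filtering on rn = 1 after
    row_number() OVER (PARTITION BY id ORDER BY updated_at DESC)."""
    best: Dict[Any, Row] = {}
    for row in rows:
        current = best.get(row["id"])
        if current is None or row["updated_at"] > current["updated_at"]:
            best[row["id"]] = row
    return sorted(best.values(), key=lambda r: r["id"])


def rebuild_master(master: List[Row], diff: List[Row]) -> List[Row]:
    """UNION ALL approach: combine diff and master, keep the newest row per id."""
    return latest_per_id(diff + master)


def merge_into_master(master: List[Row], diff: List[Row]) -> List[Row]:
    """MERGE approach: deduplicate the diff, then update or insert per id."""
    target = {row["id"]: row for row in master}
    for src in latest_per_id(diff):
        existing = target.get(src["id"])
        if existing is None:
            # WHEN NOT MATCHED THEN INSERT
            target[src["id"]] = src
        elif existing["updated_at"] < src["updated_at"]:
            # WHEN MATCHED AND T.updated_at < S.updated_at THEN UPDATE
            target[src["id"]] = src
    return sorted(target.values(), key=lambda r: r["id"])


# Hypothetical sample data; ISO date strings compare correctly as text.
master = [
    {"id": 1, "name": "a", "updated_at": "2019-04-01"},
    {"id": 2, "name": "b", "updated_at": "2019-04-02"},
]
diff = [
    {"id": 2, "name": "b2", "updated_at": "2019-04-10"},
    {"id": 2, "name": "b1", "updated_at": "2019-04-05"},    # superseded within the diff
    {"id": 3, "name": "c", "updated_at": "2019-04-11"},     # new id
    {"id": 1, "name": "stale", "updated_at": "2019-03-01"}, # older than master
]

rebuilt = rebuild_master(master, diff)
merged = merge_into_master(master, diff)
print([(r["id"], r["name"]) for r in merged])
```

Under these assumptions both functions produce the same table. The practical difference suggested by the SQL is that the first approach reads and rewrites the whole master table on every run, while `MERGE` only touches the rows that appear in the diff.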