AWSKRUG DS 2022/09 talk - Exploring the latest technologies that make up a cloud data platform

Woong Seok Kang

September 22, 2022
Transcript

  1. 2022. 09. 21, Woong Seok Kang. Exploring the latest technologies that make up a cloud data platform
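  2. Speaker introduction - Woong Seok Kang
    • Engineer in his 6th year with a strong interest in data technology
    • (Formerly) a SaaS startup, Kakao Style, RIDI
    • Currently attending school...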
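  3. Before we begin
    • I tried to evaluate each technology as objectively as possible, but my own subjective views may have crept in.
    • I have used AWS extensively, but I have not operated the other technologies in production, so some information may be wrong. I did my best to present accurate information. 😃 (I did at least trial all of them.)
    • I explain several concepts in my own way. Opinions differ from person to person, so I would appreciate it if you take them with that in mind.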
  4. Today's agenda
    • The past and present of data platforms
    • A look at the various cloud data platform paradigms
    • A look at next-generation data platform technologies - Apache Iceberg, Delta Lake
    • Wrap-up and Q&A
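  5. About 5 years ago...
  6. Data platform
  7. Data platform?
  8. Data platform?
  9. Data platform???
  10. Data platform???
  11. The Warring States era of data platforms
  12. The Warring States era of data platforms
  13. The Warring States era of data platforms
  14. The Warring States era of data platforms
  15. The Warring States era of data platforms
  16. The Warring States era of data platforms
  17. The Warring States era of data platforms
  18. So what is a data platform, anyway?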

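  27. So what is a data platform, anyway?
    It has to store data in a wide variety of forms (structured, unstructured, semi-structured)
    in enormous volumes (PB+),
    let me query that data without testing my patience,
    keep queries working well even when everyone in the company queries at the same time,
    let the extracted data be visualized,
    let me find the data I want quickly,
    hide certain data, or show only part of it, for security reasons,
    keep the company's important data well abstracted and organized,
    make the logic that produces the data understandable at a glance
    and maintainable at low cost,
    keep running without trouble even if I quit right now,
    show data in real time where possible,
    and be comfortable to use for people who are not developers,
    and yet, the total cost still has to be low...
    Going through them one by one:
    • Distributed Storage
    • Distributed Query Engine
    • Visualization
    • Discovery
    • Security, Governance
    • (Near) Realtime Data Pipeline, CDC
    • Less-code (e.g. Amplitude, DBT, SQL)
    • Software Engineering, IaC, DataOps
    • Data Type, Compression, Tiering, Cache, Partitioning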
  28. So what is a data platform, anyway?
    Today's main topic:
    • Distributed Storage
    • Distributed Query Engine
    • Visualization
    • Discovery
    • Security, Governance
    • (Near) Realtime Data Pipeline, CDC
    • Less-code (e.g. Amplitude, DBT, SQL)
    • Software Engineering, IaC, DataOps
    • Data Type, Compression, Tiering, Cache, Partitioning
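  29. But why do we need a data platform?
  30. Data tells us the past, present, and future of a service and a business
    • Various metrics and statistics (revenue, AU, ...)
    • Data-driven decision making
    • Data-based services (search, recommendation, popularity, ranking, settlement, A/B testing, forecasting, ...)
    • ...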
  31. Paradigms of data platforms
  32. Paradigms of data platforms
    • Data Warehouse
    • Data Lake
    • Data Lakehouse?
  33. Paradigms of data platforms
    • Data Warehouse: DB w/ Distributed Storage, Query Engine
    • Data Lake: Distributed Storage / Distributed Query Engine / Metastore
    • Data Lakehouse?: Data Warehouse & Data Lake
  34. Data Warehouse, Data Lake, Data Lakehouse: let's classify them by definition
  35. Data Warehouse, Data Lake, Data Lakehouse: let's classify them by definition
  36. Data Warehouse, Data Lake, Data Lakehouse: let's classify them by definition
  37. Data Warehouse, Data Lake, Data Lakehouse: what about the Data Lakehouse?
  38. Data Warehouse, Data Lake, Data Lakehouse: it is actually not a new concept...
  39. Pros of Data Lake
    • You can store any data as-is and use it (ELT) -> images and audio work too!
    • No separate ingestion step is needed: once the data is loaded into storage, you are done
    • Because storage becomes the SSoT, you can use rich storage features + whichever query engine suits your taste (SQL, API, code, frameworks, and the many products that support external reads)
    • On most clouds, storage is the cheapest component! Tiering is also possible
    • External, schema-on-read metadata systems that are relatively flexible and support programmatic APIs: Hive, Glue, Dataproc Metastore, etc.
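The schema-on-read idea listed among the Data Lake pros can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not how Hive, Glue, or any metastore works internally: the sample records, the `SCHEMA` mapping, and the `read_with_schema` helper are all hypothetical names introduced for illustration.

```python
import json

# Raw records are stored exactly as they arrived; nothing is validated on write.
raw_lines = [
    '{"user": "a", "amount": "12.5", "ts": "2022-09-21"}',
    '{"user": "b", "amount": 3}',
    '{"user": "c", "amount": "n/a", "extra": true}',
]

# Hypothetical schema, applied only when the data is read.
SCHEMA = {"user": str, "amount": float, "ts": str}


def read_with_schema(lines, schema):
    """Project each raw record onto the schema.

    Missing or unparseable values become None; unknown fields are ignored.
    """
    for line in lines:
        record = json.loads(line)
        row = {}
        for column, cast in schema.items():
            try:
                row[column] = cast(record[column])
            except (KeyError, TypeError, ValueError):
                row[column] = None
        yield row


rows = list(read_with_schema(raw_lines, SCHEMA))
for row in rows:
    print(row)
```

The write path never rejects a record, which is what makes loading cheap; the cost is that every reader has to decide how to handle missing or malformed values.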
  40. Cons of Data Lake
    • You inherit the constraints of the underlying storage as they are (e.g. EC, rename, ACID, streaming, ...)
    • Performance varies enormously depending on how the user uses the storage: data type, format, compression, directory structure, block size, ...
    • Too many components make up a lake: storage, query engine, metastore, ...
    • Data management is relatively harder - data swamp
    • External schema management is complex - it differs per file format, per store, and per query engine, and schema evolution is sometimes irreversible.
    • Data cannot be managed with SQL alone - Update, Delete?
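To make the "directory structure" point above concrete, here is a small Python sketch of Hive-style date partitioning on a local temporary directory. It is only an illustration under assumptions: the `dt=` layout, the sample events, and the `write_events` / `scan` helpers are hypothetical, and a real query engine prunes partitions through its metastore rather than through glob patterns.

```python
import json
import tempfile
from pathlib import Path


def write_events(root: Path, events):
    """Write each event as its own file under a Hive-style dt=YYYY-MM-DD directory."""
    for i, event in enumerate(events):
        part_dir = root / f"dt={event['dt']}"
        part_dir.mkdir(parents=True, exist_ok=True)
        (part_dir / f"event-{i}.json").write_text(json.dumps(event))


def scan(root: Path, dt=None):
    """Return (rows, files_opened). When dt is given, only that partition is listed."""
    pattern = f"dt={dt}/*.json" if dt else "dt=*/*.json"
    rows, opened = [], 0
    for path in sorted(root.glob(pattern)):
        opened += 1
        rows.append(json.loads(path.read_text()))
    return rows, opened


events = [
    {"dt": "2022-09-20", "user": "a"},
    {"dt": "2022-09-20", "user": "b"},
    {"dt": "2022-09-21", "user": "a"},
    {"dt": "2022-09-22", "user": "c"},
]

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    write_events(root, events)
    full_rows, full_opened = scan(root)                 # touches every file
    day_rows, day_opened = scan(root, dt="2022-09-21")  # touches one partition
    print(f"full scan opened {full_opened} files, pruned scan opened {day_opened}")
```

The same data laid out without the `dt=` directories would force every query to open every file, which is one reason lake performance depends so heavily on how the user organizes storage.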
  41. Pros of Data Warehouse
    • The user has little to worry about: ingest the data, and the DW takes care of the rest, or the user customizes behavior with the DW's features.
    • Because metadata is internal (schema-on-write), things like schema evolution are relatively unconstrained.
    • However, it can cost money: some operations drop the whole table and rebuild it
    • Some offer features beyond a plain query engine: CDC, SQL syntax, etc.
    • Fewer components: BigQuery -> maintenance cost can be lower than for a lake
    • Data can be managed natively in SQL: Update, Delete, etc. (though, depending on how it works, it may be billed or not supported)
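  42. Cons of Data Warehouse
    • There is an ingestion step.
    • You cannot use features the DW does not support: what if you want to handle images or audio in BigQuery? What if the DW you use does not support semi-structured formats? What if a connector is not supported?
    • Dumping anything and everything into it is daunting. (the price...)
    • Data becomes fragmented across components: BigQuery alone is not enough for real-time BI serving.
    • Once data is in, it is hard to get it back out: $$$
    • It is likely to be more expensive than a lake. (especially with multi-cloud)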
  43. Price Comparison: Athena, BigQuery, Snowflake
    • Assume 1PB of JSON is ingested and 100TB is queried per month
    • Compressing 1PB of JSON with Parquet + zstd gives roughly a 90% compression ratio (=100TB)
    • For the cloud, assume the network is used within the same region of the same cloud (no public egress)
    • If it is not the same region: $$$
    • All prices are based on the Seoul region
    • Costs are calculated per month
    • For reference only! (may be very inaccurate)
  44. Price Comparison: Athena, BigQuery, Snowflake
    • Athena: Ingestion Free; Storage 100TB × $0.025 per GB = $2,500; Query 100TB × $5 per TB = $500; Total $3,000
    • BigQuery: Ingestion Free; Storage 1PB (uncompressed) × $0.023 per GB = $23,000?; Query 100TB × $6 per TB = $600; Total $23,600
    • Snowflake: Ingestion Free?; Storage 100TB × $0.025 per GB = $2,500 + another $2,500 for storage; Query, if 18 people use it during working hours = $5,952; Total $10,952
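The totals on this slide can be re-derived with a few lines of Python. This is a rough sketch that takes the slide's unit prices at face value; it assumes decimal units (1 TB = 1,000 GB, 1 PB = 1,000 TB) because those reproduce the slide's figures, and the function and variable names are made up for illustration. The Snowflake compute figure and the extra storage charge are copied from the slide, not derived.

```python
GB_PER_TB = 1_000   # assumption: decimal units, which match the slide's numbers
TB_PER_PB = 1_000


def storage_cost(size_tb: float, usd_per_gb: float) -> float:
    """Monthly storage cost for size_tb terabytes at a per-GB price."""
    return size_tb * GB_PER_TB * usd_per_gb


def scan_cost(scanned_tb: float, usd_per_tb: float) -> float:
    """Monthly query cost for engines billed per TB scanned."""
    return scanned_tb * usd_per_tb


compressed_tb = 100          # 1 PB of JSON after ~90% compression
raw_tb = 1 * TB_PER_PB       # 1 PB kept uncompressed
queried_tb = 100             # data scanned per month

athena = storage_cost(compressed_tb, 0.025) + scan_cost(queried_tb, 5)
bigquery = storage_cost(raw_tb, 0.023) + scan_cost(queried_tb, 6)
# Snowflake: storage, the slide's "another $2,500 for storage", and the slide's
# compute estimate for 18 users during working hours (taken as given).
snowflake = storage_cost(compressed_tb, 0.025) + 2_500 + 5_952

totals = {"Athena": athena, "BigQuery": bigquery, "Snowflake": snowflake}
for name, usd in totals.items():
    print(f"{name}: ${usd:,.0f} per month")
```

Most of the gap between Athena and BigQuery in this comparison comes from the storage line, since the slide bills BigQuery on the uncompressed 1PB.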
  45. Every technology has its pros and cons, but usually, if you pay enough, only the pros remain 😅
  46. Can't we overcome the drawbacks of the data lake?
    • You inherit the constraints of the underlying storage as they are (e.g. EC, rename, ACID, streaming, ...)
    • Performance varies enormously depending on how the user uses the storage: data type, format, compression, directory structure, block size, ...
    • Too many components make up a lake: storage, query engine, metastore, ...
    • Data management is relatively harder - data swamp
    • External schema management is complex - it differs per file format, per store, and per query engine, and schema evolution is sometimes irreversible.
    • Data cannot be managed with SQL alone - Update, Delete?
  47. Some of these drawbacks also exist in warehouses
    • You inherit the constraints of the underlying storage as they are (e.g. EC, rename, ACID, streaming, ...)
    • Performance varies enormously depending on how the user uses the storage: data type, format, compression, directory structure, block size, ...
    • Too many components make up a lake: storage, query engine, metastore, ...
    • Data management is relatively harder - data swamp
    • External schema management is complex - it differs per file format, per store, and per query engine, and schema evolution is sometimes irreversible.
    • Data cannot be managed with SQL alone - Update, Delete?
  48. Next-generation data platform frameworks (so they are called, but really just a Storage Engine)
  49. Next-generation data platform frameworks
  50. Next-generation data platform frameworks
  54. Features of Delta Lake / Iceberg
    • Support for full schema evolution with more expressive DML
    • Update, Delete, Merge, or do it yourself (using Spark, Python, API, ...)
    • Time travel & rollback (for reproducible queries, by snapshot)
    • ACID transactions, optimized streaming, audit logs, caching, data layout optimization (Z-order, multiple column ordering), support for ML features
    • Open source!
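The "time travel & rollback by snapshot" feature above can be pictured with a toy Python model. This is a conceptual sketch only, not the Delta Lake or Iceberg implementation or API: `ToyTable`, `Snapshot`, and their methods are hypothetical names. It assumes that data files are immutable and that each commit appends a snapshot listing the files that make up the table at that version.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Snapshot:
    version: int
    files: frozenset  # names of the data files visible at this version


@dataclass
class ToyTable:
    """Toy table whose state is an append-only log of immutable snapshots."""
    data: dict = field(default_factory=dict)  # file name -> list of rows
    log: list = field(default_factory=lambda: [Snapshot(0, frozenset())])

    def _commit(self, files):
        self.log.append(Snapshot(len(self.log), frozenset(files)))

    def append(self, rows):
        name = f"part-{len(self.data)}"
        self.data[name] = list(rows)
        self._commit(self.log[-1].files | {name})

    def delete_where(self, predicate):
        # Copy-on-write: affected files are rewritten under new names, while
        # the old files stay in place for older snapshots.
        files = set()
        for name in sorted(self.log[-1].files):
            kept = [row for row in self.data[name] if not predicate(row)]
            if len(kept) == len(self.data[name]):
                files.add(name)
            elif kept:
                new_name = f"part-{len(self.data)}"
                self.data[new_name] = kept
                files.add(new_name)
        self._commit(files)

    def read(self, version=None):
        snapshot = self.log[-1 if version is None else version]
        return [row for name in sorted(snapshot.files) for row in self.data[name]]

    def rollback(self, version):
        # A rollback is just a new commit that points at an old snapshot's files.
        self._commit(self.log[version].files)


table = ToyTable()
table.append([{"id": 1}, {"id": 2}])           # version 1
table.append([{"id": 3}])                      # version 2
table.delete_where(lambda row: row["id"] == 2)  # version 3
print("latest:", table.read())
print("as of version 2:", table.read(version=2))
```

Because old files are never modified, reading an earlier version only means picking an earlier snapshot, which is what makes queries reproducible; the price is that old files must eventually be cleaned up by a separate maintenance step.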
  55. Features of Delta Lake / Iceberg

  56. Features of Delta Lake / Iceberg

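  57. How can I use them?
  58. How can I use them?
  59. How can I use them?
  60. How can I use them?
  61. The world is wide, and new technologies keep coming.
  62. There is no all-purpose technology that solves everything.
  63. Choosing technology that fits the business situation is what matters. That requires a deep understanding of the technology.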
  64. Q&A

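  65. Is it OK to use Delta Lake & Iceberg in production?
  66. Is it OK to use Delta Lake & Iceberg in production?
  67. Is it OK to use Delta Lake & Iceberg in production?
    • Delta Lake is already in use at many customers. (I am not sure about Korea.)
    • In many cases it is used under a contract with Databricks. I do not know whether anyone has adopted the open-source version.
    • For Iceberg too, I am not sure about Korea, but many overseas companies built on Spark and Snowflake use it.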
  68. Is Athena slower than BigQuery?
  69. Is Redshift really that bad?
    • Redshift is a DW, but it can be seen as closer to OLAP.
    • Like Druid, Clickhouse...
    • It is probably better used for more specialized purposes.
    • e.g. CDC from Postgres
    • e.g. ~ms latency dashboard
  70. Can time-series and graph data also go into a data platform?
    • Time series
    • Of course, but do not expect something like InfluxDB.
    • Graph
    • This can go in too, but it is better to flatten the graph into rows before loading it. (for now..)
  71. Is it OK to use ElasticSearch as a data platform?
    • Half yes, half no (personal opinion)
    • ES does have a clustered distributed storage / query engine, but its strengths lie elsewhere rather than in querying large-scale data quickly
    • e.g. time-series indexing, very low latency, etc.
    • It is unsuitable as the main data platform, but as a complement for specific use cases 👍
  72. Thank you
  73. The real Q&A
  74. Thank you for listening!
    • If you have any data-related discussion, questions, or stories, feel free to reach out at zephtys123@gmail.com...