AWSKRUG DS 2022/09 talk - Exploring the latest technologies that make up a cloud data platform

Woong Seok Kang

September 22, 2022

Transcript

  1. Speaker introduction: Woong Seok Kang

    • Sixth-year engineer with a strong interest in data technology
    • Formerly at a SaaS startup, Kakao Style, and Ridi
    • Currently back in school...
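  2. Before we start

    • I tried to evaluate each technology as objectively as possible, but my own opinions may have crept in.
    • I have used AWS extensively, but I have not run the other technologies in production, so some information may be wrong. Still, I did my best to get it right. 😃 (I did at least try them all out.)
    • I explain several concepts in my own way. People see things differently, so please take those parts with a grain of salt.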
  3. Today's agenda

    • The past and present of data platforms
    • A tour of the various cloud data platform paradigms
    • A look at next-generation data platform technology: Apache Iceberg, Delta Lake
    • Wrap-up and Q&A
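  4. So what is a data platform, anyway?

    It should store enormous amounts of data (PB+) in very diverse forms (structured, unstructured, semi-structured), and let me query that data before my patience runs out,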
  5. So what is a data platform, anyway? (continued)

    ...queries should still work well even when everyone in the company runs them at the same time,
  6. So what is a data platform, anyway? (continued)

    ...I should be able to visualize the data I pull out this way,
  7. So what is a data platform, anyway? (continued)

    ...I should be able to find the data I want quickly;
    sometimes certain data must be hidden, or only partly shown, for security reasons;
    the company's important data should be well abstracted and organized;
    the logic that produces the data should be understandable at a glance and cheap to maintain;
    the platform should keep running without problems even if I quit tomorrow;
    ideally, I can see everything in real time;
    and non-developers should be able to use it comfortably too.
    And yet, the total cost still has to be low...
  8. So what is a data platform, anyway? (continued)

    ...And yet, the total cost still has to be low... 😰
  9. So what is a data platform, anyway? (continued)

    ...And yet, the total cost still has to be low... Let's go through it step by step.
  10. So what is a data platform, anyway? (continued)

    Let's go through it step by step:
    • Distributed Storage
    • Distributed Query Engine
    • Visualization
    • Discovery
    • Security, Governance
    • (Near) Realtime Data Pipeline, CDC
    • Less-code (e.g., Amplitude, DBT, SQL)
    • Software Engineering, IaC, DataOps
    • Data Type, Compression, Tiering, Cache, Partitioning
  11. So what is a data platform, anyway? (continued)

    The same requirements and building blocks, now with one part marked as "Today's main topic".
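  12. Data tells us the past, present, and future of a service or business

    • Metrics and statistics (revenue, AU, ...)
    • Data-driven decision making
    • Data-based services (search, recommendation, popularity, ranking, settlement, A/B testing, forecasting, ...)
    ...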
  13. The many paradigms of data platforms

    • Data Warehouse: a DB with distributed storage and a query engine
    • Data Lake: distributed storage / distributed query engine / metastore
    • Data Lakehouse?: data warehouse & data lake
  14. Pros of Data Lake

    • You can store any data as-is and use it later (ELT) -> even images and audio!
    • No separate ingestion step: once the data lands in storage, you're done
    • Storage becomes the SSoT (single source of truth), so you can use rich storage features and pick whichever query engine suits you (many products support SQL, APIs, code, frameworks, and external reads)
    • On most clouds, storage is the cheapest resource! Tiering is also available
    • The metadata system is external and schema-on-read, relatively flexible, and offers programmatic APIs: Hive, Glue, Dataproc Metastore, etc.
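The schema-on-read idea in the last bullet can be sketched in a few lines of plain Python: raw records stay in storage exactly as written, and a schema is only imposed when a query reads them. This assumes JSON Lines input; `SCHEMA` and `read_with_schema` are hypothetical names for illustration, not the Hive, Glue, or Dataproc Metastore API.

```python
import json
from typing import Any, Callable, Dict, Iterable, Iterator, Optional

# Raw JSON Lines as they might sit in object storage: written as-is, never validated.
RAW_LINES = [
    '{"user_id": "1", "event": "click", "ts": 1663804800}',
    '{"user_id": 2, "event": "view"}',                    # missing "ts"
    '{"user_id": "3", "event": "click", "extra": true}',  # unexpected field
]

# Hypothetical read-time schema: column name -> cast function.
SCHEMA: Dict[str, Callable[[Any], Any]] = {"user_id": int, "event": str, "ts": int}


def read_with_schema(
    lines: Iterable[str], schema: Dict[str, Callable[[Any], Any]]
) -> Iterator[Dict[str, Optional[Any]]]:
    """Apply the schema only at read time: cast known columns,
    fill missing ones with None, and ignore fields the schema doesn't know."""
    for line in lines:
        record = json.loads(line)
        yield {
            col: (cast(record[col]) if record.get(col) is not None else None)
            for col, cast in schema.items()
        }


rows = list(read_with_schema(RAW_LINES, SCHEMA))
```

Changing `SCHEMA` reinterprets the same stored bytes without rewriting anything, which is the flexibility the slide attributes to lakes. The flip side is that malformed data is only discovered at query time.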
  15. Cons of Data Lake

    • You inherit the limitations of the underlying storage as-is (e.g., EC, rename, ACID, streaming, ...)
    • Performance varies wildly depending on how users lay out their storage (data type, format, compression, directory structure, block size, ...)
    • Too many components make up a lake: storage, query engine, metastore, ...
    • Data is comparatively harder to manage: the data swamp
    • External schema management is complex: it differs by file format, by store, and by query engine, and some schema changes cannot be reversed
    • SQL alone can't manage the data: Update, Delete?
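One concrete reason "directory structure" affects performance is partition pruning: when files are laid out under Hive-style `key=value` directories, an engine can skip whole prefixes that a filter rules out. A minimal sketch, assuming object keys partitioned by date. The paths and the `partition_values` / `prune` helpers are hypothetical. Real engines usually get partition values from the metastore rather than by parsing strings like this.

```python
from typing import Dict, List

# Hypothetical object keys in a Hive-style layout: <table>/dt=<date>/<file>
KEYS = [
    "events/dt=2022-09-20/part-0000.parquet",
    "events/dt=2022-09-21/part-0000.parquet",
    "events/dt=2022-09-21/part-0001.parquet",
    "events/dt=2022-09-22/part-0000.parquet",
]


def partition_values(key: str) -> Dict[str, str]:
    """Extract key=value pairs from the directory components of a path."""
    dirs = key.split("/")[:-1]  # drop the file name
    return dict(d.split("=", 1) for d in dirs if "=" in d)


def prune(keys: List[str], column: str, low: str, high: str) -> List[str]:
    """Keep only files whose partition value lies in [low, high].
    ISO-formatted dates compare correctly as plain strings."""
    selected = []
    for key in keys:
        value = partition_values(key).get(column)
        if value is not None and low <= value <= high:
            selected.append(key)
    return selected


# A query filtering on dt = '2022-09-21' only needs to open 2 of the 4 files.
to_scan = prune(KEYS, "dt", "2022-09-21", "2022-09-21")
```

If the same data were written without `dt` in the path, every file would have to be opened for the same query. That is one way the slide's "performance depends on how you use storage" plays out.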
  16. Pros of Data Warehouse

    • Users have little to worry about: ingest the data, and the DW either handles the rest or lets you customize it with its own features.
    • Metadata is internal (schema-on-write), so things like schema evolution are relatively painless.
    • It can cost money, though: some operations throw away the whole table and rebuild it.
    • Some offer more than a plain query engine: CDC, SQL syntax, etc.
    • Fewer components: with BigQuery, maintenance cost may be lower than for a lake.
    • Data can be managed natively in SQL: Update, Delete, etc. (depending on how these work internally, they may incur charges or be unsupported)
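Schema-on-write works the other way around from schema-on-read. Rows are checked when they are ingested, so bad data is rejected up front and the engine always knows each column's type. A toy sketch, assuming a hypothetical in-memory `Table` class. It illustrates the idea only and is not how BigQuery or any other warehouse implements ingestion.

```python
from typing import Any, Dict, List


class SchemaError(ValueError):
    """Raised when a row does not match the table schema."""


class Table:
    """Toy table that validates every row against its schema at write time."""

    def __init__(self, schema: Dict[str, type]) -> None:
        self.schema = dict(schema)
        self.rows: List[Dict[str, Any]] = []

    def insert(self, row: Dict[str, Any]) -> None:
        if set(row) != set(self.schema):
            raise SchemaError(f"columns {sorted(row)} do not match {sorted(self.schema)}")
        for col, expected in self.schema.items():
            if not isinstance(row[col], expected):
                raise SchemaError(
                    f"{col}: expected {expected.__name__}, got {type(row[col]).__name__}"
                )
        self.rows.append(dict(row))  # only valid rows ever reach storage

    def add_column(self, name: str, col_type: type, default: Any) -> None:
        """Schema evolution: the table owns its metadata, so this is one call.
        Here it also backfills every stored row with `default`."""
        self.schema[name] = col_type
        for row in self.rows:
            row[name] = default


events = Table({"user_id": int, "event": str})
events.insert({"user_id": 1, "event": "click"})
```

The backfill in `add_column` loosely mirrors the slide's caveat that some warehouse operations rebuild the whole table, which is where the cost comes from.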
  17. Cons of Data Warehouse

    • There is an ingestion step.
    • You can't use features the DW doesn't support: what if you want to handle images or audio in BigQuery? What if your DW doesn't support semi-structured formats? What if there is no connector?
    • It's hard to justify dumping just anything in. (The price...)
    • Data gets fragmented across components: BigQuery alone isn't enough for real-time BI serving.
    • Once data is in, it's hard to get it back out: $$$
    • Likely more expensive than a lake (especially in a multi-cloud setup)
  18. Price Comparison: Athena, BigQuery, Snowflake

    • Assume 1PB of JSON is ingested, and queries scan 100TB per month
    • Converting 1PB of JSON to Parquet + zstd shrinks it by roughly 90% (to about 100TB)
    • Assume network traffic stays within the same region of the same cloud (no public egress)
    • Outside the same region: $$$
    • All prices are for the Seoul region
    • Costs are computed per month
    • For reference only! (Could be very inaccurate)
  19. Price Comparison: Athena, BigQuery, Snowflake

    Engine     Ingestion  Storage                                      Query                                 Total
    Athena     Free       100TB × $0.025/GB = $2,500                   100TB × $5/TB = $500                  $3,000
    BigQuery   Free       1PB (uncompressed) × $0.023/GB = $23,000?    100TB × $6/TB = $600                  $23,600
    Snowflake  Free?      100TB × $0.025/GB = $2,500 + another $2,500  18 people during work hours = $5,952  $10,952
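The totals are plain multiplication, so they are easy to re-derive or re-run with other prices. A short sketch that recomputes them from the assumptions on slide 18, using decimal units (1TB = 1,000GB). The unit prices are copied from the slide and have not been checked against current price lists. The Snowflake query figure is taken as given because its credit calculation isn't shown.

```python
# Monthly cost estimate using the slide's own assumptions.
# Decimal units throughout: 1TB = 1,000GB, 1PB = 1,000TB.
GB_PER_TB = 1_000
RAW_JSON_TB = 1_000          # 1PB of JSON
COMPRESSION_SAVING = 0.90    # Parquet + zstd: roughly 90% smaller
SCANNED_TB_PER_MONTH = 100   # monthly query volume

# ~100TB after compression (rounded to avoid floating-point noise)
compressed_tb = round(RAW_JSON_TB * (1 - COMPRESSION_SAVING))


def storage_cost(tb: float, usd_per_gb: float) -> float:
    """Monthly storage bill for `tb` terabytes at a per-GB price."""
    return tb * GB_PER_TB * usd_per_gb


def scan_cost(tb: float, usd_per_tb: float) -> float:
    """On-demand query bill for scanning `tb` terabytes."""
    return tb * usd_per_tb


costs = {
    # Athena: compressed data in S3 + $5 per TB scanned
    "Athena": storage_cost(compressed_tb, 0.025) + scan_cost(SCANNED_TB_PER_MONTH, 5),
    # BigQuery: the slide bills the uncompressed 1PB + $6 per TB scanned
    "BigQuery": storage_cost(RAW_JSON_TB, 0.023) + scan_cost(SCANNED_TB_PER_MONTH, 6),
    # Snowflake: two $2,500 storage items + the slide's $5,952 compute figure
    "Snowflake": 2 * storage_cost(compressed_tb, 0.025) + 5_952,
}

for engine, usd in costs.items():
    print(f"{engine:<10} ${usd:,.0f}")
```

Changing any single constant shows how sensitive the ranking is. For example, the BigQuery total is dominated by storing the uncompressed 1PB, not by the queries.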
  20. Can't we overcome the drawbacks of the data lake?

    (The same list of drawbacks as slide 15.)
  21. Some of these drawbacks also exist in warehouses

    (The same list of drawbacks as slide 15.)
  22. Features of Delta Lake / Iceberg

    • Full schema evolution, with more expressive DML
    • Update, Delete, Merge, or do it yourself (using Spark, Python, APIs, ...)
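Without a table format, files in object storage are effectively immutable. An UPDATE, DELETE, or MERGE therefore means rewriting every file that contains an affected row (copy-on-write). A minimal sketch of those semantics, assuming each data file is a tuple of rows with an `id` key. `merge_copy_on_write` is hypothetical. It shows the kind of work Delta Lake and Iceberg automate and track transactionally, not their actual code. Iceberg also supports a merge-on-read mode that writes delete files instead, which this sketch does not cover.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, object]
DataFile = Tuple[Row, ...]  # treated as immutable, like an object in S3


def merge_copy_on_write(
    files: List[DataFile], changes: Dict[int, Optional[Row]]
) -> List[DataFile]:
    """Apply upserts and deletes keyed by row id.

    changes[id] = row  -> update that row (or insert it if the id is new)
    changes[id] = None -> delete that row
    Untouched files are reused; any file with an affected row is rewritten whole.
    """
    seen = set()
    new_files: List[DataFile] = []
    for f in files:
        ids = {row["id"] for row in f}
        seen |= ids
        if ids.isdisjoint(changes):
            new_files.append(f)  # nothing to change: keep the old file as-is
            continue
        rewritten = []
        for row in f:
            if row["id"] not in changes:
                rewritten.append(row)  # unaffected row is copied into the new file
            elif changes[row["id"]] is not None:
                rewritten.append(changes[row["id"]])  # updated row
            # otherwise the row is deleted: it is simply not copied
        if rewritten:
            new_files.append(tuple(rewritten))
    # Ids not found in any existing file are inserts; they go into a new file.
    inserts = tuple(r for k, r in changes.items() if k not in seen and r is not None)
    if inserts:
        new_files.append(inserts)
    return new_files


files = [
    ({"id": 1, "v": "a"}, {"id": 2, "v": "b"}),
    ({"id": 3, "v": "c"},),
]
result = merge_copy_on_write(files, {2: {"id": 2, "v": "B"}, 4: {"id": 4, "v": "d"}})
```

Updating a single row still rewrites its entire file. On a plain lake, swapping the old file list for the new one also has no atomicity guarantee, and that gap is what the table formats fill.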
  23. Features of Delta Lake / Iceberg (continued)

    • Time travel & rollback (reproducible queries via snapshots)
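Time travel and rollback work because each commit produces a new immutable snapshot listing the table's data files, and the table just points at the current one. A minimal in-memory sketch of that idea. `SnapshotTable`, `commit`, `scan`, and `rollback` are hypothetical names, not the Delta Lake or Iceberg API. Real formats persist snapshots as metadata files in storage and publish each commit with an atomic operation, which this sketch only imitates with a pointer assignment.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional, Sequence


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    files: FrozenSet[str]      # data files visible in this version of the table
    parent_id: Optional[int]   # lineage, useful for auditing history


class SnapshotTable:
    """Toy table whose every commit creates a new immutable snapshot."""

    def __init__(self) -> None:
        self.snapshots: Dict[int, Snapshot] = {}
        self.current_id: Optional[int] = None

    def commit(self, added: Sequence[str] = (), removed: Sequence[str] = ()) -> int:
        base = self.scan()
        new_id = len(self.snapshots) + 1
        files = (base - set(removed)) | set(added)
        self.snapshots[new_id] = Snapshot(new_id, files, self.current_id)
        # Switching the current pointer stands in for the single atomic step
        # a real table format uses to publish a commit.
        self.current_id = new_id
        return new_id

    def scan(self, snapshot_id: Optional[int] = None) -> FrozenSet[str]:
        """Files of the current version, or of an older one (time travel)."""
        sid = self.current_id if snapshot_id is None else snapshot_id
        return self.snapshots[sid].files if sid is not None else frozenset()

    def rollback(self, snapshot_id: int) -> None:
        """Point the table back at an earlier snapshot; no data is rewritten."""
        if snapshot_id not in self.snapshots:
            raise KeyError(snapshot_id)
        self.current_id = snapshot_id


table = SnapshotTable()
v1 = table.commit(added=["a.parquet"])
v2 = table.commit(added=["b.parquet"])
v3 = table.commit(added=["b2.parquet"], removed=["b.parquet"])  # e.g. an UPDATE rewrote b
```

Old snapshots are never modified, so a query pinned to `v2` returns the same files no matter what is committed later. That property is what makes queries reproducible. Real formats eventually expire old snapshots to reclaim storage.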
  24. Features of Delta Lake / Iceberg (continued)

    • ACID transactions, optimized streaming, audit logs, caching, data layout optimization (Z-order, multi-column ordering), ML feature support
    • Open source!
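Z-order, listed under data layout optimization, sorts rows by a key that interleaves the bits of several columns (a Morton code). Rows close in all of those columns then land in the same files, and per-file min/max statistics stay tight for filters on any of them. A minimal sketch for two small non-negative integer columns. `z_value` and `z_order` are hypothetical helpers. Production implementations (Delta Lake exposes this as `OPTIMIZE ... ZORDER BY`) typically first map arbitrary column values into a bounded integer range before interleaving.

```python
from typing import List, Tuple


def z_value(x: int, y: int, bits: int = 16) -> int:
    """Interleave the bits of x and y: bit i of x goes to position 2i,
    bit i of y goes to position 2i + 1."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def z_order(rows: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Sort rows by their Z-value so that nearby (x, y) pairs land close together."""
    return sorted(rows, key=lambda r: z_value(r[0], r[1]))


points = [(x, y) for x in range(4) for y in range(4)]
ordered = z_order(points)
# Split into "files" of 4 rows each to see how the layout clusters values.
files = [ordered[i:i + 4] for i in range(0, len(ordered), 4)]
```

With these 16 rows in four "files", each file covers a 2×2 block of (x, y). A filter on `x` alone or on `y` alone can skip half of the files using min/max ranges. Sorting by `x` only would make every file span the full range of `y`, so a filter on `y` could skip nothing.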
  25. Features of Delta Lake / Iceberg (same as slide 24)
  26. Q&A

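  27. Can Delta Lake & Iceberg be used in production?

    • Delta Lake is already in use at many customer companies. (Not sure about Korea.)
    • Many use it under a contract with Databricks. I'm not sure whether anyone has adopted the open-source version on its own.
    • I'm not sure about Iceberg in Korea either, but many companies abroad built on Spark or Snowflake use it heavily.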
  28. Is Redshift really that bad?

    • Redshift is a DW, but you can think of it as closer to an OLAP engine,
    • like Druid or ClickHouse...
    • It's probably better suited to more specialized purposes:
      • e.g., CDC from Postgres
      • e.g., dashboards with ~ms latency
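CDC (change data capture) from a source such as Postgres means streaming row-level changes and replaying them, in log order, on the target. A minimal sketch of that replay. It assumes each event carries an operation, a primary key, the new row values, and a monotonically increasing log position (`lsn`). The `ChangeEvent` shape and `apply_changes` are hypothetical and do not reflect the format of any particular CDC tool.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class ChangeEvent:
    lsn: int                                 # source log position; defines the order
    op: str                                  # "insert", "update", or "delete"
    key: int                                 # primary key of the changed row
    row: Optional[Dict[str, object]] = None  # new values (None for delete)


def apply_changes(
    table: Dict[int, Dict[str, object]], events: Iterable[ChangeEvent]
) -> Dict[int, Dict[str, object]]:
    """Replay events in log order on a copy of a keyed table."""
    result = dict(table)
    for ev in sorted(events, key=lambda e: e.lsn):
        if ev.op in ("insert", "update"):
            result[ev.key] = dict(ev.row or {})
        elif ev.op == "delete":
            result.pop(ev.key, None)
        else:
            raise ValueError(f"unknown op: {ev.op}")
    return result


# Events may arrive out of order; sorting by lsn restores the source's order.
events = [
    ChangeEvent(3, "delete", 2),
    ChangeEvent(1, "insert", 2, {"name": "bob"}),
    ChangeEvent(2, "update", 1, {"name": "ALICE"}),
]
replica = apply_changes({1: {"name": "alice"}}, events)
```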
  29. Can time-series and graph data go into a data platform too?

    • Time series
      • Of course, but don't expect something like InfluxDB.
    • Graphs
      • They can go in too, but it's better to flatten the graph into rows. (For now, at least...)
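Flattening a graph into rows usually means storing it as an edge table, one row per edge, which any SQL engine can scan and join. A minimal sketch, assuming an in-memory adjacency list. `to_edge_rows`, `two_hop`, and the column names in the SQL comment are hypothetical. Multi-hop questions then become self-joins on the edge table, as the second function shows for two hops.

```python
from typing import Dict, List, Set, Tuple

Graph = Dict[str, List[str]]  # adjacency list: node -> neighbours


def to_edge_rows(graph: Graph) -> List[Tuple[str, str]]:
    """Flatten the graph into (src, dst) rows, one per edge."""
    return [(src, dst) for src, dsts in graph.items() for dst in dsts]


def two_hop(edges: List[Tuple[str, str]], start: str) -> Set[str]:
    """Same result as:
    SELECT DISTINCT e2.dst FROM edges e1 JOIN edges e2
    ON e1.dst = e2.src WHERE e1.src = :start"""
    first = {dst for src, dst in edges if src == start}
    return {dst for src, dst in edges if src in first}


follows: Graph = {"alice": ["bob", "carol"], "bob": ["dave"], "carol": ["dave", "erin"]}
edges = to_edge_rows(follows)
```

Each extra hop adds another join. This is why the slide treats row-flattening as a pragmatic choice for now rather than a replacement for a graph database.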
  30. Can ElasticSearch be used as a data platform?

    • Half yes, half no (personal opinion)
    • ES does have a clustered distributed storage / query engine, but its strengths lie elsewhere rather than in quickly querying large-scale data
      • e.g., time-series indexing, very low latency, etc.
    • Not suitable as the main data platform, but 👍 for complementing specific use cases