Woong Seok Kang
September 22, 2022

AWSKRUG DS 2022/09 Talk - Exploring the Latest Technologies That Make Up a Cloud Data Platform


Transcript

  1. Speaker introduction - Woong Seok Kang

    • A 6th-year engineer with a strong interest in data technology
    • (Formerly) a SaaS startup, Kakao Style, RIDI
    • Currently attending school...
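  2. Before we begin

    • I tried to evaluate each technology as objectively as possible, but my own subjective views may have crept in.
    • I have used AWS extensively, but I have not operated the other technologies in production, so some information may be wrong. I did my best to include correct information. 😃 (I did at least trial all of them)
    • I explain several concepts in my own way. People may see them differently, so I would appreciate it if you take them accordingly.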
  3. What I will cover today

    • The past and present of data platforms
    • A look at the various cloud data platform paradigms
    • A look at next-generation data platform technologies - Apache Iceberg, Delta Lake
    • Wrap-up and Q&A
  4. So what is a data platform?

    It must store an enormous amount (PB+) of highly varied data (structured, unstructured, semi-structured),
    let me query that data without testing my patience,
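  10. So what is a data platform?

    It must store an enormous amount (PB+) of highly varied data (structured, unstructured, semi-structured),
    let me query that data without testing my patience,
    keep queries running well even when everyone in the company queries at the same time,
    make it possible to visualize the data extracted this way,
    let me quickly find the data I want,
    sometimes hide certain data, or show only part of it, for security reasons,
    keep the company's important data well abstracted and organized,
    make the logic that produces the data understandable at a glance and cheap to maintain,
    keep running without problems even if I quit right now,
    show data in real time where possible,
    and be comfortable to use for people who are not developers,
    but then the overall cost still has to be low...

    Going through them one by one: Distributed Storage / Distributed Query Engine / Visualization / Discovery / Security, Governance / (Near) Realtime Data Pipeline, CDC / Less-code; e.g.) Amplitude, DBT, SQL / Software Engineering, IaC, DataOps / Data Type, Compression, Tiering, Cache, Partitioning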
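  11. So what is a data platform?

    It must store an enormous amount (PB+) of highly varied data (structured, unstructured, semi-structured),
    let me query that data without testing my patience,
    keep queries running well even when everyone in the company queries at the same time,
    make it possible to visualize the data extracted this way,
    let me quickly find the data I want,
    sometimes hide certain data, or show only part of it, for security reasons,
    keep the company's important data well abstracted and organized,
    make the logic that produces the data understandable at a glance and cheap to maintain,
    keep running without problems even if I quit right now,
    show data in real time where possible,
    and be comfortable to use for people who are not developers,
    but then the overall cost still has to be low...

    Today's main topic: Distributed Storage / Distributed Query Engine / Visualization / Discovery / Security, Governance / (Near) Realtime Data Pipeline, CDC / Less-code; e.g.) Amplitude, DBT, SQL / Software Engineering, IaC, DataOps / Data Type, Compression, Tiering, Cache, Partitioning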
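  12. Through data, we can understand the past, present, and future of a service and business

    Various metrics and statistics (revenue, AU, ...)
    Data-driven decision making
    Data-based services (search, recommendation, popularity, ranking, settlement, A/B testing, prediction, ...)
    ...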
  13. The various paradigms of data platforms

    Data Warehouse: DB w/ Distributed Storage, Query Engine
    Data Lake: Distributed Storage / Distributed Query Engine / Metastore
    Data Lakehouse?: Data Warehouse & Data Lake
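  14. Pros of Data Lake

    • You can store any data freely and use it as you go (ELT) -> image and audio work too!
    • No separate ingestion step is needed: loading the data into storage is all it takes
    • Because storage becomes the SSoT (single source of truth), you can use rich storage features + whichever query engine suits your taste (SQL, API, Code, Framework, and the many products that support external reads)
    • On most clouds, storage is the cheapest option! Tiering is also possible
    • External, schema-on-read metadata systems that are relatively flexible and support programmatic APIs: Hive, Glue, Dataproc Metastore, etc.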
  15. Cons of Data Lake

    • The constraints of the underlying storage apply as-is (e.g. EC, rename, ACID, streaming, ...)
    • Performance varies enormously depending on how the user uses storage: data type, format, compression, directory structure, block size, ...
    • Too many components make up a lake: storage, query engine, metastore, ...
    • Data management is relatively harder - Data swamp
    • External schema management is complex - it differs by file format, by store, and by query engine, and schema evolution is sometimes irreversible.
    • Data cannot be managed with SQL alone - Update, Delete?
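The point about directory structure affecting performance can be illustrated with partition pruning. This is a minimal sketch, assuming Hive-style `key=value` directory names (a common convention, not something the slides specify); the object keys and the helper names `partition_values` and `prune` are hypothetical.

```python
# Hypothetical object keys laid out with Hive-style partition directories.
OBJECT_KEYS = [
    "events/dt=2022-09-20/part-0000.parquet",
    "events/dt=2022-09-21/part-0000.parquet",
    "events/dt=2022-09-21/part-0001.parquet",
    "events/dt=2022-09-22/part-0000.parquet",
]


def partition_values(key: str) -> dict[str, str]:
    """Extract key=value pairs from the directory part of an object key."""
    values = {}
    for segment in key.split("/")[:-1]:  # skip the file name itself
        if "=" in segment:
            name, value = segment.split("=", 1)
            values[name] = value
    return values


def prune(keys: list[str], column: str, wanted: str) -> list[str]:
    """Keep only files whose partition directory matches the filter."""
    return [k for k in keys if partition_values(k).get(column) == wanted]


# A filter on the partition column lets the engine skip whole directories.
selected = prune(OBJECT_KEYS, "dt", "2022-09-21")
print(f"scan {len(selected)} of {len(OBJECT_KEYS)} files")
```

With a layout that does not encode `dt` in the path, the same filter would have to open every file, which is one way the same data can end up with very different query performance.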
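  16. Pros of Data Warehouse

    • There is little for the user to worry about: you ingest, and the DW either handles the rest on its own or the user customizes it with the DW's features.
    • Because metadata is internal (schema-on-write), things like schema evolution are relatively unconstrained.
    • However, it can cost money. Some operations drop everything and rebuild it from scratch
    • Some offer features beyond a plain query engine: CDC, SQL syntax, etc.
    • Few components: BigQuery -> maintenance cost can be lower than a lake's
    • Data can be managed natively in SQL: Update, Delete, etc. (however, depending on how the operation works, it may be billed or may not be supported)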
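The contrast between schema-on-read (data lake) and schema-on-write (data warehouse) can be sketched in a few lines. This is a toy illustration under stated assumptions, not how any particular product works: the records, the `SCHEMA` mapping, and both function names are hypothetical.

```python
import json

# Hypothetical raw records; the second one lacks "ts" and has an extra field.
RAW_LINES = [
    '{"user_id": 1, "event": "click", "ts": "2022-09-22T10:00:00"}',
    '{"user_id": 2, "event": "view", "device": "ios"}',
]

SCHEMA = {"user_id": int, "event": str, "ts": str}  # assumed target schema


def write_with_schema(line: str) -> dict:
    """Schema-on-write: validate at ingestion time and reject bad records."""
    record = json.loads(line)
    for column, column_type in SCHEMA.items():
        if not isinstance(record.get(column), column_type):
            raise ValueError(f"column {column!r} is missing or not {column_type.__name__}")
    return {column: record[column] for column in SCHEMA}


def read_with_schema(line: str) -> dict:
    """Schema-on-read: the raw line is stored as-is; columns are projected on read."""
    record = json.loads(line)
    return {column: record.get(column) for column in SCHEMA}  # missing -> None


# Schema-on-read accepts both records and fills the gap at query time.
rows = [read_with_schema(line) for line in RAW_LINES]

# Schema-on-write refuses the incomplete record at ingestion time.
try:
    write_with_schema(RAW_LINES[1])
    rejected = False
except ValueError:
    rejected = True

print(rows[1], rejected)
```

The trade-off matches the slides: the lake side accepts anything and pushes the interpretation work to readers, while the warehouse side does the checking once, at ingestion.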
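  17. Cons of Data Warehouse

    • There is an ingestion step.
    • Features the DW does not support cannot be used: what if you want to handle image or audio in BigQuery? What if the DW you use does not support semi-structured formats? What if a connector is not supported?
    • It is burdensome to just dump anything in and use it. (the price...)
    • Data gets fragmented across components: BigQuery alone is not enough for real-time BI serving.
    • Once data has gone in, it is hard to get it back out: $$$
    • It is likely to be more expensive than a lake. (especially with multi-cloud)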
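  18. Price Comparison: Athena, BigQuery, Snowflake

    • Assume 1PB of JSON is ingested, with a monthly query volume of 100TB
    • Compressing 1PB of JSON with Parquet + zstd gives a compression ratio of roughly 90% (=100TB)
    • For the cloud, assume network traffic stays within the same region of the same cloud (no public egress)
    • If it is not the same region, $$$
    • All costs are based on the Seoul region
    • Costs are calculated per month
    • For reference only! (may be very inaccurate)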
  19. Price Comparison: Athena, BigQuery, Snowflake

               | Athena                         | BigQuery                                      | Snowflake
    Ingestion  | Free                           | Free                                          | Free?
    Storage    | 100TB × $0.025 per GB = $2,500 | 1PB (uncompressed) × $0.023 per GB = $23,000? | 100TB × $0.025 per GB = $2,500 + another $2,500 for storage
    Query      | 100TB × $5 per TB = $500       | 100TB × $6 per TB = $600                      | 18 people using it during working hours = $5,952
    Total      | $3,000                         | $23,600                                       | $10,952
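The totals in the comparison can be reproduced with a short calculation. This sketch only replays the slide's own figures; it assumes decimal units (1 TB = 1000 GB), which is what the slide's totals imply, and it copies the Snowflake compute figure of $5,952 as given because the slide does not show how it was derived. The function and variable names are illustrative.

```python
from decimal import Decimal

GB_PER_TB = 1000  # assumption: decimal units, consistent with the slide's totals
TB_PER_PB = 1000

raw_json_tb = 1 * TB_PER_PB        # 1 PB of ingested JSON
compressed_tb = raw_json_tb // 10  # ~90% smaller as Parquet + zstd -> 100 TB
monthly_scan_tb = 100              # monthly query volume from the slide


def storage_cost(size_tb: int, usd_per_gb: str) -> Decimal:
    """Monthly storage cost for size_tb terabytes at a per-GB price."""
    return size_tb * GB_PER_TB * Decimal(usd_per_gb)


def query_cost(scanned_tb: int, usd_per_tb: int) -> Decimal:
    """Monthly query cost for engines billed per TB scanned."""
    return Decimal(scanned_tb * usd_per_tb)


# Athena: compressed data on object storage, billed per TB scanned.
athena = storage_cost(compressed_tb, "0.025") + query_cost(monthly_scan_tb, 5)

# BigQuery: the slide prices storage on the uncompressed 1 PB (marked with "?").
bigquery = storage_cost(raw_json_tb, "0.023") + query_cost(monthly_scan_tb, 6)

# Snowflake: storage is counted twice on the slide; compute is copied as-is.
snowflake = 2 * storage_cost(compressed_tb, "0.025") + Decimal(5952)

for name, total in [("Athena", athena), ("BigQuery", bigquery), ("Snowflake", snowflake)]:
    print(f"{name}: ${total:,.0f}")
```

Using `Decimal` for the per-GB prices avoids binary floating-point rounding, so the results match the slide's dollar amounts exactly.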
  20. Can't the data lake's drawbacks be overcome?

    • The constraints of the underlying storage apply as-is (e.g. EC, rename, ACID, streaming, ...)
    • Performance varies enormously depending on how the user uses storage: data type, format, compression, directory structure, block size, ...
    • Too many components make up a lake: storage, query engine, metastore, ...
    • Data management is relatively harder - Data swamp
    • External schema management is complex - it differs by file format, by store, and by query engine, and schema evolution is sometimes irreversible.
    • Data cannot be managed with SQL alone - Update, Delete?
  21. Some of these drawbacks also exist in the warehouse

    • The constraints of the underlying storage apply as-is (e.g. EC, rename, ACID, streaming, ...)
    • Performance varies enormously depending on how the user uses storage: data type, format, compression, directory structure, block size, ...
    • Too many components make up a lake: storage, query engine, metastore, ...
    • Data management is relatively harder - Data swamp
    • External schema management is complex - it differs by file format, by store, and by query engine, and schema evolution is sometimes irreversible.
    • Data cannot be managed with SQL alone - Update, Delete?
  24. Features of Delta Lake / Iceberg

    • Support Full Schema Evolution with more expressive DML
    • Update, Delete, Merge, or by yourself (using Spark, Python, API, ...)
    • Time Travel & Rollback (for reproducible query, by snapshot)
    • ACID Transactions, Optimized Streaming, Audit Logs, Caching, Data Layout Optimization (Z-order, multiple column ordering), Support ML Features
    • Open Sources!
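The idea behind "Time Travel & Rollback ... by snapshot" can be shown with a toy in-memory table. This is a conceptual sketch only: `SnapshotTable` and its methods are hypothetical and are not the Delta Lake or Iceberg API. The real formats track snapshots through metadata and log files rather than by copying rows.

```python
from dataclasses import dataclass, field


@dataclass
class SnapshotTable:
    """Toy table: every commit appends an immutable snapshot of all rows."""

    # Snapshot 0 is the empty table.
    snapshots: list[tuple[dict, ...]] = field(default_factory=lambda: [()])

    def _commit(self, rows) -> int:
        self.snapshots.append(tuple(rows))
        return len(self.snapshots) - 1  # id of the new snapshot

    def insert(self, row: dict) -> int:
        return self._commit(self.read() + (row,))

    def delete(self, predicate) -> int:
        return self._commit(r for r in self.read() if not predicate(r))

    def read(self, snapshot_id: int = -1) -> tuple[dict, ...]:
        """Read the latest snapshot, or an older one (time travel)."""
        return self.snapshots[snapshot_id]

    def rollback(self, snapshot_id: int) -> int:
        """Roll back by committing an old snapshot again as the newest one."""
        return self._commit(self.read(snapshot_id))


table = SnapshotTable()
v1 = table.insert({"id": 1, "name": "a"})
v2 = table.insert({"id": 2, "name": "b"})
v3 = table.delete(lambda r: r["id"] == 1)

print(len(table.read()))    # rows after the delete
print(len(table.read(v2)))  # same query against the older snapshot
v4 = table.rollback(v2)     # the latest state now equals snapshot v2
```

Because old snapshots are never modified, a query pinned to `v2` keeps returning the same rows regardless of later writes, which is what makes queries reproducible.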
  26. Q&A

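  27. Can Delta Lake & Iceberg be used in production?

    • Delta Lake is already in use at many customers. (I'm not sure about Korea)
    • Many use it under a contract with Databricks. I don't know whether anyone has adopted the open-source version.
    • As for Iceberg, I'm not sure about Korea either, but many overseas companies built on Spark or Snowflake use it.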
  28. Is Redshift really that bad?

    • Redshift is a DW, but it can be seen as closer to OLAP.
    • Like Druid, Clickhouse...
    • It is probably better used for more special-purpose workloads.
    • e.g. CDC from Postgres
    • e.g. ~ms latency dashboard
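  29. Can time-series and graph data also go into a data platform?

    • Time series
      • Of course, but you should not expect something like InfluxDB.
    • Graph
      • That can go in too, but it is better to flatten the graph structure into rows. (for now..)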
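Flattening a graph into rows usually means storing one row per edge. A minimal sketch, assuming a hypothetical follower graph and hypothetical `src` / `dst` column names:

```python
# Hypothetical follower graph as an adjacency mapping.
GRAPH = {
    "alice": ["bob", "carol"],
    "bob": ["carol"],
    "carol": [],
}


def to_edge_rows(graph: dict[str, list[str]]) -> list[dict[str, str]]:
    """One row per edge, ready for a table with (src, dst) columns."""
    return [{"src": src, "dst": dst} for src, dsts in graph.items() for dst in dsts]


def out_degree(rows: list[dict[str, str]]) -> dict[str, int]:
    """Equivalent of: SELECT src, COUNT(*) FROM edges GROUP BY src."""
    counts: dict[str, int] = {}
    for row in rows:
        counts[row["src"]] = counts.get(row["src"], 0) + 1
    return counts


rows = to_edge_rows(GRAPH)
print(rows)
print(out_degree(rows))
```

Once the edges are plain rows, ordinary SQL aggregations and joins work on them, at the cost of losing native graph traversal.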
  30. Can ElasticSearch be used as a data platform?

    • Half yes, half no (personal opinion)
    • ES does have a cluster-based distributed storage / query engine, but its strengths lie elsewhere rather than in querying large-scale data quickly
    • e.g.) time-series indexing, very fast latency, etc.
    • Unsuitable as the main data platform, but 👍 for complementing specific use cases