Upgrade to Pro — share decks privately, control downloads, hide ads and more …

データ基盤を作るのに使うようなサービスは単発で使うだけでも便利 / Manage to loo...

polamjag
January 23, 2019

データ基盤を作るのに使うようなサービスは単発で使うだけでも便利 / Manage to look over unmanaged data with fully managed services

polamjag

January 23, 2019
Tweet

More Decks by polamjag

Other Decks in Programming

Transcript

  1. whoami • id:polamjag ϙ ϥ Ϝ δ ϟ ά •

    ͸ͯͳϒϩά • ΞϓϦέʔγϣϯΤϯδχΞ
  2. • SQL Λී௨ʹୟ͘ • ෳ਺ͷσʔλϕʔεʹ·͕ͨͬͯूܭ͍ͨ͠? • SELECT COUNT(*) ʹ͔͔࣌ؒΔΑ͏ͳςʔϒϧͰ΋ͱʹ͔͘ू ܭ͍ͨ͠?

    • ूܭΫΤϦΛߴ଎Խ͢ΔͨΊʹ௥ՃͰ index ཉ͍͚͠Ͳ
 ͱΓ͋͑ͣݟͯΈ͍ͨϨϕϧͷूܭͷͨΊʹ ALTER TABLE Ͱ͖Δ? • ྺ࢙తܦҢͰγϦΞϥΠζ͞Εͨσʔλ͕ೖͬͯΔΧϥϜ͕͋Δ ͷΛΫΤϦ͍ͨ͠?
  3. !!! BigQuery !!! • ͏·͍͍҆଎͍ۜͷ஄ؙ • సૹʹ Embulk Λ࢖͍ɺ DSL

    ͷதͰσʔλΛ௚͔ͯ͠Β
 όϧΫϩʔσΟϯά͢Δ • ඞཁͳσʔλશ෦ूΊ͖ͯͨΒ౰વ JOIN ͳͲͰ͖Δ • σʔλྔ͕͍ͨͨ͜͠ͱͳ͍[ͲΕ͘Β͍?]ͳΒߟ͑ͳ͍ͱ͍ ͚ͳ͍͜ͱ͸Ұ੾ͳ͍
  4. ΛϫϯγϣοτͰ࢖͏ • Embulk ͰσʔλૹΔ • 1ճ͖ΓͳΒߟ͑ͳ͍ͱ͍͚ͳ͍͜ͱ͸গͳ͍ • (ఆظ࣮ߦɾߋ৽ͷ͘͠ΈɺΠϯΫϦϝϯλϧϩʔυɺ……) • BigQuery

    ͰͳΜΒ͔ूܭ͢Δ • Google εϓϨουγʔτʹΤΫεϙʔτ͢ΔϘλϯ • σʔλϙʔλϧ (Data Studio) ʹಥͬࠐΉ
  5. BigQuery ʹΠϯϙʔτ͠ʹ͍͘ςʔϒϧ • ೔෇෼ׂͷ෼ׂςʔϒϧΛ࢖͍ʹ͍͘ςʔϒϧ • Ϩίʔυ͕ INSERT ͞Εͨ࣌ࠁɺΛද͢ΧϥϜ͕ແ͍ • ্Ґ֓೦ͷͦ͏͍͏৘ใΛ

    JOIN ͖ͯͯͦ͠ΕͰύʔςΟγϣ ϯ͢Δ? • αʔϏεͷྺ࢙͕ 4000 ೔ (≒ 10.96 ೥) Ҏ্͋Δ • צͰύʔςΟγϣϯͷִؒΛؒҾ͘? • Α͘Α͘ߟ͑ͨΒຊ౰͸·ͩύʔςΟγϣχϯάඞཁͳ͍ͱ͔
  6. Google BigQuery ྉۚ • ౦ژϦʔδϣϯͳΒετϨʔδ୯Ձ͸ S3 ΑΓ͍҆ • 0.023USD /

    GB / ݄ • ΫΤϦྉۚ͸ 8.55 USD / 1 TB • ࠓͷҝସϨʔτͩͱ͍͍ͩͨ 1 GB ͋ͨΓ 1 ԁऑ • ͪΐͬͱͨ͠࢖͍ํͳΒ΄ͱΜͲޡࠩϨϕϧ
  7. ͦͷϩά͸ࠓͲ͜ʹ • Ͳ͔͜ͷαʔό্? • ͦΕͦ͜ BigQuery ͱ͔ RedShift ͱ͔? •

    Amazon S3 ʹஔ͔Ε͍ͯΔ? • ࣗ෼ͷ৔߹͸ݟ͔ͨͬͨ΋ͷ͕ S3 ʹஔ͔Εͯ ͍ͨ
  8. S3 ʹஔ͍ͯ͋ΔͳΒ Amazon Athena Ͱ • S3 ʹஔ͍ͯ͋ΔϑΝΠϧΛΫΤϦͰ͖ΔϚωʔδυαʔϏε • த਎͸

    Presto (https://prestodb.io/) • Presto ΍ Hive ͷ஌͕ࣝ͋Δͱ࢖͍͜ͳͤΔͱ͍͏͜ͱ͕ଟ͍ • ϑϧϚωʔδυͳͷͰɺूܭͷͨΊʹετϨʔδͱωοτϫʔΫ ڧΊͷϚγϯΛ༻ҙͨ͠Γɺͱ͍ͬͨ͜ͱΛ͠ͳ͍͍ͯ͘ • ͓΋ΉΖʹ S3 ͷϑΝΠϧΛࢦఆͯ͠ CREATE TABLE ͯ͠ΫΤ Ϧ͢Δ͚ͩ
  9. • ೚ҙͷϑΥʔϚοτΛύʔεͯ͠ SQL ͰΫΤϦ • CSV/TSV, JSONL, ֤छΧϥϜφ (ྻࢤ޲) ϑΝΠϧͳͲ

    ͸ඪ४αϙʔτ • LTSV ͸ਖ਼نදݱͰύʔεͯ͠ΫΤϦͰ͖Δ • → ਖ਼نදݱͰύʔεͯ͠஋Λൈ͖ग़ͤΔϑΝΠϧͳ ΒԿͰ΋ΫΤϦͰ͖Δ • ύʔε͠ͳ͍͜ͱͰ grep as a service ͱͯ͠׆༻
  10. -- LTSV Λ Athena ͰΫΤϦ͢ΔͨΊʹύʔεͯ͠ΈΑ͏ CREATE EXTERNAL TABLE IF NOT

    EXISTS `sugoi-service-logs` ( `time` string, `host` string, `method` string, `path` string, `taken_sec` float, `ua` string, `referer` string ) PARTITIONED BY (date string) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe' WITH SERDEPROPERTIES ( 'serialization.format' = '1', 'input.regex' = '^time:([^\\t]*)\\thost:([^\\t]*)\\tmethod:([^\\t]*)\\turi: ([^\\t]*)\\ttaken_sec:([^\\t]*)\\tua:([^\\t]*)\\treferer:([^\\t]*)$' ) LOCATION 's3://your-logs-bucket/sugoi-service-logs/' TBLPROPERTIES ('has_encrypted_data'='false')
  11. -- LTSV Λ Athena ͰΫΤϦ͢ΔͨΊʹύʔεͯ͠ΈΑ͏ CREATE EXTERNAL TABLE IF NOT

    EXISTS `sugoi-service-logs` ( `time` string, `host` string, `method` string, `path` string, `taken_sec` float, `ua` string, `referer` string ) PARTITIONED BY (date string) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe' WITH SERDEPROPERTIES ( 'serialization.format' = '1', 'input.regex' = '^time:([^\\t]*)\\thost:([^\\t]*)\\tmethod:([^\\t]*)\\turi: ([^\\t]*)\\ttaken_sec:([^\\t]*)\\tua:([^\\t]*)\\treferer:([^\\t]*)$' ) LOCATION 's3://your-logs-bucket/sugoi-service-logs/' TBLPROPERTIES ('has_encrypted_data'='false')
  12. -- LTSV Λ Athena ͰΫΤϦ͢ΔͨΊʹύʔεͯ͠ΈΑ͏ CREATE EXTERNAL TABLE IF NOT

    EXISTS `sugoi-service-logs` ( `time` string, `host` string, `method` string, `path` string, `taken_sec` float, `ua` string, `referer` string ) PARTITIONED BY (date string) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe' WITH SERDEPROPERTIES ( 'serialization.format' = '1', 'input.regex' = '^time:([^\\t]*)\\thost:([^\\t]*)\\tmethod:([^\\t]*)\\turi: ([^\\t]*)\\ttaken_sec:([^\\t]*)\\tua:([^\\t]*)\\treferer:([^\\t]*)$' ) LOCATION 's3://your-logs-bucket/sugoi-service-logs/' TBLPROPERTIES ('has_encrypted_data'='false') ……
  13. ৘ใ • S3 ʹͲΜͲΜϩάΛอଘ͢Δͱ͖ɺHive ܗࣜͷσΟϨΫτϦͰ อଘͯ͠ΔͱύʔςΟγϣχϯά͕ࣗಈͰߦΘΕͯศར • ྫ: s3://bucket/somelogs/year=2019/month=1/ day=23/engineer_seminar_001.log

    • ͦ͏ͳͬͯͳͯ͘΋ symlink ͰͳΜͱ͔Ͱ͖Δͱ͍͏࿩୊΋͋Δ • https://dev.classmethod.jp/cloud/aws/create-athena-table- with-symboliclink/
  14. Amazon Athena ஋ஈ • εΩϟϯͨ͠༰ྔʹରͯ͠՝ۚ (5 USD / 1 TB)

    • gzip ѹॖࡁϑΝΠϧ΋ͦͷ··ΫΤϦͰ͖Δ • ྻࢤ޲Ͱͳ͍ϑΝΠϧʹରͯ͠ΫΤϦ͢ΔͳΒɺूܭ ͢ΔΧϥϜΛߜͬͯ΋ྉۚઅ໿ʹ͸ͳΒͳ͍ • ج൫ͱͯ͠׆༻͍ͯ͘͠ײ͡ͳΒɺ
 ྻࢤ޲ϑΝΠϧʹม׵͢ΔϑΣʔζ͕ͳ͍ͱݫͦ͠͏