
Building a Data Platform with embulk and digdag

Slides on building the data platform at Timee, Inc.

Toshiki Tsuchikawa

June 09, 2020


Transcript

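  1. Self-introduction: Tsuchikawa Toshiki
    • March 2020: graduated from the School of Computing, Tokyo Institute of Technology
    • DRE, GrowthTeam @ Timee, Inc.
      - Building and operating the data platform
      - Analysis, A/B testing
    • Graduate program @ Tokyo Tech
      - Supercomputers, machine learning
    • Twitter @tvtg_24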
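  2. What are embulk and digdag?
    embulk
    • An open-source bulk data transfer tool
    • Plugins for input, output, filter, and so on allow flexible data transfers
    • Parallel processing keeps transfer times short
    digdag
    • A tool for running, scheduling, and monitoring tasks
    • Grouping and similar features make it possible to define complex workflows
    • Error handling and retries are easy to write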
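  3. Database → BigQuery
    Diagram: all table names → data fetch, one table at a time
    • Tables are processed one by one with embulk
      ‣ Columns differ per table (column names are defined as the schema names)
      ‣ Masking differs per table (using the ruby_proc plugin)
      ‣ Example: phone number 090-5333-2222 → 080-9446-3523 (masked)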
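The masking step on this slide is done with the ruby_proc plugin; the deck does not show that code. As an illustration only, here is a minimal Python sketch of format-preserving masking, assuming that digits are replaced by hash-derived digits while hyphens and length are kept. The function name `mask_phone` and the salt value are hypothetical, not from the slides.

```python
import hashlib
import itertools
import re


def mask_phone(phone: str, salt: str = "hypothetical-salt") -> str:
    """Replace each digit with a hash-derived digit; keep all other characters."""
    digest = hashlib.sha256((salt + phone).encode("utf-8")).hexdigest()
    # Each pair of hex characters becomes one replacement digit (0-9).
    digits = [int(digest[i:i + 2], 16) % 10 for i in range(0, len(digest), 2)]
    counter = itertools.count()
    # Cycle through the derived digits so inputs of any length are handled.
    return re.sub(r"\d", lambda _m: str(digits[next(counter) % len(digits)]), phone)


if __name__ == "__main__":
    # Same input and salt always give the same masked value.
    print(mask_phone("090-5333-2222"))
```

Because the replacement is derived from a hash, the same input always maps to the same masked value, which keeps joins on the masked column possible. Whether the original pipeline is deterministic is not stated in the slides.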
  4. Database → BigQuery
    table_A: id, …, …, updated_at
    • Save the last updated_at, then fetch only rows whose updated_at is greater than the saved value (> updated_at)
    • Deduplication query: "SELECT * EXCEPT(rn) FROM (SELECT *, row_number() over (PARTITION BY id ORDER BY updated_at DESC) AS rn FROM (SELECT * FROM BQ_DATASET.`{0}`)) WHERE rn = 1 ORDER BY id".format(digdag.env.params['UPDATE_TABLE'])
    • Partition by id, order by updated_at, and keep only the newest data
    https://tech.mercari.com/entry/2018/06/28/100000
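To make the effect of the `row_number() ... WHERE rn = 1` query concrete, the following sketch applies the same "newest row per id" rule to in-memory rows. It is not part of the deck: the function name `latest_per_id` and the sample rows are hypothetical, and timestamps are assumed to be ISO 8601 strings so that string comparison matches chronological order.

```python
from typing import Any, Dict, Iterable, List


def latest_per_id(rows: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep the row with the greatest updated_at for each id, ordered by id."""
    latest: Dict[Any, Dict[str, Any]] = {}
    for row in rows:
        current = latest.get(row["id"])
        # On ties the first row seen wins; SQL's row_number() picks arbitrarily.
        if current is None or row["updated_at"] > current["updated_at"]:
            latest[row["id"]] = row
    return sorted(latest.values(), key=lambda r: r["id"])


if __name__ == "__main__":
    sample = [
        {"id": 2, "name": "b", "updated_at": "2020-03-10T09:00:00"},
        {"id": 1, "name": "old", "updated_at": "2020-03-09T09:00:00"},
        {"id": 1, "name": "new", "updated_at": "2020-03-10T10:00:00"},
    ]
    for row in latest_per_id(sample):
        print(row)
```

Appending the incrementally fetched rows and then deduplicating this way means a record updated several times ends up as a single row holding its newest values.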
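  5. Log data (S3, etc.) → BigQuery
    Logs in various formats are mixed together.
    Before (s3://.../2020/03/10/09):
      {method:"GET"…
      [cont-init.d…
      {severity: …
      {method:"POST" …
    Files are processed one at a time:
      cat $file | jq -e -r -R 'fromjson? | .log' | jq . -c | grep '^{"method' | sponge $file
    After (s3://.../2020/03/10/09):
      {method:"GET"…
      {method:"POST" …
    • Extract only the columns we want and specify them as the BigQuery schema
      ※ Some characters, such as @, are not supported in schema names
    • Sometimes the log we want was never recorded in the first place…! (timestamp, etc.)
      ↓
      Publish the minimum required log information in advance; when something has to be added, the DRE does the coding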
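For readers less familiar with jq, the filtering pipeline above can be approximated in Python as follows. This is a sketch under assumptions, not the deck's code: it assumes each input line is a JSON wrapper whose `log` field holds the application log as a string (which the `fromjson? | .log` step implies), and the helper names `extract_request_logs` and `to_bigquery_column` are hypothetical. The column-name rule (letters, digits, and underscores, not starting with a digit) is a conservative assumption about BigQuery naming.

```python
import json
import re
from typing import Iterable, Iterator


def extract_request_logs(lines: Iterable[str]) -> Iterator[str]:
    """Yield compact JSON for inner logs whose first key starts with 'method'."""
    for line in lines:
        try:
            outer = json.loads(line)  # like `fromjson?`: skip unparseable lines
        except json.JSONDecodeError:
            continue
        inner_raw = outer.get("log") if isinstance(outer, dict) else None
        if not isinstance(inner_raw, str):
            continue
        try:
            inner = json.loads(inner_raw)  # like the second `jq . -c`
        except json.JSONDecodeError:
            continue  # e.g. plain-text lines such as "[cont-init.d..."
        compact = json.dumps(inner, separators=(",", ":"), ensure_ascii=False)
        if compact.startswith('{"method'):  # like grep '^{"method'
            yield compact


def to_bigquery_column(name: str) -> str:
    """Replace unsupported characters such as '@' with underscores."""
    cleaned = re.sub(r"[^A-Za-z0-9_]", "_", name)
    return cleaned if re.match(r"[A-Za-z_]", cleaned) else "_" + cleaned


if __name__ == "__main__":
    sample = [
        json.dumps({"log": json.dumps({"method": "GET"})}),
        json.dumps({"log": "[cont-init.d] done."}),
    ]
    print(list(extract_request_logs(sample)))
    print(to_bigquery_column("@timestamp"))
```

Unlike the shell pipeline, this version skips malformed inner logs explicitly instead of relying on how jq reacts to invalid input.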
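  6. Error handling
    Errors occur on schema changes or deletions (mainly for databases)
      table_A: id, …, name, updated_at
      table_A: id, …, last_name, updated_at ❌
    All the data has to be deleted once and loaded again
      ↓
      (A workflow that manually applies a full refresh is run against the specific table)
    • Nobody wants to read the error message every time…
    • Who is going to resolve this error…?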
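  7. Error handling
    Errors occur on schema changes or deletions (mainly for databases)
      table_A: id, …, name, updated_at
      table_A: id, …, last_name, updated_at ❌
    • The person responsible is easy to identify!!
    • Debugging is easier too!!
    Diagram labels: Pull Request; monitor schema information; add as an issue; assign the person responsible; notify the person responsible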
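The schema-monitoring idea on this slide can be sketched as a comparison between the previously recorded column list and the current one, producing the content of an issue when they differ. The deck does not show how this is implemented, so everything below is an assumption for illustration: the function names, the issue fields, and the assignee value are hypothetical, and no real issue-tracker API is called.

```python
from typing import Dict, List, Optional


def diff_schema(previous: List[str], current: List[str]) -> Dict[str, List[str]]:
    """Return the columns that were added or removed between two snapshots."""
    return {
        "added": [c for c in current if c not in previous],
        "removed": [c for c in previous if c not in current],
    }


def build_issue(table: str, previous: List[str], current: List[str],
                assignee: str) -> Optional[Dict[str, str]]:
    """Build a hypothetical issue payload, or None if the schema is unchanged."""
    changes = diff_schema(previous, current)
    if not changes["added"] and not changes["removed"]:
        return None
    lines = [f"Table: {table}"]
    lines += [f"- added column: {c}" for c in changes["added"]]
    lines += [f"- removed column: {c}" for c in changes["removed"]]
    return {
        "title": f"Schema change detected in {table}",
        "body": "\n".join(lines),
        "assignee": assignee,  # hypothetical owner to be notified
    }


if __name__ == "__main__":
    issue = build_issue(
        "table_A",
        ["id", "name", "updated_at"],
        ["id", "last_name", "updated_at"],
        "hypothetical-owner",
    )
    print(issue)
```

A rename such as name → last_name appears here as one removed and one added column, which is enough to flag the table for a full reload before the transfer fails.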