Building a data platform with embulk and digdag

Slides about building the data platform at Timee, Inc.

Toshiki Tsuchikawa

June 09, 2020

Transcript

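  1. Self-introduction: Tsuchikawa Toshiki
     • March 2020: graduated from Tokyo Institute of Technology, School of Computing
     • DRE, GrowthTeam @ Timee, Inc.
       - Data platform construction and operation
       - Analysis, A/B testing
     • Graduate school (latter-term program) @ Tokyo Tech
       - Supercomputers, machine learning
     • Twitter @tvtg_24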
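  2. What are embulk and digdag?
     embulk:
     • An open-source bulk data transfer tool
     • Input, output, filter and other stages are specified through a wide range of plugins, which allows flexible data transfer
     • Parallel processing keeps transfer times short
     digdag:
     • A tool for running, scheduling, and monitoring tasks
     • Complex workflows can be defined with features such as grouping
     • Error handling and retries are easy to write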
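  3. Database → BigQuery
     Diagram labels: all table names; data fetch; one table at a time
     • Tables are processed one at a time with embulk
       ‣ Columns differ from table to table (column names are defined as the schema names)
       ‣ Masking differs from table to table (uses the ruby_proc plugin)
       ‣ Example: phone number 090-5333-2222 → 080-9446-3523 after masking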
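The masking step on this slide can be illustrated with a small sketch. The deck does this inside embulk with the ruby_proc plugin (Ruby); the Python below only illustrates the idea and is not the author's implementation. The function name `mask_digits` and the key `SECRET_KEY` are hypothetical, and the keyed-hash approach is an assumption chosen so that the same input always maps to the same masked value.

```python
import hashlib
import hmac

# Hypothetical key for illustration; a real pipeline would load it from a secret store.
SECRET_KEY = b"example-secret"


def mask_digits(value: str, key: bytes = SECRET_KEY) -> str:
    """Replace every ASCII digit with a pseudo-random digit derived from an HMAC.

    Non-digit characters (such as the hyphens in a phone number) are kept,
    so the masked value has the same shape as the original.
    """
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest()
    masked = []
    used = 0  # index of the next digest byte to consume
    for ch in value:
        if "0" <= ch <= "9":
            masked.append(str(digest[used % len(digest)] % 10))
            used += 1
        else:
            masked.append(ch)
    return "".join(masked)


if __name__ == "__main__":
    print(mask_digits("090-5333-2222"))
```

Because the output depends only on the input and the key, the same phone number is masked identically across tables, which keeps joins on the masked column possible.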
  4. Database → BigQuery
     table_A: id, …, …, updated_at
     Save the last updated_at; fetch condition: > updated_at
     SELECT * EXCEPT(rn) FROM (SELECT *, row_number() over (PARTITION BY id ORDER BY updated_at DESC) AS rn FROM (SELECT * FROM BQ_DATASET.`{0}`)) WHERE rn = 1 ORDER BY id".format(digdag.env.params['UPDATE_TABLE'])
     Partition by id and order by updated_at to fetch only the newest data
     https://tech.mercari.com/entry/2018/06/28/100000
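To make the deduplication rule concrete, here is a minimal sketch of what the query does: for each id, keep only the row with the newest updated_at. `latest_per_id` and `build_dedup_query` are hypothetical helper names, the default `BQ_DATASET` is the placeholder taken from the slide, and the in-memory version assumes updated_at values compare correctly as strings (for example ISO 8601 dates).

```python
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]


def latest_per_id(rows: Iterable[Row]) -> List[Row]:
    """In-memory equivalent of ROW_NUMBER() ... WHERE rn = 1 ORDER BY id."""
    latest: Dict[Any, Row] = {}
    for row in rows:
        current = latest.get(row["id"])
        # Keep the row with the greatest updated_at for each id.
        if current is None or row["updated_at"] > current["updated_at"]:
            latest[row["id"]] = row
    return [latest[key] for key in sorted(latest)]


def build_dedup_query(table: str, dataset: str = "BQ_DATASET") -> str:
    """Build the deduplication query from the slide for one table."""
    return (
        "SELECT * EXCEPT(rn) FROM ("
        "SELECT *, ROW_NUMBER() OVER "
        "(PARTITION BY id ORDER BY updated_at DESC) AS rn "
        "FROM {dataset}.`{table}`"
        ") WHERE rn = 1 ORDER BY id"
    ).format(dataset=dataset, table=table)


if __name__ == "__main__":
    rows = [
        {"id": 2, "name": "b", "updated_at": "2020-03-01"},
        {"id": 1, "name": "old", "updated_at": "2020-03-01"},
        {"id": 1, "name": "new", "updated_at": "2020-03-10"},
    ]
    print(latest_per_id(rows))
    print(build_dedup_query("table_A"))
```

The query builder interpolates the table name directly, as the slide does, so it is only suitable for table names that come from a trusted list.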
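  5. Log data (S3, etc.) → BigQuery
     Logs in various formats are mixed together: {method:"GET"… [cont-init.d… {severity: … {method:"POST" …
     cat $file | jq -e -r -R 'fromjson? | .log' | jq . -c | grep '^{"method' | sponge $file
     s3://.../2020/03/10/09 before and after filtering; only {method:"GET"… and {method:"POST" … remain
     • Files are processed one at a time
     • Only the needed columns are extracted and specified as the BigQuery schema
       ※ Some characters, such as @, are not supported in schema names
     • Sometimes the logs we need are missing in the first place…! (timestamp, etc.)
       ↓ The minimum required log fields are announced in advance; when fields are added, the DRE does the coding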
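The shell pipeline on this slide can be approximated in Python as follows. This is a sketch under assumptions, not a drop-in replacement: it assumes each input line is a JSON object whose `log` field holds either a JSON-encoded string or an object, and it skips lines the pipeline could not parse. `extract_request_logs` and `sanitize_column_name` are hypothetical names, and the renaming rule (letters, digits, and underscores only, not starting with a digit) is an assumed convention for handling characters such as @.

```python
import json
import re
from typing import Iterable, Iterator


def extract_request_logs(lines: Iterable[str]) -> Iterator[str]:
    """Yield compact JSON for log entries whose first key is "method"."""
    for line in lines:
        try:
            outer = json.loads(line)  # like jq -R 'fromjson?'
        except ValueError:
            continue  # not JSON at all: skip
        if not isinstance(outer, dict):
            continue
        inner = outer.get("log")  # like '.log'
        if isinstance(inner, str):
            try:
                inner = json.loads(inner)
            except ValueError:
                continue  # e.g. "[cont-init.d] ..." plain-text lines
        if not isinstance(inner, dict):
            continue
        compact = json.dumps(inner, separators=(",", ":"), ensure_ascii=False)
        if compact.startswith('{"method'):  # like grep '^{"method'
            yield compact


def sanitize_column_name(name: str) -> str:
    """Replace characters outside [A-Za-z0-9_] so the name fits the assumed rule."""
    cleaned = re.sub(r"[^A-Za-z0-9_]", "_", name)
    if cleaned and cleaned[0].isdigit():
        cleaned = "_" + cleaned
    return cleaned


if __name__ == "__main__":
    sample = [
        json.dumps({"log": json.dumps({"method": "GET", "path": "/"})}),
        json.dumps({"log": "[cont-init.d] done"}),
        "not json",
    ]
    print(list(extract_request_logs(sample)))
    print(sanitize_column_name("@timestamp"))
```

Like the `grep` in the original pipeline, the prefix check only keeps entries where `method` is the first key of the object.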
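  6. Error handling
     Errors occur when a schema is changed or a column is deleted (mainly with the database)
     table_A: id, …, name, updated_at becomes table_A: id, …, last_name, updated_at ❌
     • All data has to be deleted once and loaded again
       ↓ (a workflow that refreshes the whole table is run manually for the affected table)
     • Nobody wants to read the error message every time…
     • Who is going to resolve this error…?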
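  7. Error handling
     Errors occur when a schema is changed or a column is deleted (mainly with the database)
     table_A: id, …, name, updated_at becomes table_A: id, …, last_name, updated_at ❌
     Diagram labels: Pull Request; monitor schema information; add as an issue; designate the person responsible; notify the person responsible
     • It is easy to tell who is responsible!!
     • Debugging is easier too!!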
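The schema-monitoring flow on this slide can be sketched as a column diff that produces the contents of an issue. The sketch stops short of calling any issue tracker API; `diff_columns`, `build_issue`, the `owners` mapping, and the user name in the usage example are all hypothetical, and the issue wording is illustrative only.

```python
from typing import Dict, List, Optional, Tuple


def diff_columns(old: List[str], new: List[str]) -> Tuple[List[str], List[str]]:
    """Return (added, removed) column names, preserving their original order."""
    old_set, new_set = set(old), set(new)
    added = [col for col in new if col not in old_set]
    removed = [col for col in old if col not in new_set]
    return added, removed


def build_issue(
    table: str, old: List[str], new: List[str], owners: Dict[str, str]
) -> Optional[Dict[str, Optional[str]]]:
    """Build an issue payload for a schema change, or None if nothing changed."""
    added, removed = diff_columns(old, new)
    if not added and not removed:
        return None
    body = "\n".join(
        [
            "Added columns: " + (", ".join(added) or "(none)"),
            "Removed columns: " + (", ".join(removed) or "(none)"),
        ]
    )
    return {
        "title": "Schema change detected in " + table,
        "body": body,
        "assignee": owners.get(table),  # person responsible for this table, if known
    }


if __name__ == "__main__":
    issue = build_issue(
        "table_A",
        ["id", "name", "updated_at"],
        ["id", "last_name", "updated_at"],
        owners={"table_A": "example-user"},
    )
    print(issue)
```

A rename such as name to last_name shows up as one removed and one added column, which is enough to flag that the table needs a full reload.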