Slide 1

Slide 1 text

2020/06/09, 4th meetup on thinking "positively" about data architects (data maintainers)
Building a data platform with embulk and digdag
Timee, Inc.
Tsuchikawa Toshiki

Slide 2

Slide 2 text

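Self-introduction
Tsuchikawa Toshiki
• March 2020: graduated from the School of Computing, Tokyo Institute of Technology
• DRE, GrowthTeam @ Timee, Inc.
  - Data platform construction and operations
  - Analysis, A/B testing
• Graduate school (latter-term course) @ Tokyo Tech
  - Supercomputers, machine learning related work
• Twitter @tvtg_24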

Slide 3

Slide 3 text

Diagram of the data platform we built

Slide 4

Slide 4 text

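What are embulk and digdag?
embulk
• An open-source bulk data transfer tool
• Plugins specify the input, output, filter, and so on, which allows flexible data transfer
• Parallel processing keeps transfer times short
digdag
• A tool for running, scheduling, and monitoring tasks
• Complex workflows can be defined using features such as grouping
• Error handling and retries are easy to write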

Slide 5

Slide 5 text

Database → BigQuery

Slide 6

Slide 6 text

Database → BigQuery
• Input data from the database into embulk one table at a time
• Apply masking inside embulk and output to BigQuery
• Schedule with digdag and run incrementally (differences only)

Slide 7

Slide 7 text

Database → BigQuery
• Input data from the database into embulk one table at a time
• Apply masking inside embulk and output to BigQuery
• Schedule with digdag and run incrementally (differences only)

Slide 8

Slide 8 text

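Database → BigQuery
Diagram labels: all table names / fetch data / one table at a time
• Tables are processed one at a time with embulk
  ‣ Columns differ per table (column names are defined as the schema names)
  ‣ Masking differs per table (using the ruby_proc plugin)
  ‣ Example: phone number 090-5333-2222 → 080-9446-3523 after masking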
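As an illustration of the phone-number example above, here is a minimal Python sketch of format-preserving masking. It is not the ruby_proc configuration used in the talk. The function name `mask_phone`, the `secret` parameter, and the choice of HMAC-SHA256 are all assumptions made for this example.

```python
import hashlib
import hmac


def mask_phone(phone: str, secret: bytes) -> str:
    """Replace every digit with a pseudo-random digit derived from an HMAC.

    Hypothetical helper: the length and the hyphen positions are preserved,
    and the same input always yields the same output for a given secret.
    """
    digest = hmac.new(secret, phone.encode("utf-8"), hashlib.sha256).digest()
    out = []
    digit_index = 0
    for ch in phone:
        if ch.isdigit():
            # One digest byte per digit (SHA-256 yields 32 bytes).
            out.append(str(digest[digit_index % len(digest)] % 10))
            digit_index += 1
        else:
            out.append(ch)  # keep separators such as "-"
    return "".join(out)
```

Deriving the digits from a keyed hash keeps the mapping stable across runs, so masked values can still be joined or counted, while the original number cannot be read back from the output without the secret.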

Slide 9

Slide 9 text

Database → BigQuery
• Input data from the database into embulk one table at a time
• Apply masking inside embulk and output to BigQuery
• Schedule with digdag and run incrementally (differences only)

Slide 10

Slide 10 text

Database → BigQuery
table_A: id, …, …, updated_at
• Save the last updated_at, then fetch only the rows newer than it (> updated_at)
• Partition by id and order by updated_at so that only the newest row per id is kept:
"SELECT * EXCEPT(rn) FROM (SELECT *, row_number() over (PARTITION BY id ORDER BY updated_at DESC) AS rn FROM (SELECT * FROM BQ_DATASET.`{0}`)) WHERE rn = 1 ORDER BY id".format(digdag.env.params['UPDATE_TABLE'])
https://tech.mercari.com/entry/2018/06/28/100000
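The two steps on this slide (fetch rows newer than the saved updated_at, then keep the newest row per id) can be sketched in plain Python. This mirrors the logic of the query only; the function names and the row layout (dicts with `id` and `updated_at` keys) are assumptions for illustration, not the author's code.

```python
from datetime import datetime


def new_rows_since(rows, last_updated_at):
    """Incremental fetch: keep rows whose updated_at is newer than the saved value."""
    return [row for row in rows if row["updated_at"] > last_updated_at]


def latest_rows(rows):
    """Keep the newest row per id, like ROW_NUMBER() ... WHERE rn = 1 ORDER BY id."""
    newest = {}
    for row in rows:
        current = newest.get(row["id"])
        # On equal timestamps the first row seen wins (SQL leaves ties unspecified).
        if current is None or row["updated_at"] > current["updated_at"]:
            newest[row["id"]] = row
    return [newest[key] for key in sorted(newest)]
```

Because appended rows can contain several versions of the same id, the deduplication step is what turns the append-only load into a view of the current state of the table.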

Slide 11

Slide 11 text

Log data (S3, etc.) → BigQuery

Slide 12

Slide 12 text

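Log data (S3, etc.) → BigQuery
• Extract the logs we want from the logs in storage
• Convert the logs into a format supported by embulk and BigQuery
• Run incrementally based on the date information in storage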
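One way to picture the date-based incremental run is to list the hourly storage prefixes that have not been processed yet. The sketch below assumes a year/month/day/hour layout, as in the path shown on the next slide. The function name, the `base` argument, and the decision to include the current hour are assumptions for this example.

```python
from datetime import datetime, timedelta
from typing import List


def hourly_prefixes(base: str, last_done: datetime, now: datetime) -> List[str]:
    """Return one prefix per hour after last_done, up to and including now's hour."""
    current = last_done.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
    end = now.replace(minute=0, second=0, microsecond=0)
    prefixes = []
    while current <= end:
        # Assumed layout: <base>/YYYY/MM/DD/HH
        prefixes.append(f"{base}/{current:%Y/%m/%d/%H}")
        current += timedelta(hours=1)
    return prefixes
```

Including the current hour means a partially written hour may be read; excluding it would be the safer choice if files keep arriving during the hour.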

Slide 13

Slide 13 text

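Log data (S3, etc.) → BigQuery
s3://.../2020/03/10/09 contains a mix of log formats:
{method:"GET"…
[cont-init.d…
{severity: …
{method:"POST" …
cat $file | jq -e -r -R 'fromjson? | .log' | jq . -c | grep '^{"method' | sponge $file
s3://.../2020/03/10/09 after filtering:
{method:"GET"…
{method:"POST" …
• Files are processed one at a time
• Only the columns we want are extracted and specified as the BigQuery schema
  ※ Some characters, such as @, are not supported in schema names
• Sometimes the log we want is not there in the first place! (timestamp, etc.)
  ↓
  Announce the minimum required log fields in advance; when fields are added, DRE does the coding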
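For readers less familiar with jq, the filtering and column selection can be approximated in Python as follows. This is a sketch of the same idea, not a drop-in replacement for the pipeline: it skips malformed entries instead of stopping on them. The function names are hypothetical, and the column-name rule (letters, digits, and underscores, not starting with a digit) is the classic BigQuery naming rule assumed here.

```python
import json
import re


def extract_request_logs(lines):
    """Keep entries whose inner `log` field is a JSON object starting with a "method" key."""
    kept = []
    for raw in lines:
        try:
            outer = json.loads(raw)  # like `fromjson?`: ignore lines that are not JSON
        except json.JSONDecodeError:
            continue
        inner_text = outer.get("log") if isinstance(outer, dict) else None
        if not isinstance(inner_text, str):
            continue
        try:
            inner = json.loads(inner_text)
        except json.JSONDecodeError:
            continue
        # Same effect as grep '^{"method' on the compact JSON output.
        if isinstance(inner, dict) and next(iter(inner), "").startswith("method"):
            kept.append(inner)
    return kept


def to_bigquery_column(name: str) -> str:
    """Replace unsupported characters such as "@" with underscores."""
    cleaned = re.sub(r"[^0-9A-Za-z_]", "_", name)
    if not re.match(r"[A-Za-z_]", cleaned):
        cleaned = "_" + cleaned
    return cleaned


def pick_columns(entry: dict, wanted: list) -> dict:
    """Extract only the wanted keys, renamed to schema-safe column names."""
    return {to_bigquery_column(key): entry.get(key) for key in wanted}
```

Missing keys come back as `None`, which makes the case where an expected field such as a timestamp is absent visible in the output rather than failing the load.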

Slide 14

Slide 14 text

Error handling

Slide 15

Slide 15 text

Error handling
The BigQuery sync fails roughly once every few dozen runs
https://github.com/szyn/digdag-slack
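For an occasional failure like this, the usual pattern is to retry and notify only when every attempt fails. In digdag this would be written in the workflow definition; the plain-Python sketch below only illustrates the pattern. The names `run_with_retry` and `notify` are hypothetical, with `notify` standing in for a Slack notification.

```python
import time


def run_with_retry(task, retries=3, wait_seconds=0.0, notify=print):
    """Call task(); retry on failure, and notify only if every attempt fails."""
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            return task()
        except Exception as error:  # broad on purpose: any failure triggers a retry
            last_error = error
            if attempt < retries:
                time.sleep(wait_seconds)
    notify(f"task failed after {retries} attempts: {last_error}")
    raise last_error
```

Retrying first keeps a transient failure from paging anyone, so the notification channel only carries errors that need a person.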

Slide 16

Slide 16 text

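Error handling
Errors occur when the schema is changed or columns are deleted (mainly on the database side)
table_A: id, …, name, updated_at → table_A: id, …, last_name, updated_at ❌
All data has to be deleted once and reloaded
↓
(manually run, for the affected table, a workflow that refreshes everything)
I don't want to read the error message every time…
Who is going to resolve this error…?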

Slide 17

Slide 17 text

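Error handling
Errors occur when the schema is changed or columns are deleted (mainly on the database side)
table_A: id, …, name, updated_at → table_A: id, …, last_name, updated_at ❌
Diagram labels:
• Pull Request
• Monitor schema information
• Add as an issue
• Assign an owner
• Notify the owner
The owner is easy to identify!!
Debugging is easier too!!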
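The schema-monitoring step boils down to comparing the previous column list with the current one and turning any difference into an issue for an owner. The sketch below shows that comparison only. The function names, the issue wording, and the `owner` argument are assumptions, and it does not call any issue tracker.

```python
def diff_schema(previous, current):
    """Return the columns added and removed between two schema snapshots."""
    added = sorted(set(current) - set(previous))
    removed = sorted(set(previous) - set(current))
    return {"added": added, "removed": removed}


def issue_text(table, diff, owner):
    """Build an issue body for a schema change, or None when nothing changed."""
    if not diff["added"] and not diff["removed"]:
        return None
    return (
        f"Schema change detected in {table}\n"
        f"added: {', '.join(diff['added']) or '-'}\n"
        f"removed: {', '.join(diff['removed']) or '-'}\n"
        f"assignee: {owner}"
    )
```

A rename such as name → last_name shows up as one removed and one added column, which is enough for the assignee to see what changed without reading the load error.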