
The Data Analytics Platform Behind iQON / iqon-bigquery


Presented at Ebisuta vol. 2 on 2015.04.23. @kyuns


Masayuki Imamura

April 23, 2015

Transcript

  1. The Data Analytics Platform Behind iQON, and BigQuery. Ebisuta vol. 2. VASILY @kyuns

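  2. Masayuki Imamura (@kyuns), VASILY Inc., Director, CTO and Co-Founder. Joined Yahoo! JAPAN as a new graduate in […] and launched media properties such as Yahoo! FASHION and X BRAND. Went independent in […], founded VASILY, and became Director and CTO. Favorite voice actress: Nana Mizuki.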

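  3. A fashion app that makes women's fashion lives shine: find the outfits you want to wear, buy them, and save them. Currently […]0,000 registered members. Technical advisor: Yukihiro Matsumoto.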

  4. What data does iQON have?

  5. - In-app event logs
    - Push notification logs
    - Master data
    - Sales data
    All of it went into BigQuery.

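  6. Why we collected the data
    - Internal data was scattered across many storage systems (MySQL, MongoDB, various SaaS…)
    - We wanted to gather the data in one place and analyze it efficiently
    - We wanted to do heavy analysis with BI tools
    - We wanted to build dashboards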
  7. Why BigQuery?
    - Flexible import formats
    - BI tool support (Tableau)
    - Fast even with hundreds of millions of records
    - A good set of fluentd plugins
    - Supports streaming insert
  8. - In-app event logs
    - Push notification logs
    - Master data
    - Sales data

  9. In-app behavior logs
    Raw data from the app analytics tool Localytics
    JSON recording every user event, […]00 million rows per month

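  10. Collection architecture
    [Diagram labels: App, Amazon S3 (AWS S3), EC2 (import batch), Google Cloud Storage, BigQuery]
    Data is copied from the Localytics S3 bucket to BigQuery via GCS
    For speed, copying to Google Cloud Storage first is faster
    Imported periodically by a batch job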
  11. - In-app event logs
    - Push notification logs
    - Master data
    - Sales data

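  12. Push notification logs
    - Logs of the app's push notifications
    - Send and open logs for every push ([…]0,000 qps)
    - Used to verify push CTR and to optimize things such as the targeting algorithm
    - fluentd does the heavy lifting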
  13. Log format: when, to which user, what kind of push was sent, and whether it was opened
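
The push log records when a push was sent, to which user, and whether it was opened, and it is used to check CTR. A minimal Python sketch of that idea follows. The field names (`time`, `user_id`, `push_id`, `event`) and the sample records are hypothetical, since the slides do not show the actual schema.

```python
from collections import Counter

# Hypothetical records: field names are illustrative, not iQON's actual schema.
logs = [
    {"time": "2015-04-23T10:00:00", "user_id": 1, "push_id": "sale_a", "event": "send"},
    {"time": "2015-04-23T10:00:00", "user_id": 2, "push_id": "sale_a", "event": "send"},
    {"time": "2015-04-23T10:05:00", "user_id": 1, "push_id": "sale_a", "event": "click"},
    {"time": "2015-04-23T11:00:00", "user_id": 1, "push_id": "sale_b", "event": "send"},
]


def ctr_by_push(records):
    """Return {push_id: clicks / sends} for every push with at least one send."""
    sends = Counter(r["push_id"] for r in records if r["event"] == "send")
    clicks = Counter(r["push_id"] for r in records if r["event"] == "click")
    return {push_id: clicks[push_id] / n for push_id, n in sends.items()}


print(ctr_by_push(logs))  # {'sale_a': 0.5, 'sale_b': 0.0}
```

In BigQuery itself the same aggregation would more likely be written as a grouped query over the send and click tables.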

  14. Push log architecture
    [Diagram labels: App, API Servers (fluentd), APNs, GCM, Log Server (fluentd), Amazon S3 (AWS S3), BigQuery; flows: send logs, click logs, send & click logs]
    Streaming Insert into BigQuery via fluentd
    Uses fluent-plugin-bigquery
  15. <case event.notification.click>
      <store>
        type bigquery
        method insert
        auth_method private_key
        email xxxxxxxx@developer.gserviceaccount.com
        private_key_path /home/vasily/fluentd/XXXXXX.p12
        buffer_type file
        flush_interval 0
        try_flush_interval 0.05
        queued_chunk_flush_interval 0.01
        buffer_chunk_records_limit 250
        buffer_chunk_limit 512k
        buffer_queue_limit 1024
        retry_limit 5
        retry_wait 0.5
        num_threads 32
        dataset app
        project iqon-data-mining
        auto_create_table true
        table event_notification_click_%Y%m
        schema_path /home/vasily/fluentd/schema_event_notification_click.json
        buffer_path /var/log/td-agent/buffer/bigquery/event.notification.click
      </store>
    </case>
    To withstand […]0,000 qps or more, buffer_type had to be set to file.
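
The `table event_notification_click_%Y%m` setting in the configuration above reads as one table per month. The sketch below shows how such a pattern expands, assuming the placeholders follow strftime conventions. The slides do not say which timestamp the plugin uses, so the event time used here is an assumption.

```python
from datetime import datetime

TABLE_PATTERN = "event_notification_click_%Y%m"  # copied from the config above


def table_for(ts: datetime, pattern: str = TABLE_PATTERN) -> str:
    """Expand strftime placeholders to get the destination table for a timestamp."""
    return ts.strftime(pattern)


print(table_for(datetime(2015, 4, 23)))  # event_notification_click_201504
print(table_for(datetime(2015, 5, 1)))   # event_notification_click_201505
```

With `auto_create_table true`, a new month's table would be created on first insert instead of being provisioned ahead of time.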
  16. - In-app event logs
    - Push notification logs
    - Master data
    - Sales data

  17. Master data
    - iQON's master data in MySQL (RDS): Users, Items, Sets, Brands, Shop… etc.
    - User activity data: item LIKEs, outfit LIKEs, follows… etc. Some tables have around […]00 million records
    - Data stored in RDS on AWS is synced to BigQuery
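  18. Sync method
    - Bulk import with a custom script
    - Automatically generate the BigQuery schema from the MySQL table definitions
    - Dump each table whole and copy it whole
    - Even tens of millions of records do not take long (about […] minutes)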
  19. RDS to BigQuery
    [Diagram labels: AWS RDS, EC2 (import batch), Google Cloud Storage, BigQuery]
    Generate the BigQuery schema.json from the output of desc table_name
    Upload the output of select * from table_name (saved as table.tsv) to GCS, then copy it into BigQuery as tab-separated TSV, so that data containing commas is handled

    bq load --max_bad_records=1000 --project_id=iqon-data-mining --source_format=CSV --field_delimiter='\t' --skip_leading_rows=1 iqon-data-mining:app.#{@table_name} gs://iqon-rds/#{@table_name}.tsv schema_#{@table_name}.json
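
The two steps on this slide, deriving a schema from `desc table_name` and then running `bq load`, can be sketched in Python as follows. This is not the author's script. The MySQL-to-BigQuery type mapping is an assumption, with unknown types falling back to STRING. The sample column rows are hypothetical. The flag is spelled `--max_bad_records`, as in the bq documentation.

```python
import json
import re

# Assumed MySQL -> BigQuery type mapping (not from the slides); extend as needed.
MYSQL_TO_BQ = {
    "tinyint": "INTEGER", "smallint": "INTEGER", "mediumint": "INTEGER",
    "int": "INTEGER", "bigint": "INTEGER",
    "float": "FLOAT", "double": "FLOAT", "decimal": "FLOAT",
    "datetime": "TIMESTAMP", "timestamp": "TIMESTAMP",
}


def bq_type(mysql_type):
    """Map a MySQL column type such as 'int(11) unsigned' to a BigQuery type."""
    m = re.match(r"[a-z]+", mysql_type.lower())
    base = m.group(0) if m else ""
    return MYSQL_TO_BQ.get(base, "STRING")  # unknown types fall back to STRING


def schema_from_desc(rows):
    """rows: (Field, Type) pairs as returned by `desc table_name`."""
    return [{"name": field, "type": bq_type(col_type)} for field, col_type in rows]


def bq_load_args(table_name, project="iqon-data-mining", dataset="app", bucket="iqon-rds"):
    """Build the bq load argument list; run it with subprocess.run(args, check=True)."""
    return [
        "bq", "load",
        "--max_bad_records=1000",
        f"--project_id={project}",
        "--source_format=CSV",
        r"--field_delimiter=\t",  # literal backslash-t, as in the shell command
        "--skip_leading_rows=1",
        f"{project}:{dataset}.{table_name}",
        f"gs://{bucket}/{table_name}.tsv",
        f"schema_{table_name}.json",
    ]


# Hypothetical columns for illustration only.
rows = [("id", "int(11) unsigned"), ("name", "varchar(255)"), ("created_at", "datetime")]
print(json.dumps(schema_from_desc(rows), indent=2))
print(bq_load_args("users"))
```

Passing the arguments as a list avoids shell quoting problems around the tab delimiter.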
  20. - In-app event logs
    - Push notification logs
    - Master data
    - Sales data

  21. Sales data
    - Item purchase-click and purchase-completion data
    - Stored in MongoDB
    - Very large collections (tens of GB)
    - Bulk-imported into BigQuery with mongobq
      @hakobera https://github.com/hakobera/mongobq
  22. MongoDB to BigQuery
    [Diagram labels: EC2 (import batch), Google Cloud Storage, BigQuery]
    The mongobq command processes the data as a stream
    It can detect the schema automatically, but you can also specify your own schema
    Note that the keyfile must be in JSON format, not p12

    mongobq --host mongo02c --port 27017 --database iqon_conversion --collection click_log -q '{"date": "#{date}"}' --project iqon-data-mining --dataset app --keyfile /home/vasily/XXXX.json -B iqon-mongo -T click_log_test --schema ./schema_mongo_iqon_conversion_click_log.json --autoclean
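
The `-q` option in the command above takes a JSON filter with the date interpolated into it. One way to build that filter without hand-written quoting is sketched below. It assumes the `date` field holds an ISO `YYYY-MM-DD` string, which the slides do not confirm.

```python
import json
from datetime import date


def daily_query(day: date) -> str:
    """Build the JSON filter for one day's documents (assumes ISO date strings)."""
    return json.dumps({"date": day.isoformat()})


print(daily_query(date(2015, 4, 23)))  # {"date": "2015-04-23"}
```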
  23. Access data (bonus)
    - With Google Analytics Premium, Google Analytics raw data is imported into BigQuery automatically
    - Rumored to cost around […]0,000 yen per month

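  24. Summary
    - Collecting all data and logs in one place, BigQuery, made analysis dramatically more efficient; master data and log data can be JOINed for analysis
    - BigQuery is cheap and good: about […] yen per month even with […] TB of storage, and scan fees of about […] yen per month per […] TB (at present)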
  25. We're Hiring. iQON is where more fashion data is collected than anywhere else in Japan. We are looking for people who want to take on challenges with data. info@vasily.jp