
PyCon JP 2014: Poor Man's Log Aggregation with Python + Hive on AWS EMR

Akira Chiku
September 14, 2014


Transcript


  1. Poor Man's Log Aggregation with Python + AWS EMR
    PyCon JP 2014
    Akira Chiku

  2. achiku

    Name: Akira Chiku
    Twitter: @_achiku
    GitHub: @achiku

    Search for "Akira Chiku Fire"

    Engineer @ Kanmu

  3. Goal

    - Share a case study focused on "why this architecture?"
    - Share ways of using EMR from Python that go "beyond the basics"
    - I'd like to hear your stories too (for my own sake)

  4. Kanmu Business

    - Payment data analysis in partnership with card companies
    - Delivering coupons linked to cards
    - Card Linked Offer (CLO)

  5. $BSE-JOLFE0FS


    View full-size slide

  6. $BSE-JOLFE0FS


    չ"䏄ךؙ٦هٝؒٝزٔ٦׃
    ׋ؕ٦سד顠ְ暟ׅ׸לه
    ؎ٝز؜حزկպ
    չְֲֲֶֿ㹏ׁ׿ח
    ְֲֲֿؙ٦هٝ⳿׃׋ְպ

    չְֲֲֿ飑顠⫘ぢךֶ㹏ׁ׿ך倯
    ְְָךדכպ

    չֿ׿ז穠卓׌׏׋ךדծ如㔐כֿ
    ְֲֲإًؚٝزⴖ׶ת׃׳ֲպ
    ؕ٦س
    ⠓爡
    ,BONV
    ؕ٦س
    ⠓㆞
    ֶ䏄

    View full-size slide

  7. Quick Survey

    - Who here works on log analysis?
    - Who uses Hadoop?
    - Who uses Hive?
    - Who uses EMR?


  8. Sharing a case study focused on "why this architecture?"


  9. Our Company's Situation


  10. The Title of This Talk


  11. Poor Man's Log Aggregation


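  12. Poorman's
    - Start from what you already have
    - An approach that reaches the goal with few resources (people, time, money)
    - An approach that reaches the goal as quickly as possible
    - An approach that avoids wasteful spending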

  13. Kanmu Engineer Team

    maki: CEO / Engineer
    _ideyuta: Designer
    moqada: Engineer
    _achiku: Engineer

    Areas covered across the team: CEO duties, development, design, front end,
    smartphone apps, back end, infrastructure, log analysis, analytics platform,
    partner integration


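  14. Requirements
    - Aggregate a fair amount of data without pain
      • Average … GB/day, max … GB/day (uncompressed)
      • … GB/month (uncompressed)
      • Expected to grow along with the service
      • Want to run ad-hoc queries over a month's worth of data
    - Build up in-house know-how for processing large-scale data
      • Build that know-how "at a sustainable pace"
      • Some data is sensitive and hard to hand to outside parties
    - Keep operating costs and up-front investment low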


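  15. Not Requirements
    - Real-time analysis is not really needed at this point
    - The ad-hoc analytics platform does not need a high service level
      • Scheduled batch jobs must not fail, but
      • the platform does not have to be available at all times
      • Usage is limited to people inside the company
    - That said, any of the above could well become a requirement later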


  16. Amazon Elastic MapReduce


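  17. AWS EMR
    - One of the many AWS services
    - Hadoop and software from the Hadoop ecosystem are available out of the box
    - Launch clusters, run jobs, and shut them down through the API
    - Takes care of monitoring and similar chores for you
    - S3 can be used in place of HDFS
    - Easy to change the number of nodes in a cluster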


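  18. Architecture
    Diagram: management server; coupon delivery servers
    • Fluentd on each coupon delivery server collects the logs
    • Fluentd's S3 plugin stores the collected logs on S3
    • Hive on EMR parses and aggregates the logs
    • The aggregates are stored in RDS and visualized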


  19. Data Analysis Flow by tagomoris

    Collect → Parse → Cleanup → Store → Process → Visualize
    Source: http://www.slideshare.net/tagomoris/handling-not-so-big-data


  20. Poorman's Data Analysis Flow
    Collect → Parse → Cleanup → Store → Process → Visualize


  21. Collect
    Collect → Parse → Cleanup → Store → Process → Visualize


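  22. Collect
    - Ship logs from the coupon delivery servers with Fluentd and its S3 plugin
    - The shipped logs carry the user's action in the query string
      • No complex JSON is sent; the information goes into the query string
      • Everything is converted to JSON at aggregation time in Hive
      • https://example.com/beacon?sub=…&obj=coupon&action=click&cid=…
    - No Fluentd aggregator server
      • Real-time aggregation is not needed at this point
      • We would also have to design for redundancy, which adds complexity
      • We'd rather rely on S3's reliability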
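The slide leaves the query-string-to-JSON conversion to a Hive UDF. To illustrate what that conversion amounts to, here is a minimal Python 3 sketch using only the standard library. The function name and the sample values are made up for this example; it is not the UDF used in the talk.

```python
import json
from urllib.parse import parse_qsl, urlsplit


def query_string_to_json(url: str) -> str:
    """Return the query-string parameters of a beacon URL as a flat JSON object."""
    query = urlsplit(url).query
    # keep_blank_values=True keeps "cid=" as "" instead of silently dropping it;
    # if a key repeats, the last value wins when the pairs become a dict
    params = dict(parse_qsl(query, keep_blank_values=True))
    # sort_keys gives stable output, which keeps diffs and tests simple
    return json.dumps(params, sort_keys=True)


if __name__ == "__main__":
    beacon = "https://example.com/beacon?sub=1&obj=coupon&action=click&cid=42"
    print(query_string_to_json(beacon))
```

Run on the sample beacon, it prints `{"action": "click", "cid": "42", "obj": "coupon", "sub": "1"}`. Keeping beacons flat (no nested JSON) is what makes such a simple conversion enough.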


  23. Store
    Collect → Parse → Cleanup → Store → Process → Visualize


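  24. Store
    - For now, just ship everything to S3
    - Use separate S3 buckets for production and staging
      • Access can be controlled per bucket
      • example.com-production-log
    - Split keys by server type
      • Safe even when new kinds of servers are added
      • example.com-production-log/api
    - Split keys by day
      • So that Hive partitions can be used
      • example.com-production-log/api/dt=…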
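One way to keep this key layout consistent is to build the prefixes in a single place. The sketch below assumes a hypothetical per-environment bucket such as `example.com-production-log`, one prefix per server type, and a Hive-style `dt=YYYYMMDD` directory per day. It also renders the HiveQL that registers one day as a partition of an external table. The table and bucket names are placeholders.

```python
from datetime import date


def log_prefix(bucket: str, server_type: str, day: date) -> str:
    """S3 prefix for one server type and one day: s3://<bucket>/<server_type>/dt=YYYYMMDD/."""
    # dt=YYYYMMDD follows Hive's key=value naming for partition directories
    return "s3://{}/{}/dt={}/".format(bucket, server_type, day.strftime("%Y%m%d"))


def add_partition_ddl(table: str, bucket: str, server_type: str, day: date) -> str:
    """HiveQL that registers one day's prefix as a partition of an external table."""
    dt = day.strftime("%Y%m%d")
    location = log_prefix(bucket, server_type, day)
    return ("ALTER TABLE {} ADD IF NOT EXISTS PARTITION (dt='{}') "
            "LOCATION '{}';").format(table, dt, location)
```

Adding a new server type then only means a new prefix under the same bucket, which is the "safe even when new kinds of servers are added" point above.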


  25. Process
    Collect → Parse → Cleanup → Store → Process → Visualize


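  26. Process
    - Nightly batch
      • Launch an EMR cluster with Hadoop/Hive from the management server
      • Because there is no Fluentd aggregator server, the log files arrive in
        many small fragments, so compress and merge them (Hadoop is bad at
        processing lots of small files)
      • Convert the query string recorded in each log line to JSON with a UDF
      • Aggregate along the dimensions we want to look at, and save the results
      • Run every step above directly against S3, without staging data on HDFS
      • Load the final aggregates into RDS
    - Quick ad-hoc queries
      • Launch an EMR cluster with Hadoop/Hive/Presto from the management server
      • Presto reads Hive's metastore (table definitions)
      • All data lives on S3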
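The "aggregate along the dimensions we want to look at" step boils down to a GROUP BY. Here is a small in-memory Python sketch of the same idea, run over events that have already been turned from query strings into dicts. The field names are the hypothetical ones from the beacon example. On EMR this work is done in HiveQL, not Python.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple


def count_by(events: Iterable[Dict[str, str]], dims: Tuple[str, ...]) -> "Counter[Tuple[Optional[str], ...]]":
    """Count events per combination of the given dimension fields.

    Roughly what `SELECT d1, d2, count(*) FROM t GROUP BY d1, d2` does in Hive.
    An event that lacks a field is counted under None for that dimension.
    """
    return Counter(tuple(event.get(d) for d in dims) for event in events)
```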


  27. Visualize
    Collect → Parse → Cleanup → Store → Process → Visualize


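  28. Visualize
    - Load the data aggregated on EMR into MySQL
    - Visualize the values with a service running on the management server
      • Everyone on the team looks at the same numbers
    - Trial deployment of Elasticsearch + Kibana on a spare in-house server
      • Handy when you want to explore the data while working out which
        dimensions to analyze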
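Loading the aggregates is a plain bulk insert. So that the sketch runs without a database server, it uses SQLite as a stand-in for MySQL/RDS, with a hypothetical `daily_counts` table. The primary key on `(dt, action)` together with `INSERT OR REPLACE` means that re-running a day's batch overwrites that day instead of duplicating it.

```python
import sqlite3
from typing import Iterable, Tuple


def load_daily_counts(conn: sqlite3.Connection, rows: Iterable[Tuple[str, str, int]]) -> None:
    """Upsert (dt, action, cnt) rows into a hypothetical daily_counts table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS daily_counts ("
        " dt TEXT NOT NULL,"
        " action TEXT NOT NULL,"
        " cnt INTEGER NOT NULL,"
        " PRIMARY KEY (dt, action))"
    )
    # INSERT OR REPLACE is SQLite syntax; it keeps re-runs idempotent per (dt, action)
    conn.executemany(
        "INSERT OR REPLACE INTO daily_counts (dt, action, cnt) VALUES (?, ?, ?)", rows
    )
    conn.commit()
```

On MySQL the equivalent would be `REPLACE INTO` or `INSERT ... ON DUPLICATE KEY UPDATE`.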


  29. Poorman's Data Analysis Flow
    Collect → Parse → Cleanup → Store → Process → Visualize


  30. But when the need arises, we can evolve it


  31. Build it so we never paint ourselves into a corner


  32. Poorman's Data Analysis Flow
    Collect → Parse → Cleanup → Store → Process → Visualize


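  33. References
    - AWS: Amazon EMR Best Practices
      • Read this and you will know which EMR setup fits your own context.
        It also works well as an introduction to Hadoop.
    - An introduction to mixi's analytics platform and the use of a JSON parser in Apache Hive
      • Gave us the idea of storing data as JSON and handling it like tables
        through Views, and also the concept of the communication cost among
        the people involved in log aggregation.
    - Batch Processing and Stream Processing by SQL
      • Hearing this talk made us decide to use an MPP-style engine for our
        analytics platform. We compared Impala and Presto and adopted Presto,
        which can query S3 directly. Impala reportedly will also be able to
        query S3 directly in its next version, so we plan to re-evaluate then.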


  34. Sharing "beyond the basics" ways to use EMR from Python
    Log aggregation


  35. Roadmap: use these tools to do the following
    awscli
      • Create Cluster
      • Execute HiveQL
      • Execute s3distcp
      • Config Your EMR
      • Bootstrap Presto
      • Metastore Config
    Python Script
      • Create Cluster
      • Execute HiveQL
      • JobFlow Mgmt


  36. Roadmap (same diagram as slide 35; next: awscli)


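  37. awscli
    - Starting with release …, the EMR features are out of Preview and can
      finally be used as a stable API
    - We switched from the Ruby Elastic MapReduce script, which had been the
      de facto standard until now
      • Easy to install with pip
      • We already use awscli, so this unifies our tooling
      • Development on GitHub is active, and we can send PRs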


  38. We Love Python


  39. $ mkvirtualenv pycon-emr-dev
    (pycon-emr-dev)$ pip install awscli
    (pycon-emr-dev)$ mkdir ~/.awscli
    (pycon-emr-dev)$ cat <<-EOF >> ~/.awscli/config
    [profile development]
    aws_access_key_id=
    aws_secret_access_key=
    region=ap-northeast-1
    EOF
    (pycon-emr-dev)$ cat <<-EOF >> $VIRTUAL_ENV/bin/activate
    export AWS_CONFIG_FILE=~/.awscli/config
    export AWS_DEFAULT_PROFILE=development
    source aws_zsh_completer.sh
    EOF


  40. Roadmap (same diagram as slide 35; next: Create Cluster with awscli)


  41. $ aws emr create-cluster --ami-version 3.1.1 \
           --name 'PyConJP 2014 (AMI 3.1.1 Hive)' \
           --tags Name=pycon-jp-emr environment=development \
           --ec2-attributes KeyName=yourkey \
           --log-uri 's3://yourbucket/jobflow_logs/' \
           --no-auto-terminate \
           --visible-to-all-users \
           --instance-groups file://./normal-instance-setup.json \
           --applications file://./app-hive.json


  42. normal-instance-setup.json
    [
      {
        "Name": "emr-master",
        "InstanceGroupType": "MASTER",
        "InstanceCount": 1,
        "InstanceType": "m1.medium"
      },
      {
        "Name": "emr-core",
        "InstanceGroupType": "CORE",
        "InstanceCount": 2,
        "InstanceType": "m1.medium"
      }
    ]

    app-hive.json
    [
      {
        "Name": "HIVE"
      }
    ]
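If the cluster size changes often, generating these two files can be less error-prone than editing them by hand. This is a minimal sketch that assumes the same group names and instance type as above. `core_count` is the only knob it exposes, and the helper names are made up for this example.

```python
import json
from typing import Any, Dict, List


def instance_groups(instance_type: str = "m1.medium", core_count: int = 2) -> List[Dict[str, Any]]:
    """MASTER + CORE instance groups in the shape of normal-instance-setup.json."""
    return [
        {"Name": "emr-master", "InstanceGroupType": "MASTER",
         "InstanceCount": 1, "InstanceType": instance_type},
        {"Name": "emr-core", "InstanceGroupType": "CORE",
         "InstanceCount": core_count, "InstanceType": instance_type},
    ]


def write_json(path: str, payload: Any) -> None:
    """Write payload as pretty-printed JSON so awscli can read it via file://path."""
    with open(path, "w") as f:
        json.dump(payload, f, indent=2)
```

For example, `write_json("normal-instance-setup.json", instance_groups(core_count=4))` followed by `write_json("app-hive.json", [{"Name": "HIVE"}])` produces files that `--instance-groups file://./normal-instance-setup.json` and `--applications file://./app-hive.json` can read.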


  43. result
    {
      "ClusterId": "j-8xxxxxxxxx"
    }


  44. Roadmap (same diagram as slide 35; next: Execute HiveQL with awscli)


  45. $ aws emr add-steps --cluster-id j-8xxxxxxxxx \
           --steps file://./hive-sample-step-1.json


  46. hive-sample-step-1.json
    [
      {
        "Args": [
          "-f", "s3n://yourbucket/hive-script/sample01.hql",
          "-d", "BUCKET_NAME=yourbucket",
          "-d", "TARGET_DATE=20140818"
        ],
        "ActionOnFailure": "CONTINUE",
        "Name": "Hive Sample Program 01",
        "Type": "HIVE"
      },
      {
        "Args": [
          "-f", "s3n://yourbucket/hive-script/sample02.hql",
          "-d", "BUCKET_NAME=yourbucket",
          "-d", "TARGET_DATE=20140818"
        ],
        "ActionOnFailure": "CONTINUE",
        "Name": "Hive Sample Program 02",
        "Type": "HIVE"
      }
    ]
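The step file above hard-codes a single `TARGET_DATE`. To re-run several days (a backfill, say), a small generator can produce the same structure for a date range. This sketch assumes the scripts live under `s3n://<bucket>/hive-script/`, as in the example. The step naming scheme and function names are made up.

```python
from datetime import date, timedelta
from typing import Any, Dict, List, Sequence


def hive_step(bucket: str, script: str, target_date: date, name: str) -> Dict[str, Any]:
    """One HIVE step in the same shape as hive-sample-step-1.json."""
    return {
        "Args": [
            "-f", "s3n://{}/hive-script/{}".format(bucket, script),
            "-d", "BUCKET_NAME={}".format(bucket),
            "-d", "TARGET_DATE={}".format(target_date.strftime("%Y%m%d")),
        ],
        "ActionOnFailure": "CONTINUE",
        "Name": name,
        "Type": "HIVE",
    }


def steps_for_range(bucket: str, scripts: Sequence[str], start: date, days: int) -> List[Dict[str, Any]]:
    """Steps for `days` consecutive dates from `start`; per date, scripts keep their order."""
    steps = []
    for offset in range(days):
        day = start + timedelta(days=offset)
        for script in scripts:
            name = "{} [{}]".format(script, day.strftime("%Y%m%d"))
            steps.append(hive_step(bucket, script, day, name))
    return steps
```

Dump the result with `json.dump(steps, f, indent=2)` and pass the file to `aws emr add-steps --steps file://...`. EMR runs the steps one after another in the given order, which is why each date's scripts are kept together.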


  47. Roadmap (same diagram as slide 35; next: Execute s3distcp with awscli)


  48. $ aws emr add-steps --cluster-id j-8xxxxxxxxx \
           --steps file://./s3distcp-sample-step.json


  49. s3distcp-sample-step.json
    [
      {
        "Name": "s3distcp Sample",
        "ActionOnFailure": "CONTINUE",
        "Jar": "/home/hadoop/lib/emr-s3distcp-1.0.jar",
        "Type": "CUSTOM_JAR",
        "Args": [
          "--src", "s3n://yourbucket/access_log/dt=20140818",
          "--dest", "s3n://yourbucket/compressed_log/dt=20140818",
          "--groupBy", ".*(nginx_access_log-).*",
          "--targetSize", "100",
          "--outputCodec", "gzip"
        ]
      }
    ]
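To get a feel for what `--groupBy` and `--targetSize` do, here is a rough local simulation. Files whose path matches the regex are grouped by the text of the capture groups, and each group is cut into batches of at most `--targetSize` MiB. This is an approximation written for illustration, not S3DistCp's actual algorithm. In particular, this sketch simply skips files that do not match.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

MIB = 1024 * 1024


def plan_merge(files: List[Tuple[str, int]], group_by: str,
               target_size_mib: int) -> Dict[str, List[List[str]]]:
    """Plan how small files could be merged, in the spirit of --groupBy/--targetSize.

    files: (path, size_in_bytes) pairs.
    group_by: regex; matching paths are grouped by the concatenated capture groups.
    target_size_mib: batches stay at or below this size, except that a single
        file larger than the target still forms a batch of its own.
    """
    pattern = re.compile(group_by)
    groups: Dict[str, List[Tuple[str, int]]] = defaultdict(list)
    for path, size in sorted(files):
        match = pattern.match(path)
        if match:
            key = "".join(g or "" for g in match.groups())
            groups[key].append((path, size))

    limit = target_size_mib * MIB
    plan: Dict[str, List[List[str]]] = {}
    for key, members in groups.items():
        batches: List[List[str]] = []
        current: List[str] = []
        current_size = 0
        for path, size in members:
            # start a new batch when adding this file would exceed the target
            if current and current_size + size > limit:
                batches.append(current)
                current, current_size = [], 0
            current.append(path)
            current_size += size
        if current:
            batches.append(current)
        plan[key] = batches
    return plan
```

With the slide's pattern, all `nginx_access_log-*` fragments land in one group and come out as a few files of roughly 100 MiB each. Fewer, larger files are what Hadoop handles well.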


  50. Roadmap (same diagram as slide 35; next: Config Your EMR with awscli)


  51. $ aws emr create-cluster --ami-version 3.1.1 \
           --name 'PyConJP 2014 (AMI 3.1.1 Hive)' \
           --tags Name=pycon-jp-emr environment=development \
           --ec2-attributes KeyName=yourkey \
           --log-uri 's3://yourbucket/jobflow_logs/' \
           --no-auto-terminate \
           --visible-to-all-users \
           --instance-groups file://./normal-instance-setup.json \
           --applications file://./app-hive-with-config.json


  52. app-hive-with-config.json
    [
      {
        "Args": [
          "--hive-site=s3://yourbucket/libs/config/hive-site.xml"
        ],
        "Name": "HIVE"
      }
    ]


  53. hive-site.xml
    <configuration>
      <property>
        <name>hive.optimize.s3.query</name>
        <value>true</value>
        <description>Optimize query on S3</description>
      </property>
    </configuration>


  54. Roadmap (same diagram as slide 35; next: Bootstrap Presto with awscli)


  55. $ aws emr create-cluster --ami-version 3.1.1 \
           --name 'PyConJP 2014 (AMI 3.1.1 Hive + Presto)' \
           --tags Name=pycon-jp-emr environment=development \
           --ec2-attributes KeyName=yourkey \
           --log-uri 's3://yourbucket/jobflow_logs/' \
           --no-auto-terminate \
           --visible-to-all-users \
           --instance-groups file://./normal-instance-setup.json \
           --bootstrap-actions file://./bootstrap-presto.json \
           --applications file://./app-hive-with-config.json


  56. bootstrap-presto.json
    [
      {
        "Name": "Install/Setup Presto",
        "Path": "s3://yourbucket/libs/setup-presto.rb",
        "Args": [
          "--task_memory", "1GB",
          "--log-level", "DEBUG",
          "--version", "0.75",
          "--presto-repo-url", "http://central.maven.org/maven2/com/facebook/presto/",
          "--sink-buffer-size", "1GB",
          "--query-max-age", "1h",
          "--jvm-config",
          "-server -Xmx2G -XX:+UseConcMarkSweepGC -XX:+ExplicitGCInvokesConcurrent -XX:+CMSClassUnloadingEnabled -XX:+AggressiveOpts -XX:+HeapDumpOnOutOfMemoryError -XX:OnOutOfMemoryError=kill -9 %p -XX:PermSize=150M -XX:MaxPermSize=150M -XX:ReservedCodeCacheSize=150M -Dhive.config.resources=/home/hadoop/conf/core-site.xml,/home/hadoop/conf/hdfs-site.xml"
        ]
      }
    ]


  57. - setup-presto.rb is actually
      https://github.com/awslabs/emr-bootstrap-actions/blob/master/presto/install
    - A bootstrap script, published by AWS on an experimental basis, for
      installing Presto on EMR
    - It worked on AMI … or …, but not on AMI … (Hive … vs. Hive …)
    - The Thrift service port seems to be different


  58. Roadmap (same diagram as slide 35; next: Metastore Config with awscli)


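  59. - The metastore is where Hive keeps information such as table definitions
    - Today MySQL is the most common choice
    - If you configure nothing, it is stored in MySQL on the EMR instance itself
    - Pointing the metastore at a database outside EMR means you no longer have
      to re-run your DDL every time you start a cluster
    - The database side's security group must be updated to allow the connection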


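  60. hive-site.xml
    <configuration>
      <property>
        <name>hive.optimize.s3.query</name>
        <value>true</value>
        <description>Optimize query on S3</description>
      </property>
      <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://hostname:3306/hive?createDatabaseIfNotExist=true</value>
        <description>JDBC connect string for a JDBC metastore</description>
      </property>
      <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
        <description>Driver class name for a JDBC metastore</description>
      </property>
      <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>username</value>
        <description>Username to use against metastore database</description>
      </property>
      <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>password</value>
        <description>Password to use against metastore database</description>
      </property>
    </configuration>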
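Because `hive-site.xml` is XML, a value such as a JDBC URL containing `&` must be escaped. Generating the file instead of hand-writing it avoids that class of mistake. The sketch below renders name/value pairs in the `<configuration><property>` layout shown above. The values passed in are placeholders, and in practice the password should not live in source code.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def hive_site_xml(props: Dict[str, str]) -> str:
    """Render name/value pairs as <configuration><property>... in insertion order."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        # ElementTree escapes &, < and > in text nodes automatically
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")
```

Upload the resulting file to the S3 path that `--hive-site=` in app-hive-with-config.json points at.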


  61. Roadmap (same diagram as slide 35; next: Python Script)


  62. - Sometimes you want to launch EMR from inside a Python batch job
    - Or launch it as a Celery task
    - In those cases you can drive EMR from within Python as well
    - Use boto.emr
    - Borrowing handy utilities from inside awscli might also be an option


  63. Roadmap (same diagram as slide 35; next: Create Cluster with a Python script)


  64. # -*- coding: utf-8 -*-
    from __future__ import print_function

    from datetime import datetime

    from boto.emr import connect_to_region
    from boto.emr.step import InstallHiveStep


    def setup_emr():
        # boto picks up credentials from the AWS_ACCESS_KEY_ID /
        # AWS_SECRET_ACCESS_KEY environment variables, its config files,
        # or the IAM role of the EC2 instance this script runs on.
        conn = connect_to_region('ap-northeast-1')
        install_step = InstallHiveStep(hive_versions='0.11.0.2')

        jobid = conn.run_jobflow(
            name='Create EMR [{}]'.format(datetime.today().strftime('%Y%m%d')),
            log_uri='s3://yourbucket/jobflow_logs/',
            ec2_keyname='your_key',
            master_instance_type='m1.medium',
            slave_instance_type='m1.medium', num_instances=3,
            action_on_failure='TERMINATE_JOB_FLOW', keep_alive=True,
            enable_debugging=False,
            hadoop_version='2.4.0',
            steps=[install_step],
            bootstrap_actions=[],
            instance_groups=None,
            additional_info=None,
            ami_version='3.1.1',
            api_params=None,
            visible_to_all_users=True,
            job_flow_role=None)

        return jobid


    if __name__ == '__main__':
        jobflow_id = setup_emr()
        print("JobFlowID: {} started.".format(jobflow_id))
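`run_jobflow` returns as soon as the request is accepted, so a script that wants to add steps or report success usually polls the cluster state first. The helper below is a generic polling loop, and `get_state` is any callable that returns the current state string. With boto 2 that could be something like `lambda: conn.describe_jobflow(jobid).state`, but treat that wiring as an assumption to check against the boto version in use. The state names are the job flow states of the EMR API of that era.

```python
import time
from typing import Callable, Iterable

READY_STATES = ("WAITING",)
STOPPED_STATES = ("SHUTTING_DOWN", "TERMINATED", "COMPLETED", "FAILED")


def wait_for_state(get_state: Callable[[], str],
                   ready: Iterable[str] = READY_STATES,
                   stopped: Iterable[str] = STOPPED_STATES,
                   poll_seconds: float = 30.0,
                   timeout_seconds: float = 1800.0,
                   sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll get_state() until it returns a ready state and return that state.

    Raises RuntimeError if the cluster reaches a stopped state, and
    TimeoutError once timeout_seconds of polling have passed.
    """
    ready_set, stopped_set = set(ready), set(stopped)
    waited = 0.0
    while True:
        state = get_state()
        if state in ready_set:
            return state
        if state in stopped_set:
            raise RuntimeError("cluster stopped in state {}".format(state))
        if waited >= timeout_seconds:
            raise TimeoutError("still {} after {:.0f}s".format(state, waited))
        sleep(poll_seconds)
        waited += poll_seconds
```

Injecting `sleep` keeps the loop testable without real waiting. The explicit timeout stops a stuck cluster from blocking a nightly batch forever.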


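  65. - Never put AWS credentials in your source code
      • Better not to put them in environment variables either
      • That may be unavoidable if you want to test on a local machine
      • Control access with the IAM role attached to the EC2 instance that
        launches EMR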


  66. Roadmap (same diagram as slide 35; next: Execute HiveQL from Python)


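  67. # Hypothetical wrapper: the slide shows only the body, so the imports,
    # the function signature, and the conn/install_step setup are filled in here.
    from boto.emr import connect_to_region
    from boto.emr.step import HiveStep, InstallHiveStep


    def run_hive_queries(bucket_name, target_date, hive_version='0.11.0.2'):
        conn = connect_to_region('ap-northeast-1')
        install_step = InstallHiveStep(hive_versions=hive_version)

        jobid = conn.run_jobflow(
            name='Create EMR and Exec hiveql [{}]'.format(target_date),
            log_uri='s3://{}/jobflow_logs/'.format(bucket_name),
            ec2_keyname='your_key',
            master_instance_type='m1.medium',
            slave_instance_type='m1.medium', num_instances=3,
            action_on_failure='TERMINATE_JOB_FLOW', keep_alive=True,
            enable_debugging=False,
            hadoop_version='2.4.0',
            steps=[install_step],
            bootstrap_actions=[],
            instance_groups=None,
            additional_info=None,
            ami_version='3.1.1',
            api_params=None,
            visible_to_all_users=True,
            job_flow_role=None)

        # One HiveStep per query file; TARGET_DATE and BUCKET_NAME become
        # ${TARGET_DATE} / ${BUCKET_NAME} variables inside the .hql scripts.
        query_files = ['sample01.hql', 'sample02.hql']
        hql_steps = []
        for query_file in query_files:
            hql_step = HiveStep(
                name='Executing Query [{}]'.format(query_file),
                hive_file='s3n://{0}/hive-script/{1}'.format(
                    bucket_name, query_file),
                hive_versions=hive_version,
                hive_args=['-d', 'TARGET_DATE={0}'.format(target_date),
                           '-d', 'BUCKET_NAME={0}'.format(bucket_name)])
            hql_steps.append(hql_step)

        conn.add_jobflow_steps(jobid, hql_steps)
        return jobid
    (It got long, so this is only an excerpt.)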


  68. Roadmap (same diagram as slide 35; next: JobFlow Mgmt)


  69. - I want dependencies between batch jobs
      • When A finishes, run B and C at the same time, etc.
      • When A and B have finished, run C, etc.
    - I want more flexible control over when jobs start
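Luigi (next slide) is what the author looked at for dependencies like these. To show only the underlying idea, and not Luigi's API, here is a sketch built on `graphlib.TopologicalSorter` from the Python 3.9+ standard library. Jobs are released in waves, and every job in a wave depends only on earlier waves, so the jobs within one wave could run at the same time. The job names are the A/B/C of the slide.

```python
from graphlib import CycleError, TopologicalSorter  # Python 3.9+
from typing import Dict, Iterable, List


def run_in_waves(deps: Dict[str, Iterable[str]]) -> List[List[str]]:
    """Group jobs into waves; deps maps each job to the jobs it must wait for.

    Jobs inside one wave have no dependencies on each other. A dependency
    cycle raises graphlib.CycleError (re-exported here as CycleError).
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises CycleError on a cycle
    waves: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for deterministic output
        waves.append(ready)
        sorter.done(*ready)
    return waves
```

`run_in_waves({"B": ["A"], "C": ["A"]})` gives `[["A"], ["B", "C"]]` (B and C after A), and `run_in_waves({"C": ["A", "B"]})` gives `[["A", "B"], ["C"]]`. These are the two cases on the slide.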


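  70. • https://github.com/spotify/luigi
    • A pipeline-management framework written in Python
    • Makes it easy to write MapReduce jobs that use Hadoop Streaming
    • Dependencies are resolved in plain Python code
    • Dependency visualization (run as a separate service)

    • The visualization tool lacks finer features such as authentication
    • Supports running HiveQL
    • Supports running Pig
    • Supports S3 operations
    • Overkill for us at the moment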


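  71. • The admin UI is built with Django
    • celery and celerybeat run on the same server
    • django-celery is used to schedule specific tasks to be queued at
      specific times
    • celerybeat puts those tasks on the queue at the scheduled time, and the
      celery worker picks them up and runs them
    • Celery and Django can work together without django-celery, but we still
      use it because this scheduling feature is handy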


  72. References
    - https://github.com/aws/aws-cli
      • Official documentation and source
    - https://github.com/boto/boto
      • Official documentation and source


  73. Kanmu
    We're hiring!


  74. Even just a casual chat to start with is welcome


  75. https://www.wantedly.com/projects/…


  76. Bonus: behind the scenes of preparing this PyCon talk


  77. - At one point, the talk existed only as about … lines of Markdown
    - I tried to go slideless
    - Held a review session at work


  78. 人人人人人人人人人人人人人人人人
    > Thank you very much! <
    Y^Y^Y^Y^Y^Y^Y^Y^Y^Y^Y^Y^Y^Y^Y


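  79. - This was my first technical talk
    - A good chance to put together what I've been doing at work
    - I'd like to know what others think about while they work
    - I'd like to know why other companies chose the architectures they did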


  80. Goal
    - Share a case study focused on "why this architecture?"
    - Share ways of using EMR from Python that go "beyond the basics"
    - I'd like to hear your stories too (for my own sake)


  81. Thank you for giving me the opportunity to present


  82. 人人人人人人人人人人人人人人人人
    > Thank you very much! <
    Y^Y^Y^Y^Y^Y^Y^Y^Y^Y^Y^Y^Y^Y^Y