Hands-on! Building a Crawler with Go/GAE + DDD

Slides from my talk at Go Conference 2017 Spring.

Seiji Takahashi

March 25, 2017

Transcript

  1. Hands-on! Building a Crawler with Go/GAE + DDD
    @__timakin__
    GoConference 2017 Spring

  2. Self-introduction

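  3. Self-introduction
    • twitter: @__timakin__
    • github: timakin
    • Gunosy Inc., New Business Development Office ← New!
    • Main Go libraries I have developed
      • gopli (DB replication tool)
      • gonvert (character-encoding conversion library)
      • octop (CLI tool for viewing GitHub issues and PRs)
      • ts (CLI tool for browsing tech and business news)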

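  4. Copyright© Gunosy Inc. All Rights Reserved
    We're hiring Go / Python engineers
    ▶https://gunosy.co.jp/recruit/
    Gunosy began with three University of Tokyo students who wanted to "deliver information optimally to people all over the world."
    Since then we have listed on the TSE Mothers market and moved our office to Roppongi Hills, and we are looking for people who want to make an impact at a fast-growing company.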

  5. Agenda

  6. Agenda
    • A crawler built with Go/GAE + DDD
    • API overview
    • Directory structure
    • Common processing and the details of each domain
    • Where I got stuck with Go/GAE
    • Summary

  7. The development process for this project

  8. API overview

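  9. API overview
    • Main technologies
      • GAE SDK: go version go1.6.3 (appengine-1.9.48) darwin/amd64
      • Web Framework: echo v3.0.3
      • Vendoring: dep
    • Functional requirements
      • Fetch content from the APIs of services such as Facebook and Twitter, and from web pages.
      • Register candidate sources in the DB in advance, and crawl them periodically with a cron job.
      • Store the results in common properties such as title, body, and thumbnail.
      • Skip a source if it has not been updated since the previous crawl.
      • Bundle the crawl results into a JSON object and upload it to S3 at the end.
      • No image processing. Someday I'd like to add something like face recognition.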

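  10. Context map (DDD)
    • Worker (periodic collection): triggers crawl jobs
    • Fetcher (content fetching): manages content sources; fetches content (raw responses)
    • Parser (content processing): processes the fetched information and converts it into parameters for JSON
    • Uploader (storage): uploads to storage (S3)
    Legend: A → B means "A calls B"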

  11. Directory structure

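  12. Directory structure
    Annotations on the directory tree: GAE context creation / routing / GAE job definitions / core feature implementation / dependency packages
    Because of how GAE builds apps, keep the startup script, the implementation, and the vendor directory in separate places.
    (Otherwise you get alerts about package name collisions and about non-recommended features such as unsafe.)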

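  13. Directory structure
    Annotations on the directory tree: where app-wide shared settings are handled (covered later) / the implementation of each domain / the repositories responsible for DB access / Context setup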

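  14. What went well with the DDD structure
    • Package names are organized around units of the business logic you want to implement, so package name collisions became less likely.
    • Writing code around domains as "actions" rather than resources as "things" makes the answer to "what does this thing do for me?" obvious right away, so the code is simply easier to read through.
    • Separating out the repository loosely couples the implementation and data access, which made maintenance easier.
      (For example, migrating from CloudSQL to DataStore was easy.)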

  15. Common processing
    Details of each domain

  16. Implementation: common processing
    For the echo code needed to build on GAE, see:
    https://echo.labstack.com/cookbook/google-app-engine

  17. Implementation: common processing

  18. Implementation: common processing
    Take the GAE context,
    ・set a timeout limit on it, and
    ・attach the repository instances to it,
    then register it as a custom echo context.

  19. Implementation: common processing
    echo.Context is not the standard library's context.Context...
    So I created a custom context called GrawlerCtx built on echo.Context, which holds AppEngineCtx as one of its fields.

  20. Fetcher: registering content sources
    Build an "Entity" from the request parameters, then register the content source through the repository held inside the context.

  21. Note: how to use the repository
    Get the App Engine Context out of the custom context, then call the DB accessor methods for Fetcher (or another domain) that the Context holds.

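  22. Note: inside the repository
    Data access takes an Entity (a data structure with an identifier) as its input.
    The individual repositories are referenced through nesting, so that every repository can be reached from a single context.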

  23. Note: inside the repository

  24. Fetcher: interface

  25. Fetcher: interface

  26. Fetcher: interface
    Return the response as-is, as an interface{}.
    Requests to external services go through appengine/urlfetch.

  27. Parser: interface
    How HTML is parsed varies widely from site to site.
    So a ParseService is created via ParserFactory.Create.
    fetcherInstance is used for the extra requests made while paginating.

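  28. Parser: defining the HTML parser
    Used by the Factory to decide which parser to create.
    (Names have been changed on the slide.)
    Because of the App Engine version, context.Context comes from x/net rather than the standard library.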

  29. Parser: ParserFactory
    Creates a ParseService according to the given parserKey.
    FactoryType only exists because building the struct required it; it just holds the string "html".

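  30. Parser: the Parse implementation
    Receives the context, fetches, and parses the result.
    Note: I really didn't want to pass a Fetcher to the Parser, but it was unavoidable for things like pagination, so the Parser holds a Fetcher. Painful!

    Parsers such as FacebookParser don't need pagination, so there the Fetcher and the Parser are completely separate.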

  31. Parser: the Parse implementation
    Elements are extracted with yhat/scrape. Unlike goquery, you don't write callbacks, so it stays simple.
    Example: an a tag whose parent is an h1 tag and whose class is "skin-entryTitle" is parsed as the "title".
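The matching rule in the example can be written as a predicate. yhat/scrape matchers are functions over `*html.Node` from golang.org/x/net/html; to stay dependency-free, this sketch uses a hypothetical minimal `Node` type with only the fields the rule reads, plus a small `filter` helper in place of the library's search functions. It also accepts a class attribute that lists other classes besides `skin-entryTitle`, a slight relaxation of the rule on the slide.

```go
package main

import (
	"fmt"
	"strings"
)

// Node is a hypothetical, minimal stand-in for golang.org/x/net/html.Node,
// holding only the fields this matcher reads.
type Node struct {
	Tag    string
	Class  string
	Text   string
	Parent *Node
}

// isEntryTitle mirrors the rule on the slide: an <a> element whose
// parent is <h1> and whose class list contains "skin-entryTitle".
func isEntryTitle(n *Node) bool {
	if n == nil || n.Tag != "a" || n.Parent == nil || n.Parent.Tag != "h1" {
		return false
	}
	for _, c := range strings.Fields(n.Class) {
		if c == "skin-entryTitle" {
			return true
		}
	}
	return false
}

// filter keeps the nodes for which match returns true.
func filter(nodes []*Node, match func(*Node) bool) []*Node {
	var out []*Node
	for _, n := range nodes {
		if match(n) {
			out = append(out, n)
		}
	}
	return out
}

func main() {
	h1 := &Node{Tag: "h1"}
	nodes := []*Node{
		{Tag: "a", Class: "skin-entryTitle", Text: "First post", Parent: h1},
		{Tag: "a", Class: "skin-entryTitle", Text: "Sidebar link", Parent: &Node{Tag: "div"}},
		{Tag: "a", Class: "other", Text: "Nav", Parent: h1},
	}
	for _, n := range filter(nodes, isEntryTitle) {
		fmt.Println("title:", n.Text) // title: First post
	}
}
```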

  32. Parser: the Parse implementation
    From a list of blog posts, fetch each post's detail page again and extract elements such as its Title.

  33. Uploader: interface
    Pack the parse results into a common data structure for the JSON upload ([]feeditem.FeedItem), pass it in, and upload it to a specific storage backend.

  34. Uploader: initialization (example: S3)
    Initialization of the S3 uploader.
    Pass in the secret credentials to create an object for accessing S3.

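  35. Uploader: the implementation
    Here too, requests fail unless you set an HTTPClient created with appengine/urlfetch.
    Set the body, credentials, and so on, and create the S3Client.
    After that you just send() the request, so that part is omitted.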

  36. Worker: handler
    Provide a handler for the endpoint that GAE cron calls.
    It just creates a worker and calls its Crawl method.

  37. Worker: handler

  38. Worker: the implementation
    Covers fetch -> parse.
    Everyone's favorite: goroutines.
    errgroup (golang.org/x/sync/errgroup) makes error handling easier, so I recommend it.

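  39. Worker: the implementation
    Because it receives a context.Context, it returns an error if the job times out while running.
    Upload step
    Once the upload has completed, update the last collection time.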

  40. cron
    If you provide a cron.yaml, App Engine sends GET requests to a given endpoint on a fixed schedule, so use it to hit the worker's endpoint.

  41. Where I got stuck with Go/GAE

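  42. Rethinking the directory structure
    • If you casually start trying GAE and keep going without an initial deploy, you end up making lots of changes later.
    • Especially with a vendoring tool, the build fails on something as critical as package name collisions, so if you are deploying to GAE, follow its conventions from the start.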

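  44. Managing secrets
    • Where do you put things like access tokens?
    • At first I managed them in TOML, but after goapp deploy the TOML files under src disappeared. Not good.
    • You can also set them as env_variables in app.yaml, but that file can't be gitignored, which is painful.
    • I settled on creating a ConfigurationRepository and storing them in DataStore.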

  45. Managing secrets

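  46. CloudSQL or DataStore
    • At first I used CloudSQL (via gorm), but it turned out to be costing money.
      This API shouldn't receive many requests, so paying for it at such a small scale hurts.
    • When I created the ConfigurationRepository, I migrated all storage to DataStore.
    • With the repositories separated, the migration was easy, and since you can view entity data from the console, DataStore seems like the better choice.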

  47. Before / after

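  48. Summary
    • I built a crawler.
    • DDD gives the code good visibility and helps avoid package name collisions.
    • In particular, defining interfaces properly dramatically improves the readability of each domain's code, so I think DDD fits Go well.
    • GAE has its quirks, for example around directory structure, but you benefit from things like cron and DataStore, so once the environment is set up it is very convenient. I hope the standard library's context becomes usable there.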

  49. Thank you for listening!
    twitter: @__timakin__
    github: timakin
