In Practice! Building a Crawler with Go/GAE + DDD

Slides from a talk given at Go Conference 2017 Spring.

Seiji Takahashi

March 25, 2017

Transcript

  1. In Practice! Building a Crawler with Go/GAE + DDD @__timakin__ GoConference 2017 Spring

  2. Self-introduction

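  3. Self-introduction
    • twitter: @__timakin__
    • github: timakin
    • Gunosy Inc., New Business Development Office ← New!
    • Main Go libraries and tools I have developed:
      • gopli (DB replication tool)
      • gonvert (character encoding conversion library)
      • octop (CLI tool for viewing GitHub issues and PRs)
      • ts (CLI tool for browsing tech and business news)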
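  4. Copyright © Gunosy Inc. All Rights Reserved. Hiring Go / Python engineers ▶ https://gunosy.co.jp/recruit/
    Gunosy began with three University of Tokyo students who wanted to "deliver information optimally to people all over the world." Having listed on the TSE Mothers market and moved its office to Roppongi Hills, the company is growing rapidly and is looking for members who want to play an active role in it.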
  5. Agenda

  6. Agenda
    • Building a crawler with Go/GAE + DDD
    • API overview
    • Directory structure
    • Common processing and the details of each domain
    • Sticking points with Go/GAE
    • Summary
  7. The development process this time

  8. API overview

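  9. API overview
    • Main technologies used
      • GAE SDK: go version go1.6.3 (appengine-1.9.48) darwin/amd64
      • Web Framework: echo v3.0.3
      • Vendoring: dep
    • Functional requirements
      • Fetch content from APIs such as Facebook and Twitter, and from web pages
      • Register candidate sources in the DB in advance and crawl them as needed with cron jobs
      • Store the results in common properties such as title, body, and thumbnail
      • Skip a source if it has not been updated since the previous crawl
      • Gather the crawl results into a JSON object and upload it to S3 at the end
      • No image processing. Someday I'd like to do something slick like face recognition.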
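  10. Context map (DDD). Legend: A calls B.
    • Worker (periodic collection): fires the crawl jobs
    • Fetcher (content acquisition): manages the sources and fetches content as raw responses
    • Parser (content processing): converts the fetched information into parameters for JSON
    • Uploader (storage): uploads to S3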
  11. Directory structure

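  12. Directory structure (labels on the directory tree): GAE context creation; routing; GAE job definitions; implementation of the core features; dependency packages. Because of how GAE builds work, the startup script, the implementation, and the vendor directory are kept separate (otherwise alerts are raised over package-name collisions and over discouraged features such as unsafe).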

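  13. Directory structure (labels, continued): the place that handles app-wide common settings (described later); the implementation of each domain; the repositories responsible for DB access; Context setup.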

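  14. What worked well about the DDD structure
    • Package names are unified per unit of business logic we want to realize, so package-name collisions became less likely.
    • Writing code with attention on domains as "actions" rather than on resources as "things" makes the answer to "what does this thing do for me?" immediately obvious, which simply made the code easier to read through.
    • Separating out the repository loosens the coupling between the implementation and data access, which made maintenance easier (for example, migrating from CloudSQL to DataStore was easy).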
  15. Common processing and the details of each domain

  16. Implementation: common processing. For the echo code needed to build on GAE, see: https://echo.labstack.com/cookbook/google-app-engine

  18. Implementation: common processing. Take the GAE context, (1) set a timeout limit on it and (2) attach the repository instances to it, then register the result as echo's custom context.

  19. Implementation: common processing. echo.Context is not the standard library's context.Context... So we create a custom context named GrawlerCtx on top of echo.Context, which holds AppEngineCtx as one of its fields.
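
The custom-context idea from the two slides above can be sketched with the standard library alone. This is only an illustration under assumptions: `baseContext` and `mapContext` stand in for echo.Context, `Repositories` and `NewGrawlerCtx` are hypothetical names, and the standard `context` package is used even though the talk's runtime needed the x/net one. Only the names GrawlerCtx and AppEngineCtx come from the slides.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// baseContext is a hypothetical stand-in for the few echo.Context methods this
// sketch needs; the real echo.Context interface is much larger.
type baseContext interface {
	Param(name string) string
	RequestContext() context.Context
}

// mapContext is a toy baseContext backed by a map.
type mapContext map[string]string

func (m mapContext) Param(name string) string { return m[name] }

// RequestContext stands in for the App Engine context derived from the request.
func (m mapContext) RequestContext() context.Context { return context.Background() }

// Repositories is a placeholder for the repository instances carried around.
type Repositories struct{ Name string }

// GrawlerCtx embeds the framework context and adds the deadline-bound
// App Engine context plus the repositories.
type GrawlerCtx struct {
	baseContext
	AppEngineCtx context.Context
	Repos        *Repositories
}

// NewGrawlerCtx sets the timeout limit and attaches the repositories.
// The returned cancel function must be called when the request finishes.
func NewGrawlerCtx(base baseContext, timeout time.Duration, repos *Repositories) (*GrawlerCtx, context.CancelFunc) {
	ctx, cancel := context.WithTimeout(base.RequestContext(), timeout)
	return &GrawlerCtx{baseContext: base, AppEngineCtx: ctx, Repos: repos}, cancel
}

func main() {
	cc, cancel := NewGrawlerCtx(mapContext{"id": "42"}, 30*time.Second, &Repositories{Name: "datastore"})
	defer cancel()
	_, hasDeadline := cc.AppEngineCtx.Deadline()
	fmt.Println(cc.Param("id"), cc.Repos.Name, hasDeadline)
}
```

Embedding keeps every framework method available on the custom context, so handlers only need one type assertion to reach the extra fields.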

  20. Fetcher: registering content sources. Build an "Entity" from the request parameters and register the content source through the repository held inside the context.

  21. Supplement: how the repository is used. Get the App Engine Context out of the custom context, then call the DB accessor methods for Fetcher (or for another domain) that this Context carries.

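  22. Supplement: inside the repository. Data access is performed using an Entity (a data structure with an identifier). The individual repositories are referenced in a nested fashion so that every repository is reachable from a single context.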
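
A minimal sketch of the nested-repository layout described above. The slide code is not part of the transcript, so every name here (`Source`, `FetcherRepository`, `Repositories`, `NewMemoryRepositories`) is hypothetical, and an in-memory map stands in for CloudSQL or DataStore.

```go
package main

import (
	"errors"
	"fmt"
)

// Source is a hypothetical Entity: a data structure identified by its ID.
type Source struct {
	ID  string
	URL string
}

// ErrNotFound is returned when no entity has the requested ID.
var ErrNotFound = errors.New("entity not found")

// FetcherRepository is the data-access interface for the Fetcher domain.
type FetcherRepository interface {
	Put(s Source) error
	Get(id string) (Source, error)
}

// memoryFetcherRepository is an in-memory stand-in for a real datastore.
type memoryFetcherRepository struct {
	sources map[string]Source
}

func (r *memoryFetcherRepository) Put(s Source) error {
	if s.ID == "" {
		return errors.New("source needs an ID")
	}
	r.sources[s.ID] = s
	return nil
}

func (r *memoryFetcherRepository) Get(id string) (Source, error) {
	s, ok := r.sources[id]
	if !ok {
		return Source{}, ErrNotFound
	}
	return s, nil
}

// Repositories nests the per-domain repositories so that a single value,
// carried by the context, reaches all of them.
type Repositories struct {
	Fetcher FetcherRepository
	// Other domains would add their repositories here.
}

// NewMemoryRepositories wires up the in-memory implementations.
func NewMemoryRepositories() *Repositories {
	return &Repositories{Fetcher: &memoryFetcherRepository{sources: map[string]Source{}}}
}

func main() {
	repos := NewMemoryRepositories()
	if err := repos.Fetcher.Put(Source{ID: "blog-1", URL: "https://example.com/feed"}); err != nil {
		fmt.Println("put failed:", err)
		return
	}
	s, err := repos.Fetcher.Get("blog-1")
	fmt.Println(s.URL, err)
}
```

Because callers depend only on the interface, swapping the backing store means replacing one constructor, which is the property the talk credits for the easy CloudSQL to DataStore migration.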

  24. Fetcher: interface

  26. Fetcher: interface. The response is returned as-is, as an interface{}. Requests to external services go through appengine/urlfetch.
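
A sketch of what such a Fetcher might look like, assuming an injected `*http.Client`. The names `Fetcher`, `HTTPFetcher`, `stubTransport`, and `fetchWithStub` are hypothetical. On App Engine the injected client would be the one produced by appengine/urlfetch; here a stub transport plays that role so the sketch runs without network access.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net/http"
	"strings"
)

// Fetcher returns the raw response untouched; callers decide how to parse it.
type Fetcher interface {
	Fetch(ctx context.Context, url string) (interface{}, error)
}

// HTTPFetcher is a hypothetical implementation with an injected client.
type HTTPFetcher struct {
	Client *http.Client
}

// Fetch issues a GET and returns the body bytes as an interface{}.
func (f *HTTPFetcher) Fetch(ctx context.Context, url string) (interface{}, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	resp, err := f.Client.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("fetch %s: unexpected status %d", url, resp.StatusCode)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	return body, nil
}

// stubTransport answers every request with a canned body. It stands in for
// the transport that the App Engine client would supply.
type stubTransport struct{ body string }

func (t stubTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	return &http.Response{
		StatusCode: http.StatusOK,
		Header:     make(http.Header),
		Body:       io.NopCloser(strings.NewReader(t.body)),
		Request:    req,
	}, nil
}

// fetchWithStub fetches through the stub and asserts the concrete type,
// which is the caller's job when the result is an interface{}.
func fetchWithStub(payload string) (string, error) {
	var f Fetcher = &HTTPFetcher{Client: &http.Client{Transport: stubTransport{body: payload}}}
	raw, err := f.Fetch(context.Background(), "https://example.com/feed")
	if err != nil {
		return "", err
	}
	body, ok := raw.([]byte)
	if !ok {
		return "", fmt.Errorf("unexpected response type %T", raw)
	}
	return string(body), nil
}

func main() {
	fmt.Println(fetchWithStub("<html>hello</html>"))
}
```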

  27. Parser: interface. For things like HTML parsers, the way to parse differs greatly from site to site, so a ParseService is created via ParserFactory.Create. fetcherInstance is there for the requests made when paging.

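  28. Parser: defining the HTML parser. The key is used by the Factory to determine which parser to create (the names have been changed in the slides for convenience). Because of the appengine version in use, context.Context is the one from x/net.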

  29. Parser: ParserFactory. Creates a ParseService according to the given parserKey. FactoryType exists only because it was unavoidable when building the struct; it just holds the string "html".
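
The factory idea can be sketched as below. Only the names ParserFactory, Create, ParseService, and parserKey appear in the slides; the method signature, the two toy parsers, and the "facebook" key are assumptions made for illustration. The naive `<title>` extraction is a placeholder, not how the talk parses HTML.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// ParseService turns a raw fetch result into a title (simplified for the sketch).
type ParseService interface {
	Parse(raw interface{}) (string, error)
}

// htmlTitleParser expects the raw result to be an HTML string.
type htmlTitleParser struct{}

func (htmlTitleParser) Parse(raw interface{}) (string, error) {
	body, ok := raw.(string)
	if !ok {
		return "", fmt.Errorf("html parser: want string, got %T", raw)
	}
	start := strings.Index(body, "<title>")
	end := strings.Index(body, "</title>")
	if start < 0 || end < start {
		return "", errors.New("html parser: no <title> element")
	}
	return body[start+len("<title>") : end], nil
}

// facebookParser expects a decoded JSON object with a "message" field.
type facebookParser struct{}

func (facebookParser) Parse(raw interface{}) (string, error) {
	obj, ok := raw.(map[string]interface{})
	if !ok {
		return "", fmt.Errorf("facebook parser: want JSON object, got %T", raw)
	}
	msg, ok := obj["message"].(string)
	if !ok {
		return "", errors.New("facebook parser: no message field")
	}
	return msg, nil
}

// ParserFactory picks the ParseService that matches a parser key.
type ParserFactory struct{}

// Create returns an error for keys it does not know.
func (ParserFactory) Create(parserKey string) (ParseService, error) {
	switch parserKey {
	case "html":
		return htmlTitleParser{}, nil
	case "facebook":
		return facebookParser{}, nil
	default:
		return nil, fmt.Errorf("unknown parser key %q", parserKey)
	}
}

func main() {
	p, err := ParserFactory{}.Create("html")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(p.Parse("<html><title>Hello</title></html>"))
}
```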

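  30. Parser: the Parse implementation. It receives the context, fetches, and parses the result. Note: I really did not want to pass a Fetcher to the Parser, but paging and the like made it unavoidable, so the Parser holds a Fetcher. Painful! Parsers such as FacebookParser need no paging, so there Fetcher and Parser are completely separated.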

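  31. Parser: the Parse implementation. Elements are extracted with yhat/scrape. Unlike goquery it does not use a callback style, so it is simple. Example: a node that is an a tag, whose parent is an h1 tag, and whose class is "skin-entryTitle" is parsed as the "title".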
  32. Parser: the Parse implementation. From the blog's article list, fetch each article's detail page again and extract elements such as its Title.

  33. Uploader: interface. The parse results, packed into the common data structure for JSON upload ([]feeditem.FeedItem), are passed in and uploaded to a specific storage.
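
A rough sketch of an Uploader along these lines. The real interface and the fields of feeditem.FeedItem are not shown in the transcript, so the `FeedItem` fields (taken from the "title, body, thumbnail" requirement), the `Upload` signature, and `MemoryUploader` are assumptions. An in-memory map stands in for S3.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// FeedItem is a hypothetical version of the common data structure.
type FeedItem struct {
	Title     string `json:"title"`
	Body      string `json:"body"`
	Thumbnail string `json:"thumbnail"`
}

// Uploader stores a batch of items under a key in some storage backend.
type Uploader interface {
	Upload(key string, items []FeedItem) error
}

// MemoryUploader keeps the serialized JSON in a map instead of a bucket.
type MemoryUploader struct {
	Objects map[string][]byte
}

// NewMemoryUploader returns an uploader with an empty object store.
func NewMemoryUploader() *MemoryUploader {
	return &MemoryUploader{Objects: map[string][]byte{}}
}

// Upload marshals the items into one JSON array and stores it under key.
func (u *MemoryUploader) Upload(key string, items []FeedItem) error {
	if key == "" {
		return errors.New("upload: empty object key")
	}
	payload, err := json.Marshal(items)
	if err != nil {
		return err
	}
	u.Objects[key] = payload
	return nil
}

func main() {
	mem := NewMemoryUploader()
	var up Uploader = mem
	items := []FeedItem{{Title: "Hello", Body: "World", Thumbnail: "https://example.com/t.png"}}
	if err := up.Upload("feeds/latest.json", items); err != nil {
		fmt.Println("upload failed:", err)
		return
	}
	fmt.Println(string(mem.Objects["feeds/latest.json"]))
}
```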

  34. Uploader: initialization (example: S3). Initializes the S3 uploader: the secrets are passed in and an object for accessing S3 is created.

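  35. Uploader: implementation. Here too, requests do not go through unless an HTTPClient generated with appengine/urlfetch is set. Quietly set the body, credentials, and so on, and create the S3Client. From here on it is just a matter of calling send() on the request, so the rest is omitted.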

  36. Worker: handler. Provide a handler for the endpoint that GAE cron hits. It just creates a worker and calls its Crawl method.
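
A minimal handler sketch using net/http rather than echo, to stay dependency-free. `cronHandler`, `statusFor`, and the `/worker/crawl` path are hypothetical names. The X-Appengine-Cron check is an optional hardening step that the talk does not mention; App Engine sets that header on requests issued by its cron service.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// cronHandler wraps the crawl job in an HTTP handler for the cron endpoint.
func cronHandler(crawl func() error) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		// Reject callers other than the cron service (assumed hardening step).
		if r.Header.Get("X-Appengine-Cron") != "true" {
			http.Error(w, "forbidden", http.StatusForbidden)
			return
		}
		if err := crawl(); err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		fmt.Fprintln(w, "ok")
	}
}

// statusFor sends one in-process GET to the handler and returns the status code.
func statusFor(h http.Handler, fromCron bool) int {
	req := httptest.NewRequest(http.MethodGet, "/worker/crawl", nil)
	if fromCron {
		req.Header.Set("X-Appengine-Cron", "true")
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	h := cronHandler(func() error { return nil })
	fmt.Println(statusFor(h, true), statusFor(h, false))
}
```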

  38. Worker: implementation. Covers fetch -> parse. Everyone's favorite: goroutines. sync/errgroup (golang.org/x/sync/errgroup) is recommended because it makes error handling easier.

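  39. Worker: implementation. Because it receives a context.Context, it returns an error if the job times out while running. Then comes the upload step; once the upload has completed, the last-collected time is updated.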
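
The concurrent fetch -> parse step with a timeout can be sketched as follows. The talk uses errgroup; to stay standard-library only, this sketch swaps in sync.WaitGroup plus sync.Once to capture the first error, which approximates what errgroup provides. All names (`crawlAll`, `fetchParse`, `upperStep`, `stuckStep`, `requestContext`) are hypothetical.

```go
package main

import (
	"context"
	"fmt"
	"strings"
	"sync"
	"time"
)

// fetchParse is a stand-in for one source's fetch -> parse step.
type fetchParse func(ctx context.Context, source string) (string, error)

// requestContext stands in for the App Engine context of the cron request.
func requestContext() context.Context { return context.Background() }

// upperStep is a toy step that finishes immediately.
func upperStep(ctx context.Context, source string) (string, error) {
	return strings.ToUpper(source), nil
}

// stuckStep never finishes on its own; it returns only when ctx ends.
func stuckStep(ctx context.Context, source string) (string, error) {
	<-ctx.Done()
	return "", ctx.Err()
}

// crawlAll runs step for every source concurrently and returns the results
// in source order, or the first error (including a timeout).
func crawlAll(parent context.Context, timeout time.Duration, sources []string, step fetchParse) ([]string, error) {
	ctx, cancel := context.WithTimeout(parent, timeout)
	defer cancel()

	results := make([]string, len(sources))
	var (
		wg       sync.WaitGroup
		once     sync.Once
		firstErr error
	)
	for i, src := range sources {
		wg.Add(1)
		go func(i int, src string) {
			defer wg.Done()
			out, err := step(ctx, src)
			if err != nil {
				once.Do(func() {
					firstErr = err
					cancel() // stop the remaining steps early
				})
				return
			}
			results[i] = out
		}(i, src)
	}
	wg.Wait()
	if firstErr != nil {
		return nil, firstErr
	}
	return results, nil
}

func main() {
	fmt.Println(crawlAll(requestContext(), time.Second, []string{"a", "b"}, upperStep))
	fmt.Println(crawlAll(requestContext(), 10*time.Millisecond, []string{"a"}, stuckStep))
}
```

In the flow the slides describe, the upload and the update of the last-collected time would run only after this step returns without error.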

  40. cron. If you provide a file called cron.yaml, GAE sends GET requests to the specified endpoint at a fixed frequency, so this is used to hit the worker's endpoint.
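
For reference, a cron.yaml of the kind described might look like the fragment below. The URL, description, and interval are assumptions chosen for illustration; the talk does not state the actual values.

```yaml
cron:
- description: "crawl registered content sources"  # hypothetical description
  url: /worker/crawl                               # hypothetical worker endpoint
  schedule: every 30 minutes                       # hypothetical frequency
```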

  41. Sticking points with Go/GAE

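  42. Rethinking the directory structure
    • If you try GAE casually and keep developing without doing an initial deploy, you end up with lots of changes later.
    • With a vendoring tool in particular, the build fails on something as critical as package-name collisions, so if you are going to deploy to GAE, follow its conventions from the start.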

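  44. Managing secrets
    • Where should things like access tokens live?
    • At first I managed them in TOML, but after goapp deploy the TOML files under src vanish. Not good.
    • Setting them in env in app.yaml is another option, but that file cannot be gitignored, which is unappealing.
    • I ended up creating a ConfigurationRepository and storing them in DataStore.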
  45. Managing secrets

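  46. CloudSQL or DataStore
    • At first I used CloudSQL (via gorm), but it turned out to be costing money. Requests to this API itself should not be numerous, so paying at such a small scale is unappealing.
    • When I created the ConfigurationRepository, I moved all storage over to DataStore.
    • With the Repository separated out the migration is easy, and entity data can be viewed from the console, so DataStore seems the better choice.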
  47. Before / after

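  48. Summary
    • I built a crawler.
    • DDD keeps the code easy to survey and also avoids package-name collisions.
    • In particular, defining interfaces properly makes each domain's code dramatically more readable, so I think DDD suits Go well.
    • GAE has its quirks, directory structure among them, but cron, DataStore, and the like pay off, so it is very convenient once the environment is set up. It would be nice if the standard library's context became usable.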
  49. Thank you for listening! twitter: @__timakin__ github: timakin