The Story After Building a Crawler with Headless Chrome

yujiosaka

March 20, 2018

Transcript

 1. The Story "After" Building a Crawler with Headless Chrome
  Roppongi.js #1
  Yuji Isobe

 2. Yuji Isobe @yujiosaka
  Project Manager at


  https://speakerdeck.com/yujiosaka/hitasurale-sitedeipuraningu

 3. The story of when I built the crawler
  https://speakerdeck.com/yujiosaka/an-gazui-chu-nihetudoresuchromedekurorazuo-tuta-shi-ninannekana

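 4. What is headless Chrome?
  ✓ Chrome can be launched in headless mode
  ✓ Just add "--headless" to Chrome's launch options
  ✓ The best-known headless browser is PhantomJS
  ✓ Runs fast and reliably
  ✓ Quick to adopt standards (ES2017 and async/await are available)
  ✓ Main uses are test automation and crawlers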
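
As a rough illustration of the "--headless" launch option, here is a minimal Node.js sketch that starts Chrome headlessly and captures the page's DOM using async-friendly promises. The binary name `google-chrome`, the `CHROME_PATH` environment variable, and the helper names `buildHeadlessArgs` and `dumpDom` are assumptions made for this example; they do not come from the talk.

```javascript
const { spawn } = require('child_process');

// Assumption: the Chrome binary is on PATH as "google-chrome";
// override with the (hypothetical) CHROME_PATH variable on other systems.
const CHROME_PATH = process.env.CHROME_PATH || 'google-chrome';

// Build the command-line arguments for a headless run that prints the DOM.
function buildHeadlessArgs(url) {
  return ['--headless', '--disable-gpu', '--dump-dom', url];
}

// Launch Chrome in headless mode and resolve with the serialized DOM.
function dumpDom(url) {
  return new Promise((resolve, reject) => {
    const chrome = spawn(CHROME_PATH, buildHeadlessArgs(url));
    let html = '';
    chrome.stdout.on('data', (chunk) => { html += chunk; });
    chrome.on('error', reject); // e.g. the binary was not found
    chrome.on('close', (code) => {
      if (code === 0) resolve(html);
      else reject(new Error(`Chrome exited with code ${code}`));
    });
  });
}
```

Inside an async function, `const html = await dumpDom('https://example.com');` would then return the rendered markup, provided a Chrome binary is installed at the assumed path.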

 5. Headless Chrome Crawler
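  ✓ A crawler built on headless Chrome
  ✓ Works in distributed environments
  ✓ Supports depth-first search (DFS) and breadth-first search (BFS)
  ✓ Obeys robots.txt and sitemap.xml
  ✓ Depends on Puppeteer
  ✓ Written in Node.js (JavaScript)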
  https://github.com/yujiosaka/headless-chrome-crawler
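
To make the DFS/BFS distinction concrete, the sketch below walks a small in-memory link graph in both orders. It is not the library's implementation: the `links` graph and the `crawlOrder` helper are hypothetical, and a real crawler would fetch pages rather than read a lookup table.

```javascript
// Hypothetical link graph: each page maps to the pages it links to.
const links = {
  '/': ['/a', '/b'],
  '/a': ['/a1', '/a2'],
  '/b': ['/b1'],
  '/a1': [],
  '/a2': [],
  '/b1': [],
};

// Return the order in which pages are visited from `start`.
// Taking from the front of the frontier (queue) gives BFS;
// taking from the back (stack) gives DFS.
function crawlOrder(graph, start, strategy) {
  const frontier = [start];
  const visited = new Set();
  const order = [];
  while (frontier.length > 0) {
    const url = strategy === 'bfs' ? frontier.shift() : frontier.pop();
    if (visited.has(url)) continue; // never visit a page twice
    visited.add(url);
    order.push(url);
    const next = graph[url] || [];
    // For DFS, push in reverse so the first link on a page is followed first.
    frontier.push(...(strategy === 'bfs' ? next : [...next].reverse()));
  }
  return order;
}

console.log(crawlOrder(links, '/', 'bfs')); // level by level
console.log(crawlOrder(links, '/', 'dfs')); // one branch at a time
```

BFS visits every page at a given link depth before going deeper, which suits shallow site-wide scans; DFS follows one branch to its end first, which suits drilling into a particular section.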

 6. Today's talk is about what came "after" building it

 7. GitHub > Insights > Traffic
  https://twitter.com/yujiosaka/status/967316514322890752

 8. GitHub Trending Repositories
  Hovered around this spot for 2-3 days
  https://github.com/trending

 9. Hit 2000 Stars in 7 days
  https://github.com/yujiosaka/headless-chrome-crawler
  > 2000

 10. A dream came true

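 11. Benefits I had expected
  ✓ Better quality
  ✓ Better development skills
  ✓ A stronger personal brand
  ✓ Leads to work (Somebody, please give me work)
  Self-promotion is surprisingly embarrassing
  So far only typo fixes have been sent in
  Busy responding to issues and emails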

 12. You get to grin all day at work

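 13. The main topic
  Sorting out what separates a library that caught many people's attention
  from libraries that did not
  Note: this is a sample size of one, so take it with a grain of salt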

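 14. Breadth of the target audience
  ✓ The importance of publishing in English
  ✓ Writing the README in English is not the end of the job
  ✓ "Hacker News Top Links" brought in 10 times more traffic than "Hatena Bookmark Hot Entries"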

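 15. Barrier to adoption
  ✓ Make it clear at a glance what the library is for
  ✓ Headless Chrome + Crawler = Headless Chrome Crawler
  ✓ Pitch the benefits at the very top of the README
  ✓ Provide a FAQ for those who still do not get it
  ✓ Most people read only the first few lines
  ✓ Images and logos can also convey "simplicity"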

 16. Barrier to reading the code
  ✓ Flesh out the examples
  ✓ The most-read content was the examples
  ✓ Make full use of tooling
  ✓ ESLint
  ✓ commitlint
  ✓ EditorConfig
  ✓ TypeScript / JSDoc support

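 17. Trust
  ✓ Make full use of badges
  ✓ Make the latest build status visible at a glance
  ✓ A row of green badges is somehow reassuring
  ✓ Greenkeeper might just be the strongest tool of all
  ✓ Nobody wants to use a project whose last commit was six months ago
  ✓ Even if you slack off for a while, there is always something to commit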

 18. Happy Niya-niya Hacking!
