Yuji Isobe
Can't I go down as the first person to build a crawler with headless Chrome?
Node Gakuen (Node学園), 29th session
Project Manager at emin, @yujiosaka
https://speakerdeck.com/yujiosaka/hitasurale-sitedeipuraningu
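This time I built a crawler
✓ Why a crawler now, of all things?
✓ What I was aiming for when I built it
✓ What I was thinking about while building it
✓ Where the crawler goes from here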
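Last year I did all sorts of things…
ECZine serial column: http://eczine.jp/article/detail/4869
Debut as an e-commerce expert: http://amzn.asia/aOkwFjH
I had a problem (´・ω・｀)
At work, people were no longer seeing me as an engineer orz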
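The line you must not cross
Tuning for each client company ← "Well, I am an AI engineer"
Going along on sales visits ← "Technical sales, I guess"
Proposing new products ← "That's BizDev, right?"
Starting to write sales materials ← "O-oh, okay…"
Starting to write press releases ← "That guy isn't an engineer anymore"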
If I can, I want to make my living as an engineer for the rest of my life
Regain my dignity as an engineer at work
Then one day…
I learned about headless Chrome: https://developers.google.com/web/updates/2017/04/headless-chrome?hl=ja
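What is headless Chrome?
✓ Chrome can be launched in headless mode
✓ Just add "--headless" to Chrome's launch options
✓ The best-known headless browser has been PhantomJS
✓ Runs fast and stably
✓ Quick to support new standards (ES2017 async/await is available)
✓ Main uses are test automation and dynamic crawlers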
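Static crawlers vs. dynamic crawlers (* my own naming)
✓ Static crawlers (wget, curl)
  ✓ Request only the document (the HTML file)
  ✓ Fast, because they only parse the file
  ✓ Do not work on SPA sites built with AngularJS, React, or Vue.js
✓ Dynamic crawlers (PhantomJS, headless Chrome)
  ✓ Load images, JavaScript, and CSS, and go as far as rendering
  ✓ Generally slow, because they also execute JavaScript
  ✓ Work on SPA sites just as they do on conventional sites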
Chrome DevTools Protocol: https://chromedevtools.github.io/devtools-protocol/
✓ The latest specification is a JSON file in the Chromium source code
✓ It is copied to a GitHub repository once an hour
Benchmark: https://hackernoon.com/benchmark-headless-chrome-vs-phantomjs-e7f44c6956c
RIP PhantomJS: https://groups.google.com/forum/#!topic/phantomjs/9aI5d-LDuNE
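If you are starting now, go with headless Chrome
But the problems pile up
✓ The API is too low-level and hard to work with
✓ The specification is still unstable and hard to keep up with
✓ You get caught by security blocks
  ✓ Protections meant for users, such as Content Security Policy, kick in
✓ You casually step on bugs
  ✓ The setRequestInterception implementation is still experimental
GoogleChrome / puppeteer: https://github.com/GoogleChrome/puppeteer
✓ Maintained by the Google Chrome team
✓ A wrapper that exposes headless Chrome through a high-level API
✓ v1.0.0 was released in January
✓ There is a Slack group, and responses are careful and fast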
Crawlers on headless Chrome
are all the rage right now~~~
Can't I go down as the first person to build one?
I noticed something
puppeteer / examples: https://github.com/GoogleChrome/puppeteer/tree/master/examples
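It is nothing but "I tried it" posts and explainers; few are actually practical
Let's build the first practical crawler on headless Chrome
Other reasons
✓ Existing crawlers did not support Promises
✓ There was no Node.js crawler that runs in a distributed environment
Set the goals
✓ Has the features a practical crawler needs
✓ Documentation is written in English
✓ Well covered by tests
✓ Runs in a distributed environment
✓ Keep the API simple
✓ Get it listed in puppeteer / examples
This is how I regain my dignity as an engineer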
…
Done: https://github.com/yujiosaka/headless-chrome-crawler
Goal achieved: https://github.com/GoogleChrome/puppeteer/tree/master/examples
Reposted on Google Developers: https://developers.google.com/web/tools/puppeteer/examples
Traffic went up and I got nervous
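Demo
    HCCrawler.launch({
      maxDepth: …, // maximum depth to explore
      maxConcurrency: …, // maximum concurrency
      allowedDomains: …, // domains allowed to be crawled
      evaluatePage: (() => $('title').text()), // function evaluated on the page
      onSuccess: (result => { // function evaluated on success
        console.log(`${result.options.url}\t${result.result}`);
      }),
    })
      .then(async crawler => {
        crawler.queue('https://www.emin.co.jp/');
        await crawler.onIdle();
        await crawler.close();
      });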
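A maximum-concurrency setting and an onIdle() wait, as used in the demo, can be pictured with a small promise queue. This is only a sketch under assumptions: TinyQueue, demo, the fake paths, and the 10 ms delay are hypothetical, and nothing here is taken from the internals of headless-chrome-crawler.

```javascript
// Hypothetical promise queue: runs at most `maxConcurrency` tasks at once.
class TinyQueue {
  constructor(maxConcurrency) {
    this.maxConcurrency = maxConcurrency;
    this.running = 0;
    this.waiting = [];
    this.idleResolvers = [];
  }

  // Add a task (a function returning a promise) and try to start it.
  push(task) {
    this.waiting.push(task);
    this._next();
  }

  // Resolves once nothing is running and nothing is waiting.
  onIdle() {
    if (this.running === 0 && this.waiting.length === 0) return Promise.resolve();
    return new Promise(resolve => this.idleResolvers.push(resolve));
  }

  _next() {
    while (this.running < this.maxConcurrency && this.waiting.length > 0) {
      const task = this.waiting.shift();
      this.running += 1;
      Promise.resolve()
        .then(task)
        .catch(() => {}) // this sketch simply ignores task errors
        .then(() => {
          this.running -= 1;
          this._next();
          if (this.running === 0 && this.waiting.length === 0) {
            this.idleResolvers.splice(0).forEach(resolve => resolve());
          }
        });
    }
  }
}

// Five fake "page visits" with a concurrency limit of 2.
async function demo() {
  const queue = new TinyQueue(2);
  let active = 0;
  let peak = 0;
  const done = [];
  for (const path of ['/1', '/2', '/3', '/4', '/5']) {
    queue.push(async () => {
      active += 1;
      peak = Math.max(peak, active);
      await new Promise(resolve => setTimeout(resolve, 10)); // pretend to load a page
      active -= 1;
      done.push(path);
    });
  }
  await queue.onIdle();
  return { peak, done };
}

demo().then(result => console.log(result));
```

In this sketch `peak` never exceeds the limit of 2, and `onIdle()` resolves only after all five fake visits have finished.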
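How the crawler came together
The most minimal crawler
✓ "Crawling" and "scraping" are different things
✓ Crawling: finding links in the HTML
✓ Scraping: finding the information you want in the HTML
✓ Neither one means much on its own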
What do the two have in common?
Finding ___ in the HTML
Wouldn't jQuery do for that?
v1.0.0 release
jQuery: true, (automatically injects jQuery into the page)
Example
    HCCrawler.launch({
      jQuery: true,
      evaluatePage: (() => $('title').text()),
      onSuccess: (result => {
        console.log(`${result.options.url}\t${result.result}`);
      }),
    })
      .then(async crawler => {
        crawler.queue('https://www.emin.co.jp/');
        await crawler.onIdle();
        await crawler.close();
      });
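The crawling/scraping distinction made earlier can be shown without a browser. The sketch below is a toy under assumptions: sampleHtml, findLinks, and findTitle are hypothetical, and the regular expressions are a crude stand-in for the jQuery selectors the talk settles on, not a robust HTML parser.

```javascript
// Toy input; a real dynamic crawler would read this from a rendered page.
const sampleHtml = `
<html>
  <head><title>Example shop</title></head>
  <body>
    <a href="/items/1">Item 1</a>
    <a href="https://example.com/about">About</a>
  </body>
</html>`;

// Crawling: find links in the HTML and resolve them against the page URL.
function findLinks(html, baseUrl) {
  const found = [];
  const pattern = /<a\s[^>]*href="([^"]*)"/g;
  let match;
  while ((match = pattern.exec(html)) !== null) {
    found.push(new URL(match[1], baseUrl).href);
  }
  return found;
}

// Scraping: find the wanted information, here the page title.
function findTitle(html) {
  const match = /<title>([^<]*)<\/title>/.exec(html);
  return match ? match[1].trim() : null;
}

const links = findLinks(sampleHtml, 'https://example.com/');
const title = findTitle(sampleHtml);
console.log(links, title);
```

Both helpers do the same kind of work, finding something in the HTML, which is the shared ground the slides point to.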
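A crawler that does not frustrate you
✓ If you are used to static crawlers, it feels really slow
✓ It is truly deflating when it has quietly stopped on an error
Running in a distributed environment
✓ Use Redis for the task queue and the cache
✓ Share Redis across multiple servers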
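Why a shared store helps in a distributed setup can be sketched with an in-memory stand-in. Everything here is an assumption for illustration: SharedMemoryCache, claim, runWorkers, and the example.com URLs are hypothetical, a Map plays the role of Redis, and this is not how headless-chrome-crawler itself is implemented.

```javascript
// In-memory stand-in for a store shared by several servers (e.g. Redis).
class SharedMemoryCache {
  constructor() {
    this.store = new Map();
  }
  async get(key) {
    return this.store.has(key) ? this.store.get(key) : null;
  }
  async set(key, value) {
    this.store.set(key, value);
  }
}

// A worker claims a URL only if no worker has recorded it yet.
async function claim(cache, url) {
  if (await cache.get(url)) return false;
  await cache.set(url, true);
  return true;
}

// Two simulated workers take turns pulling URLs and share one cache.
async function runWorkers() {
  const cache = new SharedMemoryCache();
  const urls = [
    'https://example.com/',
    'https://example.com/a',
    'https://example.com/', // duplicate: should be skipped
  ];
  const crawled = { worker1: [], worker2: [] };
  for (const [index, url] of urls.entries()) {
    const worker = index % 2 === 0 ? 'worker1' : 'worker2';
    if (await claim(cache, url)) crawled[worker].push(url);
  }
  return crawled;
}

runWorkers().then(crawled => console.log(crawled));
```

The separate get and set in `claim` are not atomic; against a real Redis server, a single atomic operation such as `SET` with the `NX` option would be needed so that two servers cannot claim the same URL at once.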
v1.3.0 release
cache: new RedisCache(), (specifies Redis as the cache storage)
Example (the slide shows the same code three times)
    HCCrawler.launch({
      cache: new RedisCache({ host: …, port: … }),
      evaluatePage: (() => $('title').text()),
      onSuccess: (result => {
        console.log(`${result.options.url}\t${result.result}`);
      }),
    })
      .then(async crawler => {
        crawler.queue('https://www.amazon.co.jp/');
        await crawler.onIdle();
        await crawler.close();
      });
Other features
✓ Breadth-first search (BFS) and depth-first search (DFS)
✓ Obeys robots.txt
✓ Also follows XML sitemaps
✓ Device emulation
✓ Page screenshots
✓ JSON/CSV output
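The difference between breadth-first and depth-first search, together with a maximum depth, can be seen on a tiny in-memory site. This is a sketch under assumptions: the `site` map, `crawlOrder`, and its `depthFirst` and `maxDepth` options are hypothetical names for illustration, not the crawler's actual options or code.

```javascript
// Hypothetical site: each page maps to the pages it links to.
const site = {
  '/': ['/a', '/b'],
  '/a': ['/a1', '/a2'],
  '/b': ['/b1'],
  '/a1': [],
  '/a2': [],
  '/b1': [],
};

// Returns the order in which pages would be visited.
function crawlOrder(start, { depthFirst = false, maxDepth = Infinity } = {}) {
  const visited = new Set([start]);
  const pending = [{ url: start, depth: 1 }];
  const order = [];
  while (pending.length > 0) {
    // BFS takes the oldest entry (queue); DFS takes the newest (stack).
    const { url, depth } = depthFirst ? pending.pop() : pending.shift();
    order.push(url);
    if (depth >= maxDepth) continue; // do not follow links past the limit
    for (const next of site[url] || []) {
      if (visited.has(next)) continue;
      visited.add(next);
      pending.push({ url: next, depth: depth + 1 });
    }
  }
  return order;
}

console.log(crawlOrder('/'));
console.log(crawlOrder('/', { depthFirst: true }));
console.log(crawlOrder('/', { maxDepth: 2 }));
```

Breadth-first visits every page at one depth before going deeper, while depth-first follows one branch to its end first; in this stack-based sketch the most recently found link is followed first.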
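The crawler from here on
Current challenges
✓ Nobody is going to line up 100 servers just to crawl with this, and it is a hassle
✓ I want a single command to deploy it to a distributed environment
What is the Serverless Framework?
✓ Closer to a configuration management "tool"
✓ Easily deploys and runs AWS Lambda, Azure Functions, and Google Cloud Functions
✓ Supports Node.js, Python, Java, Scala, C#, F#, Go, Groovy, Kotlin, PHP, and Swift
✓ Plenty of handy plugins
v2.0.0 will be…
yarn (npm run) deploy: deploy to AWS Lambda
yarn (npm run) start: start crawling in parallel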
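Can't I go down as the first person to build a crawler with headless Chrome?
Can't I go down as the first person to build a practical crawler with headless Chrome?
But honestly, I want to write more code at work
WE ARE HIRING: https://www.emin.co.jp/blog/news/1527/
Sales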