Spiders, Webbots and Scrapers with Geb

Talk at CodeMotion 2016

Sergio del Amo

November 18, 2016

Transcript

  1. SCRAPING WITH GEB. STEP 2: Understand how HTML is built and encapsulate it in Geb Pages and Modules.
  2. GEB PAGES: Define the interesting parts of your pages in a concise, maintainable and extensible manner.
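A minimal sketch of what "defining the interesting parts" can look like with Geb's content DSL. The class names `PostEntry` and `LatestPostsPage` and every CSS selector are hypothetical and are not taken from the talk. The sketch assumes `geb-core` and a WebDriver implementation are on the classpath.

```groovy
import geb.Module
import geb.Page

// Hypothetical module describing one blog post entry.
// The selectors are assumptions about the markup, not the real site structure.
class PostEntry extends Module {
    static content = {
        postTitle { $('h2.entry-title').text() }
        postUrl   { $('h2.entry-title a').attr('href') }
    }
}

// Hypothetical page that exposes every <article> element as a PostEntry module.
class LatestPostsPage extends Page {
    static url = 'http://sergiodelamo.es'

    static content = {
        posts { $('article').moduleList(PostEntry) }
    }
}
```

Keeping the selectors inside the page and module means that a change in the site's markup should only require touching these definitions, not the code that consumes `posts`.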
  3. GEB PAGES ARE BLUEPRINTS FOR YOUR HTML PAGES (Source: Wikia)

        def browser = new Browser()
        browser.go 'http://sergiodelamo.es'
        def hPage = browser.page HomePage
        hPage.subscribeToGroovyCalamari('[email protected]')
        def latestPostsPage = browser.page WordpressLatestPostsPage
        def posts = latestPostsPage.fetchPosts()
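The slide calls `subscribeToGroovyCalamari` and `fetchPosts` without showing the page classes behind them. The following is one possible sketch of such classes, not the speaker's implementation. The selectors and the shape of the returned data are assumptions, and `geb-core` plus a WebDriver implementation are assumed to be on the classpath.

```groovy
import geb.Page

// Hypothetical reconstruction: the talk does not show this class.
class HomePage extends Page {
    static url = 'http://sergiodelamo.es'

    static content = {
        emailField      { $('input', type: 'email') }   // assumed selector
        subscribeButton { $('input', type: 'submit') }  // assumed selector
    }

    // Wraps the form interaction so callers never touch selectors directly.
    void subscribeToGroovyCalamari(String email) {
        emailField.value(email)
        subscribeButton.click()
    }
}

// Hypothetical reconstruction: the talk does not show this class.
class WordpressLatestPostsPage extends Page {
    static content = {
        postLinks { $('h2.entry-title a') }  // assumed selector
    }

    // Returns plain maps so the scraped data can leave the browser context.
    List<Map<String, String>> fetchPosts() {
        postLinks.collect { link -> [title: link.text(), url: link.attr('href')] }
    }
}
```

Returning plain maps rather than Geb navigators is a design choice made for this sketch: the result stays usable after the browser is closed.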
  4. SCRAPING WITH GEB. STEP 4: Output the information
     ‣ java -jar output-all.jar
     ‣ Expose an API (e.g. AWS Lambda + API Gateway)
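One way to output scraped information, whether printed by a runnable jar or returned from an API, is to serialize it as JSON. This is a minimal sketch using Groovy's bundled `groovy.json.JsonOutput`. The sample `posts` data and the helper name `toJson` are hypothetical stand-ins for real scraping results.

```groovy
import groovy.json.JsonOutput

// Hypothetical sample data standing in for scraped posts.
def posts = [
    [title: 'First post',  url: 'http://example.com/first'],
    [title: 'Second post', url: 'http://example.com/second']
]

// Hypothetical helper: serializes a list of maps as pretty-printed JSON.
String toJson(List<Map> items) {
    JsonOutput.prettyPrint(JsonOutput.toJson(items))
}

println toJson(posts)
```

The same string could be written to standard output by a command-line entry point or used as the response body of an HTTP endpoint.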
  5. GEB EXAMPLE GRADLE (Marcin Erdmann): https://github.com/geb/geb-example-gradle

     The following commands will launch the tests with the individual browsers:

        ./gradlew chromeTest
        ./gradlew firefoxTest
        ./gradlew phantomJsTest

     To run with all, you can run:

        ./gradlew test
  6. DIFFERENT BROWSERS

     Run in HtmlUnit:

        $ ./gradlew -Dgeb.env=htmlUnit test

     Run in PhantomJS:

        $ ./gradlew -Dgeb.env=phantomJs -Dphantomjs.binary.path=./phantomjs-2.1.1-macosx/bin/phantomjs test

     Run in Firefox:

        $ ./gradlew -Dgeb.env=firefox test

     Run in Chrome:

        $ ./gradlew -Dgeb.env=chrome -Dwebdriver.chrome.driver=./chromedriver test
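The `geb.env` system property selects an environment block in `GebConfig.groovy`. The talk does not show that file, so the following is a sketch of how the four environment names used above might be mapped to drivers. It assumes the matching Selenium driver libraries are declared as dependencies.

```groovy
// GebConfig.groovy (hypothetical sketch, not the file used in the talk)
import org.openqa.selenium.chrome.ChromeDriver
import org.openqa.selenium.firefox.FirefoxDriver
import org.openqa.selenium.htmlunit.HtmlUnitDriver
import org.openqa.selenium.phantomjs.PhantomJSDriver

environments {
    // Each block name matches a value passed as -Dgeb.env=<name>.
    // Each closure creates the driver lazily, only when the environment is selected.
    htmlUnit  { driver = { new HtmlUnitDriver() } }
    phantomJs { driver = { new PhantomJSDriver() } }
    firefox   { driver = { new FirefoxDriver() } }
    chrome    { driver = { new ChromeDriver() } }
}
```

The driver binaries themselves are located through the `phantomjs.binary.path` and `webdriver.chrome.driver` system properties shown in the commands above.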
  7. ?