Spiders, Webbots and Scrapers with Geb

Talk given at Codemotion 2016

Sergio del Amo

November 18, 2016

Transcript

  1. SCRAPING WITH GEB: STEP 2, UNDERSTAND HOW HTML IS BUILT AND ENCAPSULATE HTML IN GEB PAGES AND MODULES
  2. GEB PAGES: Define the interesting parts of your pages in a concise, maintainable and extensible manner.
  3. GEB PAGES ARE BLUEPRINTS FOR YOUR HTML PAGES (Source: Wikia)
    def browser = new Browser()
    browser.go 'http://sergiodelamo.es'
    def hPage = browser.page HomePage
    hPage.subscribeToGroovyCalamari('[email protected]')
    def latestPostsPage = browser.page WordpressLatestPostsPage
    def posts = latestPostsPage.fetchPosts()
  4. SCRAPING WITH GEB: STEP 4, OUTPUT THE INFORMATION
    ‣ java -jar output-all.jar
    ‣ Expose an API (e.g. AWS Lambda + API Gateway)
  5. GEB EXAMPLE GRADLE (MARCIN ERDMANN): https://github.com/geb/geb-example-gradle
    The following commands will launch the tests with the individual browsers:
    ./gradlew chromeTest
    ./gradlew firefoxTest
    ./gradlew phantomJsTest
    To run with all, you can run:
    ./gradlew test
  6. DIFFERENT BROWSERS
    Run in HtmlUnit:
    $ ./gradlew -Dgeb.env=htmlUnit test
    Run in PhantomJS:
    $ ./gradlew -Dgeb.env=phantomJs -Dphantomjs.binary.path=./phantomjs-2.1.1-macosx/bin/phantomjs test
    Run in Firefox:
    $ ./gradlew -Dgeb.env=firefox test
    Run in Chrome:
    $ ./gradlew -Dgeb.env=chrome -Dwebdriver.chrome.driver=./chromedriver test
  7. ?
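
The slides call `WordpressLatestPostsPage` and `fetchPosts()` but do not show how they are defined. The following is a minimal sketch of how such a page and a module could look in Geb. It is not the speaker's implementation. The class and method names `WordpressLatestPostsPage` and `fetchPosts()` and the URL come from the slides. Everything else is assumed for illustration: the `PostModule` class, the `scrapeLatestPosts()` helper, and the CSS selectors (`article`, `h2 a`), which depend on the real markup of the site. It also assumes Geb and a WebDriver implementation are on the classpath, for example through the geb-example-gradle project.

```groovy
import geb.Browser
import geb.Module
import geb.Page

// Hypothetical module: wraps the HTML of a single post in a listing.
class PostModule extends Module {
    static content = {
        // Assumed selectors; adjust them to the real markup.
        postTitle { $('h2 a').text() }
        postUrl { $('h2 a').getAttribute('href') }
    }
}

// Page name and URL taken from the slides; the body is an assumption.
class WordpressLatestPostsPage extends Page {
    static url = 'http://sergiodelamo.es'

    static content = {
        // One PostModule per <article> element (assumed markup).
        posts { $('article').moduleList(PostModule) }
    }

    // Returns plain maps so callers do not depend on Geb types.
    List<Map> fetchPosts() {
        posts.collect { PostModule post ->
            [title: post.postTitle, url: post.postUrl]
        }
    }
}

// Hypothetical helper: opens the page, scrapes it and closes the browser.
List<Map> scrapeLatestPosts() {
    def browser = new Browser()
    try {
        WordpressLatestPostsPage page = browser.to(WordpressLatestPostsPage)
        return page.fetchPosts()
    } finally {
        browser.quit()
    }
}
```

Calling `scrapeLatestPosts()` would return one map per post. Keeping the selectors inside the page and the module means that a change in the site's HTML only has to be fixed in one place.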