
Create Go WebDriver client from scratch

Web application developers often use Selenium to automate UI-level browser operations.

A WebDriver client, the foundation of browser automation, can be built with Go standard packages such as net/http.

In this talk, I'll briefly cover the W3C Recommendation for WebDriver, and then explain how to implement a WebDriver client in Go.

Kazuki Higashiguchi

November 13, 2021

Transcript

  1. Create Go WebDriver client from scratch

    2021.11.13, Go Conference 2021 Autumn. Kazuki Higashiguchi (@hgsgtk). © 2012-2019 BASE, Inc. © 2012-2021 BASE BANK, Inc.
  2. Goal of the talk

    • Learn about WebDriver. You'll be able to imagine the implementation of browser automation. • See demo Go code to gain in-depth knowledge of the relevant technologies. • You'll want to create something from scratch with Go's nice standard libraries.
  3. Kazuki Higashiguchi

    Engineering Manager @ BASE BANK, a subsidiary of BASE. Gopher for 4+ years. • Twitter: @hgsgtk • GitHub: @hgsgtk
  4. Selenium WebDriver

    Selenium WebDriver is an API that allows us to write automated tests for web applications. • Interfaces to discover and manipulate DOM elements and to control user agents • Supports different browsers through their drivers (ChromeDriver, geckodriver, Microsoft Edge Driver, etc.) • Works with many programming languages (JavaScript, Python, Ruby, Go, etc.) ◦ Famous Go libraries: tebeka/selenium, sclevine/agouti
  5. The specification of WebDriver

    The W3C Recommendation describes the specification of WebDriver.
  6. Communication protocol

    • Provides an HTTP-compliant wire protocol • Consists of a local end and a remote end *1 [Diagram: Local end → HTTP → Remote end (ChromeDriver) → User agent] *1 There are two node types of remote end: intermediary node and endpoint node. See the W3C document for details.
  7. WebDriver protocol is organised into commands

    An HTTP request with a method and URL defined in the W3C specification represents a single command, and each command produces a single HTTP response. [Diagram: Local end → HTTP → Remote end (ChromeDriver) → User agent]
  8. Commands and information structure

    Commands: • New session: POST /session • Find element: POST /session/{session id}/element • Element click: POST /session/{session id}/element/{element id}/click • Take screenshot: GET /session/{session id}/screenshot • New tab window: POST /session/{session id}/window/new • Check status: GET /status, etc. [Diagram: resource tree. A session (session id) contains timeouts, element, screenshot, cookie, frame, window, etc.; an element (element id) contains text, click, etc.; status sits outside the session.]
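A minimal sketch of how the command list above could be represented in Go. The `command` type, the variable names, and the `endpoint` helper are hypothetical names chosen for illustration, not part of gowd or any library; the methods and path templates are the ones listed on the slide.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// command pairs an HTTP method with a URL path template.
// This type is an illustrative assumption, not gowd's API.
type command struct {
	method string
	path   string
}

var (
	newSession     = command{http.MethodPost, "/session"}
	findElement    = command{http.MethodPost, "/session/{session id}/element"}
	elementClick   = command{http.MethodPost, "/session/{session id}/element/{element id}/click"}
	takeScreenshot = command{http.MethodGet, "/session/{session id}/screenshot"}
	newWindow      = command{http.MethodPost, "/session/{session id}/window/new"}
	status         = command{http.MethodGet, "/status"}
)

// endpoint fills the {session id} and {element id} placeholders.
// Templates without a placeholder are returned unchanged.
func (c command) endpoint(sessionID, elementID string) string {
	r := strings.NewReplacer("{session id}", sessionID, "{element id}", elementID)
	return r.Replace(c.path)
}

func main() {
	// "s1" and "e1" are placeholder ids for the example.
	fmt.Println(elementClick.method, elementClick.endpoint("s1", "e1"))
	fmt.Println(status.method, status.endpoint("", ""))
}
```

Keeping method and path together in one value means each command maps to exactly one HTTP request, which mirrors how the protocol is organised.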
  9. Remote end listens for incoming HTTP requests

    For instance, ChromeDriver launched locally starts listening on port 9515. [Diagram: Local end → Remote end (ChromeDriver, listening on port 9515) → User agent]
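One way to check that a remote end is listening is the GET /status command from the command list. The sketch below assumes ChromeDriver is running locally on port 9515 and that the response body has the `{"value": {"ready": ..., "message": ...}}` shape described in the W3C specification; `statusResponse` and `parseStatus` are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

// statusResponse mirrors the assumed /status response body.
type statusResponse struct {
	Value struct {
		Ready   bool   `json:"ready"`
		Message string `json:"message"`
	} `json:"value"`
}

// parseStatus decodes a /status response body.
func parseStatus(body []byte) (bool, string, error) {
	var res statusResponse
	if err := json.Unmarshal(body, &res); err != nil {
		return false, "", err
	}
	return res.Value.Ready, res.Value.Message, nil
}

func main() {
	// Assumes a locally launched ChromeDriver on its default port.
	resp, err := http.Get("http://localhost:9515/status")
	if err != nil {
		fmt.Println("remote end not reachable:", err)
		return
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Println("read response:", err)
		return
	}
	ready, message, err := parseStatus(body)
	if err != nil {
		fmt.Println("decode response:", err)
		return
	}
	fmt.Println("ready:", ready, "message:", message)
}
```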
  10. gowd: self-made WebDriver client

    gowd is a Go WebDriver client that implements the local end of the protocol (https://github.com/hgsgtk/gowd) *1 [Diagram: Local end (gowd) → HTTP → Remote end (ChromeDriver) → User agent] *1 This is just for the presentation at Go Conference 2021 Autumn.
  11. Features of gowd

    Feature list: ◦ Open browser / Close browser ◦ Navigate to page / Get current URL ◦ Find element / Get element text ◦ Click element ◦ New window tab ◦ Take screenshot. Uses only the Go standard libraries.
  12. Common browser automation code steps

    • Open browser • Navigate to page • Find element • Click element • Take screenshot • Close browser
  13. Deep dive > Sub agenda

    1. Open browser 2. Close browser 3. Navigate to page 4. Find element 5. Click element 6. Take screenshot
  14. Deep dive > 1. Open browser
  15. 1. Open browser

    Common interface of WebDriver client libraries: • Create a WebDriver (e.g. NewWebDriver(), NewChromeDriver()...) • Open a browser (e.g. driver.New()...)
  16. Open browser via curl

    1. Send an HTTP request 2. A JSON response is returned from ChromeDriver 3. The browser opens
  17. Internal implementation to open browser

    1. Send a POST /session request a. This opens a new browser 2. Decode the JSON response body from the remote end 3. Keep the session id to use afterwards
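The three steps above can be sketched with net/http and encoding/json. This is not gowd's actual code: `openBrowser` and `parseSessionID` are hypothetical names, the capabilities payload is an assumed minimal request body, and the response is assumed to carry the id under `value.sessionId` as in the W3C specification.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

// newSessionResponse keeps only the field needed afterwards.
type newSessionResponse struct {
	Value struct {
		SessionID string `json:"sessionId"`
	} `json:"value"`
}

// parseSessionID is step 2: decode the JSON response body.
func parseSessionID(body []byte) (string, error) {
	var res newSessionResponse
	if err := json.Unmarshal(body, &res); err != nil {
		return "", err
	}
	if res.Value.SessionID == "" {
		return "", fmt.Errorf("no session id in response: %s", body)
	}
	return res.Value.SessionID, nil
}

// openBrowser is step 1 (POST /session) plus step 3 (return the
// session id so the caller can keep it).
func openBrowser(remoteEnd string) (string, error) {
	// Assumed minimal capabilities payload.
	payload := []byte(`{"capabilities":{"alwaysMatch":{"browserName":"chrome"}}}`)
	resp, err := http.Post(remoteEnd+"/session", "application/json", bytes.NewReader(payload))
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return parseSessionID(body)
}

func main() {
	// Assumes ChromeDriver is listening locally on port 9515.
	id, err := openBrowser("http://localhost:9515")
	if err != nil {
		fmt.Println("could not open browser:", err)
		return
	}
	fmt.Println("session id:", id)
}
```

Splitting the decoding into its own function keeps the JSON handling testable without a running remote end.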
  18. One session represents one browser

    [Diagram: resource tree with Session highlighted. A session (session id) contains timeouts, element, screenshot, cookie, frame, window, etc.; an element (element id) contains text, click, etc.; status sits outside the session.]
  19. Deep dive > 2. Close browser
  20. 2. Close browser

    Browser.Close() closes the browser. It sends a DELETE /session/{session id} request, and the browser identified by the session id closes.
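A possible shape for this step using only net/http. Since net/http has no `http.Delete` helper, the request is built with `http.NewRequest` and sent through a client. `deleteSessionRequest` and `closeBrowser` are hypothetical names, and the session id in `main` is a placeholder.

```go
package main

import (
	"fmt"
	"net/http"
)

// deleteSessionRequest builds DELETE /session/{session id}.
func deleteSessionRequest(remoteEnd, sessionID string) (*http.Request, error) {
	return http.NewRequest(http.MethodDelete, remoteEnd+"/session/"+sessionID, nil)
}

// closeBrowser sends the request and treats any non-200 status as an error.
func closeBrowser(remoteEnd, sessionID string) error {
	req, err := deleteSessionRequest(remoteEnd, sessionID)
	if err != nil {
		return err
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("delete session: unexpected status %s", resp.Status)
	}
	return nil
}

func main() {
	// Placeholder id; a real one comes from the new session response.
	err := closeBrowser("http://localhost:9515", "hypothetical-session-id")
	if err != nil {
		fmt.Println("could not close browser:", err)
	}
}
```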
  21. Deep dive > 3. Navigate to page
  22. 3. Navigate to page

    Browser.NavigateTo() navigates the browser to a given URL. • Send a POST /session/{session id}/url request ◦ Specify the URL in the request body
  23. Deep dive > 4. Find element
  24. 4. Find element

    Browser.FindElement() finds an element in the current browser using a locator strategy (“link text”). The code example looks for the element whose link text is “More information…”.
  25. Locator strategy

    An element locator strategy is an enumerated attribute that determines how to search for elements in the current browser. • css selector: “div > a” • link text: “More information…” • partial link text: “Mor” • tag name: “h1” • xpath: “//div/a”
  26. Internal implementation to find element

    1. Send a POST /session/{session id}/element request a. Specify the locator strategy and search value in the request body 2. Decode the JSON response body from the remote end 3. Keep the element id to use afterwards
  27. Internal implementation to get element id

    Get the element id from the JSON response. element-6066-11e4-a52e-4f735466cecf is a string constant called the web element identifier. *1 *1 The old WebDriver JSON protocol uses the `ELEMENT` key.
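Extracting the element id could look like the sketch below. It assumes the response has the shape `{"value": {"element-6066-11e4-a52e-4f735466cecf": "<id>"}}` and falls back to the legacy `ELEMENT` key mentioned in the footnote; `parseElementID` and the sample id are hypothetical.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

const (
	// webElementIdentifier is the key defined by the W3C specification.
	webElementIdentifier = "element-6066-11e4-a52e-4f735466cecf"
	// legacyElementKey is used by the old WebDriver JSON protocol.
	legacyElementKey = "ELEMENT"
)

// parseElementID pulls the element id out of a find element response.
func parseElementID(body []byte) (string, error) {
	var res struct {
		Value map[string]string `json:"value"`
	}
	if err := json.Unmarshal(body, &res); err != nil {
		return "", err
	}
	if id, ok := res.Value[webElementIdentifier]; ok {
		return id, nil
	}
	if id, ok := res.Value[legacyElementKey]; ok {
		return id, nil
	}
	return "", errors.New("no element id in response")
}

func main() {
	// Sample body with a placeholder element id.
	body := []byte(`{"value":{"element-6066-11e4-a52e-4f735466cecf":"hypothetical-element-id"}}`)
	id, err := parseElementID(body)
	if err != nil {
		fmt.Println("decode response:", err)
		return
	}
	fmt.Println(id)
}
```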
  28. Element represents an element

    [Diagram: resource tree with element highlighted. A session (session id) contains timeouts, element, screenshot, cookie, frame, window, etc.; an element (element id) contains text, click, etc.; status sits outside the session.]
  29. Deep dive > 5. Click element
  30. 5. Click element

    Element.Click() clicks the element identified by the element id. It sends a POST /session/{session id}/element/{element id}/click request.
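A sketch of the click request. `clickRequest` is a hypothetical helper and both ids are placeholders. The empty JSON object in the body is an assumption: the command takes no parameters, but remote ends generally expect a JSON body on POST requests.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// clickRequest builds POST /session/{session id}/element/{element id}/click.
func clickRequest(remoteEnd, sessionID, elementID string) (*http.Request, error) {
	endpoint := remoteEnd + "/session/" + sessionID + "/element/" + elementID + "/click"
	// Assumed: an empty JSON object as the request body.
	req, err := http.NewRequest(http.MethodPost, endpoint, strings.NewReader("{}"))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	req, err := clickRequest("http://localhost:9515", "hypothetical-session-id", "hypothetical-element-id")
	if err != nil {
		fmt.Println("build request:", err)
		return
	}
	fmt.Println(req.Method, req.URL.Path)
	// To run the command for real, pass req to http.DefaultClient.Do.
}
```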
  31. Deep dive > 6. Take screenshot
  32. 6. Take screenshot

    Browser.TakeScreenshot() takes a screenshot of the current browser. [Image: example.com.png]
  33. Internal implementation to take screenshot

    1. Send GET /session/{session id}/screenshot 2. Decode the JSON response body from the remote end 3. Decode the Base64-encoded screenshot image [Diagram: Remote end (ChromeDriver) → Base64-encoded string → Local end]
  34. Remote end returns base64-encoded PNG image

    The W3C Recommendation describes the specification of screenshot. • Dumps a snapshot as a lossless PNG image *1 *2 • The PNG image is returned as a Base64-encoded string. *1 PNG was created as a lossless image format. It's supposed to preserve all details of an image exactly. *2 Lossy formats (like JPEG) produce much smaller files because they don't save unnecessary details. ChromeDriver / web_view.h: https://source.chromium.org/chromium/chromium/src/+/master:chrome/test/chromedriver/chrome/web_view.h;l=232
  35. Deep dive into Base64 encoding

    There are two types of “Base64 encoding” defined in RFC4648 and RFC3548. • Base64: Base 64 Encoding • Base64url: Base 64 Encoding with URL and Filename Safe Alphabet. src/encoding/base64/base64.go: https://pkg.go.dev/encoding/base64#pkg-variables
  36. Difference between Base64 and Base64url

    In Base64url, + is replaced with - and / is replaced with _. src/encoding/base64/base64.go: https://cs.opensource.google/go/go/+/refs/tags/go1.17.3:src/encoding/base64/base64.go;l=35
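The difference can be seen with `base64.StdEncoding` and `base64.URLEncoding` from encoding/base64. The input bytes below are chosen for illustration so that the encoded form uses the two characters that differ; `encodeBoth` is a hypothetical helper.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// encodeBoth returns the Base64 and Base64url encodings of data.
func encodeBoth(data []byte) (std, url string) {
	return base64.StdEncoding.EncodeToString(data), base64.URLEncoding.EncodeToString(data)
}

func main() {
	// 0xfb 0xff maps to alphabet indexes 62, 63 and 60, so the first
	// two output characters are exactly the ones that differ.
	std, url := encodeBoth([]byte{0xfb, 0xff})
	fmt.Println(std) // +/8=
	fmt.Println(url) // -_8=
}
```

Both encodings keep the `=` padding; only the alphabet entries at indexes 62 and 63 change.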
  37. More Base N data encodings

    Standard | Definition and use case | Go implementation
    Base16 | RFC4648 |
    Base32 | RFC4648 | encoding/base32
    Base36 | Used by URL shortening services (e.g. TinyURL) *1 |
    Base45 | Draft IETF Specification | Dasio/base45
    Base58 | Used by bitcoin addresses | itchyny/base58-go
    Base64 | RFC4648 | encoding/base64
    Base85 (Ascii85) | RFC1924; used by Adobe's PostScript and PDF |
    Base91 (basE91) | Developed by Joachim Henke |
    Base92, 94, 95 | - |
    *1 https://en.wikipedia.org/wiki/Binary-to-text_encoding#Base58
  38. Summary

    • WebDriver is a set of interfaces to discover and manipulate DOM elements and to control user agents. • A WebDriver client communicates with the remote end (e.g. ChromeDriver) over HTTP. • A WebDriver client can be built with only the Go standard libraries (e.g. net/http, encoding/json, encoding/base64).
  39. More BASE! #basebank-code-reading-ja

    BASE BANK holds a Go code reading party on a regular basis. The next session will be on 2021.11.25 (Thu). • #basebank-code-reading-ja (in the Gophers Slack workspace) • basebank/gophers-code-reading-party (GitHub repository) https://github.com/basebank/gophers-code-reading-party/issues/16