Create Go WebDriver client from scratch

Web application developers often use Selenium to automate UI-level browser operations.

A WebDriver client, the foundation of browser automation, can be built with Go standard packages such as net/http.

In this talk, I'll briefly cover the WebDriver W3C Recommendation and then explain how to implement a WebDriver client in Go.

Kazuki Higashiguchi

November 13, 2021

Transcript

  1. © 2012-2019 BASE, Inc.
    © 2012-2021 BASE BANK, Inc.
    Create Go WebDriver client from scratch
    2021.11.13 Go Conference 2021 Autumn
    Kazuki Higashiguchi (@hgsgtk)

  2. Goal of the talk
    ● See demo Go code to gain in-depth knowledge of the relevant technologies.
    ● Learn about WebDriver. You’ll be able to imagine how browser automation is implemented.
    ● You’ll want to create something from scratch with Go’s nice standard libraries.

  3. Kazuki Higashiguchi
    Engineering Manager @ BASE BANK, a subsidiary of BASE
    Gopher for 4+ years
    > Twitter: @hgsgtk
    > GitHub: @hgsgtk

  4. Overview of WebDriver

  5. Selenium WebDriver
    Selenium WebDriver is an API that allows us to write automated tests for web applications.
    ● Provides interfaces to discover and manipulate DOM elements and to control user agents
    ● Supports a range of browsers through drivers (ChromeDriver, geckodriver, Microsoft Edge Driver, etc.)
    ● Can be used from many programming languages (JavaScript, Python, Ruby, Go, etc.)
    ○ Famous Go libraries: tebeka/selenium, sclevine/agouti

  6. The specification of WebDriver
    The WebDriver specification is published as a W3C Recommendation.

  7. Communication protocol
    ● Provides an HTTP-compliant wire protocol
    ● Consists of a local end and a remote end *1
    [Diagram: Local end → HTTP → Remote end (ChromeDriver) → User Agent]
    *1 There are two types of remote end node: intermediary nodes and endpoint nodes. See the W3C document for details.

  8. WebDriver protocol is organised into commands
    An HTTP request with a method and URL defined in the W3C specification represents a single command, and each command produces a single HTTP response.
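
As a minimal sketch of "one command, one HTTP request", the snippet below models a command as a method, a path and an optional JSON body. The `command` type, the `newRequest` helper and the base URL are hypothetical names chosen for illustration; they are not taken from gowd.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// command is a hypothetical helper type: one WebDriver command is one
// HTTP method plus a URL path, optionally with a JSON body.
type command struct {
	Method string
	Path   string
	Body   interface{} // nil for commands without parameters
}

// newRequest turns a command into an *http.Request against baseURL.
func newRequest(baseURL string, c command) (*http.Request, error) {
	var buf bytes.Buffer
	if c.Body != nil {
		if err := json.NewEncoder(&buf).Encode(c.Body); err != nil {
			return nil, err
		}
	}
	req, err := http.NewRequest(c.Method, baseURL+c.Path, &buf)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	// Assumes a remote end such as ChromeDriver listening locally.
	req, err := newRequest("http://localhost:9515", command{Method: http.MethodGet, Path: "/status"})
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String())
}
```

Sending the request with `http.DefaultClient.Do(req)` would then yield the single HTTP response for that command.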

  9. Commands and information structure
    Commands
    ● New session: POST /session
    ● Find element: POST /session/{session id}/element
    ● Element click: POST /session/{session id}/element/{element id}/click
    ● Take screenshot: GET /session/{session id}/screenshot
    ● New window (tab): POST /session/{session id}/window/new
    ● Check status: GET /status
    ● ...etc
    [Diagram: information structure. status; session (session id) with timeouts, screenshot, cookie, frame, window, element, ...etc; element (element id) with text, click, ...etc]
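
The command list above maps naturally onto a small table of method and path templates. This is only a sketch: the `endpoint` type and the variable names are hypothetical, and the ids are assumed to be URL-safe strings.

```go
package main

import "fmt"

// endpoint pairs an HTTP method with a path template from the command list.
type endpoint struct {
	Method string
	Format string // fmt-style template; each %s is filled with an id
}

var (
	newSession     = endpoint{"POST", "/session"}
	findElement    = endpoint{"POST", "/session/%s/element"}
	elementClick   = endpoint{"POST", "/session/%s/element/%s/click"}
	takeScreenshot = endpoint{"GET", "/session/%s/screenshot"}
	newWindow      = endpoint{"POST", "/session/%s/window/new"}
	checkStatus    = endpoint{"GET", "/status"}
)

// path fills the template with the given session and element ids.
func (e endpoint) path(ids ...interface{}) string {
	return fmt.Sprintf(e.Format, ids...)
}

func main() {
	// "SESSION" and "ELEMENT" are placeholders for ids returned by the remote end.
	fmt.Println(elementClick.Method, elementClick.path("SESSION", "ELEMENT"))
	fmt.Println(checkStatus.Method, checkStatus.path())
}
```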

  10. Remote end listens for incoming HTTP requests
    For instance, ChromeDriver launched locally will start listening on port 9515.
    [Diagram: Local end → Remote end (ChromeDriver, listening on port 9515) → User Agent]
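
A quick way to confirm that a remote end is listening is the Check status command (GET /status). The sketch below assumes ChromeDriver on localhost:9515 and a response shaped like `{"value":{"ready":...,"message":...}}`; the `status` type and `decodeStatus` helper are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

// status mirrors the "value" object of a status response.
type status struct {
	Ready   bool   `json:"ready"`
	Message string `json:"message"`
}

// decodeStatus extracts the status from a JSON response body.
func decodeStatus(body []byte) (status, error) {
	var res struct {
		Value status `json:"value"`
	}
	err := json.Unmarshal(body, &res)
	return res.Value, err
}

func main() {
	resp, err := http.Get("http://localhost:9515/status")
	if err != nil {
		fmt.Println("remote end is not reachable:", err)
		return
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Println("could not read response:", err)
		return
	}
	st, err := decodeStatus(body)
	if err != nil {
		fmt.Println("could not decode response:", err)
		return
	}
	fmt.Printf("ready=%t message=%q\n", st.Ready, st.Message)
}
```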

  11. WebDriver client created from scratch

  12. gowd: self-made WebDriver client
    gowd: a Go WebDriver client that implements the local end of the protocol (https://github.com/hgsgtk/gowd) *1
    [Diagram: Local end → HTTP → Remote end (ChromeDriver) → User Agent]
    *1 This is just for the presentation at Go Conference 2021 Autumn.

  13. Features of gowd
    Feature list
    ○ Open browser / Close browser
    ○ Navigate to page / Get current URL
    ○ Find element / Get element text
    ○ Click element
    ○ New window (tab)
    ○ Take screenshot
    Using only the Go standard libraries

  14. Deep dive into WebDriver client

  15. Common browser automation code steps
    [Diagram: Open browser → Navigate to page → Find element → Click element → Take screenshot → Close browser]

  16. Deep dive > Sub agenda
    1. Open browser
    2. Close browser
    3. Navigate to page
    4. Find element
    5. Click element
    6. Take screenshot

  17. Deep dive > 1. Open browser

  18. 1. Open browser
    Common interface of WebDriver client libraries:
    ● Create a WebDriver (ex. NewWebDriver(), NewChromeDriver()...)
    ● Open a browser (ex. driver.New()...)

  19. Open browser via curl
    1. Send an HTTP request
    2. A JSON response is returned from ChromeDriver
    3. The browser is opened

  20. Internal implementation to open browser
    1. Send POST /session request
    a. Will open a new browser
    2. Decode JSON response body from the remote end
    3. Keep session id to use afterwards
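
The three steps above can be sketched with net/http and encoding/json. This is not gowd's actual code: `openBrowser` and `decodeSessionID` are hypothetical names, and the capabilities payload is an assumption about a minimal New session request that ChromeDriver accepts.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

// decodeSessionID pulls value.sessionId out of a New session response body.
func decodeSessionID(body []byte) (string, error) {
	var res struct {
		Value struct {
			SessionID string `json:"sessionId"`
		} `json:"value"`
	}
	if err := json.Unmarshal(body, &res); err != nil {
		return "", err
	}
	if res.Value.SessionID == "" {
		return "", fmt.Errorf("no session id in response: %s", body)
	}
	return res.Value.SessionID, nil
}

// openBrowser sends POST /session and returns the session id to keep.
func openBrowser(baseURL string) (string, error) {
	// Assumed minimal capabilities; adjust for the browser in use.
	payload := []byte(`{"capabilities":{"alwaysMatch":{"browserName":"chrome"}}}`)
	resp, err := http.Post(baseURL+"/session", "application/json", bytes.NewReader(payload))
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return decodeSessionID(body)
}

func main() {
	id, err := openBrowser("http://localhost:9515")
	if err != nil {
		fmt.Println("could not open browser:", err)
		return
	}
	fmt.Println("session id:", id)
}
```

Note that this sketch leaves the browser open; the session id is what later commands, including the one that closes the browser, need.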

  21. One session represents one browser
    [Diagram: information structure (status, session, element), as on slide 9]

  22. Deep dive > 2. Close browser

  23. 2. Close browser
    Browser.Close() closes the browser.
    Send DELETE /session/{session id} request.
    The browser identified by the session id will close.
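
A rough sketch of the DELETE request follows. net/http has no `http.Delete` shortcut, so the request is built with `http.NewRequest`. The function names and the "SESSION_ID" placeholder are hypothetical, and treating any non-200 status as an error is a simplification.

```go
package main

import (
	"fmt"
	"net/http"
)

// deleteSessionRequest builds the DELETE /session/{session id} request.
func deleteSessionRequest(baseURL, sessionID string) (*http.Request, error) {
	return http.NewRequest(http.MethodDelete, baseURL+"/session/"+sessionID, nil)
}

// closeBrowser sends the request and reports any non-200 status as an error.
func closeBrowser(client *http.Client, baseURL, sessionID string) error {
	req, err := deleteSessionRequest(baseURL, sessionID)
	if err != nil {
		return err
	}
	resp, err := client.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("delete session: unexpected status %s", resp.Status)
	}
	return nil
}

func main() {
	// "SESSION_ID" stands in for the id kept when the browser was opened.
	if err := closeBrowser(http.DefaultClient, "http://localhost:9515", "SESSION_ID"); err != nil {
		fmt.Println("could not close browser:", err)
	}
}
```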

  24. Deep dive > 3. Navigate to page

  25. 3. Navigate to page
    Browser.NavigateTo() navigates the browser to a given URL.
    ● Send POST /session/{session id}/url request
    ○ Specify the url in the request body
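
As an illustration, the request body is a JSON object with a single `url` field. The helper names below are hypothetical and error handling is kept to a minimum.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// navigateBody encodes the JSON body carrying the target URL.
func navigateBody(url string) ([]byte, error) {
	return json.Marshal(map[string]string{"url": url})
}

// navigateTo sends POST /session/{session id}/url.
func navigateTo(baseURL, sessionID, url string) error {
	body, err := navigateBody(url)
	if err != nil {
		return err
	}
	resp, err := http.Post(baseURL+"/session/"+sessionID+"/url", "application/json", bytes.NewReader(body))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("navigate: unexpected status %s", resp.Status)
	}
	return nil
}

func main() {
	body, err := navigateBody("https://example.com")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body)) // prints {"url":"https://example.com"}

	// "SESSION_ID" stands in for the id kept when the browser was opened.
	if err := navigateTo("http://localhost:9515", "SESSION_ID", "https://example.com"); err != nil {
		fmt.Println("could not navigate:", err)
	}
}
```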

  26. Deep dive > 4. Find element

  27. 4. Find element
    Browser.FindElement() finds an element in the current browser using a locator strategy (“link text”).
    The code example tries to find the element whose link text is “More information…”.

  28. Locator strategy
    An element locator strategy is an enumerated attribute that determines how to search for elements in the current browser.
    ● css selector: “div > a”
    ● link text: “More information…”
    ● partial link text: “Mor”
    ● tag name: “”
    ● xpath: “//div/a”
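
Because the set of strategies is fixed, a client can model it as a string type with constants. This is one possible representation; the type and method names are hypothetical.

```go
package main

import "fmt"

// locatorStrategy is the value sent as "using" when searching for elements.
type locatorStrategy string

const (
	cssSelector     locatorStrategy = "css selector"
	linkText        locatorStrategy = "link text"
	partialLinkText locatorStrategy = "partial link text"
	tagName         locatorStrategy = "tag name"
	xpath           locatorStrategy = "xpath"
)

// valid reports whether s is one of the five strategies listed above.
func (s locatorStrategy) valid() bool {
	switch s {
	case cssSelector, linkText, partialLinkText, tagName, xpath:
		return true
	}
	return false
}

func main() {
	for _, s := range []locatorStrategy{linkText, "id"} {
		fmt.Printf("%q valid: %t\n", s, s.valid())
	}
}
```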

  29. Internal implementation to find element
    1. Send POST /session/{session id}/element request
    a. Specify the locator strategy and search value in the request body
    2. Decode JSON response body from the remote end
    3. Keep element id to use afterwards
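
Step 1 could look roughly like this. The strategy goes in the `using` field and the search value in `value`; the struct and function names are hypothetical, and "SESSION_ID" is a placeholder.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// findElementParams is the JSON body of the Find element command.
type findElementParams struct {
	Using string `json:"using"` // locator strategy, e.g. "link text"
	Value string `json:"value"` // search value for that strategy
}

// findElementBody encodes the locator strategy and search value.
func findElementBody(using, value string) ([]byte, error) {
	return json.Marshal(findElementParams{Using: using, Value: value})
}

// findElementRequest builds POST /session/{session id}/element.
func findElementRequest(baseURL, sessionID, using, value string) (*http.Request, error) {
	body, err := findElementBody(using, value)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/session/"+sessionID+"/element", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	req, err := findElementRequest("http://localhost:9515", "SESSION_ID", "link text", "More information...")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.Path)
}
```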

  30. Internal implementation to get element id
    Get the element id from the JSON response.
    element-6066-11e4-a52e-4f735466cecf is a string constant called the web element identifier. *1
    *1 The old WebDriver JSON protocol uses the `ELEMENT` key.
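
A sketch of the lookup, assuming the response has the shape `{"value":{"<web element identifier>":"<element id>"}}`. The fallback to the legacy `ELEMENT` key is optional, and the names are hypothetical.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

const (
	// webElementIdentifier is the fixed key defined by the W3C specification.
	webElementIdentifier = "element-6066-11e4-a52e-4f735466cecf"
	// legacyElementKey is the key used by the old JSON wire protocol.
	legacyElementKey = "ELEMENT"
)

// decodeElementID extracts the element id from a Find element response body.
func decodeElementID(body []byte) (string, error) {
	var res struct {
		Value map[string]string `json:"value"`
	}
	if err := json.Unmarshal(body, &res); err != nil {
		return "", err
	}
	if id, ok := res.Value[webElementIdentifier]; ok {
		return id, nil
	}
	if id, ok := res.Value[legacyElementKey]; ok {
		return id, nil
	}
	return "", errors.New("no element id in response")
}

func main() {
	// Made-up sample response; a real element id is an opaque string.
	sample := []byte(`{"value":{"element-6066-11e4-a52e-4f735466cecf":"abc-123"}}`)
	id, err := decodeElementID(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println("element id:", id)
}
```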

  31. Element represents an element
    [Diagram: information structure (status, session, element), as on slide 9]

  32. Deep dive > 5. Click element

  33. 5. Click element
    Element.Click() clicks the element identified by the element id.
    Send POST /session/{session id}/element/{element id}/click request
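
The click command takes no parameters, but a remote end may still expect a JSON body on POST, so this sketch sends an empty object `{}` as an assumption. `clickRequest` and the placeholder ids are hypothetical.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// clickRequest builds POST /session/{session id}/element/{element id}/click
// with an empty JSON object as its body.
func clickRequest(baseURL, sessionID, elementID string) (*http.Request, error) {
	url := baseURL + "/session/" + sessionID + "/element/" + elementID + "/click"
	req, err := http.NewRequest(http.MethodPost, url, strings.NewReader("{}"))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	// Placeholders for the ids kept from the earlier commands.
	req, err := clickRequest("http://localhost:9515", "SESSION_ID", "ELEMENT_ID")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.Path)
}
```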

  34. Deep dive > 6. Take screenshot

  35. 6. Take screenshot
    Browser.TakeScreenshot() takes a screenshot of the current browser.
    [Image: example.com.png]

  36. Internal implementation to take screenshot
    1. Send GET /session/{sessionID}/screenshot
    2. Decode JSON response body from the remote end
    3. Decode the Base64-encoded screenshot image
    [Diagram: Remote end (ChromeDriver) → Base64-encoded string → Local end]
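
Steps 2 and 3 can be sketched offline. The sample response below is made up for illustration: its value is only the Base64 form of the 8-byte PNG signature, not a real screenshot, and `decodeScreenshot` is a hypothetical name.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// decodeScreenshot decodes the JSON body, then the Base64 string in "value".
func decodeScreenshot(body []byte) ([]byte, error) {
	var res struct {
		Value string `json:"value"`
	}
	if err := json.Unmarshal(body, &res); err != nil {
		return nil, err
	}
	return base64.StdEncoding.DecodeString(res.Value)
}

func main() {
	// Made-up response carrying only the PNG signature bytes.
	sample := []byte(`{"value":"iVBORw0KGgo="}`)
	png, err := decodeScreenshot(sample)
	if err != nil {
		fmt.Println("could not decode screenshot:", err)
		return
	}
	fmt.Printf("%d bytes, starts with %q\n", len(png), png[1:4]) // prints: 8 bytes, starts with "PNG"
}
```

With a real response, the decoded bytes could be written to a file such as example.com.png using `os.WriteFile`.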

  37. Remote end returns base64-encoded PNG image
    The W3C Recommendation describes the specification of screenshot.
    ● Dumps a snapshot as a lossless PNG image *1*2
    ● The PNG image is returned as a Base64-encoded string
    ChromeDriver / web_view.h: https://source.chromium.org/chromium/chromium/src/+/master:chrome/test/chromedriver/chrome/web_view.h;l=232
    *1 PNG was created as a lossless image format. It’s supposed to preserve all details of an image exactly.
    *2 Lossy formats (like JPEG) produce much smaller files because they don’t save unnecessary details.

  38. Deep dive into Base64 encoding
    There are two types of “Base64 encoding” defined in RFC 4648 (and its predecessor RFC 3548).
    ● Base64: Base 64 Encoding
    ● Base64url: Base 64 Encoding with URL and Filename Safe Alphabet
    src/encoding/base64/base64.go: https://pkg.go.dev/encoding/base64#pkg-variables

  39. Difference between Base64 and Base64url
    In Base64url, the following characters are replaced:
    ● + -> -
    ● / -> _
    src/encoding/base64/base64.go: https://cs.opensource.google/go/go/+/refs/tags/go1.17.3:src/encoding/base64/base64.go;l=35
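
A small demonstration of the difference, using `base64.StdEncoding` and `base64.URLEncoding` from the standard library. The two input bytes are chosen so that the encoded output contains both replaced characters; `encodeBoth` is a hypothetical helper.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// encodeBoth returns the Base64 and Base64url encodings of data.
func encodeBoth(data []byte) (std, url string) {
	return base64.StdEncoding.EncodeToString(data), base64.URLEncoding.EncodeToString(data)
}

func main() {
	std, url := encodeBoth([]byte{0xfb, 0xff})
	fmt.Println(std, url) // prints: +/8= -_8=
}
```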

  40. More Base N data encodings
    Standard         | Definition and use case                          | Go implementation
    Base16           | RFC4648                                          |
    Base32           | RFC4648                                          | encoding/base32
    Base36           | Used by URL shortening services (ex. TinyURL) *1 |
    Base45           | Draft IETF Specification                         | Dasio/base45
    Base58           | Used by bitcoin addresses                        | itchyny/base58-go
    Base64           | RFC4648                                          | encoding/base64
    Base85 (Ascii85) | RFC1924; used by Adobe’s PostScript and PDF      |
    Base91 (basE91)  | Developed by Joachim Henke                       |
    Base92, 94, 95   | -                                                |
    *1 https://en.wikipedia.org/wiki/Binary-to-text_encoding#Base58

  41. Summary
    ● A WebDriver client communicates with the remote end (ex. ChromeDriver) over HTTP
    ● WebDriver is a set of interfaces to discover and manipulate DOM elements and to control user agents
    ● A WebDriver client can be built with only the Go standard libraries (ex. net/http, encoding/json, encoding/base64)

  42. More BASE! #basebank-code-reading-ja
    BASE BANK holds a Go code reading party on a regular basis.
    The next session will be on 2021.11.25 (Thu).
    ● #basebank-code-reading-ja (in the Gophers Slack workspace)
    ● basebank/gophers-code-reading-party (GitHub repository)
    https://github.com/basebank/gophers-code-reading-party/issues/16