"High-bandwidth HTTP downloads: unpeeling the onion" by Bruce Merry

Pycon ZA
October 08, 2020

The MeerKAT radio telescope produces massive volumes of data. We provide a data access library for scientists to retrieve the data, but our initial implementation using boto had disappointing performance when used on a high-speed (25 Gb/s) network. On investigation, we found that boto wraps requests, which wraps urllib3, which wraps http.client, and that these layers add considerable overhead that limits bandwidth. I'll walk through all the steps involved in getting data from the socket into a final response, show how each one reduces throughput, and describe our solution to achieve bandwidths of multiple gigabytes per second.

Transcript

  1. www.ska.ac.za
    HIGH-BANDWIDTH HTTP
    PyCon ZA
    Bruce Merry

  7. What do astronomers care about HTTP?
    Figure: MeerKAT storage cluster
    PB of storage accessed through S3-compatible interface.
    Python library (katdal) presents high-level interface.
    Aim for Gb/s ( GB/s).

  8. Rules of the benchmarking game
    • Fetch a GB file from localhost.
    • Return the content as a bytes object.
    • Use a single thread.
    • No TLS, no content encoding, no transfer encoding.
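
The rules above amount to timing one call that returns the whole body as bytes. A minimal sketch of such a harness might look like the following. The names `measure_throughput` and `fake_loader` and the URL are hypothetical, not taken from the talk's benchmark code, and MB is assumed to mean 10**6 bytes.

```python
import time
from typing import Callable


def measure_throughput(loader: Callable[[str], bytes], url: str) -> float:
    """Time a single call to loader(url) and return throughput in MB/s."""
    start = time.perf_counter()
    data = loader(url)
    # Guard against a zero interval on very coarse clocks.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return len(data) / elapsed / 1e6


def fake_loader(url: str) -> bytes:
    # Hypothetical stand-in for a real HTTP loader: returns 10 MiB of zeros.
    return bytes(10 * 1024 * 1024)


rate = measure_throughput(fake_loader, "http://localhost:8080/data")
print(f"{rate:.0f} MB/s")
```

Any function with the same signature (URL in, bytes out) could be passed in place of the stand-in loader.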

  10. First result (requests)
    Figure: bar chart of throughput in MB/s, Python 3.6.12.
    requests: 501

  12. Why so slow?
    Will PyPy save the day? No.
    Figure: bar chart of requests throughput in MB/s.
    Python 3.6.12: 501
    PyPy 7.3.1: 647

  13. Inefficient memory model
    • bytes is immutable: have to copy to change
    • bytearray is zero-initialized
    • BytesIO.getvalue() makes a copy
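
The three points above can be seen with the standard library alone. This is a small illustrative sketch, not code from the talk; the use of `getbuffer()` as a non-copying alternative is an addition for contrast.

```python
import io

payload = bytes(1024)

# bytes is immutable, so "changing" one byte builds a whole new object.
changed = payload[:10] + b"x" + payload[11:]
assert changed is not payload and len(changed) == len(payload)

# bytearray(n) returns n zero bytes: the memory is written once here and
# then written again when real data is read into it.
buf = bytearray(8)
assert buf == b"\x00" * 8

# getvalue() returns an immutable bytes snapshot of the stream contents.
stream = io.BytesIO()
stream.write(payload)
copy = stream.getvalue()

# getbuffer() exposes the stream's own memory as a writable view instead.
view = stream.getbuffer()
view[0] = 0xFF
assert stream.getvalue()[0] == 0xFF  # the stream sees the write
assert copy[0] == 0x00               # the earlier snapshot does not
view.release()
```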

  14. Inefficient library implementations
    • Sometimes libraries make more copies than needed.
    • Sometimes they work in tiny chunks.

  15. Layers
    requests
    urllib3
    http.client
    socket

  16. Test code
    import http.client
    import urllib.parse

    import requests
    import urllib3

    def load_requests(url: str) -> bytes:
        return requests.get(url).content

    def load_urllib3(url: str) -> bytes:
        return urllib3.PoolManager().request('GET', url).data

    def load_httpclient(url: str) -> bytes:
        parts = urllib.parse.urlparse(url)
        conn = http.client.HTTPConnection(parts.netloc)
        conn.request('GET', parts.path)
        resp = conn.getresponse()
        return resp.read(resp.length)

    def load_socket_read(url: str) -> bytes:
        # Code is much too long for a slide
        ...

  17. Results
    Figure: bar chart of throughput in MB/s, Python 3.6.12.
    requests: 501
    urllib3: 1371
    httpclient: 896
    socket-read: 3219

  19. Requests: unnecessary chunking
    CONTENT_CHUNK_SIZE = 10 * 1024
    ...
    self._content = b''.join(self.iter_content(CONTENT_CHUNK_SIZE)) or b''
    • 10 kiB is too small to amortize overheads
    • bytes.join involves a copy
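
To make the chunk-count arithmetic concrete, here is a rough sketch that reads a 1 MiB in-memory payload in chunks and joins them. `iter_chunks` and `read_chunked` are hypothetical stand-ins for the chunked iteration, not the requests implementation, and an `io.BytesIO` plays the role of the network stream.

```python
import io

CONTENT_CHUNK_SIZE = 10 * 1024   # value shown on the slide
payload = bytes(1024 * 1024)     # 1 MiB stand-in for a response body


def iter_chunks(stream, chunk_size: int):
    """Yield successive reads of at most chunk_size bytes until EOF."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return
        yield chunk


def read_chunked(data: bytes, chunk_size: int):
    """Return (joined content, number of chunks) for the given chunk size."""
    chunks = list(iter_chunks(io.BytesIO(data), chunk_size))
    # join() allocates the final object and copies every chunk into it.
    return b"".join(chunks), len(chunks)


small, n_small = read_chunked(payload, CONTENT_CHUNK_SIZE)
large, n_large = read_chunked(payload, 1024 * 1024)
print(n_small, n_large)  # 103 1
```

Each chunk costs a Python-level call and a temporary object, so 103 iterations per MiB add up quickly at gigabyte scale.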

  20. Requests: unnecessary chunking
    Figure: bar chart of throughput in MB/s by chunk size.
    Chunk size   Python 3.6.12   PyPy 7.3.1
    10 kiB       501             647
    1 MiB        1037            772

  21. Requests: an alternative
    We can bypass Response.content:
    with requests.get(url, stream=True) as resp:
        return resp.raw.read()

  22. Requests: an alternative
    Figure: bar chart of throughput in MB/s, Python 3.6.12.
    requests: 501
    requests-stream: 1318
    urllib3: 1371

  25. What’s up with http.client?
    Let’s look inside HTTPResponse.read:
    if amt is not None:
        # Amount is given, implement using readinto
        b = bytearray(amt)
        n = self.readinto(b)
        return memoryview(b)[:n].tobytes()
    • Allocate some memory, and zero-fill it.
    • Read the data into that memory.
    • Make a copy of it.
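
The same three steps can be replayed outside http.client. In this sketch an `io.BytesIO` stands in for the socket file object, and the sizes are arbitrary illustration values.

```python
import io

source = io.BytesIO(b"x" * 1000)   # stand-in for the socket file object
amt = 4096                         # caller asks for more than is available

b = bytearray(amt)                     # 1. allocate amt bytes and zero-fill them
n = source.readinto(b)                 # 2. overwrite the first n bytes with data
result = memoryview(b)[:n].tobytes()   # 3. copy those n bytes into a new bytes

# The result is independent of the scratch buffer, which confirms the copy.
b[0] = 0
assert result[0] == ord("x")

print(n, len(result), type(result).__name__)  # 1000 1000 bytes
```

The payload is therefore touched three times: once by the zero fill, once by the read, and once by the final copy.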

  27. What if we don’t specify an amount?
    Then it’s implemented via _safe_read instead:
    def _safe_read(self, amt):
        s = []
        while amt > 0:
            chunk = self.fp.read(min(amt, MAXAMOUNT))
            if not chunk:
                raise IncompleteRead(b''.join(s), amt)
            s.append(chunk)
            amt -= len(chunk)
        return b"".join(s)
    At least MAXAMOUNT = 1048576

  28. What if we don’t specify an amount?
    Figure: bar chart of throughput in MB/s, Python 3.6.12.
    httpclient: 896
    httpclient-na: 1334

  29. It’ll be better, one day
    Figure: bar chart of throughput in MB/s by Python version.
    Version         httpclient   httpclient-na
    Python 3.6.12   896          1334
    Python 3.8.2    892          3233
    master          3258         3236

  30. More results
    Figure: bar chart of throughput in MB/s for requests, requests-stream, urllib3 and httpclient-na on Python 3.6.12, 3.8.2, master and PyPy 7.3.1 (bar values not preserved in the transcript).

  31. Other libraries
    Figure: bar chart of throughput in MB/s.
    Library   Python 3.6.12   PyPy 7.3.1
    httpx     709             711
    tornado   868             481
    aiohttp   996             1090

  34. So what do we do about Python 3.6?
    Let’s relax the rules and return a numpy array.
    def load_requests_np(url: str) -> np.ndarray:
        with requests.get(url, stream=True) as resp:
            data = np.empty(int(resp.headers['Content-length']), np.uint8)
            resp.raw.readinto(memoryview(data))
        return data
    This gets us 764 MB/s. This time it’s urllib3:
    def readinto(self, b):
        temp = self.read(len(b))
        if len(temp) == 0:
            return 0
        else:
            b[: len(temp)] = temp
            return len(temp)
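
The cost of a readinto built on top of read can be illustrated without any HTTP at all. In this sketch `CopyingReader` is a hypothetical class that mimics the pattern shown on the slide, and `io.BytesIO.readinto` serves as the direct comparison.

```python
import io


class CopyingReader:
    """Hypothetical reader whose readinto() is implemented via read()."""

    def __init__(self, data: bytes):
        self._inner = io.BytesIO(data)

    def read(self, size: int) -> bytes:
        return self._inner.read(size)

    def readinto(self, b) -> int:
        temp = self.read(len(b))   # allocates a temporary bytes object
        b[: len(temp)] = temp      # then copies it into the caller's buffer
        return len(temp)


data = bytes(range(256)) * 4
via_copy = bytearray(len(data))
direct = bytearray(len(data))

n1 = CopyingReader(data).readinto(memoryview(via_copy))
n2 = io.BytesIO(data).readinto(memoryview(direct))  # fills the buffer directly
print(n1, n2, via_copy == direct)  # 1024 1024 True
```

Both paths produce the same bytes, but the first allocates and fills an intermediate object of the full size before the data reaches the destination.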

  36. Now what?
    Solution: use response.raw._fp.readinto.
    Figure: bar chart of throughput in MB/s, Python 3.6.12.
    requests-np: 764
    requests-np-fp: 3033
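
The same idea can be sketched with the standard library only, on the assumption that the private `_fp` attribute is the underlying `http.client.HTTPResponse`. The sketch below calls `HTTPResponse.readinto` directly against a throwaway local server. `load_readinto`, `Handler` and `BODY` are hypothetical names, a `bytearray` replaces the numpy array, and the server is assumed to send Content-Length.

```python
import http.client
import http.server
import threading
import urllib.parse


def load_readinto(url: str) -> bytearray:
    """Fetch url and read the body straight into one preallocated buffer."""
    parts = urllib.parse.urlparse(url)
    conn = http.client.HTTPConnection(parts.netloc)
    try:
        conn.request("GET", parts.path)
        resp = conn.getresponse()
        size = int(resp.headers["Content-Length"])  # assumed to be present
        data = bytearray(size)  # zero-filled, unlike np.empty on the slide
        view = memoryview(data)
        pos = 0
        while pos < size:
            n = resp.readinto(view[pos:])  # may return fewer bytes than asked
            if n == 0:
                raise http.client.IncompleteRead(bytes(view[:pos]), size - pos)
            pos += n
        return data
    finally:
        conn.close()


BODY = bytes(range(256)) * 4096  # 1 MiB of test data


class Handler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Length", str(len(BODY)))
        self.end_headers()
        self.wfile.write(BODY)

    def log_message(self, *args):
        pass  # keep the demo quiet


server = http.server.HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
result = load_readinto(f"http://127.0.0.1:{server.server_port}/data")
server.shutdown()
server.server_close()
print(len(result), result == BODY)  # 1048576 True
```

Because `_fp` is a private attribute, code that relies on it may break when the wrapping library changes its internals.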

  38. Summary
    People who write HTTP libraries don’t optimize for throughput.
    But sometimes you can do something about it.

  39. References
    https://github.com/ska-sa/pyconza2020-httpbench
    https://bugs.python.org/issue21644
    https://bugs.python.org/issue36050
    https://bugs.python.org/issue36051
    https://bugs.python.org/issue41002
    https://github.com/psf/requests/issues/5503
    https://github.com/urllib3/urllib3/issues/1540

  40. SARAO, a business unit of the National Research Foundation.
    The South African Radio Astronomy Observatory (SARAO) spearheads South Africa’s activities in the Square Kilometre Array Radio Telescope,
    commonly known as the SKA, in engineering, science and construction. SARAO is a National Facility managed by the National Research Foundation
    and incorporates radio astronomy instruments and programmes such as the MeerKAT and KAT-7 telescopes in the Karoo, the Hartebeesthoek Radio
    Astronomy Observatory (HartRAO) in Gauteng, the African Very Long Baseline Interferometry (AVN) programme in nine African countries as well as the
    associated human capital development and commercialisation endeavours.
    Contact information
    Bruce Merry
    Senior Science Processing Developer