"High-bandwidth HTTP downloads: unpeeling the onion" by Bruce Merry

HIGH-BANDWIDTH HTTP
PyCon ZA
Bruce Merry
www.ska.ac.za

What do astronomers care about HTTP?

Figure: MeerKAT storage cluster

• PB of storage accessed through an S3-compatible interface.
• Python library (katdal) presents a high-level interface.
• Aim for Gb/s ( GB/s).

Rules of the benchmarking game

• Fetch a GB file from localhost.
• Return the content as a bytes object.
• Use a single thread.
• No TLS, no content encoding, no transfer encoding.
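The rules above can be turned into a small timing harness. What follows is only a sketch under assumptions, not the benchmark code used for the talk: `measure_throughput` and `fake_loader` are hypothetical names, "MB" is assumed to mean 10**6 bytes, and the best of a few runs is reported.

```python
import time
from typing import Callable


def measure_throughput(loader: Callable[[str], bytes], url: str, repeats: int = 3) -> float:
    """Return the best observed throughput of loader(url) in MB/s (assumed 1 MB = 10**6 bytes)."""
    best = 0.0
    for _ in range(repeats):
        start = time.perf_counter()
        data = loader(url)
        # Guard against a zero interval on very fast loaders.
        elapsed = max(time.perf_counter() - start, 1e-9)
        if not isinstance(data, bytes):
            raise TypeError("the rules require the content as a bytes object")
        best = max(best, len(data) / elapsed / 1e6)
    return best


def fake_loader(url: str) -> bytes:
    """Hypothetical stand-in loader so the sketch runs without a server."""
    return bytes(10_000_000)


if __name__ == "__main__":
    rate = measure_throughput(fake_loader, "http://localhost/ignored")
    print(f"{rate:.0f} MB/s")
```

Any single-threaded function that maps a URL to bytes can be passed in place of `fake_loader`.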

First result (requests)

Figure: throughput in MB/s (axis 0 to 3500). requests on Python 3.6.12: 501.

Why so slow? Will PyPy save the day? No.

Figure: throughput of requests in MB/s. Python 3.6.12: 501; PyPy 7.3.1: 647.

Inefficient memory model

• bytes is immutable: have to copy to change
• bytearray is zero-initialized
• BytesIO.getvalue() makes a copy
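A short illustration of these points, using only the standard library. The variable names are made up for the example, and it shows the behaviour rather than measuring its cost.

```python
import io

payload = b"abc"

# bytes is immutable: item assignment raises TypeError, so "changing" it
# means building a new object.
try:
    payload[0] = 0x41
    mutated_in_place = True
except TypeError:
    mutated_in_place = False
extended = payload + b"d"  # a new bytes object; "abc" is copied into it

# bytearray(n) returns n zero bytes, so the memory is written once
# before any real data arrives.
scratch = bytearray(4)
was_zeroed = scratch == b"\x00" * 4

# A bytearray can then be filled in place through a memoryview.
memoryview(scratch)[:3] = payload

# BytesIO accumulates writes; getvalue() hands the contents back as bytes
# (the slide notes that this step involves a copy).
buf = io.BytesIO()
buf.write(b"ab")
buf.write(b"c")
snapshot = buf.getvalue()
```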

Inefficient library implementations

• Sometimes libraries make more copies than needed.
• Sometimes they work in tiny chunks.

Layers

• requests
• urllib3
• http.client
• socket

Test code

    import http.client
    import urllib.parse

    import requests
    import urllib3


    def load_requests(url: str) -> bytes:
        return requests.get(url).content


    def load_urllib3(url: str) -> bytes:
        return urllib3.PoolManager().request('GET', url).data


    def load_httpclient(url: str) -> bytes:
        parts = urllib.parse.urlparse(url)
        conn = http.client.HTTPConnection(parts.netloc)
        conn.request('GET', parts.path)
        resp = conn.getresponse()
        return resp.read(resp.length)


    def load_socket_read(url: str) -> bytes:
        # Code is much too long for a slide
        ...
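The slide leaves out the body of load_socket_read. The sketch below is not that code; it is one guess at what a bare-socket loader could look like, with hypothetical names (`read_response_body`, `load_socket_sketch`). It assumes a plain HTTP/1.1 response that carries a Content-Length header and ignores redirects, chunked transfer encoding and error statuses.

```python
import socket
import urllib.parse


def read_response_body(sock: socket.socket) -> bytes:
    """Read one HTTP response that has a Content-Length header from sock."""
    # Collect data until the blank line that terminates the headers.
    header = bytearray()
    while b"\r\n\r\n" not in header:
        chunk = sock.recv(65536)
        if not chunk:
            raise ConnectionError("connection closed before the headers ended")
        header += chunk
    head, _, leftover = bytes(header).partition(b"\r\n\r\n")

    length = None
    for line in head.split(b"\r\n")[1:]:  # skip the status line
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            length = int(value.strip())
    if length is None:
        raise ValueError("this sketch only handles responses with Content-Length")

    # Allocate the body once and let recv_into write straight into it.
    body = bytearray(length)
    view = memoryview(body)
    leftover = leftover[:length]  # body bytes that arrived with the headers
    view[:len(leftover)] = leftover
    pos = len(leftover)
    while pos < length:
        n = sock.recv_into(view[pos:])
        if n == 0:
            raise ConnectionError("connection closed in the middle of the body")
        pos += n
    return bytes(body)  # one final copy, to hand back an immutable bytes


def load_socket_sketch(url: str) -> bytes:
    """Fetch url over a bare socket (hypothetical helper, not from the talk)."""
    parts = urllib.parse.urlparse(url)
    with socket.create_connection((parts.hostname, parts.port or 80)) as sock:
        request = (
            f"GET {parts.path or '/'} HTTP/1.1\r\n"
            f"Host: {parts.netloc}\r\n"
            "Connection: close\r\n\r\n"
        )
        sock.sendall(request.encode("ascii"))
        return read_response_body(sock)
```

Even this version pays for the zero-fill in `bytearray(length)` and for the closing `bytes(body)` copy, the two costs listed on the memory-model slide.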

Results

Figure: throughput in MB/s on Python 3.6.12. requests: 501; urllib3: 1371; httpclient: 896; socket-read: 3219.

Requests: unnecessary chunking

    CONTENT_CHUNK_SIZE = 10 * 1024
    ...
    self._content = b''.join(self.iter_content(CONTENT_CHUNK_SIZE)) or b''

• 10 kiB is too small to amortize overheads
• bytes.join involves a copy
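To get a feel for how many pieces a small chunk size produces, the read-then-join pattern can be mimicked on an in-memory stream. This is a sketch only: `read_joined` is a hypothetical helper, and a 4 MiB payload stands in for the real download.

```python
import io
from typing import BinaryIO, Tuple


def read_joined(stream: BinaryIO, chunk_size: int) -> Tuple[bytes, int]:
    """Read stream to EOF in chunk_size pieces, join them, and count the pieces."""
    chunks = []
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        chunks.append(chunk)
    # join allocates the final object and copies every chunk into it.
    return b"".join(chunks), len(chunks)


payload = bytes(4 * 1024 * 1024)  # 4 MiB stand-in body
_, small_chunks = read_joined(io.BytesIO(payload), 10 * 1024)
_, large_chunks = read_joined(io.BytesIO(payload), 1024 * 1024)
print(small_chunks, large_chunks)  # 410 4
```

Each piece costs at least one Python-level call and one allocation, so the per-chunk overhead is paid about a hundred times more often at 10 kiB than at 1 MiB.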

Requests: unnecessary chunking

Throughput in MB/s by chunk size:

Chunk size    Python 3.6.12    PyPy 7.3.1
10 kiB        501              647
1 MiB         1037             772

Requests: an alternative

We can bypass Response.content:

    def load_requests_stream(url: str) -> bytes:
        with requests.get(url, stream=True) as resp:
            return resp.raw.read()

Requests: an alternative

Figure: throughput in MB/s on Python 3.6.12. requests: 501; requests-stream: 1318; urllib3: 1371.

What’s up with http.client?

Let’s look inside HTTPResponse.read:

    if amt is not None:
        # Amount is given, implement using readinto
        b = bytearray(amt)
        n = self.readinto(b)
        return memoryview(b)[:n].tobytes()

• Allocate some memory, and zero-fill it.
• Read the data into that memory.
• Make a copy of it.
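The final copy can be avoided if the caller supplies the buffer and calls readinto directly. The helper below is a sketch with a hypothetical name (`readinto_exact`); it works with any object that has a readinto method, and is shown here against io.BytesIO so that it runs without a server.

```python
import io


def readinto_exact(fp, buffer) -> int:
    """Fill buffer from fp.readinto, stopping early only at end of stream.

    Returns the number of bytes written into buffer.
    """
    view = memoryview(buffer).cast("B")  # byte-wise view, no copy
    pos = 0
    while pos < len(view):
        n = fp.readinto(view[pos:])  # writes directly into the caller's memory
        if not n:
            break  # end of stream
        pos += n
    return pos


source = io.BytesIO(b"0123456789")
first = bytearray(4)
count = readinto_exact(source, first)
print(count, bytes(first))  # 4 b'0123'
```

With http.client the same call would take the response object, for example `readinto_exact(resp, bytearray(resp.length))`. That returns a bytearray rather than bytes, so it bends the "return bytes" rule, and the bytearray is still zero-filled when it is allocated.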

What if we don’t specify an amount?

Then it’s implemented via _safe_read instead:

    def _safe_read(self, amt):
        s = []
        while amt > 0:
            chunk = self.fp.read(min(amt, MAXAMOUNT))
            if not chunk:
                raise IncompleteRead(b''.join(s), amt)
            s.append(chunk)
            amt -= len(chunk)
        return b"".join(s)

At least MAXAMOUNT = 1048576 (1 MiB).

What if we don’t specify an amount?

Figure: throughput in MB/s on Python 3.6.12. httpclient: 896; httpclient-na: 1334.

It’ll be better, one day

Throughput in MB/s by Python version:

                 Python 3.6.12    3.8.2    master
httpclient       896              892      3258
httpclient-na    1334             3233     3236

More results

Figure: throughput in MB/s for requests, requests-stream, urllib3 and httpclient-na on Python 3.6.12, 3.8.2, master and PyPy 7.3.1 (bar values not preserved).

Other libraries

Throughput in MB/s:

           Python 3.6.12    PyPy 7.3.1
httpx      709              711
tornado    868              481
aiohttp    996              1090

So what do we do about Python 3.6?

Let’s relax the rules and return a numpy array.

    import numpy as np
    import requests


    def load_requests_np(url: str) -> np.ndarray:
        with requests.get(url, stream=True) as resp:
            data = np.empty(int(resp.headers['Content-length']), np.uint8)
            resp.raw.readinto(memoryview(data))
            return data

This gets us 764 MB/s.

This time it’s urllib3:

    def readinto(self, b):
        temp = self.read(len(b))
        if len(temp) == 0:
            return 0
        else:
            b[: len(temp)] = temp
            return len(temp)

Now what?

Solution: use response.raw._fp.readinto.

Figure: throughput in MB/s on Python 3.6.12. requests-np: 764; requests-np-fp: 3033.

Summary

People who write HTTP libraries don’t optimize for throughput.

But sometimes you can do something about it.

References

• https://github.com/ska-sa/pyconza2020-httpbench
• https://bugs.python.org/issue21644
• https://bugs.python.org/issue36050
• https://bugs.python.org/issue36051
• https://bugs.python.org/issue41002
• https://github.com/psf/requests/issues/5503
• https://github.com/urllib3/urllib3/issues/1540

SARAO, a business unit of the National Research Foundation.

The South African Radio Astronomy Observatory (SARAO) spearheads South Africa’s activities in the Square Kilometre Array Radio Telescope, commonly known as the SKA, in engineering, science and construction. SARAO is a National Facility managed by the National Research Foundation and incorporates radio astronomy instruments and programmes such as the MeerKAT and KAT-7 telescopes in the Karoo, the Hartebeesthoek Radio Astronomy Observatory (HartRAO) in Gauteng, the African Very Long Baseline Interferometry (AVN) programme in nine African countries, as well as the associated human capital development and commercialisation endeavours.

Contact information
Bruce Merry, Senior Science Processing Developer

