Slide 1

Slide 1 text

Let's get you started with asynchronous programming Ryan Varley PyData Global | 3rd December 2024

Slide 2

Slide 2 text

This talk
Assumption: You have used Python but not used async (much).
Aim: You leave here with a basic understanding, and write some async code!
Warning: Async is a tool, not everything is a nail.

Slide 3

Slide 3 text

Why is it confusing?
● Paradigm is different
● Multiple frameworks
● It has changed throughout Python 3*
● It behaves differently in different environments**
● Rabbit holes***
* asyncio became part of the standard library in 3.4; await and async def arrived in 3.5. This is not a history.
** More soon
*** We're already on the 3rd *

Slide 4

Slide 4 text

Assume everything I say has a

Slide 5

Slide 5 text

What is I/O? It's network calls
● API requests
● Saving files
  ○ To the cloud / another machine (network)
  ○ To local disk (fast, maxes write anyway)
● DB requests
  ○ How many DBs run on the same machine? Network!
  ○ Async is not well supported by DB libraries.

Slide 6

Slide 6 text

We are going to focus on asyncio
● It's been part of the standard library since 3.4; it's here to stay
● There are alternatives but we won't discuss them
  ○ Older: Tornado, gevent, Twisted
  ○ Modern: asyncio, Curio
  ○ Latest: Trio, AnyIO, uvloop
● Everything here is in Python 3.12.2
● Code here should work in 3.7 onwards and mostly work in 3.5 onwards, though I haven't verified it.

Slide 7

Slide 7 text

What does async do?
● Lets us do other things while we wait for I/O
  ○ Generally more I/O
● For example, we want to make 20 API calls where each call takes 1s
  ○ Synchronously ~20s
  ○ Asynchronously ~1s
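A minimal sketch of the comparison above, with time.sleep and asyncio.sleep standing in for a real API call. The helper names and the 0.05s delay (instead of 1s) are assumptions chosen to keep the demo quick; they are not from the talk.

```python
import asyncio
import time

DELAY = 0.05   # assumed stand-in for a 1s API call
N_CALLS = 20


def fake_call_sync(i):
    time.sleep(DELAY)            # blocks the whole program while "waiting"
    return i


async def fake_call_async(i):
    await asyncio.sleep(DELAY)   # hands control back to the event loop while "waiting"
    return i


def run_sync():
    start = time.perf_counter()
    results = [fake_call_sync(i) for i in range(N_CALLS)]
    return results, time.perf_counter() - start


async def run_async():
    start = time.perf_counter()
    # gather runs all the coroutines concurrently and collects their results
    results = await asyncio.gather(*(fake_call_async(i) for i in range(N_CALLS)))
    return results, time.perf_counter() - start


sync_results, sync_elapsed = run_sync()
async_results, async_elapsed = asyncio.run(run_async())
print(f"sync:  {sync_elapsed:.2f}s")
print(f"async: {async_elapsed:.2f}s")
```

The sync total grows with the number of calls, while the async total stays close to the length of a single call. In Jupyter, replace asyncio.run(run_async()) with await run_async().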

Slide 8

Slide 8 text

Async vs Multithreading vs Multiprocessing
*times are made up and exaggerated

Slide 9

Slide 9 text

Let’s write one! Sync version

    import requests

    API_URL = "http://localhost:8000"

    def get_video(video_id):
        response = requests.get(
            f"{API_URL}/videos/{video_id}",
        )
        response.raise_for_status()
        return response.json()

    get_video(video_id="656c94f1875e6b1ap3ae5f19")

    {
        'id': '656c94f1875e6b1ap3ae5f19',
        'title': 'JWST: Looking Beyond The Pretty Pictures',
        …
    }

Slide 10

Slide 10 text

Let’s write one! Async version

    import aiohttp  # Not requests

    # async
    async def get_video_async(video_id):
        # Generally you would reuse this
        async with aiohttp.ClientSession() as session:
            # Ensures resources are released correctly
            async with session.get(
                f"{API_URL}/videos/{video_id}"
            ) as response:
                response.raise_for_status()
                # Need to await the response
                return await response.json()

    get_video_async("656c94f1875e6b1ap3ae5f19")

Slide 11

Slide 11 text

Coroutines
A coroutine is a special function that can be paused and resumed.
async def: defines a function that returns a coroutine instead of the return value.
This coroutine is lazy and must be awaited (“run”) to get the return value.
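A small sketch of that laziness. The add_one function is a hypothetical example, not from the talk: calling it only builds a coroutine object, and nothing in its body runs until an event loop drives it.

```python
import asyncio
import inspect


async def add_one(x):
    return x + 1


coro = add_one(41)                        # nothing has run yet
is_coroutine = inspect.iscoroutine(coro)  # we hold a coroutine object, not 42
result = asyncio.run(coro)                # the event loop runs it to completion
print(is_coroutine, result)
```

Forgetting to await (or otherwise run) a coroutine is a common first mistake; Python reports it with a "coroutine ... was never awaited" RuntimeWarning.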

Slide 12

Slide 12 text

How do we run this coroutine?

    video = await get_video_async("656c94f1875e6b1ap3ae5f19")

If it ran now then it would be synchronous.

await blocks subsequent code in the same coroutine:

    video = await get_video_async("656c94f1875e6b1ap3ae5f19")
    channel = await get_video_channel(video["channel_id"])

Slide 13

Slide 13 text

How do we run this coroutine?

    video = await get_video_async("656c94f1875e6b1ap3ae5f19")

FastAPI: await in an async endpoint

    @app.get("/video")
    async def get_video(video_id: str):
        response = await get_video_async(video_id)
        return response

Python REPL (unless started with python -m asyncio)

    File "", line 1
    SyntaxError: 'await' outside function

IPython / Jupyter

    {'id': '656c94f1666e6b1de3ae5f19', 'title': 'JWST: Looking Beyond The Pretty Pictures', …

Script / pytest

    File "", line 1
    SyntaxError: 'await' outside function

Slide 14

Slide 14 text

In a script we need to start the event loop ourselves

    asyncio.run(MY_ENTRY_FUNCTION())

    import asyncio

    async def main():
        video = await get_video_async("656c94f1666e6b1de3ae5f19")
        print(video)

    if __name__ == "__main__":
        asyncio.run(main())

    {'id': '656c94f1875e6b1ap3ae5f19', 'title': 'JWST: Looking Beyond The Pretty Pictures'...}

If you try to run this in Jupyter you will get

    RuntimeError: asyncio.run() cannot be called from a running event loop

The event loop is running! You need to await instead.

Slide 15

Slide 15 text

If it's async, it needs to be async all the way*
[Diagram: a timeline alternating between sync and async sections; the async sections are entered with asyncio.run(main()) and asyncio.run(main2())]
Everything in here must be async… or there’s no point (mostly)
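A hedged sketch of why a single synchronous call inside async code removes the benefit. The worker names and the 0.1s delay are assumptions for illustration: both workers are run with asyncio.gather, but only the one that awaits lets the others make progress.

```python
import asyncio
import time


async def blocking_worker():
    time.sleep(0.1)            # sync call: the event loop cannot switch to anything else


async def cooperative_worker():
    await asyncio.sleep(0.1)   # async call: other coroutines run during the wait


async def timed(worker, n=5):
    start = time.perf_counter()
    await asyncio.gather(*(worker() for _ in range(n)))
    return time.perf_counter() - start


blocking_elapsed = asyncio.run(timed(blocking_worker))        # roughly n * 0.1s
cooperative_elapsed = asyncio.run(timed(cooperative_worker))  # roughly 0.1s
print(f"blocking:    {blocking_elapsed:.2f}s")
print(f"cooperative: {cooperative_elapsed:.2f}s")
```

The same applies to requests.get, a synchronous database driver, or any long computation called directly from a coroutine.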

Slide 16

Slide 16 text

The simplest entry point
● If you are using FastAPI already, convert some endpoints to be async!
● If you use Jupyter, next time you write requests, use async

FastAPI

    @app.get("/video")
    async def get_video(video_id: str):
        response = await get_video_async(video_id)
        return response

Slide 17

Slide 17 text

await will run the coroutine if
● Python REPL: started with python -m asyncio
● IPython / Jupyter: works
● FastAPI: endpoint is defined with async def
● Pytest: you use a plugin (e.g. pytest-asyncio, anyio)
● Script: asyncio.run() is used to call the top-level function

Slide 18

Slide 18 text

Let’s make 100 calls to an endpoint

    # Session passed in
    async def get_video_async(video_id, session):
        async with session.get(
            f"{API_URL}/videos/{video_id}",
        ) as response:
            response.raise_for_status()
            return await response.json()

    async def fetch_all_videos(video_ids):
        # Define limits
        connector = aiohttp.TCPConnector(limit=100, limit_per_host=10)
        async with aiohttp.ClientSession(connector=connector) as session:
            tasks = [get_video_async(video_id, session) for video_id in video_ids]
            # Expand
            return await asyncio.gather(*tasks)

    videos = await my_code.fetch_all_videos(VIDEO_IDS)

asyncio.gather: groups a list of awaitables into one awaitable, running them concurrently.

● 0.2s to run 1
● 2.1-12s to run 100
● 1.8-2.7s to run 100 (max 10 at a time)
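One detail of asyncio.gather worth seeing in isolation: results come back in the order the awaitables were passed in, not the order they finished. A minimal sketch, with asyncio.sleep standing in for the network call and hypothetical helper names:

```python
import asyncio


async def fetch(i, delay):
    await asyncio.sleep(delay)   # stand-in for a network request
    return i


async def main():
    delays = [0.15, 0.10, 0.05]  # the last call finishes first
    finished = []                # records completion order

    async def tracked(i, delay):
        value = await fetch(i, delay)
        finished.append(value)
        return value

    results = await asyncio.gather(
        *(tracked(i, d) for i, d in enumerate(delays))
    )
    return results, finished


results, finished = asyncio.run(main())
print("gather order:    ", results)
print("completion order:", finished)
```

This is why the list returned by fetch_all_videos can be matched against video_ids position by position.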

Slide 19

Slide 19 text

Let’s do something more complicated
[Diagram: Content warnings service and Video service]
1. Get the video data
2. Get the transcript
3. Run our content warnings model
4. Update video with content warnings

Slide 20

Slide 20 text

Let's write a job

    import asyncio
    import hashlib

    import aiohttp

    API_URL = "http://localhost:8000"

    async def get_video(video_id, session):
        async with session.get(
            f"{API_URL}/videos/{video_id}"
        ) as response:
            response.raise_for_status()
            return await response.json()

    async def get_video_transcript(video_id, session):
        async with session.get(
            f"{API_URL}/videos/{video_id}/transcript"
        ) as response:
            response.raise_for_status()
            return await response.text()

    def is_nasa(text):
        # Simulate model taking 1s
        [hashlib.sha512(b"a" * 10**8).hexdigest() for i in range(10)]
        if "nasa" in text.lower().split():
            return True
        return False

Slide 21

Slide 21 text

Let's write a job

    async def generate_content_warnings(video_id, session):
        video = await get_video(video_id, session)
        transcript = await get_video_transcript(video_id, session)

        # Single quotes inside the f-string keep this valid before Python 3.12
        text = f"{video['title']} {video['description']} {transcript}"
        warning_ids = []

        result = is_nasa(text)
        if result:
            warning_ids.append("nasa")

        await save_video_content_warnings(video_id, warning_ids)

    async def main():
        async with aiohttp.ClientSession() as session:
            tasks = [
                generate_content_warnings(video_id, session)
                for video_id in VIDEO_IDS
            ]
            await asyncio.gather(*tasks)

    if __name__ == "__main__":
        asyncio.run(main())

Slide 22

Slide 22 text

What would the synchronous version do? 3m 26s

Slide 23

Slide 23 text

Let’s run it! 1m 22s https://www.youtube.com/watch?v=gRCMZuAJvAk

Slide 24

Slide 24 text

We are overwhelming the API - Limit per host

Before:

    async def main():
        async with aiohttp.ClientSession() as session:
            tasks = [
                generate_content_warnings(video_id, session)
                for video_id in VIDEO_IDS
            ]
            await asyncio.gather(*tasks)

After:

    async def main():
        connector = aiohttp.TCPConnector(
            limit=100, limit_per_host=10)
        async with aiohttp.ClientSession(connector=connector) as session:
            tasks = [
                generate_content_warnings(video_id, session)
                for video_id in VIDEO_IDS
            ]
            await asyncio.gather(*tasks)

But let’s do something else

    async with session.get(
        f"{API_URL}/videos/{video_id}"
    ) as response:

Slide 25

Slide 25 text

We are overwhelming the API - Semaphores

Before:

    async def get_video(video_id, session):
        async with session.get(
            f"{API_URL}/videos/{video_id}"
        ) as response:
            response.raise_for_status()
            return await response.json()

    async def get_video_transcript(video_id, session):
        async with session.get(
            f"{API_URL}/videos/{video_id}/transcript"
        ) as response:
            response.raise_for_status()
            return await response.text()

After:

    video_semaphore = asyncio.Semaphore(5)
    transcript_semaphore = asyncio.Semaphore(10)

    async def get_video(video_id, session):
        async with video_semaphore:
            async with session.get(
                f"{API_URL}/videos/{video_id}",
            ) as response:
                response.raise_for_status()
                return await response.json()

    async def get_video_transcript(video_id, session):
        async with transcript_semaphore:
            async with session.get(
                …
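To check that a semaphore really caps concurrency, here is a self-contained sketch that counts how many calls are inside the guarded section at once. The counter, the limit of 5, and the simulated 0.01s call are assumptions for illustration; no real API is contacted.

```python
import asyncio

LIMIT = 5  # assumed limit, matching video_semaphore above


async def limited_call(semaphore, state):
    async with semaphore:                 # at most LIMIT coroutines get past here
        state["active"] += 1
        state["peak"] = max(state["peak"], state["active"])
        await asyncio.sleep(0.01)         # stand-in for the HTTP request
        state["active"] -= 1


async def main(n_calls=50):
    semaphore = asyncio.Semaphore(LIMIT)  # created inside the running loop
    state = {"active": 0, "peak": 0}
    await asyncio.gather(*(limited_call(semaphore, state) for _ in range(n_calls)))
    return state["peak"]


peak = asyncio.run(main())
print(f"peak concurrency: {peak}")
```

Creating the semaphore inside main() rather than at module level is a deliberate choice in this sketch: on Python versions before 3.10 a module-level semaphore can end up bound to a different event loop than the one asyncio.run() starts. On the 3.12 used in this talk either placement works.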

Slide 26

Slide 26 text

We are running video and transcript in sequence

Before:

    async def generate_content_warnings(video_id, session):
        video = await get_video(video_id, session)
        transcript = await get_video_transcript(video_id, session)

        text = f"{video['title']} {video['description']} {transcript}"
        warning_ids = []

        result = is_nasa(text)
        if result:
            warning_ids.append("nasa")

        await save_video_content_warnings(video_id, warning_ids)

After:

    async def generate_content_warnings(video_id, session):
        video_task = get_video(video_id, session)
        transcript_task = get_video_transcript(video_id, session)
        # Both requests are now in flight at the same time
        video, transcript = await asyncio.gather(
            video_task, transcript_task)

        text = f"{video['title']} {video['description']} {transcript}"
        warning_ids = []

        result = is_nasa(text)
        if result:
            warning_ids.append("nasa")

        await save_video_content_warnings(video_id, warning_ids)

Slide 27

Slide 27 text

The model run is blocking

Before:

    async def generate_content_warnings(video_id, session):
        video_task = get_video(video_id, session)
        transcript_task = get_video_transcript(video_id, session)
        video, transcript = await asyncio.gather(
            video_task, transcript_task
        )

        text = f"{video['title']} {video['description']} {transcript}"
        warning_ids = []

        result = is_nasa(text)
        if result:
            warning_ids.append("nasa")

        await save_video_content_warnings(video_id, warning_ids)

After:

    async def generate_content_warnings(video_id, session):
        video_task = get_video(video_id, session)
        transcript_task = get_video_transcript(video_id, session)
        video, transcript = await asyncio.gather(
            video_task, transcript_task)

        text = f"{video['title']} {video['description']} {transcript}"
        warning_ids = []

        # Run the blocking model off the event loop; is_nasa takes only text
        loop = asyncio.get_running_loop()
        result = await loop.run_in_executor(
            None, is_nasa, text)
        if result:
            warning_ids.append("nasa")

        await save_video_content_warnings(video_id, warning_ids)

Slide 28

Slide 28 text

Aside - run_in_executor

    loop = asyncio.get_running_loop()
    result = await loop.run_in_executor(
        None, is_nasa, text)

Will run in a thread, can still block.

    from concurrent.futures import ProcessPoolExecutor

    loop = asyncio.get_running_loop()
    with ProcessPoolExecutor() as executor:
        result = await loop.run_in_executor(executor, is_nasa, text)

Is now multiprocessing, but comes with the same pitfalls.
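A hedged sketch showing that the event loop stays responsive while run_in_executor works in a thread. slow_model and heartbeat are hypothetical stand-ins, and time.sleep is a best case because it releases the GIL; a CPU-bound pure-Python function in a thread can still slow the loop down, which is the "can still block" caveat above.

```python
import asyncio
import time


def slow_model(text):
    time.sleep(0.2)                       # stand-in for the blocking model call
    return "nasa" in text.lower().split()


async def heartbeat(ticks):
    # Records when the event loop gets a chance to run this coroutine
    for _ in range(5):
        await asyncio.sleep(0.02)
        ticks.append(time.perf_counter())


async def main():
    loop = asyncio.get_running_loop()
    ticks = []
    start = time.perf_counter()
    result, _ = await asyncio.gather(
        loop.run_in_executor(None, slow_model, "NASA launches JWST"),
        heartbeat(ticks),
    )
    first_tick = ticks[0] - start         # well before the model finishes
    return result, first_tick


result, first_tick = asyncio.run(main())
print(result, f"first heartbeat after {first_tick:.2f}s")
```

Passing None uses the loop's default ThreadPoolExecutor. If slow_model were called directly inside main(), the first heartbeat could not happen until the 0.2s call had returned.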

Slide 29

Slide 29 text

Let’s run it! https://www.youtube.com/watch?v=aFd1GjTRfig

Slide 30

Slide 30 text

Let’s run it!
13s!
1st attempt: 1m 22s
Sync: 3m 26s

Slide 31

Slide 31 text

So are you ready to use async?
● If you went away and used async for the first time, let me know!
● I would love to hear your feedback
Thank you!
Ryan Varley
PyData Global | 3rd December 2024
https://www.linkedin.com/in/ryanvarley/
https://rynv.uk/async-pydata-global-24/