Exploring Concurrency in Python & AWS

FROM THREADS TO LAMBDAS, AND LAMBDAS WITH THREADS: EXPLORING CONCURRENCY IN PYTHON AND AWS

BACKGROUND PROBLEM: INTRA-S3 BACKUPS
▸ Daily backup of 250 objects from one bucket to another, each object ~600-800 MB.
▸ Initial bulk backup of 6 months: 45,000 objects, roughly 27-36 TB.
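A quick back-of-the-envelope check of the volume these numbers imply, assuming "6 months" means about 180 days and decimal units (1 TB = 1,000,000 MB):

```python
OBJECTS_PER_DAY = 250
DAYS = 180  # "6 months" approximated as 180 days (assumption)
MB_LOW, MB_HIGH = 600, 800  # per-object size range from the slide

total_objects = OBJECTS_PER_DAY * DAYS
# decimal units: 1 TB = 1,000,000 MB
low_tb = total_objects * MB_LOW / 1_000_000
high_tb = total_objects * MB_HIGH / 1_000_000
print(total_objects, low_tb, high_tb)  # 45000 27.0 36.0
```

At 600-800 MB per object, the bulk backup lands in the tens of terabytes.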

FROM A FOR LOOP TO THREADS: BEFORE THE THREADS, PASS #1

    import boto3

    s3 = boto3.resource('s3')

    # src_objects, dest_bucket: source listing and backup bucket name, defined elsewhere
    for obj in src_objects:
        # assumption: each object keeps its key in the destination bucket
        dest_obj = s3.Object(dest_bucket, obj.key)
        dest_obj.copy_from(CopySource={'Bucket': obj.bucket_name, 'Key': obj.key})

EXECUTION TIME: 1 hr 45 mins!

CONCURRENCY IN PYTHON
▸ asyncio
  ▸ Event loops, asynchronous I/O
▸ concurrent.futures
  ▸ High-level abstractions: ThreadPoolExecutor and ProcessPoolExecutor
▸ threading
  ▸ Low-level constructs: build your own solution from threads, semaphores and locks
▸ multiprocessing
  ▸ Similar to threading, but for processes
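A minimal sketch of the asyncio model from the first bullet: coroutines that await non-blocking I/O share one thread and one event loop. `fake_request` is a hypothetical stand-in for a network call. boto3's calls are blocking, so they would have to be handed to threads (for example with `asyncio.to_thread`, Python 3.9+) to benefit from this model.

```python
import asyncio
import time


async def fake_request(n: int) -> int:
    # hypothetical non-blocking call: awaiting hands control back to the event loop
    await asyncio.sleep(0.2)
    return n


async def main() -> list:
    # all five coroutines wait concurrently on a single thread
    return await asyncio.gather(*(fake_request(n) for n in range(5)))


start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start
print(results)  # [0, 1, 2, 3, 4]
```

The five 0.2-second waits overlap, so the whole run takes roughly 0.2 seconds rather than 1.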

CONCURRENCY IN PYTHON: CONCURRENT.FUTURES, USES AND LIMITATIONS
▸ ProcessPoolExecutor
  ▸ Multiple Python processes across CPUs
  ▸ Good for CPU-intensive tasks
▸ ThreadPoolExecutor
  ▸ Threads run inside a single Python interpreter.
  ▸ Only one thread can execute Python bytecode at a time, because of the GIL (Global Interpreter Lock).
  ▸ Good for I/O: a thread blocked on I/O releases the GIL, which another thread can then acquire.
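To see why ThreadPoolExecutor suits I/O, the sketch below uses `time.sleep` as a stand-in for a blocking network call (an assumption for illustration): like socket I/O, it releases the GIL while waiting, so ten 0.2-second waits on ten threads finish in roughly 0.2 seconds instead of 2. A CPU-bound function in its place would see no such speed-up, because only one thread runs Python bytecode at a time.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_call(n: int) -> int:
    # time.sleep releases the GIL while it waits, much like a socket read in an HTTP call
    time.sleep(0.2)
    return n


start = time.perf_counter()
with ThreadPoolExecutor(max_workers=10) as executor:
    # map returns results in input order, regardless of completion order
    results = list(executor.map(blocking_call, range(10)))
elapsed = time.perf_counter() - start
print(results)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```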

CONCURRENCY IN PYTHON: THE GIL
▸ http://www.dabeaz.com/python/UnderstandingGIL.pdf

CONCURRENCY IN PYTHON: "PYTHON THREADS ARE GREAT AT DOING NOTHING." (David Beazley)

CONCURRENCY IN PYTHON: PASS #2, CONCURRENT.FUTURES

    from concurrent import futures

    import boto3

    s3 = boto3.resource('s3')

    def copy(obj, bucket, key):
        result = obj.copy_from(CopySource={'Bucket': bucket, 'Key': key})
        return result['ResponseMetadata']['HTTPStatusCode']

    with futures.ThreadPoolExecutor(max_workers=100) as executor:
        todo = []
        for src_object in src_objects:
            # assumption: the destination key mirrors the source key
            dest_obj = s3.Object(dest_bucket, src_object.key)
            future = executor.submit(copy, dest_obj, src_object.bucket_name, src_object.key)
            todo.append(future)

        results = []
        for future in futures.as_completed(todo):
            res = future.result()
            results.append(res)

    print(len(results))

CONCURRENCY IN PYTHON: PASS #2, CONCURRENT.FUTURES RESULT
▸ Execution time = 1 minute 40 seconds
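In a loop that calls `future.result()` directly, the first copy that raises ends the collection loop. A sketch of one way to keep going and report the failed keys afterwards; `copy_all` and `fake_copy` are hypothetical names, and `fake_copy` replaces the S3 call so the example runs offline:

```python
from concurrent import futures


def copy_all(keys, copy_one, max_workers=100):
    """Run copy_one(key) for every key on a thread pool.

    Returns (status_codes, failures), where failures maps key -> exception.
    copy_one is a hypothetical callable, e.g. one wrapping Object.copy_from.
    """
    status_codes, failures = [], {}
    with futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        todo = {executor.submit(copy_one, key): key for key in keys}
        for future in futures.as_completed(todo):
            key = todo[future]
            try:
                status_codes.append(future.result())
            except Exception as exc:  # record the failure and keep collecting
                failures[key] = exc
    return status_codes, failures


def fake_copy(key):
    # offline stand-in for an S3 copy: fails for one key, returns HTTP 200 otherwise
    if key == "bad":
        raise RuntimeError("copy failed")
    return 200


codes, failed = copy_all(["a", "b", "bad", "c"], fake_copy, max_workers=4)
print(sorted(codes), list(failed))  # [200, 200, 200] ['bad']
```

Keeping a future-to-key dictionary is what makes it possible to say *which* object failed, so the job can retry only those keys.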

CONCURRENCY IN PYTHON: PASS #2, CONCURRENT.FUTURES

    from concurrent import futures
    from datetime import date, timedelta

    import boto3

    s3 = boto3.resource('s3')

    def copy(obj, bucket, key):
        result = obj.copy_from(CopySource={'Bucket': bucket, 'Key': key})
        return result['ResponseMetadata']['HTTPStatusCode']

    def task(prefix, src_bucket, dest_bucket):
        # copy one day's objects (keys assumed to start with the date prefix) on 100 threads
        src_objects = s3.Bucket(src_bucket).objects.filter(Prefix=prefix)
        with futures.ThreadPoolExecutor(max_workers=100) as executor:
            todo = [executor.submit(copy, s3.Object(dest_bucket, o.key), o.bucket_name, o.key)
                    for o in src_objects]
            return [f.result() for f in futures.as_completed(todo)]

    tasks = []
    day = start_date  # first day of the 6-month range (not shown on the slide)
    with futures.ThreadPoolExecutor(max_workers=31) as task_executor:
        # one task per day; June has 30 days, so the last prefix is 2016-06-30
        while day <= date(2016, 6, 30):
            prefix = day.strftime("%Y-%m-%d")
            tasks.append(task_executor.submit(task, prefix, src_bucket, dest_bucket))
            day += timedelta(days=1)
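Generating the daily prefixes with date arithmetic avoids hand-written calendar dates (`date(2016, 6, 31)`, for instance, raises `ValueError`). A small sketch; `daily_prefixes` is a hypothetical helper and the 2016-01-01 start date is an assumption, since the slide does not show where the range begins:

```python
from datetime import date, timedelta


def daily_prefixes(start: date, end: date) -> list:
    """Return 'YYYY-MM-DD' prefixes for every day from start to end, inclusive."""
    days = (end - start).days
    return [(start + timedelta(days=i)).strftime("%Y-%m-%d") for i in range(days + 1)]


prefixes = daily_prefixes(date(2016, 1, 1), date(2016, 6, 30))
print(len(prefixes), prefixes[0], prefixes[-1])  # 182 2016-01-01 2016-06-30
```

Each prefix can then be submitted as one `task` to the outer pool; 2016 is a leap year, so the range includes 2016-02-29.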

CONCURRENCY IN PYTHON: BEWARE BOTO
▸ https://github.com/boto/botocore/issues/766 “Support for customizing the max connections and max pools”

CONCURRENCY ON AWS LAMBDA: BEFORE THE MEETUP
▸ A single function running multithreaded code.
▸ Timeouts and higher resource consumption.
▸ Moved the code out of Lambda to an EC2 instance.

CONCURRENCY ON AWS LAMBDA: AFTER THE MEETUP
▸ Suggestion from the meetup: invoke many Lambdas.
▸ Sounded costly.
▸ …but isn't!
▸ (It can be, if you are not careful.)

CONCURRENCY ON AWS LAMBDA AND SNS
▸ A single Lambda function publishes messages to SNS; the payload of each message contains one S3 object's attributes.
▸ Another Lambda function, subscribed to the SNS topic, makes the S3 copy API call for each object.

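CONCURRENCY ON AWS LAMBDA: PUBLISH TO SNS …WITH THREADS!

    from concurrent import futures

    import boto3

    client = boto3.client('sns')

    def publish_sns(key):
        response = client.publish(
            TopicArn='arn:aws:sns:us-east-1:123456789:s3copy',
            Message=key,
            Subject=key,
        )
        return response

    def lambda_handler(event, context):
        # src_objects: the day's source objects, listed elsewhere (not shown on the slide)
        # one pool for all objects; a pool created inside the loop would wait on each publish in turn
        with futures.ThreadPoolExecutor(max_workers=10) as executor:
            todo = [executor.submit(publish_sns, obj.key) for obj in src_objects]
            for future in futures.as_completed(todo):
                future.result()  # re-raises any publish error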
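Where the executor is created matters: leaving a `with ThreadPoolExecutor(...)` block waits for every task submitted to it, so a pool created per item runs the items one after another. A small timing sketch, with a hypothetical `publish` function that sleeps 0.1 s in place of the SNS call:

```python
import time
from concurrent import futures


def publish(key):
    # hypothetical stand-in for a blocking SNS publish call
    time.sleep(0.1)
    return key


keys = [f"obj-{i}" for i in range(10)]

# pool per item: each with-block waits for its single task before the loop continues
start = time.perf_counter()
for key in keys:
    with futures.ThreadPoolExecutor(max_workers=10) as executor:
        executor.submit(publish, key)
per_item = time.perf_counter() - start

# one shared pool: all ten publishes are in flight together
start = time.perf_counter()
with futures.ThreadPoolExecutor(max_workers=10) as executor:
    done = list(executor.map(publish, keys))
shared = time.perf_counter() - start
```

The first variant takes at least 10 × 0.1 s; the second finishes in roughly one 0.1 s wait.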

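CONCURRENCY ON AWS LAMBDA: CONSUME FROM SNS, S3 COPY

    import boto3

    s3 = boto3.resource('s3')

    def lambda_handler(event, context):
        key = event['Records'][0]['Sns']['Message']  # the publisher sends the object key as the message
        obj = s3.Object(dest_bucket, key)  # dest_bucket: backup bucket name, defined elsewhere
        obj.copy_from(CopySource={'Bucket': 'my-bucket', 'Key': key})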
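The SNS slide describes a payload carrying the object's attributes, while the code above sends only the key. A sketch of a JSON payload with both bucket and key; the field names, `build_message` and `copy_requests` are hypothetical. SNS passes the published string through unchanged in `Records[i]['Sns']['Message']`, and looping over `Records` avoids hard-coding index 0.

```python
import json


def build_message(bucket: str, key: str) -> str:
    # hypothetical payload shape: the field names are an assumption, not from the talk
    return json.dumps({"bucket": bucket, "key": key})


def copy_requests(event: dict) -> list:
    # decode every SNS record into a (source bucket, key) pair
    requests = []
    for record in event["Records"]:
        payload = json.loads(record["Sns"]["Message"])
        requests.append((payload["bucket"], payload["key"]))
    return requests


# a trimmed-down SNS event containing only the fields read above
sample_event = {"Records": [{"Sns": {"Message": build_message("my-bucket", "2016-06-30/part-0001")}}]}
print(copy_requests(sample_event))  # [('my-bucket', '2016-06-30/part-0001')]
```

In the handler, each pair would feed `s3.Object(dest_bucket, key).copy_from(CopySource={'Bucket': bucket, 'Key': key})`.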

CONCURRENCY ON AWS LAMBDA: CONCURRENCY
▸ One invocation of the Lambda function is the unit of concurrency.
▸ For event sources that are not stream-based: concurrency = events per second * function duration (in seconds)
  ▸ e.g. concurrency = 20 * 30 = 600
▸ By default, 100 concurrent executions is the safety limit; invocations beyond it are throttled, so the example above would be throttled. The limit can be increased on request.
▸ Retries on errors
▸ LITTLE'S LAW: "IF YOUR DATA DON'T FIT LL, CHANGE YOUR DATA!" (NEIL GUNTHER)
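The concurrency formula is Little's Law, N = X * R (mean number in the system = throughput × mean time in the system). A small sketch using the slide's numbers; solving the same law for X gives the average event rate a concurrency limit can absorb. Both function names are hypothetical.

```python
def required_concurrency(events_per_second: float, duration_seconds: float) -> float:
    # Little's Law, N = X * R: mean number of in-flight invocations
    return events_per_second * duration_seconds


def max_sustainable_rate(concurrency_limit: int, duration_seconds: float) -> float:
    # the same law solved for X: average events/second a concurrency limit can absorb
    return concurrency_limit / duration_seconds


n = required_concurrency(20, 30)       # the slide's example
rate = max_sustainable_rate(100, 30)   # the 100-execution default quoted on the slide
print(n, round(rate, 2))  # 600 3.33
```

With 30-second invocations, a limit of 100 supports only about 3.3 events per second on average, far below the 20 per second in the example.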

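CONCURRENCY ON AWS LAMBDA: CONCURRENCY, CALCULATING THE EVENT RATE

    import threading
    from time import perf_counter as pc

    lock = threading.Lock()  # publish_sns runs on many threads; += is not atomic
    reqs = 0             # cumulative number of publish calls
    ptime = 0.0          # cumulative seconds spent inside publish calls
    reqs_per_sec = []    # cumulative request count, sampled every 2 seconds
    times = []           # cumulative publish time, sampled every 2 seconds

    def publish_sns(client, dest_key):
        global reqs, ptime
        t0 = pc()
        response = client.publish(
            TopicArn='arn:aws:sns:us-east-1:562810932035:s3copy',
            Message=dest_key,
            Subject=dest_key,
        )
        with lock:
            ptime += pc() - t0
            reqs += 1
        return response

    def metrics():
        with lock:
            if reqs != 0:
                reqs_per_sec.append(reqs)
                times.append(ptime)
            finished = reqs >= 250  # stop sampling once all 250 daily objects are published
        if not finished:
            threading.Timer(2, metrics).start()

    metrics()
    # throughput = (reqs_per_sec[n] - reqs_per_sec[n-1]) / 2
    # e.g. [5, 20, 60, 100] -> (60 - 20) / 2 = 20 reqs/sec
    # Confirm with Little's Law using the response times: N = X * R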
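The comments above can be turned into a per-interval calculation: throughput X from the count samples, mean response time R from the publish-time samples, and N = X * R. Because N here is the average number of publish calls in flight, it should not exceed the pool's `max_workers`; if it does, the measurements are off. A sketch; `interval_stats` is a hypothetical helper, the counts are the slide's example, and the cumulative times are made-up sample values.

```python
def interval_stats(counts, busy_times, interval=2.0):
    """From cumulative samples taken every `interval` seconds, return a list of
    (throughput X, mean response time R, Little's Law N = X * R) per interval."""
    stats = []
    for n in range(1, len(counts)):
        done = counts[n] - counts[n - 1]
        x = done / interval  # requests per second
        # seconds per request in this interval
        r = (busy_times[n] - busy_times[n - 1]) / done if done else 0.0
        stats.append((x, r, x * r))
    return stats


counts = [5, 20, 60, 100]       # the slide's example samples
busy = [0.5, 2.0, 6.0, 10.0]    # made-up cumulative publish times (seconds)
stats = interval_stats(counts, busy)
for x, r, n in stats:
    print(x, round(r, 3), round(n, 2))
```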

CONCURRENCY ON AWS LAMBDA: USEFUL METRICS
▸ Invocations
  ▸ Alert if invocations < number of S3 objects to copy
▸ Throttles
  ▸ Not a problem for simple jobs without a time constraint
▸ Duration
▸ Errors
  ▸ Also useful for monitoring
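A sketch of the "Invocations" alert check, reading the `AWS/Lambda` `Invocations` metric through CloudWatch's `get_metric_statistics`. The function names, the 24-hour window and the hourly period are assumptions. Retried attempts can also count as invocations, so reaching the target does not prove every copy succeeded; the Errors metric covers that side.

```python
from datetime import datetime, timedelta, timezone


def invocation_shortfall(datapoints: list, expected: int) -> int:
    # sum the per-period 'Sum' statistics; a positive result means some objects were never copied
    total = sum(dp["Sum"] for dp in datapoints)
    return max(0, expected - int(total))


def fetch_invocation_datapoints(function_name: str, hours: int = 24) -> list:
    import boto3  # imported here so the check above can be exercised without AWS dependencies

    cloudwatch = boto3.client("cloudwatch")
    end = datetime.now(timezone.utc)
    response = cloudwatch.get_metric_statistics(
        Namespace="AWS/Lambda",
        MetricName="Invocations",
        Dimensions=[{"Name": "FunctionName", "Value": function_name}],
        StartTime=end - timedelta(hours=hours),
        EndTime=end,
        Period=3600,
        Statistics=["Sum"],
    )
    return response["Datapoints"]
```

Usage would look like `invocation_shortfall(fetch_invocation_datapoints("s3copy-consumer"), 250)`, where the function name is hypothetical.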

CONCURRENCY ON AWS LAMBDA: RESULT
▸ Execution time = max(Last Modified Time) - min(Last Modified Time) = 2 minutes 40 seconds
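The measurement above uses S3's own timestamps on the copied objects. A sketch of the calculation; `copy_window` is a hypothetical helper and the timestamps are made-up samples. Since each LastModified marks a finished copy, the window excludes the time before the first copy completed, so it slightly understates the total.

```python
from datetime import datetime, timedelta, timezone


def copy_window(last_modified: list) -> timedelta:
    # span between the first and last completed copies, as recorded by S3's LastModified
    return max(last_modified) - min(last_modified)


stamps = [  # made-up sample timestamps
    datetime(2016, 6, 30, 2, 0, 5, tzinfo=timezone.utc),
    datetime(2016, 6, 30, 2, 2, 45, tzinfo=timezone.utc),
    datetime(2016, 6, 30, 2, 1, 10, tzinfo=timezone.utc),
]
print(copy_window(stamps))  # 0:02:40
```

Against a real bucket, the list could come from `[o.last_modified for o in s3.Bucket(dest_bucket).objects.filter(Prefix=prefix)]`.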

REFERENCES
▸ http://www.dabeaz.com/GIL/
▸ http://perfdynamics.blogspot.de/search/label/Little%27s%20law

QUESTIONS ? MOHIT CHAWLA SYSTEMS ENGINEER, SMAATO INC. https://alcy.github.io