Analyzing data at any scale with AWS Lambda
Danilo Poccia, Chief Evangelist (EMEA), AWS

© 2020, Amazon Web Services, Inc. or its affiliates. All rights reserved.

Agenda
• Event-driven metadata extraction and enrichment
• Monte Carlo method, ensemble forecasting, ensemble learning
• Using Amazon Elastic File System (Amazon EFS) to host Lambda function dependencies
• Machine learning inference examples
• Advanced use cases and demos

Event-driven metadata extraction + enrichment
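A minimal sketch of the event-driven pattern, assuming an S3 upload triggers a Lambda function that reads the object's metadata with a HEAD request, enriches it, and indexes it in a DynamoDB table. The enrichment step (file extension plus a timestamp) and the table name `metadata-index` are hypothetical; the clients are passed in so the logic can be exercised without AWS.

```python
import urllib.parse
from datetime import datetime, timezone

TABLE_NAME = "metadata-index"  # hypothetical DynamoDB table name


def extract_metadata(s3_client, bucket, key):
    """Read system and user-defined metadata of an S3 object with a HEAD request."""
    head = s3_client.head_object(Bucket=bucket, Key=key)
    return {
        "bucket": bucket,
        "key": key,
        "size": head["ContentLength"],
        "content_type": head.get("ContentType", "binary/octet-stream"),
        "user_metadata": head.get("Metadata", {}),
    }


def enrich(item):
    """Hypothetical enrichment: add the file extension and an indexing timestamp."""
    enriched = dict(item)
    file_name = item["key"].rsplit("/", 1)[-1]
    _, dot, extension = file_name.rpartition(".")
    enriched["extension"] = extension.lower() if dot else ""
    enriched["indexed_at"] = datetime.now(timezone.utc).isoformat()
    return enriched


def handle_s3_event(event, s3_client, table):
    """Process every record of an S3 event notification and index the result."""
    items = []
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # Object keys in S3 event notifications are URL-encoded
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        item = enrich(extract_metadata(s3_client, bucket, key))
        table.put_item(Item=item)
        items.append(item)
    return items


def lambda_handler(event, context):
    import boto3  # included in the Lambda Python runtime

    table = boto3.resource("dynamodb").Table(TABLE_NAME)
    items = handle_s3_event(event, boto3.client("s3"), table)
    return {"indexed": len(items)}
```

Keeping the extraction and enrichment steps as separate functions makes it easy to add more enrichment (for example, calling another service) without touching the event parsing.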

Monte Carlo methods
Repeated random sampling and statistical analysis
• Physical sciences and engineering: computational physics, physical chemistry, aerodynamics, fluid dynamics, wireless networks
• Finance and business: evaluate the risk and uncertainty that would affect the outcome of different decision options
• Mathematics: what's the area of the "cloud" shape? Sample random points and count the hits and misses: the number of hits divided by the total number of runs tends to the ratio between the area of the shape and the area of the sampled region
• For example, execute 10 / 100 / 1,000 Lambda functions, each performing one or more random samplings
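A minimal sketch of the hit-and-miss estimate, assuming the "cloud" is replaced by a unit circle (so the expected answer is pi) and that each Lambda invocation receives a hypothetical `{"seed": ..., "samples": ...}` event. In a real deployment the invocations would run in parallel; here a local loop stands in for 100 of them.

```python
import random


def inside_cloud(x, y):
    """Hypothetical 'cloud' shape: a circle of radius 1 centered at the origin."""
    return x * x + y * y <= 1.0


def lambda_handler(event, context):
    """One worker: draw `samples` random points in the square [-1, 1] x [-1, 1]."""
    rng = random.Random(event["seed"])  # a distinct seed per invocation
    hits = 0
    for _ in range(event["samples"]):
        x, y = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
        if inside_cloud(x, y):
            hits += 1
    return {"hits": hits, "samples": event["samples"]}


def estimate_area(results, square_area=4.0):
    """Aggregate the workers: area ~= square_area * total_hits / total_samples."""
    hits = sum(r["hits"] for r in results)
    samples = sum(r["samples"] for r in results)
    return square_area * hits / samples


# Locally simulate 100 invocations of 10,000 samples each
results = [lambda_handler({"seed": i, "samples": 10_000}, None) for i in range(100)]
print(estimate_area(results))  # approaches pi as the number of samples grows
```

Because each worker only returns two counters, the aggregation step is cheap no matter how many functions run.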

Ensemble forecasting
Instead of making a single forecast of the most likely scenario, a set (or ensemble) of forecasts is produced
• This set of forecasts aims to give a better indication of the range of possible future states
• Useful when there is uncertainty or error in the input parameters, or when a system is highly sensitive to initial conditions, as in chaotic dynamical systems such as the weather
• For example, execute 10 / 100 / 1,000 Lambda functions, each with slightly different input parameters, and then compare the results
• If 56 out of 100 weather forecast simulations expect rain, then "There is a 56% chance of rain"
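A minimal sketch of an ensemble forecast, assuming a toy chaotic model (the logistic map) in place of a weather simulation and a hypothetical rule that a final state above a threshold means "rain". Each member perturbs the observed initial state slightly, which is exactly the situation where a single run would be misleading.

```python
import random


def toy_forecast(initial_state, steps=50, r=3.9):
    """Hypothetical chaotic model: iterate the logistic map x -> r * x * (1 - x)."""
    x = initial_state
    for _ in range(steps):
        x = r * x * (1.0 - x)
    return x


def lambda_handler(event, context):
    """One ensemble member: perturb the observed initial state, then run the model."""
    rng = random.Random(event["member"])
    perturbed = event["observed_state"] + rng.gauss(0.0, event["uncertainty"])
    perturbed = min(max(perturbed, 0.0), 1.0)  # keep the state inside [0, 1]
    return {"rain": toy_forecast(perturbed) > event["rain_threshold"]}


def chance_of_rain(members):
    """Fraction of ensemble members that forecast rain."""
    return sum(m["rain"] for m in members) / len(members)


event = {"observed_state": 0.2, "uncertainty": 0.001, "rain_threshold": 0.5}
members = [lambda_handler({**event, "member": i}, None) for i in range(100)]
print(f"There is a {chance_of_rain(members):.0%} chance of rain")
```

With the uncertainty set to zero every member agrees, so the ensemble collapses back to a single yes/no forecast; the spread only appears when the inputs are perturbed.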

Ensemble learning
Using multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone
• For example, run 10 / 100 / 1,000 Lambda functions, each training a different (and relatively simple) machine learning algorithm
• To run inference, you can combine the predictions of all the models, or of a subset of them
[Figure: Dataset → Training → Models → Predictions → Ensemble's prediction]
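A minimal sketch of ensemble learning by bagging and majority vote, assuming each Lambda invocation trains a one-feature decision stump on a bootstrap resample of a small hypothetical dataset. Bagged stumps are one simple choice of constituent model; the slide does not prescribe a specific algorithm.

```python
import random
from collections import Counter


def train_stump(samples):
    """Fit a one-feature decision stump: the threshold and orientation with the fewest errors."""
    best = None
    for threshold, _ in samples:
        for low, high in ((0, 1), (1, 0)):
            errors = sum((low if x <= threshold else high) != y for x, y in samples)
            if best is None or errors < best["errors"]:
                best = {"errors": errors, "threshold": threshold, "low": low, "high": high}
    return best


def predict(model, x):
    return model["low"] if x <= model["threshold"] else model["high"]


def lambda_handler(event, context):
    """One ensemble member: train a stump on a bootstrap resample of the dataset."""
    rng = random.Random(event["seed"])
    data = event["dataset"]
    bootstrap = [rng.choice(data) for _ in data]
    return train_stump(bootstrap)  # a small, JSON-serializable model


def ensemble_predict(models, x):
    """Majority vote over the predictions of all (or a subset of) the models."""
    votes = Counter(predict(model, x) for model in models)
    return votes.most_common(1)[0][0]


# Hypothetical dataset of [feature, class] pairs
dataset = [[0.5, 0], [1.0, 0], [1.5, 0], [2.0, 0], [3.0, 1], [3.5, 1], [4.0, 1], [4.5, 1]]
models = [lambda_handler({"seed": i, "dataset": dataset}, None) for i in range(11)]
print([ensemble_predict(models, x) for x in (0.0, 2.5, 5.0)])
```

An odd number of models avoids ties in a two-class vote; combining only a subset of `models` works the same way.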

AWS layer for Python: NumPy and SciPy
• For example, for Python 3.8:
  § NumPy 1.19.0
  § SciPy 1.5.1
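A minimal sketch of a handler that relies on the layer, assuming the NumPy/SciPy layer is attached to a Python function and that the event carries hypothetical `x` and `y` lists. It fits a straight line with `scipy.stats.linregress` and returns an API Gateway-style response.

```python
import json

import numpy as np
from scipy import stats


def lambda_handler(event, context):
    """Fit y = slope * x + intercept to the points passed in the event."""
    x = np.asarray(event["x"], dtype=float)
    y = np.asarray(event["y"], dtype=float)
    fit = stats.linregress(x, y)
    body = {
        "slope": float(fit.slope),
        "intercept": float(fit.intercept),
        "r_squared": float(fit.rvalue ** 2),
        "mean_y": float(np.mean(y)),
    }
    return {"statusCode": 200, "body": json.dumps(body)}
```

The `float(...)` conversions keep the response serializable with the standard `json` module.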

Using Amazon EFS for hosting dependencies
• Extends the reach of Lambda to new use cases
• Machine learning training
  § CPU-based, with a 15-minute time limit
  § Improve results with ensemble forecasts or ensemble learning
• Machine learning inference
  § Using CPUs
  § Evaluate your latency requirements (synchronous vs. asynchronous invocations)
  § Use Provisioned Concurrency to avoid cold starts
• You can use an Amazon Elastic Compute Cloud (Amazon EC2) instance to install dependencies and copy them to an Amazon EFS file system
  § Some tools created by the AWS community are great! For example –
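A minimal sketch of the two inference options, assuming a boto3 Lambda client is passed in. `RequestResponse` waits for the prediction, `Event` queues the request and returns immediately (status 202), and `put_provisioned_concurrency_config` keeps environments initialized for a published version or alias. The function and alias names are placeholders.

```python
import json


def invoke_inference(lambda_client, function_name, payload, synchronous=True):
    """Invoke an inference function synchronously or asynchronously."""
    response = lambda_client.invoke(
        FunctionName=function_name,
        # 'RequestResponse' waits for the result; 'Event' queues it and returns at once
        InvocationType="RequestResponse" if synchronous else "Event",
        Payload=json.dumps(payload).encode("utf-8"),
    )
    if synchronous:
        return json.loads(response["Payload"].read())
    return {"accepted": response["StatusCode"] == 202}


def keep_warm(lambda_client, function_name, alias, executions):
    """Keep `executions` execution environments initialized for a version or alias."""
    return lambda_client.put_provisioned_concurrency_config(
        FunctionName=function_name,
        Qualifier=alias,
        ProvisionedConcurrentExecutions=executions,
    )
```

Asynchronous invocation suits batch scoring where nobody waits for the answer; interactive requests usually need the synchronous path plus Provisioned Concurrency.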

Machine learning inference (with Amazon EFS)
• Using PyTorch
  § Example results: {"bird_class": "106.Horned_Puffin"}, {"bird_class": "053.Western_Grebe"}
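A minimal sketch of how a result like `{"bird_class": "106.Horned_Puffin"}` can be produced from raw model scores, assuming the scores are a plain list (with PyTorch, for example, via the output tensor's `.tolist()`) and that the class names come from a hypothetical list shipped with the model.

```python
import json


def to_prediction(scores, class_names):
    """Return the label of the highest-scoring class as a JSON document."""
    best_index = max(range(len(scores)), key=lambda i: scores[i])
    return json.dumps({"bird_class": class_names[best_index]})


# Hypothetical class names and scores
class_names = ["053.Western_Grebe", "106.Horned_Puffin", "200.Common_Yellowthroat"]
print(to_prediction([0.10, 0.85, 0.05], class_names))
```

Converting to plain Python values before serializing avoids depending on the framework's own types in the response.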

Machine learning inference (with Amazon EFS)
• Using TensorFlow

Machine learning inference (with Amazon EFS)
• Using XGBoost
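A minimal sketch of a loading pattern common to the PyTorch, TensorFlow, and XGBoost examples, assuming the model file lives on the EFS mount at a hypothetical `/mnt/ml/model.bin`. The model is loaded on the first invocation and cached in a module-level variable, so warm invocations (including those served by Provisioned Concurrency) skip the load. The placeholder loader only reads bytes; a real function would use the framework's loader.

```python
import os

# Hypothetical location of the model file on the EFS mount
MODEL_PATH = os.environ.get("MODEL_PATH", "/mnt/ml/model.bin")

_model = None  # cached for the lifetime of the execution environment


def read_model_file(path):
    """Placeholder loader that only reads the bytes of the model file.

    A real function would call the framework's loader instead, for example
    torch.load(path), tf.keras.models.load_model(path), or
    xgboost.Booster(model_file=path).
    """
    with open(path, "rb") as f:
        return f.read()


def get_model(loader=read_model_file, path=None):
    """Load the model on the first call only; later (warm) calls reuse it."""
    global _model
    if _model is None:
        _model = loader(path or MODEL_PATH)
    return _model


def lambda_handler(event, context):
    model = get_model()
    # Hypothetical: run the framework-specific model on event["input"] here
    return {"statusCode": 200, "model_size": len(model)}
```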

Using data science dependencies in Python
• Add the following dependencies to your requirements.txt:
numpy
scipy
scikit-learn
pandas
matplotlib
ipython
jupyter
jupyterlab
papermill

Using data science dependencies in Python
• Mount Amazon EFS at instance launch; by default it will be on /mnt/efs/fs1/
• Then, you can use these commands (on Amazon Linux 2):
  § sudo yum update -y  # reboot if the kernel is updated
  § sudo mkdir -p /mnt/efs/fs1/DataScience/lib
  § sudo chown -R ec2-user:ec2-user /mnt/efs/fs1/DataScience
  § sudo amazon-linux-extras install python3.8
  § pip3.8 install --user wheel
  § pip3.8 install -t /mnt/efs/fs1/DataScience/lib -r requirements.txt
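After the `pip3.8 install -t ...` step, a quick check can confirm that every requirement is actually present in the target directory. A minimal sketch, assuming it runs on the same instance; `PathFinder.find_spec` searches only the given directory, so packages installed elsewhere do not hide a missing one. The mapping from requirement names to import names is an assumption and may need extending.

```python
import importlib.machinery
import re

# Requirement names whose import name differs (an assumption; extend as needed)
IMPORT_NAMES = {"scikit-learn": "sklearn", "ipython": "IPython"}


def requirement_names(lines):
    """Extract bare package names from requirements.txt lines."""
    names = []
    for line in lines:
        line = line.split("#", 1)[0].strip()
        if line:
            names.append(re.split(r"[<>=!~\[;\s]", line, maxsplit=1)[0])
    return names


def missing_packages(names, lib_dir):
    """Return the packages that cannot be found inside lib_dir itself."""
    missing = []
    for name in names:
        module = IMPORT_NAMES.get(name.lower(), name)
        if importlib.machinery.PathFinder.find_spec(module, [lib_dir]) is None:
            missing.append(name)
    return missing
```

For example, pass the lines of `requirements.txt` through `requirement_names` and check them against `/mnt/efs/fs1/DataScience/lib`; an empty list means everything was installed there.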

Using data science dependencies in Python
• Create an Amazon EFS access point for /DataScience
  § On Amazon Linux, you can use ec2-user's UID and GID
• Connect the Lambda function to the Amazon Virtual Private Cloud (Amazon VPC)
  § To connect to other AWS services, such as Amazon S3 or Amazon DynamoDB, use Amazon VPC endpoints
  § To reach the public internet, use private subnets + a NAT gateway
• Add the file system to the function using the Amazon EFS access point
  § For example, mount it under /mnt/DataScience
• Configure the function environment to use the dependencies
  § For example, in Python set PYTHONPATH = /mnt/DataScience/lib
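The same configuration steps can be scripted with boto3. A minimal sketch, assuming UID/GID 1000 for `ec2-user` (the usual default on Amazon Linux, but check with `id ec2-user`) and that the subnets, security groups, and EFS mount targets already exist. Note that `Environment` replaces all of the function's environment variables.

```python
def create_access_point(efs_client, file_system_id, uid=1000, gid=1000):
    """Create an EFS access point rooted at /DataScience (UID/GID 1000 assumed)."""
    response = efs_client.create_access_point(
        FileSystemId=file_system_id,
        PosixUser={"Uid": uid, "Gid": gid},
        RootDirectory={
            "Path": "/DataScience",
            "CreationInfo": {"OwnerUid": uid, "OwnerGid": gid, "Permissions": "755"},
        },
    )
    return response["AccessPointArn"]


def attach_dependencies(lambda_client, function_name, access_point_arn,
                        subnet_ids, security_group_ids):
    """Connect the function to the VPC, mount the access point, and set PYTHONPATH."""
    return lambda_client.update_function_configuration(
        FunctionName=function_name,
        VpcConfig={"SubnetIds": subnet_ids, "SecurityGroupIds": security_group_ids},
        FileSystemConfigs=[
            {"Arn": access_point_arn, "LocalMountPath": "/mnt/DataScience"}
        ],
        # Replaces any existing environment variables of the function
        Environment={"Variables": {"PYTHONPATH": "/mnt/DataScience/lib"}},
    )
```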



Random Forest Classifier
Using scikit-learn for classification

from sklearn.ensemble import RandomForestClassifier
import json

def lambda_handler(event, context):
    clf = RandomForestClassifier(random_state=0)
    X = [[ 1,  2,  3], [11, 12, 13]]  # 2 samples, 3 features
    y = [0, 1]  # classes of each sample
    clf.fit(X, y)  # fitting the classifier
    A = [[4, 5, 6], [14, 15, 16], [3, 2, 1], [17, 15, 13]]
    # Getting predictions for the training samples and for new samples
    result = {
        'type': 'RandomForestClassifier',
        'predict({})'.format(X): '{}'.format(clf.predict(X)),
        'predict({})'.format(A): '{}'.format(clf.predict(A))
    }
    # Returned by Amazon API Gateway
    return {
        'statusCode': 200,
        'body': json.dumps(result)
    }

Logistic Regression
Using scikit-learn logistic regression in a pipeline

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
import json

# Pipeline: scale the features, then apply logistic regression
pipe = make_pipeline(  # create a pipeline object
    StandardScaler(),
    LogisticRegression(random_state=0)
)

# Training runs at module load, once per execution environment, not on every request
X, y = load_iris(return_X_y=True)  # load the iris dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)  # split train / test sets
pipe.fit(X_train, y_train)  # fit the whole pipeline

def lambda_handler(event, context):
    # Accuracy on the test set, returned by Amazon API Gateway
    result = {
        'accuracy_score': accuracy_score(y_test, pipe.predict(X_test))
    }
    return {
        'statusCode': 200,
        'body': json.dumps(result)
    }

Matplotlib Image on S3
Using Matplotlib with Amazon S3 and Amazon API Gateway

import io
import boto3
import matplotlib
matplotlib.use('Agg')  # non-interactive backend: there is no display in Lambda
import matplotlib.pyplot as plt
. . .
plt.figure(figsize=(len(anomaly_algorithms) * 2 + 3, 12.5))
plt.subplots_adjust(left=.02, right=.98, bottom=.001, top=.96, wspace=.05, hspace=.01)
. . .
# Render the figure into an in-memory PNG instead of a window or a local file
img_data = io.BytesIO()
plt.savefig(img_data, format='png')
img_data.seek(0)  # rewind the buffer before uploading it

# Uploading to an Amazon S3 bucket (public-read requires the bucket to allow public ACLs)
s3 = boto3.resource('s3')
bucket = s3.Bucket(OUTPUT_BUCKET)
bucket.put_object(Body=img_data, ContentType='image/png', Key=OUTPUT_KEY, ACL='public-read')
image_url = 'https://{}.s3.amazonaws.com/{}'.format(OUTPUT_BUCKET, OUTPUT_KEY)

def lambda_handler(event, context):
    # HTTP redirect to the Amazon S3 object
    return {
        'statusCode': 302,  # 301 would be permanent and cached
        'headers': {
            'Location': image_url
        }
    }



Running Jupyter notebooks using Papermill
A tool for parameterizing and executing Jupyter notebooks
• Use cases
  § A periodic report to execute with different parameters depending on the time/date
  § Run a notebook and then automate actions based on the result
  § Run a notebook as part of a workflow
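A minimal sketch of the periodic-report use case, assuming the function is triggered by a scheduled Amazon EventBridge rule, whose events carry a UTC `time` field such as `2020-07-01T06:00:00Z`. The monthly/daily rule, the bucket name, and the notebook paths are hypothetical.

```python
from datetime import datetime


def report_parameters(event):
    """Derive notebook parameters from the time of a scheduled event."""
    when = datetime.strptime(event["time"], "%Y-%m-%dT%H:%M:%SZ")
    return {
        "report_date": when.date().isoformat(),
        # Hypothetical rule: a monthly summary on the 1st, a daily report otherwise
        "period": "monthly" if when.day == 1 else "daily",
    }


def lambda_handler(event, context):
    import papermill as pm  # provided by a layer or the EFS file system

    params = report_parameters(event)
    pm.execute_notebook(
        "s3://HYPOTHETICAL_BUCKET/report.ipynb",
        "s3://HYPOTHETICAL_BUCKET/output/report-{}.ipynb".format(params["report_date"]),
        parameters=params,
    )
    return params
```

Keeping the parameter logic in its own function makes the schedule-dependent behavior testable without running a notebook.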

Running Jupyter notebooks using Papermill
Event-driven Jupyter notebooks on Amazon S3
• Upload the Jupyter notebook to Amazon S3
• Use Amazon S3 user-defined metadata for parameters
  § S3 user-defined metadata is limited to 2 KB in size
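A minimal sketch of turning S3 user-defined metadata into Papermill parameters, assuming the function is triggered by the notebook upload and a boto3 S3 client is passed in. Metadata values are always strings, so values that parse as JSON (numbers, `true`/`false`) are decoded; the 2 KB check counts the UTF-8 bytes of keys and values, and 2048 bytes is an assumption for "2 KB".

```python
import json
import urllib.parse

USER_METADATA_LIMIT = 2048  # bytes, keys plus values in UTF-8 (assumed value of 2 KB)


def metadata_size(metadata):
    """Size of user-defined metadata: UTF-8 bytes of every key and value."""
    return sum(len(k.encode("utf-8")) + len(v.encode("utf-8")) for k, v in metadata.items())


def fits_in_metadata(metadata):
    return metadata_size(metadata) <= USER_METADATA_LIMIT


def parse_value(value):
    """Metadata values are strings: decode JSON literals such as 10, 0.5, or true."""
    try:
        return json.loads(value)
    except ValueError:
        return value


def notebook_from_event(s3_client, event):
    """Return the uploaded notebook's bucket, key, and Papermill parameters."""
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])  # keys are URL-encoded
    # boto3 returns user-defined metadata without the x-amz-meta- prefix
    metadata = s3_client.head_object(Bucket=bucket, Key=key).get("Metadata", {})
    parameters = {name: parse_value(value) for name, value in metadata.items()}
    return bucket, key, parameters
```

On the upload side, metadata can be set with boto3, for example `s3.upload_file(path, bucket, key, ExtraArgs={"Metadata": {"threshold": "0.5"}})`. S3 stores metadata keys in lowercase, so lowercase parameter names in the notebook avoid surprises.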

Using Papermill with Amazon S3
Amazon S3 URL syntax is supported by Papermill to read and write notebooks

import os
import sys
import papermill as pm
. . .
# To run Python inside Jupyter
sys.path.append("/opt/bin")
sys.path.append("/opt/python")
os.environ["IPYTHONDIR"] = '/tmp/ipythondir'  # /tmp is the writable location in Lambda
. . .
# Using S3 URLs for input and output
input_notebook = 's3://{}/{}'.format(bucket, key)
output_notebook = 's3://{}/{}'.format(OUTPUT_BUCKET, key)
. . .
# Executing the Jupyter notebook
pm.execute_notebook(
    input_notebook,
    output_notebook,
    parameters = parameters
)


Takeaways
• Use event-driven architectures for metadata extraction, enrichment, and indexing
• For more advanced use cases, manage dependencies using an Amazon EFS file system to use tools like scikit-learn and Matplotlib
• Simplify machine learning inference with frameworks such as PyTorch and TensorFlow running in Lambda functions
• Automate Jupyter notebook execution using Papermill

@danilop