Slide 1

Analyzing data at any scale with AWS Lambda
Danilo Poccia, Chief Evangelist (EMEA), AWS
SVS301
© 2020, Amazon Web Services, Inc. or its affiliates. All rights reserved.

Slide 2

Agenda
• Event-driven metadata extraction and enrichment
• Monte Carlo method, ensemble forecasting, ensemble learning
• Using Amazon Elastic File System (Amazon EFS) to host Lambda function dependencies
• Machine learning inference examples
• Advanced use cases and demos

Slide 3


Event-driven metadata extraction + enrichment

Slide 4

Monte Carlo methods
Repeated random sampling and statistical analysis
• Physical sciences and engineering: computational physics, physical chemistry, aerodynamics, fluid dynamics, wireless networks
• Finance and business: evaluate the risk and uncertainty that would affect the outcome of different decision options
• Mathematics: what's the area of the "cloud" shape? How many hits and misses? The number of hits divided by the total number of runs tends to the same ratio as the two areas
• For example, execute 10 / 100 / 1,000 Lambda functions, each performing one or more random samplings (see the sketch below)
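As a concrete illustration of the hit/miss idea, here is a minimal sketch of a single Lambda function doing repeated random sampling; it estimates the area of a quarter circle inside the unit square (true area pi/4). The event shape and default sample count are illustrative assumptions, not part of the original talk.

import json
import random

def lambda_handler(event, context):
    # Estimate the area of a shape by repeated random sampling: here, a
    # quarter circle of radius 1 inside the unit square (true area = pi/4).
    n = int(event.get('samples', 100000))   # assumed event field; default is arbitrary
    hits = 0
    for _ in range(n):
        x, y = random.random(), random.random()
        if x * x + y * y <= 1.0:             # a "hit" falls inside the shape
            hits += 1
    area = hits / n                          # hits / runs tends to the ratio of the two areas
    return {
        'statusCode': 200,
        'body': json.dumps({'samples': n, 'hits': hits, 'estimated_area': area})
    }

Running many such functions in parallel and averaging their estimates is the fan-out pattern described on this slide.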

Slide 5

Ensemble forecasting
Instead of making a single forecast of the most likely scenario, a set (or ensemble) of forecasts is produced
• This set of forecasts aims to give a better indication of the range of possible future states
• Useful when there is uncertainty/error in the input parameters, or a system is highly sensitive to initial conditions, such as chaotic dynamical systems like weather forecasts
• For example, execute 10 / 100 / 1,000 Lambda functions, each with slightly different input parameters, and then compare the results (see the sketch below)
• If 56 out of 100 weather forecast simulations expect rain, then "There is a 56% chance of rain"
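A minimal sketch of the fan-out step, assuming a hypothetical member function named forecast-member and a simple random perturbation of a couple of illustrative parameters; the real parameters and perturbation scheme depend on the model being run.

import json
import random
import boto3

lambda_client = boto3.client('lambda')

def lambda_handler(event, context):
    # Fan out N asynchronous invocations, each with slightly perturbed inputs
    base_params = event.get('parameters', {'temperature': 15.0, 'humidity': 0.7})
    runs = int(event.get('runs', 100))
    for i in range(runs):
        perturbed = {k: v * random.uniform(0.98, 1.02) for k, v in base_params.items()}
        lambda_client.invoke(
            FunctionName='forecast-member',   # hypothetical per-member function
            InvocationType='Event',           # asynchronous invocation
            Payload=json.dumps({'run': i, 'parameters': perturbed})
        )
    return {'statusCode': 202, 'body': json.dumps({'runs': runs})}

Each member function would write its result somewhere (for example Amazon S3 or DynamoDB) so the results can be compared afterwards.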

Slide 6

Ensemble learning
Using multiple learning algorithms to obtain better predictive performance than what could be obtained from any of the constituent learning algorithms alone
• For example, run 10 / 100 / 1,000 Lambda functions training different (and relatively simple) machine learning algorithms
• To run inference, you can combine all or a subset of the results (see the sketch below)
(Diagram: dataset → training → models → predictions → ensemble's prediction)
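One common way to combine the results is a simple majority vote over the member predictions. This is a minimal sketch of that combination step; the event shape (a list of per-member class predictions) is an assumption for illustration.

import json
from collections import Counter

def lambda_handler(event, context):
    # Each member function returns a class prediction; the ensemble's
    # prediction is a simple majority vote over all of them.
    predictions = event.get('predictions', [0, 1, 0, 0, 0, 0])
    vote, count = Counter(predictions).most_common(1)[0]
    return {
        'statusCode': 200,
        'body': json.dumps({
            'ensemble_prediction': vote,
            'votes': count,
            'members': len(predictions)
        })
    }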

Slide 7

AWS Layer for Python: NumPy and SciPy
• For example, for Python 3.8:
  § NumPy 1.19.0
  § SciPy 1.5.1
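A minimal sketch of a handler that only needs NumPy and SciPy, assuming the AWS-provided SciPy layer (or an equivalent one) is attached to the function; the statistics computed here are purely illustrative.

import json
import numpy as np
from scipy import stats

def lambda_handler(event, context):
    # Draw a random sample and return a few summary statistics,
    # exercising both NumPy and SciPy from the attached layer
    rng = np.random.default_rng()
    data = rng.normal(loc=0.0, scale=1.0, size=1000)
    return {
        'statusCode': 200,
        'body': json.dumps({
            'mean': float(np.mean(data)),
            'std': float(np.std(data)),
            'skewness': float(stats.skew(data))
        })
    }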

Slide 8

Using Amazon EFS for hosting dependencies
• Extends the reach of Lambda to new use cases
• Machine learning training
  § CPU-based, 15-minute time limit
  § Improving results with ensemble forecasts or ensemble learning
• Machine learning inference
  § Using CPUs
  § Evaluate your latency requirements (sync vs async invocations; see the sketch below)
  § Use Provisioned Concurrency to avoid cold starts
• You can use an Amazon Elastic Compute Cloud (Amazon EC2) instance to install dependencies and copy them to an Amazon EFS file system
  § Some tools created by the AWS community are great! For example: https://github.com/lambci/cmda
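To make the sync vs async distinction concrete, this is a small sketch of both invocation types using boto3; the function name and payload are hypothetical.

import json
import boto3

lambda_client = boto3.client('lambda')
payload = json.dumps({'samples': 100000})

# Synchronous invocation: the caller waits for the result (latency-sensitive paths)
sync_response = lambda_client.invoke(
    FunctionName='inference-function',   # hypothetical function name
    InvocationType='RequestResponse',
    Payload=payload
)
print(sync_response['Payload'].read())

# Asynchronous invocation: Lambda queues the event and returns immediately
async_response = lambda_client.invoke(
    FunctionName='inference-function',
    InvocationType='Event',
    Payload=payload
)
print(async_response['StatusCode'])      # 202 when the event is accepted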

Slide 9

Machine learning inference (with Amazon EFS)
• Using PyTorch (a hedged inference sketch follows below)
• Example output: {"bird_class": "106.Horned_Puffin"}, {"bird_class": "053.Western_Grebe"}
• https://aws.amazon.com/blogs/aws/new-a-shared-file-system-for-your-lambda-functions/
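A minimal sketch of the pattern, not the code from the linked blog post: it assumes a TorchScript model stored at a hypothetical path on the mounted EFS file system, a standard torchvision preprocessing pipeline, and an input image already downloaded to /tmp.

import json
import torch
from PIL import Image
from torchvision import transforms

MODEL_PATH = '/mnt/ml/model.pt'          # hypothetical path on the EFS mount

model = torch.jit.load(MODEL_PATH)       # load the TorchScript model once per execution environment
model.eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])

def lambda_handler(event, context):
    # Assumes the image to classify was already saved locally,
    # for example downloaded from S3 to /tmp by earlier code
    image = Image.open(event.get('image_path', '/tmp/input.jpg'))
    batch = preprocess(image).unsqueeze(0)       # add the batch dimension
    with torch.no_grad():
        scores = model(batch)
    predicted = int(scores.argmax(dim=1).item())
    return {
        'statusCode': 200,
        'body': json.dumps({'bird_class': predicted})
    }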

Slide 10

Machine learning inference (with Amazon EFS)
• Using TensorFlow
• https://aws.amazon.com/blogs/compute/building-deep-learning-inference-with-aws-lambda-and-amazon-efs/

Slide 11

Machine learning inference (with Amazon EFS)
• Using XGBoost
• https://aws.amazon.com/blogs/compute/pay-as-you-go-machine-learning-inference-with-aws-lambda/

Slide 12

Using data science dependencies in Python
• Add the following dependencies to your requirements.txt:
  numpy
  scipy
  scikit-learn
  pandas
  matplotlib
  ipython
  jupyter
  jupyterlab
  papermill

Slide 13

Using data science dependencies in Python
• Mount Amazon EFS at instance launch; by default it will be on /mnt/efs/fs1/
• Then, you can use these commands (on Amazon Linux 2):
  § sudo yum update -y    # reboot if the kernel is updated
  § sudo mkdir -p /mnt/efs/fs1/DataScience/lib
  § sudo chown -R ec2-user:ec2-user /mnt/efs/fs1/DataScience
  § sudo amazon-linux-extras install python3.8
  § pip3.8 install --user wheel
  § pip3.8 install -t /mnt/efs/fs1/DataScience/lib -r requirements.txt

Slide 14

Using data science dependencies in Python
• Create an Amazon EFS access point for /DataScience
  § On Amazon Linux, you can use ec2-user's UID and GID
• Connect the Lambda function to the Amazon Virtual Private Cloud (Amazon VPC)
  § To connect to other AWS services, such as Amazon S3 or Amazon DynamoDB, use Amazon VPC endpoints
  § To reach the public internet, use private subnets + a NAT gateway
• Add the file system to the function using the Amazon EFS access point
  § For example, mount it under /mnt/DataScience
• Configure the function environment to use the dependencies (see the sketch below)
  § For example, in Python set PYTHONPATH = /mnt/DataScience/lib
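If setting PYTHONPATH in the function configuration is not an option, the same effect can be obtained in code by extending sys.path before importing the dependencies. A minimal sketch, assuming the mount path /mnt/DataScience from the bullet above:

import sys

# Make the dependencies installed on the EFS mount importable
sys.path.append('/mnt/DataScience/lib')

import json
import numpy
import sklearn       # both resolved from the EFS file system above

def lambda_handler(event, context):
    return {
        'statusCode': 200,
        'body': json.dumps({
            'numpy': numpy.__version__,
            'scikit-learn': sklearn.__version__
        })
    }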

Slide 15


Slide 16


No content

Slide 17

Random Forest Classifier
Using scikit-learn for classification

from sklearn.ensemble import RandomForestClassifier
import json

def lambda_handler(event, context):
    clf = RandomForestClassifier(random_state=0)
    X = [[1, 2, 3], [11, 12, 13]]   # 2 samples, 3 features
    y = [0, 1]                      # classes of each sample
    clf.fit(X, y)                   # fitting the classifier
    A = [[4, 5, 6], [14, 15, 16], [3, 2, 1], [17, 15, 13]]
    # Getting predictions
    result = {
        'type': 'RandomForestClassifier',
        'predict({})'.format(X): '{}'.format(clf.predict(X)),
        'predict({})'.format(A): '{}'.format(clf.predict(A))
    }
    # Returned by Amazon API Gateway
    return {
        'statusCode': 200,
        'body': json.dumps(result)
    }

Slide 18

Logistic Regression
Using scikit-learn with a pipeline

from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import json

# Pipeline: create a pipeline object chaining preprocessing and the model
pipe = make_pipeline(
    StandardScaler(),
    LogisticRegression(random_state=0)
)

X, y = load_iris(return_X_y=True)   # load the iris dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)   # split train / test sets
pipe.fit(X_train, y_train)          # training: fit the whole pipeline

def lambda_handler(event, context):
    # Accuracy returned by Amazon API Gateway
    result = {
        'accuracy_score': accuracy_score(pipe.predict(X_test), y_test)
    }
    return {
        'statusCode': 200,
        'body': json.dumps(result)
    }

Slide 19

Matplotlib Image on S3
Using Matplotlib with Amazon S3 and Amazon API Gateway

import io
import boto3
import matplotlib
import matplotlib.pyplot as plt
. . .
plt.figure(figsize=(len(anomaly_algorithms) * 2 + 3, 12.5))
plt.subplots_adjust(left=.02, right=.98, bottom=.001, top=.96,
                    wspace=.05, hspace=.01)
. . .
img_data = io.BytesIO()   # You can't just do plt.show()
plt.savefig(img_data, format='png')
img_data.seek(0)

# Uploading to an Amazon S3 bucket
s3 = boto3.resource('s3')
bucket = s3.Bucket(OUTPUT_BUCKET)
bucket.put_object(Body=img_data, ContentType='image/png',
                  Key=OUTPUT_KEY, ACL='public-read')
image_url = 'https://{}.s3.amazonaws.com/{}'.format(OUTPUT_BUCKET, OUTPUT_KEY)

def lambda_handler(event, context):
    # HTTP redirect to the Amazon S3 object
    return {
        'statusCode': 302,   # 301 would be permanent and cached
        'headers': {
            'Location': image_url
        }
    }

Slide 20


Slide 21

Running Jupyter notebooks using Papermill
A tool for parameterizing and executing Jupyter notebooks
• Use cases
  § A periodic report to execute with different parameters depending on time/date
  § Run a notebook and then automate actions based on the result
  § Run a notebook as part of a workflow
• For more info
  § https://papermill.readthedocs.io/
  § https://github.com/nteract/papermill

Slide 22

Running Jupyter notebooks using Papermill
Event-driven Jupyter notebooks on Amazon S3
• Upload the Jupyter notebook to Amazon S3
• Use Amazon S3 user-defined metadata for the parameters (see the sketch below)
• S3 user-defined metadata is limited to 2 KB in size
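A minimal sketch of the event-driven flow, assuming an S3 "ObjectCreated" trigger and a hypothetical output bucket name; the user-defined metadata on the uploaded notebook is read with head_object and passed to Papermill as parameters.

import json
import urllib.parse
import boto3
import papermill as pm

s3 = boto3.client('s3')
OUTPUT_BUCKET = 'notebook-output-bucket'   # hypothetical destination bucket

def lambda_handler(event, context):
    # Triggered by an S3 "ObjectCreated" notification for the uploaded notebook
    record = event['Records'][0]['s3']
    bucket = record['bucket']['name']
    key = urllib.parse.unquote_plus(record['object']['key'])

    # User-defined metadata (x-amz-meta-*) comes back as a dict of strings,
    # lower-cased and without the x-amz-meta- prefix
    parameters = s3.head_object(Bucket=bucket, Key=key)['Metadata']

    pm.execute_notebook(
        's3://{}/{}'.format(bucket, key),
        's3://{}/{}'.format(OUTPUT_BUCKET, key),
        parameters=parameters
    )
    return {
        'statusCode': 200,
        'body': json.dumps({'notebook': key, 'parameters': parameters})
    }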

Slide 23

Using Papermill with Amazon S3
The Amazon S3 URL syntax is supported by Papermill to read and write notebooks

import os
import sys
import papermill as pm
. . .
# To run Python inside Jupyter
sys.path.append("/opt/bin")
sys.path.append("/opt/python")
os.environ["IPYTHONDIR"] = '/tmp/ipythondir'
. . .
# Using S3 URLs for input and output
input_notebook = 's3://{}/{}'.format(bucket, key)
output_notebook = 's3://{}/{}'.format(OUTPUT_BUCKET, key)
. . .
# Executing the Jupyter notebook
pm.execute_notebook(
    input_notebook,
    output_notebook,
    parameters = parameters
)

Slide 24

Takeaways

Slide 25

Takeaways
• Use event-driven architectures for metadata extraction, enrichment, and indexing
• For more advanced use cases, manage dependencies using an Amazon EFS file system to use tools like scikit-learn and Matplotlib
• Simplify machine learning inference with frameworks such as PyTorch and TensorFlow running in Lambda functions
• Automate Jupyter notebook execution using Papermill

Slide 26

Thank you!
@danilop
https://github.com/danilop/analyzing-data-aws-lambda
© 2020, Amazon Web Services, Inc. or its affiliates. All rights reserved.