
Step into the Future with AWS Step Functions


Abstract: In this hands-on training, you will learn how to extract, transform and load data from an API into CSV files (no longer Parquet files) in S3 using Lambdas run in parallel with Step Functions. This is possibly, eventually, a performant and cost-effective solution to extract and store your data for analysis at scale.

Dana Engebretson

August 24, 2017

Transcript

  1. Intro Definitions Implementation Resources @bigdana

    Step into the Future with AWS Step Functions
    Dana Engebretson, Performance Engineer, SPS Commerce
  2. Step into the Future with AWS Step Functions

    Abstract: In this hands-on training, you will learn how to extract, transform and load data from an API into Parquet files in S3 using Lambdas run in parallel with Step Functions. This is a performant and cost-effective solution to extract and store your data for analysis at scale.
  3. Step into the Future with AWS Step Functions

    Abstract (revised): In this hands-on training, you will learn how to extract, transform and load data from an API into CSV files (no longer Parquet files) in S3 using Lambdas run in parallel with Step Functions. This is possibly, eventually, a performant and cost-effective solution to extract and store your data for analysis at scale.
  4. Intended Outcomes:

    • Learn about the new serverless paradigm and how it may be used for your work
    • Familiarize yourself with some Amazon Web Services
    • Be prepared for some gotchas and understand some pros and cons of using a serverless architectural approach
  5. An Example Problem

    • You want to collect data from an API once a day
    • You don’t want to manage your own server to run it
  6. Let’s Break Down the Problem

    1. Call an API
    2. Process the Data
    3. Write the Data somewhere
    4. Invoke this Process
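
The four steps above could be sketched as a single Lambda handler. This is a minimal illustration, not the code from the talk: the endpoint URL, the field names `id` and `value`, and the `fetch` parameter (added only so the handler can be exercised without a network call) are all assumptions.

```python
import csv
import io
import json
from urllib.request import urlopen

API_URL = "https://example.com/api/devices"  # hypothetical endpoint


def call_api(url=API_URL):
    """Step 1: call the API and decode its JSON body."""
    with urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


def process(records):
    """Step 2: keep only the fields of interest (hypothetical field names)."""
    return [{"id": r["id"], "value": r["value"]} for r in records]


def to_csv(rows):
    """Step 3, first half: serialize the rows to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["id", "value"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def lambda_handler(event, context, fetch=call_api):
    """Step 4: the entry point that Lambda (or a Step Functions task) invokes."""
    rows = process(fetch())
    return {"row_count": len(rows), "csv": to_csv(rows)}
```

Lambda itself only passes `event` and `context`, so `fetch` falls back to the real API call when deployed.
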
  7. Copy your .py file into your build folder and zip the contents (the contents! Not the folder itself)
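
One way to zip the contents rather than the folder is to store every file under a path relative to the build folder. The sketch below uses only the Python standard library; the function name and the assumption that the zip file is written outside the build folder are mine, not the talk's.

```python
import os
import zipfile


def zip_build_contents(build_dir, zip_path):
    """Zip everything inside build_dir so that files sit at the archive root.

    zip_path is assumed to live outside build_dir; otherwise the archive
    would try to include itself.
    """
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for root, _dirs, files in os.walk(build_dir):
            for name in files:
                full_path = os.path.join(root, name)
                # The archive name is relative to build_dir, so the folder
                # itself never appears as a top-level entry.
                archive.write(full_path, os.path.relpath(full_path, build_dir))
    return zip_path
```

With this layout a handler file such as `handler.py` ends up at the top level of the archive, which is where Lambda looks for it.
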
  8. Write Data: including fastparquet

    Race condition with pip install
    Attempted solution: install with conda
  9. Write Data: including pandas

    pip install from a Mac doesn’t include numpy properly
    Solution: Use Docker and install from a Linux machine
    Or use this resource: https://nervous.io/python/aws/lambda/2016/02/17/scipy-pandas-lambda/
  10. Write Data: writing to S3

    Your Lambda needs explicit permission to write to S3
  11. Write Data: writing to S3

    Solution: Create a new IAM role for the Lambda and give it a policy that can read from and write to S3
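
As a rough illustration of what such a role needs, the sketch below builds two standard IAM JSON documents as Python dictionaries: a trust policy that lets the Lambda service assume the role, and a permissions policy that allows reading and writing objects in one bucket. The bucket name is a placeholder, and the talk does not show the exact policy it used.

```python
import json

# Trust policy: allows the Lambda service to assume the role.
LAMBDA_TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "lambda.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}


def s3_read_write_policy(bucket_name):
    """Build a policy document allowing object reads and writes in one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": "arn:aws:s3:::{}/*".format(bucket_name),
            }
        ],
    }


# "my-example-bucket" is a hypothetical bucket name.
print(json.dumps(s3_read_write_policy("my-example-bucket"), indent=2))
```

The printed JSON can be pasted into the IAM console when attaching an inline policy to the role.
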
  12. Write Data: writing to S3

    In order to write a CSV file, you need to first save it locally. You can only write to /tmp/ inside your Lambda
    Solution: write to /tmp/ :)
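
Following the approach on this slide, a handler might write the CSV under /tmp/ and then upload that file. The sketch below assumes boto3 is available in the Lambda runtime and uses its S3 client method `upload_file`; the bucket name, object key, field names, and the shape of `event` are placeholders.

```python
import csv
import os


def write_csv_to_tmp(rows, fieldnames, filename, tmp_dir="/tmp"):
    """Write rows (a list of dicts) to a CSV file under tmp_dir and return its path."""
    path = os.path.join(tmp_dir, filename)
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return path


def upload_to_s3(s3_client, path, bucket, key):
    """Upload a local file using the S3 client's upload_file(Filename, Bucket, Key)."""
    s3_client.upload_file(path, bucket, key)


def lambda_handler(event, context):
    # Imported here so the helpers above can be tried locally without boto3.
    import boto3

    # event["rows"], the bucket, and the key are hypothetical.
    path = write_csv_to_tmp(event["rows"], ["id", "value"], "devices.csv")
    upload_to_s3(boto3.client("s3"), path, "my-example-bucket", "devices/devices.csv")
    return {"uploaded": "devices/devices.csv"}
```

The upload only succeeds if the Lambda's role allows `s3:PutObject` on the target bucket.
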
  13. Step Functions can run Lambdas in “parallel”

    You can run process A and B in parallel
    Can you run a process dynamically in parallel? Not yet
    [Diagram: Start → Get Devices → three parallel Get Device Data tasks → End]
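
The workflow in the diagram could be expressed in Amazon States Language with a `Parallel` state whose branches are fixed in advance. The sketch below generates such a definition in Python; the state names and Lambda ARNs are hypothetical, and the number of branches is decided when the definition is built, not at run time, which matches the "not yet" on the slide.

```python
import json


def parallel_state_machine(get_devices_arn, get_device_data_arn, branch_count=3):
    """Build a state machine: one task, then a fixed number of parallel branches."""
    branches = []
    for i in range(1, branch_count + 1):
        name = "GetDeviceData{}".format(i)
        branches.append(
            {
                "StartAt": name,
                "States": {
                    name: {"Type": "Task", "Resource": get_device_data_arn, "End": True}
                },
            }
        )
    return {
        "Comment": "Get devices, then fetch device data in parallel",
        "StartAt": "GetDevices",
        "States": {
            "GetDevices": {
                "Type": "Task",
                "Resource": get_devices_arn,
                "Next": "GetAllDeviceData",
            },
            # Every branch receives the same input: the output of GetDevices.
            "GetAllDeviceData": {"Type": "Parallel", "Branches": branches, "End": True},
        },
    }


# Hypothetical ARNs, for illustration only.
definition = parallel_state_machine(
    "arn:aws:lambda:us-east-1:123456789012:function:get-devices",
    "arn:aws:lambda:us-east-1:123456789012:function:get-device-data",
)
print(json.dumps(definition, indent=2))
```

Because each branch gets identical input, each Lambda would need some way to pick its own device, for example a separate function or a fixed parameter per branch.
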
  14. Pros and Cons

    [Slide annotates a list of pros and cons from the article below; only the annotations survive in this transcript: No comment; Maybe, eventually; Yes; Sure; Yes; Maybe; No comment]
    From: https://devops.com/go-serverless-pros-cons/
  15. Conclusion

    Pros (AS FAR AS I UNDERSTAND RIGHT NOW): cheap for simple processes using the Python standard library that aren’t running 24/7 or that might need to scale
    Cons: some kinks need to be worked out for it to be useful for data processing
  16. Resources

    • hello-step-functions repo to play with: https://github.com/danasaur/hello-step-functions
    • These slides: https://speakerdeck.com/bigdana/step-into-the-future-with-aws-step-functions
  17. Resources

    • Serverless Workflow Management with AWS Step Functions on Udemy: https://www.udemy.com/aws-step-functions/learn/v4/overview
    • Structuring Serverless Applications with Python: http://blog.brianz.bz/post/structuring-serverless-applications-with-python/
    • Python Data Deployment on AWS Lambda: https://nervous.io/python/aws/lambda/2016/02/17/scipy-pandas-lambda/
    • Discussion Forum: Dynamic number of Parallel tasks: https://forums.aws.amazon.com/thread.jspa?threadID=244196