Introduction to AWS Lambda with Python


Cloud computing is a very elegant paradigm, providing high-level tools for distributed systems.

In particular, one can easily provision and customize virtual machines according to their computing needs.

However, is there a way to execute code in the cloud without spending time and effort on system administration?

This presentation introduces Lambda – the AWS service dedicated to serverless computing.


Gianluca Costa

December 27, 2017

Transcript

  1. Introduction to AWS Lambda with Python – Gianluca Costa

    http://gianlucacosta.info/
  2. Introduction

    • Cloud computing is a very elegant paradigm, providing high-level tools for distributed systems
    • In particular, one can easily provision and customize virtual machines according to their computing needs
    • However, is there a way to execute code in the cloud without spending time and effort on system administration?
    • This presentation introduces Lambda – the AWS service dedicated to serverless computing – and was inspired by the material listed in the bibliography
  3. Functional Programming (FP)

    • «Functional programming is a programming paradigm - a style of building the structure and elements of computer programs - that treats computation as the evaluation of mathematical functions and avoids changing-state and mutable data.» – Wikipedia
    • There are different points of view on FP, expressed by various languages such as Haskell, Elm, Scala, …
      → but the very idea of the paradigm is the concept of function as the core computational unit
  4. Common benefits of (purely) Functional Programming

    • Far fewer buggy side effects, which are so common when using mutable data structures
    • Cleaner, usually shorter code
    • Declarative approach to problems
    • Mathematical point of view
    • Fairly simple and easily extensible language grammars
    • In many ways compatible with traditional OOP in hybrid approaches, as proven by Scala; modern Java can support some functional style as well
  5. Traditional EC2 computing

    Diagram: a Virtual Machine (EC2) used for:
    – Servers
    – Long-running tasks
    – Polling on other EC2 instances or AWS services → in this scenario, the VM periodically checks whether a given condition has become true
  6. EC2 is not for polling

    • Polling should generally be avoided, as it’s extremely inefficient:
      – The CPU and I/O resources are kept busy, waiting for a condition that might never become true
      – If the polling cycle contains sleep instructions, the condition might be detected much later than required
    • EC2 pricing makes such waste unacceptable
      → Companies used to delegate polling to just one EC2 instance, which led to further problems, for example:
      – The instance had too many security permissions
      – If the instance crashed, all the polling activities crashed, too
      – Stopping the instance for maintenance was not feasible
  7. Events

    • To avoid polling, AWS has introduced events
    • Events are raised whenever a condition occurs:
      – Something changes in the infrastructure → for example, the status of an EC2 instance, or the objects stored in an S3 bucket
      – A fixed instant or a recurring interval is reached, à la Crontab
  8. AWS event sources

    Diagram: AWS event sources include S3, DynamoDB, CloudWatch, SNS, Kinesis, EC2, API Gateway, the Lambda dashboard, ...
  9. AWS Lambda

    Lambda = Functional Programming + Execution platform fully managed by AWS + Events

    • In AWS Lambda, you create functions, whose invocation type can be:
      – Synchronous, if the caller blocks waiting for the result
      – Asynchronous, if the caller asks to run the function and goes on, forgetting about the call
    • A function can run:
      – in response to events → in this case, the invocation type depends on the specific event, and you can’t control it
      – on demand, when programmatically invoked by your code → in this case, you can choose the invocation type
  10. No maintenance, only code

    • The greatest benefit of Lambda is that you just have to write your functions and upload them
    • Serverless programming → AWS takes care of the underlying computing resources required to run your code: except in fairly rare situations, you don’t even need to know anything about the infrastructure actually executing your code
    • Lambda also ensures High Availability (HA), so you don’t need to worry about single points of failure related to this service
  11. Naming conventions

    • From now on, we’ll be using the following symbols:
      – λ → the AWS Lambda service
      – λf → any project created for AWS Lambda. It can consist of one or more source files, but it’s actually handled as a single function
      – λf’s → one or more projects in AWS Lambda (generic plural)
  12. One service, many runtimes

    • Every λf can currently target one of the following runtimes:
      – Python 3.x or 2.x
      – Node.js 6.x or 4.x
      – Java 8
      – C#, via .NET Core
    • The programming model differs slightly from one runtime to another → for example, strictly OOP runtimes like Java require your functions to be declared within classes
    • Apart from that, most concepts and ideas are shared
  13. EC2 vs λ

    EC2 (IaaS)
    • Your VMs run until you stop them
    • You can choose the technical specs for the VM, such as processor, memory and even GPU optimizations
    • You can choose the OS via the AMI
    • You can configure the VM via SSH
    • You can install new software on the VM
    • You can open ports on the VM
    • You must deploy your artifacts for the specific services you host
    • Monitoring, scalability and HA are up to you
    • You must constantly update the system
    • You must constantly enforce security

    λ (PaaS)
    • λf’s run when triggered by events or when invoked
    • You can choose only the runtime for your λf’s and a few hardware parameters
    • You need to know very little about the underlying environment, and you can neither access nor customize it
    • You cannot open ports
    • AWS takes care of every aspect related to system administration and security
  14. EC2 vs λ – Further comparison

    EC2
    • You can run almost anything on it
    • Max memory can be thousands of GB
    • Runs as long as you wish
    • Supports IAM roles
    • Has a local, temporary file system
    • EBS and EFS provide further storage
    • Network calls allowed
    • External processes can be run
    • Logging to CloudWatch requires the SDK
    • Can satisfy most security policies
    • No free tier after the 1st year

    λ
    • Only a few runtimes are supported, but you can use the AWS SDK and 3rd-party libraries that are non-native or based on provided .so libraries
    • Memory for a λf is at most a few GB
    • Any λf can run only for a limited time
    • Supports IAM roles
    • Has a local, temporary directory: /tmp
    • EBS and EFS not supported
    • Network calls allowed
    • External processes can be run
    • Logging to CloudWatch is the default
    • Might not satisfy very strict security policies (e.g., in terms of IDS, access logs, ...)
    • Permanent free tier quota
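  15. Deployment strategies

    Diagram: λf source code reaches the λ service either via:
    – the online editor in the λ dashboard
    – a zip file (built with a compiler, where needed) including all dependencies except the AWS SDK, delivered via the upload form in the λ dashboard, the AWS CLI, S3, or build tools (Gradle, Jenkins, CloudFormation, ...)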
  16. Deployment strategies – Explained

    • Each λf must be self-contained:
      – If it is very simple, does not rely on external libraries, and is based on a dynamic runtime (Python, Node.js, ...), it can be edited online, within the λ dashboard
      – For complex projects, you have to upload a zip file, which must include all the dependencies (except the AWS SDK) and whose directory layout depends on the chosen runtime
    • Working offline is not a bad idea, as you can use your traditional IDEs and tools
    • For more elaborate artifacts, which are fairly common in Java, a dedicated plugin for your build tool can save even more time
    • Please consult the λ documentation for up-to-date information about bundling projects for your selected runtime
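For a Python λf, the zip file is commonly built from a folder containing the handler module plus its dependencies (installed there, for example, with pip install <package> -t <folder>), all placed at the root of the archive. A minimal sketch using only the standard library; the folder and file names are hypothetical, and the exact layout should be checked against the λ documentation:

```python
import os
import zipfile


def build_deployment_zip(source_dir, zip_path):
    """Zip every file under source_dir, storing paths relative to it.

    Files end up at the root of the archive, so a module such as
    lambda_function.py (hypothetical name) stays importable by its name.
    """
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for root, _dirs, files in os.walk(source_dir):
            for name in files:
                full_path = os.path.join(root, name)
                # arcname drops the source_dir prefix from the stored path
                archive.write(full_path, arcname=os.path.relpath(full_path, source_dir))
    return zip_path
```

The AWS SDK is deliberately not copied into the folder, since λ already provides it.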
  17. IAM Roles

    • λf’s may require permissions, in particular to access other AWS resources like S3 buckets
    • λ relies on the standard system of IAM roles, which grant permissions with no need for credentials
    • When creating a λf, you can assign it an existing IAM role, or you can create a new one from a template
      → If you choose no template, your λf will still have permissions on CloudWatch, which is paramount for logging
  18. Further IAM considerations

    • λ supports cross-account invocation, provided that appropriate permissions have been granted
    • The resources accessed by a λf must reside in the same region as the λf
      → The Lambda@Edge project can still integrate with CloudFront, to support global event handling
  19. Why Python?

    • Python is a very high-level and elegant language, featuring a rich standard library
    • It has a huge ecosystem, with excellent libraries in almost any modern domain
    • It is dynamic, enabling fast prototyping
    • The source code and the bytecode are usually compact and fast to load
    • The startup time and memory requirements of its virtual machine are quite low
    • Python shines as a glue layer between very different contexts and at short, focused tasks
    • λ supports in-browser editing of simple Python projects
  20. AWS SDK for Python

    • It is called Boto 3, and its documentation can be found at: https://boto3.readthedocs.io/en/latest/
    • You can use Boto in your λf’s:
      – you’ll need to install it (for example, via pip) for offline programming
      – you should not deploy it in a λ bundle, as it is a dependency already provided by AWS
  21. First λf in Python

    1. In the AWS Console, click on Lambda to open the Lambda Dashboard
    2. Click on Create function
    3. Leave the default selection: Author from scratch
    4. Set up these settings:
       1. Name: helloPython (or whatever you prefer)
       2. Runtime: Python 3.x
       3. Role: Create new role from template(s)
       4. Role name: myLambdaRole (or whatever you prefer)
       5. Policy templates: empty
  22. Editing the λf online

    • You’ll notice that λ provides a simple but very effective online editor, supporting:
      – Syntax highlighting
      – Project tree
      – Multiple document interface, via tabs
    • Above the editor, the Handler field contains the fully-qualified name of the actual function to run when executing this λf; in the case of Python, it is:
      → <module name>.<function name>
    • Your code can have multiple public/exported functions, but just one can be the handler of its λf
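The <module name>.<function name> convention can be illustrated by resolving such a string with importlib. This is only a sketch of the naming rule, not the actual loader used by the λ Python runtime; for instance, a Handler value of lambda_function.lambda_handler (a hypothetical setting) would point to lambda_handler() in lambda_function.py:

```python
import importlib


def resolve_handler(handler_setting):
    """Split "<module name>.<function name>" and return the function object.

    Illustrative only: it mimics the naming convention of the Handler field.
    """
    # Split on the LAST dot, so dotted module paths (package.module) still work
    module_name, function_name = handler_setting.rsplit(".", 1)
    module = importlib.import_module(module_name)
    return getattr(module, function_name)
```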
  23. Structure of a λf handler

    • In Python, a λf handler is just a def function like this:

        def <function name>(event, context):
            # Code here: you can use this function just as a controller,
            # which executes code from all over the λf project
            return <result>  # Returning something is optional

    • Its signature always includes 2 essential parameters:
      – event: Python dictionary (accessed via event["<field name>"]) containing:
        • event information, in case of event handling
        • function parameters, in case of programmatic invocation
      – context → provides information about the execution environment
    • The return value must be JSON-serializable: anything from basic values up to (nested) dictionaries, which λ converts to a JSON string
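As a concrete instance of this structure, here is a possible handler for the summing λf used later in the invocation examples. The function name lambda_handler is an assumption (it must simply match the Handler field), and context is not needed here:

```python
def lambda_handler(event, context):
    """Sum the "left" and "right" fields of the event.

    event is expected to look like {"left": 80, "right": 90};
    context is unused in this example.
    """
    left = int(event["left"])
    right = int(event["right"])
    # A plain integer is serialized as JSON (e.g. 170) for the caller
    return left + right
```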
  24. λf context in detail

    Diagram: the context object in the Python runtime, used for environment inspection, provides:
    – Invocation details
    – CloudWatch information
    – Remaining milliseconds
    – Memory limit
    – Client context
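In the Python runtime, this information is exposed as attributes and methods of the context object. A minimal sketch of a hypothetical handler returning some of them; the attribute names are those documented for the Python runtime's context object:

```python
def lambda_handler(event, context):
    """Return a few details exposed by the context object."""
    return {
        # Invocation details
        "function_name": context.function_name,
        "function_version": context.function_version,
        "request_id": context.aws_request_id,
        # CloudWatch information
        "log_group": context.log_group_name,
        "log_stream": context.log_stream_name,
        # Memory limit configured for the λf
        "memory_limit_mb": context.memory_limit_in_mb,
        # Remaining milliseconds before the λf is stopped
        "remaining_ms": context.get_remaining_time_in_millis(),
    }
```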
  25. Testing the λf in the λ dashboard

    • Once a λf is in the dashboard (via upload or online editing), it can be invoked as often as you need via the dashboard GUI
      → Each test invocation is synchronous and passes data to the function via a test event
    • In particular, you need to press the Test button: if you have no test events defined, you’ll have to create one, using the usual JSON notation
    • Every test execution shows both the related CloudWatch log and the function result
    • You can have as many test events as you need for each λf
    • If you are using the online editor, you need to press the Save button before testing the new code
  26. λf failure

    • There are situations when you just have to stop the execution of the λf and notify the caller of an error
    • How this happens depends on the runtime, because the λ programming model adapts to the language chosen for the λf
    • In Python, you just need to raise any exception
    • How the exception is then handled depends on the specific client
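A minimal sketch, assuming the summing λf from the invocation examples: the handler validates its event and raises an exception when a field is missing.

```python
def lambda_handler(event, context):
    """Summing handler that fails explicitly on invalid input."""
    if "left" not in event or "right" not in event:
        # Raising any exception stops the λf and reports an error to the caller
        raise ValueError("Both 'left' and 'right' are required")
    return int(event["left"]) + int(event["right"])
```

When such a λf is invoked synchronously via Boto, the failure typically shows up as a FunctionError key in the response, while the payload describes the error (errorMessage, errorType) instead of carrying a result.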
  27. Logging

    • Every λf creates a dedicated log group on CloudWatch
      → within that group, every runtime instance creates a log stream
      → interleaved logs are therefore possible
    • λ automatically logs metrics related to each λf execution
    • In addition, whenever a λf writes to the standard output or employs the logging facilities provided by the runtime (and customized by AWS), such output is sent to the log stream
    • In Python 3, the following are redirected to CloudWatch:
      – The print function
      – The logging module
    • Logging via λ does not introduce additional costs, but the ones already charged by CloudWatch do apply
      → CloudWatch has a permanent free tier; however, it might be wise to reduce the retention period for λ log groups in dev/test environments
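A minimal sketch of both redirections in a hypothetical handler. Inside λ, the Python runtime already attaches a handler to the root logger, so basicConfig() has no effect there, while locally it makes the log lines visible on stderr:

```python
import logging

# No-op in λ (the root logger is already configured); adds a stderr handler locally
logging.basicConfig()
logger = logging.getLogger()
logger.setLevel(logging.INFO)


def lambda_handler(event, context):
    # Standard output: sent to the λf's log stream
    print("Received event keys:", sorted(event))
    # logging module: also sent to the log stream, with level information
    request_id = getattr(context, "aws_request_id", "local")
    logger.info("Processing request %s", request_id)
    return "done"
```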
  28. Testing a λf offline

    • Any λf can be tested in the λ dashboard, but that is quite impractical, as the λf must be uploaded first
    • Consequently, there are dedicated libraries for testing a λf on the development PC, via traditional xUnit frameworks
    • The idea is to create test stubs of the 2 core objects every λf relies on (an event and the context) and run all the code offline
    • Dynamic languages like Python make creating such test objects very simple
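Even without a dedicated library, the standard unittest module plus a stub context is often enough. A minimal sketch, assuming the summing handler from the invocation examples and a hand-made context exposing only a few attributes (types.SimpleNamespace stands in for the real object):

```python
import unittest
from types import SimpleNamespace


def lambda_handler(event, context):
    """Hypothetical handler under test: the summing λf."""
    return int(event["left"]) + int(event["right"])


def make_context(remaining_ms=3000):
    """Stub exposing a few attributes of the real context object."""
    return SimpleNamespace(
        function_name="helloPython",
        aws_request_id="local-test",
        memory_limit_in_mb=128,
        get_remaining_time_in_millis=lambda: remaining_ms,
    )


class SumHandlerTest(unittest.TestCase):
    def test_sums_left_and_right(self):
        event = {"left": 80, "right": 90}
        self.assertEqual(lambda_handler(event, make_context()), 170)

    def test_missing_field_raises(self):
        with self.assertRaises(KeyError):
            lambda_handler({"left": 80}, make_context())
```

Saved in a test module, such tests can be run with python -m unittest, without uploading anything to λ.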
  29. λf versioning: qualifiers

    • At any moment, you can take an immutable snapshot of a λf
      → it is called a new version of the λf, and has:
      – An arbitrary description
      – A generated id
    • The $LATEST id refers to the only mutable version of a λf, which always consists of the current code of the λf, i.e. the code now ready to run on λ
    • In addition to versions, you can also define aliases: an alias is a tag associated with a version and having a meaningful name
    • Versions and aliases are called qualifiers
  30. Advanced λf aliases

    • An alias is like a version pointer, and it can be changed to point to another version
    • This is especially useful to avoid changes in the calling code: provided that software components always reference only aliases, updating them simply requires switching the alias to the new λf version, right in λ’s dashboard
    • It is even possible to make an alias point to 2 distinct versions, each with a % weight: this is especially useful to introduce new λf versions tentatively and gradually
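Publishing a version and shifting part of an alias's traffic to it can also be scripted with Boto 3, whose Lambda client offers publish_version() and update_alias() (the latter accepting a RoutingConfig with AdditionalVersionWeights). A sketch under these assumptions: the helper name, the function and alias names, and the 10% default weight are hypothetical, and client is expected to be boto3.client("lambda"):

```python
def publish_and_shift(client, function_name, alias_name, stable_version, canary_weight=0.1):
    """Publish $LATEST as a new version and route a share of the alias to it.

    stable_version is the version the alias currently points to;
    canary_weight is the fraction (0..1) of invocations sent to the new version.
    """
    # Immutable snapshot of $LATEST; λ generates the version id
    new_version = client.publish_version(
        FunctionName=function_name,
        Description="Snapshot published by deployment script",
    )["Version"]
    # The alias keeps pointing to the stable version, plus a weighted new one
    client.update_alias(
        FunctionName=function_name,
        Name=alias_name,
        FunctionVersion=stable_version,
        RoutingConfig={"AdditionalVersionWeights": {new_version: canary_weight}},
    )
    return new_version
```

Once the new version proves reliable, pointing the alias entirely to it (and clearing the weights) completes the switch without touching the callers.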
  31. λ ARN

    • ARN = Amazon Resource Name → uniquely identifies a resource in AWS
    • Every AWS service has a dedicated schema
    • In the case of λ, the ARN of a λf follows this pattern:
      arn:aws:lambda:<region>:<account id>:function:<λf name>[:<version or alias>]
    • The ARN of any λf is shown in the λ dashboard when you open the λf, and even when you select a version or alias
    • You usually don’t need the full ARN of a λf when invoking it programmatically: just its name, plus the alias as a separate invocation parameter
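Since the pattern is fixed, building or splitting a λf ARN is plain string handling. A minimal sketch with hypothetical helper names, assuming the standard aws partition (other partitions use a different prefix); regions, account id and names in the test values are made up:

```python
def lambda_arn(region, account_id, function_name, qualifier=None):
    """Build a λf ARN; qualifier is an optional version or alias."""
    arn = f"arn:aws:lambda:{region}:{account_id}:function:{function_name}"
    return f"{arn}:{qualifier}" if qualifier else arn


def parse_lambda_arn(arn):
    """Split a λf ARN into its components; qualifier is None when absent."""
    parts = arn.split(":")
    if len(parts) not in (7, 8) or parts[:3] != ["arn", "aws", "lambda"] or parts[5] != "function":
        raise ValueError(f"Not a λ function ARN: {arn}")
    return {
        "region": parts[3],
        "account_id": parts[4],
        "function_name": parts[6],
        "qualifier": parts[7] if len(parts) == 8 else None,
    }
```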
  32. Invoking a λf

    • λf’s become part of AWS, so they can be invoked via any SDK and, in general, from any client
      → so, there is no general invocation syntax
    • Additionally, to invoke a λf, the caller must have invocation permissions on it
    • The invoking client can be:
      – A standalone app, generally connected to AWS as a dedicated IAM user
      – Another λf, which can have its own permission set
    • In a Python-based λf, you can invoke another λf just by using Boto, exactly as you would in a standalone app
  33. Synchronous λf invocations

    • Here is a simple example showing how to synchronously invoke a function that sums two integers (left and right, which it reads from its event object) and returns just an integer value:

        import boto3

        client = boto3.client('lambda')

        response = client.invoke(
            FunctionName='myFunction',
            InvocationType='RequestResponse',
            Payload=b'{"left": 80, "right": 90}',
            # You should also pass the Qualifier param (version or alias)
        )

        data = response["Payload"].read()
        function_result = int(data)
  34. Parsing JSON results

    • If the function returns a dictionary (or, anyway, data structured as JSON), you can get it back in your client code as a simple dictionary:

        data = response["Payload"].read()
        result_dict = json.loads(data.decode("utf-8"))

      after adding, at the beginning of the current λf script:

        import json
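The two previous slides can be combined into a small helper that builds the payload with json.dumps (avoiding hand-written JSON), passes the optional Qualifier and decodes the result. A sketch with a hypothetical helper name, assuming client is boto3.client("lambda"):

```python
import json


def invoke_sync(client, function_name, payload, qualifier=None):
    """Synchronously invoke a λf and return its decoded JSON result."""
    params = {
        "FunctionName": function_name,
        "InvocationType": "RequestResponse",
        "Payload": json.dumps(payload).encode("utf-8"),
    }
    if qualifier:
        # Version or alias to invoke; $LATEST is used when omitted
        params["Qualifier"] = qualifier
    response = client.invoke(**params)
    body = json.loads(response["Payload"].read().decode("utf-8"))
    # If the handler raised an exception, λ sets FunctionError and the
    # payload describes the error instead of carrying the result
    if "FunctionError" in response:
        message = body.get("errorMessage") if isinstance(body, dict) else body
        raise RuntimeError(f"{function_name} failed: {message}")
    return body
```

For the summing λf, invoke_sync(client, "myFunction", {"left": 80, "right": 90}) would return the integer directly, while a dictionary result comes back as a dict.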
  35. Further details on invocation type

    • The InvocationType parameter of Boto’s invoke() method can actually take one of 3 different values:
      – RequestResponse → synchronous, as just seen
      – Event → asynchronous execution: the response body will be empty and, in case of success, the HTTP status is 202 - Accepted. The name is perhaps a bit misleading, as event handlers are in some cases invoked synchronously
      – DryRun → the λf handler is not executed, but the infrastructure performs checks such as:
        • Ensuring the caller has invocation permissions on the λf
        • Basic input validation
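The other two invocation types can be wrapped in the same way. A minimal sketch with hypothetical helper names, assuming client is boto3.client("lambda"); the status codes checked are the documented ones for Event (202) and DryRun (204), whereas a failed permission check is raised by Boto as an exception rather than returned:

```python
import json


def invoke_async(client, function_name, payload):
    """Fire-and-forget invocation: λ queues the event and returns at once."""
    response = client.invoke(
        FunctionName=function_name,
        InvocationType="Event",
        Payload=json.dumps(payload).encode("utf-8"),
    )
    # 202 means the event was accepted, not that the handler succeeded
    return response["StatusCode"] == 202


def can_invoke(client, function_name):
    """DryRun: run the checks (e.g. permissions) without executing the handler."""
    response = client.invoke(FunctionName=function_name, InvocationType="DryRun")
    return response["StatusCode"] == 204
```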
  36. Binding a λf to events

    • Binding λf’s to events is quite easy, and it is the very origin of the λ project
    • To bind a λf to one or more events, you can use:
      – The drag&drop visual editor in the λ dashboard
      – Other AWS tools, such as the CLI
    • As mentioned earlier, the invocation type (synchronous / asynchronous) actually depends on the event type
      → it’s not correct to assume that all event handling is asynchronous
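As an example of event handling, S3 delivers its notifications (invoking the λf asynchronously) as an event containing a Records list. A minimal sketch of a hypothetical handler reading the bucket name and object key from each record; the bucket and key in the test are made up:

```python
from urllib.parse import unquote_plus


def lambda_handler(event, context):
    """Handle an S3 notification: list the objects that triggered it."""
    objects = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded (e.g. spaces become '+')
        key = unquote_plus(record["s3"]["object"]["key"])
        objects.append(f"s3://{bucket}/{key}")
    print("Triggered by:", objects)
    return objects
```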
  37. Hot and cold startup

    • What you should know about λ’s infrastructure is that every λf runs in a self-contained Linux environment, having the selected runtime and a few programs and libraries
    • You can’t really tell whether a λf will have:
      – Cold startup: its environment must be created and initialized, adding latency to the overall execution
      – Hot startup: the environment for the λf is ready, so it can run immediately
    • Latency on cold startup increases with:
      – The size of the λf project / zip file
      – The latency of the underlying runtime
    • On hot startup, in runtimes such as Python, a λf can actually reuse global variables from previous executions
      → This can be interesting in order to create a local, first-level cache, but it might also breach security policies and introduce nasty bugs
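A minimal sketch of the first-level cache idea, with a hypothetical load_settings() standing in for an expensive call and an arbitrary 60-second expiry. Module-level values survive only while the environment stays warm, so the code must also work when the cache is empty; per-user or sensitive data should not be kept there:

```python
import time

# Module-level state: created on cold startup, reused while the environment is warm
_CACHE = {}
_CACHE_TTL_SECONDS = 60


def load_settings():
    """Hypothetical expensive operation, e.g. fetching configuration."""
    return {"feature_enabled": True}


def lambda_handler(event, context):
    now = time.monotonic()
    entry = _CACHE.get("settings")
    cache_hit = entry is not None and now - entry["cached_at"] < _CACHE_TTL_SECONDS
    if not cache_hit:
        # Cold startup or expired entry: reload and remember the value
        entry = {"value": load_settings(), "cached_at": now}
        _CACHE["settings"] = entry
    return {"cache_hit": cache_hit, "settings": entry["value"]}
```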
  38. Pricing

    • Maximum memory → the higher this parameter, the higher the execution price per 100ms
    • Execution time → rounded up to the nearest 100ms
    • Free tier → a free monthly quota of compute time, measured in GB-seconds
    • You only pay for the actual computation time
    • However, you pay for the whole memory you have requested, even if the function only uses a small percentage of it
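The billing rule above boils down to simple arithmetic: requested memory (in GB) times execution time rounded up to the billing granularity. A minimal sketch with hypothetical helper names; the price per GB-second is a parameter rather than a real quote, the per-request charge and the free tier are ignored, and the 100ms granularity is kept as a parameter since it may change over time:

```python
import math


def billed_duration_ms(duration_ms, granularity_ms=100):
    """Round the measured duration up to the billing granularity."""
    return math.ceil(duration_ms / granularity_ms) * granularity_ms


def gb_seconds(memory_mb, duration_ms, granularity_ms=100):
    """Requested memory in GB times billed seconds (not the memory actually used)."""
    return (memory_mb / 1024) * (billed_duration_ms(duration_ms, granularity_ms) / 1000)


def compute_cost(memory_mb, duration_ms, invocations, price_per_gb_second):
    """Compute cost of many identical invocations, before any free tier."""
    return gb_seconds(memory_mb, duration_ms) * invocations * price_per_gb_second
```

For instance, a 512 MB λf running for 101 ms is billed 200 ms, i.e. 0.5 GB × 0.2 s = 0.1 GB-seconds per invocation.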
  39. Most effective scenarios for λ

    λ is brilliant at:
    • Handling events raised by AWS
    • Running short, scheduled tasks
    • Providing legacy, rarely accessed services
    • Performing distributed computing and returning partial results
    • Serving even millions of small HTTP requests
    • Creating filters for the infrastructure
    • Monitoring events, alerting, and enforcing policies
    • Introducing Functional Programming on AWS
  40. Choosing λ or EC2

    • Both services are effective for their specific purposes:
      – Long-running tasks → EC2
      – Event-driven / short-running / infrequent tasks → λ
    • In the end, the main differences are in terms of:
      – Administration simplicity → λ
      – Flexibility → EC2
      – Pricing → you need tools such as the AWS pricing calculator to determine which service actually best suits your needs: despite the free tier, λ might get more expensive than EC2 for long-running tasks
  41. Final considerations

    • λ is a constantly evolving service:
      – Hardware limits are progressively being raised
      – More and more events are raised by the AWS infrastructure
      – Further runtimes will probably be added
    • Always refer to λ’s official page and documentation for the very latest details
    • Combined with other technologies having a perpetual free tier quota, such as DynamoDB, SNS and SQS, λ could become the core of efficient and effective computing infrastructures that you can create and maintain for free
  42. Bibliography

    • Matthew Fuller, AWS Lambda: A Guide to Serverless Microservices (a very interesting book)
    • AWS Lambda documentation
    • Boto 3 documentation
    • Python official website
    • Wikipedia
  43. Thanks for your attention! ^__^