Introduction to AWS Lambda with Python

Slide 1

Slide 1 text

Gianluca Costa Introduction to AWS Lambda with Python http://gianlucacosta.info/

Slide 2

Slide 2 text

Introduction ● Cloud computing is a very elegant paradigm, providing high-level tools for distributed systems ● In particular, one can easily provision and customize virtual machines according to their computing needs ● However, is there a way to execute code in the cloud without spending time and effort on system administration? ● This presentation introduces Lambda – the AWS service dedicated to serverless computing – and was inspired by the material listed in the bibliography

Slide 3

Slide 3 text

Functional Programming (FP) ● «Functional programming is a programming paradigm - a style of building the structure and elements of computer programs - that treats computation as the evaluation of mathematical functions and avoids changing-state and mutable data.» - Wikipedia ● There are different points of view on FP - expressed by various languages such as Haskell, Elm, Scala, … → but the very idea of the paradigm is the concept of function as core computational unit

Slide 4

Slide 4 text

Common benefits of (purely) Functional Programming ● Far fewer buggy side-effects, which are so common when using mutable data structures ● Cleaner, usually shorter code ● Declarative approach to problems ● Mathematical point of view ● Fairly simple and easily extensible language grammars ● In many ways compatible with traditional OOP in hybrid approaches – as proven by Scala; modern Java can support some functional style as well

Slide 5

Slide 5 text

Traditional EC2 computing ● Virtual Machine (EC2) used for: – Servers – Long-running tasks – Polling on other EC2 instances or AWS services → in this scenario, the VM periodically checks whether a given condition has become true

Slide 6

Slide 6 text

EC2 is not for polling ● Polling should generally be avoided – as it’s extremely inefficient: – The CPU and I/O resources are kept busy, waiting for a condition that might never become true – In case of sleeping instructions in polling cycles, the condition might be detected far later than required ● Pricing for EC2 makes such waste unacceptable → Companies used to delegate polling to just one EC2 instance - which led to further problems, for example: – The instance had too many security permissions – In case of instance crash, all the polling activities crashed, too – Stopping the instance for maintenance was not feasible

Slide 7

Slide 7 text

Events ● To prevent polling, AWS has introduced events ● Events are raised whenever a condition occurs: – Something changes in the infrastructure →for example, the status of an EC2 instance, or the file system in an S3 bucket – At fixed instants or after recurring intervals, à la Crontab

Slide 8

Slide 8 text

AWS event sources ● S3 ● DynamoDB ● CloudWatch ● SNS ● Kinesis ● Lambda dashboard ● EC2 ● API Gateway ● ...

Slide 9

Slide 9 text

AWS Lambda ● Lambda = Functional Programming + Execution platform fully managed by AWS + Events ● In AWS Lambda, you create functions, whose invocation type can be: – Synchronous, if the caller blocks waiting for the result – Asynchronous, if the caller asks to run the function and goes on, forgetting about the call ● A function can run: – in response to events → in this case, the invocation type depends on the specific event – you can’t control it – on demand – when programmatically invoked by your code – and you can choose the invocation type

Slide 10

Slide 10 text

No maintenance, only code ● The greatest benefit of Lambda is that you just have to write your functions and upload them ● Serverless programming → AWS will take care of the underlying computing resources required to run your code: except in fairly rare situations, you don’t even need to know anything about the infrastructure actually executing your code ● Lambda also ensures High Availability (HA), so you don’t need to worry about single points of failure related to this service

Slide 11

Slide 11 text

Naming conventions ● From now on, we’ll be using the following symbols: – λ → to identify the AWS Lambda service – λf → means any project created for AWS Lambda. It can consist of one or more source files, but it’s actually handled as a single function – λf’s → one or more projects in AWS Lambda (generic plural)

Slide 12

Slide 12 text

One service, many runtimes ● Every single λf can currently target one of the following runtimes: – Python 3.x or 2.x – Node.js 6.x or 4.x – Java 8 – C# - via .NET Core ● The programming model is slightly different from one runtime to the others →for example, strictly OOP runtimes like Java require your functions to be declared within classes ● Apart from that, most concepts and ideas are shared

Slide 13

Slide 13 text

EC2 vs λ
EC2 (IaaS): ● Your VMs run until you stop them ● You can choose the technical specs for the VM – such as processor, memory and even GPU optimizations ● You can choose the OS via the AMI ● You can configure the VM via SSH ● You can install new software on the VM ● You can open ports on the VM ● You must deploy your artifacts for the specific services you host ● Monitoring, scalability, HA are up to you ● You must constantly update the system ● You must constantly enforce security
λ (PaaS): ● λf’s run triggered by events or when invoked ● You can choose only the runtime for your λf’s and a few hardware params ● You should know very little of the underlying environment and you can neither access nor customize it ● You cannot open ports ● AWS takes care of every aspect related to system administration and security

Slide 14

Slide 14 text

EC2 vs λ – Further comparison (EC2 | λ)
● You can run almost anything on it | Only a few runtimes are supported. But you can use AWS SDK and 3rd-party libraries that are non-native or based on provided .so libs
● Max memory can be thousands of GB | Memory for a λf is at most a few GB
● Runs as long as you wish | Any λf can run for a limited time
● Supports IAM roles | Supports IAM roles
● Has a local, temporary file system | Has a local, temporary directory: /tmp
● EBS and EFS provide further storage | EBS and EFS not supported
● Network calls allowed | Network calls allowed
● External processes can be run | External processes can be run
● Logging to CloudWatch requires SDK | Logging to CloudWatch is the default
● Can satisfy most security policies | Might not satisfy very strict security policies (e.g., in terms of IDS, access logs, ...)
● No free tier after the first year | Permanent free tier quota

Slide 15

Slide 15 text

Deployment strategies (diagram) ● λf source code → online editor in the λ dashboard → λ service ● λf source code → compiler → zip file including all dependencies except AWS SDK → upload form, AWS CLI, S3 or build tools (Gradle, Jenkins, CloudFormation, ...) → λ service

Slide 16

Slide 16 text

Deployment strategies - Explained ● Each λf must be self-contained: – If it is very simple, not relying on external libraries, and based on a dynamic runtime (Python, Node.js, ...), it can be edited online, within the λ dashboard – For complex projects, you have to upload a zip file, which must include all the dependencies (except the AWS SDK) and whose specific directory layout depends on the chosen runtime ● Working offline is not a bad idea, as you can use your traditional IDEs and tools ● For more elaborate artifacts – which are fairly common in Java – having a dedicated plugin for your own build tool can save even more time ● Please consult the λ documentation for updated information about bundling projects for your selected runtime

Slide 17

Slide 17 text

IAM Roles ● λf’s may require permissions – in particular, to access other AWS resources like S3 buckets ● λ relies on the standard system of IAM roles – which can grant and require permissions with no need for credentials ● When creating a λf, you can assign it an existing IAM role, or you can create a new one from a template →If you choose no template, your λf will still have permissions on CloudWatch – which is paramount for logging
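
As a rough illustration of what such a role contains, the sketch below builds two policy documents as plain Python dictionaries: a trust policy that lets the Lambda service assume the role, and a permission policy covering CloudWatch Logs. The helper names are hypothetical and the wildcard resource is a simplifying assumption; a real role would usually be restricted to specific log groups.

```python
import json


def lambda_trust_policy():
    """Trust policy allowing the Lambda service to assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "lambda.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def basic_logging_policy():
    """Permission policy granting what a function needs to write its logs."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "logs:CreateLogGroup",
                    "logs:CreateLogStream",
                    "logs:PutLogEvents",
                ],
                # Assumption: a wildcard keeps the sketch short
                "Resource": "arn:aws:logs:*:*:*",
            }
        ],
    }


# Print both documents as JSON, the form in which IAM stores them
print(json.dumps(lambda_trust_policy(), indent=2))
print(json.dumps(basic_logging_policy(), indent=2))
```

Keeping the two documents separate mirrors how IAM distinguishes who may assume a role from what the role may do.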

Slide 18

Slide 18 text

Further IAM considerations ● λ supports cross-account invocation, provided that appropriate permissions have been granted ● The resources accessed by a λf must reside in the same region as the λf → The Lambda@Edge project can still integrate with CloudFront, to support global event handling

Slide 19

Slide 19 text

Why Python? ● Python is a very-high-level and elegant language, featuring a rich standard library ● It has a huge ecosystem, with excellent libraries in almost any modern domain ● It is dynamic, enabling fast prototyping ● The source code and the bytecode are usually compact and fast to load ● Startup time and memory requirements of its virtual machine are quite low ● Python shines at acting as a glue layer between very different contexts and at short, focused tasks ● λ supports in-browser editing of simple Python projects

Slide 20

Slide 20 text

AWS SDK for Python ● It is called Boto 3 and its documentation can be found at the following Internet address: https://boto3.readthedocs.io/en/latest/ ● You can use Boto in your λf’s: – you’ll need to install it – for example, by using pip – for offline programming – you should not deploy it in a λ bundle, as it is a dependency already provided by AWS

Slide 21

Slide 21 text

First λf in Python
1. In the AWS Console, click on Lambda to open the Lambda Dashboard
2. Click on Create function
3. Leave the default selection - Author from scratch
4. Set up these settings:
   1. Name: helloPython (or whatever you prefer)
   2. Runtime: Python 3.x
   3. Role: Create new role from template(s)
   4. Role name: myLambdaRole (or whatever you prefer)
   5. Policy templates: empty

Slide 22

Slide 22 text

Editing the λf online ● You’ll notice that λ provides a simple but very effective online editor, supporting: – Syntax highlighting – Project tree – Multiple document interface, via tabs ● Above the editor, the Handler field includes the fully-qualified name of the actual function to run when executing this λf: in the case of Python, it is: → <module name>.<function name> ● Your code can have multiple public/exported functions, but just one can be the handler of its λf

Slide 23

Slide 23 text

Structure of a λf handler ● In Python, a λf handler is just a def function like this:

    def <handler name>(event, context):
        # Code here: you can use this function just as a controller,
        # which executes code from all over the λf project
        return  # Returning something is optional

● Its signature always includes 2 essential parameters: – event: Python dictionary (accessed via event["<key>"]) containing: ● event information, in case of event handling ● function parameters, in case of programmatic invocation – context → provides information about the execution environment ● The return value can be anything: from basic values – converted to string - up to (nested) dictionaries – converted to JSON string
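
A minimal sketch of a concrete handler may help here. It assumes the event carries two keys, left and right, as in the invocation example later in this deck; the function name lambda_handler is only a common choice, not a requirement.

```python
def lambda_handler(event, context):
    """Sum the two integers carried by the event and return the result."""
    # Assumption: the caller passes {"left": ..., "right": ...}
    left = int(event["left"])
    right = int(event["right"])
    # A basic value such as an int is serialized by the runtime for the caller
    return left + right
```

If this code lived in a file called lambda_function.py, the Handler field would read lambda_function.lambda_handler. The context parameter is unused here, but it must still appear in the signature.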

Slide 24

Slide 24 text

λf context in detail ● Context in the Python runtime, for environment inspection: – Invocation details – CloudWatch information – Remaining milliseconds – Memory limit – Client context
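
The following sketch shows one way a handler might read these details. The attribute and method names are those exposed by the context object in the Python runtime; the keys of the returned dictionary are arbitrary and chosen only for this illustration.

```python
def lambda_handler(event, context):
    """Return a few facts about the execution environment."""
    return {
        # Invocation details
        "function_name": context.function_name,
        "function_version": context.function_version,
        "request_id": context.aws_request_id,
        # CloudWatch information
        "log_group": context.log_group_name,
        "log_stream": context.log_stream_name,
        # Resource limits
        "memory_limit_mb": context.memory_limit_in_mb,
        "remaining_ms": context.get_remaining_time_in_millis(),
    }
```

Checking the remaining milliseconds is mostly useful in loops, to stop gracefully before the execution time limit is reached.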

Slide 25

Slide 25 text

Testing the λf in the λ dashboard ● Once a λf is in the dashboard – via upload or online editing – it can be invoked as much as you need via the dashboard GUI →Each test invocation is synchronous and passes data to the function via a test event ● In particular, you need to press the Test button: if you have no test events defined, you’ll have to create one – via the usual JSON notation ● Every test execution shows both the related CloudWatch log and the function result ● You can have as many test events as you need for each λf ● If you are using the online editor, you need to press the Save button in order to actually test the new code

Slide 26

Slide 26 text

λf failure ● There are situations when you just have to stop the execution of the λf and notify the caller of an error ● How this happens depends on the runtime, because the λ programming model tries to adapt to the language chosen for the λf ● In Python, you just need to raise any exception ● How the exception is handled depends on the specific client
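
As a small sketch of this mechanism, the handler below rejects malformed input by raising an exception. The exception class InvalidInputError is hypothetical, introduced only for this example; any Python exception would do.

```python
class InvalidInputError(Exception):
    """Hypothetical error type raised when the event is malformed."""


def lambda_handler(event, context):
    """Sum two integers, failing loudly when one of them is missing."""
    for key in ("left", "right"):
        if key not in event:
            # Raising stops the execution and reports a failure to the caller
            raise InvalidInputError("Missing required parameter: " + key)
    return int(event["left"]) + int(event["right"])
```

With a synchronous Boto invocation, the client does not get a Python exception in this case: the response carries a FunctionError field, and the payload is a JSON document describing the error (type and message).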

Slide 27

Slide 27 text

Logging ● Every λf creates a dedicated log group on CloudWatch →within that group, every runtime instance creates a log stream → Interleaved logs are therefore possible ● λ automatically logs metrics related to each λf execution ● In addition to this, whenever a λf writes to the standard output or employs logging facilities provided by the runtime (and customized by AWS), such output is sent into the log stream. ● In Python 3, there are the following redirections to CloudWatch: – The print function – The logging module ● Logging via λ does not introduce additional costs – but the ones already charged by CloudWatch do apply→CloudWatch has a permanent free tier, however it might be wise to reduce the retention period for λ log groups in dev/test environments
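
Both redirections can be sketched in a few lines. The example assumes the same left/right event used elsewhere in this deck; setting the level on the root logger is a common choice here, since the default level would otherwise hide INFO messages.

```python
import logging

# The root logger is the simplest choice for a small function
logger = logging.getLogger()
logger.setLevel(logging.INFO)


def lambda_handler(event, context):
    """Sum two integers, logging via both print and the logging module."""
    # Plain standard output also ends up in the log stream
    print("Received event keys:", sorted(event))
    logger.info("Summing %s and %s", event["left"], event["right"])
    result = int(event["left"]) + int(event["right"])
    logger.info("Result: %d", result)
    return result
```

The logging module is usually preferable to print, because levels make it possible to silence debug output without touching the code.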

Slide 28

Slide 28 text

Testing a λf offline ● Any λf can be tested at the λ dashboard – but that is quite impractical, as it must be uploaded ● Consequently, there are dedicated libraries for testing a λf on the development PC, via traditional xUnit frameworks ● The idea is to create test stubs of the 2 core objects on which every λf relies – an event and the context - and to run all the code offline ● Dynamic languages like Python make creating such test objects very simple
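
Even without a dedicated library, the idea can be sketched with the standard unittest module. In this example the handler is defined inline to keep the block self-contained (in a real project it would be imported), and make_context is a hypothetical helper that fakes only the parts of the context the test needs.

```python
import unittest
from types import SimpleNamespace


def lambda_handler(event, context):
    """Function under test: sums the two integers found in the event."""
    return int(event["left"]) + int(event["right"])


def make_context(remaining_ms=3000):
    """Build a minimal stand-in for the context object (hypothetical helper)."""
    return SimpleNamespace(
        function_name="helloPython",
        get_remaining_time_in_millis=lambda: remaining_ms,
    )


class HandlerTest(unittest.TestCase):
    def test_sum(self):
        # The event is just a dictionary, so no stub library is needed
        event = {"left": 80, "right": 90}
        self.assertEqual(lambda_handler(event, make_context()), 170)

    def test_missing_key(self):
        with self.assertRaises(KeyError):
            lambda_handler({"left": 80}, make_context())
```

Saved as a test module, this can be run with python -m unittest, with no AWS account or network connection involved.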

Slide 29

Slide 29 text

λf versioning: qualifiers ● At any moment, you can take an immutable snapshot of a λf → it is called a new version of the λf, and has: – An arbitrary description – A generated id ● The $LATEST id refers to the only mutable version of a λf, which always consists of the current code of the λf – the code now ready to run on λ ● In addition to versions, you can also define aliases: an alias is a tag associated with a version and having a meaningful Name ● Versions and aliases are called Qualifiers

Slide 30

Slide 30 text

Advanced λf aliases ● An alias is like a version pointer, and it can also be changed so as to point to another version ● This is especially useful to prevent code changes: provided that software components always reference only aliases, updating them simply requires switching the alias to the new λf version, right in λ’s dashboard ● It is even possible to make an alias point to 2 distinct versions, each with a % weight: this idea is especially useful to tentatively and gradually introduce new λf versions
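
Weighted aliases can also be configured programmatically. The sketch below wraps Boto's update_alias call; the function shift_traffic and its parameter names are hypothetical, and the client is passed in as an argument so that the code can be exercised with a stub instead of a real AWS connection.

```python
def shift_traffic(client, function_name, alias, stable_version,
                  new_version, new_weight):
    """Keep `alias` on `stable_version`, routing a share of calls to `new_version`.

    `new_weight` is a fraction between 0 and 1 (for example 0.1 for 10%).
    """
    if not 0.0 <= new_weight <= 1.0:
        raise ValueError("new_weight must be between 0 and 1")
    return client.update_alias(
        FunctionName=function_name,
        Name=alias,
        FunctionVersion=stable_version,
        # The remaining share of the traffic stays on stable_version
        RoutingConfig={"AdditionalVersionWeights": {new_version: new_weight}},
    )
```

A possible call would be shift_traffic(boto3.client("lambda"), "myFunction", "prod", "1", "2", 0.1), which would send roughly one invocation in ten to version 2.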

Slide 31

Slide 31 text

λ ARN ● ARN = Amazon Resource Name → uniquely identifies a resource in AWS ● Every AWS service has a dedicated schema ● In the case of λ, the ARN of a λf follows this pattern: arn:aws:lambda:<region>:<account id>:function:<λf name>[:<qualifier>] ● The ARN of any λf is shown in the λ dashboard, when you open the λf and even when you select a version or alias ● You usually don’t need the full ARN of a λf when invoking it programmatically – just its name, and the alias as a separate invocation parameter
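
To make the pattern concrete, here is a small sketch that builds and parses such ARNs with plain string handling. It assumes the standard aws partition and the usual layout, in which the region and the account id precede the function name; both helper names are hypothetical.

```python
def build_function_arn(region, account_id, function_name, qualifier=None):
    """Compose the ARN of a function, optionally with a version or alias."""
    arn = "arn:aws:lambda:{}:{}:function:{}".format(
        region, account_id, function_name)
    if qualifier:
        arn += ":" + qualifier
    return arn


def parse_function_arn(arn):
    """Split a function ARN into its components."""
    parts = arn.split(":")
    # Assumption: only the standard "aws" partition is handled
    valid = (
        len(parts) in (7, 8)
        and parts[:3] == ["arn", "aws", "lambda"]
        and parts[5] == "function"
    )
    if not valid:
        raise ValueError("Not a Lambda function ARN: " + arn)
    return {
        "region": parts[3],
        "account_id": parts[4],
        "function_name": parts[6],
        "qualifier": parts[7] if len(parts) == 8 else None,
    }
```

For instance, build_function_arn("eu-west-1", "123456789012", "helloPython", "prod") uses a made-up account id and yields an ARN ending in :function:helloPython:prod.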

Slide 32

Slide 32 text

Invoking a λf ● λf’s become part of AWS – so they can be invoked via any SDK and, in general, from any client → so, there is no general invocation syntax ● Additionally, to invoke a λf, the caller must have Invocation permissions on it ● The invoking client can be: – A standalone app, generally connected to AWS as a dedicated IAM user – Another λf – which can have its own permission set ● In a Python-based λf, you can invoke another λf just by using Boto – exactly as you would in a standalone app

Slide 33

Slide 33 text

Synchronous λf invocations ● Here is a simple example showing how to synchronously invoke a function that sums two integers (left and right, which it reads from its event object) and returns just an integer value:

    import boto3

    client = boto3.client('lambda')

    response = client.invoke(
        FunctionName='myFunction',
        InvocationType='RequestResponse',
        Payload=b'{"left": 80, "right": 90}'
        # You should also add the Qualifier param
    )

    # The payload is a stream: read it, then convert the bytes to an int
    data = response["Payload"].read()
    functionResult = int(data)

Slide 34

Slide 34 text

Parsing JSON results ● If the function returns a dictionary – or, anyway, data structured as JSON - you can have it back in your client code as a simple dictionary:

    data = response["Payload"].read()
    resultDict = json.loads(data.decode("utf-8"))

after you have added, at the beginning of the calling script:

    import json

Slide 35

Slide 35 text

Further details on invocation type ● The InvocationType parameter in Boto’s invoke() method can actually take one of 3 different values: – RequestResponse→synchronous, as just seen – Event→asynchronous execution, the response body will be empty and the HTTP status in case of success is 202 - Accepted. The name is perhaps a bit misleading, as event handlers are in some cases invoked synchronously – DryRun→ the λf handler is not executed – but the infrastructure performs checks such as: ● Ensuring the caller has invocation permissions on the λf ● Basic input validation
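
The two remaining invocation types can be sketched as thin wrappers around invoke(). The function names invoke_async and can_invoke are hypothetical; the client is passed as a parameter, so in real code it would be boto3.client('lambda'), while a stub is enough for offline tests. The status codes checked are the ones returned on success: 202 for Event and 204 for DryRun.

```python
import json


def invoke_async(client, function_name, payload, qualifier="$LATEST"):
    """Fire-and-forget invocation: True if the request was accepted."""
    response = client.invoke(
        FunctionName=function_name,
        InvocationType="Event",
        Payload=json.dumps(payload).encode("utf-8"),
        Qualifier=qualifier,
    )
    # 202 means "accepted": the function result is never sent back
    return response["StatusCode"] == 202


def can_invoke(client, function_name, qualifier="$LATEST"):
    """Dry run: validates the call without executing the handler."""
    response = client.invoke(
        FunctionName=function_name,
        InvocationType="DryRun",
        Qualifier=qualifier,
    )
    return response["StatusCode"] == 204
```

With a real client, a failed permission check surfaces as an exception raised by Boto rather than as a False return value, so callers may want to wrap can_invoke in a try block.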

Slide 36

Slide 36 text

Binding a λf to events ● Binding λf’s to events is quite easy – and the very origin of the λ project ● To bind a λf to one or more events, you can use: – The visual editor, based on drag&drop, in the λ dashboard – Other AWS tools, such as the CLI ● As mentioned earlier, the invocation type (synchronous / asynchronous) actually depends on the event type → it’s not really correct to assume that all event handling is asynchronous
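
Once a function is bound to an event source, the handler receives the event details in its event parameter. As an illustration, the sketch below handles S3 notifications, whose events carry a list of records with the bucket name and the object key; what the function does with them (here, it just collects them) is an arbitrary choice for this example.

```python
from urllib.parse import unquote_plus


def lambda_handler(event, context):
    """Collect the bucket and key of every object named in an S3 event."""
    uploaded = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 notifications
        key = unquote_plus(record["s3"]["object"]["key"])
        uploaded.append({"bucket": bucket, "key": key})
    return uploaded
```

Other event sources use different layouts, so a handler is normally written for one specific kind of event.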

Slide 37

Slide 37 text

Hot and cold startup ● What you should know about λ’s infrastructure is that every λf runs in a self-contained Linux environment having the selected runtime and a few programs and libraries ● You can’t really tell whether a λf will have: – Cold startup: its environment must be created and initialized, adding latency to the overall execution – Hot startup: the environment for the λf is ready, so it can run immediately ● Latency on cold startup increases depending on: – Size of the λf project / zip file – Latency of the underlying runtime ● When having hot startup in runtimes such as Python, a λf can actually employ global variables from previous executions →This can be interesting in order to create a local, first-level cache, but it might also breach security policies and introduce nasty bugs
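
The effect of global variables can be sketched as follows. The function expensive_lookup is a hypothetical stand-in for a slow operation such as a remote call; the module-level dictionary survives between invocations only while the same environment is reused, which is never guaranteed.

```python
import time

# Module-level state: initialized once per environment, at cold startup
_cache = {}
_invocations = 0


def expensive_lookup(key):
    """Hypothetical slow operation standing in for a remote call."""
    time.sleep(0.01)
    return key.upper()


def lambda_handler(event, context):
    """Answer from the first-level cache whenever the environment is hot."""
    global _invocations
    _invocations += 1
    key = event["key"]
    cached = key in _cache
    if not cached:
        _cache[key] = expensive_lookup(key)
    return {"value": _cache[key], "cached": cached, "invocations": _invocations}
```

Because the cache is shared by all the invocations served by one environment, it should never hold data that belongs to a single caller.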

Slide 38

Slide 38 text

Pricing ● Maximum memory: the higher this parameter, the higher the execution price per 100ms ● Execution time: rounded up to the nearest 100ms ● Free Tier: free seconds and GB-second per month ● You only pay for the actual computation time ● However, you pay for the whole memory you have requested – even if the function only uses a small percentage of it
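
The interplay between memory and execution time can be sketched as a small calculation. The function estimate_cost is hypothetical, and the price per GB-second is deliberately left as a parameter: actual rates, per-request fees and free tier quotas change over time and should be taken from the official pricing page.

```python
import math


def estimate_cost(duration_ms, memory_mb, invocations,
                  price_per_gb_second, free_gb_seconds=0.0):
    """Estimate the compute cost of a function, ignoring per-request fees."""
    # Execution time is rounded up to the nearest 100 ms
    billed_seconds = math.ceil(duration_ms / 100) * 100 / 1000
    # The whole requested memory is charged, not the memory actually used
    gb_seconds = billed_seconds * (memory_mb / 1024) * invocations
    billable = max(0.0, gb_seconds - free_gb_seconds)
    return billable * price_per_gb_second
```

For example, 1000 invocations lasting 120 ms each with 512 MB of memory are billed as 200 ms each, for a total of 100 GB-seconds before any free quota is subtracted.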

Slide 39

Slide 39 text

Most effective scenarios for λ ● λ is brilliant in: – Handling events raised by AWS – Running short, scheduled tasks – Providing legacy, rarely accessed services – Performing distributed computing and returning partial results – Serving even millions of small HTTP requests – Creating filters for the infrastructure – Monitoring events, alerting, and enforcing policies – Introducing Functional Programming on AWS

Slide 40

Slide 40 text

Choosing λ or EC2 ● Both services are effective for their specific purposes: – Long-running tasks → EC2 – Event-driven / short-running / infrequent tasks → λ ● In the end, the main difference is in terms of: – Administration simplicity → λ – Flexibility → EC2 – Pricing → you need to use tools such as AWS Pricing calculator to determine which service actually best suits your needs – despite the free tier, λ might get more expensive than EC2, in case of long-running tasks.

Slide 41

Slide 41 text

Final considerations ● λ is a constantly evolving service: – Hardware limits are progressively being raised – More and more events are raised by the AWS infrastructure – Further runtimes will probably get added ● Always refer to λ’s official page and documentation to get the very latest details ● Combined with other technologies having a perpetual free tier quota - such as DynamoDB, SNS and SQS, λ could become the core of efficient and effective computing infrastructures that you can create and maintain – for free.

Slide 42

Slide 42 text

Bibliography ● AWS Lambda: A Guide to Serverless Microservices – a very interesting book by Matthew Fuller ● AWS Lambda’s documentation ● Boto 3 - Documentation ● Python – Official website ● Wikipedia

Slide 43

Slide 43 text

Thanks for your attention! ^__^