Introduction to AWS Lambda with Python

Gianluca Costa
http://gianlucacosta.info/

Introduction
• Cloud computing is a very elegant paradigm, providing high-level tools for distributed systems
• In particular, one can easily provision and customize virtual machines according to one's computing needs
• However, is there a way to execute code in the cloud without spending time and effort on system administration?
• This presentation introduces Lambda – the AWS service dedicated to serverless computing – and was inspired by the material listed in the bibliography

Functional Programming (FP)
• «Functional programming is a programming paradigm - a style of building the structure and elements of computer programs - that treats computation as the evaluation of mathematical functions and avoids changing-state and mutable data.» - Wikipedia
• There are different points of view on FP, expressed by various languages such as Haskell, Elm, Scala, … → but the core idea of the paradigm is the function as the basic computational unit

Common benefits of (purely) Functional Programming
• Far fewer buggy side effects, which are so common when using mutable data structures
• Cleaner, usually shorter code
• Declarative approach to problems
• Mathematical point of view
• Fairly simple and easily extensible language grammars
• In many ways compatible with traditional OOP in hybrid approaches – as proven by Scala; modern Java can support some functional style as well

Traditional EC2 computing
• A Virtual Machine (EC2) is traditionally used for:
  – Servers
  – Long-running tasks
  – Polling on other EC2 instances or AWS services → in this scenario, the VM periodically checks whether a given condition has become true

EC2 is not for polling
• Polling should generally be avoided, as it's extremely inefficient:
  – The CPU and I/O resources are kept busy, waiting for a condition that might never become true
  – If the polling cycle contains sleep instructions, the condition might be detected far later than required
• Pricing for EC2 makes such waste unacceptable → companies used to delegate polling to just one EC2 instance, which led to further problems, for example:
  – The instance had too many security permissions
  – If the instance crashed, all the polling activities crashed, too
  – Stopping the instance for maintenance was not feasible

Events
• To prevent polling, AWS has introduced events
• Events are raised whenever a condition occurs:
  – Something changes in the infrastructure → for example, the status of an EC2 instance, or the contents of an S3 bucket
  – At fixed instants or after recurring intervals, à la crontab

AWS event sources
• S3
• DynamoDB
• CloudWatch
• SNS
• Kinesis
• Lambda dashboard
• EC2
• API Gateway
• ...

AWS Lambda
Lambda = Functional Programming + Execution platform fully managed by AWS + Events
• In AWS Lambda, you create functions, whose invocation type can be:
  – Synchronous, if the caller blocks waiting for the result
  – Asynchronous, if the caller asks to run the function and goes on, forgetting about the call
• A function can run:
  – in response to events → in this case, the invocation type depends on the specific event – you can't control it
  – on demand – when programmatically invoked by your code – and you can choose the invocation type

No maintenance, only code
• The greatest benefit of Lambda is that you just have to write your functions and upload them
• Serverless programming → AWS will take care of the underlying computing resources required to run your code: except in fairly rare situations, you don't even need to know anything about the infrastructure actually executing your code
• Lambda also ensures High Availability (HA), so you don't need to worry about single points of failure related to this service

Naming conventions
• From now on, we'll be using the following symbols:
  – λ → identifies the AWS Lambda service
  – λf → means any project created for AWS Lambda. It can consist of one or more source files, but it's actually handled as a single function
  – λf's → one or more projects in AWS Lambda (generic plural)

One service, many runtimes
• Every single λf can currently target one of the following runtimes:
  – Python 3.x or 2.x
  – Node.js 6.x or 4.x
  – Java 8
  – C# - via .NET Core
• The programming model differs slightly from one runtime to another → for example, strictly OOP runtimes like Java require your functions to be declared within classes
• Apart from that, most concepts and ideas are shared

EC2 vs λ

EC2 (IaaS)
• Your VMs run until you stop them
• You can choose the technical specs for the VM – such as processor, memory and even GPU optimizations
• You can choose the OS via the AMI
• You can configure the VM via SSH
• You can install new software on the VM
• You can open ports on the VM
• You must deploy your artifacts for the specific services you host
• Monitoring, scalability, HA are up to you
• You must constantly update the system
• You must constantly enforce security

λ (PaaS)
• λf's run triggered by events or when invoked
• You can choose only the runtime for your λf's and a few hardware params
• You need to know very little about the underlying environment, and you can neither access nor customize it
• You cannot open ports
• AWS takes care of every aspect related to system administration and security

EC2 vs λ – Further comparison

EC2
• You can run almost anything on it
• Max memory can be thousands of GB
• Runs as long as you wish
• Supports IAM roles
• Has a local, temporary file system
• EBS and EFS provide further storage
• Network calls allowed
• External processes can be run
• Logging to CloudWatch requires SDK
• Can satisfy most security policies
• No free tier after the 1st year

λ
• Only a few runtimes are supported, but you can use the AWS SDK and 3rd-party libraries that are non-native or based on provided .so libs
• Memory for a λf is at most a few GB
• Any λf can run for a limited time
• Supports IAM roles
• Has a local, temporary directory: /tmp
• EBS and EFS not supported
• Network calls allowed
• External processes can be run
• Logging to CloudWatch is the default
• Might not satisfy very strict security policies (e.g., in terms of IDS, access logs, ...)
• Permanent free tier quota

Deployment strategies
[Diagram. Elements shown: λf source code; online editor (λ dashboard); compiler; zip file including all dependencies except AWS SDK; AWS CLI; upload form; S3; build tools (Gradle, Jenkins, CloudFormation, ...); λ service]

Deployment strategies - Explained
• Each λf must be self-contained:
  – If it is very simple, not relying on external libraries, and based on a dynamic runtime (Python, Node.js, ...), it can be edited online, within the λ dashboard
  – For complex projects, you have to upload a zip file, which must include all the dependencies (except the AWS SDK) and whose specific directory layout depends on the chosen runtime
• Working offline is not a bad idea, as you can use your traditional IDEs and tools
• For more elaborate artifacts, which are fairly common in Java, having a dedicated plugin for your own build tool can save even more time
• Please consult the λ documentation for updated information about bundling projects for your selected runtime
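As a rough illustration of the bundling step for a Python λf, the sketch below builds a zip archive in memory with the standard zipfile module. The helper build_bundle and the file names (lambda_function.py, helpers/util.py) are hypothetical, chosen only for this example; the directory layout λ actually expects should be checked in its documentation.

```python
import io
import zipfile


def build_bundle(files):
    """Return the bytes of a zip archive containing the given files.

    `files` maps archive paths (hypothetical names) to source text.
    Third-party dependencies would be added the same way; the AWS SDK
    is left out, since the source states that AWS already provides it.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for path, source in sorted(files.items()):
            archive.writestr(path, source)
    return buffer.getvalue()


bundle = build_bundle({
    "lambda_function.py": "def handler(event, context):\n    return 'ok'\n",
    "helpers/util.py": "VALUE = 42\n",
})
```

The resulting bytes could then be written to disk and uploaded through any of the channels listed above.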

IAM Roles
• λf's may require permissions – in particular, to access other AWS resources like S3 buckets
• λ relies on the standard system of IAM roles – which can grant and require permissions with no need for credentials
• When creating a λf, you can assign it an existing IAM role, or you can create a new one from a template → if you choose no template, your λf will still have permissions on CloudWatch – which is paramount for logging

Further IAM considerations
• λ supports cross-account invocation, provided that appropriate permissions have been granted
• The resources accessed by a λf must reside in the same region as the λf → the Lambda@Edge project can still integrate with CloudFront, to support global event handling

Why Python?
• Python is a very-high-level and elegant language, featuring a rich standard library
• It has a huge ecosystem, with excellent libraries in almost any modern domain
• It is dynamic, enabling fast prototyping
• The source code and the bytecode are usually compact and fast to load
• Startup time and memory requirements of its virtual machine are quite low
• Python shines at acting as a glue layer between very different contexts and at short, focused tasks
• λ supports in-browser editing of simple Python projects

AWS SDK for Python
• It is called Boto 3 and its documentation can be found at the following Internet address: https://boto3.readthedocs.io/en/latest/
• You can use Boto in your λf's:
  – for offline programming, you'll need to install it, for example by using pip
  – you should not deploy it in a λ bundle, as it is a dependency already provided by AWS

First λf in Python
1. In the AWS Console, click on Lambda to open the Lambda Dashboard
2. Click on Create function
3. Leave the default selection: Author from scratch
4. Configure these settings:
   1. Name: helloPython (or whatever you prefer)
   2. Runtime: Python 3.x
   3. Role: Create new role from template(s)
   4. Role name: myLambdaRole (or whatever you prefer)
   5. Policy templates: empty

Editing the λf online
• You'll notice that λ provides a simple but very effective online editor, supporting:
  – Syntax highlighting
  – Project tree
  – Multiple document interface, via tabs
• Above the editor, the Handler field contains the fully-qualified name of the actual function to run when executing this λf. In the case of Python, it is: → <module name>.<function name>
• Your code can have multiple public/exported functions, but just one can be the handler of its λf

Structure of a λf handler
• In Python, a λf handler is just a def function like this:

    def <function name>(event, context):
        # Code here: you can use this function just as a controller,
        # which executes code from all over the λf project
        return <result>  # Returning something is optional

• Its signature always includes 2 essential parameters:
  – event: usually a Python dictionary (accessed via event["<field name>"]) containing:
    • event information, in case of event handling
    • function parameters, in case of programmatic invocation
  – context → provides information about the execution environment
• The return value can be anything: from basic values (converted to string) up to (nested) dictionaries (converted to a JSON string)
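To make the handler template concrete, here is a minimal sketch of a handler that sums two integers taken from the event. The function name handler is an arbitrary choice; the field names left and right simply anticipate the invocation example later in the deck.

```python
def handler(event, context):
    """Sum the two integers found in the event dictionary."""
    left = event["left"]
    right = event["right"]
    return left + right


# Local call: the context is not used here, so None is enough.
result = handler({"left": 80, "right": 90}, None)
print(result)
```

Because the handler is a plain function, it can be called directly like this on a development machine, without any AWS infrastructure.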

λf context in detail
Context in the Python runtime, for environment inspection:
• Invocation details
• CloudWatch information
• Remaining milliseconds
• Memory limit
• Client context
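The following sketch shows how a handler might read some of this information. The attribute names used (function_name, aws_request_id, log_group_name, log_stream_name, memory_limit_in_mb, get_remaining_time_in_millis) are those exposed by the context object in the Python runtime; the fake_context object and all of its values are made up, so that the snippet can run locally.

```python
from types import SimpleNamespace


def handler(event, context):
    """Return a few details read from the context object."""
    return {
        "function": context.function_name,
        "request_id": context.aws_request_id,
        "log_group": context.log_group_name,
        "log_stream": context.log_stream_name,
        "memory_mb": context.memory_limit_in_mb,
        "remaining_ms": context.get_remaining_time_in_millis(),
    }


# Stand-in for the real context, with made-up values, for local runs only.
fake_context = SimpleNamespace(
    function_name="helloPython",
    aws_request_id="00000000-0000-0000-0000-000000000000",
    log_group_name="/aws/lambda/helloPython",
    log_stream_name="example-stream",
    memory_limit_in_mb=128,
    get_remaining_time_in_millis=lambda: 3000,
)

info = handler({}, fake_context)
```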

Testing the λf in the λ dashboard
• Once a λf is in the dashboard, via upload or online editing, it can be invoked as many times as you need via the dashboard GUI → each test invocation is synchronous and passes data to the function via a test event
• In particular, you need to press the Test button: if you have no test events defined, you'll have to create one, via the usual JSON notation
• Every test execution shows both the related CloudWatch log and the function result
• You can have as many test events as you need for each λf
• If you are using the online editor, you need to press the Save button in order to actually test the new code

λf failure
• There are situations when you just have to stop the execution of the λf and notify the caller of an error
• How this happens depends on the runtime, because the λ programming model tries to adapt to the language chosen for the λf
• In Python, you just need to raise any exception
• How the exception is handled depends on the specific client
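A minimal sketch of this idea, assuming a handler that expects the fields left and right in its event (the same illustrative fields used elsewhere in the deck). The choice of ValueError and the error message are arbitrary.

```python
def handler(event, context):
    """Fail loudly when a required field is missing from the event."""
    if "left" not in event or "right" not in event:
        raise ValueError("Both 'left' and 'right' are required")
    return event["left"] + event["right"]


# Locally, the failure surfaces as an ordinary Python exception.
try:
    handler({"left": 80}, None)
    outcome = "no error"
except ValueError as error:
    outcome = str(error)
```

When the function runs on λ, the same exception is what the runtime reports back, and the client decides what to do with it.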

Logging
• Every λf creates a dedicated log group on CloudWatch → within that group, every runtime instance creates a log stream → interleaved logs are therefore possible
• λ automatically logs metrics related to each λf execution
• In addition to this, whenever a λf writes to the standard output or employs logging facilities provided by the runtime (and customized by AWS), such output is sent to the log stream
• In Python 3, the following are redirected to CloudWatch:
  – The print function
  – The logging module
• Logging via λ does not introduce additional costs, but the ones already charged by CloudWatch do apply → CloudWatch has a permanent free tier; however, it might be wise to reduce the retention period for λ log groups in dev/test environments
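The two Python logging paths mentioned above can be sketched as follows. The logger name myLambda is a hypothetical choice; on a development machine print goes to the console, while the logger.info line may not be displayed unless a logging handler is configured.

```python
import logging

logger = logging.getLogger("myLambda")  # hypothetical logger name
logger.setLevel(logging.INFO)


def handler(event, context):
    """Write one line via print and one via the logging module."""
    print("Received event with keys:", sorted(event))
    logger.info("Processing %d field(s)", len(event))
    return len(event)


count = handler({"left": 80, "right": 90}, None)
```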

Testing a λf offline
• Any λf can be tested in the λ dashboard, but that is quite impractical, as it must be uploaded first
• Consequently, there are dedicated libraries for testing a λf on the development PC, via traditional xUnit frameworks
• The idea is to create test stubs of the 2 core objects on which every λf relies, an event and the context, and to run all the code offline
• Dynamic languages like Python make creating such test objects very simple
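The stub-based approach can be sketched with nothing more than the standard unittest module, used here in place of the dedicated libraries the slide refers to. The handler, the make_context helper and the stubbed values are all hypothetical, and the stub exposes only the context members this example happens to need.

```python
import unittest
from types import SimpleNamespace


def handler(event, context):
    """λf under test: sums two integers from the event."""
    return event["left"] + event["right"]


def make_context(remaining_ms=3000):
    """Build a stub exposing only the context members the test needs."""
    return SimpleNamespace(
        function_name="helloPython",
        get_remaining_time_in_millis=lambda: remaining_ms,
    )


class HandlerTest(unittest.TestCase):
    def test_sum(self):
        event = {"left": 80, "right": 90}
        self.assertEqual(handler(event, make_context()), 170)

    def test_missing_field(self):
        with self.assertRaises(KeyError):
            handler({"left": 80}, make_context())


# Run the suite programmatically, so the snippet also works when imported.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(HandlerTest)
outcome = unittest.TextTestRunner(verbosity=0).run(suite)
```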

λf versioning: qualifiers
• At any moment, you can take an immutable snapshot of a λf → it is called a new version of the λf, and has:
  – An arbitrary description
  – A generated id
• The $LATEST id refers to the only mutable version of a λf, which always consists of the current code of the λf – the code now ready to run on λ
• In addition to versions, you can also define aliases: an alias is a tag associated with a version and having a meaningful name
• Versions and aliases are called qualifiers

Advanced λf aliases
• An alias is like a version pointer, and it can also be changed so as to point to another version
• This is especially useful to avoid code changes in clients: provided that software components always reference only aliases, updating them simply requires switching the alias to the new λf version, right in λ's dashboard
• It is even possible to make an alias point to 2 distinct versions, each with a % weight: this idea is especially useful to tentatively and gradually introduce new λf versions

λ ARN
• ARN = Amazon Resource Name → uniquely identifies a resource in AWS
• Every AWS service has a dedicated ARN schema
• In the case of λ, the ARN of a λf follows this pattern: arn:aws:lambda:<region>:<account id>:function:<λf name>[:<version or alias>]
• The ARN of any λf is shown in the λ dashboard, when you open the λf and even when you select a version or alias
• You usually don't need the full ARN of a λf when invoking it programmatically: just its name, with the alias as a separate invocation parameter
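The ARN pattern above can be illustrated with two small helpers that compose and split such strings. Both function names are hypothetical, and the region, account id and alias in the usage lines are made-up values; the sketch only covers the exact pattern quoted in the slide.

```python
def build_lambda_arn(region, account_id, name, qualifier=None):
    """Compose a λf ARN following the pattern quoted in the slide."""
    arn = "arn:aws:lambda:{}:{}:function:{}".format(region, account_id, name)
    if qualifier is not None:
        arn += ":" + qualifier
    return arn


def parse_lambda_arn(arn):
    """Split a λf ARN into its parts; the qualifier may be absent."""
    parts = arn.split(":")
    well_formed = (
        len(parts) in (7, 8)
        and parts[:3] == ["arn", "aws", "lambda"]
        and parts[5] == "function"
    )
    if not well_formed:
        raise ValueError("Not a Lambda function ARN: " + arn)
    return {
        "region": parts[3],
        "account_id": parts[4],
        "name": parts[6],
        "qualifier": parts[7] if len(parts) == 8 else None,
    }


# Made-up region, account id and alias, for illustration only.
arn = build_lambda_arn("eu-west-1", "123456789012", "helloPython", "prod")
parsed = parse_lambda_arn(arn)
```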

Invoking a λf
• λf's become part of AWS, so they can be invoked via any SDK and, in general, from any client → so, there is no general invocation syntax
• Additionally, to invoke a λf, the caller must have invocation permissions on it
• The invoking client can be:
  – A standalone app, generally connected to AWS as a dedicated IAM user
  – Another λf, which can have its own permission set
• In a Python-based λf, you can invoke another λf just by using Boto, exactly as you would in a standalone app

Synchronous λf invocations
• Here is a simple example showing how to synchronously invoke a function that sums two integers (left and right, which it reads from its event object) and returns just an integer value:

    import boto3

    client = boto3.client('lambda')

    response = client.invoke(
        FunctionName='myFunction',
        InvocationType='RequestResponse',
        Payload=b'{"left": 80, "right": 90}'  # You should also add the Qualifier param
    )

    data = response["Payload"].read()
    functionResult = int(data)

Parsing JSON results
• If the function returns a dictionary (or, anyway, data structured as JSON), you can have it back in your client code as a simple dictionary. First add, at the beginning of the current λf script:

    import json

  then decode the payload:

    data = response["Payload"].read()
    resultDict = json.loads(data.decode("utf-8"))
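To try the decoding step without calling AWS, the response can be simulated locally. In this sketch, parse_payload is a hypothetical helper and fake_response is a stand-in whose "Payload" entry is an in-memory byte stream with made-up content, mimicking only the read() method used above.

```python
import io
import json


def parse_payload(response):
    """Decode the JSON body of an invoke() response into Python data."""
    raw = response["Payload"].read()
    return json.loads(raw.decode("utf-8"))


# Simulated response: in a real call, "Payload" is a readable stream.
fake_response = {"Payload": io.BytesIO(b'{"sum": 170, "inputs": [80, 90]}')}
result_dict = parse_payload(fake_response)
```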

Further details on invocation type
• The InvocationType parameter in Boto's invoke() method can actually take one of 3 different values:
  – RequestResponse → synchronous, as just seen
  – Event → asynchronous execution: the response body will be empty and the HTTP status in case of success is 202 - Accepted. The name is perhaps a bit misleading, as event handlers are in some cases invoked synchronously
  – DryRun → the λf handler is not executed, but the infrastructure performs checks such as:
    • Ensuring the caller has invocation permissions on the λf
    • Basic input validation
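An asynchronous invocation could look like the sketch below. The helper invoke_async takes the client as a parameter so that it can be exercised locally: FakeLambdaClient is a hypothetical stand-in that only mimics the shape of the invoke() call and its StatusCode entry, and would be replaced by boto3.client('lambda') when real AWS access is available.

```python
import json


def invoke_async(client, function_name, payload):
    """Fire-and-forget invocation; returns True if the request was accepted."""
    response = client.invoke(
        FunctionName=function_name,
        InvocationType="Event",
        Payload=json.dumps(payload).encode("utf-8"),
    )
    return response["StatusCode"] == 202


class FakeLambdaClient:
    """Local stand-in for boto3.client('lambda'); records the last call."""

    def invoke(self, **kwargs):
        self.last_call = kwargs
        return {"StatusCode": 202}


# With real AWS access, pass boto3.client('lambda') instead of the fake.
fake_client = FakeLambdaClient()
accepted = invoke_async(fake_client, "myFunction", {"left": 80, "right": 90})
```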

Binding a λf to events
• Binding λf's to events is quite easy, and it is the very origin of the λ project
• To bind a λf to one or more events, you can use:
  – The visual editor, based on drag & drop, in the λ dashboard
  – Other AWS tools, such as the CLI
• As mentioned earlier, the invocation type (synchronous / asynchronous) actually depends on the event type → it's not correct to assume that all event handling is asynchronous

Hot and cold startup
• What you should know about λ's infrastructure is that every λf runs in a self-contained Linux environment having the selected runtime and a few programs and libraries
• You can't really tell whether a λf will have:
  – Cold startup: its environment must be created and initialized, adding latency to the overall execution
  – Hot startup: the environment for the λf is ready, so it can run immediately
• Latency on cold startup increases depending on:
  – Size of the λf project / zip file
  – Latency of the underlying runtime
• With hot startup in runtimes such as Python, a λf can actually employ global variables from previous executions → this can be interesting in order to create a local, first-level cache, but it might also breach security policies and introduce nasty bugs
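The global-variable effect can be reproduced locally, since two calls within the same Python process behave like two hot invocations in the same environment. In this sketch, _cache, _invocations and expensive_lookup are hypothetical names, and the lookup is a trivial placeholder for a slow operation.

```python
# Module-level state: created on cold startup, kept across hot invocations
# that reuse the same environment.
_cache = {}
_invocations = 0


def expensive_lookup(key):
    """Placeholder for a slow operation, such as a remote call."""
    return key.upper()


def handler(event, context):
    global _invocations
    _invocations += 1
    key = event["key"]
    hit = key in _cache
    if not hit:
        _cache[key] = expensive_lookup(key)
    return {"value": _cache[key], "cache_hit": hit, "invocations": _invocations}


# Two calls in the same process mimic two hot invocations.
first = handler({"key": "abc"}, None)
second = handler({"key": "abc"}, None)
```

Since there is no guarantee that the environment is reused, such a cache should only ever be an optimization, never a place for data the λf depends on.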

Pricing
• Maximum memory: the higher this parameter, the higher the execution price per 100 ms
• Execution time: rounded up to the nearest 100 ms
• Free tier: free seconds and GB-seconds per month
• You only pay for the actual computation time
• However, you pay for the whole memory you have requested, even if the function only uses a small percentage of it
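The billing rule described above (time rounded up to 100 ms, multiplied by the memory requested) can be sketched as a small calculation. Both function names are hypothetical, the price per GB-second is deliberately left as a caller-supplied parameter rather than a real AWS figure, and the sketch ignores the free tier and any other charges.

```python
import math


def billed_gb_seconds(duration_ms, memory_mb):
    """GB-seconds billed for one execution, rounding time up to 100 ms."""
    billed_ms = math.ceil(duration_ms / 100) * 100
    return (billed_ms / 1000) * (memory_mb / 1024)


def estimate_cost(duration_ms, memory_mb, invocations, price_per_gb_second):
    """Cost before any free tier; the price is a caller-supplied assumption."""
    per_run = billed_gb_seconds(duration_ms, memory_mb)
    return per_run * invocations * price_per_gb_second


# A 230 ms run is billed as 300 ms; with 512 MB that is 0.3 s * 0.5 GB.
single = billed_gb_seconds(230, 512)
```

Note that the memory term uses the configured maximum, not the memory the function actually consumed.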

Most effective scenarios for λ
λ is brilliant at:
• Handling events raised by AWS
• Running short, scheduled tasks
• Providing legacy, rarely accessed services
• Performing distributed computing and returning partial results
• Serving even millions of small HTTP requests
• Creating filters for the infrastructure
• Monitoring events, alerting, and enforcing policies
• Introducing Functional Programming on AWS

Choosing λ or EC2
• Both services are effective for their specific purposes:
  – Long-running tasks → EC2
  – Event-driven / short-running / infrequent tasks → λ
• In the end, the main differences are in terms of:
  – Administration simplicity → λ
  – Flexibility → EC2
  – Pricing → you need to use tools such as the AWS Pricing Calculator to determine which service actually best suits your needs: despite the free tier, λ might get more expensive than EC2 in case of long-running tasks

Final considerations
• λ is a constantly evolving service:
  – Hardware limits are progressively being raised
  – More and more events are raised by the AWS infrastructure
  – Further runtimes will probably get added
• Always refer to λ's official page and documentation to get the very latest details
• Combined with other technologies having a perpetual free tier quota, such as DynamoDB, SNS and SQS, λ could become the core of efficient and effective computing infrastructures that you can create and maintain for free

Bibliography
• AWS Lambda: A Guide to Serverless Microservices – a very interesting book by Matthew Fuller
• AWS Lambda's documentation
• Boto 3 - Documentation
• Python – Official website
• Wikipedia

Thanks for your attention! ^__^
