
Introduction to AWS Lambda with Python


Cloud computing is a very elegant paradigm, providing high-level tools for distributed systems.

In particular, one can easily provision and customize virtual machines according to their computing needs.

However, is there a way to execute code in the cloud without spending time and effort on system administration?

This presentation introduces Lambda – the AWS service dedicated to serverless computing.

Gianluca Costa

December 27, 2017


Transcript

  1. Gianluca Costa
    Introduction to
    AWS Lambda
    with Python
    http://gianlucacosta.info/


  2. Introduction

    Cloud computing is a very elegant paradigm, providing
    high-level tools for distributed systems

    In particular, one can easily provision and customize
    virtual machines according to their computing needs

    However, is there a way to execute code in the cloud
    without spending time and effort on system
    administration?

    This presentation introduces Lambda – the AWS service
    dedicated to serverless computing – and was inspired
    by the material listed in the bibliography


  3. Functional Programming (FP)

    «Functional programming is a programming
    paradigm - a style of building the structure and
    elements of computer programs - that treats
    computation as the evaluation of mathematical
    functions and avoids changing-state and
    mutable data.» - Wikipedia

    There are different points of view on FP,
    expressed by various languages such as Haskell,
    Elm, Scala, … → but the core idea of the
    paradigm is the function as the basic
    computational unit


  4. Common benefits of (purely)
    Functional Programming

    Far fewer bugs caused by side effects, which are so
    common when using mutable data structures

    Cleaner, usually shorter code

    Declarative approach to problems

    Mathematical point of view

    Fairly simple and easily extensible language
    grammars

    In many ways compatible with traditional OOP in
    hybrid approaches – as proven by Scala; modern
    Java can support some functional style as well


  5. Traditional EC2 computing

    A Virtual Machine (EC2) is traditionally used for:
    – Servers
    – Long-running tasks
    – Polling on other EC2 instances or AWS services
    →In this scenario, the VM periodically checks
    whether a given condition has become true


  6. EC2 is not for polling

    Polling should generally be avoided – as it’s extremely
    inefficient:
    – The CPU and I/O resources are kept busy, waiting for a
    condition that might never become true
    – If the polling loop includes sleep instructions, the
    condition might be detected far later than required

    Pricing for EC2 makes such waste unacceptable
    →Companies used to delegate polling to just one EC2
    instance - which led to further problems, for example:
    – The instance had too many security permissions
    – If the instance crashed, all the polling activities stopped, too
    – Stopping the instance for maintenance was not feasible


  7. Events

    To prevent polling, AWS has introduced events

    Events are raised whenever a condition occurs:
    – Something changes in the infrastructure
    →for example, the status of an EC2 instance, or the
    contents of an S3 bucket
    – At fixed instants or at recurring intervals, à la
    Crontab


  8. AWS event sources
    S3
    DynamoDB
    CloudWatch
    SNS
    Kinesis
    Lambda dashboard ...
    EC2
    API Gateway


  9. AWS Lambda
    Lambda = Functional Programming + Execution platform fully
    managed by AWS + Events

    In AWS Lambda, you create functions, whose invocation
    type can be:
    – Synchronous, if the caller blocks waiting for the result
    – Asynchronous, if the caller asks to run the function and
    goes on, forgetting about the call

    A function can run:
    – in response to events → in this case, the invocation
    type depends on the specific event – you can’t control it
    – on demand – when programmatically invoked by your
    code – and you can choose the invocation type


  10. No maintenance, only code

    The greatest benefit of Lambda is that you just
    have to write your functions and upload them

    Serverless programming→AWS will take care of
    the underlying computing resources required to run
    your code: except in fairly rare situations, you don’t
    even need to know anything about the
    infrastructure actually executing your code

    Lambda also ensures High Availability (HA), so
    you don’t need to worry about single points of failure
    related to this service


  11. Naming conventions

    From now on, we’ll be using the following
    symbols:
    – λ → to identify the AWS Lambda service
    – λf → means any project created for AWS Lambda. It
    can consist of one or more source files, but it’s actually
    handled as a single function
    – λf’s → one or more projects in AWS Lambda
    (generic plural)


  12. One service, many runtimes

    Every single λf can currently target one of the following
    runtimes:
    – Python 3.x or 2.x
    – Node.js 6.x or 4.x
    – Java 8
    – C# - via .NET Core

    The programming model is slightly different from one
    runtime to the others
    →for example, strictly OOP runtimes like Java require
    your functions to be declared within classes

    Apart from that, most concepts and ideas are shared


  13. EC2 vs λ

    EC2 (IaaS):
    – Your VMs run until you stop them
    – You can choose the technical specs for the VM – such as
    processor, memory and even GPU optimizations
    – You can choose the OS via the AMI
    – You can configure the VM via SSH
    – You can install new software on the VM
    – You can open ports on the VM
    – You must deploy your artifacts for the specific services
    you host
    – Monitoring, scalability, HA are up to you
    – You must constantly update the system
    – You must constantly enforce security

    λ (PaaS):
    – λf’s run when triggered by events or when invoked
    – You can choose only the runtime for your λf’s and a few
    hardware params
    – You should know very little of the underlying environment
    and you can neither access nor customize it
    – You cannot open ports
    – AWS takes care of every aspect related to system
    administration and security


  14. EC2 vs λ – Further comparison

    EC2:
    – You can run almost anything on it
    – Max memory can be thousands of GB
    – Runs as long as you wish
    – Supports IAM roles
    – Has a local, temporary file system
    – EBS and EFS provide further storage
    – Network calls allowed
    – External processes can be run
    – Logging to CloudWatch requires SDK
    – Can satisfy most security policies
    – No free tier after the first year

    λ:
    – Only a few runtimes are supported. But you can use the AWS SDK
    and 3rd-party libraries that are non-native or based on provided
    .so libs
    – Memory for a λf is at most a few GB
    – Any λf can run for a limited time
    – Supports IAM roles
    – Has a local, temporary directory: /tmp
    – EBS and EFS not supported
    – Network calls allowed
    – External processes can be run
    – Logging to CloudWatch is the default
    – Might not satisfy very strict security policies
    (e.g., in terms of IDS, access logs, ...)
    – Permanent free tier quota


  15. Deployment strategies
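
    [Diagram: the λf source code reaches the λ service either via the
    online editor in the λ dashboard or, after going through a compiler
    if needed, as a zip file including all dependencies except the AWS
    SDK; the zip file can be deployed via the upload form, the AWS CLI,
    S3 or build tools (Gradle, Jenkins, CloudFormation, ...)]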


  16. Deployment strategies - Explained

    Each λf must be self-contained:
    – If it is very simple, not relying on external libraries, and based on a
    dynamic runtime (Python, Node.js, ...), it can be edited online,
    within the λ dashboard
    – For complex projects, you have to upload a zip file, which must
    include all the dependencies (except the AWS SDK) and whose
    specific directory layout depends on the chosen runtime

    Working offline is not a bad idea, as you can use your
    traditional IDEs and tools

    For more elaborate artifacts – which are fairly common in
    Java – having a dedicated plugin for your own build tool can
    save even more time

    Please consult the λ documentation for updated information
    about bundling projects for your selected runtime
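
As a rough illustration of the zip-based strategy for the Python runtime, the sketch below bundles a project directory with the standard zipfile module. The helper name build_bundle is hypothetical, and the layout it assumes (handler module and dependencies at the root of the archive, AWS SDK left out) should be checked against the λ documentation for your runtime.

```python
import zipfile
from pathlib import Path


def build_bundle(project_dir, zip_path):
    """Zip every file under project_dir, keeping paths relative to it.

    Hypothetical helper: it assumes that the handler module and all the
    third-party dependencies (except the AWS SDK) already sit in project_dir.
    """
    project_dir = Path(project_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as bundle:
        for path in sorted(project_dir.rglob("*")):
            if path.is_file():
                # Store paths relative to the project root, so that the
                # enclosing directory does not end up in the archive
                bundle.write(path, path.relative_to(project_dir).as_posix())
    return zip_path
```

The output file should be written outside project_dir, otherwise the archive would try to include itself.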


  17. IAM Roles

    λf’s may require permissions – in particular, to
    access other AWS resources like S3 buckets

    λ relies on the standard system of IAM roles –
    which can grant and require permissions with no
    need for credentials

    When creating a λf, you can assign it an existing
    IAM role, or you can create a new one from a
    template
    →If you choose no template, your λf will still have
    permissions on CloudWatch – which is paramount
    for logging


  18. Further IAM considerations

    λ supports cross-account invocation, provided
    that appropriate permissions have been granted

    The resources accessed by a λf must reside in the
    same region as the λf
    →The Lambda@Edge project can still integrate
    with CloudFront, to support global event handling


  19. Why Python?

    Python is a very-high-level and elegant language,
    featuring a rich standard library

    It has a huge ecosystem, with excellent libraries in
    almost any modern domain

    It is dynamic, enabling fast prototyping

    The source code and the bytecode are usually compact
    and fast to load

    Startup time and memory requirements of its virtual
    machine are quite low

    Python shines at acting as a glue layer between very
    different contexts and at short, focused tasks

    λ supports in-browser editing of simple Python projects


  20. AWS SDK for Python

    It is called Boto 3 and its documentation can be
    found at the following Internet address:
    https://boto3.readthedocs.io/en/latest/

    You can use Boto in your λf’s:
    – you’ll need to install it – for example, by using pip – for
    offline programming
    – you should not deploy it in a λ bundle, as it is a
    dependency already provided by AWS


  21. First λf in Python
    1. In the AWS Console, click on Lambda to open the
    Lambda Dashboard
    2. Click on Create function
    3. Leave the default selection - Author from scratch
    4. Set up these settings:
      1. Name: helloPython (or whatever you prefer)
      2. Runtime: Python 3.x
      3. Role: Create new role from template(s)
      4. Role name: myLambdaRole (or whatever you prefer)
      5. Policy templates: empty


  22. Editing the λf online

    You’ll notice that λ provides a simple but very effective online
    editor, supporting:
    – Syntax highlighting
    – Project tree
    – Multiple document interface, via tabs

    Above the editor, the Handler field includes the fully-qualified
    name of the actual function to run when executing this λf: in
    the case of Python, it is:
    →<module name>.<function name>

    Your code can have multiple public/exported functions, but
    just one can be the handler of its λf


  23. Structure of a λf handler

    In Python, a λf handler is just a function defined via def, like this:
    def <function name>(event, context):
        # Code here: you can use this function just as a controller,
        # which executes code from all over the λf project
        return  # Returning something is optional

    Its signature always includes 2 essential parameters:
    – event: Python dictionary (accessed via event["<key>"]) containing:
      – event information, in case of event handling
      – function parameters, in case of programmatic invocation
    – context →provides information about the execution environment

    The return value can be anything: from basic values – converted to
    a string - up to (nested) dictionaries – converted to a JSON string
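
To make the handler structure concrete, here is a minimal sketch of a handler that reads two integers from its event and returns their sum. The name sum_handler and the keys left and right are illustrative choices for this example, not names required by λ.

```python
def sum_handler(event, context):
    """Handler summing the two integers found in the event dictionary."""
    left = event["left"]
    right = event["right"]
    # The returned integer is serialized and sent back to the caller
    return left + right


if __name__ == "__main__":
    # Local call: the context is not used by this handler, so None is enough
    print(sum_handler({"left": 80, "right": 90}, None))
```

With a module named, say, lambda_function.py, the Handler field would then be lambda_function.sum_handler.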


  24. λf context in detail

    The context in the Python runtime, for environment inspection, provides:
    – Invocation details
    – CloudWatch information
    – Remaining milliseconds
    – Memory limit
    – Client context
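
The following sketch shows how a handler might read some of this information. The attribute names are the ones exposed by the context object of the Python runtime; the handler name describe_context and the stub values used for the local run are assumptions made for this example.

```python
from types import SimpleNamespace


def describe_context(event, context):
    """Handler returning a few details read from the context object."""
    return {
        "function_name": context.function_name,
        "request_id": context.aws_request_id,
        "log_group": context.log_group_name,
        "log_stream": context.log_stream_name,
        "memory_limit_mb": context.memory_limit_in_mb,
        "remaining_ms": context.get_remaining_time_in_millis(),
    }


if __name__ == "__main__":
    # Stub mimicking the real context, with made-up values, for a local run
    fake_context = SimpleNamespace(
        function_name="helloPython",
        aws_request_id="00000000-0000-0000-0000-000000000000",
        log_group_name="/aws/lambda/helloPython",
        log_stream_name="local-stream",
        memory_limit_in_mb=128,
        get_remaining_time_in_millis=lambda: 3000,
    )
    print(describe_context({}, fake_context))
```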


  25. Testing the λf in the λ dashboard

    Once a λf is in the dashboard – via upload or online editing
    – it can be invoked as often as you need via the
    dashboard GUI
    →Each test invocation is synchronous and passes data to
    the function via a test event

    In particular, you need to press the Test button: if you have
    no test events defined, you’ll have to create one – via the
    usual JSON notation

    Every test execution shows both the related CloudWatch
    log and the function result

    You can have as many test events as you need for each λf

    If you are using the online editor, you need to press the
    Save button in order to actually test the new code


  26. λf failure

    There are situations when you just have to stop
    the execution of the λf and notify the caller of an
    error

    How this happens depends on the runtime,
    because the λ programming model tries to adapt
    to the language chosen for the λf

    In Python, you just need to raise any exception

    How the exception is handled depends on the
    specific client
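
A minimal sketch of a failing λf in Python: the handler validates its input and raises an exception when something is wrong. The name checked_sum_handler and the choice of ValueError are illustrative; any exception type would stop the execution.

```python
def checked_sum_handler(event, context):
    """Sum two integers, failing the invocation on invalid input."""
    for key in ("left", "right"):
        if not isinstance(event.get(key), int):
            # Raising any exception stops the λf and reports an error
            raise ValueError("'{}' must be an integer".format(key))
    return event["left"] + event["right"]
```

When such a function is invoked synchronously via Boto, the error details should come back in the response payload (together with a FunctionError entry in the response) rather than as a Python exception in the caller, so the client has to check for them explicitly.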


  27. Logging

    Every λf creates a dedicated log group on CloudWatch
    →within that group, every runtime instance creates a log stream
    → Interleaved logs are therefore possible

    λ automatically logs metrics related to each λf execution

    In addition to this, whenever a λf writes to the standard output or
    employs logging facilities provided by the runtime (and customized
    by AWS), such output is sent into the log stream.

    In Python 3, there are the following redirections to CloudWatch:
    – The print function
    – The logging module

    Logging via λ does not introduce additional costs – but the ones
    already charged by CloudWatch do apply→CloudWatch has a
    permanent free tier; however, it might be wise to reduce the
    retention period for λ log groups in dev/test environments
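
The two redirections can be sketched as follows. The handler name logging_handler is hypothetical, and the sketch assumes that the runtime has already attached its own handler to the root logger, as described above.

```python
import logging

# Module-level logger; its level is set explicitly because the default
# WARNING level would filter out INFO messages
logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)


def logging_handler(event, context):
    """Handler writing to the log via both print() and the logging module."""
    print("Received event keys:", sorted(event))
    logger.info("Processing %d event field(s)", len(event))
    return len(event)
```

When running this code on the development PC, the logger.info() line is not displayed unless logging is configured first, for example via logging.basicConfig(level=logging.INFO).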


  28. Testing a λf offline

    Any λf can be tested at the λ dashboard – but that
    is quite impractical, as it must be uploaded

    Consequently, there are dedicated libraries for
    testing a λf on the development PC, via traditional
    xUnit frameworks

    The idea is to create test stubs of the 2 core
    objects on which every λf relies – an event and
    the context - and to run all the code offline

    Dynamic languages like Python make creating
    such test objects very simple
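
As a sketch of this idea, the test below relies only on the standard unittest module instead of a dedicated library. The handler sum_handler, the helper make_context and the stub values are all assumptions made for this example; the stub exposes just a small subset of the real context.

```python
import unittest
from types import SimpleNamespace


def sum_handler(event, context):
    """λf under test: sums the integers found in the event."""
    return event["left"] + event["right"]


def make_context(remaining_ms=3000):
    """Build a stub exposing a small subset of the real context."""
    return SimpleNamespace(
        function_name="sumFunction",
        memory_limit_in_mb=128,
        get_remaining_time_in_millis=lambda: remaining_ms,
    )


class SumHandlerTest(unittest.TestCase):
    def test_sums_the_event_fields(self):
        event = {"left": 80, "right": 90}  # stub event
        self.assertEqual(sum_handler(event, make_context()), 170)

    def test_missing_field_raises(self):
        with self.assertRaises(KeyError):
            sum_handler({"left": 80}, make_context())


if __name__ == "__main__":
    unittest.main()
```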


  29. λf versioning: qualifiers

    At any moment, you can take an immutable snapshot
    of a λf →it is called a new version of the λf, and has:
    – An arbitrary description
    – A generated id

    The $LATEST id refers to the only mutable version of
    a λf, which always consists of the current code of
    the λf – the code now ready to run on λ

    In addition to versions, you can also define aliases: an
    alias is a tag associated with a version and having a
    meaningful name

    Versions and aliases are called Qualifiers


  30. Advanced λf aliases

    An alias is like a version pointer, and it can also
    be changed so as to point to another version

    This is especially useful to prevent code
    changes: provided that software components
    always reference only aliases, updating them
    simply requires switching the alias to the new λf
    version, right in λ’s dashboard

    It is even possible to make an alias point to 2
    distinct versions, each with a % weight: this
    idea is especially useful to tentatively and
    gradually introduce new λf versions


  31. λ ARN

    ARN = Amazon Resource Name →uniquely identifies a
    resource in AWS

    Every AWS service has a dedicated schema

    In the case of λ, the ARN of a λf follows this pattern:
    arn:aws:lambda:<region>:<account id>:function:<λf name>[:<qualifier>]

    The ARN of any λf is shown in the λ dashboard, when you
    open the λf and even when you select a version or alias

    You usually don’t need the full ARN of a λf when invoking
    it programmatically – just its name, and the alias as a
    separate invocation parameter
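
For illustration, the ARN of a λf can be composed from its parts (region, account id, function name and optional qualifier) as in the sketch below. The function lambda_arn is a hypothetical helper, and the region and account id in the usage line are placeholders.

```python
def lambda_arn(region, account_id, function_name, qualifier=None):
    """Compose the ARN of a λf, optionally with a version or alias."""
    arn = "arn:aws:lambda:{}:{}:function:{}".format(
        region, account_id, function_name
    )
    if qualifier is not None:
        arn += ":" + qualifier
    return arn


if __name__ == "__main__":
    # Placeholder region and account id, chosen only for the example
    print(lambda_arn("eu-west-1", "123456789012", "helloPython", "PROD"))
```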


  32. Invoking a λf

    λf’s become part of AWS – so they can be invoked via
    any SDK and, in general, from any client →so, there is
    no general invocation syntax

    Additionally, to invoke a λf, the caller must have
    Invocation permissions on it

    The invoking client can be:
    – A standalone app, generally connected to AWS as a
    dedicated IAM user
    – Another λf – which can have its own permission set

    In a Python-based λf, you can invoke another λf just by
    using Boto – exactly as you would in a standalone app


  33. Synchronous λf invocations

    Here is a simple example showing how to synchronously invoke a function
    summing two integers (left and right, which it reads from its event object)
    and returning just an integer value:
    import boto3

    client = boto3.client('lambda')
    response = client.invoke(
        FunctionName='myFunction',
        InvocationType='RequestResponse',
        # You should also add the Qualifier param
        Payload=b'{"left": 80, "right": 90}'
    )
    # The payload is a stream containing the serialized result
    data = response["Payload"].read()
    functionResult = int(data)


  34. Parsing JSON results

    If the function returns a dictionary – or, anyway,
    data structured as JSON - you can have it back in
    your client code as a simple dictionary:
    import json  # at the beginning of the script

    data = response["Payload"].read()
    resultDict = json.loads(data.decode("utf-8"))
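
The decoding step can be wrapped in a small helper and tried without touching AWS. In this sketch, read_json_payload is a hypothetical name and io.BytesIO stands in for the stream that Boto returns under "Payload"; only its read() method is relied upon.

```python
import io
import json


def read_json_payload(response):
    """Decode the Payload stream of an invoke() response into Python data."""
    raw = response["Payload"].read()
    return json.loads(raw.decode("utf-8"))


if __name__ == "__main__":
    # Stand-in for a real response, with made-up content
    fake_response = {"Payload": io.BytesIO(b'{"sum": 170, "ok": true}')}
    print(read_json_payload(fake_response))
```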


  35. Further details on invocation type

    The InvocationType parameter in Boto’s invoke()
    method can actually take one of 3 different values:
    – RequestResponse→synchronous, as just seen
    – Event→asynchronous execution, the response body will
    be empty and the HTTP status in case of success is 202 -
    Accepted. The name is perhaps a bit misleading, as
    event handlers are in some cases invoked synchronously
    – DryRun→ the λf handler is not executed – but the
    infrastructure performs checks such as:
      – Ensuring the caller has invocation permissions on the λf
      – Basic input validation
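
The two remaining invocation types might be used as in the sketch below. The helpers invoke_async and can_invoke are hypothetical names; they receive the client as a parameter, so that it can be a real one created via boto3.client('lambda') or a stub. The sketch assumes the status codes 202 (Event) and 204 (DryRun) returned by the service on success.

```python
import json


def invoke_async(client, function_name, payload):
    """Fire-and-forget invocation; returns True if λ accepted the request."""
    response = client.invoke(
        FunctionName=function_name,
        InvocationType="Event",
        Payload=json.dumps(payload).encode("utf-8"),
    )
    return response["StatusCode"] == 202


def can_invoke(client, function_name):
    """Dry run: the handler is not executed, only the checks are performed.

    Failed checks usually surface as exceptions raised by the client.
    """
    response = client.invoke(
        FunctionName=function_name,
        InvocationType="DryRun",
    )
    return response["StatusCode"] == 204
```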


  36. Binding a λf to events

    Binding λf’s to events is quite easy – and the very
    origin of the λ project

    To bind a λf to one or more events, you can use:
    – The visual editor, based on drag&drop, in the λ
    dashboard
    – Other AWS tools, such as the CLI

    As mentioned earlier, the invocation type
    (synchronous / asynchronous) actually depends on
    the event type →it’s not really correct to assume
    that all event handling is asynchronous


  37. Hot and cold startup

    What you should know about λ’s infrastructure is that every λf runs
    in a self-contained Linux environment having the selected
    runtime and a few programs and libraries

    You can’t really tell whether a λf will have:
    – Cold startup: its environment must be created and initialized, adding
    latency to the overall execution
    – Hot startup: the environment for the λf is ready, so it can run immediately

    Latency on cold startup increases depending on:
    – Size of the λf project / zip file
    – Latency of the underlying runtime

    On hot startup, in runtimes such as Python, a λf can
    actually access global variables set by previous executions →This
    can be interesting in order to create a local, first-level cache, but
    it might also breach security policies and introduce nasty bugs


  38. Pricing

    You only pay for the actual computation time:
    – Maximum memory: the higher this parameter, the higher the
    execution price per 100ms
    – Execution time: rounded up to the nearest 100ms
    – Free Tier: free seconds and GB-seconds per month

    However, you pay for the whole memory you have requested –
    even if the function only uses a small percentage of it
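
The pricing rule can be turned into a small calculation. The sketch below assumes the 100ms rounding described in this slide (exposed as a parameter) and takes the price per GB-second as an input, since actual prices must be read from the official pricing page; both function names are hypothetical.

```python
import math


def billed_gb_seconds(duration_ms, memory_mb, granularity_ms=100):
    """GB-seconds billed for one execution.

    The duration is rounded up to the next multiple of granularity_ms and
    the whole configured memory is charged, whatever the actual usage.
    """
    billed_ms = math.ceil(duration_ms / granularity_ms) * granularity_ms
    return (billed_ms / 1000) * (memory_mb / 1024)


def execution_cost(duration_ms, memory_mb, price_per_gb_second):
    """Cost of one execution, ignoring the free tier and any other fees."""
    return billed_gb_seconds(duration_ms, memory_mb) * price_per_gb_second
```

For example, a 120ms execution with 512 MB of configured memory is billed as 200ms, that is 0.2 s × 0.5 GB = 0.1 GB-seconds.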


  39. Most effective scenarios for λ

    λ is brilliant at:
    – Handling events raised by AWS
    – Running short, scheduled tasks
    – Providing legacy, rarely accessed services
    – Performing distributed computing and returning partial results
    – Serving even millions of small HTTP requests
    – Creating filters for the infrastructure
    – Monitoring events, alerting, and enforcing policies
    – Introducing Functional Programming on AWS


  40. Choosing λ or EC2

    Both services are effective for their specific purposes:
    – Long-running tasks → EC2
    – Event-driven / short-running / infrequent tasks →λ

    In the end, the main difference is in terms of:
    – Administration simplicity → λ
    – Flexibility → EC2
    – Pricing → you need to use tools such as the AWS Pricing
    Calculator to determine which service actually best suits
    your needs – despite the free tier, λ might get more
    expensive than EC2, in case of long-running tasks.


  41. Final considerations

    λ is a constantly evolving service:
    – Hardware limits are progressively being raised
    – More and more events are raised by the AWS
    infrastructure
    – Further runtimes will probably get added

    Always refer to λ’s official page and documentation to
    get the very latest details

    Combined with other technologies having a
    perpetual free tier quota - such as DynamoDB,
    SNS and SQS - λ could become the core of efficient
    and effective computing infrastructures that you
    can create and maintain for free.


  42. Bibliography

    AWS Lambda: A Guide to Serverless Microservices – a
    very interesting book by Matthew Fuller

    AWS Lambda’s documentation

    Boto 3 - Documentation

    Python – Official website

    Wikipedia


  43. Thanks for your attention! ^__^
