
Building a Serverless Computation Environment with Python by Ric da Silva

Pycon ZA
October 09, 2020

This talk will be a tour and demo of a virtual machine, compiler, and DSL called Teal (https://condense9.com), and is for anyone who builds data processing systems (or is interested in language design!).

Teal is for when you need to run sequences of tasks on a serverless platform (e.g. AWS Lambda), but don't want the huge burden of all the glue-infrastructure. It's an alternative to AWS Step Functions or Apache Airflow for building workflows with many steps, branches and loops.

The goal is to be able to take existing Python code and get it running with as little complexity as possible, and without any long-running infrastructure. It’s easy to deploy one Lambda function (e.g. with the well-known Serverless Framework), but it’s much harder to write several functions and pass data between them. The Teal project started as an experiment to see whether it’d be feasible to simplify this by using familiar programming constructs (async/await concurrency, variables, functions, etc.) to describe the workflows.

There will be a live demo (via screen-share) of using Teal to build, test and deploy a non-trivial data pipeline, taking just a few minutes from start to finish. This might have taken several hours before, or longer if you didn’t have orchestrator or task-runner infrastructure set up already. Follow along with the code on GitHub: https://github.com/condense9/teal-demos.

Prior knowledge of serverless is useful, but not essential - we'll briefly cover the necessary concepts.


Transcript

  1. Building a Serverless Computation Environment with Python, naturally. Ric da Silva, @rmhsilva, condense9.com
  2. Hark. Beyond infrastructure (as-code). Compilers.

  3. Serverless Computation: Functions as a Service (FaaS)

  4. A “Typical” Serverless Architecture: so many tools & services for deployment, logging, monitoring, authentication, …
  5. Project Spec. Python functions: f, g, h. Input: some parameter, x. Output: store value y in a database, where y = f(g(h(x))). Constraints: you must use AWS Lambda; f, g and h take several minutes each to run. (Disclaimer: don’t use these function names at home!) A plain-Python sketch of the spec follows below.
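As a baseline, the spec in ordinary local Python looks roughly like this (a sketch only: the function bodies and the database write are placeholders, not real application code):

# Baseline sketch of the spec as plain local Python. The function bodies
# and the database write are placeholders, not real application code.
import time

def h(x):
    time.sleep(1)              # stands in for "several minutes" of work
    return x + "-h"

def g(x):
    time.sleep(1)
    return x + "-g"

def f(x):
    time.sleep(1)
    return x + "-f"

def store(y):
    print("would store in a database:", y)   # placeholder for a DB write

if __name__ == "__main__":
    x = "foo"
    y = f(g(h(x)))             # locally, the whole pipeline is one expression
    store(y)

The difficulty on Lambda is that this one-line composition has to be split across several functions that somehow pass data between them.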
  6. Version 1: f(x), g(x), h(x) chained with asynchronous invokes. Input {“x”: “foo”}; store y = f(g(h(x))). ✔ Asynchronous invoke. (Sketch below.)
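One way to read “Version 1”: each Lambda does its own work and then asynchronously invokes the next function. A rough sketch with boto3 (the target function name "g" and the do_h_work helper are illustrative assumptions):

# Sketch of the "Version 1" pattern: a Lambda handler finishes its own work,
# then asynchronously invokes the next function in the chain with boto3.
# The target function name and do_h_work are illustrative assumptions.
import json
import boto3

lambda_client = boto3.client("lambda")

def do_h_work(x):
    return x + "-h"                       # placeholder for the real work

def handler(event, context):
    result = do_h_work(event["x"])
    lambda_client.invoke(
        FunctionName="g",                 # assumed name of the next Lambda
        InvocationType="Event",           # asynchronous ("fire and forget") invoke
        Payload=json.dumps({"x": result}),
    )

The catch, as the next slide shows, is that testing now spans a whole invocation chain rather than a single function.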
  7. Version 1: Testing. 1) Unit-test f, h, g. 2) Build an integration test. 3) localstack. 4) Staging env, CI. (f(x), g(x), h(x), {“x”: “foo”}, Store.) https://localstack.cloud/ (Sketch of the localstack step below.)
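For the localstack step, pointing boto3 at the locally running emulator is mostly a matter of overriding the endpoint (a minimal sketch; 4566 is localstack's default edge port, and localstack accepts dummy credentials):

# Minimal sketch: talk to a locally running localstack instead of real AWS
# by overriding the endpoint URL.
import boto3

lambda_client = boto3.client(
    "lambda",
    endpoint_url="http://localhost:4566",   # localstack's default edge port
    region_name="us-east-1",
    aws_access_key_id="test",               # localstack accepts dummy credentials
    aws_secret_access_key="test",
)

print(lambda_client.list_functions()["Functions"])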
  8. Version 2: f(x), g(x), h(x), input {“x”: “foo”}, Store. Inputs flow through AWS Simple Queue Service; failures go to a “Dead Letter” Queue. (Sketch below.)
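A rough sketch of the queue-plus-dead-letter-queue wiring with boto3 (the queue names and maxReceiveCount are illustrative assumptions, and AWS credentials/region are assumed to be configured):

# Sketch: create an SQS queue whose failed messages are redirected to a
# dead-letter queue after a few receive attempts.
import json
import boto3

sqs = boto3.client("sqs")

dlq_url = sqs.create_queue(QueueName="g-input-dlq")["QueueUrl"]
dlq_arn = sqs.get_queue_attributes(
    QueueUrl=dlq_url, AttributeNames=["QueueArn"]
)["Attributes"]["QueueArn"]

queue_url = sqs.create_queue(
    QueueName="g-input",
    Attributes={
        "RedrivePolicy": json.dumps(
            {"deadLetterTargetArn": dlq_arn, "maxReceiveCount": 3}
        )
    },
)["QueueUrl"]

# Upstream functions send their results here; g consumes from this queue.
sqs.send_message(QueueUrl=queue_url, MessageBody=json.dumps({"x": "foo"}))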
  9. Hark Checklist: ☐ Use Python ☐ Local testing ☐ Contextual debugging ☐ Operational support ☐ Easy cross-cloud portability
  10. Let’s try Hark.

  11. Series Computation

  12. Parallel Computation; Conditionals

  13. Lists, mapping

  14. API Endpoint (But slowww for now)

  15. More: Stacktraces; Upload triggers (AWS S3); Custom permissions (IAM statements); Custom build scripts (e.g. ./build.sh); Lambda configuration (memory, extra layers, etc.)
  16. a = get_future("a"); if a.resolved: data_stack.push(a.value); else: this_thread.wait_for(a). Here wait_for is sort of like an implicit callback: this thread physically stops, and when a resolves, the thread is continued. (Toy sketch below.)
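A toy sketch of that mechanism (all names are hypothetical illustrations, not Hark's actual implementation): a future either already has a value, or it records which threads to continue when it resolves.

# Toy sketch of the future / wait_for idea. All names are hypothetical
# illustrations, not Hark's implementation.
from dataclasses import dataclass, field

@dataclass
class Thread:
    data_stack: list = field(default_factory=list)
    waiting: bool = False

    def wait_for(self, future):
        # The thread physically stops; nothing runs until the future resolves.
        self.waiting = True
        future.waiting_threads.append(self)

@dataclass
class Future:
    resolved: bool = False
    value: object = None
    waiting_threads: list = field(default_factory=list)

    def resolve(self, value):
        self.resolved = True
        self.value = value
        # The "implicit callback": every thread that stopped on this future
        # is continued, with the resolved value pushed onto its data stack.
        for thread in self.waiting_threads:
            thread.waiting = False
            thread.data_stack.append(value)

# The slide's snippet, in terms of this sketch:
this_thread, a = Thread(), Future()
if a.resolved:
    this_thread.data_stack.append(a.value)
else:
    this_thread.wait_for(a)

a.resolve("some value")          # later: a resolves, the thread is continued
print(this_thread.data_stack)    # ['some value']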
  17. [Diagram: the Hark “CPU”. Each thread has a data stack, a call stack, and an instruction pointer (IP); the runtime executes instructions one at a time and prints to stdout; trigger(fn, args) starts a thread at IP=0; an async g(…) call creates a New Thread (fn, args); a finished or waiting thread stops, and Continue Thread (thread_id) resumes it. The runtime state (Executable, Futures, CPU State, Stacks, …) lives in Memory backed by DynamoDB.] (Toy dispatch-loop sketch below.)
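A very rough sketch of what such a dispatch loop could look like (the instruction names and state layout are hypothetical illustrations of the diagram, not Hark's actual VM; the real runtime persists this state to DynamoDB between Lambda invocations):

# Toy sketch of a byte-code dispatch loop with thread state that can be
# persisted between runs. Instruction names and state layout are hypothetical.
def run_thread(thread, executable, memory):
    code = executable["code"]
    while not thread["waiting"] and thread["ip"] < len(code):
        op, arg = code[thread["ip"]]
        thread["ip"] += 1
        if op == "PUSH":
            thread["data_stack"].append(arg)
        elif op == "CALL_ASYNC":
            # async g(...): spawn a new thread for the callee and carry on
            memory["new_threads"].append(
                {"fn": arg, "args": [thread["data_stack"].pop()],
                 "ip": 0, "data_stack": [], "waiting": False}
            )
        elif op == "WAIT":
            # stop this thread; it is continued later when the future resolves
            thread["waiting"] = True
        elif op == "PRINT":
            print(thread["data_stack"].pop())         # stdout
    memory["threads"][thread["id"]] = thread           # persist state (DynamoDB in Hark)

# Usage sketch:
memory = {"threads": {}, "new_threads": []}
program = {"code": [("PUSH", "foo"), ("CALL_ASYNC", "g"),
                    ("PUSH", "done"), ("PRINT", None)]}
run_thread({"id": 0, "ip": 0, "data_stack": [], "waiting": False},
           program, memory)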
  18. [Diagram: a deployed Hark instance. The Hark CLI tool (driven manually or from CI/CD) deploys app code, triggers new sessions via a Control API behind API Gateway, lists sessions, and gets logs/results/…, polling until finished. Event data can also arrive from manual triggers or S3 uploads. The Runtime keeps thread state, futures, Hark & Python code, logs, events, stdout and results in a key-value store plus a bucket, sends SNS status notifications, and talks to application data sources/sinks and other AWS services. ✨ https://github.com/boto/boto3 https://github.com/pynamodb/PynamoDB https://github.com/prompt-toolkit/python-prompt-toolkit https://github.com/pavdmyt/yaspin https://github.com/CITGuru/PyInquirer/] (Storage sketch below.)
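Since the slide lists PynamoDB for the key-value store, here is a hedged sketch of the kind of DynamoDB-backed model that could hold per-session thread state (the table name, region and attributes are assumptions, not Hark's real schema):

# Sketch of a PynamoDB model for session/thread state in DynamoDB.
# Table name, region and attributes are illustrative assumptions.
from pynamodb.models import Model
from pynamodb.attributes import (
    BooleanAttribute, JSONAttribute, NumberAttribute, UnicodeAttribute,
)

class ThreadState(Model):
    class Meta:
        table_name = "hark-sessions"          # assumed table name
        region = "eu-west-2"                  # assumed region

    session_id = UnicodeAttribute(hash_key=True)
    thread_id = NumberAttribute(range_key=True)
    ip = NumberAttribute(default=0)           # instruction pointer
    waiting = BooleanAttribute(default=False)
    data_stack = JSONAttribute(default=list)

# Usage sketch (requires the table to exist and AWS credentials):
# ThreadState("session-123", 0, ip=5, waiting=False, data_stack=[]).save()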
  19. [Diagram: the Hark compiler pipeline. A .hk source file, e.g. import(f, src.main) fn main(x) { if x > 5 { print("nope") } else { … } }, goes through Lex+Parse into an Abstract Syntax Tree (AST): import statements; numbers, strings, …; operations (+, *, …); function definitions; assignments (x = 5); function calls; … (AST image: https://commons.wikimedia.org/wiki/File:Abstract_syntax_tree_for_Euclidean_algorithm.svg). Optimise produces a better AST (tail-call recursion, in particular). Compile converts the AST into byte-code that the Hark VM can execute, and Pack produces the Hark Executable = Byte-Code + Python function references + Symbol table + Debugging info. ✨ Built with dataclasses, attrs, functools, typing, parsy, sly: https://github.com/python-attrs/attrs https://github.com/python-parsy/parsy https://github.com/dabeaz/sly] (Toy parser sketch below.)
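For a flavour of the Lex+Parse stage, here is a tiny toy grammar built with sly and dataclasses, two of the libraries on the slide (assignments and additions only; this is a generic example, not Hark's grammar):

# Toy lexer/parser with sly, producing dataclass AST nodes.
from dataclasses import dataclass
from sly import Lexer, Parser

@dataclass
class Num:
    value: int

@dataclass
class BinOp:
    op: str
    left: object
    right: object

@dataclass
class Assign:
    name: str
    value: object

class ToyLexer(Lexer):
    tokens = {NAME, NUMBER, PLUS, ASSIGN}
    ignore = " \t"
    NAME = r"[a-zA-Z_][a-zA-Z0-9_]*"
    PLUS = r"\+"
    ASSIGN = r"="

    @_(r"\d+")
    def NUMBER(self, t):
        t.value = int(t.value)
        return t

class ToyParser(Parser):
    tokens = ToyLexer.tokens

    @_("NAME ASSIGN expr")
    def statement(self, p):
        return Assign(p.NAME, p.expr)

    @_("expr PLUS term")
    def expr(self, p):
        return BinOp("+", p.expr, p.term)

    @_("term")
    def expr(self, p):
        return p.term

    @_("NUMBER")
    def term(self, p):
        return Num(p.NUMBER)

lexer, parser = ToyLexer(), ToyParser()
print(parser.parse(lexer.tokenize("x = 1 + 2 + 3")))
# Assign(name='x', value=BinOp(op='+', left=BinOp(...), right=Num(value=3)))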
  20. Checklist: ✔ Use Python ✔ Local testing ✔ Contextual debugging ☐ Operational support ☐ True cross-cloud portability
  21. Limitations: bandwidth (to/from session memory); Lambda startup time; account concurrency limits; others?
  22. Hark: beyond infrastructure?! condense9.com, github/condense9, @rmhsilva