Slide 1

Slide 1 text

Don't Worry About Servers, Still Worry About Metrics
FaaS Measurement Fundamentals
@smithclay, New Relic
Gluecon, 5/24/17

Slide 2

Slide 2 text

Metrics are what we measure*
*hopefully useful things

Slide 3

Slide 3 text

λ: This Thing Appeared
The Magic Hat of Werner Vogels
How do we understand it?

Slide 4

Slide 4 text

Hyped tech wish list
Metrics: Trends
Alerting: CTA
Logging: Detail
Tracing: Cause
Analytics: All of the above

Slide 5

Slide 5 text

MALTA: Observability Index for FaaS
Metrics, Alerting, Logging, Tracing, Analytics
[Chart axis: Maturity Level]

Slide 6

Slide 6 text

Built-in FaaS Metrics*
Error Count
Function Invocation Count
Function Duration (ms)
* not comprehensive, but the important ones
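A minimal sketch of how these three metrics could be derived from raw invocation records. The record shape (`durationMs`, `error`) and the `summarize` helper are hypothetical, made up for illustration; a FaaS platform normally reports these numbers for you.

```javascript
// Hypothetical invocation records; the field names are assumptions.
const invocations = [
  { durationMs: 120, error: false },
  { durationMs: 95, error: false },
  { durationMs: 2300, error: true },
  { durationMs: 110, error: false },
];

// Reduce raw records to invocation count, error count, and duration stats.
function summarize(records) {
  const invocationCount = records.length;
  const errorCount = records.filter((r) => r.error).length;
  const totalMs = records.reduce((sum, r) => sum + r.durationMs, 0);
  return {
    invocationCount,
    errorCount,
    avgDurationMs: invocationCount === 0 ? 0 : totalMs / invocationCount,
    maxDurationMs: records.reduce((max, r) => Math.max(max, r.durationMs), 0),
  };
}

console.log(summarize(invocations));
```

Reporting the maximum next to the average keeps a single slow invocation visible instead of letting it blend into the mean.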

Slide 7

Slide 7 text

Why does function invocation time vary so much?

Slide 8

Slide 8 text

Event Trigger -> 1. Invoke λ -> 2. Run -> 3. End
End is one of: Result, Error, or Timeout

Slide 9

Slide 9 text

Cold Start vs Warm Start
Warm: Event Trigger -> Handler Code
Cold: Event Trigger -> Create -> Initialize -> Handler Code
[Diagram axis: Function Invocation Time]

Slide 10

Slide 10 text

What's inside AWS Lambda? "It's containers" — Person waving their hands

Slide 11

Slide 11 text

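λ: Running Commands for Discovery

    const exec = require('child_process').exec;

    // Node.js Lambda handlers receive (event, context, callback).
    exports.handler = (event, context, callback) => {
      exec('whoami', (err, stdout) => {
        if (err) return callback(err);
        console.log(stdout);
        return callback(null);
      });
    };

[LOG TIME] sbxuser_1066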

Slide 12

Slide 12 text

λ is a UNIX system?! I know this!

Slide 13

Slide 13 text

Let's run SSH in λ
λ (ssh process) -> SSH Tunnel
Firewall: no inbound ports

Slide 14

Slide 14 text

SSH in Lambda Architecture
λ: node.js wrapper -> child_process.exec() -> go sshd binary (x64)
go sshd binary uses: Go SSH Crypto Libs
https://github.com/smithclay/faassh

Slide 15

Slide 15 text

Max Session Length: 5 minutes (custom prompt optional) https://github.com/smithclay/faassh

Slide 16

Slide 16 text

Info from /proc
cat /proc/cpuinfo: 2x Intel(R) Xeon(R) CPU E5-2666 v3 @ 2.90GHz
cat /proc/meminfo: 3857664 kB
cat /proc/modules: ixgbevf (EC2 10Gbps Network Driver)
=> c4.large EC2 Compute-Optimized Instance (?)
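The same facts can be collected programmatically rather than over an interactive shell. This is a sketch only: the helper names (`cpuModels`, `memTotalKb`, `readProc`) are hypothetical, and it assumes the usual Linux layout of `/proc/cpuinfo` ("model name" lines) and `/proc/meminfo` (a "MemTotal" line).

```javascript
const fs = require('fs');

// Count "model name" entries in /proc/cpuinfo text and report the first model.
function cpuModels(cpuinfoText) {
  const models = cpuinfoText
    .split('\n')
    .filter((line) => line.startsWith('model name'))
    .map((line) => line.split(':').slice(1).join(':').trim());
  return { count: models.length, model: models[0] || null };
}

// Extract MemTotal (in kB) from /proc/meminfo text, or null if absent.
function memTotalKb(meminfoText) {
  const match = /^MemTotal:\s+(\d+)\s+kB/m.exec(meminfoText);
  return match ? Number(match[1]) : null;
}

// Read a /proc file, returning '' where it does not exist (non-Linux hosts).
function readProc(path) {
  try {
    return fs.readFileSync(path, 'utf8');
  } catch (err) {
    return '';
  }
}

console.log(cpuModels(readProc('/proc/cpuinfo')));
console.log(memTotalKb(readProc('/proc/meminfo')));
```

Logging these values from inside a handler would put them in the function's logs on every invocation.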

Slide 17

Slide 17 text

c4.large instance (10 Gbps)
[Diagram: a grid of λs, in theory, on a VM. Legend: one λ style = Running 128 MB Function, the other λ style = Not Running 128 MB Function.]

Slide 18

Slide 18 text

Frozen functions help avoid cold starts
cgroup freezer subsystem: https://www.kernel.org/doc/Documentation/cgroup-v1/freezer-subsystem.txt
[Diagram: λ λ λ]

Slide 19

Slide 19 text

Internals Recap
It's just containers on a VM
Functions frozen when not running
No magic unikernels involved

Slide 20

Slide 20 text

How do we measure and prevent cold starts? "Use Kubernetes" — Troll

Slide 21

Slide 21 text

Cold Start Discovery

    // Module scope runs once per container, so this is true only on a cold start.
    var SO_SO_COLD = true;

    exports.handler = function(event, context, callback) {
      console.log('Cold? %s', SO_SO_COLD);
      SO_SO_COLD = false;
      return callback(null);
    };

https://github.com/smithclay/lambda-proc-info
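The same module-scope trick can be packaged so that any handler reports cold starts without repeating the flag logic. A minimal sketch; `withColdStartLog` is a hypothetical helper name, not part of any SDK, and the JSON log format is an assumption.

```javascript
// Module scope runs once per container, so this flag is true only for the
// first invocation after a cold start.
let coldStart = true;

// Wrap a handler so each invocation logs whether it was a cold start.
function withColdStartLog(wrapped) {
  return (event, context, callback) => {
    console.log(JSON.stringify({ coldStart }));
    coldStart = false;
    return wrapped(event, context, callback);
  };
}

const handler = withColdStartLog((event, context, callback) => {
  callback(null, 'done');
});

exports.handler = handler;
```

Emitting the flag as JSON makes it easier to match with a log filter than a free-form message.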

Slide 22

Slide 22 text

Warming automation
Scheduled Event -> λ, at a 4 minute interval
Only effective for non-concurrent execution!
http://docs.aws.amazon.com/lambda/latest/dg/with-scheduled-events.html
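A function warmed this way may want to recognize the scheduled ping and return early instead of doing real work. The sketch below assumes the standard scheduled-event shape (`source` of `aws.events`, `detail-type` of `Scheduled Event`); `isWarmupPing` and the return values are hypothetical.

```javascript
// Scheduled events arrive with source "aws.events" and detail-type
// "Scheduled Event"; use that to recognize a warm-up ping.
function isWarmupPing(event) {
  return Boolean(event) &&
    event.source === 'aws.events' &&
    event['detail-type'] === 'Scheduled Event';
}

function handler(event, context, callback) {
  if (isWarmupPing(event)) {
    // Keep the container warm without running the real work.
    return callback(null, 'warmed');
  }
  // Placeholder for the function's real work (hypothetical).
  return callback(null, 'handled');
}

exports.handler = handler;
```

Returning early keeps warm-up invocations short, so they add little to the duration metric.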

Slide 23

Slide 23 text

Sending cold start events to analytics
λ console.log(coldStart) -> Logs -> CloudWatch Log Filter Trigger -> λ -> POST to Event DB (Insights)
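The second function in this pipeline receives log data from the CloudWatch Logs trigger as base64-encoded, gzipped JSON under `event.awslogs.data`. A sketch of the decode-and-filter step follows; the `Cold? true` match string follows the `console.log('Cold? %s', ...)` call from the Cold Start Discovery slide, while the payload field names (`eventType`, `logGroup`, `timestamp`) are assumptions and the actual POST is left out.

```javascript
const zlib = require('zlib');

// Decode the base64 + gzip payload delivered by a CloudWatch Logs trigger.
function decodeLogEvent(event) {
  const compressed = Buffer.from(event.awslogs.data, 'base64');
  return JSON.parse(zlib.gunzipSync(compressed).toString('utf8'));
}

// Turn matching log lines into payloads for an event database.
// The payload field names are hypothetical.
function coldStartEvents(decoded) {
  return decoded.logEvents
    .filter((e) => e.message.includes('Cold? true'))
    .map((e) => ({
      eventType: 'LambdaColdStart',
      logGroup: decoded.logGroup,
      timestamp: e.timestamp,
    }));
}
```

A handler would call `coldStartEvents(decodeLogEvent(event))` and send the resulting array to whichever event store is in use.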

Slide 24

Slide 24 text

Cold Starts Visualized
[Chart: cold starts over time, with spans labeled ~7 hrs and ~8 hrs]
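Given a list of cold-start timestamps, the gaps between them can be computed directly. A small sketch; `coldStartGapsHours` is a hypothetical helper and the sample timestamps are invented for illustration, not measurements from the talk.

```javascript
const HOUR_MS = 3600000;

// Given cold-start timestamps (ms since epoch), return the gaps between
// consecutive cold starts, in hours.
function coldStartGapsHours(timestampsMs) {
  const sorted = [...timestampsMs].sort((a, b) => a - b);
  const gaps = [];
  for (let i = 1; i < sorted.length; i += 1) {
    gaps.push((sorted[i] - sorted[i - 1]) / HOUR_MS);
  }
  return gaps;
}

// Invented timestamps: cold starts at hour 0, hour 7, and hour 15.
console.log(coldStartGapsHours([0, 7 * HOUR_MS, 15 * HOUR_MS])); // [ 7, 8 ]
```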

Slide 25

Slide 25 text

λ Host Uptime Cold starts happen when hosts change! ~8hrs

Slide 26

Slide 26 text

λ Host Subnet Hopping
Host subnets observed: 10.13, 10.12, 10.11, 10.13, 10.12, 10.12, 10.12, 10.11
# of AZs in us-west-2: 3

Slide 27

Slide 27 text

What's the maximum concurrency of your function?
One: a scheduled event will warm it until the host retires.
> 1: more advanced strategy needed*
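For the concurrent case, one possible approach is to fire several invocations at once so they have to be served in parallel. The sketch below shows only the fan-out; `invoke` is an injected, hypothetical function returning a promise (in practice it would wrap an SDK or CLI call), and `fakeInvoke` is a stand-in for demonstration.

```javascript
// Start `count` invocations without waiting for earlier ones to finish,
// then wait for all of them.
function warmConcurrently(invoke, count) {
  const calls = [];
  for (let i = 1; i <= count; i += 1) {
    calls.push(invoke(i));
  }
  return Promise.all(calls);
}

// Stand-in for a real invocation, for demonstration only.
const fakeInvoke = (i) =>
  new Promise((resolve) => setTimeout(() => resolve(`warmed ${i}`), 10));

warmConcurrently(fakeInvoke, 3).then((results) => console.log(results));
```

Whether overlapping invocations actually land on separate warm containers depends on the platform's scheduler, so this is worth verifying with measurements.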

Slide 28

Slide 28 text

"Advanced" Strategy

    for i in `seq 1 $NUM_EXECUTIONS`; do
      echo "[$i] Executing $AWS_LAMBDA_FUNCTION_NAME..."
      aws lambda invoke ...
    done

https://gist.github.com/smithclay/e89dfe35fe2a4938db56bb12df76777c

Slide 29

Slide 29 text

Multiple containers running on a single host to serve parallel requests
Tracking /proc/sys/kernel/random/boot_id and hostname
Confirmed: cold start happens on container init.
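A sketch of how the two identifiers might be tracked from inside a function. It assumes that `boot_id` is shared by containers on the same host while `hostname` differs per container; that mapping, and the helper names `fingerprint` and `whatChanged`, are assumptions for illustration.

```javascript
const fs = require('fs');
const os = require('os');

// Capture boot_id (assumed: per host) and hostname (assumed: per container).
// `readFile` is injectable so the sketch also runs where /proc is absent.
function fingerprint(readFile = fs.readFileSync) {
  let bootId = null;
  try {
    bootId = String(readFile('/proc/sys/kernel/random/boot_id', 'utf8')).trim();
  } catch (err) {
    // Not on Linux, or /proc is not readable.
  }
  return { bootId, hostname: os.hostname() };
}

// Compare fingerprints taken on two different invocations.
function whatChanged(previous, current) {
  return {
    hostChanged: previous.bootId !== current.bootId,
    containerChanged: previous.hostname !== current.hostname,
  };
}

console.log(fingerprint());
```

Logging the fingerprint next to the cold-start flag lets the two be correlated later.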

Slide 30

Slide 30 text

So is this just a container PaaS? Only if your PaaS has...
High-availability/multiple zones
Elastic fleet of compute-optimized VMs
A very good scheduling algorithm
Design (freezing, limits, etc.) for very fast invocation

Slide 31

Slide 31 text

FaaS in Production Reality
λ Dev -> λ Prod: The "learning cliff"
Orchestration (!?), Version/Deploy, Monitoring, Security, Cold Start Mgmt
Great tweet from @mfdii

Slide 32

Slide 32 text

FaaS Isn't a Silver Bullet "I've got a fast, computationally-intensive task that I need to perform occasionally in response to a well-defined event that isn't that sensitive to latency." —The Ideal FaaS Developer // TO DO: measure && share results

Slide 33

Slide 33 text

Thanks. @smithclay New Relic Gluecon 5/24/17