An ARM and a Leg
Or: How I learned to stop worrying and love Graviton
Tomer Gabel
Tel-Aviv, 29 May 2024
Slides 2-3
First things first
1. This isn’t a sales pitch
- I don’t work for Amazon
- I don’t know your situation
2. This isn’t really about Graviton
- arm64 is all the rage
- Many options out there
Photo: Money by 401(K) 2012 (CC)
Slides 4-6
Second things second
1. Hi, I’m Tomer Gabel!
- Engineer, architect, grump
- Freelancer & consultant
2. Helped a large client migrate to Graviton
3. My opinions are my own
Slide 7
Let’s rock & roll
Slide 8
Why bother?
1. arm64 is abuzz but isn’t new
- Old hat for embedded software
- Virtual monopoly in mobile
- New to desktop- and server-class hardware
Photos: Raimond Spekking, Wandelopa, Skitterphoto (CC)
Slide 9
Why bother?
2. arm64-based servers promise better value for money
- Forter runs thousands of nodes
- Cost savings aggregate quickly
Photo: Stack of coins by Jam Willem Doormembal (CC)
Slide 10
Why bother?
3. Main development environment (macOS) is now on arm64, requiring:
- arm64 builds to work locally
- arm64 on server to debug effectively
Photo: M2 Macbook Air Starlight model by KKPCW (CC)
Slides 11-12
Graviton at Forter
“We build systems to protect eCommerce
from fraud and abuse. We take pride in
building the foundations for a safer Internet
at massive scale.” --forter.dev
Slide 13
Graviton at Forter
eCommerce, safer, at scale:
• High reliability, low latency
• Security reigns supreme
• Everything is auditable
• Tightly regulated
• Risk-averse environment
Slide 14
Graviton at Forter
1. Heterogeneous workloads
- Directly on VMs in EC2
- Dockerized on EC2
- Containerized on EKS
2. Polyglot stack
- Python, Node.js, JVM…
Slide 15
This represents the worst-case scenario for migration.
Slide 16
The Two Towers
Virtual Machines (EC2)
• Provisioning (Chef): Forter setup, dependencies
• Base image build (Packer): initial setup, prerequisites
• Image source: Ubuntu 22.04 / CIS-hardened
Docker Containers
• Service layers: app code, glue logic
• Forter base images: customization, “blessed” stacks
• Image source: ubuntu:22.04, alpine:3.8
Slide 17
The first milestone
Bring-up
• Deployment infrastructure
• Compatible base image (Packer)
Provision
• Chef + Ruby gems
• Base recipes, components
Serve
• Bootstrap base images
• Update components as needed
Slide 18
arm64 support in Linux is old hat
• But the ecosystem… ugh
• Trouble vectors include:
- Docker <= 19.x
- Chef cookbooks (docker, lvm)
- Vagrant + AWS provider
- Python 2.x broken on Ubuntu!
- No Node.js <14 builds
There’s a pattern here…
Photo: Gorilla Scratching Head by Eric Kilby (CC)
Slide 19
Extending the build system
1. Custom build system
- Jenkins + Pipeline + plugins
2. Self-hosted runners
- Same base images
- Same provisioning flow
- Same deployment infrastructure
Slides 20-21
Bootstrapping
1. Emulation does work!
- qemu
- binfmt
2. Well, kind of…
- Docker version
- Bugs all the way down
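The qemu + binfmt pairing works because the kernel's binfmt_misc facility inspects the first bytes of every executable and, when they match a registered magic/mask pair, hands the file to an interpreter such as qemu-aarch64 instead of running it natively. A minimal Python sketch of that matching rule follows. The magic and mask values are the ones commonly registered for qemu-aarch64 (an assumption: check /proc/sys/fs/binfmt_misc/ on your own host), and the helper names are hypothetical.

```python
# Sketch of the binfmt_misc matching rule: a file is handed to the registered
# interpreter when its leading bytes equal `magic` on every bit set in `mask`.

# Assumption: magic/mask as commonly registered for qemu-aarch64; verify against
# /proc/sys/fs/binfmt_misc/qemu-aarch64. They describe a 64-bit little-endian
# ELF header whose e_machine is 0xb7 (EM_AARCH64); the mask ignores the OS ABI
# byte and lets e_type be either ET_EXEC (2) or ET_DYN (3).
AARCH64_MAGIC = bytes.fromhex("7f454c460201010000000000000000000200b700")
AARCH64_MASK = bytes.fromhex("ffffffffffffff00fffffffffffffffffeffffff")


def binfmt_matches(header: bytes, magic: bytes, mask: bytes) -> bool:
    """True if `header` equals `magic` on all bits selected by `mask` (offset 0)."""
    if len(header) < len(magic):
        return False
    return all((h ^ m) & k == 0 for h, m, k in zip(header, magic, mask))


def elf_header(e_type: int, e_machine: int) -> bytes:
    """Hypothetical helper: first 20 bytes of a 64-bit little-endian ELF header."""
    # e_ident: magic, ELFCLASS64, ELFDATA2LSB, EV_CURRENT, OS ABI 0, 8 bytes padding
    e_ident = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8)
    return e_ident + e_type.to_bytes(2, "little") + e_machine.to_bytes(2, "little")


if __name__ == "__main__":
    samples = {
        "arm64 executable (ET_EXEC)": elf_header(e_type=2, e_machine=0xB7),
        "arm64 PIE/shared object (ET_DYN)": elf_header(e_type=3, e_machine=0xB7),
        "x86-64 executable": elf_header(e_type=2, e_machine=0x3E),
    }
    for name, header in samples.items():
        print(name, "->", binfmt_matches(header, AARCH64_MAGIC, AARCH64_MASK))
```

Because the redirection happens in the kernel, emulated arm64 binaries look native to everything above them, which is what makes emulation handy for bootstrapping and also why its bugs are hard to spot.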
Slide 22
Extending the build system
Slide 23
1. Bootstrapped build system with x64/arm64 native runners
2. Full stack of arm64 Docker base images
3. Modified Jenkinsfile with support for multiple architectures
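With a native runner per architecture, one common way to publish a single multi-architecture tag is to build and push an architecture-suffixed image on each runner, then stitch them into a manifest list with `docker buildx imagetools create`. The Python sketch below only assembles those command lines. The registry name and the `-amd64`/`-arm64` tag suffixes are hypothetical, and this is not necessarily how the Jenkinsfile described here does it.

```python
import shlex

# Hypothetical convention: each native runner pushes <image>:<tag>-<arch>.
ARCHES: tuple[str, ...] = ("amd64", "arm64")


def per_arch_commands(image: str, tag: str, arch: str) -> list[list[str]]:
    """Commands a native runner for `arch` would run; no emulation is involved,
    because `docker build` targets the runner's own architecture by default."""
    ref = f"{image}:{tag}-{arch}"
    return [
        ["docker", "build", "-t", ref, "."],
        ["docker", "push", ref],
    ]


def manifest_command(image: str, tag: str, arches: tuple[str, ...] = ARCHES) -> list[str]:
    """Final step, run once after every runner has pushed: combine the
    per-architecture images into one multi-architecture tag."""
    sources = [f"{image}:{tag}-{arch}" for arch in arches]
    return ["docker", "buildx", "imagetools", "create", "-t", f"{image}:{tag}", *sources]


if __name__ == "__main__":
    image, tag = "registry.example.com/service", "1.2.3"  # hypothetical values
    for arch in ARCHES:
        for cmd in per_arch_commands(image, tag, arch):
            print(f"[{arch} runner]", shlex.join(cmd))
    print("[final step]", shlex.join(manifest_command(image, tag)))
```

Splitting the build this way keeps emulation out of the regular pipeline entirely; the only cross-architecture step is the manifest stitching, which copies no image layers.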
Slides 24-25
Will it blend?
“… cluster is behaving well with read
latency of P95=P99=1ms …”
vs.
“~20% decrease in supported RPS”
It depends. You knew this was coming.
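“It depends” can be made concrete with back-of-the-envelope arithmetic: what matters is cost per request, i.e. hourly price divided by sustained throughput. The Python sketch below uses placeholder ratios. The 20% lower hourly price is a hypothetical input, not a quoted AWS figure; only the ~20% RPS drop comes from the slide. Substitute your own instance prices and load-test results.

```python
def cost_per_million_requests(hourly_price: float, sustained_rps: float) -> float:
    """Instance cost of serving one million requests at a sustained request rate."""
    return hourly_price / (sustained_rps * 3600) * 1_000_000


def relative_cost(price_ratio: float, rps_ratio: float) -> float:
    """arm64 cost per request relative to x86-64; below 1.0 means arm64 is cheaper.

    price_ratio: arm64 hourly price / x86-64 hourly price
    rps_ratio:   arm64 sustained RPS / x86-64 sustained RPS
    """
    return price_ratio / rps_ratio


if __name__ == "__main__":
    # Hypothetical: 20% cheaper per hour, but 20% fewer requests per second.
    print(relative_cost(price_ratio=0.80, rps_ratio=0.80))  # 1.0: break-even
    # Same hypothetical discount, workload that keeps its full throughput.
    print(relative_cost(price_ratio=0.80, rps_ratio=1.00))  # 0.8: 20% cheaper
```

Under this framing arm64 pays off only when its price ratio is below its throughput ratio, so both quotes on the slide can be true at once: a cluster can behave well and still lose money per request, or vice versa.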
Slide 26
arm64 migration: Key Takeaways
1. Migrating is easier than you think, although:
- Homegrown systems may require delicate surgery
- It forces you to pay technical debt
2. The most useful advice bar none:
- Use emulation for bootstrapping only
- Use uname -m and docker buildx imagetools
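The two commands in the last takeaway answer complementary questions: `uname -m` reports the architecture you are actually running on, while `docker buildx imagetools inspect` shows which platforms an image tag actually provides. The Python sketch below mirrors both checks. `platform.machine()` returns the same string as `uname -m`, and the parser assumes the JSON printed by `docker buildx imagetools inspect --raw` for a multi-platform image (an OCI image index or Docker manifest list). The helper names and the trimmed sample index are illustrative.

```python
import json
import platform

# `uname -m` spellings mapped to Docker/OCI architecture names.
UNAME_TO_DOCKER_ARCH = {
    "x86_64": "amd64",
    "amd64": "amd64",
    "aarch64": "arm64",  # Linux
    "arm64": "arm64",    # macOS on Apple silicon
}


def host_docker_arch() -> str:
    """Docker architecture name for this host; platform.machine() mirrors `uname -m`."""
    machine = platform.machine().lower()
    return UNAME_TO_DOCKER_ARCH.get(machine, machine)


def manifest_platforms(raw_manifest: str) -> set[str]:
    """Platforms listed in an image index / manifest list, e.g. {'linux/arm64/v8'}.

    A single-platform image has no "manifests" list, so this returns an empty set.
    """
    index = json.loads(raw_manifest)
    platforms = set()
    for entry in index.get("manifests", []):
        plat = entry.get("platform")
        # BuildKit attestation entries are tagged os/architecture "unknown".
        if not plat or plat.get("os") == "unknown":
            continue
        name = f"{plat['os']}/{plat['architecture']}"
        if plat.get("variant"):
            name += f"/{plat['variant']}"
        platforms.add(name)
    return platforms


def supports_arch(raw_manifest: str, arch: str) -> bool:
    """True if any listed platform has the given architecture."""
    return any(p.split("/")[1] == arch for p in manifest_platforms(raw_manifest))


# Illustrative, trimmed-down index; real entries also carry mediaType, size, digest.
SAMPLE_INDEX = json.dumps({
    "schemaVersion": 2,
    "manifests": [
        {"platform": {"os": "linux", "architecture": "amd64"}},
        {"platform": {"os": "linux", "architecture": "arm64", "variant": "v8"}},
        {"platform": {"os": "unknown", "architecture": "unknown"}},
    ],
})

if __name__ == "__main__":
    # In practice, feed in the output of `docker buildx imagetools inspect --raw <image>`.
    print("host:", host_docker_arch())
    print("image platforms:", sorted(manifest_platforms(SAMPLE_INDEX)))
    print("runs natively here:", supports_arch(SAMPLE_INDEX, host_docker_arch()))
```

Checking both sides catches the quiet failure mode of a working setup: an image that lacks an arm64 variant still runs on an arm64 host with emulation enabled, just slowly and on the emulated code path.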
Slide 27
tomer@substrate.co.il
@substrate_eng
https://github.com/holograph
Thank you for listening
Questions?
Substrate