Slide 1

Slide 1 text

An ARM and a Leg
Or: How I learned to stop worrying and love Graviton
Tomer Gabel
Tel-Aviv, 29 May 2024

Slide 2

Slide 2 text

First things first
1. This isn’t a sales pitch
   - I don’t work for Amazon
   - I don’t know your situation
Photo: Money by 401(K) 2012 (CC)

Slide 3

Slide 3 text

First things first
1. This isn’t a sales pitch
   - I don’t work for Amazon
   - I don’t know your situation
2. This isn’t really about Graviton
   - arm64 is all the rage
   - Many options out there

Slide 6

Slide 6 text

Second things second
1. Hi, I’m Tomer Gabel!
   - Engineer, architect, grump
   - Freelancer & consultant
2. Helped a large client migrate to Graviton
3. My opinions are my own

Slide 7

Slide 7 text

Let’s rock & roll

Slide 8

Slide 8 text

Why bother?
1. arm64 is abuzz but isn’t new
   - Old hat for embedded software
   - Virtual monopoly in mobile
   - New to desktop- and server-class
Photos: Raimond Spekking, Wandelopa, Skitterphoto (CC)

Slide 9

Slide 9 text

Why bother?
2. arm64-based servers promise better value for money
   - Forter runs thousands of nodes
   - Cost savings aggregate quickly
Photo: Stack of coins by Jam Willem Doormembal (CC)

Slide 10

Slide 10 text

Why bother?
3. Main development environment (macOS) is now on arm64, requiring:
   - arm64 builds to work locally
   - arm64 on server to debug effectively
Photo: M2 Macbook Air Starlight model by KKPCW (CC)

Slide 11

Slide 11 text

Graviton at Forter
“We build systems to protect eCommerce from fraud and abuse. We take pride in building the foundations for a safer Internet at massive scale.”
--forter.dev

Slide 13

Slide 13 text

Graviton at Forter
eCommerce safer at scale
• High reliability, low latency
• Security reigns supreme
• Everything is auditable
• Tightly regulated
• Risk-averse environment

Slide 14

Slide 14 text

Graviton at Forter
1. Heterogeneous workloads
   - Directly on VMs in EC2
   - Dockerized on EC2
   - Containerized on EKS
2. Polyglot stack
   - Python, Node.js, JVM…

Slide 15

Slide 15 text

This represents the worst-case scenario for migration.

Slide 16

Slide 16 text

The Two Towers
Virtual Machines (EC2)
- Provisioning (Chef): Forter setup, Dependencies
- Base image build (Packer): Initial setup, Prerequisites
- Image source: Ubuntu 22.04 / CIS-hardened
Docker Containers
- Service layers: App code, Glue logic
- Forter base images: Customization, “Blessed” stacks
- Image source: ubuntu:22.04, alpine:3.8

Slide 17

Slide 17 text

The first milestone
Bring-up
• Deployment infrastructure
• Compatible base image (Packer)
Provision
• Chef + Ruby gems
• Base recipes, components
Serve
• Bootstrap base images
• Update components as needed

Slide 18

Slide 18 text

arm64 support in Linux is old hat
• But the ecosystem… ugh
• Trouble vectors include:
  - Docker <= 19.x
  - Chef cookbooks (docker, lvm)
  - Vagrant + AWS provider
  - Python 2.x broken on Ubuntu!
  - No Node.js <14 builds
There’s a pattern here…
Photo: Gorilla Scratching Head by Eric Kilby (CC)

Slide 19

Slide 19 text

Extending the build system
1. Custom build system
   - Jenkins + Pipeline + plugins
2. Self-hosted runners
   - Same base images
   - Same provisioning flow
   - Same deployment infrastructure

Slide 21

Slide 21 text

Bootstrapping
1. Emulation does work!
   - qemu
   - binfmt
2. Well, kind of…
   - Docker version
   - Bugs all the way down
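
Emulated arm64 builds on an x64 host depend on a qemu user-mode handler being registered with the kernel's binfmt_misc facility. The sketch below is a minimal check for that registration before a bootstrap build starts. It assumes the handler is registered under the conventional name `qemu-<arch>` (for example `qemu-aarch64`); the actual name depends on how qemu was installed, and the helper function is hypothetical.

```python
from pathlib import Path

# Default mount point of the binfmt_misc pseudo-filesystem on Linux.
BINFMT_DIR = Path("/proc/sys/fs/binfmt_misc")


def qemu_handler_enabled(arch: str, binfmt_dir: Path = BINFMT_DIR) -> bool:
    """Return True if a handler named qemu-<arch> is registered and enabled.

    Assumption: the handler uses the conventional "qemu-<arch>" entry name.
    The first line of a binfmt_misc entry is "enabled" or "disabled".
    """
    entry = binfmt_dir / f"qemu-{arch}"
    try:
        lines = entry.read_text().splitlines()
    except OSError:
        # Missing entry, unmounted binfmt_misc, or a non-Linux host.
        return False
    return bool(lines) and lines[0].strip() == "enabled"


if __name__ == "__main__":
    if qemu_handler_enabled("aarch64"):
        print("arm64 emulation is available for bootstrapping")
    else:
        print("no enabled qemu-aarch64 handler found")
```

A check like this only confirms that emulation is wired up; it says nothing about whether the emulated build will be fast or bug-free.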

Slide 22

Slide 22 text

Extending the build system

Slide 23

Slide 23 text

1. Bootstrapped build system with x64/arm64 native runners
2. Full stack of arm64 Docker base images
3. Modified Jenkinsfile with support for multiple architectures
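
One way to picture a multi-architecture pipeline is as a per-architecture build plan followed by a step that merges the results under a single tag. The sketch below is an illustration only, not the Jenkinsfile from the talk: the runner labels (`linux-amd64`, `linux-arm64`), the `<version>-<arch>` tag scheme, and the helper names are all assumptions. The merge step uses `docker buildx imagetools create`, which combines existing images into one multi-architecture tag.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class ArchBuild:
    arch: str          # Docker architecture name, e.g. "amd64" or "arm64"
    runner_label: str  # hypothetical label selecting a native build runner
    image_tag: str     # per-architecture tag pushed by that runner


def plan_builds(image: str, version: str,
                arches: Sequence[str] = ("amd64", "arm64")) -> List[ArchBuild]:
    """Plan one native build per architecture (tag scheme is an assumption)."""
    return [
        ArchBuild(arch=a, runner_label=f"linux-{a}", image_tag=f"{image}:{version}-{a}")
        for a in arches
    ]


def manifest_command(image: str, version: str,
                     builds: Sequence[ArchBuild]) -> List[str]:
    """Build the command that merges per-arch images under one shared tag."""
    return [
        "docker", "buildx", "imagetools", "create",
        "-t", f"{image}:{version}",
        *[b.image_tag for b in builds],
    ]


if __name__ == "__main__":
    builds = plan_builds("registry.example.com/app", "1.2.3")
    for b in builds:
        print(f"build {b.image_tag} on a runner labelled {b.runner_label}")
    print(" ".join(manifest_command("registry.example.com/app", "1.2.3", builds)))
```

Building each architecture on a native runner and merging afterwards keeps emulation out of the regular build path.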

Slide 25

Slide 25 text

Will it blend?
“… cluster is behaving well with read latency of P95=P99=1ms …”
vs.
“~20% decrease in supported RPS”
It depends. You knew this was coming.
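
Why "it depends" can be made concrete with a little arithmetic: a node that serves fewer requests per second only saves money if its price drops by more than its throughput does. The sketch below computes relative cost per request. The ~20% throughput decrease comes from the quote above and applies to one workload; the price ratios in the usage example are hypothetical and not taken from the talk or from any price list.

```python
def relative_cost_per_request(price_ratio: float, throughput_ratio: float) -> float:
    """Cost per request on arm64 relative to x64.

    price_ratio:      arm64 node price / x64 node price (hypothetical input)
    throughput_ratio: arm64 RPS per node / x64 RPS per node
    A result below 1.0 means arm64 is cheaper per request.
    """
    if throughput_ratio <= 0:
        raise ValueError("throughput_ratio must be positive")
    return price_ratio / throughput_ratio


def break_even_price_ratio(throughput_ratio: float) -> float:
    """Price ratio at which both architectures cost the same per request."""
    if throughput_ratio <= 0:
        raise ValueError("throughput_ratio must be positive")
    return throughput_ratio


if __name__ == "__main__":
    throughput = 0.8  # the "~20% decrease in supported RPS" workload
    for price in (0.7, 0.8, 0.9):  # hypothetical price ratios
        cost = relative_cost_per_request(price, throughput)
        print(f"price ratio {price:.2f} -> relative cost per request {cost:.3f}")
```

For a workload that loses 20% throughput, the arm64 node has to be more than 20% cheaper to come out ahead; a workload with unchanged throughput benefits from any discount at all.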

Slide 26

Slide 26 text

arm64 migration: Key Takeaways
1. Migrating is easier than you think, although:
   - Homegrown systems may require delicate surgery
   - It forces you to pay technical debt
2. The most useful advice bar none:
   - Use emulation for bootstrapping only
   - Use uname -m and docker buildx imagetools
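
The two commands in the last takeaway answer two questions: which architecture am I running on, and which architectures does this image actually provide? The sketch below does the same from Python, under stated assumptions. `platform.machine()` reports the same value as `uname -m`, and `manifest_platforms` parses the JSON printed by `docker buildx imagetools inspect --raw <image>`. The helper names are hypothetical, and the parser assumes the OCI image index layout (`manifests[].platform`).

```python
import json
import platform
from typing import Optional, Set

# Map `uname -m` style names to Docker architecture names.
_MACHINE_TO_DOCKER_ARCH = {
    "x86_64": "amd64",
    "amd64": "amd64",
    "aarch64": "arm64",
    "arm64": "arm64",
}


def docker_arch(machine: Optional[str] = None) -> str:
    """Translate a machine name (default: this host's) to Docker's naming.

    Unknown names are returned lowercased and otherwise unchanged.
    """
    name = (machine or platform.machine()).lower()
    return _MACHINE_TO_DOCKER_ARCH.get(name, name)


def manifest_platforms(raw_manifest: str) -> Set[str]:
    """List "os/arch" pairs found in a raw multi-arch manifest (image index).

    A single-architecture manifest has no "manifests" list and yields an
    empty set. Entries with an "unknown" architecture (assumed here to be
    build attestations) are skipped.
    """
    doc = json.loads(raw_manifest)
    found = set()
    for entry in doc.get("manifests", []):
        plat = entry.get("platform") or {}
        arch, os_name = plat.get("architecture"), plat.get("os")
        if not arch or not os_name or arch == "unknown":
            continue
        found.add(f"{os_name}/{arch}")
    return found


if __name__ == "__main__":
    print(f"this host builds and runs linux/{docker_arch()} images natively")
```

Comparing `docker_arch()` against `manifest_platforms(...)` before deploying is a cheap way to catch an image that was only ever built for the other architecture.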

Slide 27

Slide 27 text

Thank you for listening
Questions?
tomer@substrate.co.il
@substrate_eng
https://github.com/holograph
Substrate