Slide 1

Beyond Portability: Live Migration for Evolving WebAssembly Workloads
Japan Community Day at KubeCon + CloudNativeCon Japan 2025
Yuki Nakata (SAKURA internet Inc. / Future University Hakodate)
Daigo Fujii (Future University Hakodate)

Slide 2

Self Introduction
Yuki Nakata
● Researcher at SAKURA internet Inc.
● Ph.D. student at Future University Hakodate
● X: @chiku_wait
Daigo Fujii
● Master's student at Future University Hakodate
● X: @fun_7776
Our Interests: OS, Virtualization, and Wasm
Current Research Topic: Live Migration for Wasm

Slide 3

Cross-platform Portability of Wasm
● Write once, run anywhere
[Figure: applications are compiled to Wasm, and the same Wasm app runs on any computing platform]

Slide 4

Diversity of Wasm Runtimes
● 100+ runtimes with different characteristics [1]
● Wasmtime
○ High performance with JIT compilation
● WasmEdge
○ Rich extensions for AI/LLM workloads
● WAMR
○ Low memory usage for embedded systems
[1] Yixuan Zhang, Mugeng Liu, Haoyu Wang, Yun Ma, Gang Huang, and Xuanzhe Liu. 2025. Research on WebAssembly Runtimes: A Survey. ACM Trans. Softw. Eng. Methodol. Just Accepted (January 2025). https://doi.org/10.1145/3714465

Slide 5

Wasm Meets Edge Computing
● Distributed heterogeneous computing
○ Different CPU architectures, OSs, and platform characteristics
● Wasm enables easy deployment of apps to various platforms
○ Run apps efficiently with the most suitable runtime for each platform
[Figure: Wasm apps deployed on nodes running Wasmtime and WAMR]

Slide 6

Edge Computing with Live Migration
● Move apps between machines while maintaining their execution state
[Figure: Offload heavy tasks: a running function f(x) is moved to another machine]

Slide 7

Edge Computing with Live Migration
● Move apps between machines while maintaining their execution state
[Figure: (1) Offload heavy tasks: a running f(x) is moved to another machine; (2) Task handoff that follows user mobility: f(x) moves between machines as the user moves]

Slide 8

Goal: Live Migration Among Heterogeneous Runtimes
● Wasm gains mobility as well as portability
○ Deploy running apps/tasks closest to users
● Switch runtimes and platforms according to tasks and requirements
○ Change the platform to suit the app's processing context
[Figure: a Wasm app migrating between Wasmtime and WAMR]

Slide 9

Challenges: Execution State Depends on Runtime Implementations
● Differences in execution state implementation
○ The Wasm VMs of Runtime A and Runtime B both have a frame stack, a value stack, linear memory, and a program counter, but their internals differ (≠)
○ Value stack width: Runtime A uses 32-bit slots (e.g., 0xFFFFFFFF), Runtime B uses 64-bit slots (e.g., 0xFFFFFFFFFFFFFFFF)
○ Frame contents are runtime-specific (Locals:…, module:…, Internal:…)

Slide 10

Challenges: Execution State Depends on Runtime Implementations
● Differences in execution state implementation between Runtime A (32-bit value stack) and Runtime B (64-bit value stack)
● JIT/AOT compilation moves the execution state outside the VM
○ The stack and program counter are no longer held by the Wasm VM but by the native (OS-level) stack and program counter

Slide 11

Two Different Approaches
1. Convert execution states between runtimes
○ Designed for major interpreter runtimes
○ Between WAMR, WasmEdge, and Wasm3
2. Self-hosted runtime for runtime-neutral checkpointing/restoring (C/R)
○ Designed for runtimes with JIT/AOT compilation enabled

Slide 12

1. Convert Execution States Between Runtimes
Daigo Fujii

Slide 13

Overview
● Convert the execution state across different runtimes
[Figure: each runtime (e.g., WAMR and Wasm3) holds Wasm bytecode, linear memory, a value stack, a PC, and a C/R mechanism; value stack widths (32-bit vs. 64-bit) and PC values differ between runtimes]

Slide 14

The Execution State within the Wasm VM
● Defined by the Wasm spec
1. Module instance
○ Memory instance: linear memory contents (e.g., 010101 111101 011011 …)
○ Global instance: global values (e.g., $g1 = 100, $g2 = 3.14, …)
2. Program counter: the current position in the instruction sequence (e.g., PC = 1002 among instructions 1001, 1002, 1003, …)
3. Value stack
○ Functions' local values
○ Intermediate values (e.g., 128, 100100, 3.14, …)
● The memory instance, program counter, and value stack change during execution
○ They need to be checkpointed and restored
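The state listed above can be captured as plain data. The following is a minimal Rust sketch, not the authors' implementation. It assumes a simplified VM whose globals and value-stack entries are stored as raw 64-bit bit patterns. The names `ExecState`, `checkpoint`, and `restore`, as well as the length-prefixed byte format, are hypothetical.

```rust
/// Hypothetical snapshot of the parts of a Wasm VM's state that change during execution.
#[derive(Debug, Clone, PartialEq)]
pub struct ExecState {
    pub memory: Vec<u8>,       // memory instance: linear memory contents
    pub globals: Vec<u64>,     // global instance: raw bit patterns (f64 via to_bits)
    pub pc: u32,               // program counter
    pub value_stack: Vec<u64>, // locals and intermediate values, one raw cell each
}

fn put_u32(out: &mut Vec<u8>, v: u32) {
    out.extend_from_slice(&v.to_le_bytes());
}

fn put_u64s(out: &mut Vec<u8>, vs: &[u64]) {
    put_u32(out, vs.len() as u32); // length prefix
    for v in vs {
        out.extend_from_slice(&v.to_le_bytes());
    }
}

/// Cursor over a checkpoint image; every read returns None if the image is too short.
struct Reader<'a> {
    buf: &'a [u8],
    pos: usize,
}

impl<'a> Reader<'a> {
    fn bytes(&mut self, n: usize) -> Option<&'a [u8]> {
        let end = self.pos.checked_add(n)?;
        let s = self.buf.get(self.pos..end)?;
        self.pos = end;
        Some(s)
    }
    fn u32(&mut self) -> Option<u32> {
        let b = self.bytes(4)?;
        Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
    }
    fn u64s(&mut self) -> Option<Vec<u64>> {
        let n = self.u32()? as usize;
        let mut v = Vec::new();
        for _ in 0..n {
            let mut a = [0u8; 8];
            a.copy_from_slice(self.bytes(8)?);
            v.push(u64::from_le_bytes(a));
        }
        Some(v)
    }
}

impl ExecState {
    /// Checkpoint: serialize into a length-prefixed little-endian byte image.
    pub fn checkpoint(&self) -> Vec<u8> {
        let mut out = Vec::new();
        put_u32(&mut out, self.memory.len() as u32);
        out.extend_from_slice(&self.memory);
        put_u64s(&mut out, &self.globals);
        put_u32(&mut out, self.pc);
        put_u64s(&mut out, &self.value_stack);
        out
    }

    /// Restore: rebuild the state, rejecting truncated or padded images.
    pub fn restore(img: &[u8]) -> Option<ExecState> {
        let mut r = Reader { buf: img, pos: 0 };
        let mem_len = r.u32()? as usize;
        let memory = r.bytes(mem_len)?.to_vec();
        let globals = r.u64s()?;
        let pc = r.u32()?;
        let value_stack = r.u64s()?;
        if r.pos != img.len() {
            return None;
        }
        Some(ExecState { memory, globals, pc, value_stack })
    }
}
```

Consider a state built from the slide's example values ($g1 = 100, $g2 = 3.14, PC 1002). Restoring it from its own checkpoint image yields an equal state, and a truncated image is rejected. A cross-runtime converter must additionally translate each field between runtime-specific layouts, which is what the following challenges address.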

Slide 15

Technical Challenges
● The execution state is implemented differently in each runtime ❌
1. Program counter (PC) counting
○ WAMR/Wasm3 count the PC over raw bytecode bytes (0x41 (i32.const) 0x0a, 0x41 (i32.const) 0x14, 0x6a (i32.add), 0x21 (local.set) 0x00), whereas WasmEdge counts decoded instruction objects, each holding an opcode and its operands
2. Memory layout of the value stack
○ WasmEdge keeps each value in a zero-padded 64-bit slot, whereas WAMR/Wasm3 pack values into 32-bit cells
3. Optimized custom instructions
○ Wasm3's optimization compiles $func_add (i32.const 10, i32.const 20, i32.const 30, i32.add, i32.add) into two instructions: i32.add 20 30 and i32.add 10 top

Slide 16

3-A: Optimization Makes the PC Differ between Runtimes
● Wasm3's optimization converts stack-push instructions into immediate values of the instructions that consume them
○ WAMR: i32.const 10, i32.const 20, i32.const 30, i32.add, i32.add in $func_add (8 bytes of bytecode)
○ Wasm3: i32.add 20 30, then i32.add 10 top (2 instructions)
● The omitted i32.const instructions have no counterpart in Wasm3, so a checkpointed PC that points to one of them cannot be restored as is

Slide 17

3-B: Optimization Makes Value Stacks Differ Among Runtimes
● Wasm3's optimization reverses the evaluation order of the immediates: 20 + 30 is computed first, and 10 is only used later as an immediate of the second i32.add
● As a result, the stack contents may differ between WAMR/WasmEdge and Wasm3 at a checkpoint
○ WAMR/WasmEdge stack: 10, 20+30
○ Wasm3 stack: 20+30

Slide 18

Solution 1: Resolving Differences in PC Counting
● Link WAMR-based PC counting to each opcode in WasmEdge
○ For the bytecode 0x41 (i32.const) 0x0a, 0x41 (i32.const) 0x14, 0x6a (i32.add), 0x21 (local.set) 0x00, the WasmEdge instructions are annotated with PC counts 1, 3, 5, and 6
● Calculate the counts at Wasm code load time, so no additional cost is incurred at run time
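The load-time mapping can be sketched as follows. This minimal Rust illustration assumes a decoder that handles only the opcodes in the slide's example (`i32.const`, `i32.add`, and the `local.*` family). It also assumes that WAMR's PC is represented as the 1-based byte position used in the figure. `DecodedInstr`, `decode_with_pc`, and `index_of_wamr_pc` are hypothetical names, and neither runtime's real data structures are used.

```rust
/// One decoded instruction, WasmEdge-style, annotated with its WAMR-style position.
#[derive(Debug, Clone, PartialEq)]
pub struct DecodedInstr {
    pub opcode: u8,
    pub imm: Option<i64>,
    pub wamr_pc: usize, // 1-based byte position of the opcode, as numbered on the slide
}

/// Read a LEB128 immediate starting at `*pos`; sign-extends when `signed` is true.
fn read_leb(code: &[u8], pos: &mut usize, signed: bool) -> Option<i64> {
    let mut result: i64 = 0;
    let mut shift = 0u32;
    loop {
        let byte = *code.get(*pos)?;
        *pos += 1;
        result |= i64::from(byte & 0x7f) << shift;
        shift += 7;
        if (byte & 0x80) == 0 {
            if signed && shift < 64 && (byte & 0x40) != 0 {
                result |= -1i64 << shift; // sign-extend negative values
            }
            return Some(result);
        }
        if shift >= 64 {
            return None; // over-long encoding
        }
    }
}

/// Decode once at load time, recording each instruction's WAMR-style PC.
pub fn decode_with_pc(code: &[u8]) -> Option<Vec<DecodedInstr>> {
    let mut out = Vec::new();
    let mut pos = 0;
    while pos < code.len() {
        let wamr_pc = pos + 1;
        let opcode = code[pos];
        pos += 1;
        let imm = match opcode {
            0x41 => Some(read_leb(code, &mut pos, true)?),                // i32.const
            0x20 | 0x21 | 0x22 => Some(read_leb(code, &mut pos, false)?), // local.get/set/tee
            0x6a => None,                                                 // i32.add
            _ => return None, // opcodes outside this sketch
        };
        out.push(DecodedInstr { opcode, imm, wamr_pc });
    }
    Some(out)
}

/// Restore side: find the decoded instruction that a WAMR PC refers to.
pub fn index_of_wamr_pc(instrs: &[DecodedInstr], pc: usize) -> Option<usize> {
    instrs.iter().position(|d| d.wamr_pc == pc)
}
```

For the bytes `0x41 0x0a 0x41 0x14 0x6a 0x21 0x00`, the decoded instructions carry PCs 1, 3, 5, and 6, matching the slide. The positions are computed once during decoding, so translating a PC at checkpoint or restore time does not require re-decoding the bytecode.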

Slide 19

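Solution 2: Using a Type Stack to Resolve Value Stack Layout Differences
● Introduce a type stack that records the type of each value (e.g., I32, I64, I32, …)
○ The types let the converter discern the boundaries between values
● Convert between the WasmEdge and WAMR layouts, removing zero padding when moving to WAMR
[Figure: WasmEdge slots 0010, 12345678, 0101 with types I32, I64, I32 are converted into WAMR cells, where the I64 value occupies two cells]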
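A type stack makes the slot-to-cell conversion mechanical. The sketch below assumes that WasmEdge keeps each value zero-padded in one 64-bit slot. It also assumes that WAMR stores a 64-bit value in two consecutive 32-bit cells, low half first, which is the natural order on a little-endian host. These layouts and the function names are assumptions for illustration, not either runtime's actual internals.

```rust
/// Value types recorded on the type stack.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ValType { I32, I64, F32, F64 }

/// WasmEdge-style slots (one zero-padded u64 per value) -> WAMR-style 32-bit cells.
pub fn slots_to_cells(slots: &[u64], types: &[ValType]) -> Option<Vec<u32>> {
    if slots.len() != types.len() {
        return None;
    }
    let mut cells = Vec::new();
    for (&v, &t) in slots.iter().zip(types) {
        match t {
            // 32-bit values: drop the upper 32 bits of zero padding
            ValType::I32 | ValType::F32 => cells.push(v as u32),
            // 64-bit values: split into two cells, low half first (assumed layout)
            ValType::I64 | ValType::F64 => {
                cells.push(v as u32);
                cells.push((v >> 32) as u32);
            }
        }
    }
    Some(cells)
}

/// WAMR-style cells -> WasmEdge-style slots; the type stack tells where each value ends.
pub fn cells_to_slots(cells: &[u32], types: &[ValType]) -> Option<Vec<u64>> {
    let mut slots = Vec::new();
    let mut i = 0;
    for &t in types {
        match t {
            ValType::I32 | ValType::F32 => {
                slots.push(u64::from(*cells.get(i)?));
                i += 1;
            }
            ValType::I64 | ValType::F64 => {
                let lo = u64::from(*cells.get(i)?);
                let hi = u64::from(*cells.get(i + 1)?);
                slots.push(lo | (hi << 32));
                i += 2;
            }
        }
    }
    if i != cells.len() {
        return None; // the type stack does not cover the whole cell stack
    }
    Some(slots)
}
```

Without the type stack, the cells produced from (I32, I64, I32) are indistinguishable from four I32 values. The value boundaries could then not be recovered when converting back.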

Slide 20

Solution 3-A: Mapping Omitted Instructions to Restore PC Correspondence
● Map omitted instructions to the next instructions that consume their values
● Their values are already embedded in those instructions
● This allows semantically correct restoration even when a checkpoint's PC points to a skipped instruction
[Figure: WAMR's bytecode for i32.const 10, i32.const 20, i32.const 30, i32.add, i32.add next to Wasm3's i32.add 20 30 and i32.add 10 top; each omitted i32.const is mapped to the i32.add that consumes its value]
i32.const 20 and i32.const 30 are skipped as execution resumes at i32.add.

Slide 21

Solution 3-A: Mapping Omitted Instructions to Restore PC Correspondence
● i32.const 20 and i32.const 30 are skipped as execution resumes at i32.add, but their values appear as that instruction's operands (i32.add 20 30), so nothing is lost
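The mapping for omitted instructions can be modeled on the slide's $func_add example. This Rust sketch is a simplified model, not Wasm3's actual compiler. It folds every `i32.const` into the `i32.add` that consumes it, producing `i32.add 20 30` and `i32.add 10 top`. For each original instruction, it records the fused operation that executes that instruction or embeds its value. `Instr`, `Operand`, `AddOp`, `fuse`, and `run` are hypothetical names.

```rust
/// Simplified instructions from the slide's $func_add example.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Instr { I32Const(i32), I32Add }

/// Operand of a fused add: a constant embedded in the instruction, or the runtime stack top.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Operand { Imm(i32), Top }

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct AddOp { pub lhs: Operand, pub rhs: Operand }

/// Compile-time view of a stack entry: a not-yet-emitted constant, or a runtime value.
enum Pending { Const(i32, usize), Runtime }

/// Fold constants into their consuming adds. Returns the fused ops and, for every original
/// instruction index, the index of the fused op that executes it or embeds its value.
pub fn fuse(code: &[Instr]) -> Option<(Vec<AddOp>, Vec<usize>)> {
    let mut ops: Vec<AddOp> = Vec::new();
    let mut map = vec![usize::MAX; code.len()];
    let mut pending: Vec<Pending> = Vec::new();
    for (i, ins) in code.iter().enumerate() {
        match *ins {
            Instr::I32Const(v) => pending.push(Pending::Const(v, i)),
            Instr::I32Add => {
                let op_idx = ops.len();
                let rhs = pending.pop()?;
                let lhs = pending.pop()?;
                // An omitted const is mapped to the op that consumes its value
                let mut to_operand = |p: Pending| match p {
                    Pending::Const(v, src) => {
                        map[src] = op_idx;
                        Operand::Imm(v)
                    }
                    Pending::Runtime => Operand::Top,
                };
                let (lhs, rhs) = (to_operand(lhs), to_operand(rhs));
                ops.push(AddOp { lhs, rhs });
                map[i] = op_idx;
                pending.push(Pending::Runtime);
            }
        }
    }
    // A constant still pending here would need its own push op; not modeled in this sketch
    if pending.iter().any(|p| matches!(p, Pending::Const(..))) {
        return None;
    }
    Some((ops, map))
}

/// Execute fused ops on a runtime stack to check that the result is unchanged.
pub fn run(ops: &[AddOp]) -> Vec<i32> {
    let mut stack: Vec<i32> = Vec::new();
    for op in ops {
        let mut read = |o: Operand| match o {
            Operand::Imm(v) => v,
            Operand::Top => stack.pop().expect("stack underflow"),
        };
        let rhs = read(op.rhs); // the right operand is on top, so it is popped first
        let lhs = read(op.lhs);
        stack.push(lhs.wrapping_add(rhs));
    }
    stack
}
```

For $func_add the map is [1, 0, 0, 0, 1] (0-based). `i32.const 20` and `i32.const 30` map to the first fused add, and `i32.const 10` maps to the second. A PC pointing at any omitted constant therefore resolves to the instruction that consumes its value. `run` returns [60], the same result as the unoptimized sequence.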

Slide 22

Solution 3-B: Using an Instruction Stack to Restore Values of Skipped Instructions
● Introduce an instruction stack: a stack that tracks which instruction produced each value on the value stack
● Wasm3 uses it to identify which omitted instruction generated each value and reconstructs the stack accordingly
[Figure: the WAMR/WasmEdge value stack holds 10 and 20+30, while the Wasm3 stack holds only 20+30; the instruction stack (i32.const, i32.add, …) shows that the missing value 10 was produced by i32.const, so it is filled in]
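The instruction stack can be sketched in the same simplified setting. The Rust code below is a hypothetical illustration that assumes every `i32.const` is folded into its consumer, as in $func_add. `producers` plays the role of the instruction stack: it holds the index of the original instruction that pushed each value. `expand_stack` and `fold_stack` are illustrative names.

```rust
/// Simplified instructions from the slide's $func_add example.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Instr { I32Const(i32), I32Add }

/// Optimized -> full: refill values of omitted consts from their immediates.
/// `producers` is the instruction stack: the original instruction index that pushed
/// each entry of the full value stack, bottom to top.
/// Assumption: every i32.const was folded away by the optimizing runtime.
pub fn expand_stack(code: &[Instr], producers: &[usize], folded: &[i32]) -> Option<Vec<i32>> {
    let mut rest = folded.iter();
    let mut full = Vec::with_capacity(producers.len());
    for &p in producers {
        match *code.get(p)? {
            // omitted instruction: its value comes from the immediate
            Instr::I32Const(v) => full.push(v),
            // computed at run time: take it from the optimized runtime's stack
            Instr::I32Add => full.push(*rest.next()?),
        }
    }
    if rest.next().is_some() {
        return None; // folded stack has values the instruction stack does not account for
    }
    Some(full)
}

/// Full -> optimized: drop the values that the optimized code keeps as immediates.
pub fn fold_stack(code: &[Instr], producers: &[usize], full: &[i32]) -> Option<Vec<i32>> {
    if producers.len() != full.len() {
        return None;
    }
    let mut folded = Vec::new();
    for (&p, &v) in producers.iter().zip(full) {
        if !matches!(code.get(p)?, Instr::I32Const(_)) {
            folded.push(v);
        }
    }
    Some(folded)
}
```

Take a checkpoint after the first `i32.add`. The WAMR/WasmEdge stack is then [10, 50] with producers [0, 3]. `fold_stack` yields Wasm3's [50], and `expand_stack` goes back by refilling the 10 from the `i32.const 10` immediate.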

Slide 23

2. Self-hosted Runtime for Runtime-Neutral Checkpointing/Restoring
Yuki Nakata

Slide 24

Overview
● A Wasm runtime with a C/R mechanism, itself compiled to Wasm
○ Executes app bytecode via the self-hosted runtime
○ No need to modify host Wasm runtimes
● Neutral to host runtime JIT/AOT optimization
○ The self-hosted runtime manages the app's execution state (stacks, linear memory, program counter) for C/R
[Figure: Wasm bytecode runs on the self-hosted runtime, which runs on any host runtime; the C/R mechanism manages the execution state held by the self-hosted runtime]

Slide 25

Technical Challenge: Overhead of the Self-hosted Runtime [2]
Duplicate Sandbox Mechanisms
● Sandboxes protect the execution environment from malicious programs
● They have a known runtime performance overhead
○ With self-hosting, both the host runtime sandbox and the self-hosted runtime sandbox perform validation and boundary checks, so this work is duplicated
Explosion in the Number of Instructions
● Total instructions in the benchmark
○ Wasm3: 400,849,582,525
○ Self-hosted Wasm3 on Wasm3: 318,276,978,517,504 (794x)
● Wasm3: interpreter-based OSS runtime with self-hosting support
[2] Y. Nakata and K. Matsubara, "Poster: Feasibility of Runtime-Neutral Wasm Instrumentation for Edge-Cloud Workload Handover", pp. 528–530, Dec. 2024. https://doi.org/10.1109/sec62691.2024.00068

Slide 26

Strategies
● Implement an original runtime designed for self-hosting
● Optimize the self-hosted runtime:
1. Reduce sandbox mechanisms
2. Reduce Wasm instructions
3. Offload hotspots to the host runtime

Slide 27

Reduce Sandbox Mechanisms
● Remove sandboxing in the self-hosted runtime
● Maintain isolation with the host runtime sandbox
○ The self-hosted runtime's instructions still execute within the host sandbox
[Figure: the app runs on the self-hosted runtime inside the host runtime sandbox; the self-hosted runtime's own sandbox is removed and only the host sandbox is used]

Slide 28

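Reduce Wasm Instructions
● A normal self-hosted interpreter converts a single Wasm instruction into multiple instructions
● Implement instruction handlers using inline Wasm
○ Pass the instruction through to the host runtime, which processes it directly
Normal handler:
  fn f64_nearest(…) -> … {
      let x = value_stack.pop();
      let y = x.fract();
      // Wasm's f64.nearest rounds ties to even, so ties are resolved on x / 2
      let result = if y == 0.5 || y == -0.5 { 2.0 * (x / 2.0).round() } else { x.round() };
      …
  }
Handler with inline Wasm:
  fn f64_nearest(…) -> … {
      let x = value_stack.pop();
      let result: f64;
      // inline asm for wasm32 is unstable (nightly, asm_experimental_arch)
      unsafe {
          asm!(
              "local.get {0}",
              "f64.nearest",   // a single host instruction does the rounding
              "local.set {1}",
              in(local) x,
              out(local) result,
          );
      }
      …
  }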

Slide 29

Offload Hotspots to Host Runtime
● Detect hotspots using a tracing mechanism in the self-hosted runtime
○ Find frequently executed functions and blocks
● Prohibit C/R while offloaded code runs on the host
○ The execution state of offloaded tasks lives in the host runtime, outside the self-hosted runtime's control, so it cannot be checkpointed
[Figure: the tracing mechanism in the self-hosted Wasm runtime finds hot functions f(x) in the Wasm bytecode and offloads them to the host runtime]
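The tracing and the C/R restriction can be combined in one small structure. In this hypothetical Rust sketch, hotness is approximated by a per-function call counter with an assumed fixed threshold. A depth counter tracks whether offloaded code is currently running on the host. `HotspotTracer` and its methods are illustrative names, and block-level tracing is omitted.

```rust
use std::collections::HashMap;

/// Hypothetical tracer: counts calls per function index and gates checkpointing.
pub struct HotspotTracer {
    threshold: u64,           // assumed call count at which a function counts as hot
    counts: HashMap<u32, u64>,
    offloaded_depth: usize,   // > 0 while execution is inside host-executed code
}

impl HotspotTracer {
    pub fn new(threshold: u64) -> Self {
        Self { threshold, counts: HashMap::new(), offloaded_depth: 0 }
    }

    /// Record one call; returns true once the function is hot enough to offload.
    pub fn on_call(&mut self, func: u32) -> bool {
        let c = self.counts.entry(func).or_insert(0);
        *c += 1;
        *c >= self.threshold
    }

    pub fn enter_host(&mut self) {
        self.offloaded_depth += 1;
    }

    pub fn leave_host(&mut self) {
        self.offloaded_depth = self.offloaded_depth.saturating_sub(1);
    }

    /// C/R is only allowed when no offloaded code is running, because that code's
    /// execution state lives in the host runtime, not in the self-hosted runtime.
    pub fn can_checkpoint(&self) -> bool {
        self.offloaded_depth == 0
    }
}
```

A runtime built this way would consult `can_checkpoint` before taking a snapshot. While an offloaded call is in progress, it would defer the checkpoint until that call returns.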

Slide 30

Demonstration with the Approach 1 Implementation

Slide 31

Sample: SQLite Stateful App in Wasm

Slide 32

Demo: Migrate the Running App on WAMR to WasmEdge

Slide 33

How Does Live Migration Evolve Apps?

Slide 34

Combine the Advantages of Multiple Runtimes to Run an App
● Wasm module initialization performance varies by runtime
● Switch the runtime between initialization and instruction execution
○ Faster than running the app on a single runtime

Slide 35

Handle Resource Exhaustion with Runtime Switching
● Switch runtimes based on the load of the node hosting Wasm apps
○ Low load: high-throughput runtime
○ High load: low-memory-consumption runtime
[Figure: differences in memory usage between runtimes; under high memory load, apps running on Wasmtime are switched to WAMR]
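A switching policy like this can be expressed as a small decision function. This Rust sketch rests on several assumptions not stated in the slides: it takes the node's memory usage as a ratio, and the two thresholds and the hysteresis band between them are hypothetical design choices.

```rust
/// Runtimes named on the slide: high throughput vs. low memory consumption.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Runtime { Wasmtime, Wamr }

/// Decide which runtime the app should run on, given the node's memory usage ratio
/// (0.0..=1.0). `high` and `low` are hypothetical thresholds forming a hysteresis band.
pub fn choose_runtime(current: Runtime, mem_usage: f64, high: f64, low: f64) -> Runtime {
    match current {
        // high load: move to the low-memory runtime
        Runtime::Wasmtime if mem_usage >= high => Runtime::Wamr,
        // load has clearly dropped: move back to the high-throughput runtime
        Runtime::Wamr if mem_usage <= low => Runtime::Wasmtime,
        // inside the band: stay put
        other => other,
    }
}
```

The gap between `high` and `low` keeps an app from bouncing between runtimes when the load hovers around a single threshold. Each switch costs a checkpoint, a transfer, and a restore.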

Slide 36

“Hot” Healing and Scaling for Stateful Applications
● Restore the app from a checkpoint when rebooting or scaling it
○ Maintain volatile information (e.g., caches and memory state)
[Figure: the app's saved execution state is used to restore the app]

Slide 37

How Much Migration Improves App Response Performance
[Figure: app response performance after a restart; annotations: "Restarted", "Pain caused by C/R", "Restore the ideal perf. JUST after the HOT restart"]

Slide 38

Current Status & Future Directions

Slide 39

Current Status
● Convert Execution States Between Runtimes
○ ✅ C/R between WAMR and WasmEdge
○ 🚧 Wasm3 support
● Self-hosted Runtime for Runtime-Neutral C/R (PoC: https://github.com/oss-fun/chiwawa)
○ ✅ C/R for Wasm MVP on any runtime (e.g., WAMR, Wasmtime, and WasmEdge)
○ 🚧 WASI preview1 implementation (only fd_write is supported)
○ 🚧 Offload hotspots
● We want to release our code and contribute to the OSS Wasm community
○ C/R between the same runtimes
○ Looking for more practical use cases
We will keep working hard! 😇