svc-hook: hooking system calls on ARM64 by binary rewriting

Akira Moroo

December 18, 2025
Transcript

  1. svc-hook: hooking system calls on ARM64 by binary rewriting

    Akira Moroo (Ricerca Security, Inc.); Hajime Tazaki, Kenichi Yasukata (IIJ Research Laboratory). ACM/IFIP Middleware 2025, Main Conference Paper Slides.
  2. System Call

    • System calls are the primary interface for user-space programs to access OS kernel functionality. [Diagram: user-space program; kernel-space OS subsystem]
  3. System Call Hook

    • System calls are the primary interface for user-space programs to access OS kernel functionality. • System call hooks allow us to intercept a system call. [Diagram: user-space program; System Call Hook; kernel-space OS subsystem]
  4. System Call Hook

    • System calls are the primary interface for user-space programs to access OS kernel functionality. • System call hooks allow us to intercept a system call and redirect execution to a user-defined hook function. [Diagram: user-space program; System Call Hook; user-defined hook function; kernel-space OS subsystem]
  5. Motivating Use Case

    • System call hook mechanisms allow us to transparently apply user-space OS subsystems to existing applications. [Diagram: user-space program; user-space OS subsystem; kernel-space OS subsystem]
  6. Motivating Use Case

    • System call hook mechanisms allow us to transparently apply user-space OS subsystems to existing applications. • Example: a highly performant user-space TCP stack. [Chart: TCP ping-pong performance, throughput in K reqs/sec, Linux TCP stack vs. lwIP on DPDK; lwIP on DPDK is 8.8 times faster.]
  7. Motivating Use Case

    • System call hook mechanisms allow us to transparently apply user-space OS subsystems to existing applications. • A system call hook can transparently glue user-space subsystems to existing applications. [Diagram: user-space program; System Call Hook; user-defined hook function; user-space OS subsystem; kernel-space OS subsystem]
  8. Motivating Use Case

    • System call hook mechanisms allow us to transparently apply user-space OS subsystems to existing applications. • A system call hook can transparently glue user-space subsystems to existing applications. • There are several options for the System Call Hook. [Diagram: user-space program; System Call Hook; user-defined hook function; user-space OS subsystem; kernel-space OS subsystem]
  9. Motivating Use Case

    • System call hook mechanisms allow us to transparently apply user-space OS subsystems to existing applications. • A system call hook can transparently glue user-space subsystems to existing applications. • Categories of system call hook mechanisms: Common Kernel Support; BPF-based Hooks; Non-upstreamed Extensions; Function Call Hooking; Binary Rewriting. [Diagram: user-space program; System Call Hook; user-defined hook function; user-space OS subsystem; kernel-space OS subsystem]
  10. Groups of System Call Hook Mechanisms

    Table 1: Categories of system call hook mechanisms and their properties (§ 2). Columns, in order: Low Performance Overhead, System Call Emulation, Easy-to-Use, Instruction-level Hook. • Common Kernel Support (ptrace): ✗ ✔ ✔ ✔
  11. Groups of System Call Hook Mechanisms

    Table 1: Categories of system call hook mechanisms and their properties (§ 2). Columns, in order: Low Performance Overhead, System Call Emulation, Easy-to-Use, Instruction-level Hook. • Common Kernel Support (ptrace): ✗ ✔ ✔ ✔ (high overhead due to process scheduling between tracer and tracee)
  12. Groups of System Call Hook Mechanisms

    Table 1: Categories of system call hook mechanisms and their properties (§ 2). Columns, in order: Low Performance Overhead, System Call Emulation, Easy-to-Use, Instruction-level Hook. • Common Kernel Support (ptrace): ✗ ✔ ✔ ✔ • BPF-based Hooks: ✔ ✗ ✔ ✔
  13. Groups of System Call Hook Mechanisms

    Table 1: Categories of system call hook mechanisms and their properties (§ 2). Columns, in order: Low Performance Overhead, System Call Emulation, Easy-to-Use, Instruction-level Hook. • Common Kernel Support (ptrace): ✗ ✔ ✔ ✔ • BPF-based Hooks: ✔ ✗ ✔ ✔ (cannot achieve System Call Emulation due to BPF VM restrictions)
  14. Groups of System Call Hook Mechanisms

    Table 1: Categories of system call hook mechanisms and their properties (§ 2). Columns, in order: Low Performance Overhead, System Call Emulation, Easy-to-Use, Instruction-level Hook. • Common Kernel Support (ptrace): ✗ ✔ ✔ ✔ • BPF-based Hooks: ✔ ✗ ✔ ✔ • Non-upstreamed Extensions: ✔ ✔ ✗ ✔
  15. Groups of System Call Hook Mechanisms

    Table 1: Categories of system call hook mechanisms and their properties (§ 2). Columns, in order: Low Performance Overhead, System Call Emulation, Easy-to-Use, Instruction-level Hook. • Common Kernel Support (ptrace): ✗ ✔ ✔ ✔ • BPF-based Hooks: ✔ ✗ ✔ ✔ • Non-upstreamed Extensions: ✔ ✔ ✗ ✔ (modifying kernels or standard libraries)
  16. Groups of System Call Hook Mechanisms

    Table 1: Categories of system call hook mechanisms and their properties (§ 2). Columns, in order: Low Performance Overhead, System Call Emulation, Easy-to-Use, Instruction-level Hook. • Common Kernel Support (ptrace): ✗ ✔ ✔ ✔ • BPF-based Hooks: ✔ ✗ ✔ ✔ • Non-upstreamed Extensions: ✔ ✔ ✗ ✔ (concerns about security, stability, and future maintenance costs)
  17. Groups of System Call Hook Mechanisms

    Table 1: Categories of system call hook mechanisms and their properties (§ 2). Columns, in order: Low Performance Overhead, System Call Emulation, Easy-to-Use, Instruction-level Hook. • Common Kernel Support (ptrace): ✗ ✔ ✔ ✔ • BPF-based Hooks: ✔ ✗ ✔ ✔ • Non-upstreamed Extensions: ✔ ✔ ✗ ✔ • Function Call Hooking (LD_PRELOAD): ✔ ✔ ✔ ✗
  18. Groups of System Call Hook Mechanisms

    Table 1: Categories of system call hook mechanisms and their properties (§ 2). Columns, in order: Low Performance Overhead, System Call Emulation, Easy-to-Use, Instruction-level Hook. • Common Kernel Support (ptrace): ✗ ✔ ✔ ✔ • BPF-based Hooks: ✔ ✗ ✔ ✔ • Non-upstreamed Extensions: ✔ ✔ ✗ ✔ • Function Call Hooking (LD_PRELOAD): ✔ ✔ ✔ ✗ (cannot hook syscalls from unknown/invisible functions)
  19. Groups of System Call Hook Mechanisms

    Table 1: Categories of system call hook mechanisms and their properties (§ 2). Columns, in order: Low Performance Overhead, System Call Emulation, Easy-to-Use, Instruction-level Hook. • Binary Rewriting: ✔ ✔ ✔ ✔ • Common Kernel Support (ptrace): ✗ ✔ ✔ ✔ • BPF-based Hooks: ✔ ✗ ✔ ✔ • Non-upstreamed Extensions: ✔ ✔ ✗ ✔ • Function Call Hooking (LD_PRELOAD): ✔ ✔ ✔ ✗ Binary rewriting has advantages over the other options.
  20. Problem

    Table 1: Categories of system call hook mechanisms and their properties (§ 2). Columns, in order: Low Performance Overhead, System Call Emulation, Easy-to-Use, Instruction-level Hook. • Binary Rewriting: ✔ ✔ ✔ ✔ • Common Kernel Support (ptrace): ✗ ✔ ✔ ✔ • BPF-based Hooks: ✔ ✗ ✔ ✔ • Non-upstreamed Extensions: ✔ ✔ ✗ ✔ • Function Call Hooking (LD_PRELOAD): ✔ ✔ ✔ ✗ Previous binary rewriting mechanisms: Instruction Punning, e9patch, DataHook, lazypoline, X-Containers, zpoline, …
  21. Problem

    Table 1: Categories of system call hook mechanisms and their properties (§ 2). Columns, in order: Low Performance Overhead, System Call Emulation, Easy-to-Use, Instruction-level Hook. • Binary Rewriting: ✔ ✔ ✔ ✔ • Common Kernel Support (ptrace): ✗ ✔ ✔ ✔ • BPF-based Hooks: ✔ ✗ ✔ ✔ • Non-upstreamed Extensions: ✔ ✔ ✗ ✔ • Function Call Hooking (LD_PRELOAD): ✔ ✔ ✔ ✗ Previous binary rewriting mechanisms, designed for x86: Instruction Punning, e9patch, DataHook, lazypoline, X-Containers, zpoline, …
  22. Problem

    Table 1: Categories of system call hook mechanisms and their properties (§ 2). Columns, in order: Low Performance Overhead, System Call Emulation, Easy-to-Use, Instruction-level Hook. • Binary Rewriting: ✔ ✔ ✔ ✔ • Common Kernel Support (ptrace): ✗ ✔ ✔ ✔ • BPF-based Hooks: ✔ ✗ ✔ ✔ • Non-upstreamed Extensions: ✔ ✔ ✗ ✔ • Function Call Hooking (LD_PRELOAD): ✔ ✔ ✔ ✗ Previous binary rewriting mechanisms, designed for x86: Instruction Punning, e9patch, DataHook, lazypoline, X-Containers, zpoline, … For ARM64: HermiTux, ASC-Hook. There are only a few choices for ARM64.
  23. Binary Rewriting Approach

    • On ARM64 CPUs, an svc instruction triggers a system call. [Diagram: virtual memory containing an svc instruction]
  24. Binary Rewriting Approach

    • On ARM64 CPUs, an svc instruction triggers a system call. • svc is encoded as the bytes 0x01 0x00 0x00 0xd4 (when #imm is 0). [Diagram: virtual memory containing an svc instruction]
  25. Binary Rewriting Approach

    • On ARM64 CPUs, an svc instruction triggers a system call. • svc is encoded as the bytes 0x01 0x00 0x00 0xd4 (when #imm is 0). • Our goal: [Diagram: virtual memory containing an svc instruction; user-defined hook function]
  26. Binary Rewriting Approach

    • On ARM64 CPUs, an svc instruction triggers a system call. • svc is encoded as the bytes 0x01 0x00 0x00 0xd4 (when #imm is 0). • Our goal: replace svc with something that jumps to a user-defined hook function. [Diagram: virtual memory with svc replaced by "???", which jumps to the user-defined hook function]
  27. Binary Rewriting Approach

    • On ARM64 CPUs, an svc instruction triggers a system call. • svc is encoded as the bytes 0x01 0x00 0x00 0xd4 (when #imm is 0). • Our goal: replace svc with something that jumps to a user-defined hook function. • Question: what should we put there? [Diagram: virtual memory with svc replaced by "???", which jumps to the user-defined hook function]
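The byte sequence quoted on these slides is the little-endian form of the 32-bit word 0xd4000001. As a minimal sketch (not the svc-hook implementation), the Python below encodes `svc #imm` and scans a buffer for `svc #0`. The helper names `encode_svc` and `find_svc0` are made up for illustration, and the scan assumes the buffer holds only 4-byte-aligned instructions, so data embedded in code could produce false matches.

```python
import struct

# A64 encoding of "svc #0"; the 16-bit immediate occupies bits 20:5.
SVC_BASE = 0xD4000001


def encode_svc(imm16: int = 0) -> bytes:
    """Return the 4 little-endian bytes of `svc #imm16`."""
    if not 0 <= imm16 <= 0xFFFF:
        raise ValueError("svc immediate must fit in 16 bits")
    return struct.pack("<I", SVC_BASE | (imm16 << 5))


def find_svc0(code: bytes) -> list[int]:
    """Offsets of every `svc #0` in a buffer of 4-byte-aligned instructions."""
    target = encode_svc(0)
    return [off for off in range(0, len(code) - 3, 4)
            if code[off:off + 4] == target]


NOP = struct.pack("<I", 0xD503201F)      # A64 nop, used here as filler
sample = NOP + encode_svc(0) + NOP

print(encode_svc(0).hex(" "))            # 01 00 00 d4
print(find_svc0(sample))                 # [4]
```

Because every A64 instruction is 4 bytes, each offset found this way can be overwritten in place by exactly one replacement instruction.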
  28. svc Replacement Primitive

    • Each instruction has a fixed size of 4 bytes. [Diagram: virtual memory containing an svc instruction; user-defined hook function]
  29. svc Replacement Primitive

    • Each instruction has a fixed size of 4 bytes. • -> We can replace svc with any other single instruction. [Diagram: virtual memory containing an svc instruction; user-defined hook function]
  30. Review Existing Methods

    • Each instruction has a fixed size of 4 bytes. • -> We can replace svc with any other single instruction. • Existing methods: #1: bl (HermiTux [VEE '19]); #2: br (ASC-Hook [LCTES '25]). [Diagram: virtual memory with svc replaced by "???"; user-defined hook function]
  31. Review Existing Methods

    • Each instruction has a fixed size of 4 bytes. • -> We can replace svc with any other single instruction. • Existing methods: #1: bl (HermiTux [VEE '19]); #2: br (ASC-Hook [LCTES '25]). [Diagram: virtual memory with svc replaced by "???"; user-defined hook function]
  32. Possible Primitive #1: bl

    • Each instruction has a fixed size of 4 bytes. • -> We can replace svc with any other single instruction. • At first glance, a bl instruction looks like a reasonable option. [Diagram: virtual memory with svc replaced by "bl?"; user-defined hook function]
  33. Possible Primitive #1: bl

    • Each instruction has a fixed size of 4 bytes. • -> We can replace svc with any other single instruction. • At first glance, a bl instruction looks like a reasonable option. • bl is used for function calls. [Diagram: virtual memory with svc replaced by "bl?", which jumps to the user-defined hook function]
  34. Pitfall: bl Breaks Return Address

    • Each instruction has a fixed size of 4 bytes. • -> We can replace svc with any other single instruction. • At first glance, a bl instruction looks like a reasonable option. • bl is used for function calls. • Pitfall: bl saves the return address to the x30 register. [Diagram: virtual memory with svc replaced by "bl?"; user-defined hook function]
  35. Pitfall: bl Breaks Return Address

    • Each instruction has a fixed size of 4 bytes. • -> We can replace svc with any other single instruction. • At first glance, a bl instruction looks like a reasonable option. • bl is used for function calls. • Pitfall: bl saves the return address to the x30 register -> the original return address is lost. [Diagram: virtual memory with svc replaced by "bl?"; user-defined hook function]
  36. Pitfall: bl Breaks Return Address

    • Each instruction has a fixed size of 4 bytes. • -> We can replace svc with any other single instruction. • At first glance, a bl instruction looks like a reasonable option. • bl is used for function calls. • Pitfall: bl saves the return address to the x30 register -> the original return address is lost. [Diagram: "bl?" at address R1; user-defined hook function]
  37. Pitfall: bl Breaks Return Address

    • Each instruction has a fixed size of 4 bytes. • -> We can replace svc with any other single instruction. • At first glance, a bl instruction looks like a reasonable option. • bl is used for function calls. • Pitfall: bl saves the return address to the x30 register -> the original return address is lost. [Diagram: "bl?" at address R1; user-defined hook function] Register state: pc: R1-4; x30: <original return address>.
  38. Pitfall: bl Breaks Return Address

    • Each instruction has a fixed size of 4 bytes. • -> We can replace svc with any other single instruction. • At first glance, a bl instruction looks like a reasonable option. • bl is used for function calls. • Pitfall: bl saves the return address to the x30 register -> the original return address is lost. [Diagram: "bl?" at address R1; user-defined hook function] Register state: pc: R1-4; x30: <original return address>.
  39. Pitfall: bl Breaks Return Address

    • Each instruction has a fixed size of 4 bytes. • -> We can replace svc with any other single instruction. • At first glance, a bl instruction looks like a reasonable option. • bl is used for function calls. • Pitfall: bl saves the return address to the x30 register -> the original return address is lost. [Diagram: "bl?" at address R1; user-defined hook function] Register state: pc: R1; x30: <original return address>.
  40. Pitfall: bl Breaks Return Address

    • Each instruction has a fixed size of 4 bytes. • -> We can replace svc with any other single instruction. • At first glance, a bl instruction looks like a reasonable option. • bl is used for function calls. • Pitfall: bl saves the return address to the x30 register -> the original return address is lost. [Diagram: "bl?" at address R1; user-defined hook function] Register state: pc: R1; x30: R1+4.
  41. Pitfall: bl Breaks Return Address

    • Each instruction has a fixed size of 4 bytes. • -> We can replace svc with any other single instruction. • At first glance, a bl instruction looks like a reasonable option. • bl is used for function calls. • Pitfall: bl saves the return address to the x30 register -> the original return address is lost. [Diagram: "bl?" at address R1; user-defined hook function] Register state: pc: <hook entry>; x30: R1+4.
  42. Pitfall: bl Breaks Return Address

    • Each instruction has a fixed size of 4 bytes. • -> We can replace svc with any other single instruction. • At first glance, a bl instruction looks like a reasonable option. • bl is used for function calls. • Pitfall: bl saves the return address to the x30 register -> the original return address is lost. [Diagram: "bl?" at address R1; user-defined hook function ending in ret] Register state: pc: <hook exit>; x30: R1+4.
  43. Pitfall: bl Breaks Return Address

    • Each instruction has a fixed size of 4 bytes. • -> We can replace svc with any other single instruction. • At first glance, a bl instruction looks like a reasonable option. • bl is used for function calls. • Pitfall: bl saves the return address to the x30 register -> the original return address is lost. [Diagram: "bl?" at address R1; user-defined hook function ending in ret] Register state: pc: R1+4; x30: R1+4.
  44. bl: Return Address Lost

    • Each instruction has a fixed size of 4 bytes. • -> We can replace svc with any other single instruction. • At first glance, a bl instruction looks like a reasonable option. • bl is used for function calls. • Pitfall: bl saves the return address to the x30 register -> the original return address is lost. [Diagram: "bl?" at address R1, followed by the enclosing function's ret; user-defined hook function ending in ret] Register state: pc: <end of the function>; x30: R1+4.
  45. bl: Return Address Lost

    • Each instruction has a fixed size of 4 bytes. • -> We can replace svc with any other single instruction. • At first glance, a bl instruction looks like a reasonable option. • bl is used for function calls. • Pitfall: bl saves the return address to the x30 register -> the original return address is lost. [Diagram: "bl?" at address R1, followed by the enclosing function's ret; user-defined hook function ending in ret] Register state: pc: <end of the function>; x30: R1+4. The function cannot return to its caller.
  46. bl: Return Address Lost

    • Each instruction has a fixed size of 4 bytes. • -> We can replace svc with any other single instruction. • At first glance, a bl instruction looks like a reasonable option. • bl is used for function calls. • Pitfall: bl saves the return address to the x30 register -> the original return address is lost. [Diagram: "bl?" at address R1, followed by the enclosing function's ret; user-defined hook function ending in ret] Register state: pc: <end of the function>; x30: R1+4. The function cannot return to its caller. • HermiTux [VEE '19] uses this approach. • It mitigates the problem by using binary analysis to check whether the loss of the return address is critical. • It has fallbacks such as a trap-based approach. • -> High performance penalty.
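The register walk-through on these slides can be replayed with a toy model. The sketch below is only an illustration: the addresses `R1`, `CALLER`, and `HOOK` are made-up values, and it assumes the enclosing function keeps its return address in x30 and never spills it to the stack (as a small leaf wrapper might).

```python
R1 = 0x1000       # hypothetical address of the replaced svc
CALLER = 0x9000   # hypothetical return address of the enclosing function
HOOK = 0x5000     # hypothetical entry point of the hook function


def final_pc_after_bl() -> int:
    """Follow pc and x30 when the svc at R1 has been replaced with `bl HOOK`."""
    regs = {"pc": R1, "x30": CALLER}

    # bl at R1: x30 <- address of the next instruction, pc <- hook entry.
    regs["x30"] = regs["pc"] + 4
    regs["pc"] = HOOK

    # The hook finishes with ret: pc <- x30, so execution resumes at R1+4.
    regs["pc"] = regs["x30"]

    # The enclosing function reaches its own ret without touching x30,
    # so ret jumps to whatever x30 holds now.
    regs["pc"] = regs["x30"]
    return regs["pc"]


print(hex(final_pc_after_bl()))   # 0x1004, not 0x9000
```

In this model the final `ret` lands on R1+4 again instead of the caller, which matches the "cannot return to the caller" state on the slides.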
  47. Possible Primitive #2: br

    • Another option is the br instruction. [Diagram: virtual memory with svc replaced by "br?"; user-defined hook function]
  48. Possible Primitive #2: br

    • Another option is the br instruction. • br jumps to the address stored in a specified register without saving the return address to x30. [Diagram: virtual memory with svc replaced by "br?"; user-defined hook function]
  49. br x8: zpoline-Like Approach

    • Another option is the br instruction. • br jumps to the address stored in a specified register without saving the return address to x30. • Possible idea: a zpoline-like approach. [Diagram: virtual memory with svc replaced by "br?"; user-defined hook function]
  50. br x8: zpoline-Like Approach

    • Another option is the br instruction. • br jumps to the address stored in a specified register without saving the return address to x30. • Possible idea: a zpoline-like approach. • The x8 register holds the system call number on Linux. [Diagram: an instruction that sets syscall_nr to x8, followed by "br?"; user-defined hook function]
  51. br x8: zpoline-Like Approach

    • Another option is the br instruction. • br jumps to the address stored in a specified register without saving the return address to x30. • Possible idea: a zpoline-like approach. • The x8 register holds the system call number on Linux. • What if we replace svc with 'br x8'? [Diagram: an instruction that sets syscall_nr to x8, followed by "br x8?"; user-defined hook function]
  52. br x8: zpoline-Like Approach

    • Another option is the br instruction. • br jumps to the address stored in a specified register without saving the return address to x30. • Possible idea: a zpoline-like approach. • The x8 register holds the system call number on Linux. • What if we replace svc with 'br x8'? [Diagram: an instruction that sets syscall_nr to x8, followed by "br x8?", which jumps to the address in x8; user-defined hook function]
  53. br x8: zpoline-Like Approach

    • Another option is the br instruction. • br jumps to the address stored in a specified register without saving the return address to x30. • Possible idea: a zpoline-like approach. • The x8 register holds the system call number on Linux. • What if we replace svc with 'br x8'? [Diagram: an instruction that sets syscall_nr to x8, followed by "br x8?", which jumps to the address in x8 near address 0x0; user-defined hook function]
  54. br x8: zpoline-Like Approach

    • Another option is the br instruction. • br jumps to the address stored in a specified register without saving the return address to x30. • Possible idea: a zpoline-like approach. • The x8 register holds the system call number on Linux. • What if we replace svc with 'br x8'? [Diagram: an instruction that sets syscall_nr to x8, followed by "br x8?", which jumps to a "bl hook" placed near address 0x0; user-defined hook function]
  55. Pitfall of br x8: PC Alignment

    • Another option is the br instruction. • br jumps to the address stored in a specified register without saving the return address to x30. • Possible idea: a zpoline-like approach. • The x8 register holds the system call number on Linux. • What if we replace svc with 'br x8'? [Diagram: an instruction that sets syscall_nr to x8, followed by "br x8?", which jumps to the address in x8 near address 0x0; user-defined hook function] The program stops due to a PC misalignment fault.
  56. Pitfall of br x8: PC Alignment

    • Another option is the br instruction. • br jumps to the address stored in a specified register without saving the return address to x30. • Possible idea: a zpoline-like approach. • The x8 register holds the system call number on Linux. • What if we replace svc with 'br x8'? [Diagram: an instruction that sets syscall_nr to x8, followed by "br x8?", which jumps to the address in x8 near address 0x0; user-defined hook function] The program stops due to a PC misalignment fault. The system call number is not always a multiple of 4, so it violates the alignment requirement in most cases.
  57. br x8: Additional Costs

    • Another option is the br instruction. • br jumps to the address stored in a specified register without saving the return address to x30. • Possible idea: a zpoline-like approach. • The x8 register holds the system call number on Linux. • What if we replace svc with 'br x8'? [Diagram: an instruction that sets syscall_nr to x8, followed by "br x8?", which jumps to the address in x8 near address 0x0; user-defined hook function] This requires additional runtime cost to find and replace the instructions that set the system call number in x8 before the svc instruction.
  58. br x8: Additional Costs

    • Another option is the br instruction. • br jumps to the address stored in a specified register without saving the return address to x30. • Possible idea: a zpoline-like approach. • The x8 register holds the system call number on Linux. • What if we replace svc with 'br x8'? [Diagram: an instruction that sets syscall_nr to x8, followed by "br x8?", which jumps to the address in x8 near address 0x0; user-defined hook function] This requires additional runtime cost to find and replace the instructions that set the system call number in x8 before the svc instruction.
  59. br x8: Additional Costs

    • Another option is the br instruction. • br jumps to the address stored in a specified register without saving the return address to x30. • Possible idea: a zpoline-like approach. • The x8 register holds the system call number on Linux. • What if we replace svc with 'br x8'? [Diagram: an instruction that sets (syscall_nr * 4) to x8, followed by "br x8?", which jumps to the address in x8 near address 0x0; user-defined hook function] The instruction that sets the system call number in x8 before svc is rewritten to satisfy PC alignment. This requires additional runtime cost to find and replace the instructions that set the system call number in x8 before the svc instruction.
  60. br x8: Another Issue

    • Another option is the br instruction. • br jumps to the address stored in a specified register without saving the return address to x30. • Possible idea: a zpoline-like approach. • The x8 register holds the system call number on Linux. • What if we replace svc with 'br x8'? [Diagram: an instruction that sets (syscall_nr * 4) to x8, then an instruction that refers to x8, then "br x8?", which jumps to the address in x8 near address 0x0; user-defined hook function] This requires additional runtime cost to find and replace the instructions that set the system call number in x8 before the svc instruction. What if there is an instruction that refers to x8 AFTER x8 has been aligned?
  61. br x8: Another Issue

    • Another option is the br instruction. • br jumps to the address stored in a specified register without saving the return address to x30. • Possible idea: a zpoline-like approach. • The x8 register holds the system call number on Linux. • What if we replace svc with 'br x8'? [Diagram: an instruction that sets (syscall_nr * 4) to x8, then an instruction that refers to x8, then "br x8?", which jumps to the address in x8 near address 0x0; user-defined hook function] This requires additional runtime cost to find and replace the instructions that set the system call number in x8 before the svc instruction. What if there is an instruction that refers to x8 AFTER x8 has been aligned? It may cause undefined behavior.
  62. br x8: Another Issue

    • Another option is the br instruction. • br jumps to the address stored in a specified register without saving the return address to x30. • Possible idea: a zpoline-like approach. • The x8 register holds the system call number on Linux. • What if we replace svc with 'br x8'? [Diagram: an instruction that sets (syscall_nr * 4) to x8, then an instruction that refers to x8, then "br x8?", which jumps to the address in x8 near address 0x0; user-defined hook function] This requires additional runtime cost to find and replace the instructions that set the system call number in x8 before the svc instruction. What if there is an instruction that refers to x8 AFTER x8 has been aligned? It may cause undefined behavior. • ASC-Hook [LCTES '25] uses this approach. • Cons: additional run-time costs; may cause unexpected behavior. • -> It suffers from these drawbacks.
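The alignment pitfall is easy to check numerically. The sketch below uses a handful of Linux ARM64 system call numbers (openat 56, close 57, read 63, write 64, exit 93) as sample input; the helper `is_valid_branch_target` is a hypothetical name, and the `nr * 4` scaling mirrors the "sets (syscall_nr * 4) to x8" workaround shown on the slides.

```python
# A few Linux ARM64 system call numbers, used only as sample input.
SYSCALLS = {"openat": 56, "close": 57, "read": 63, "write": 64, "exit": 93}


def is_valid_branch_target(addr: int) -> bool:
    """A64 instructions are 4 bytes, so a branch target must be 4-byte aligned."""
    return addr % 4 == 0


# `br x8` with x8 == syscall number: many targets are misaligned.
misaligned = sorted(name for name, nr in SYSCALLS.items()
                    if not is_valid_branch_target(nr))
print(misaligned)   # ['close', 'exit', 'read']

# Workaround from the slides: make x8 hold syscall_nr * 4 instead.
scaled_ok = all(is_valid_branch_target(nr * 4) for nr in SYSCALLS.values())
print(scaled_ok)    # True
```

Scaling fixes the alignment, but x8 then no longer holds the real system call number, which is why an instruction that reads x8 after the rewritten one can misbehave.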
  63. Existing Binary Rewriting Methods

    • Existing methods: • #1: bl (HermiTux [VEE '19]). Cons: loses the original return address. • #2: br (ASC-Hook [LCTES '25]). Cons: additional runtime costs and possible undefined behavior. • Any other options? [Diagram: virtual memory with svc replaced by "???"; user-defined hook function]
  64. Contribution

    • svc-hook: a system call hook mechanism for ARM64 CPUs • based on binary rewriting • free from the limitations of the prior proposals • substantially lower performance overhead
  65. svc-hook Primitive: b

    • svc-hook replaces svc with a b instruction. [Diagram: "b #imm" at address R1; user-defined hook function]
  66. svc-hook Primitive: b

    • svc-hook replaces svc with a b instruction. • The b instruction jumps to an offset within ±128 MiB without saving the return address. [Diagram: "b #imm" at address R1; user-defined hook function]
  67. svc-hook Primitive: b

    • svc-hook replaces svc with a b instruction. • The b instruction jumps to an offset within ±128 MiB without saving the return address. [Diagram: "b #imm" at address R1; user-defined hook function]
  68. svc-hook Primitive: b

    • svc-hook replaces svc with a b instruction. • The b instruction jumps to an offset within ±128 MiB without saving the return address. • -> This means it does not suffer from the bl-related issue of overwriting x30. [Diagram: "b #imm" at address R1 jumps to the user-defined hook function]
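The ±128 MiB limit follows from the A64 encoding of `b`: a 26-bit signed immediate counted in 4-byte units. The sketch below is a minimal encoder under that encoding; the function name `encode_b` and the sample addresses are illustrative and are not taken from svc-hook.

```python
import struct

B_OPCODE = 0x14000000          # A64 unconditional branch `b`, imm26 in bits 25:0
B_RANGE = 1 << 27              # 128 MiB: 26-bit signed immediate times 4 bytes


def encode_b(src: int, dst: int) -> bytes:
    """Encode `b` placed at address src that jumps to address dst."""
    offset = dst - src
    if offset % 4 != 0:
        raise ValueError("branch offset must be a multiple of 4")
    if not -B_RANGE <= offset < B_RANGE:
        raise ValueError("target is outside the +/-128 MiB range of b")
    imm26 = (offset >> 2) & 0x03FFFFFF     # two's complement, 26 bits
    return struct.pack("<I", B_OPCODE | imm26)


print(encode_b(0x1000, 0x1008).hex(" "))   # 02 00 00 14  (forward by 8 bytes)
print(encode_b(0x1008, 0x1000).hex(" "))   # fe ff ff 17  (backward by 8 bytes)
```

Unlike `bl`, this instruction writes nothing to x30, so the only constraint on the jump target is that it lies within the encodable range.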
  69. svc-hook: Why Trampoline?

    • svc-hook replaces svc with a b instruction. • The b instruction jumps to an offset within ±128 MiB without saving the return address. • -> This means it does not suffer from the bl-related issue of overwriting x30. [Diagram: "b #imm" at address R1 jumps to the user-defined hook function] Register state: pc: R1; x30: <original return address>.
  70. svc-hook: Why Trampoline?

    • svc-hook replaces svc with a b instruction. • The b instruction jumps to an offset within ±128 MiB without saving the return address. • -> This means it does not suffer from the bl-related issue of overwriting x30. [Diagram: "b #imm" at address R1 jumps to the user-defined hook function] Register state: pc: <hook entry>; x30: <original return address>.
  71. svc-hook: Why Trampoline?

    • svc-hook replaces svc with a b instruction. • The b instruction jumps to an offset within ±128 MiB without saving the return address. • -> This means it does not suffer from the bl-related issue of overwriting x30. [Diagram: "b #imm" at address R1 jumps to the user-defined hook function] Register state: pc: <end of the function>; x30: <original return address>.
  72. svc-hook: Why Trampoline?

    • svc-hook replaces svc with a b instruction. • The b instruction jumps to an offset within ±128 MiB without saving the return address. • -> This means it does not suffer from the bl-related issue of overwriting x30. [Diagram: "b #imm" at address R1 jumps to the user-defined hook function] Register state: pc: <end of the function>; x30: <original return address>. The hook cannot return to R1+4.
  73. svc-hook: Why Trampoline?

    • svc-hook replaces svc with a b instruction. • The b instruction jumps to an offset within ±128 MiB without saving the return address. • -> This means it does not suffer from the bl-related issue of overwriting x30. [Diagram: "b #imm" at address R1 jumps to the user-defined hook function] Register state: pc: <end of the function>; x30: <original return address>. The hook cannot return to R1+4, so we cannot simply jump straight to the hook function.
  74. svc-hook: Trampoline for Each svc

    • svc-hook replaces svc with a b instruction. • svc-hook instantiates trampoline code for each replaced svc instruction, to embed the return address. [Diagram: "b #imm" at address R1; per-svc trampoline code; user-defined hook function]
  75. svc-hook: Trampoline for Each svc

    • svc-hook replaces svc with a b instruction. • svc-hook instantiates trampoline code for each replaced svc instruction, to embed the return address. • The b jumps to the instantiated trampoline code. [Diagram: "b #imm" at address R1 jumps to the per-svc trampoline code; user-defined hook function]
  76. svc-hook: How Trampoline Works

    • In the trampoline code: [Diagram: "b #imm" at address R1 jumps to the trampoline; user-defined hook function]
  77. svc-hook: How Trampoline Works

    • In the trampoline code: 1) save registers. [Diagram: "b #imm" at address R1 jumps to the trampoline (save regs); user-defined hook function]
  78. svc-hook: How Trampoline Works

    • In the trampoline code: 1) save registers; 2) call the hook function. [Diagram: "b #imm" at address R1 jumps to the trampoline (save regs, call hook func); user-defined hook function]
  79. svc-hook: How Trampoline Works

    • In the trampoline code: 1) save registers; 2) call the hook function; 3) return from the hook function. [Diagram: "b #imm" at address R1 jumps to the trampoline (save regs, call hook func); user-defined hook function]
  80. svc-hook: How Trampoline Works

    • In the trampoline code: 1) save registers; 2) call the hook function; 3) return from the hook function; 4) restore registers. [Diagram: "b #imm" at address R1 jumps to the trampoline (save regs, call hook func, restore regs); user-defined hook function]
  81. svc-hook: How Trampoline Works

    • In the trampoline code: 1) save registers; 2) call the hook function; 3) return from the hook function; 4) restore registers; 5) return to the original control flow. [Diagram: "b #imm" at address R1 jumps to the trampoline (save regs, call hook func, restore regs, jump to R1+4); user-defined hook function]
  82. svc-hook: No Context Lost

    • In the trampoline code: 1) save registers; 2) call the hook function; 3) return from the hook function; 4) restore registers; 5) return to the original control flow. No register information is lost. [Diagram: "b #imm" at address R1; trampoline (save regs, call hook func, restore regs, jump to R1+4); user-defined hook function]
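The five trampoline steps can be sketched as a per-svc code template. The Python below only emits illustrative A64 assembly text; it is not the code svc-hook generates. `trampoline_asm` and the symbol `hook_entry` are hypothetical names, and only x29/x30 are saved for brevity, whereas a real trampoline would have to preserve every register the hook may clobber.

```python
def trampoline_asm(svc_addr: int, hook_symbol: str = "hook_entry") -> list[str]:
    """Illustrative per-svc trampoline for the svc that was located at svc_addr."""
    return_addr = svc_addr + 4           # R1+4, embedded into this trampoline
    return [
        "stp x29, x30, [sp, #-16]!",     # 1) save registers (only x29/x30 shown)
        f"bl {hook_symbol}",             # 2) call the hook; 3) the hook returns here
        "ldp x29, x30, [sp], #16",       # 4) restore registers, including x30
        f"b {return_addr:#x}",           # 5) resume the original control flow
    ]


for line in trampoline_asm(0x1000):
    print(line)
# last line printed: b 0x1004
```

Because each trampoline hard-codes its own R1+4, the hook can be called with `bl` inside the trampoline and x30 is still restored before control returns to the application.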
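  83. Is ±128 MiB enough?

    If there is free space within 128 MiB of the b instruction, svc-hook can instantiate the trampoline there. [Diagram: an allocated region containing "b #imm"; trampoline code placed in free space between -128 MiB and +128 MiB]
  84. Is ±128 MiB enough?

    If the whole range is occupied, we cannot place the trampoline ❌ -> the hook fails. [Diagram: allocated regions covering the entire range from -128 MiB to +128 MiB around "b #imm"; no room for trampoline code]
  85. Is ±128 MiB enough?

    If the whole range is occupied, we cannot place the trampoline ❌ -> the hook fails. Does this situation ever occur in real-world binaries? [Diagram: allocated regions covering the entire range from -128 MiB to +128 MiB around "b #imm"; no room for trampoline code]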
  86. Is ±128 MiB enough? • We have mede a simple

    checker program • reads the ELF header and create memory map 90 … … … ELF memory map
  87. Is ±128 MiB enough? • We have mede a simple

    checker program • reads the ELF header and create memory map • fi nd all svc instructions in the executable regions 91 … … … ELF memory map svc R1
  88. Is ±128 MiB enough? • We have mede a simple

    checker program • reads the ELF header and create memory map • fi nd all svc instructions in the executable regions • for each svc instruction: • checks whether the ±128 MiB range of the svc instruction 92 … … … ELF memory map svc R1
  89. Is ±128 MiB enough? • We have mede a simple

    checker program • reads the ELF header and create memory map • fi nd all svc instructions in the executable regions • for each svc instruction: • checks whether the ±128 MiB range of the svc instruction is entirely used for placing objects of the binary or not 93 … … … ELF memory map svc R1 R1+128 MiB R1-128 MiB Usable memory region for R1 svc trampoline code per-svc trampoline code
  90. Is ±128 MiB enough? • We have mede a simple

    checker program • reads the ELF header and create memory map • fi nd all svc instructions in the executable regions • for each svc instruction: • checks whether the ±128 MiB range of the svc instruction is entirely used for placing objects of the binary or not 94 … … … ELF memory map svc R1 R1+128 MiB R1-128 MiB per-svc trampoline code • Run this program for all Ubuntu 24.04 LTS apt packages
  91. Is ±128 MiB enough? • We have mede a simple

    checker program • reads the ELF header and create memory map • fi nd all svc instructions in the executable regions • for each svc instruction: • checks whether the ±128 MiB range of the svc instruction is entirely used for placing objects of the binary or not 95 … … … ELF memory map svc R1 R1+128 MiB R1-128 MiB per-svc trampoline code Results - Total: 81625 packages - ARM64 Binary: 22441 packages - ELF Binary with svc: 705 packages - svc instructions: 1588 • Run this program for all Ubuntu 24.04 LTS apt packages
  92. Is ±128 MiB enough?

    • We have made a simple checker program that:
      • reads the ELF header and creates a memory map
      • finds all svc instructions in the executable regions
      • for each svc instruction:
        • checks whether the ±128 MiB range of the svc instruction is entirely used for placing objects of the binary or not
    • We ran this program on all Ubuntu 24.04 LTS apt packages
    Results:
      - Total: 81625 packages
      - ARM64 Binary: 22441 packages
      - ELF Binary with svc: 705 packages
      - svc instructions: 1588
      - There is no svc whose ±128 MiB range is entirely used for placing objects of the ELF
    [Figure: ELF memory map; per-svc trampoline code for the svc at R1 placed between R1-128 MiB and R1+128 MiB]
  93. Is ±128 MiB enough?

    • We have made a simple checker program that:
      • reads the ELF header and creates a memory map
      • finds all svc instructions in the executable regions
      • for each svc instruction:
        • checks whether the ±128 MiB range of the svc instruction is entirely used for placing objects of the binary or not
    • We ran this program on all Ubuntu 24.04 LTS apt packages
    Results:
      - Total: 81625 packages
      - ARM64 Binary: 22441 packages
      - ELF Binary with svc: 705 packages
      - svc instructions: 1588
      - There is no svc whose ±128 MiB range is entirely used for placing objects of the ELF
    [Figure 2: CDF (§ 3.3.2); x-axis: Offset [MiB], with marks at 90.2 and 128; y-axis: CDF]
  94. Is ±128 MiB enough?

    • We have made a simple checker program that:
      • reads the ELF header and creates a memory map
      • finds all svc instructions in the executable regions
      • for each svc instruction:
        • checks whether the ±128 MiB range of the svc instruction is entirely used for placing objects of the binary or not
    • We ran this program on all Ubuntu 24.04 LTS apt packages
    Results:
      - Total: 81625 packages
      - ARM64 Binary: 22441 packages
      - ELF Binary with svc: 705 packages
      - svc instructions: 1588
      - There is no svc whose ±128 MiB range is entirely used for placing objects of the ELF
    [Figure 2: CDF (§ 3.3.2) of the distance to the nearest free memory region; x-axis: Offset [MiB], with marks at 90.2 and 128; y-axis: CDF]
  95. Is ±128 MiB enough?

    • We have made a simple checker program that:
      • reads the ELF header and creates a memory map
      • finds all svc instructions in the executable regions
      • for each svc instruction:
        • checks whether the ±128 MiB range of the svc instruction is entirely used for placing objects of the binary or not
    • We ran this program on all Ubuntu 24.04 LTS apt packages
    Results:
      - Total: 81625 packages
      - ARM64 Binary: 22441 packages
      - ELF Binary with svc: 705 packages
      - svc instructions: 1588
      - There is no svc whose ±128 MiB range is entirely used for placing objects of the ELF
    [Figure 2: CDF (§ 3.3.2) of the distance to the nearest free memory region; x-axis: Offset [MiB], with marks at 90.2 and 128; y-axis: CDF; annotation: Maximum]
  96. Is ±128 MiB enough?

    • We have made a simple checker program that:
      • reads the ELF header and creates a memory map
      • finds all svc instructions in the executable regions
      • for each svc instruction:
        • checks whether the ±128 MiB range of the svc instruction is entirely used for placing objects of the binary or not
    • We ran this program on all Ubuntu 24.04 LTS apt packages
    Results:
      - Total: 81625 packages
      - ARM64 Binary: 22441 packages
      - ELF Binary with svc: 705 packages
      - svc instructions: 1588
      - There is no svc whose ±128 MiB range is entirely used for placing objects of the ELF
    [Figure 2: CDF (§ 3.3.2) of the distance to the nearest free memory region; x-axis: Offset [MiB], with marks at 90.2 and 128; y-axis: CDF; annotations: Maximum, 99th percentile 23.4 MiB]
  97. Is ±128 MiB enough?

    • We have made a simple checker program that:
      • reads the ELF header and creates a memory map
      • finds all svc instructions in the executable regions
      • for each svc instruction:
        • checks whether the ±128 MiB range of the svc instruction is entirely used for placing objects of the binary or not
    • We ran this program on all Ubuntu 24.04 LTS apt packages
    Results:
      - Total: 81625 packages
      - ARM64 Binary: 22441 packages
      - ELF Binary with svc: 705 packages
      - svc instructions: 1588
      - There is no svc whose ±128 MiB range is entirely used for placing objects of the ELF
    [Figure 2: CDF (§ 3.3.2); x-axis: Offset [MiB], with marks at 90.2 and 128; y-axis: CDF; annotations: Maximum, 99th percentile 23.4 MiB]
    -> This issue does not significantly limit the applicability
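The checker's core steps can be sketched in Python. This is only an illustrative sketch, not the authors' tool: it assumes the executable bytes and the list of mapped regions have already been extracted from the ELF file, and the names `find_svc_offsets`, `distance_to_free`, and `B_REACH` are hypothetical. The bit pattern is the standard A64 encoding of `svc #imm16`.

```python
import struct

SVC_MASK = 0xFFE0001F        # fixed bits of the A64 svc encoding
SVC_BITS = 0xD4000001        # svc #imm16 with the immediate cleared
B_REACH = 128 * 1024 * 1024  # reach of the A64 b instruction in each direction


def find_svc_offsets(code: bytes) -> list:
    """Return the byte offset of every 4-byte-aligned svc instruction in code."""
    offsets = []
    for off in range(0, len(code) - 3, 4):
        (word,) = struct.unpack_from("<I", code, off)  # A64 instructions are little-endian
        if word & SVC_MASK == SVC_BITS:
            offsets.append(off)
    return offsets


def distance_to_free(addr: int, mapped: list) -> int:
    """Distance in bytes from addr to the nearest address outside all mapped [start, end) regions."""
    merged = []
    for start, end in sorted(mapped):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)  # merge touching or overlapping regions
        else:
            merged.append([start, end])
    for start, end in merged:
        if start <= addr < end:
            return min(addr - start + 1, end - addr)
    return 0  # addr itself is unmapped


# Hypothetical input: nop, svc #0, b ., svc #1
code = struct.pack("<4I", 0xD503201F, 0xD4000001, 0x14000000, 0xD4000021)
svc_offsets = find_svc_offsets(code)
reachable = distance_to_free(0x1800, [(0x1000, 0x2000), (0x2000, 0x3000)]) < B_REACH
```

An svc would be flagged only when `distance_to_free` is at least `B_REACH`, which is the case the slide reports never occurred. The sketch ignores the size of the trampoline itself.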
  98. Evaluation: System Call Hook Overhead

    • Evaluated the run-time overhead of system call hooking
    • Measured the time needed to hook a system call
      • Hook the getpid system call (the simplest system call)
      • Emulate it by returning a dummy process ID
  99. Evaluation: System Call Hook Overhead

    Time to hook the getpid system call
    Table 2: System call hook overhead (§ 4.1).

    Mechanism     Time [ns]
    ptrace            30789
    seccomp            3157
    brk                3045
    svc-hook             28
    LD_PRELOAD           11
  100. Evaluation: System Call Hook Overhead

    Time to hook the getpid system call
    Table 2: System call hook overhead (§ 4.1).

    Mechanism     Time [ns]   Improvement with svc-hook
    ptrace            30789   1100x
    seccomp            3157   113x
    brk                3045   109x
    svc-hook             28
    LD_PRELOAD           11
  101. Evaluation: System Call Hook Overhead

    Time to hook the getpid system call
    Table 2: System call hook overhead (§ 4.1).

    Mechanism     Time [ns]
    ptrace            30789
    seccomp            3157
    brk                3045
    svc-hook             28
    LD_PRELOAD           11

    svc-hook: +17 ns overhead compared to LD_PRELOAD
  102. Evaluation: System Call Hook Overhead

    Time to hook the getpid system call
    Table 2: System call hook overhead (§ 4.1).

    Mechanism     Time [ns]
    ptrace            30789
    seccomp            3157
    brk                3045
    svc-hook             28
    LD_PRELOAD           11

    svc-hook: +17 ns overhead compared to LD_PRELOAD
    -> caused by going through the trampoline code
  103. Evaluation: System Call Hook Overhead

    Time to hook the getpid system call
    Table 2: System call hook overhead (§ 4.1).

    Mechanism     Time [ns]
    ptrace            30789
    seccomp            3157
    brk                3045
    svc-hook             28
    LD_PRELOAD           11

    svc-hook: +17 ns overhead compared to LD_PRELOAD
    -> caused by going through the trampoline code
    The run-time overhead is sufficiently low
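The improvement factors and the 17 ns figure follow directly from the Table 2 numbers. The short Python check below reproduces that arithmetic; the helper names `speedup_over` and `extra_ns` are made up for this illustration.

```python
# Time to hook getpid, in nanoseconds, as listed in Table 2.
TIMES_NS = {
    "ptrace": 30789,
    "seccomp": 3157,
    "brk": 3045,
    "svc-hook": 28,
    "LD_PRELOAD": 11,
}


def speedup_over(baseline: str, target: str = "svc-hook") -> float:
    """How many times faster target is than baseline."""
    return TIMES_NS[baseline] / TIMES_NS[target]


def extra_ns(mechanism: str, reference: str = "LD_PRELOAD") -> int:
    """Additional nanoseconds a mechanism costs compared to the reference."""
    return TIMES_NS[mechanism] - TIMES_NS[reference]


for name in ("ptrace", "seccomp", "brk"):
    print(f"{name}: svc-hook is about {round(speedup_over(name))}x faster")
print(f"svc-hook vs LD_PRELOAD: +{extra_ns('svc-hook')} ns")
```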
  104. Evaluation: Application Performance

    • We transparently applied lwIP + DPDK to applications by using various syscall hook mechanisms
  105. Evaluation: Application Performance

    • We transparently applied lwIP + DPDK to applications by using various syscall hook mechanisms
    [Figure: two ARM64 machines connected by a 10 Gbps link; a simple HTTP server runs on lwIP + DPDK]
  106. Evaluation: Application Performance

    • We transparently applied lwIP + DPDK to applications by using various syscall hook mechanisms
    [Figure: two ARM64 machines connected by a 10 Gbps link; lwIP + DPDK is applied to a simple HTTP server through a syscall hook]
  107. Evaluation: Application Performance

    • We transparently applied lwIP + DPDK to applications by using various syscall hook mechanisms
    [Figure: two ARM64 machines connected by a 10 Gbps link; lwIP + DPDK is applied to a simple HTTP server through a syscall hook (ptrace, seccomp, brk, svc-hook, or LD_PRELOAD); a benchmark client fetches 64 B content]
  108. Evaluation: Application Performance

    [Figure 3: Network server performance (§ 4.2). Panel: Simple HTTP Server Performance; y-axis: Throughput [K reqs/sec], 0 to 500; bars: Linux, ptrace, seccomp, brk, svc-hook, LD_PRELOAD]
  109. Evaluation: Application Performance

    [Figure 3: Network server performance (§ 4.2). Panel: Simple HTTP Server Performance; y-axis: Throughput [K reqs/sec], 0 to 500; bars: Linux, ptrace, seccomp, brk, svc-hook, LD_PRELOAD]
    Annotations compared to LD_PRELOAD (ptrace, seccomp, brk, svc-hook): 93%, 16%, 15%, 2%
  110. Evaluation: Application Performance

    • We transparently applied lwIP + DPDK to applications by using various syscall hook mechanisms
    [Figure: two ARM64 machines connected by a 10 Gbps link; lwIP + DPDK is applied to Redis through a syscall hook (ptrace, seccomp, brk, svc-hook, or LD_PRELOAD); a benchmark client issues GET requests over 16 concurrent TCP connections]
  111. Evaluation: Application Performance

    [Figure 3: Network server performance (§ 4.2). Panel: Redis Performance; y-axis: Throughput [K reqs/sec], 0 to 140; bars: Linux, ptrace, seccomp, brk, svc-hook, LD_PRELOAD]
    Annotations compared to LD_PRELOAD (ptrace, seccomp, brk, svc-hook): 95%, 36%, 34%, 7%
  112. Evaluation: Application Performance

    [Figure: Redis Performance; y-axis: Throughput [K reqs/sec], 0 to 140; bars: Linux, ptrace, seccomp, brk, svc-hook, LD_PRELOAD]
    Annotations compared to LD_PRELOAD (ptrace, seccomp, brk, svc-hook): 95%, 36%, 34%, 7%
    svc-hook is efficient enough to preserve the performance benefits of user-space OS subsystems
  113. Summary

    • svc-hook: a system call hook mechanism for ARM64 CPUs
      • based on binary rewriting
      • replaces svc with a b instruction
      • instantiates dedicated trampoline code for each replaced svc
    • The ±128 MiB jump offset of b does not limit the applicability
    • Efficient enough to maintain the performance merit of user-space OS subsystems
    • svc-hook is open source: https://github.com/retrage/svc-hook
    • Supported platforms: Linux (Android), FreeBSD, NetBSD
  114. Appendix: Table of Contents (Supplemental)

    A. svc-hook Setup Procedure
    B. Applicable OS Environment
    C. Other CPU Architectures
    D. Memory Footprint of the Trampoline Code
    E. Comparison with ASC-Hook
    F. Performance Overhead for Other Workloads
      A. SQLite
      B. PostgreSQL
      C. Samba
    The 6-page paper is too short to present all of our work!
  115. svc-hook Setup Procedure

    • Uses the LD_PRELOAD trick introduced by zpoline [ATC '23]
      • Employed by almost all run-time binary rewriting syscall hooks
    1. Run LIBSVCHOOK=hook.so LD_PRELOAD=libsvchook.so <target>
    2. libsvchook.so has a function that is called before the target's main function:
      1. Scan the memory regions to find svc instructions
      2. Set up the trampoline code
      3. Rewrite each svc
      4. Load the hook function specified with the LIBSVCHOOK environment variable
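For scripted experiments, the launch command in step 1 can be wrapped in a small helper. The Python sketch below only assembles the environment; the function name `build_hook_env` is hypothetical, and the file names `hook.so` and `libsvchook.so` are the ones shown on the slide. Prepending to an existing LD_PRELOAD value is a design choice of this sketch, not something the slide prescribes.

```python
import os


def build_hook_env(hook_lib, loader_lib="libsvchook.so", base_env=None):
    """Return a copy of the environment with LIBSVCHOOK and LD_PRELOAD set."""
    env = dict(os.environ if base_env is None else base_env)
    env["LIBSVCHOOK"] = hook_lib
    existing = env.get("LD_PRELOAD")
    # Keep any library that was already preloaded, but put the loader first.
    env["LD_PRELOAD"] = loader_lib if not existing else f"{loader_lib}:{existing}"
    return env


env = build_hook_env("hook.so", base_env={})
# To launch a target with the hook installed (not executed here):
#   import subprocess
#   subprocess.run(["<target>"], env=build_hook_env("hook.so"))
```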
  116. Applicable OS Environment

    • Linux (Android), FreeBSD, and NetBSD are supported
    • OpenBSD and macOS: not applicable due to restrictions on mprotect
      • OpenBSD:
        • mimmutable(2): prohibits further manipulation of page mappings
        • The libc calls this syscall once a program is loaded
      • macOS:
        • Denies R-X -> RWX memory region attribute changes
    • These limitations are common to all run-time binary rewriting approaches
  117. Other CPU Architectures

    • ARM64 b instruction: jump with an immediate value
    • x86: not applicable
      • syscall/sysenter are 2 bytes long: too small to be replaced with a jump/call instruction
    • RISC-V: may be applicable, but less flexible
      • j imm is equivalent to the ARM64 b instruction
      • Offset range of ±1 MiB: narrower than ARM64
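The two offset ranges follow from the immediate widths of the instructions. A64 `b` carries a signed 26-bit immediate counted in 4-byte units, and the RISC-V jump immediate is 20 bits wide counted in 2-byte units. The Python sketch below derives both ranges and encodes a `b` instruction; `branch_reach` and `encode_b` are names invented for this illustration and are not taken from the svc-hook code base.

```python
B_OPCODE = 0x14000000  # A64 unconditional branch (b) with a zero immediate


def branch_reach(imm_bits: int, scale: int) -> int:
    """Reach in bytes, per direction, of a signed immediate of imm_bits bits in units of scale bytes."""
    return (1 << (imm_bits - 1)) * scale


ARM64_B_REACH = branch_reach(26, 4)    # 128 MiB
RISCV_JAL_REACH = branch_reach(20, 2)  # 1 MiB


def encode_b(src: int, dst: int) -> int:
    """Encode an A64 b instruction located at src that jumps to dst."""
    offset = dst - src
    if offset % 4 != 0:
        raise ValueError("branch target must be 4-byte aligned relative to the source")
    if not -ARM64_B_REACH <= offset < ARM64_B_REACH:
        raise ValueError("target is outside the +/-128 MiB range of b")
    return B_OPCODE | ((offset >> 2) & 0x03FFFFFF)


print(hex(encode_b(0x1000, 0x1008)))
print(ARM64_B_REACH // (1024 * 1024), RISCV_JAL_REACH // (1024 * 1024))
```

A target outside the range raises `ValueError`, which is the situation the ±128 MiB check in the main talk is meant to rule out.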
  118. Memory Footprint of Trampoline

    [Figure: trampoline breakdown. Concept: save regs, call hook func, restore regs, jump to original CF. Implementation: save regs, call hook func, restore regs, call syscall table, jump to original CF, divided into a Common part and an Uncommon part]
    Misc:
      - Only one common part: 428 bytes
      - The uncommon part is instantiated for each svc: 28 bytes
  119. Memory Footprint of Trampoline

    [Figure: trampoline implementation (save regs, call hook func, restore regs, call syscall table, jump to original CF), divided into a Common part and an Uncommon part]
    Misc:
      - Only one common part: 428 bytes
      - The uncommon part is instantiated for each svc: 28 bytes
    • Measured the footprint
      • Target: int main() {}
      • 728 svc instructions

    Part        Size [bytes]
    Common               428
    Uncommon           20384
    Total              20812
  120. Memory Footprint of Trampoline

    [Figure: trampoline implementation (save regs, call hook func, restore regs, call syscall table, jump to original CF), divided into a Common part and an Uncommon part]
    Misc:
      - Only one common part: 428 bytes
      - The uncommon part is instantiated for each svc: 28 bytes
    • Measured the footprint
      • Target: int main() {}
      • 728 svc instructions

    Part        Size [bytes]
    Common               428
    Uncommon           20384
    Total              20812

    Results: 428 bytes + 728 svc * 28 bytes = 20,812 bytes, or about 5.08 pages with a 4 KiB page size
    The memory footprint is acceptably small
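The footprint arithmetic can be reproduced with a few lines of Python. The constants come from the slide; the function name `trampoline_footprint` is only illustrative.

```python
COMMON_BYTES = 428   # shared part, instantiated once
PER_SVC_BYTES = 28   # uncommon part, instantiated for each svc
PAGE_BYTES = 4096    # 4 KiB page


def trampoline_footprint(n_svc: int) -> int:
    """Total trampoline size in bytes for n_svc rewritten svc instructions."""
    return COMMON_BYTES + n_svc * PER_SVC_BYTES


total = trampoline_footprint(728)
pages = total / PAGE_BYTES
print(total, round(pages, 2))
```

Because the per-svc part dominates, the footprint grows linearly with the number of svc instructions found in the process.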
  121. Comparison w/ ASC-Hook: Trampoline Overhead

    Time to hook the getpid system call

    Mechanism     Time [ns]   Overhead vs. LD_PRELOAD
    ASC-Hook             39   +36 ns
    svc-hook             14   +11 ns
    LD_PRELOAD            3

    -> svc-hook has less overhead
  122. Comparison w/ ASC-Hook: Startup Time

    • Measured the time to finish installing a hook into the ls command binary
    • ASC-Hook requires disassembly
      • To find syscall assignment instructions
    • svc-hook only needs to find svc instructions
      • Simple pattern matching works well

    Time to install a hook

    Mechanism     Time [ms]
    ASC-Hook           1890
    svc-hook             25

    -> svc-hook is 75x faster
  123. Performance Overhead: SQLite

    Workload                  w/o [µs]   w/ [µs]   Overhead
    Sequential Fill (Write)     22.971    23.572      2.62%
    Random Fill (Write)         39.996    40.253      0.64%
    Sequential Read              8.321     8.498      2.13%
    Random Read                 12.572    12.574      0.02%

    • one million records
    • key size: 16 bytes
    • value size: 100 bytes
    • WAL mode
  124. Performance Overhead: PostgreSQL

    Workload   w/o [TPS]   w/ [TPS]   Overhead
    pgbench        10923      10790      1.22%

    • PostgreSQL 17.6
    • Ran the benchmark locally
      • 12/16 cores: DB
      • 4/16 cores: pgbench
    • 36 clients in total
    • TPS: Transactions Per Second
  125. Performance Overhead: Samba

    Workload         w/o [MB/s]   w/ [MB/s]   Overhead
    get (download)          647         641      0.93%
    put (upload)            858         857      0.12%

    • Samba 4.15
    • Transferred a 16 GB file locally
    • Measured the throughput
  126. Performance Overhead Summary

    • SQLite: 0.02 ~ 2.62 %
    • PostgreSQL: 1.22 %
    • Samba: 0.12 ~ 0.93 %
    • -> svc-hook does not significantly degrade application performance
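The overhead percentages in the appendix tables can be recomputed from the raw "w/o" and "w/" columns. The Python sketch below assumes the SQLite figures are latencies (lower is better) and the PostgreSQL and Samba figures are throughputs (higher is better); the helper names are made up for this check.

```python
def latency_overhead(without: float, with_hook: float) -> float:
    """Percentage increase in time when the hook is installed."""
    return (with_hook - without) / without * 100


def throughput_overhead(without: float, with_hook: float) -> float:
    """Percentage decrease in throughput when the hook is installed."""
    return (without - with_hook) / without * 100


sqlite = {
    "Sequential Fill (Write)": latency_overhead(22.971, 23.572),
    "Random Fill (Write)": latency_overhead(39.996, 40.253),
    "Sequential Read": latency_overhead(8.321, 8.498),
    "Random Read": latency_overhead(12.572, 12.574),
}
pgbench = throughput_overhead(10923, 10790)
samba = {
    "get": throughput_overhead(647, 641),
    "put": throughput_overhead(858, 857),
}

for name, value in sqlite.items():
    print(f"SQLite {name}: {value:.2f}%")
print(f"pgbench: {pgbench:.2f}%")
```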