Cryptography on Cortex-M4: Rust + Assembly = 💕

Safe cryptography requires getting a lot of layers right, from an easy-to-use-correctly high-level API, down to the lowest-level manipulation of bits in constant time. With a projected number of over 30 billion active IoT devices for 2020, Rust needs a solid native cryptography story to compete.

Missing so far is public key cryptography on microcontrollers, which at the lowest level depends strongly on the platform's instruction set.

Using our Ed25519 signature library for illustration, this talk

- argues for Arm Cortex-M4 as a minimum viable platform, and
- exemplifies how Rust and Assembly combine high-level design with low-level precision.

Along the way, we'll learn a little bit about multiplying numbers with the obscure UMAAL instruction!


Nicolas Stalder

March 20, 2020


  1. 1.

    Cryptography on Cortex-M4: Rust + Assembly = love
    Nicolas Stalder (GitHub: @nickray, Twitter: @nickraystalder)
    Oxidize 1k, March 20, 2020 (v1)
  2. 2.

    Background / Motivation
    • Nicolas Stalder (@nickray) is a mathematician (arithmetic geometry)
    • SoloKeys is an open source hardware key company (e.g. FIDO2)
    • Just like > 20 billion IoT devices, need:
      • establish secure communication
      • proof of device identity
    • O Cortex-M cryptography, where art thou?
  3. 4.

    Two Types of Crypto (grossly simplified)

    Symmetric (+ Hashes)
    • secret key
    • combinatorics
    • manipulation of 32-bit words
    • RustCrypto! h/t @tarcieri

    Asymmetric
    • public/private keypair
    • arithmetic
    • very large integers
    • platform specific! ycrypto :)
  4. 5.

    Pyramid of Requirements
    [Figure: pyramid with the layers "correct, understandable API", "constant time", "physical attacks" and "...", annotated "table stakes", "can't live without!" and "to at least try hard!"]
  5. 6.

    Thesis
    • Rust is (mostly...) amazing at expressing the mathematical structure in terms of traits
    • With some effort, can trick the compiler into not being too smart (breaking constant time) at the high and intermediate level (subtle, zeroize, ...)
    • At the lowest level (inner loop), need assembly to make optimal use of platform capabilities in constant time. Play a "game of lego".
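To make the second point concrete, here is a minimal hand-rolled sketch of the kind of branch-free helpers that crates such as subtle provide. The names `ct_select` and `ct_eq` are hypothetical and are not the API of any crate named on the slide. The volatile read is one commonly used optimization barrier; the compiler is not formally obliged to keep the code constant time, so this illustrates the idea rather than guaranteeing it.

```rust
/// Select `a` if `choice == 1`, else `b`, without branching on `choice`.
/// `choice` is expected to be 0 or 1 (hypothetical helper, for illustration).
pub fn ct_select(choice: u32, a: u32, b: u32) -> u32 {
    // 0 -> 0x0000_0000, 1 -> 0xffff_ffff
    let mask = 0u32.wrapping_sub(choice & 1);
    b ^ (mask & (a ^ b))
}

/// Compare two 32-byte secrets, touching every byte no matter where the
/// first difference occurs. Returns 1 if equal, 0 otherwise.
pub fn ct_eq(a: &[u8; 32], b: &[u8; 32]) -> u32 {
    let mut diff = 0u8;
    for i in 0..32 {
        // Accumulate all differing bits instead of returning early.
        diff |= a[i] ^ b[i];
    }
    // Assumption: a volatile read discourages the optimizer from turning
    // the accumulation above back into an early-exit comparison.
    let diff = unsafe { core::ptr::read_volatile(&diff) } as u32;
    // `diff` is in 0..=255, so (diff - 1) has its top bit set only for 0.
    (diff.wrapping_sub(1) >> 31) & 1
}
```

The result of `ct_eq` is a 0/1 word rather than a `bool` so that it can feed straight into `ct_select` without inviting a branch.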
  6. 7.

    Illustrative Example
    salty: a library for Ed25519 signatures on Cortex-M4
    • structure from TweetNaCl
    • API from ed25519-dalek
    • field implementation from Björn M. Haase
  7. 8.

    Rust Example #1
    • Not every [u32; 8] array is a valid public key! The point represented needs to be on the curve
    • Attacks possible if not
    • Idiomatic Rust offers TryFrom trait for constructors that can fail
    • API offers only safe ways to construct PublicKey
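The pattern on this slide can be sketched as follows. This is not salty's actual code: `InvalidPublicKey` and `is_canonical` are hypothetical names, and the check shown is only a stand-in (it verifies that the encoded coordinate is below q = 2^255 - 19), whereas a real implementation would also decompress the point and confirm that it lies on the curve.

```rust
use core::convert::TryFrom;

/// q = 2^255 - 19 as eight little-endian 32-bit words.
const Q: [u32; 8] = [
    0xffff_ffed, 0xffff_ffff, 0xffff_ffff, 0xffff_ffff,
    0xffff_ffff, 0xffff_ffff, 0xffff_ffff, 0x7fff_ffff,
];

/// Error returned for rejected encodings (hypothetical name).
#[derive(Debug, PartialEq)]
pub struct InvalidPublicKey;

/// The inner array is not `pub`. When this type lives in its own module,
/// callers can only obtain a `PublicKey` through `TryFrom`.
#[derive(Debug, PartialEq)]
pub struct PublicKey([u32; 8]);

/// Stand-in validity check (assumption): with the sign bit (bit 255)
/// masked off, is the encoded coordinate strictly below q?
/// Public keys are public data, so no constant-time care is taken here.
fn is_canonical(words: &[u32; 8]) -> bool {
    let mut w = *words;
    w[7] &= 0x7fff_ffff;
    // Subtract q word by word; a borrow out of the top word means w < q.
    let mut borrow = 0u64;
    for i in 0..8 {
        let t = (w[i] as u64)
            .wrapping_sub(Q[i] as u64)
            .wrapping_sub(borrow);
        borrow = (t >> 63) & 1;
    }
    borrow == 1
}

impl TryFrom<[u32; 8]> for PublicKey {
    type Error = InvalidPublicKey;

    fn try_from(words: [u32; 8]) -> Result<Self, Self::Error> {
        if is_canonical(&words) {
            Ok(PublicKey(words))
        } else {
            Err(InvalidPublicKey)
        }
    }
}
```

With this shape, `PublicKey::try_from(words)?` is the only entry point, so every function that accepts a `PublicKey` can rely on the validation having happened.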
  8. 9.

    The Maths (grossly simplified)
    • Coordinates of points are integers modulo the prime q = 2^255 - 19
    • Can represent as array [u32; 8] of 32-bit words (like numbers are usually written as array of digits)
    • Addition/Multiplication is word-by-word, with carry, and reduction modulo prime q
    • For example, bit 31 of word 7 (worth 2^255) is replaced with 19, since 2^255 ≡ 19 (mod q)
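The word-by-word arithmetic described above can be sketched in portable Rust. The function names `add_limbs` and `fold_top_bit` are made up for this illustration, and the words are assumed to be stored little-endian (word 0 is least significant). The fold is only a weak reduction: the result is congruent modulo q = 2^255 - 19 but not necessarily the smallest representative.

```rust
/// Add two 256-bit numbers stored as eight little-endian 32-bit words.
/// Returns the low 256 bits of the sum and the carry out (0 or 1).
pub fn add_limbs(a: &[u32; 8], b: &[u32; 8]) -> ([u32; 8], u32) {
    let mut out = [0u32; 8];
    let mut carry = 0u64;
    for i in 0..8 {
        // Each partial sum fits in 33 bits, so u64 never overflows.
        let t = a[i] as u64 + b[i] as u64 + carry;
        out[i] = t as u32;
        carry = t >> 32;
    }
    (out, carry as u32)
}

/// Weak reduction modulo q = 2^255 - 19: clear bit 31 of word 7
/// (worth 2^255) and add 19 times that bit back in at the bottom.
pub fn fold_top_bit(x: &[u32; 8]) -> [u32; 8] {
    let top = x[7] >> 31;
    let mut out = *x;
    out[7] &= 0x7fff_ffff;
    // After masking, out < 2^255, so adding at most 19 cannot overflow.
    let mut carry = (19 * top) as u64;
    for limb in out.iter_mut() {
        let t = *limb as u64 + carry;
        *limb = t as u32;
        carry = t >> 32;
    }
    out
}
```

Note that `fold_top_bit` multiplies by the top bit instead of testing it with an `if`, so the same instructions run whether or not the bit is set.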
  9. 10.

    Rust Example #2
    Express the mathematical expectations on a field implementation
    Implementation flexibility:
    • TweetNaCl uses [i64; 16]
    • Björn Haase uses [u32; 8]
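As a rough illustration of expressing field expectations through a trait, the sketch below defines a hypothetical `FieldElement` trait and implements it for a toy field of integers modulo 8191. Neither the trait nor the `Toy` type is taken from salty. The toy arithmetic uses `%` and is not constant time; the point is only that generic code such as `div` is written once against the trait, while the representation behind it ([i64; 16], [u32; 8], or a single u32 here) can vary.

```rust
use core::ops::{Add, Mul, Sub};

/// What generic curve code may assume about a field (hypothetical trait).
pub trait FieldElement:
    Copy + PartialEq + Add<Output = Self> + Sub<Output = Self> + Mul<Output = Self>
{
    const ZERO: Self;
    const ONE: Self;

    /// Multiplicative inverse; maps ZERO to ZERO by convention.
    fn inverse(self) -> Self;

    fn square(self) -> Self {
        self * self
    }
}

/// Toy prime for illustration: 8191 = 2^13 - 1.
const P: u32 = 8191;

/// Toy field element, always kept in the range 0..P.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Toy(pub u32);

impl Add for Toy {
    type Output = Toy;
    fn add(self, rhs: Toy) -> Toy {
        Toy((self.0 + rhs.0) % P)
    }
}

impl Sub for Toy {
    type Output = Toy;
    fn sub(self, rhs: Toy) -> Toy {
        Toy((self.0 + P - rhs.0) % P)
    }
}

impl Mul for Toy {
    type Output = Toy;
    fn mul(self, rhs: Toy) -> Toy {
        // Both factors are below 2^13, so the product fits in u32.
        Toy((self.0 * rhs.0) % P)
    }
}

impl FieldElement for Toy {
    const ZERO: Self = Toy(0);
    const ONE: Self = Toy(1);

    /// Fermat's little theorem: a^(P-2) is the inverse of a modulo P.
    /// The exponent is public, so branching on its bits leaks nothing.
    fn inverse(self) -> Self {
        let mut result = Self::ONE;
        let mut base = self;
        let mut e = P - 2;
        while e > 0 {
            if e & 1 == 1 {
                result = result * base;
            }
            base = base.square();
            e >>= 1;
        }
        result
    }
}

/// Generic code sees only the trait, not the representation.
pub fn div<F: FieldElement>(a: F, b: F) -> F {
    a * b.inverse()
}
```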
  10. 11.

    Why Cortex-M4? (and above: M33, M35-P, M55, ...)
    [Figure: instruction set overview with the groups "M3", "M4" and "M4 only (DSP)"]
    Source: ARM® Cortex®-M{3,4} Technical Reference Manual
  11. 12.

    The Instructions
    lo, hi, a, b are 32-bit words, all instructions take one cycle only
    • UMULL ( lo, hi, a, b ): Unsigned Long Multiply
      (hi, lo) = a * b
    • UMLAL ( lo, hi, a, b ): Unsigned Long Multiply, with Accumulate
      (hi, lo) += a * b
    • UMAAL ( lo, hi, a, b ): ..., with Accumulate Accumulate :)
      (hi, lo) = lo + hi + a * b
      it fits! The largest possible value, (2^32 - 1)^2 + 2 * (2^32 - 1) = 2^64 - 1, still fits in 64 bits
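The semantics of the three instructions can be modelled with 64-bit arithmetic in portable Rust. The functions below are reference models written for this illustration, not intrinsics from any crate. Each returns the pair (lo, hi), and the models say nothing about cycle counts.

```rust
/// Split a 64-bit value into (lo, hi) 32-bit words.
fn split(t: u64) -> (u32, u32) {
    (t as u32, (t >> 32) as u32)
}

/// UMULL model: (hi, lo) = a * b.
pub fn umull(a: u32, b: u32) -> (u32, u32) {
    split(a as u64 * b as u64)
}

/// UMLAL model: (hi, lo) += a * b, wrapping modulo 2^64.
pub fn umlal(lo: u32, hi: u32, a: u32, b: u32) -> (u32, u32) {
    let acc = ((hi as u64) << 32) | lo as u64;
    split(acc.wrapping_add(a as u64 * b as u64))
}

/// UMAAL model: (hi, lo) = lo + hi + a * b.
/// No wrapping is needed: the maximum is
/// (2^32 - 1)^2 + 2 * (2^32 - 1) = 2^64 - 1.
pub fn umaal(lo: u32, hi: u32, a: u32, b: u32) -> (u32, u32) {
    split(a as u64 * b as u64 + lo as u64 + hi as u64)
}
```

Feeding `u32::MAX` into all four inputs of `umaal` yields `(u32::MAX, u32::MAX)`, which is the "it fits" observation: a product plus two single-word addends never produces a carry out of 64 bits.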
  12. 13.

    UMAAL maybe in Action?
    Could be implemented in many different ways, possibly branching on the (secret) data
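One way to remove the "maybe" is to pin the instruction with inline assembly and keep a portable fallback for other targets. The sketch below is an assumption about how this could look with today's `core::arch::asm!`, not the code from the slide or from salty. The Arm path presumes a target with the DSP extension (for example thumbv7em-none-eabihf) and has not been measured here. `mul_256` is a hypothetical schoolbook multiplication showing why UMAAL fits the inner loop: product, running carry and the existing output word are combined in a single step.

```rust
/// UMAAL pinned via inline assembly (assumes an Arm target with DSP).
#[cfg(target_arch = "arm")]
#[inline(always)]
pub fn umaal(lo: u32, hi: u32, a: u32, b: u32) -> (u32, u32) {
    let (mut lo, mut hi) = (lo, hi);
    unsafe {
        core::arch::asm!(
            "umaal {lo}, {hi}, {a}, {b}",
            lo = inout(reg) lo,
            hi = inout(reg) hi,
            a = in(reg) a,
            b = in(reg) b,
            options(pure, nomem, nostack),
        );
    }
    (lo, hi)
}

/// Portable fallback: same result, but the compiler is free to choose
/// the instructions it emits.
#[cfg(not(target_arch = "arm"))]
#[inline(always)]
pub fn umaal(lo: u32, hi: u32, a: u32, b: u32) -> (u32, u32) {
    // Cannot overflow: the maximum value is exactly 2^64 - 1.
    let t = a as u64 * b as u64 + lo as u64 + hi as u64;
    (t as u32, (t >> 32) as u32)
}

/// Schoolbook 256 x 256 -> 512 bit multiplication on little-endian words.
/// The loop bounds are fixed, so control flow does not depend on the data.
pub fn mul_256(a: &[u32; 8], b: &[u32; 8]) -> [u32; 16] {
    let mut out = [0u32; 16];
    for i in 0..8 {
        let mut carry = 0u32;
        for j in 0..8 {
            // out[i + j] + carry + a[i] * b[j] in a single step.
            let (lo, hi) = umaal(out[i + j], carry, a[i], b[j]);
            out[i + j] = lo;
            carry = hi;
        }
        // Word i + 8 has not been written yet, so the carry is stored as is.
        out[i + 8] = carry;
    }
    out
}
```

Hand-optimized implementations typically unroll these loops and keep operands in registers; the nested loops here are only meant to show the data flow.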
  13. 15.

    Next Steps
    • End-to-end timing and power consumption testing
    • WIP: crypto-service offering compile-time selection of cryptographic algorithms, with encrypted storage
    • RPC API over heapless queues, designed to run in secure TrustZone-M domain. Non-secure domain can only access handles and wrapped key material
    • default implementations (salty, nisty, RustCrypto) for Cortex-M4, configurable and pluggable use of hardware acceleration
    • SoloKeys "model Bee" firmware public soon!
  14. 16.

    Links
    • community for Cortex-M4
    Nicolas Stalder, GitHub @nickray, Twitter @nickraystalder
    Thank you!