Cryptography on Cortex-M4: Rust + Assembly = 💕

Safe cryptography requires getting a lot of layers right, from an easy-to-use-correctly high-level API, down to the lowest-level manipulation of bits in constant time. With a projected number of over 30 billion active IoT devices for 2020, Rust needs a solid native cryptography story to compete.

Missing so far is public key cryptography on microcontrollers, which at the lowest level depends strongly on the platform's instruction set.

Using our Ed25519 signature library for illustration, this talk

- argues for Arm Cortex-M4 as a minimum viable platform, and
- exemplifies how Rust and Assembly combine into a winning mix of high-level design and low-level precision.

Along the way, we'll learn a little bit about multiplying numbers with the obscure UMAAL instruction!

Nicolas Stalder

March 20, 2020


  1. Cryptography on Cortex-M4:
    Rust + Assembly = love
    Nicolas Stalder
    GitHub @nickray, Twitter: @nickraystalder
    [email protected]
    Oxidize 1k
    March 20, 2020 (v1)

  2. Background / Motivation
    • Nicolas Stalder (@nickray) is a mathematician (arithmetic geometry)
    • SoloKeys is an open source hardware key company (e.g. FIDO2)
    • Just like > 20 billion IoT devices, we need to:
    • establish secure communication
    • prove device identity
    • O Cortex-M cryptography, where art thou?

  3. Oh look, it's us!
    really bad SEO?
    lack of interest?

  4. Two Types of Crypto
    (grossly simplified)
    Symmetric (+ Hashes):
    • secret key
    • combinatorics
    • manipulation of 32-bit words
    • RustCrypto! h/t @tarcieri
    Asymmetric:
    • public/private keypair
    • arithmetic
    • very large integers
    • platform specific! ycrypto :)

  5. Pyramid of Requirements
    correct, understandable, API
    constant time
    physical attacks
    table stakes to

    at least try hard!
    can't live without!

  6. Thesis
    • Rust is (mostly...) amazing at expressing the mathematical structure in terms of traits
    • With some effort, can trick the compiler into not being too smart (breaking constant time) at high and intermediate level (subtle, zeroize, ...)
    • At the lowest level (inner loop), need assembly to make optimal use of platform capabilities in constant time. Play a "game of lego".
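
The second bullet can be illustrated with a branch-free selection and comparison. This is a hand-rolled sketch, not how the subtle crate is implemented; the function names `ct_select` and `ct_eq` are hypothetical. Note that the source code being branch-free does not by itself guarantee constant-time machine code, which is why such crates add extra barriers against the optimizer.

```rust
/// Returns `a` if `choice` is 1 and `b` if `choice` is 0,
/// without branching on `choice` (hypothetical helper).
pub fn ct_select(choice: u8, a: u32, b: u32) -> u32 {
    debug_assert!(choice <= 1);
    // 0 -> 0x0000_0000, 1 -> 0xffff_ffff
    let mask = (choice as u32).wrapping_neg();
    (a & mask) | (b & !mask)
}

/// Returns 1 if the two arrays are equal and 0 otherwise.
/// Every byte is always inspected, so there is no early exit.
pub fn ct_eq(a: &[u8; 32], b: &[u8; 32]) -> u8 {
    let mut diff = 0u8;
    for i in 0..32 {
        diff |= a[i] ^ b[i];
    }
    // diff == 0 wraps to 0xffff_ffff (top bit set); any other value does not.
    ((diff as u32).wrapping_sub(1) >> 31) as u8
}
```

Returning a `u8` flag rather than a `bool` is a deliberate choice in this sketch: a `bool` invites the caller (and the compiler) to branch on it.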

  7. Illustrative Example
    salty: a library for Ed25519 signatures on Cortex-M4
    • structure from TweetNaCl
    • API from ed25519-dalek
    • field implementation from Björn M. Haase

  8. Rust Example #1
    • Not every [u32; 8] array is a valid public key! The point represented needs to be on the curve
    • Attacks possible if not
    • Idiomatic Rust offers TryFrom trait for constructors that can fail
    • API offers only safe ways to construct PublicKey
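
The pattern described above might look roughly as follows. To stay self-contained, this sketch uses a toy Edwards curve x² + y² = 1 + d·x²·y² over the small prime 1009 instead of the real Ed25519 curve; the constants `P` and `D`, the error type `InvalidPublicKey`, and the two-word input are all assumptions made for illustration, not salty's actual API.

```rust
use core::convert::TryFrom;

// Toy parameters (assumptions): a small prime field and curve constant.
const P: u64 = 1009;
const D: u64 = 2;

#[derive(Debug, PartialEq)]
pub struct InvalidPublicKey;

/// Fields are private: outside this module, the only way to obtain a
/// `PublicKey` is through the checked `TryFrom` constructor.
#[derive(Debug, PartialEq)]
pub struct PublicKey {
    x: u64,
    y: u64,
}

impl TryFrom<[u32; 2]> for PublicKey {
    type Error = InvalidPublicKey;

    fn try_from(words: [u32; 2]) -> Result<Self, Self::Error> {
        let (x, y) = (words[0] as u64, words[1] as u64);
        // Reject non-canonical coordinates.
        if x >= P || y >= P {
            return Err(InvalidPublicKey);
        }
        let (xx, yy) = (x * x % P, y * y % P);
        // Curve equation: x^2 + y^2 = 1 + d * x^2 * y^2 (mod P).
        // Branching here is fine because public keys are not secret.
        if (xx + yy) % P == (1 + D * xx % P * yy) % P {
            Ok(PublicKey { x, y })
        } else {
            Err(InvalidPublicKey)
        }
    }
}
```

Because the check lives in the only constructor, every function that accepts a `PublicKey` can rely on the point being on the curve without re-validating it.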

  9. The Maths
    (grossly simplified)
    • Coordinates of points are integers modulo the prime q = 2^255 - 19
    • Can represent as array [u32; 8] of 32-bit words (like numbers are usually written as array of digits)
    • Addition/Multiplication is word-by-word, with carry, and reduction modulo prime q
    • For example, bit 31 of word 7 (worth 2^255) is replaced with 19, since 2^255 ≡ 19 (mod q)
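
A minimal sketch of word-by-word addition with this kind of reduction, assuming the Ed25519 prime 2^255 - 19 and little-endian limbs. The name `add_weak` is hypothetical, and the result is only weakly reduced (it fits in 256 bits but is not necessarily the canonical representative); this is written for clarity and is not taken from salty or from Björn Haase's code.

```rust
/// 256-bit value as eight 32-bit words, least significant word first.
type Limbs = [u32; 8];

/// Adds two field elements and folds the overflow back in
/// using 2^255 = 19 (mod 2^255 - 19). Output is weakly reduced.
fn add_weak(a: &Limbs, b: &Limbs) -> Limbs {
    let mut out = [0u32; 8];

    // Pass 1: schoolbook addition, word by word, with carry.
    let mut carry = 0u64;
    for i in 0..8 {
        let t = a[i] as u64 + b[i] as u64 + carry;
        out[i] = t as u32;
        carry = t >> 32;
    }

    // Everything at or above bit 255: the final carry is bit 256,
    // bit 31 of word 7 is bit 255. `top` counts multiples of 2^255.
    let top = (carry << 1) | (out[7] >> 31) as u64;
    out[7] &= 0x7fff_ffff;

    // Pass 2: replace each multiple of 2^255 with 19 and propagate.
    let mut carry = 19 * top;
    for i in 0..8 {
        let t = out[i] as u64 + carry;
        out[i] = t as u32;
        carry = t >> 32;
    }
    out
}
```

Both loops run a fixed number of iterations and never branch on the data, which is the property the lower layers need to preserve.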

  10. Rust Example #2
    Express the expectations on a field implementation
    • TweetNaCl uses [i64; 16]
    • Björn Haase uses [u32; 8]
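
One way such expectations could be written down as a trait is sketched below. The trait shape, the associated type `Limbs`, and the toy single-limb field modulo 1009 are assumptions made so the example compiles on its own; they are not copied from salty.

```rust
use core::ops::{Add, Mul, Sub};

/// Hypothetical trait: what the curve code expects from a field implementation.
pub trait FieldImplementation:
    Copy + PartialEq + Add<Output = Self> + Sub<Output = Self> + Mul<Output = Self>
{
    /// Underlying storage, e.g. [i64; 16] or [u32; 8].
    type Limbs;
    const ZERO: Self;
    const ONE: Self;

    fn squared(self) -> Self {
        self * self
    }
    /// Multiplicative inverse of a non-zero element.
    fn inverse(self) -> Self;
}

/// Toy stand-in: integers modulo the small prime 1009, in a single limb.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Toy(u32);

const P: u32 = 1009;

impl Toy {
    pub fn new(x: u32) -> Self {
        Toy(x % P)
    }
}

impl Add for Toy {
    type Output = Toy;
    fn add(self, rhs: Toy) -> Toy { Toy((self.0 + rhs.0) % P) }
}
impl Sub for Toy {
    type Output = Toy;
    fn sub(self, rhs: Toy) -> Toy { Toy((self.0 + P - rhs.0) % P) }
}
impl Mul for Toy {
    type Output = Toy;
    fn mul(self, rhs: Toy) -> Toy { Toy(self.0 * rhs.0 % P) }
}

impl FieldImplementation for Toy {
    type Limbs = [u32; 1];
    const ZERO: Self = Toy(0);
    const ONE: Self = Toy(1);

    fn inverse(self) -> Self {
        // Fermat's little theorem: x^(P-2) is the inverse of x modulo P.
        // The exponent is public, so branching on its bits leaks nothing.
        let (mut result, mut base, mut e) = (Self::ONE, self, P - 2);
        while e > 0 {
            if e & 1 == 1 {
                result = result * base;
            }
            base = base.squared();
            e >>= 1;
        }
        result
    }
}

/// Generic code only sees the trait, never the limb layout.
pub fn is_inverse_pair<F: FieldImplementation>(a: F, b: F) -> bool {
    a * b == F::ONE
}
```

Code written against the trait, like `is_inverse_pair`, would work unchanged whether the backing field uses [i64; 16] or [u32; 8] limbs.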

  11. Why Cortex-M4?
    (and above: M33, M35-P, M55, ...)
    [Figure: M3 and M4 compared, with the M4-only (DSP) part marked]
    Source: ARM® Cortex®-M{3,4} Technical Reference Manual

  12. The Instructions
    lo, hi, a, b are 32-bit words; each instruction takes only one cycle
    • UMULL ( lo, hi, a, b ): Unsigned Long Multiply
      (hi, lo) = a * b
    • UMLAL ( lo, hi, a, b ): Unsigned Long Multiply, with Accumulate
      (hi, lo) += a * b
    • UMAAL ( lo, hi, a, b ): ..., with Accumulate Accumulate :)
      (hi, lo) = lo + hi + a * b
    it fits! (2^32 - 1)^2 + 2 * (2^32 - 1) = 2^64 - 1
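
The semantics of the three instructions can be modelled in portable Rust with 64-bit arithmetic. This is only a model for reasoning about the arithmetic; it says nothing about which instructions a compiler would emit. The helper `mul_add_row` is a hypothetical example of why two accumulators are handy: one carries the running total, the other the carry from the previous word.

```rust
/// (hi, lo) = a * b. Returns (lo, hi).
fn umull(a: u32, b: u32) -> (u32, u32) {
    let t = (a as u64) * (b as u64);
    (t as u32, (t >> 32) as u32)
}

/// (hi, lo) += a * b, wrapping modulo 2^64. Returns (lo, hi).
fn umlal(lo: u32, hi: u32, a: u32, b: u32) -> (u32, u32) {
    let acc = ((hi as u64) << 32) | (lo as u64);
    let t = acc.wrapping_add((a as u64) * (b as u64));
    (t as u32, (t >> 32) as u32)
}

/// (hi, lo) = lo + hi + a * b. Can never overflow 64 bits. Returns (lo, hi).
fn umaal(lo: u32, hi: u32, a: u32, b: u32) -> (u32, u32) {
    let t = (lo as u64) + (hi as u64) + (a as u64) * (b as u64);
    (t as u32, (t >> 32) as u32)
}

/// Hypothetical helper: acc[0..8] += a * b, with the result stored in
/// all nine words of `acc`. Expects acc[8] == 0 on entry.
fn mul_add_row(acc: &mut [u32; 9], a: &[u32; 8], b: u32) {
    let mut carry = 0u32;
    for i in 0..8 {
        // Running total in `lo`, carry from the previous word in `hi`.
        let (lo, hi) = umaal(acc[i], carry, a[i], b);
        acc[i] = lo;
        carry = hi;
    }
    acc[8] = carry;
}
```

With all four inputs at their maximum, `umaal` returns exactly (0xffff_ffff, 0xffff_ffff), which is the "it fits" observation in code form.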

  13. UMAAL maybe in Action?
    Could be implemented in many different ways, possibly branching on the (secret) data

  14. UMAAL in Action!
    Use case: sum of three words
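
One possible reading of this use case: with b = 1, UMAAL computes lo + hi + a, the 64-bit sum of three 32-bit words, in a single instruction. The sketch below is an assumption about how that could be wired up with Rust inline assembly, not the code from the slide. The `asm!` variant is compiled only for Arm targets and assumes a core with the DSP extension (for example a Cortex-M4); on other targets a portable model stands in so the example still runs.

```rust
/// Inline-assembly version: forces the use of UMAAL instead of leaving
/// the choice to the compiler. Assumes the target core supports UMAAL.
#[cfg(target_arch = "arm")]
fn umaal(mut lo: u32, mut hi: u32, a: u32, b: u32) -> (u32, u32) {
    unsafe {
        core::arch::asm!(
            "umaal {lo}, {hi}, {a}, {b}",
            lo = inout(reg) lo,
            hi = inout(reg) hi,
            a = in(reg) a,
            b = in(reg) b,
            options(pure, nomem, nostack),
        );
    }
    (lo, hi)
}

/// Portable model with the same semantics: (hi, lo) = lo + hi + a * b.
#[cfg(not(target_arch = "arm"))]
fn umaal(lo: u32, hi: u32, a: u32, b: u32) -> (u32, u32) {
    let t = (lo as u64) + (hi as u64) + (a as u64) * (b as u64);
    (t as u32, (t >> 32) as u32)
}

/// Sum of three words: returns (low 32 bits, carry), where carry is 0, 1 or 2.
fn sum_of_three(x: u32, y: u32, z: u32) -> (u32, u32) {
    umaal(x, y, z, 1)
}
```

Keeping the assembly inside one tiny function with a plain Rust signature lets the rest of the code stay in safe Rust, while the inner loop gets exactly the instruction that was intended.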

  15. Next Steps
    • End-to-end timing and power consumption testing
    • WIP: crypto-service offering compile-time selection of cryptographic algorithms, with encrypted storage
    • RPC API over heapless queues, designed to run in the secure TrustZone-M domain. The non-secure domain can only access handles and wrapped key material
    • default implementations (salty, nisty, RustCrypto) for Cortex-M4, with configurable and pluggable use of hardware acceleration
    • SoloKeys "model Bee" firmware public soon!

  16. Links
    • docs.rs/salty
    • github.com/ycrypto: community for Cortex-M4
    • github.com/BjoernMHaase/fe25519
    • github.com/RustCrypto, docs.rs/ed25519-dalek
    • tweetnacl.cr.yp.to/papers.html
    • bearssl.org/constanttime.html
    Nicolas Stalder
    GitHub @nickray, Twitter @nickraystalder
    [email protected]
    Thank you!
