Introduction to Solidity Assembly

This is the talk I gave at Blockchain Developers Meetup #04 in Sofia, Bulgaria.

In it, I cover the basic ways programming languages are built: direct compilation, interpretation, and virtual machines. Then I move on to the EVM and Solidity in Ethereum. Finally, I explore writing assembly in Solidity, with several related code snippets.

The snippets can be found at:
https://pastebin.com/Yw8D6XB2
https://pastebin.com/pzye9yGx
https://pastebin.com/fqMdK0JR
https://pastebin.com/XbjsJWAM
https://pastebin.com/HvWbBQ91

Preslav Mihaylov

November 08, 2018
Transcript

  1. Introduction to Solidity Assembly
    Programming Languages, The EVM, Assembly
    Preslav Mihaylov
    Head of Blockchain Training
    SoftUni
    http://softuni.org/

  2. Table of Contents
    ▪ Programming Language Types
    ▪ The EVM & Stack-based VMs
    ▪ Solidity Assembly
    ▪ Basic operations
    ▪ Instructional & Functional Assembly
    ▪ Memory types in the EVM

  3. Programming Language Types
    Compilation, Interpretation & VMs

  4. Programming Language Types
    ▪ Programming languages can be grouped by the way they are built:
    ▪ Assembly languages – assembly code is translated directly to
    machine code
    ▪ Compiled languages – source code is compiled to assembly/machine code
    ▪ Interpreted languages – source code is interpreted by a
    runtime program
    ▪ Virtual machine based languages – source code is compiled to an
    intermediate language and later interpreted by a VM runtime

  5. Assembly Languages
    ▪ Not programming languages in the usual sense
    ▪ Assembly is a set of mnemonics which map directly to the
    machine code of a specific CPU instruction set
    ▪ The program which translates assembly to machine code is
    called an assembler
    ▪ Every instruction set has its own assembly “language”
    ▪ Popular examples – x86 Assembly, x64 Assembly, ARM
    Assembly

  6. Assembly Languages

  7. Compiled Languages
    ▪ These are usually the fastest-running languages because they
    are translated directly to native machine code
    ▪ Compiled binaries are not portable between different platforms
    ▪ They sometimes lack high-level features such as runtime type
    checking & garbage collection
    ▪ Popular examples – C, C++, Objective-C, Go

  8. Compiled Languages

  9. Interpreted Languages
    ▪ These languages usually have the most comfortable high-level
    features, because they do not map directly to machine code
    ▪ But they are usually the slowest at runtime
    ▪ They require a runtime program to interpret the source code
    ▪ Popular examples – JavaScript, Python, PHP, Perl

  10. Interpreted Languages

  11. Virtual Machine Based Languages
    ▪ A mix between compiled and interpreted languages
    ▪ Source code compiles to an intermediate language (bytecode),
    which is later interpreted by a runtime program
    ▪ Combines the high-level features of interpreted languages with
    performance closer to that of compiled languages
    ▪ The compiled code is portable between different platforms
    ▪ Popular examples – C#, Java, Solidity

  12. Virtual Machine Based Languages

  13. The EVM & EVM Assembly
    Specification & Specifics

  14. The EVM
    ▪ A virtual machine with a 256-bit word size
    ▪ The VM is stack-based, similar to other popular VMs – JVM,
    CLR (the .NET virtual machine used by C#)
    ▪ Supports a predefined set of opcodes
    ▪ The opcodes are platform independent and can be executed
    on any native platform which has an EVM implementation
    ▪ Every opcode has a predefined gas cost
    ▪ Multiple programming languages compile to EVM opcodes
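
The points above (a stack of 256-bit words, a fixed opcode set, a gas cost per opcode) can be sketched with a toy interpreter. This is a minimal Python model, not a real EVM: the `run` function, the `(opcode, operand)` program format and the three-opcode table are assumptions made for illustration. The costs used (3 for a push or ADD, 5 for MUL) follow the EVM fee schedule.

```python
# Toy stack machine in the spirit of the EVM (illustrative only, not a real EVM).
WORD_MASK = (1 << 256) - 1  # EVM words are 256 bits wide

# Gas cost per opcode for this sketch.
GAS_COST = {"PUSH": 3, "ADD": 3, "MUL": 5}


def run(program, gas_limit):
    """Execute a list of (opcode, operand) pairs; return (stack, gas_used)."""
    stack = []
    gas_used = 0
    for opcode, operand in program:
        gas_used += GAS_COST[opcode]
        if gas_used > gas_limit:
            raise RuntimeError("out of gas")
        if opcode == "PUSH":
            stack.append(operand & WORD_MASK)
        elif opcode == "ADD":
            a, b = stack.pop(), stack.pop()
            stack.append((a + b) & WORD_MASK)  # wraps around at 2**256
        elif opcode == "MUL":
            a, b = stack.pop(), stack.pop()
            stack.append((a * b) & WORD_MASK)
    return stack, gas_used


# (2 + 3) * 4 expressed as stack instructions
program = [("PUSH", 2), ("PUSH", 3), ("ADD", None),
           ("PUSH", 4), ("MUL", None)]
stack, gas_used = run(program, gas_limit=100)
print(stack, gas_used)  # [20] 17
```

Running the same program with a `gas_limit` below 17 raises the out-of-gas error, which is roughly how the EVM bounds execution.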

  15. The EVM & Related Programming Languages

  16. Solidity Assembly
    Usage, Common operations & Examples

  17. Solidity Assembly
    ▪ Solidity supports constructs for writing low-level code which
    is (almost) directly translated to EVM opcodes
    ▪ Supports an instructional syntax, which closely resembles EVM
    opcodes, and a functional syntax, which is more user-friendly
    ▪ The compiler does not optimize assembly code,
    so use it with care

  18. When to use Solidity Assembly?
    ▪ Use Solidity assembly only in very rare cases, when it is
    really necessary
    ▪ It can bring some performance gains, but usually modest ones
    ▪ It can be used to implement features not yet available in the
    Solidity language
    ▪ The purpose of learning it is to understand how it works, and
    to not be afraid of encountering assembly code in smart
    contracts

  19. Basic operations in Assembly
    ▪ add, sub, mul, div, mod (assembly) == +, -, *, /, % (Solidity)
    ▪ lt, gt, eq (assembly) == <, >, == (Solidity)
    ▪ and, or, xor, shl, shr (assembly) == &, |, ^, <<, >> (Solidity)
    ▪ Logical operators use the same opcodes as bitwise operators
    ▪ jump, jumpi – jump (conditionally) to a label
    ▪ Used for creating if-else statements & for-loops
    ▪ origin, gasprice, coinbase… – opcodes for accessing transaction
    and block metadata, like tx.origin, tx.gasprice and block.coinbase
    in Solidity
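
To make the opcode semantics above concrete, here is a small Python model of a few of them. The function names mirror the opcodes, with a trailing underscore only where the name collides with a Python keyword. This is a sketch of unsigned behaviour only: results wrap modulo 2**256, division and modulo by zero yield 0, comparisons yield 1 or 0, and `shl`/`shr` take the shift amount as their first argument.

```python
WORD = 1 << 256  # arithmetic results wrap around modulo 2**256


def add(a, b):
    return (a + b) % WORD


def sub(a, b):
    return (a - b) % WORD


def mul(a, b):
    return (a * b) % WORD


def div(a, b):
    return 0 if b == 0 else a // b  # division by zero yields 0, not an error


def mod(a, b):
    return 0 if b == 0 else a % b


def lt(a, b):
    return 1 if a < b else 0  # comparisons push 1 or 0


def gt(a, b):
    return 1 if a > b else 0


def eq(a, b):
    return 1 if a == b else 0


def and_(a, b):
    return a & b  # also serves as logical AND on 0/1 values


def or_(a, b):
    return a | b


def shl(shift, value):
    return (value << shift) % WORD  # shift amount comes first


def shr(shift, value):
    return value >> shift


print(sub(0, 1) == WORD - 1)         # True (underflow wraps around)
print(div(7, 0))                     # 0
print(lt(1, 2), gt(1, 2), eq(2, 2))  # 1 0 1
print(and_(lt(1, 2), eq(2, 2)))      # 1 (logical "&&" via the bitwise opcode)
print(shl(4, 1), shr(4, 256))        # 16 16
```

The wrap-around and the silent division by zero are the two behaviours that most often surprise people coming from high-level Solidity arithmetic.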

  20. Basic Operations in Solidity
    Live Demo

  21. Instructional vs. Functional Assembly
    ▪ Solidity supports two styles of writing assembly code
    ▪ Instructional assembly is a stack-oriented assembly which
    closely resembles the EVM bytecode
    ▪ Functional assembly is a higher-level assembly syntax which
    resembles the use of functions in normal programming
    languages
    ▪ Prefer functional assembly to enhance readability
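
The relationship between the two styles can be sketched in Python: a nested functional expression is flattened into the stack-order instruction sequence. The tuple representation and the `to_instructions` helper are assumptions made for this illustration and are not part of Solidity. Arguments are emitted last to first, so the first argument ends up on top of the stack when the opcode executes.

```python
def to_instructions(expr):
    """Flatten a nested functional expression into stack-order instructions.

    An expression is either an int literal or a tuple
    (opcode, arg1, arg2, ...).
    """
    if isinstance(expr, int):
        return [str(expr)]
    opcode, *args = expr
    instructions = []
    for arg in reversed(args):  # the last argument is pushed first
        instructions.extend(to_instructions(arg))
    instructions.append(opcode)
    return instructions


# Functional style: add(1, mul(2, 3))
expr = ("add", 1, ("mul", 2, 3))
print(" ".join(to_instructions(expr)))  # 3 2 mul 1 add
```

The functional form reads like ordinary nested calls, while the instructional form requires tracking the stack by hand, which is why the functional style is easier to read.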

  22. Instructional Assembly in Solidity
    Live Demo

  23. Memory in Solidity
    Stack, Memory, Storage & Calldata

  24. Memory in the EVM
    ▪ In the EVM, there are several types of memory:
    ▪ Stack – data pushed to & popped from the
    EVM stack. Available in the function scope.
    ▪ Memory – similar to heap memory in normal programs.
    Available throughout the smart contract call.
    ▪ Storage – similar to external storage in normal programs.
    Persists between smart contract calls.
    ▪ Calldata – special read-only memory holding the function call
    data (function signature and parameters).

  25. Memory type
    ▪ Memory in Solidity is represented as a linear array of bytes
    ▪ Address arithmetic can be used to reach elements inside
    memory, for example the elements of an array
    ▪ Operations for dealing with memory:
    ▪ mload(address) – load the 32-byte word at the given address
    ▪ mstore(address, value) – store a 32-byte word at the given address
    ▪ msize – current size of memory. Can be used for allocating new
    elements in memory
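
A rough Python model of the memory operations listed on this slide. The `Memory` class is a hypothetical stand-in for EVM memory, assuming 32-byte big-endian words and growth in whole words. The usage part lays out an array the way Solidity lays out memory arrays (a length word followed by the elements) and reaches one element by address arithmetic.

```python
class Memory:
    """Toy model of EVM memory: a byte array that grows in 32-byte words."""

    def __init__(self):
        self.data = bytearray()

    def _expand(self, end):
        # Touching an address extends memory to the next multiple of 32 bytes.
        if end > len(self.data):
            new_size = (end + 31) // 32 * 32
            self.data.extend(bytes(new_size - len(self.data)))

    def mstore(self, address, value):
        """Store a 32-byte word (value must fit in 256 bits) at address."""
        self._expand(address + 32)
        self.data[address:address + 32] = value.to_bytes(32, "big")

    def mload(self, address):
        """Load the 32-byte word starting at address."""
        self._expand(address + 32)
        return int.from_bytes(self.data[address:address + 32], "big")

    def msize(self):
        return len(self.data)


mem = Memory()
base = mem.msize()               # first unused address, used as array start
values = [10, 20, 30]
mem.mstore(base, len(values))    # length word comes first
for i, v in enumerate(values):
    mem.mstore(base + 32 * (i + 1), v)

print(mem.mload(base + 32 * 2))  # 20 (element at index 1)
print(mem.msize())               # 128
```

Each element sits one 32-byte word after the previous one, so element `i` is found at `base + 32 * (i + 1)`.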

  26. Using Memory in Solidity Assembly
    Live Demo

  27. Storage type
    ▪ Storage is stored in the form of a Merkle-Patricia Trie
    ▪ This means elements cannot be accessed in a linear fashion as
    in the memory type
    ▪ Every variable in storage has a slot, and an offset within that
    slot, where it can be found
    ▪ For a variable x, they can be accessed as x_slot and x_offset
    ▪ sload(slot) – load the 32-byte word stored at a slot
    ▪ sstore(slot, value) – store a 32-byte word at a slot
    ▪ The offset is a byte offset inside the slot, used to shift and
    mask out a variable that shares the slot with others
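
A Python sketch of how a slot and an offset locate a storage variable. In Solidity the slot selects a 32-byte storage word and the offset is a byte offset inside that word, so the slot is what gets passed to `sload`/`sstore` and the offset is applied by shifting and masking. The `Storage` class and the `read_packed`/`write_packed` helpers are hypothetical names for this illustration. The example assumes two 16-byte values packed into slot 0, the first at offset 0 (lowest-order bytes) and the second at offset 16.

```python
WORD_MASK = (1 << 256) - 1


class Storage:
    """Toy model of contract storage: slot number -> 256-bit word."""

    def __init__(self):
        self.slots = {}

    def sload(self, slot):
        return self.slots.get(slot, 0)  # unset slots read as zero

    def sstore(self, slot, value):
        self.slots[slot] = value & WORD_MASK


def read_packed(storage, slot, offset, size):
    """Read a size-byte value that starts offset bytes into slot."""
    word = storage.sload(slot)
    return (word >> (8 * offset)) & ((1 << (8 * size)) - 1)


def write_packed(storage, slot, offset, size, value):
    """Overwrite only the size bytes at offset, keeping the rest of the slot."""
    mask = ((1 << (8 * size)) - 1) << (8 * offset)
    word = storage.sload(slot)
    word = (word & ~mask) | ((value << (8 * offset)) & mask)
    storage.sstore(slot, word)


storage = Storage()
write_packed(storage, 0, 0, 16, 1)    # first value: slot 0, offset 0
write_packed(storage, 0, 16, 16, 2)   # second value: slot 0, offset 16

print(read_packed(storage, 0, 0, 16), read_packed(storage, 0, 16, 16))  # 1 2
print(len(storage.slots))  # 1 (both values share a single slot)
```

Writing one packed value is a read-modify-write of the whole slot, which is why a careless `sstore` in assembly can clobber a neighbouring variable.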

  28. Using Storage in Solidity Assembly
    Live Demo

  29. Calldata type
    ▪ Special read-only memory holding the function call data
    ▪ Function signature, parameters…
    ▪ Exposed in Solidity as msg.data; msg.sender and similar special
    variables are read from the call context through their own opcodes
    ▪ Copied to memory on public function calls
    ▪ Not copied & read-only on external function calls
    ▪ Therefore, using external functions is beneficial when passing
    large arrays of data
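
As an illustration of what calldata holds, the Python sketch below splits a call's bytes into the 4-byte function selector and the 32-byte argument words that follow. It only covers static argument types. The `decode_calldata` helper, the recipient address and the amount are made up for the example; `a9059cbb` is the well-known selector of `transfer(address,uint256)`.

```python
def decode_calldata(calldata):
    """Split calldata into (selector_hex, list of 32-byte argument words)."""
    selector = calldata[:4]
    body = calldata[4:]
    words = [int.from_bytes(body[i:i + 32], "big")
             for i in range(0, len(body), 32)]
    return selector.hex(), words


# Build calldata for transfer(address,uint256) by hand.
selector = bytes.fromhex("a9059cbb")
recipient = 0x1111111111111111111111111111111111111111  # hypothetical address
amount = 1000
calldata = (selector
            + recipient.to_bytes(32, "big")   # address left-padded to 32 bytes
            + amount.to_bytes(32, "big"))

print(len(calldata))  # 68 (4 + 32 + 32 bytes)
name, args = decode_calldata(calldata)
print(name, args[1])  # a9059cbb 1000
```

In assembly the same data is read with `calldataload`, which returns one 32-byte word at a given byte position without copying the arguments to memory first.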

  30. Calldata & External functions
    Live Demo

  31. Summary
    ▪ Programming languages differ in terms
    of the way they are built
    ▪ Compiled vs. Interpreted vs. VM Based
    ▪ Solidity is a VM-based language using the
    Ethereum Virtual Machine
    ▪ Solidity Assembly allows you to achieve
    low-level optimizations and implement
    missing language features
    ▪ Prefer high-level optimizations instead of
    relying on assembly

  32. ?
    Solidity Advanced
