Introduction to Solidity Assembly

This is the talk I gave at Blockchain Developers Meetup #04 in Sofia, Bulgaria.

In it, I cover the basic ways programming languages are built and run: direct compilation, interpretation, and virtual machines. Then I move on to the EVM and Solidity in Ethereum. Finally, there is a broad exploration of writing assembly in Solidity, with several related code snippets.

The snippets can be found at:
https://pastebin.com/Yw8D6XB2
https://pastebin.com/pzye9yGx
https://pastebin.com/fqMdK0JR
https://pastebin.com/XbjsJWAM
https://pastebin.com/HvWbBQ91

Preslav Mihaylov

November 08, 2018

Transcript

  1. Introduction to Solidity Assembly: Programming Languages, The EVM, Assembly
    Preslav Mihaylov, Head of Blockchain Training, SoftUni
    http://softuni.org/
  2. Table of Contents
    ▪ Programming Language Types
    ▪ The EVM & Stack-based VMs
    ▪ Solidity Assembly
    ▪ Basic operations
    ▪ Instructional & Functional Assembly
    ▪ Memory types in the EVM
  3. Programming Language Types: Compilation, Interpretation & VMs

  4. Programming Language Types
    ▪ There are different types of programming languages based on the way they are built
      ▪ Assembly languages – assembly code is translated directly to machine code
      ▪ Compiled languages – source code -> assembly/machine code
      ▪ Interpreted languages – source code is interpreted by a runtime program
      ▪ Virtual machine based languages – source code is compiled to an intermediary language and later interpreted by a VM runtime
  5. Assembly Languages
    ▪ Not real programming languages
    ▪ Assembly is a set of mnemonics which map directly to the machine code of a specific CPU instruction set
    ▪ The program which translates assembly to machine code is called an assembler
    ▪ Every instruction set has its own assembly “language”
    ▪ Popular examples – x86 Assembly, x64 Assembly, ARM Assembly
  6. Assembly Languages

  7. Compiled Languages
    ▪ These are usually the fastest-running languages due to their direct translation to native machine code
    ▪ Compiled binaries are not portable between different platforms
    ▪ They sometimes lack high-level features such as runtime type checking & garbage collection
    ▪ Popular examples – C, C++, Objective-C, Go
  8. Compiled Languages

  9. Interpreted Languages
    ▪ These languages usually have the most comfortable high-level features, because they do not map directly to machine code
    ▪ But they are usually the slowest
    ▪ They require a runtime program to interpret the source code
    ▪ Popular examples – JavaScript, Python, PHP, Perl
  10. Interpreted Languages

  11. Virtual Machine Based Languages
    ▪ A mix between compiled and interpreted languages
    ▪ Source code compiles to an intermediary language, which is later interpreted by a runtime program
    ▪ Combines the high-level features of interpreted languages with the low-level performance of compiled languages
    ▪ Compiled code is portable between different platforms
    ▪ Popular examples – C#, Java, Solidity
  12. Virtual Machine Based Languages

  13. The EVM & EVM Assembly: Specification & Specifics

  14. The EVM
    ▪ A virtual machine with a 256-bit word size
    ▪ The VM is stack-based, similar to other popular VMs – JVM, CLR (the .NET virtual machine)
    ▪ Supports a predefined set of opcodes
    ▪ The opcodes are platform independent and can be executed on any native platform which has an EVM implementation
    ▪ Every opcode has a predefined gas cost
    ▪ Multiple programming languages compile to EVM opcodes
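To make "stack-based" and "every opcode has a gas cost" concrete, here is a minimal Python sketch of a toy stack machine. It is not the EVM: the opcode names mirror EVM mnemonics, but the `GAS` table, the `run` function and the program encoding are assumptions made for illustration.

```python
# Toy stack machine in the spirit of the EVM (illustrative only).
WORD = 2**256  # values wrap at 256 bits

# Assumed per-opcode costs for this sketch; consult the EVM spec for real ones.
GAS = {"PUSH": 3, "ADD": 3, "MUL": 5, "POP": 2}


def run(program):
    """Execute a list of (opcode, operand) pairs; return (stack, gas_used)."""
    stack, gas_used = [], 0
    for op, arg in program:
        gas_used += GAS[op]  # every instruction is charged before it runs
        if op == "PUSH":
            stack.append(arg % WORD)
        elif op == "ADD":
            a, b = stack.pop(), stack.pop()
            stack.append((a + b) % WORD)
        elif op == "MUL":
            a, b = stack.pop(), stack.pop()
            stack.append((a * b) % WORD)
        elif op == "POP":
            stack.pop()
    return stack, gas_used


# (2 + 3) * 4: operands are pushed first, operators consume the top of the stack.
program = [("PUSH", 4), ("PUSH", 3), ("PUSH", 2), ("ADD", None), ("MUL", None)]
stack, gas_used = run(program)
print(stack, gas_used)  # [20] 17
```

Note that there are no named variables or registers: each operator pops its inputs and pushes its result, which is the property the instructional assembly style exposes directly.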
  15. The EVM & Related Programming Languages

  16. Solidity Assembly: Usage, Common operations & Examples

  17. Solidity Assembly
    ▪ Solidity supports constructs for writing low-level code which is (almost) directly translated to EVM opcodes
    ▪ Supports an instructional syntax, which closely resembles EVM opcodes, and a functional syntax, which is more user-friendly
    ▪ The compiler does not optimize assembly code, so use with care
  18. When to use Solidity Assembly?
    ▪ Solidity assembly should be used only in very rare cases, when it is really necessary
    ▪ It can achieve some performance gains, but not large ones
    ▪ It can be used to implement features not yet available in the Solidity language
    ▪ The purpose of learning it is to understand how it works, and to not be afraid of encountering assembly code in smart contracts
  19. Basic operations in Assembly
    ▪ add, sub, mul, div, mod (assembly) == +, -, *, /, % (Solidity)
    ▪ lt, gt, eq (assembly) == <, >, == (Solidity)
    ▪ and, or, xor, shl, shr (assembly) == &, |, ^, <<, >> (Solidity)
    ▪ Logical operators use the same opcodes as bitwise operators
    ▪ jump, jumpi – jump (conditionally) to a label
      ▪ Used for creating if-else statements & for-loops
    ▪ origin, gasprice, coinbase… – opcodes for accessing tx metadata, just like in Solidity
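The arithmetic and comparison opcodes listed above operate on unsigned 256-bit words. The Python sketch below models a few of them under that assumption: results wrap modulo 2^256, division by zero yields 0, and comparisons return 1 or 0 rather than a boolean. The function names mirror the assembly mnemonics; the code is a model for experimenting, not compiler output.

```python
WORD = 2**256  # all results are reduced to 256 bits


def add(a, b):
    return (a + b) % WORD


def sub(a, b):
    return (a - b) % WORD


def mul(a, b):
    return (a * b) % WORD


def div(a, b):
    return 0 if b == 0 else a // b  # no exception: dividing by zero gives 0


def mod(a, b):
    return 0 if b == 0 else a % b


def lt(a, b):
    return 1 if a < b else 0


def gt(a, b):
    return 1 if a > b else 0


def eq(a, b):
    return 1 if a == b else 0


def shl(shift, value):
    # The shift amount comes first, then the value being shifted.
    return (value << shift) % WORD if shift < 256 else 0


def shr(shift, value):
    return value >> shift


print(add(WORD - 1, 1))  # 0, overflow wraps silently
print(sub(0, 1) == WORD - 1)  # True, underflow wraps too
```

The silent wrap-around is one reason hand-written assembly needs care: none of these operations signals an error on overflow.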
  20. Basic Operations in Solidity: Live Demo

  21. Instructional vs. Functional Assembly
    ▪ Solidity supports two styles of writing assembly code
    ▪ Instructional assembly is a stack-oriented assembly which closely resembles the EVM bytecode
    ▪ Functional assembly is a higher-level assembly syntax which resembles the use of functions in normal programming languages
    ▪ Prefer functional assembly to enhance readability
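The relationship between the two styles can be sketched mechanically: a functional expression is flattened into instructional order by emitting each call's arguments last to first, followed by the opcode. The Python helper below (`to_instructional` is a hypothetical name, and the tuple encoding of expressions is an assumption of this sketch) shows that transformation on one expression.

```python
def to_instructional(expr):
    """Flatten a nested functional call into a stack-style token list."""
    if isinstance(expr, str):
        return [expr]  # a literal is simply pushed onto the stack
    op, *args = expr
    tokens = []
    # Push the last argument first, so the first argument ends up on top.
    for arg in reversed(args):
        tokens.extend(to_instructional(arg))
    tokens.append(op)
    return tokens


# Functional style: mstore(0x80, add(mload(0x80), 3))
functional = ("mstore", "0x80", ("add", ("mload", "0x80"), "3"))
instructional = " ".join(to_instructional(functional))
print(instructional)  # 3 0x80 mload add 0x80 mstore
```

Reading the output right to left recovers the functional form, which is why the functional style is easier to follow: it shows which opcode consumes which operand.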
  22. Instructional Assembly in Solidity: Live Demo

  23. Memory in Solidity: Stack, Memory, Storage & Calldata

  24. Memory in the EVM
    ▪ In the EVM, there are several types of memory:
      ▪ Stack Memory – Consists of data pushed & popped from the EVM stack. Available in the function scope.
      ▪ Memory – Similar to heap memory in normal programs. Available throughout the smart contract call.
      ▪ Storage – Similar to external storage in normal programs. Persists between smart contract calls.
      ▪ Calldata – Special read-only memory for storing function call metadata.
  25. Memory type
    ▪ Memory in Solidity is represented as a linear array of bytes
    ▪ Address arithmetic can be used to reach elements inside memory, for example the elements of an array
    ▪ Operations for dealing with memory:
      ▪ mload(address) – retrieve the value at the given address
      ▪ mstore(address, value) – store a value at the given address
      ▪ msize – current size of memory. Can be used for allocating new elements in memory
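The memory model described above can be sketched in Python as a growable byte array. This is a model under stated assumptions, not the EVM itself: `mload` and `mstore` move 32-byte big-endian words, memory grows in 32-byte steps when touched, and the `Memory` class and the base address `0x80` are choices made for this example.

```python
class Memory:
    """Linear, zero-initialised byte array addressed like EVM memory."""

    def __init__(self):
        self.data = bytearray()

    def _expand(self, end):
        # Grow to the next multiple of 32 bytes that covers `end`.
        size = (end + 31) // 32 * 32
        if size > len(self.data):
            self.data.extend(bytes(size - len(self.data)))

    def mstore(self, address, value):
        self._expand(address + 32)
        self.data[address:address + 32] = (value % 2**256).to_bytes(32, "big")

    def mload(self, address):
        self._expand(address + 32)
        return int.from_bytes(self.data[address:address + 32], "big")

    def msize(self):
        return len(self.data)


mem = Memory()
base = 0x80  # assumed start of a three-element array
for i, value in enumerate([10, 20, 30]):
    mem.mstore(base + 32 * i, value)  # element i lives at base + 32 * i

second = mem.mload(base + 32 * 1)
print(second, mem.msize())  # 20 224
```

Because the next untouched address is simply `msize`, a new element can be appended by storing at that address, which is the allocation trick the slide refers to.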
  26. Using Memory in Solidity Assembly: Live Demo

  27. Storage type
    ▪ Storage is kept in the form of a Merkle-Patricia Trie
    ▪ This means elements cannot be accessed in a linear fashion as in the memory type
    ▪ Every variable in storage has a slot, and an offset within that slot, where it can be found
    ▪ For a variable x, they can be accessed as x_slot and x_offset
    ▪ sload(slot) – load the 32-byte word stored at a slot; the offset is the variable's byte position inside that word
    ▪ sstore(slot, value) – store a 32-byte word at a slot
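How a slot and an offset locate a variable can be sketched in Python. The sketch assumes Solidity's documented storage layout, in which several small variables may share one 32-byte slot and the offset is a byte position counted from the low-order end of the word. The helper names `read_packed` and `write_packed` are hypothetical, and a plain dictionary stands in for the trie.

```python
storage = {}  # slot number -> 256-bit word; unset slots read as zero


def sload(slot):
    return storage.get(slot, 0)


def sstore(slot, value):
    storage[slot] = value % 2**256


def read_packed(slot, offset, size):
    """Read `size` bytes located `offset` bytes from the low end of a slot."""
    mask = (1 << (8 * size)) - 1
    return (sload(slot) >> (8 * offset)) & mask


def write_packed(slot, offset, size, value):
    """Overwrite only the variable's bytes, leaving its slot neighbours intact."""
    mask = ((1 << (8 * size)) - 1) << (8 * offset)
    word = sload(slot) & ~mask
    sstore(slot, word | ((value << (8 * offset)) & mask))


# Two 16-byte values assumed to share slot 0: a at offset 0, b at offset 16.
write_packed(0, 0, 16, 7)
write_packed(0, 16, 16, 9)
a = read_packed(0, 0, 16)
b = read_packed(0, 16, 16)
print(a, b)  # 7 9
```

The read-mask-write sequence in `write_packed` is the point of the example: storing a raw word at a shared slot without masking would clobber the neighbouring variable.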
  28. Using Storage in Solidity Assembly: Live Demo

  29. Calldata type
    ▪ Special read-only memory for storing function metadata
      ▪ Function signature, parameters…
    ▪ Also the place where msg.sender and similar special variables are stored
    ▪ Copied to memory on public function calls
    ▪ Not copied, and read-only, on external function calls
    ▪ Therefore, external functions are the cheaper choice when passing large arrays of data
  30. Calldata & External functions: Live Demo

  31. Summary
    ▪ Programming languages differ in terms of the way they are built
      ▪ Compiled vs. Interpreted vs. VM Based
    ▪ Solidity is a VM-based language using the Ethereum Virtual Machine
    ▪ Solidity Assembly allows you to achieve low-level optimizations and implement missing language features
    ▪ Prefer high-level optimizations instead of relying on assembly
  32. ? Solidity Advanced