Introduction to Solidity Assembly

This is the talk I gave at Blockchain Developers Meetup #04 in Sofia, Bulgaria.

In it, I cover the basic ways programming languages are built and run: direct compilation, interpretation, and virtual machines. Then I move on to the EVM and Solidity in Ethereum. Finally, there is a broad exploration of writing assembly in Solidity, with several related code snippets.

The snippets can be found at:
https://pastebin.com/Yw8D6XB2
https://pastebin.com/pzye9yGx
https://pastebin.com/fqMdK0JR
https://pastebin.com/XbjsJWAM
https://pastebin.com/HvWbBQ91

Preslav Mihaylov

November 08, 2018

Transcript

  1. Introduction to Solidity Assembly
    Programming Languages, The EVM, Assembly
    Preslav Mihaylov, Head of Blockchain Training, SoftUni
    http://softuni.org/
  2. Table of Contents
    ▪ Programming Language Types
    ▪ The EVM & Stack-based VMs
    ▪ Solidity Assembly
    ▪ Basic operations
    ▪ Instructional & Functional Assembly
    ▪ Memory types in the EVM
  3. Programming Language Types
    ▪ There are different types of programming languages based on the way they are built
      ▪ Assembly Languages – assembly code directly translated to machine code
      ▪ Compiled languages – source code -> assembly/machine code
      ▪ Interpreted languages – source code is interpreted by a program runtime
      ▪ Virtual Machine based languages – source code is compiled to intermediary language and later interpreted by a VM runtime
  4. Assembly Languages
    ▪ Not real programming languages
    ▪ Assembly is a set of mnemonics which directly map to the machine code of a specific CPU instruction set
    ▪ The program which translates assembly to machine code is called an assembler
    ▪ Every instruction set has its own assembly “language”
    ▪ Popular examples – x86 Assembly, x64 Assembly, ARM Assembly
  5. Compiled Languages
    ▪ These are usually the fastest running languages due to their direct translation to native assembly
    ▪ They lack portability between different platforms
    ▪ And sometimes lack high-level features such as runtime type checking & garbage collection
    ▪ Popular examples – C, C++, Objective-C, Go
  6. Interpreted Languages
    ▪ These languages usually have the most comfortable high-level features due to their lack of direct mapping to machine code
    ▪ But are the worst in terms of performance
    ▪ Require a runtime program to interpret the source code
    ▪ Popular examples – JavaScript, Python, PHP, Perl
  7. Virtual Machine Based Languages
    ▪ A mix between compiled and interpreted languages
    ▪ Source code compiles to an intermediary language, which is later interpreted by a runtime program
    ▪ Combines high-level features of interpreted languages and low-level performance of compiled languages
    ▪ Source code is portable between different platforms
    ▪ Popular examples – C#, Java, Solidity
  8. The EVM
    ▪ A 256-bit word virtual machine
    ▪ The VM is stack-based, similar to other popular VMs – JVM, CLR (the .NET runtime used by C#)
    ▪ Supports a predefined set of opcodes
    ▪ The opcodes are platform independent and can be executed by any native platform which supports the EVM
    ▪ Every opcode has a predefined gas cost
    ▪ Multiple programming languages compile to EVM opcodes
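To make "stack-based" and "every opcode has a gas cost" concrete, here is a minimal Python sketch of a toy stack machine. It is only a model, not the EVM: the three opcodes, the `run` helper and the numbers in `GAS_COST` are assumptions chosen for illustration.

```python
# Toy stack machine: an illustrative model, not the real EVM.
WORD_MASK = (1 << 256) - 1  # values are kept within 256-bit words

# Hypothetical gas table for this sketch; real costs are fixed by the protocol.
GAS_COST = {"PUSH": 3, "ADD": 3, "MUL": 5}


def run(program):
    """Execute a list of (opcode, operand) pairs; return (stack, gas_used)."""
    stack = []
    gas_used = 0
    for opcode, operand in program:
        gas_used += GAS_COST[opcode]  # every opcode is charged before it runs
        if opcode == "PUSH":
            stack.append(operand & WORD_MASK)
        elif opcode == "ADD":
            a, b = stack.pop(), stack.pop()
            stack.append((a + b) & WORD_MASK)
        elif opcode == "MUL":
            a, b = stack.pop(), stack.pop()
            stack.append((a * b) & WORD_MASK)
    return stack, gas_used


# (2 + 3) * 4: operands are pushed first, then consumed by the operators.
program = [("PUSH", 4), ("PUSH", 3), ("PUSH", 2), ("ADD", None), ("MUL", None)]
stack, gas_used = run(program)
```

There are no registers or named variables here: each operator pops its inputs from the stack and pushes its result back, which is the property that makes a VM "stack-based".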
  9. Solidity Assembly
    ▪ Solidity supports constructs for writing low-level code which is (almost) directly translated to EVM opcodes
    ▪ Supports instructional syntax, which closely resembles EVM opcodes, and functional syntax, which is more user-friendly
    ▪ The compiler does not optimize assembly code, so use with care
  10. When to use Solidity Assembly?
    ▪ Solidity assembly should be used only in very rare cases, when it is really necessary
    ▪ It can achieve some performance gains, but they are usually modest
    ▪ It can be used for achieving features not yet introduced in the Solidity language
    ▪ The purpose of learning it is to understand how it works, and to not be afraid of encountering assembly code in smart contracts
  11. Basic operations in Assembly
    ▪ add, sub, mul, div, mod (assembly) == +, -, *, /, % (Solidity)
    ▪ lt, gt, eq (assembly) == <, >, == (Solidity)
    ▪ and, or, xor, shl, shr (assembly) == &, |, ^, <<, >> (Solidity)
    ▪ Logical operators use the same opcodes as bitwise operators
    ▪ jump, jumpi – jump (conditionally) to label
      ▪ Used for creating if-else statements & for-loops
    ▪ origin, gasprice, coinbase… - opcodes for accessing tx metadata just like in Solidity
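The operations above work on unsigned 256-bit words. The Python sketch below models a few of them to show two details: comparisons return 1 or 0, which is why the bitwise opcodes can double as logical operators, and `div` by zero yields 0 instead of failing. The function names mirror the opcodes, but this is a model written for illustration, not Solidity code.

```python
WORD = 1 << 256  # results wrap modulo 2**256


def add(a, b):
    return (a + b) % WORD


def sub(a, b):
    return (a - b) % WORD


def div(a, b):
    # Division by zero yields 0 on the EVM rather than raising an error.
    return 0 if b == 0 else a // b


def lt(a, b):
    return 1 if a < b else 0


def gt(a, b):
    return 1 if a > b else 0


def and_(a, b):
    return a & b


def shl(shift, value):
    # The shift amount comes first: shl(x, y) is y << x.
    return (value << shift) % WORD


# "1 < 2 && 5 > 3" expressed with the bitwise opcode on 0/1 results.
both_true = and_(lt(1, 2), gt(5, 3))
# Bitwise `and` is not a logical AND for arbitrary non-zero words.
not_logical = and_(2, 1)
wrapped = sub(0, 1)
```

The `not_logical` case is the reason to normalise values to 0 or 1 (for example with a comparison) before combining them with `and` or `or`.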
  12. Instructional vs. Functional Assembly
    ▪ Solidity supports two styles of writing assembly code
    ▪ Instructional assembly is a stack-oriented assembly which closely resembles the EVM bytecode
    ▪ Functional assembly is a higher-level assembly syntax which resembles the use of functions in normal programming languages
    ▪ Prefer functional assembly to enhance readability
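The relationship between the two styles can be sketched in a few lines of Python: a functional expression is a tree, and the instructional form is that tree flattened so that arguments are pushed right to left (leaving the first argument on top of the stack) before the opcode runs. The tuple representation and the `to_instructional` helper are assumptions made up for this sketch.

```python
def to_instructional(expr):
    """Flatten a nested (opcode, arg1, arg2, ...) tuple into stack order."""
    if not isinstance(expr, tuple):
        return [expr]  # a literal is simply pushed
    opcode, *args = expr
    tokens = []
    # Arguments are emitted right to left so the first one ends up on top.
    for arg in reversed(args):
        tokens.extend(to_instructional(arg))
    tokens.append(opcode)
    return tokens


# Functional style: mstore(0x80, add(mload(0x80), 3))
functional = ("mstore", "0x80", ("add", ("mload", "0x80"), "3"))
instructional = " ".join(to_instructional(functional))
```

Reading the functional form, it is clear which value feeds which opcode; in the flattened form that has to be reconstructed by tracking the stack, which is the readability argument for preferring the functional style.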
  13. Memory in the EVM
    ▪ In the EVM, there are several types of memory:
      ▪ Stack Memory – Consists of data pushed & popped from the EVM stack. Available in the function scope.
      ▪ Memory – Similar to Heap memory in normal programs. Available throughout the Smart Contract call.
      ▪ Storage – Similar to external storage in normal programs. Persists between Smart Contract calls.
      ▪ Calldata – Special read-only memory for storing function call metadata.
  14. Memory type
    ▪ Memory in Solidity is represented as a linear array of bytes
    ▪ Address arithmetic can be used to reach elements inside memory, for example the elements of an array
    ▪ Operations for dealing with memory:
      ▪ mload(address) – retrieve value at given address
      ▪ mstore(address, value) – store value at given address
      ▪ msize – Current size of memory. Can be used for allocating new elements in memory
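As a rough illustration of the memory model, the Python sketch below treats memory as a growing byte array with 32-byte `mload`/`mstore`, and reaches array elements by address arithmetic. The `Memory` class, the start address `0x80` and the `element` helper are assumptions for this sketch; the layout used (a length word followed by 32-byte elements) is how Solidity lays out dynamic arrays in memory.

```python
class Memory:
    """Toy model of EVM memory: a zero-initialised, linear array of bytes."""

    def __init__(self):
        self.data = bytearray()

    def _grow(self, end):
        # Memory expands in whole 32-byte words when touched past its end.
        if end > len(self.data):
            padded = (end + 31) // 32 * 32
            self.data.extend(bytes(padded - len(self.data)))

    def mstore(self, address, value):
        self._grow(address + 32)
        self.data[address:address + 32] = value.to_bytes(32, "big")

    def mload(self, address):
        self._grow(address + 32)
        return int.from_bytes(self.data[address:address + 32], "big")

    def msize(self):
        return len(self.data)


def element(mem, array_ptr, index):
    # Skip the 32-byte length word, then step 32 bytes per element.
    return mem.mload(array_ptr + 32 + 32 * index)


mem = Memory()
array_ptr = 0x80  # hypothetical start address for the array
values = [10, 20, 30]
mem.mstore(array_ptr, len(values))  # first word holds the length
for i, v in enumerate(values):
    mem.mstore(array_ptr + 32 + 32 * i, v)
```

After these stores `mem.msize()` points just past the last element, which is why it can serve as the address for the next allocation.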
  15. Storage type
    ▪ Storage is stored in the form of a Merkle-Patricia Trie
    ▪ This means elements cannot be accessed in a linear fashion as in the memory type
    ▪ Every variable in storage has a slot, and an offset within that slot, where it can be found
    ▪ For a variable x, they can be accessed as x_slot and x_offset (x.slot and x.offset since Solidity 0.7)
    ▪ sload(slot) – load the 32-byte word stored at a slot; the offset is the variable's byte position within that word
    ▪ sstore(slot, value) – store a 32-byte word at a slot
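The slot and offset pair can be modelled in Python as follows, assuming that the offset is a byte position counted from the low-order end of the 32-byte slot (how Solidity packs small variables). The dictionary-backed `storage`, the helper names and the two `uint128` variables are hypothetical, chosen only for this sketch.

```python
storage = {}  # slot number -> 256-bit word; untouched slots read as zero


def sload(slot):
    return storage.get(slot, 0)


def sstore(slot, value):
    storage[slot] = value % (1 << 256)


def read_packed(slot, offset, size):
    """Read `size` bytes that start `offset` bytes from the low-order end."""
    mask = (1 << (8 * size)) - 1
    return (sload(slot) >> (8 * offset)) & mask


def write_packed(slot, offset, size, value):
    """Overwrite only the variable's bytes, keeping its slot neighbours."""
    mask = (1 << (8 * size)) - 1
    shift = 8 * offset
    cleared = sload(slot) & ~(mask << shift)
    sstore(slot, cleared | ((value & mask) << shift))


# Hypothetical contract state: `uint128 a; uint128 b;` share slot 0,
# with a at offset 0 and b at offset 16.
write_packed(0, 0, 16, 7)   # a = 7
write_packed(0, 16, 16, 9)  # b = 9
a = read_packed(0, 0, 16)
b = read_packed(0, 16, 16)
```

A plain `sstore` on slot 0 would overwrite both variables at once, so assembly that touches packed variables has to shift and mask as above.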
  16. Calldata type
    ▪ Special read-only memory for storing function metadata
      ▪ Function signature, parameters…
    ▪ Exposed in Solidity as msg.data (msg.sender and similar special variables are read through their own opcodes, not from calldata)
    ▪ Copied in memory on public function calls
    ▪ Not copied & read-only on external function calls
    ▪ Therefore, declaring a function external is cheaper when it receives large arrays of data
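What "function signature, parameters" looks like as bytes can be sketched in Python: a 4-byte function selector followed by one 32-byte word per static argument. The selector `a9059cbb` is the well-known one for `transfer(address,uint256)`; the helper names, the recipient address and the amount are made up for illustration, and dynamic types (arrays, strings) are not handled.

```python
TRANSFER_SELECTOR = bytes.fromhex("a9059cbb")  # transfer(address,uint256)


def encode_call(selector, *words):
    """Selector followed by each static argument padded to 32 bytes."""
    return selector + b"".join(w.to_bytes(32, "big") for w in words)


def decode_call(calldata):
    """Split calldata into the 4-byte selector and its 32-byte words."""
    selector = calldata[:4]
    body = calldata[4:]
    words = [int.from_bytes(body[i:i + 32], "big")
             for i in range(0, len(body), 32)]
    return selector, words


recipient = 0x1111111111111111111111111111111111111111  # hypothetical address
amount = 1000
calldata = encode_call(TRANSFER_SELECTOR, recipient, amount)
selector, args = decode_call(calldata)
```

In assembly the same reads are done with `calldataload(4)` for the first argument and `calldataload(36)` for the second, that is, 4 bytes of selector plus 32 bytes per preceding word.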
  17. Summary
    ▪ Programming languages differ in terms of the way they are built
      ▪ Compiled vs. Interpreted vs. VM Based
    ▪ Solidity is a VM-based language using the Ethereum Virtual Machine
    ▪ Solidity Assembly allows you to achieve low-level optimizations and implement missing language features
    ▪ Prefer high-level optimizations instead of relying on assembly