.NET IL: Into the Marianas Trench (with presenter notes)

(This version contains presenter notes for my CodeMash 2018 presentation)

Are you interested in writing compilers, targeting WebAssembly, finding security issues automatically, performing binary analysis, or understanding performance at a low level? While it’s always good to know how your language works, the benefits of understanding the intermediate language extend to metaprogramming and analysis across multiple source languages. Learning how to work with intermediate languages allows you to write programs which would seem unattainable otherwise. You will learn not only how IL works but how it compares with LLVM IR, Java bytecode, and other intermediate representations. This is no mere “deep dive”: you’ll leave this talk really understanding how C# turns into machine code and how to use that information to do “impossible” things.

Craig Stuntz

January 12, 2018

Transcript

  1. Craig Stuntz ∈ Improving / .NET IL: Into the Marianas Trench

    https://speakerdeck.com/craigstuntz

    Every year the CodeMash organizers do a survey to see what sort of sessions attendees would like included in the conference. Rob Gillen posted the results to his blog. One of the comments was “More advanced level sessions containing information not easily found online.” So this will be a highly technical talk. We’ll see if people still want that when next year’s survey rolls around. (click) Slides. I have a full hour of material today, so there will be no dead air at the end of the session. Please interrupt with questions if you think they’re of general interest to the audience here, and I would be more than happy to talk after the session if you would like to chat about anything else.
  2. All Photos from NOAA Okeanos Explorer 2016

    http://oceanexplorer.noaa.gov/okeanos/explorations/ex1605/welcome.html
  3. Preview

    • What is an intermediate language?
    • Why work at this level?
    • Specifics of .NET IL
    • Cool tools and techniques
    • Impossible things!

    I enjoy strange languages, and I love to share them, but I usually don’t have the opportunity to cover an entire language in a single talk.
  4. Why: Continuous

    “Execution powered by a new IL interpreter”

    Why bother working at this level at all? What’s the point? Here are a few examples of tools you really could not build without working with IL. You’re not allowed to compile arbitrary code on iOS, but interpretation is another matter, and it’s easier to interpret IL than C#.
  5. Why: Mono

    Powers: running on WebAssembly, future work on Mixed Mode Execution, supporting Reflection.Emit in statically compiled Mono code, hot reloading.
  6. Why: MonoTouch

    The mtouch tool which compiles CIL to native code uses Cecil and the Mono.Linker to reduce the application size.
  7. Why: Unity

    [Unity] often has to perform analysis and manipulation of .NET binaries, all done with Cecil.
  8. Why: Compilers

    • Emit module binaries directly
    • Emit modules via Reflection.Emit
    • Emit the IL text representation and compile that with ILASM

    It seems obvious that IL can be useful for writing a compiler, but there are multiple ways to work here.
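Here is a minimal sketch of the second option, building the Add1 method that appears later in this deck with System.Reflection.Emit. It uses a DynamicMethod rather than a full AssemblyBuilder/ModuleBuilder, which is my simplification to keep the example short; a compiler emitting a whole module would drive the same ILGenerator API through a TypeBuilder.

```csharp
using System;
using System.Reflection.Emit;

// Build the equivalent of: static int Add1(int number) => number + 1;
// (A DynamicMethod is used instead of a full assembly/module to keep the sketch short.)
var add1Method = new DynamicMethod("Add1", typeof(int), new[] { typeof(int) });
ILGenerator il = add1Method.GetILGenerator();
il.Emit(OpCodes.Ldarg_0);  // push the argument "number"
il.Emit(OpCodes.Ldc_I4_1); // push the constant 1
il.Emit(OpCodes.Add);      // pop both values, push their sum
il.Emit(OpCodes.Ret);      // return the value left on the evaluation stack

// Turn the finished method into a callable delegate.
var add1 = (Func<int, int>)add1Method.CreateDelegate(typeof(Func<int, int>));
Console.WriteLine(add1(41)); // 42
```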
  9. Why: Every Decompiler

    JustDecompile. dnSpy. Reflector. CodeReflect. You name it.

    They must parse binary IL assemblies, usually via Mono.Cecil or dnlib.
  10. Why: Dotfuscator

    “Dotfuscator uses ildasm and ilasm to process the input assemblies.”
  11. All About Intermediate Languages

    What even is an intermediate language?
  12. Compilers, Generally

    [Diagram: a compiler and an interpreter, labeled input, code, and machine code]

    How does a compiler work? (click) How does an interpreter work?
  13. Compiling C#

    [Diagram: C# code → CSC → .NET IL (compile, developer machine); IL → RyuJIT → machine code (.NET Virtual Execution System, end user machine)]

    But that’s not how C# works. (click)
  14. JIT Compilation

    C#:
        { Console.WriteLine("Hello World!"); }

    IL:
        IL_0000: nop
        IL_0001: ldstr "Hello World!"
        IL_0006: call void [System.Console]System.Console::WriteLine(string)

    x64:
        00007FFBA9E60B43 mov rcx,208F7F93068h
        00007FFBA9E60B4D mov rcx,qword ptr [rcx]
        00007FFBA9E60B50 call 00007FFBA9E60708
        00007FFBA9E60B55 nop

                                   C# → IL           IL → x64
        Compilers                  AOT               JIT
        Machine                    Dev               End user
        Defining characteristic    Error checking    Speed of compilation
        Happens                    Recompilation     Once
  15. Intermediate Languages, Everywhere…

    C# → IL → x64
    Java → Bytecode → x64
    C++ → WASM → x64

    Taking the compilers out of the picture, and looking only at the representations… These examples all work pretty similarly to C#: compile on the developer machine, then compile again (JIT) on the end user machine. (C++ to WASM uses Emscripten.)
  16. By Any Other Name…

    Bytecode • Intermediate representation • IL • WASM • STG • P-code

    STG: Spineless, Tagless G-Machine. IL vs. P-code vs. WASM.
  17. Dr. Richard Hipp

    “SQLite consists of…
    - Compiler to translate SQL into byte code
    - Virtual Machine to evaluate the byte code”

    Lecture presented at Ohio State University, 10 October 2017

    However, it turns out that “compile on dev machine, JIT on end user machine” is far from the only use case for intermediate languages. SQLite is either the first or second most commonly used computer program in the world. There are billions of installations and more than 1 trillion SQLite databases in active use. Note the different use case: SQL is compiled to bytecode and executed on the same machine. If the first or second most commonly used library in the world is built on a bytecode compiler and interpreter, doesn’t that suggest we should understand the technique?
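To make the compile-then-interpret-on-the-same-machine idea concrete, here is a toy sketch of that two-part architecture. Everything in it is hypothetical and far simpler than SQLite: `Compile` turns a postfix arithmetic expression into (opcode, argument) pairs, and `Run` evaluates them on a stack, much like the IL evaluation stack described later in this deck.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// "Compiler" (hypothetical): each token of a postfix expression becomes one
// bytecode instruction, represented as an (opcode, argument) pair.
List<(string Op, int Arg)> Compile(string source) =>
    source.Split(' ', StringSplitOptions.RemoveEmptyEntries)
          .Select(token => token switch
          {
              "+" => ("add", 0),
              "*" => ("mul", 0),
              _ => ("push", int.Parse(token)),
          })
          .ToList();

// "Virtual machine" (hypothetical): execute the bytecode against an evaluation stack.
int Run(List<(string Op, int Arg)> program)
{
    var stack = new Stack<int>();
    foreach (var (op, arg) in program)
    {
        switch (op)
        {
            case "push": stack.Push(arg); break;
            case "add": stack.Push(stack.Pop() + stack.Pop()); break; // pop 2, push 1
            case "mul": stack.Push(stack.Pop() * stack.Pop()); break; // pop 2, push 1
            default: throw new InvalidOperationException($"Unknown opcode: {op}");
        }
    }
    return stack.Pop(); // the result is the value left on the stack
}

var bytecode = Compile("2 3 + 4 *"); // (2 + 3) * 4
Console.WriteLine(string.Join(" ", bytecode)); // (push, 2) (push, 3) (add, 0) (push, 4) (mul, 0)
Console.WriteLine(Run(bytecode));              // 20
```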
  18. .NET IL vs. Java Bytecode

                           .NET IL    Java bytecode
    Structs?               ✅          ❌
    Unsigned ints?         ✅          ❌
    Generics?              ✅          ❌
    out & ref args?        ✅          ❌

    Comparing… There are a few extra features in .NET IL, but what’s more interesting is the nature of the differences: they all look like differences between compiled C# and compiled Java.
  19. John Backus

    “Underlying every programming language is a model of a computing system that its programs control.”
  20. .NET IL / Java Bytecode vs. LLVM IR

                               .NET IL / Java bytecode    LLVM IR
    Machine independent?       ✅                          ❌
    Quick to JIT?              ✅                          ❌
    Undefined behavior?        A tiny bit                  Enough to implement C
    Registers?                 ❌                          ✅

    This is more interesting. Bytecode and IL both look very different from LLVM IR. Let’s examine why…
  21. Intermediate Languages vs. ASM

                                       Intermediate languages    ASM
    Optimized?                         ✅                         ❌
    Strongly device-specific?          ❌                         ✅
    Can be compiled?                   ✅                         ✅
    Can be interpreted/JITted?         ✅                         ❌
    Can be re-optimized at runtime?    ✅                         ❌

    Finally, let’s compare intermediate languages in general with traditional assembler.
  22. Intermediate Language: .NET
  23. Hello World, Revisited

        static void Main(string[] args)
        {
            Console.WriteLine("Hello World!");
        }

        .method private hidebysig static void Main (
            string[] args
        ) cil managed
        {
            // Code Size: 13 (0xD) bytes
            .maxstack 8
            .entrypoint

            IL_0000: nop
            IL_0001: ldstr "Hello World!"
            IL_0006: call void [System.Console]System.Console::WriteLine(string)
            IL_000B: nop
            IL_000C: ret
        }

    1) There are (click) parts of this which look like assembly, but there are (click) parts that very much don’t. 2) So what does this really mean? Immediately, we see a lot of metadata (static, names) not present in traditional ASM. 3) Also, (click) the constants have real types, and (click) the jumps have real method names.
  24. Representations

    • Binary (assembly/module)
    • Text (ILAsm format)

    The most common way to write IL is with a compiler for some high-level language. C# and Delphi emit binary representations (EXE/DLL). Some other languages, like the .NET versions of Ada, COBOL, and SML, emit text to be compiled to binary via ILASM.
  25. A Really Confusing Thing

    • CIL: The instruction set understood by the VES
    • ILAsm: Short for “IL Assembly Language”
    • ILASM: A utility which compiles an extended form of ILAsm to binary assemblies

    The people who came up with this distinction did not have CodeMash presenters in mind. CIL is the .NET Virtual Execution System instruction set. ILAsm, mixed case, is a general-purpose programming language designed closely around CIL. ILASM, ALL CAPS, is a tool which compiles ILAsm, mixed case, into CIL.
  26. Memory Models

    Win/x86: registers, stack, heap
    IL: argument table, evaluation stack, locals table, fields

    In a .NET app, the stack and heap exist, but you’ll rarely think about them. Registers are an implementation detail you don’t deal with directly. The evaluation stack, as we’ll see, is local to each method and is not the same thing as the stack. The JITter converts the IL memory model to the x86 memory model.
  27. Metadata

    .NET IL Assembler, by Serge Lidin, p. 188

    Descriptions of assemblies, modules, types, etc. Represented in ILAsm as text, but it’s not instructions. Stored in the form of an optimized relational database. Used by the loader.
  28. Primitive Types

    • Void
    • Bool
    • Certain numeric types
        • Signed/unsigned int8, int16, int32, int64, native int
        • Single/double float
    • Chars
    • TypedRef
    • Pointers
        • Signed or unsigned, implementation-defined size

    These are types known to the runtime that have specific type codes. Other types are built on top of these.
  29. Evaluation Stack Types

    • Certain numeric types
        • int32, int64, native int, F
    • Chars
    • Object references (types undistinguished)
    • Pointers
        • Managed
        • Unmanaged
    • User-defined value types

    The types which can exist on the evaluation stack are even more limited. That’s the whole list. Even this list is kind of a lie: they’re all just numbers to the CLI. Primitive types are distinguished by the runtime for performance reasons. Other types are built from these types. CIL instructions understand only these types. Booleans, chars.
  30. Faking Other Types

    • Unsigned integers
    • Boolean
    • Chars
    • Arrays
    • Enums
    • Native (interop) types
  31. Enums

        .class public enum Color
        {
            .field public specialname int32 __value
            .field public static literal valuetype Color Red = int32 (1)
            .field public static literal valuetype Color Green = int32 (2)
            .field public static literal valuetype Color Blue = int32 (3)
        }

    This means the types you’re used to often have a completely unfamiliar form. (click) The enum keyword is ILAsm syntactic sugar for deriving from [mscorlib]System.Enum. (click) An enum must have exactly one instance field and (click) one or more static literal fields.
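You can see this shape from C# by reflecting over an enum. This sketch uses the BCL's DayOfWeek (my choice of example) and assumes it was produced by the C# compiler, which names the single instance field `value__`; the hand-written ILAsm on the slide calls it `__value`.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Inspect a BCL enum to see the "one instance field + static literal fields" shape.
Type enumType = typeof(DayOfWeek);
FieldInfo[] instanceFields =
    enumType.GetFields(BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic);
FieldInfo[] staticFields = enumType.GetFields(BindingFlags.Static | BindingFlags.Public);

Console.WriteLine(enumType.BaseType);                  // System.Enum
Console.WriteLine(instanceFields.Length);              // 1
Console.WriteLine(instanceFields[0].Name);             // value__ (the C# compiler's name)
Console.WriteLine(instanceFields[0].FieldType);        // System.Int32
Console.WriteLine(staticFields.All(f => f.IsLiteral)); // True: members are static literals
```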
  32. Classes

        .namespace CodeMash.Food
        {
            .class public Pizza
            {
                ...
            }
        }
  33. Classes

        .class public CodeMash.Food.Pizza
        {
            ...
        }

    Often the namespace is implicit. But this is the simple case. In practice, you see something more like:
  34. Classes

        .class public auto ansi abstract sealed beforefieldinit ImpossibLe.TailCall
            extends [mscorlib]System.Object
        {

    auto: Auto field layout (default)
    ansi: Marshal strings to unmanaged code as ANSI

    This is complicated. Some of this is pretty recognizable (public). Others (click), TBH, I have to look up every time. It’s fine. You don’t have to feel bad about it.
  35. Fields

        .field private static valuetype [Mono.Cecil]Mono.Cecil.Cil.OpCode[] callInstructions

    Fields are fairly straightforward, but there are a couple of things to call out. (click) You’re required to use “valuetype” on a user-defined value type (including types defined in the BCL). (click) The assembly goes in brackets, plus the fully qualified namespace/type name. Nothing implicit! There is such a thing as a “global field,” which belongs to a module rather than a type. C# doesn’t use them.
  36. Methods

    Tiny header → CIL code (< 64 bytes)
    Fat header (flags, max stack, locals) → CIL code → SEH table

    Classes often contain methods. It’s possible to have a method without a class in IL, but C# doesn’t expose this.
  37. Locals

        .locals init (
            [0] class ImpossibLe.TailCall/'<>c__DisplayClass0_0`2'<!!A, !!R> 'CS$<>8__locals0',
            [1] string 'assembly',
            [2] string assemblyExt,
            [3] string tempFileName,
            [4] class [Mono.Cecil]Mono.Cecil.AssemblyDefinition assemblyDefinition,
            [5] class [Mono.Cecil]Mono.Cecil.TypeDefinition typeDefinition,
            [6] class [Mono.Cecil]Mono.Cecil.MethodDefinition definition,
            [7] class [Mono.Cecil]Mono.Cecil.MethodReference methodReference,
            [8] class [Mono.Cecil]Mono.Cecil.Cil.ILProcessor ilProcessor,
            [9] class ImpossibLe.TailCall/TailCallData tailCallData,
            [10] class [mscorlib]System.Reflection.Assembly rewrittenAssembly,
            [11] class [mscorlib]System.Type rewrittenType,
            [12] class [mscorlib]System.Reflection.MethodInfo rewrittenMethod,
            [13] class [System.Core]System.Linq.Expressions.Expression 'instance',
            [14] class [mscorlib]System.Func`3<!!A, !!R, !!R> result,
            [15] int32 index,
            [16] bool,
            [17] class [mscorlib]System.Collections.Generic.IEnumerator`1<class ImpossibLe.TailCall/StlocBrReturnData>,
            [18] class ImpossibLe.TailCall/StlocBrReturnData otherExit,
            [19] bool,
            [20] class [mscorlib]System.Func`3<!!A, !!R, !!R>
        )

    Just like in C#, you can declare locals mid-method if you want, but the C# compiler doesn’t do this. These don’t line up 1:1 with your C# locals; the compiler elides some.
  38. The Evaluation Stack

        ldc.i4.1    // stack: 1 : int
        ldc.i4.2    // stack: 1 : int, 2 : int
        add         // stack: 3 : int

    Elements of the stack are not words or bytes, but slots, and slots carry type metadata for their contents. The default maximum stack depth (.maxstack) is 8. You can change this in IL, and compilers do it for you. Note that the types of slots can change as computation progresses.
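Here is a sketch of the slide's three instructions built with System.Reflection.Emit's DynamicMethod (an assumption made for the sake of a runnable example), plus one extra `conv.r8` that is not on the slide. The comments track the evaluation stack after each instruction; the `conv.r8` shows the single remaining slot changing type from int32 to F.

```csharp
using System;
using System.Reflection.Emit;

// The slide's sequence, followed by conv.r8 so the remaining slot changes type.
var method = new DynamicMethod("OnePlusTwo", typeof(double), Type.EmptyTypes);
ILGenerator il = method.GetILGenerator();
il.Emit(OpCodes.Ldc_I4_1); // stack: [1 : int32]
il.Emit(OpCodes.Ldc_I4_2); // stack: [1 : int32, 2 : int32]
il.Emit(OpCodes.Add);      // stack: [3 : int32]
il.Emit(OpCodes.Conv_R8);  // stack: [3.0 : F]
il.Emit(OpCodes.Ret);      // stack: [] (the value is handed back to the caller)

var onePlusTwo = (Func<double>)method.CreateDelegate(typeof(Func<double>));
Console.WriteLine(onePlusTwo()); // 3
```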
  39. add: Stack Transitional Behavior

    The stack transitional behavior, in sequential order, is:
    1. value1 is pushed onto the stack.
    2. value2 is pushed onto the stack.
    3. value2 and value1 are popped from the stack; value1 is added to value2.
    4. The result is pushed onto the stack.

    https://msdn.microsoft.com/en-us/library/system.reflection.emit.opcodes.add(v=vs.110).aspx

    Add example (pop 2, push 1). All instructions specify this. Note that items 1 and 2 are preconditions for using the instruction; items 3 and 4 are postconditions.
  40. State of the Stack

    • Empty on method entry and exit
    • Stack will be mapped to registers by the runtime when possible
    • “Stack underflow” errors?

    “Empty” in the sense that, in a method which returns a value, the final ret pops that one item off the stack.
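What does a stack underflow look like in practice? This sketch deliberately emits an `add` with nothing on the evaluation stack, using DynamicMethod. As I understand it, the JIT rejects IL like this with an InvalidProgramException when it compiles the method, so the sketch only checks for that exception type, not for a particular message.

```csharp
using System;
using System.Reflection.Emit;

// Deliberately broken IL: "add" needs two values, but the stack is empty.
var broken = new DynamicMethod("Broken", typeof(int), Type.EmptyTypes);
ILGenerator il = broken.GetILGenerator();
il.Emit(OpCodes.Add);
il.Emit(OpCodes.Ret);

bool rejected = false;
try
{
    var f = (Func<int>)broken.CreateDelegate(typeof(Func<int>));
    f(); // the invalid IL is detected when the method is compiled, at the latest here
}
catch (InvalidProgramException e)
{
    rejected = true;
    Console.WriteLine(e.Message);
}
```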
  41. Verifiable IL

    • Subset of “correct” IL
    • Don’t confuse with C# “safe”
    • Restrictions on managed pointers and unmanaged code
    • Other instructions limited to verifiability conditions
    • Many other restrictions!

    “InvalidProgramException / Common Language Runtime detected an invalid program”
  42. Verifiable IL

        PS> peverify C:\Users\craig\AppData\Local\Temp\tmpBC4F.tmp
        Microsoft (R) .NET Framework PE Verifier. Version 4.0.30319.0
        Copyright (c) Microsoft Corporation. All rights reserved.
        [IL]: Error: [C:\Users\craig\AppData\Local\Temp\tmpBC4F.tmp : ImpossibLe.Program::SumTailRecursive][offset 0x0000000D] Branch out of the method.

    Verifiable IL can be verified with the peverify utility. This is the first thing to try when the CLR won’t load your modified application.
  43. Instructions

        ldc.i4 314
        ldc.i4 159
        bge SomeLabel

    CIL instructions operate on both parameters and the evaluation stack. The first two instructions each take a parameter, which is a constant int, and push it onto the stack. The third pops the two values off the stack and jumps to the target passed as the parameter, “SomeLabel”, if the first value is greater than or equal to the second. Think of the evaluation stack as a “hidden” argument to most instructions.
  44. Short Forms, Unsigned Values, Consist…

    Instruction    Parameter type    Evaluation stack value type
    bge            long int          signed
    bge.s          short int         signed
    bge.un         long int          unsigned
    bge.un.s       short int         unsigned
    ldc.i4.0       constant 0        N/A

    The values compared by bge come from the evaluation stack. The parameter is the offset of the instruction to jump to if the comparison is true; it can be a short (1-byte) or long (4-byte) int. Stack slot types don’t include unsigned types, so there are instructions to treat the values as unsigned when comparing.
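The difference between `bge` and `bge.un` is easy to demonstrate, because the same bits on the stack compare differently. This sketch (DynamicMethod and the `BuildGreaterOrEqual` helper are my assumptions) builds `a >= b` twice, once with each opcode. `ILGenerator.Emit(OpCode, Label)` takes the offset size from the opcode, so passing `OpCodes.Bge_S` or `OpCodes.Bge_Un_S` instead would produce the short forms.

```csharp
using System;
using System.Reflection.Emit;

// Hypothetical helper: build (int a, int b) => a >= b with the given branch opcode.
Func<int, int, bool> BuildGreaterOrEqual(OpCode branch)
{
    var method = new DynamicMethod("Ge", typeof(bool), new[] { typeof(int), typeof(int) });
    ILGenerator il = method.GetILGenerator();
    Label isGreaterOrEqual = il.DefineLabel();
    il.Emit(OpCodes.Ldarg_0);
    il.Emit(OpCodes.Ldarg_1);
    il.Emit(branch, isGreaterOrEqual); // pops both values; jumps if a >= b
    il.Emit(OpCodes.Ldc_I4_0);         // fall through: push false
    il.Emit(OpCodes.Ret);
    il.MarkLabel(isGreaterOrEqual);
    il.Emit(OpCodes.Ldc_I4_1);         // branch target: push true
    il.Emit(OpCodes.Ret);
    return (Func<int, int, bool>)method.CreateDelegate(typeof(Func<int, int, bool>));
}

var signedGe = BuildGreaterOrEqual(OpCodes.Bge);
var unsignedGe = BuildGreaterOrEqual(OpCodes.Bge_Un);

// Same int32 slot, different interpretation: -1 is 0xFFFFFFFF when treated as unsigned.
Console.WriteLine(signedGe(-1, 1));   // False
Console.WriteLine(unsignedGe(-1, 1)); // True
```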
  45. Arithmetic Instructions

        static int Add1(int number) { return number + 1; }

        ldarg.0
        ldc.i4.1
        add
        ret

    I’m going to walk you through all the IL opcodes: the whole language. You can roughly group them into families. First I’ll show a code example and the corresponding C#, then I’ll show the entire family. One thing you see right away is that a single line of C# usually translates into many lines of IL.
  46. Arithmetic Instructions

    • nop
    • dup/pop
    • ldc.i4/ldc.r4/ldind.*/stind.*
    • add/sub/mul/div/rem/neg
    • and/or/xor/not
    • shl/shr
    • conv.*
    • ceq/cgt/clt/ckfinite
    • cpblk/initblk

    ckfinite: Throws ArithmeticException if the value is not a finite number. cpblk: Copies a specified number of bytes from a source address to a destination address. The names of many of these opcodes accurately represent what they do, so I won’t necessarily call out each one, although I’ll mention the most important and any that have particularly confusing names. Shout out if I skip one that you’d like an explanation for.
  47. Labels and Control Flow

        static int Add1(int number) { return number + 1; }

        ldarg.0
        ldc.i4.1
        add
        ret

    In a method which returns a value, “ret” requires exactly one value on the evaluation stack.
  48. Labels and Control Flow

        static int Add1Log(int number)
        {
            try { return number + 1; }
            finally { Debug.Write("Done!"); }
        }

        .try
        {
            ldarg.0
            ldc.i4.1
            add
            stloc.0
            leave.s IL_0016
        }
        finally
        {
            ldstr "Done!"
            call void [System.Diagnosti…
            endfinally
        }
        ldloc.0
        ret
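For comparison, here is a sketch of Add1Log built with ILGenerator's exception-block helpers. DynamicMethod is my assumption, and a `List<string>` argument is a hypothetical stand-in for Debug.Write so the finally block's effect is visible. `BeginFinallyBlock` emits the `leave` that exits the `.try` region and `EndExceptionBlock` emits the `endfinally`, so the generated IL has the same shape as the listing above.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection.Emit;

// int Add1Log(int number, List<string> log); "log" stands in for Debug.Write.
var method = new DynamicMethod("Add1Log", typeof(int), new[] { typeof(int), typeof(List<string>) });
ILGenerator il = method.GetILGenerator();
LocalBuilder result = il.DeclareLocal(typeof(int));

il.BeginExceptionBlock();            // .try {
il.Emit(OpCodes.Ldarg_0);
il.Emit(OpCodes.Ldc_I4_1);
il.Emit(OpCodes.Add);
il.Emit(OpCodes.Stloc, result);      // ret isn't allowed inside .try, so save the value
il.BeginFinallyBlock();              // emits "leave" past the handler, then } finally {
il.Emit(OpCodes.Ldarg_1);
il.Emit(OpCodes.Ldstr, "Done!");
il.Emit(OpCodes.Callvirt, typeof(List<string>).GetMethod("Add")!);
il.EndExceptionBlock();              // emits "endfinally" and closes the handler
il.Emit(OpCodes.Ldloc, result);      // after the handler: load the saved value
il.Emit(OpCodes.Ret);

var add1Log = (Func<int, List<string>, int>)method.CreateDelegate(typeof(Func<int, List<string>, int>));
var log = new List<string>();
Console.WriteLine(add1Log(41, log)); // 42
Console.WriteLine(log[0]);           // Done!
```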
  49. Labels and Control Flow

    • br
    • brfalse/brtrue
    • beq/bne
    • bge/bgt/ble/blt
    • switch
    • break
    • leave/endfilter/endfinally
    • ret

    break is a breakpoint, not a switch case.
  50. Arguments and Locals

        static int Add1(int number) { return number + 1; }

        ldarg.0
        ldc.i4.1
        add
        ret

    You’ve seen this code before, but this time we’ll look at how to put a method argument on the evaluation stack.
  51. Arguments and Locals

    • ldarg/ldarga
    • starg
    • arglist
    • ldloc/ldloca
    • stloc
    • localloc

    ldarga: Loads an argument address onto the evaluation stack. arglist: Returns an unmanaged pointer to the argument list of the current method. localloc: Allocates a certain number of bytes from the local dynamic memory pool and pushes the address (a transient pointer, type *) of the first allocated byte onto the evaluation stack.
  52. Fields

        var x = Guid.Empty;

        ldsfld valuetype System.Guid::Empty
        stloc.0

    (It’s actually ldsfld valuetype [System.Runtime]System.Guid [System.Runtime]System.Guid::Empty, but I had to truncate it to fit on the slide.)
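Here is a sketch contrasting the static and instance forms, again assuming DynamicMethod: `ldsfld` needs nothing on the stack, while `ldfld` needs the containing object or value there first. `ValueTuple<int, int>.Item2` is just a convenient public instance field in the BCL, not something from the slide.

```csharp
using System;
using System.Reflection.Emit;

// Static field: the slide's ldsfld of Guid.Empty, returned instead of stored in a local.
var emptyMethod = new DynamicMethod("GetEmptyGuid", typeof(Guid), Type.EmptyTypes);
ILGenerator il1 = emptyMethod.GetILGenerator();
il1.Emit(OpCodes.Ldsfld, typeof(Guid).GetField(nameof(Guid.Empty))!); // no object needed
il1.Emit(OpCodes.Ret);

// Instance field: ldfld pops the containing value (or a reference/pointer to it).
var item2Method = new DynamicMethod("GetItem2", typeof(int), new[] { typeof((int, int)) });
ILGenerator il2 = item2Method.GetILGenerator();
il2.Emit(OpCodes.Ldarg_0); // push the (int, int) value
il2.Emit(OpCodes.Ldfld, typeof((int, int)).GetField("Item2")!);
il2.Emit(OpCodes.Ret);

var getEmptyGuid = (Func<Guid>)emptyMethod.CreateDelegate(typeof(Func<Guid>));
var getItem2 = (Func<(int, int), int>)item2Method.CreateDelegate(typeof(Func<(int, int), int>));
Console.WriteLine(getEmptyGuid());   // 00000000-0000-0000-0000-000000000000
Console.WriteLine(getItem2((1, 2))); // 2
```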
  53. Fields

    • ldfld/ldsfld/ldflda/ldsflda
    • stfld/stsfld
    • unaligned/volatile

    unaligned and volatile are prefixes. unaligned indicates that an address currently atop the evaluation stack might not be aligned to the natural size of the immediately following instruction.
  54. Invoking Methods

        static int Add1Log(int number)
        {
            try { return number + 1; }
            finally { Debug.Write("Done!"); }
        }

        .try
        {
            ldarg.0
            ldc.i4.1
            add
            stloc.0
            leave.s IL_0016
        }
        finally
        {
            ldstr "Done!"
            call void [System.Diagnosti…
            endfinally
        }
        ldloc.0
        ret

    Again, you’ve seen this before, but now let’s look at how to call a method.
  55. Invoking Methods

    • jmp
    • call/callvirt
    • ldftn/ldvirtftn
    • calli
    • tail/constrained

    You’ll almost never see jmp; it’s not verifiable. callvirt works for virtual and non-virtual methods, but requires a valid this pointer and can’t be used on a value type’s methods. ldftn/ldvirtftn/calli are for function pointers. tail and constrained are prefixes. constrained prepares the runtime to call a method through a generic type parameter whose type argument might be a struct or a class.
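The value-type restriction on `callvirt` can be sketched with Int32.ToString(), an example I chose, built with DynamicMethod. To call a value type's own method you push its address and use `call`; to use `callvirt`, you first box the value so there is an object reference, and the virtual `Object.ToString()` dispatches to Int32's override.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

MethodInfo int32ToString = typeof(int).GetMethod("ToString", Type.EmptyTypes)!;
MethodInfo objectToString = typeof(object).GetMethod("ToString", Type.EmptyTypes)!;

// Value type: push the argument's address as "this" and use call (no boxing).
var viaCall = new DynamicMethod("ViaCall", typeof(string), new[] { typeof(int) });
ILGenerator il1 = viaCall.GetILGenerator();
il1.Emit(OpCodes.Ldarga_S, (byte)0);   // push &number, a managed pointer
il1.Emit(OpCodes.Call, int32ToString); // direct call to Int32.ToString()
il1.Emit(OpCodes.Ret);

// Boxed: box the value first, then dispatch virtually through System.Object.
var viaCallvirt = new DynamicMethod("ViaCallvirt", typeof(string), new[] { typeof(int) });
ILGenerator il2 = viaCallvirt.GetILGenerator();
il2.Emit(OpCodes.Ldarg_0);
il2.Emit(OpCodes.Box, typeof(int));         // push a reference to a boxed copy
il2.Emit(OpCodes.Callvirt, objectToString); // resolved to Int32's override at run time
il2.Emit(OpCodes.Ret);

var callVersion = (Func<int, string>)viaCall.CreateDelegate(typeof(Func<int, string>));
var callvirtVersion = (Func<int, string>)viaCallvirt.CreateDelegate(typeof(Func<int, string>));
Console.WriteLine(callVersion(42));     // 42
Console.WriteLine(callvirtVersion(42)); // 42
```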
  56. Addressing Objects

        var x = new Object();

        newobj instance void System.Object::.ctor()
        stloc.0

    (It’s actually newobj instance void [System.Runtime]System.Object::.ctor(), but I had to truncate it to fit on the slide.)
  57. Addressing Objects

    • ldnull
    • ldstr
    • ldobj/stobj/cpobj
    • newobj/initobj
    • castclass/isinst
    • box/unbox/unbox.any
    • mkrefany/refanytype/refanyval
    • ldtoken
    • sizeof
    • throw/rethrow

    mkrefany converts a managed or unmanaged pointer to a typedref and pushes that onto the stack. You can use these as arguments to methods which expect typedrefs. castclass attempts to cast an object reference to the specified class, throwing InvalidCastException if it can’t; isinst performs the same test but pushes null instead of throwing. unbox pushes a managed pointer to the value inside a boxed value type onto the stack. unbox.any pushes the value itself onto the stack and is symmetrical to box. ldtoken converts a metadata token to its runtime handle and pushes that onto the stack. The token can be a fieldref/fielddef, a methodref/methoddef, or a typeref/typedef. Used for reflection.
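Two of these in action, sketched with DynamicMethod (the method names are hypothetical): C#'s `o is string` is commonly compiled to `isinst` followed by a null comparison (`ldnull`, `cgt.un`), and `box` / `unbox.any` round-trip a value through an object reference.

```csharp
using System;
using System.Reflection.Emit;

// (object o) => o is string, built from isinst plus a comparison against null.
var isStringMethod = new DynamicMethod("IsString", typeof(bool), new[] { typeof(object) });
ILGenerator il1 = isStringMethod.GetILGenerator();
il1.Emit(OpCodes.Ldarg_0);
il1.Emit(OpCodes.Isinst, typeof(string)); // push the reference if it is a string, else null
il1.Emit(OpCodes.Ldnull);
il1.Emit(OpCodes.Cgt_Un);                 // 1 if the reference is non-null, else 0
il1.Emit(OpCodes.Ret);

// (int n) => (int)(object)n: box, then unbox.any back to the value.
var roundTripMethod = new DynamicMethod("RoundTrip", typeof(int), new[] { typeof(int) });
ILGenerator il2 = roundTripMethod.GetILGenerator();
il2.Emit(OpCodes.Ldarg_0);
il2.Emit(OpCodes.Box, typeof(int));       // int32 value -> object reference
il2.Emit(OpCodes.Unbox_Any, typeof(int)); // object reference -> int32 value
il2.Emit(OpCodes.Ret);

var isString = (Func<object, bool>)isStringMethod.CreateDelegate(typeof(Func<object, bool>));
var roundTrip = (Func<int, int>)roundTripMethod.CreateDelegate(typeof(Func<int, int>));
Console.WriteLine(isString("CodeMash")); // True
Console.WriteLine(isString(2018));       // False
Console.WriteLine(roundTrip(7));         // 7
```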
  58. Arrays

        var x = new int[1];
        x[0] = 1;

        ldc.i4.1
        newarr [System.Runtime]System.Int32
        stloc.0
        ldloc.0
        ldc.i4.0
        ldc.i4.1
        stelem.i4
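The IL above can be replayed almost verbatim with ILGenerator. This sketch (DynamicMethod and the `MakeArray` name are my assumptions) adds a final `ldloc.0`/`ret` so the array is returned and can be inspected.

```csharp
using System;
using System.Reflection.Emit;

// The slide's IL for: var x = new int[1]; x[0] = 1; plus "return x;".
var method = new DynamicMethod("MakeArray", typeof(int[]), Type.EmptyTypes);
ILGenerator il = method.GetILGenerator();
il.DeclareLocal(typeof(int[]));       // local 0: x
il.Emit(OpCodes.Ldc_I4_1);            // array length
il.Emit(OpCodes.Newarr, typeof(int)); // pops the length, pushes new int[1]
il.Emit(OpCodes.Stloc_0);
il.Emit(OpCodes.Ldloc_0);             // array reference
il.Emit(OpCodes.Ldc_I4_0);            // index
il.Emit(OpCodes.Ldc_I4_1);            // value
il.Emit(OpCodes.Stelem_I4);           // pops all three: x[0] = 1
il.Emit(OpCodes.Ldloc_0);
il.Emit(OpCodes.Ret);

var makeArray = (Func<int[]>)method.CreateDelegate(typeof(Func<int[]>));
int[] x = makeArray();
Console.WriteLine($"{x.Length} {x[0]}"); // 1 1
```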
  59. Arrays

    • newarr
    • ldlen
    • ldelema
    • ldelem.*/stelem.*
    • readonly

    readonly is a prefix. It specifies that the subsequent array address operation performs no type check at run time, and that it returns a managed pointer whose mutability is restricted. The purpose of the readonly prefix is to avoid a type check when fetching an element from an array in generic code.
  60. Data Section

        .class public value sealed MagicNumber
        {
            .field public static int32 MagicOne at D_00
        }
        .data D_00 = int32 (123)

    Classes and methods can both have data sections; class .data is shown here. You can initialize fields without burning CPU cycles in the constructor. Method .data is really only for exception tables. So let’s talk about exceptions.
  61. Exceptions

        .try
        {
            IL_0002: call uint64 ImpossibLe.Program::WillNotCrash()
            IL_0007: stloc.0
            IL_0008: ldloca.s result
            IL_000A: call instance string [mscorlib]System.UInt64::ToString()
            IL_000F: call void [mscorlib]System.Console::WriteLine(string)
            IL_0016: leave.s IL_0034
        } // end .try
        catch [mscorlib]System.Exception
        {
            IL_0018: stloc.1
            IL_001A: call class [mscorlib]System.IO.TextWriter [mscorlib]System.Console::get_Error()
            IL_001F: ldloc.1
            IL_0020: callvirt instance string [mscorlib]System.Object::ToString()
            IL_0025: callvirt instance void [mscorlib]System.IO.TextWriter::WriteLine(string)
            IL_002B: call string [mscorlib]System.Console::ReadLine()
            IL_0030: pop
            IL_0032: leave.s IL_0034
        } // end handler
        IL_0034: ret

    Some non-.NET languages do exception handling entirely in the standard library. Other non-.NET languages use the Win32/64 SEH mechanism. In .NET, though, it’s a core feature of the CLR. This is stored as method .data, but there’s syntactic sugar in ILAsm to make it look more C#-ish. The interesting thing here is the leave instruction, which exits a protected block by jumping to its target label (here IL_0034, the method’s final ret). You can’t simply branch or ret out of a protected block.
  62. Finally

        .method public static int32 main (
            string[] argv
        ) cil managed
        {
            /* ... */
            IL_0000: call void [Fizil.Instrumentation]Fizil.Instrumentation.Instrument::Open()
            .try
            {
                /* Lots. Of. Code.... */
                IL_00FF: leave.s IL_0107
            } // end .try
            finally
            {
                IL_0101: call void [Fizil.Instrumentation]Fizil.Instrumentation.Instrument::Close()
                IL_0106: endfinally
            } // end handler
            IL_0107: ldloc.s 12
            IL_0109: ret
        } // end of method Program::main

    Try/finally is similar. You need to use the leave instruction, and the finally block may have zero or more endfinally instructions.
  63. IL Unused in C#

    • Global (module-level) variables
    • Classless methods
    • Tail calls
    • Memberless value types as static data
  64. ILDASM

        >ildasm /TEXT Program.exe
        // Microsoft (R) .NET Framework IL Disassembler. Version 4.6.1055.0
        // Copyright (c) Microsoft Corporation. All rights reserved.

        // Metadata version: v4.0.30319
        .assembly extern mscorlib
        {
          .publickeytoken = (B7 7A 5C 56 19 34 E0 89 ) // .z\V.4..
          .ver 4:0:0:0
        }
        .assembly extern Mono.Cecil
        {
          .publickeytoken = (07 38 EB 9F 13 2E D7 56 ) // .8.....V
          .ver 0:9:6:0
        }
        .assembly extern System.Core
        {
          .publickeytoken = (B7 7A 5C 56 19 34 E0 89 ) // .z\V.4..
          .ver 4:0:0:0
        }

    There’s a GUI, but that’s not why you’d ever want to use it.
  65. LINQPad

    One of the best ways to begin to understand IL is to read the IL corresponding to C# you write. A super-simple way to do this is with the free LINQPad. Enter arbitrary C# or (click) F#! and instantly see the corresponding IL.
  66. dnSpy

    You can hover over an IL opcode and get pop-up help.
  67. Reflection.Emit

        let private emit (ilg : Emit.ILGenerator) inst =
            match inst with
            | Add            -> ilg.Emit(OpCodes.Add)
            | Call mi        -> ilg.Emit(OpCodes.Call, mi)
            | Callvirt mi    -> ilg.Emit(OpCodes.Callvirt, mi)
            | DeclareLocal t -> ignore(ilg.DeclareLocal(t))
            | Div            -> ilg.Emit(OpCodes.Div)
            | LdArg_0        -> ilg.Emit(OpCodes.Ldarg_0)
            | Ldc_I4 n       -> ilg.Emit(OpCodes.Ldc_I4, n)
            | Ldc_I4_0       -> ilg.Emit(OpCodes.Ldc_I4_0)

    https://github.com/CraigStuntz/TinyLanguage/blob/ea4b7dd5a2bd454a5366247de65e18b2aa5de9da/TinyLanguage/Il.fs#L61
  68. Mono.Cecil / dnlib

        let private removeStrongName (assemblyDefinition : AssemblyDefinition) =
            let name = assemblyDefinition.Name;
            name.HasPublicKey <- false;
            name.PublicKey <- Array.empty;
            assemblyDefinition.Modules
            |> Seq.iter (
                fun moduleDefinition ->
                    moduleDefinition.Attributes <-
                        moduleDefinition.Attributes &&& ~~~ModuleAttributes.StrongNameSigned)
            let aptca =
                assemblyDefinition.CustomAttributes.FirstOrDefault(
                    fun attr ->
                        attr.AttributeType.FullName = typeof<System.Security.AllowPartiallyTrustedCallersAttribute>.FullName)
            assemblyDefinition.CustomAttributes.Remove aptca |> ignore
  69. IL Interpreter

    http://mattwarren.org/2017/03/30/The-.NET-IL-Interpreter/
  70. Impossible Things
  71. Modifying Binaries

        let private insertTraceInstruction(ilProcessor, before, state) =
            let compileTimeRandom = state.Random.Next(0, UInt16.MaxValue |> Convert.ToInt32)
            let ldArg = ilProcessor.Create(OpCodes.Ldc_I4, compileTimeRandom)
            let callTrace = ilProcessor.Create(OpCodes.Call, state.Trace)
            ilProcessor.InsertBefore(before, ldArg)
            ilProcessor.InsertAfter (ldArg, callTrace)

    https://github.com/CraigStuntz/Fizil/blob/master/Fizil.Fuzzer/CilInstrument.fs

    Instrumentation, obfuscation (Dotfuscator)
  72. Weird Instructions

    • tail
    • Global functions

    Stuff supported in IL but not C#: global functions, tail calls. This list used to include exception filters, but C# supports those now.
  73. Tail Calls

    1 + 2 + 3 + 4 + 5 = 15

    So let’s really talk about tail recursion.
  74. Tail Calls

        static UInt64 SumTailRecursive(UInt64 n, UInt64 accum)
        {
            if (n < 1)
            {
                return accum;
            }
            else
            {
                return SumTailRecursive(n - 1, n + accum);
            }
        }

        return SumTailRecursive(30000, 0);

    This is written in a tail-recursive style, but C# won’t compile it to a proper tail call, so every recursive call grows the stack. (click) So what if we call this with n set to 30000?
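For contrast with what the C# compiler produces, here is a sketch of the same function built directly in IL with an explicit `tail.` prefix (`OpCodes.Tailcall` in Reflection.Emit). It assumes a DynamicMethod that calls itself, which is my shortcut and not how the ImpossibLe project works; that project rewrites the compiled assembly with Mono.Cecil. The prefix asks the JIT to reuse the caller's frame; the usage below only checks results for small inputs, which don't depend on that.

```csharp
using System;
using System.Reflection.Emit;

// SumTailRecursive(n, accum) built in IL, with "tail." before the recursive call.
var sum = new DynamicMethod("SumTailRecursive", typeof(ulong), new[] { typeof(ulong), typeof(ulong) });
ILGenerator il = sum.GetILGenerator();
Label recursiveCase = il.DefineLabel();

il.Emit(OpCodes.Ldarg_0);
il.Emit(OpCodes.Ldc_I4_1);
il.Emit(OpCodes.Conv_I8);
il.Emit(OpCodes.Bge_Un, recursiveCase); // if (n >= 1) goto recursiveCase (unsigned compare)
il.Emit(OpCodes.Ldarg_1);               // base case: return accum
il.Emit(OpCodes.Ret);

il.MarkLabel(recursiveCase);
il.Emit(OpCodes.Ldarg_0);
il.Emit(OpCodes.Ldc_I4_1);
il.Emit(OpCodes.Conv_I8);
il.Emit(OpCodes.Sub);                   // n - 1
il.Emit(OpCodes.Ldarg_0);
il.Emit(OpCodes.Ldarg_1);
il.Emit(OpCodes.Add);                   // n + accum
il.Emit(OpCodes.Tailcall);              // "tail." prefix: the call must be followed by ret
il.Emit(OpCodes.Call, sum);             // a DynamicMethod may call itself
il.Emit(OpCodes.Ret);

var sumTailRecursive =
    (Func<ulong, ulong, ulong>)sum.CreateDelegate(typeof(Func<ulong, ulong, ulong>));
Console.WriteLine(sumTailRecursive(5, 0)); // 15
```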
  75. Tail Calls

    That’s no good. They’re reporting me to Microsoft!
  76. Tail Calls

        var tailCallVersion = TailCall.Rewrite<UInt64, UInt64>(null, SumTailRecursive);
        return tailCallVersion(50000, 0);

    OK, no problem. We’ll just call this neat function I wrote, which takes a function written in tail-recursive style (but compiled without tail calls) and returns the same function compiled with proper tail calls.
  77. Tail Calls

    Now it works fine! The only real problem is that it shouldn’t be possible to write a function like my TailCall.Rewrite in C#, but I assure you this example is real, it works, and you can get it from my GitHub. How does it work?
  78. Tail Calls

    Left: original IL emitted by the C# compiler

        IL_0000: nop
        IL_0001: ldarg.0
        IL_0002: ldc.i4.1
        IL_0003: conv.i8
        IL_0004: clt.un
        IL_0006: stloc.0
        IL_0007: ldloc.0
        IL_0008: brfalse.s IL_000F
        IL_000A: nop
        IL_000B: ldarg.1
        IL_000C: stloc.1
        IL_000D: br.s IL_001F
        IL_000F: nop
        IL_0010: ldarg.0
        IL_0011: ldc.i4.1
        IL_0012: conv.i8
        IL_0013: sub
        IL_0014: ldarg.0
        IL_0015: ldarg.1
        IL_0016: add
        IL_0017: call uint64 ImpossibLe.Program::SumTailRecursive(uint64, uint64)
        IL_001C: stloc.1
        IL_001D: br.s IL_001F
        IL_001F: ldloc.1
        IL_0020: ret

    Right: tail-recursive version

        IL_0000: nop
        IL_0001: ldarg.0
        IL_0002: ldc.i4.1
        IL_0003: conv.i8
        IL_0004: clt.un
        IL_0006: stloc.0
        IL_0007: ldloc.0
        IL_0008: brfalse.s IL_000D
        IL_000A: nop
        IL_000B: ldarg.1
        IL_000C: ret
        IL_000D: nop
        IL_000E: ldarg.0
        IL_000F: ldc.i4.1
        IL_0010: conv.i8
        IL_0011: sub
        IL_0012: ldarg.0
        IL_0013: ldarg.1
        IL_0014: add
        IL_0015: tail.
        IL_0017: call uint64 RewrittenNamespace.RewrittenType::SumTailRecursive(uint64, uint64)
        IL_001C: ret

    [Slide annotations: ❌; Step 1; Step 2; Step 3; Base case exit; Recursive case exit]

    It’s complicated, but here’s the transformation we want to do: the original IL emitted by the C# compiler on the left, and the tail-recursive version on the right. If you look closely, it’s entirely the same, (click) except for the two exits. The first is the base case, which C# compiles as…. The second is the recursive case. All we need to do is (click) detect the pattern of a tail-recursive function and rewrite the IL accordingly.
  79. Tail Calls

        TailCallData tailCallData = TryFindTailCall(definition.Body.Instructions, definition);
        while (tailCallData != null)
        {
            // rewrite as tail call
            // Step 1: Remove stloc/ldloc/br.s instructions before ret instruction
            for (var index = tailCallData.RetIndex - 1; index > tailCallData.CallIndex; index--)
            {
                ilProcessor.Remove(definition.Body.Instructions[index]);
            }
            // Step 2: Rewrite any stloc/br "returns" as proper rets
            foreach (var otherExit in tailCallData.OtherExits)
            {
                ilProcessor.Replace(otherExit.BrInstruction, Instruction.Create(OpCodes.Ret));
                ilProcessor.Remove(otherExit.StlocInstruction);
            }
            // Step 3: Insert "tail" IL instruction before call
            ilProcessor.InsertBefore(tailCallData.CallInstruction, ilProcessor.Create(OpCodes.Tail));
            // any more?
            tailCallData = TryFindTailCall(definition.Body.Instructions, definition);
        }

    https://github.com/CraigStuntz/ImpossibLe

    To see the whole story you’ll have to follow the link here to get the whole project, but here’s the high-level view. (click) We disassemble the method and parse the body. (click) Then we remove the instructions which store the function’s return value from the evaluation stack into a local and branch to the final ret. (click) We look up any instruction which implements a C# return by jumping to the end of the method and change it to a real IL “ret” (explain why). (click) Finally, we insert the “tail” IL instruction. Then we write and load the modified assembly, and get a function reference to the new type and method there to return to the caller.
  80. Further Reading

    • .NET IL Assembler, by Serge Lidin
    • ECMA-335 Standard, Common Language Infrastructure
    • LLVM Language Reference Manual

    IL reflects CLI design, so… ECMA standard: Partition III defines IL instructions, Partition VI Annex B has sample applications, and later annexes explain ilasm and give a machine-readable formal grammar. .NET IL Assembler is
  81. The conference organizers have asked me to ask you to please rate the session.

    Just open up AttendeeHub. If it crashes, open it again. Click the “Session Survey” link highlighted here.