
.NET IL: Into the Marianas Trench

Are you interested in writing compilers, targeting WebAssembly, finding security issues automatically, binary analysis, or understanding performance at a low level? While it’s always good to know how your language works, the benefits of understanding the intermediate language extend to metaprogramming and analysis across multiple source languages. Learning how to work with intermediate languages allows you to write programs which would seem unattainable otherwise. You will learn not only how IL works but how it compares with LLVM IR, Java Bytecode, and other intermediate representations. No mere “deep dive,” you’ll leave this talk really understanding how C# turns into microcode and how to use that information to do “impossible” things.

Craig Stuntz

January 12, 2018

Transcript

  1. Craig Stuntz ∈ Improving

    .NET IL: Into the Marianas Trench
    https://speakerdeck.com/craigstuntz
  2. NOAA Okeanos Explorer 2016

    All photos from http://oceanexplorer.noaa.gov/okeanos/explorations/ex1605/welcome.html
  3. Preview

    • What is an intermediate language?
    • Why work at this level?
    • Specifics of .NET IL
    • Cool tools and techniques
    • Impossible things!
  4. Why: Continuous

    “Execution powered by a new IL interpreter”
  5. Why: Mono

  6. Why: MonoTouch

  7. Why: Unity

  8. Why: JSIL?

  9. Why: Compilers

    • Emit module binaries directly
    • Emit modules via Reflection.Emit
    • Emit IL text representation, compile that with ILASM
  10. Why: Every Decompiler

  11. Why: Dotfuscator

    “Dotfuscator uses ildasm and ilasm to process the input assemblies.”
  12. Why: Fizil

  13. All About Intermediate Languages

  14. Compilers, Generally

    [Diagram labels: Interpreter, Input, Code, Machine Code; Compiler, Code, Machine Code]
  15. Compiling C#

    Compile (Developer Machine): Code → CSC → .NET IL
    .NET Virtual Execution System (End User Machine): IL → RyuJIT → Machine Code
  16. JIT Compilation

    C#:
        { Console.WriteLine("Hello World!"); }
    IL:
        IL_0000: nop
        IL_0001: ldstr "Hello World!"
        IL_0006: call void [System.Console]System.Console::WriteLine(
    x64:
        00007FFBA9E60B43 mov rcx,208F7F93068h
        00007FFBA9E60B4D mov rcx,qword ptr [rcx]
        00007FFBA9E60B50 call 00007FFBA9E60708
        00007FFBA9E60B55 nop
  17. Intermediate Languages, Everywhere…

    C# → IL → x64
    Java → Bytecode → x64
    C++ → WASM → x64
  18. By Any Other Name…

    Bytecode, Intermediate Representation, IL, WASM, STG, P-Code
  19. – Dr. Richard Hipp

    “SQLite consists of…
    - Compiler to translate SQL into byte code
    - Virtual Machine to evaluate the byte code”
    Lecture presented at Ohio State University, 10 October 2017
  20. .NET IL, Java Bytecode

                          .NET IL    Java Bytecode
    Structs?              ✅         ❌
    Unsigned ints?        ✅         ❌
    Generics?             ✅         ❌
    out & ref args?       ✅         ❌
  21. – John Backus

    “Underlying every programming language is a model of a computing system that its programs control.”
  22. .NET IL / Java Bytecode, LLVM IR

                            .NET IL / Java Bytecode    LLVM IR
    Machine independent?    ✅                         ❌
    Quick to JIT?           ✅                         ❌
    Undefined behavior?     A tiny bit                 Enough to implement C
    Registers?              ❌                         ✅
  23. Intermediate Languages, ASM

                                       Intermediate Languages    ASM
    Optimized?                         ✅                        ❌
    Strongly device-specific?          ❌                        ✅
    Can be compiled?                   ✅                        ✅
    Can be interpreted / JITted?       ✅                        ❌
    Can be re-optimized at runtime?    ✅                        ❌
  24. .NET Intermediate Language

  25. Hello World, Revisited

        static void Main(string[] args)
        {
            Console.WriteLine("Hello World!");
        }

        .method private hidebysig static void Main (
            string[] args
        ) cil managed
        {
            // Code Size: 13 (0xD) bytes
            .maxstack 8
            .entrypoint
            IL_0000: nop
            IL_0001: ldstr "Hello World!"
            IL_0006: call void [System.Console]System.Console::WriteLine(string)
            IL_000B: nop
            IL_000C: ret
        }
  26. Representations

    • Binary (Assembly/Module)
    • Text (ILAsm format)
  27. A Really Confusing Thing

    • CIL: The instruction set understood by the VES
    • ILAsm: Short for “IL Assembly Language”
    • ILASM: A utility which compiles an extended form of ILAsm to binary assemblies.
  28. Memory Models

    Win/x86: Registers, Stack, Heap
    IL: Argument table, Evaluation stack, Locals table, Fields
  29. Metadata

    .NET IL Assembler, by Serge Lidin, p. 188
  30. Primitive Types

    • Void
    • Bool
    • Certain numeric types
        • signed/unsigned int8, int16, int32, int64, native int
        • Single/double float
    • Chars
    • TypedRef
    • Pointers
        • Signed or unsigned, implementation defined size
  31. Evaluation Stack Types

    • Certain numeric types
        • int32, int64, native int, F
    • Chars
    • Object references (types undistinguished)
    • Pointers
        • Managed
        • Unmanaged
    • User-defined value types
  32. Faking Other Types

    • Unsigned integers
    • Boolean
    • Chars
    • Arrays
    • Enums
    • Native (interop) types
  33. Enums

        .class public enum Color
        {
            .field public specialname int32 __value
            .field public static literal valuetype Color Red = int32 (1)
            .field public static literal valuetype Color Green = int32 (2)
            .field public static literal valuetype Color Blue = int32 (3)
        }
  34. Classes

        .namespace CodeMash.Food
        {
            .class public Pizza
            {
                ...
            }
        }
  35. Classes

        .class public CodeMash.Food.Pizza
        {
            ...
        }
  36. Classes

        .class public auto ansi abstract sealed beforefieldinit ImpossibLe.TailCall
            extends [mscorlib]System.Object
        {

    Callouts:
    • auto: Auto field layout (default)
    • ansi: Marshal strings to unmanaged code as ANSI
    • This is complicated.
  37. Fields

        .field private static valuetype [Mono.Cecil]Mono.Cecil.Cil.OpCode[] callInstructions
  38. Methods

    Tiny header: CIL code (< 64 bytes)
    Fat header (flags, maxstack, locals): CIL code, SEH table
  39. Locals

        .locals init (
            [0] class ImpossibLe.TailCall/'<>c__DisplayClass0_0`2'<!!A, !!R> 'CS$<>8__locals0',
            [1] string 'assembly',
            [2] string assemblyExt,
            [3] string tempFileName,
            [4] class [Mono.Cecil]Mono.Cecil.AssemblyDefinition assemblyDefinition,
            [5] class [Mono.Cecil]Mono.Cecil.TypeDefinition typeDefinition,
            [6] class [Mono.Cecil]Mono.Cecil.MethodDefinition definition,
            [7] class [Mono.Cecil]Mono.Cecil.MethodReference methodReference,
            [8] class [Mono.Cecil]Mono.Cecil.Cil.ILProcessor ilProcessor,
            [9] class ImpossibLe.TailCall/TailCallData tailCallData,
            [10] class [mscorlib]System.Reflection.Assembly rewrittenAssembly,
            [11] class [mscorlib]System.Type rewrittenType,
            [12] class [mscorlib]System.Reflection.MethodInfo rewrittenMethod,
            [13] class [System.Core]System.Linq.Expressions.Expression 'instance',
            [14] class [mscorlib]System.Func`3<!!A, !!R, !!R> result,
            [15] int32 index,
            [16] bool,
            [17] class [mscorlib]System.Collections.Generic.IEnumerator`1<class ImpossibLe.TailCall/StlocBrReturnData>,
            [18] class ImpossibLe.TailCall/StlocBrReturnData otherExit,
            [19] bool,
            [20] class [mscorlib]System.Func`3<!!A, !!R, !!R>
        )
  40. The Evaluation Stack

        ldc.i4.1    stack (top first): 1 : int
        ldc.i4.2    stack (top first): 2 : int, 1 : int
        add         stack (top first): 3 : int
  41. add: Stack Transitional Behavior

    The stack transitional behavior, in sequential order, is:
    1. value1 is pushed onto the stack.
    2. value2 is pushed onto the stack.
    3. value2 and value1 are popped from the stack; value1 is added to value2.
    4. The result is pushed onto the stack.
    https://msdn.microsoft.com/en-us/library/system.reflection.emit.opcodes.add(v=vs.110).aspx
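
The push/push/add sequence described above can be tried directly from C#. This is a minimal sketch, assuming `System.Reflection.Emit.DynamicMethod` is available on the target runtime; the class and method names (`EvaluationStackSketch`, `BuildOnePlusTwo`) are made up for illustration and do not come from the talk.

```csharp
using System;
using System.Reflection.Emit;

public static class EvaluationStackSketch
{
    // Builds a method whose body is exactly: ldc.i4.1, ldc.i4.2, add, ret.
    // (Hypothetical helper name, not from the slides.)
    public static Func<int> BuildOnePlusTwo()
    {
        var method = new DynamicMethod("OnePlusTwo", typeof(int), Type.EmptyTypes);
        ILGenerator il = method.GetILGenerator();

        il.Emit(OpCodes.Ldc_I4_1); // value1 is pushed: stack holds 1
        il.Emit(OpCodes.Ldc_I4_2); // value2 is pushed: stack holds 2, 1
        il.Emit(OpCodes.Add);      // both are popped, their sum is pushed
        il.Emit(OpCodes.Ret);      // the single remaining value is returned

        return (Func<int>)method.CreateDelegate(typeof(Func<int>));
    }
}
```

Calling `EvaluationStackSketch.BuildOnePlusTwo()()` evaluates to 3, the value left on the stack after `add`.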
  42. State of the Stack

    • Empty on method entry and exit
    • Stack will be mapped to registers by the runtime when possible
    • “Stack underflow” errors?
  43. Verifiable IL

    • Subset of “Correct” IL
    • Don’t confuse with C# “safe”
    • Restrictions on managed pointers and unmanaged code
    • Other instructions limited to verifiability conditions
    • Many other restrictions!
  44. Verifiable IL

        PS> peverify C:\Users\craig\AppData\Local\Temp\tmpBC4F.tmp
        Microsoft (R) .NET Framework PE Verifier. Version 4.0.30319.0
        Copyright (c) Microsoft Corporation. All rights reserved.
        [IL]: Error: [C:\Users\craig\AppData\Local\Temp\tmpBC4F.tmp : ImpossibLe.Program::SumTailRecursive][offset 0x0000000D] Branch out of the method.
  45. Instructions

        ldc.i4 314
        ldc.i4 159
        bge SomeLabel
  46. Short Forms, Unsigned Values, Consist

    Instruction    Parameter type    Evaluation stack value type
    bge            Long int          Signed
    bge.s          Short int         Signed
    bge.un         Long int          Unsigned
    bge.un.s       Short int         Unsigned
    ldc.i4.0       Constant 0        N/A
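
The signed/unsigned distinction in the table above lives in the instruction, not in the values on the stack. The following is a sketch only, assuming `DynamicMethod` is available; `BranchSketch` and `BuildGreaterOrEqual` are hypothetical names. It emits the same comparison twice, once with `bge` and once with `bge.un`, so the two can be compared on identical inputs. `ILGenerator` is given the long-form opcodes here; whether a short form could be used instead depends on the branch distance.

```csharp
using System;
using System.Reflection.Emit;

public static class BranchSketch
{
    // Emits: ldarg.0, ldarg.1, bge[.un] TAKEN, ldc.i4.0, ret, TAKEN: ldc.i4.1, ret
    // (Hypothetical helper, not from the slides.)
    public static Func<int, int, bool> BuildGreaterOrEqual(bool useUnsigned)
    {
        var method = new DynamicMethod(
            useUnsigned ? "GreaterOrEqualUnsigned" : "GreaterOrEqualSigned",
            typeof(bool),
            new[] { typeof(int), typeof(int) });
        ILGenerator il = method.GetILGenerator();
        Label taken = il.DefineLabel();

        il.Emit(OpCodes.Ldarg_0);
        il.Emit(OpCodes.Ldarg_1);
        // Same int32 stack values either way; only the opcode decides
        // whether they are compared as signed or unsigned.
        il.Emit(useUnsigned ? OpCodes.Bge_Un : OpCodes.Bge, taken);
        il.Emit(OpCodes.Ldc_I4_0); // branch not taken: return false
        il.Emit(OpCodes.Ret);
        il.MarkLabel(taken);
        il.Emit(OpCodes.Ldc_I4_1); // branch taken: return true
        il.Emit(OpCodes.Ret);

        return (Func<int, int, bool>)method.CreateDelegate(typeof(Func<int, int, bool>));
    }
}
```

With the arguments `-1` and `1`, the `bge` version returns false, while the `bge.un` version returns true because `-1` is read as 0xFFFFFFFF.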
  47. Arithmetic Instructions

        static int Add1(int number)
        {
            return number + 1;
        }

        ldarg.0
        ldc.i4.1
        add
        ret
  48. Arithmetic Instructions

    • nop
    • dup/pop
    • ldc.i4/ldc.r4/ldind.*/stind
    • add/sub/mul/div/rem/neg
    • and/or/xor/not
    • shl/shr
    • conv.*
    • ceq/cgt/clt/ckfinite
    • cpblk/initblk
  49. Labels and Control Flow

        static int Add1(int number)
        {
            return number + 1;
        }

        ldarg.0
        ldc.i4.1
        add
        ret
  50. Labels and Control Flow

        static int Add1Log(int number)
        {
            try
            {
                return number + 1;
            }
            finally
            {
                Debug.Write("Done!");
            }
        }

        .try
        {
            ldarg.0
            ldc.i4.1
            add
            stloc.0
            leave.s IL_0016
        }
        finally
        {
            ldstr "Done!"
            call void [System.Diagnosti
            endfinally
        }
        ldloc.0
        ret
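
The try/finally shape above can be reproduced with `ILGenerator`, which inserts the `leave` and `endfinally` instructions itself. This is a rough sketch, not the talk's code. `TryFinallySketch`, `Log`, and `LastMessage` are hypothetical stand-ins, and `Log` replaces `Debug.Write` only so the effect of the finally block can be observed.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

public static class TryFinallySketch
{
    // Hypothetical stand-in for Debug.Write so the finally block is observable.
    public static string LastMessage = "";
    public static void Log(string message) { LastMessage = message; }

    public static Func<int, int> BuildAdd1Log()
    {
        var method = new DynamicMethod("Add1Log", typeof(int), new[] { typeof(int) });
        ILGenerator il = method.GetILGenerator();
        LocalBuilder result = il.DeclareLocal(typeof(int));
        MethodInfo log = typeof(TryFinallySketch).GetMethod(nameof(Log));

        il.BeginExceptionBlock();        // opens the .try region
        il.Emit(OpCodes.Ldarg_0);
        il.Emit(OpCodes.Ldc_I4_1);
        il.Emit(OpCodes.Add);
        il.Emit(OpCodes.Stloc, result);  // leaving a try empties the stack, so park the sum in a local
        il.BeginFinallyBlock();          // closes the .try region and opens finally
        il.Emit(OpCodes.Ldstr, "Done!");
        il.Emit(OpCodes.Call, log);
        il.EndExceptionBlock();          // closes the finally region
        il.Emit(OpCodes.Ldloc, result);  // reload the parked value
        il.Emit(OpCodes.Ret);

        return (Func<int, int>)method.CreateDelegate(typeof(Func<int, int>));
    }
}
```

The `stloc`/`ldloc` pair mirrors the `stloc.0` ... `ldloc.0` in the listing: the result cannot stay on the evaluation stack across the exit from the protected region.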
  51. Labels and Control Flow

    • br
    • brfalse/brtrue
    • beq/bne
    • bge/bgt/ble/blt
    • switch
    • break
    • leave/endfilter/endfinally
    • ret
  52. Arguments and Locals

        static int Add1(int number)
        {
            return number + 1;
        }

        ldarg.0
        ldc.i4.1
        add
        ret
  53. Arguments and Locals

    • ldarg/ldarga
    • starg
    • arglist
    • ldloc/ldloca
    • stloc
    • localloc
  54. Fields

        var x = Guid.Empty;

        ldsfld valuetype System.Guid::Empty
        stloc.0
  55. Fields

    • ldfld/ldsfld/ldflda/ldsflda
    • stfld/stsfld
    • unaligned/volatile
  56. Invoking Methods

        static int Add1Log(int number)
        {
            try
            {
                return number + 1;
            }
            finally
            {
                Debug.Write("Done!");
            }
        }

        .try
        {
            ldarg.0
            ldc.i4.1
            add
            stloc.0
            leave.s IL_0016
        }
        finally
        {
            ldstr "Done!"
            call void [System.Diagnosti
            endfinally
        }
        ldloc.0
        ret
  57. Invoking Methods

    • jmp
    • call/callvirt
    • ldftn/ldvirtftn
    • calli
    • tail/constrained
  58. Addressing Objects

        var x = new Object();

        newobj instance void System.Object::.ctor()
        stloc.0
  59. Addressing Objects

    • ldnull
    • ldstr
    • ldobj/stobj/cpobj
    • newobj/initobj
    • castclass/isinst
    • box/unbox/unbox.any
    • mkrefany/refanytype/refanyval
    • ldtoken
    • sizeof
    • throw/rethrow
  60. Arrays

        var x = new int[1];
        x[0] = 1;

        ldc.i4.1
        newarr [System.Runtime]System.Int32
        stloc.0
        ldloc.0
        ldc.i4.0
        ldc.i4.1
        stelem.i4
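
The array listing above maps one-to-one onto `ILGenerator` calls. As a hedged sketch (`ArraySketch` and `BuildSingleElementArray` are invented names), the method below emits the same `newarr`/`stelem.i4` sequence and then returns the array so the store can be checked. The final `ldloc`/`ret` pair is an addition that the slide's snippet does not have.

```csharp
using System;
using System.Reflection.Emit;

public static class ArraySketch
{
    // Emits the IL for: var x = new int[1]; x[0] = 1; return x;
    // (Hypothetical helper; the return is added so the result is observable.)
    public static Func<int[]> BuildSingleElementArray()
    {
        var method = new DynamicMethod("MakeArray", typeof(int[]), Type.EmptyTypes);
        ILGenerator il = method.GetILGenerator();
        LocalBuilder x = il.DeclareLocal(typeof(int[]));

        il.Emit(OpCodes.Ldc_I4_1);            // array length
        il.Emit(OpCodes.Newarr, typeof(int)); // pops the length, pushes the new int[]
        il.Emit(OpCodes.Stloc, x);

        il.Emit(OpCodes.Ldloc, x);            // array reference
        il.Emit(OpCodes.Ldc_I4_0);            // index
        il.Emit(OpCodes.Ldc_I4_1);            // value
        il.Emit(OpCodes.Stelem_I4);           // pops all three, stores x[0] = 1

        il.Emit(OpCodes.Ldloc, x);
        il.Emit(OpCodes.Ret);

        return (Func<int[]>)method.CreateDelegate(typeof(Func<int[]>));
    }
}
```

`stelem.i4` consumes the array reference, the index, and the value, in that push order, and leaves nothing on the stack.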
  61. Arrays

    • newarr
    • ldlen
    • ldelema
    • ldelem.*/stelem.*
    • readonly
  62. Data Section

        .class public value sealed MagicNumber
        {
            .field public static int32 MagicOne at D_00
        }
        .data D_00 = int32 (123)
  63. Exceptions

        .try
        {
            IL_0002: call uint64 ImpossibLe.Program::WillNotCrash()
            IL_0007: stloc.0
            IL_0008: ldloca.s result
            IL_000A: call instance string [mscorlib]System.UInt64::ToString()
            IL_000F: call void [mscorlib]System.Console::WriteLine(string)
            IL_0016: leave.s IL_0034
        } // end .try
        catch [mscorlib]System.Exception
        {
            IL_0018: stloc.1
            IL_001A: call class [mscorlib]System.IO.TextWriter [mscorlib]System.Console::get_Error()
            IL_001F: ldloc.1
            IL_0020: callvirt instance string [mscorlib]System.Object::ToString()
            IL_0025: callvirt instance void [mscorlib]System.IO.TextWriter::WriteLine(string)
            IL_002B: call string [mscorlib]System.Console::ReadLine()
            IL_0030: pop
            IL_0032: leave.s IL_0034
        } // end handler
        IL_0034: ret
  64. Finally

        .method public static int32 main (
            string[] argv
        ) cil managed
        {
            /* ... */
            IL_0000: call void [Fizil.Instrumentation]Fizil.Instrumentation.Instrument::Open()
            .try
            {
                /* Lots. Of. Code.... */
                IL_00FF: leave.s IL_0107
            } // end .try
            finally
            {
                IL_0101: call void [Fizil.Instrumentation]Fizil.Instrumentation.Instrument::Close()
                IL_0106: endfinally
            } // end handler
            IL_0107: ldloc.s 12
            IL_0109: ret
        } // end of method Program::main
  65. IL Unused in C#

    • Global (*to a module) variables
    • Classless methods
    • Tail calls
    • Memberless value types as static data
  66. Disassembly

  67. ILDASM

        >ildasm /TEXT Program.exe
        // Microsoft (R) .NET Framework IL Disassembler. Version 4.6.1055.0
        // Copyright (c) Microsoft Corporation. All rights reserved.
        // Metadata version: v4.0.30319
        .assembly extern mscorlib
        {
            .publickeytoken = (B7 7A 5C 56 19 34 E0 89 ) // .z\V.4..
            .ver 4:0:0:0
        }
        .assembly extern Mono.Cecil
        {
            .publickeytoken = (07 38 EB 9F 13 2E D7 56 ) // .8.....V
            .ver 0:9:6:0
        }
        .assembly extern System.Core
        {
            .publickeytoken = (B7 7A 5C 56 19 34 E0 89 ) // .z\V.4..
            .ver 4:0:0:0
        }
  68. LINQPad

  69. dnSpy

  70. Assembly

  71. ILASM

  72. Reflection.Emit

        let private emit (ilg : Emit.ILGenerator) inst =
            match inst with
            | Add            -> ilg.Emit(OpCodes.Add)
            | Call mi        -> ilg.Emit(OpCodes.Call, mi)
            | Callvirt mi    -> ilg.Emit(OpCodes.Callvirt, mi)
            | DeclareLocal t -> ignore(ilg.DeclareLocal(t))
            | Div            -> ilg.Emit(OpCodes.Div)
            | LdArg_0        -> ilg.Emit(OpCodes.Ldarg_0)
            | Ldc_I4 n       -> ilg.Emit(OpCodes.Ldc_I4, n)
            | Ldc_I4_0       -> ilg.Emit(OpCodes.Ldc_I4_0)

    https://github.com/CraigStuntz/TinyLanguage/blob/ea4b7dd5a2bd454a5366247de65e18b2aa5de9da/TinyLanguage/Il.fs#L61
  73. Mono.Cecil / dnlib

        let private removeStrongName (assemblyDefinition : AssemblyDefinition) =
            let name = assemblyDefinition.Name;
            name.HasPublicKey <- false;
            name.PublicKey <- Array.empty;
            assemblyDefinition.Modules |> Seq.iter (
                fun moduleDefinition ->
                    moduleDefinition.Attributes <- moduleDefinition.Attributes &&& ~~~ModuleAttributes.StrongNameSigned)
            let aptca = assemblyDefinition.CustomAttributes.FirstOrDefault(
                fun attr -> attr.AttributeType.FullName = typeof<System.Security.AllowPartiallyTrustedCallersAttribute>.FullName)
            assemblyDefinition.CustomAttributes.Remove aptca |> ignore
  74. IL Interpreter

    http://mattwarren.org/2017/03/30/The-.NET-IL-Interpreter/
  75. Impossible Things

  76. Modifying Binaries

        let private insertTraceInstruction(ilProcessor, before, state) =
            let compileTimeRandom = state.Random.Next(0, UInt16.MaxValue |> Convert.ToInt32)
            let ldArg = ilProcessor.Create(OpCodes.Ldc_I4, compileTimeRandom)
            let callTrace = ilProcessor.Create(OpCodes.Call, state.Trace)
            ilProcessor.InsertBefore(before, ldArg)
            ilProcessor.InsertAfter (ldArg, callTrace)

    https://github.com/CraigStuntz/Fizil/blob/master/Fizil.Fuzzer/CilInstrument.fs
  77. Weird Instructions

    • tail
    • Global functions
  78. Tail Calls

    1 + 2 + 3 + 4 + 5 = 15
  79. Tail Calls

        static UInt64 SumTailRecursive(UInt64 n, UInt64 accum)
        {
            if (n < 1)
            {
                return accum;
            }
            else
            {
                return SumTailRecursive(n - 1, n + accum);
            }
        }

        return SumTailRecursive(30000, 0);
  80. Tail Calls

  81. Tail Calls

        var tailCallVersion = TailCall.Rewrite<UInt64, UInt64>(null, SumTailRecursive);
        return tailCallVersion(50000, 0);
  82. Tail Calls

  83. Tail Calls

    Original:
        IL_0000: nop
        IL_0001: ldarg.0
        IL_0002: ldc.i4.1
        IL_0003: conv.i8
        IL_0004: clt.un
        IL_0006: stloc.0
        IL_0007: ldloc.0
        IL_0008: brfalse.s IL_000F
        IL_000A: nop
        IL_000B: ldarg.1
        IL_000C: stloc.1
        IL_000D: br.s IL_001F
        IL_000F: nop
        IL_0010: ldarg.0
        IL_0011: ldc.i4.1
        IL_0012: conv.i8
        IL_0013: sub
        IL_0014: ldarg.0
        IL_0015: ldarg.1
        IL_0016: add
        IL_0017: call uint64 ImpossibLe.Program::SumTailRecursive(uint64, uint64)
        IL_001C: stloc.1
        IL_001D: br.s IL_001F
        IL_001F: ldloc.1
        IL_0020: ret

    Rewritten:
        IL_0000: nop
        IL_0001: ldarg.0
        IL_0002: ldc.i4.1
        IL_0003: conv.i8
        IL_0004: clt.un
        IL_0006: stloc.0
        IL_0007: ldloc.0
        IL_0008: brfalse.s IL_000D
        IL_000A: nop
        IL_000B: ldarg.1
        IL_000C: ret
        IL_000D: nop
        IL_000E: ldarg.0
        IL_000F: ldc.i4.1
        IL_0010: conv.i8
        IL_0011: sub
        IL_0012: ldarg.0
        IL_0013: ldarg.1
        IL_0014: add
        IL_0015: tail.
        IL_0017: call uint64 RewrittenNamespace.RewrittenType::SumTailRecursive(uint64, uint64)
        IL_001C: ret

    Callouts: ❌, Step 1, Step 2, Step 3, Base case exit, Recursive case exit
  84. Tail Calls

        TailCallData tailCallData = TryFindTailCall(definition.Body.Instructions, definition);
        while (tailCallData != null)
        {
            // rewrite as tail call
            // Step 1: Remove stloc/ldloc/br.s instructions before ret instruction
            for (var index = tailCallData.RetIndex - 1; index > tailCallData.CallIndex; index--)
            {
                ilProcessor.Remove(definition.Body.Instructions[index]);
            }
            // Step 2: Rewrite any stloc/br "returns" as proper rets
            foreach (var otherExit in tailCallData.OtherExits)
            {
                ilProcessor.Replace(otherExit.BrInstruction, Instruction.Create(OpCodes.Ret));
                ilProcessor.Remove(otherExit.StlocInstruction);
            }
            // Step 3: Insert "tail" IL instruction before call
            ilProcessor.InsertBefore(tailCallData.CallInstruction, ilProcessor.Create(OpCodes.Tail));
            // any more?
            tailCallData = TryFindTailCall(definition.Body.Instructions, definition);
        }

    https://github.com/CraigStuntz/ImpossibLe
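
For comparison with the rewriting approach above, the end result (a `tail.` prefix directly before the recursive `call`, followed immediately by `ret`) can also be emitted from scratch. This is a sketch under stated assumptions, not the ImpossibLe implementation. It uses `System.Reflection.Emit` rather than Mono.Cecil, so the prefix is spelled `OpCodes.Tailcall` instead of Cecil's `OpCodes.Tail`. `TailCallSketch` and `BuildSumTailRecursive` are hypothetical names, and whether the runtime actually honors the prefix depends on the runtime and platform.

```csharp
using System;
using System.Reflection.Emit;

public static class TailCallSketch
{
    // Emits SumTailRecursive(n, accum) with an explicit tail. prefix.
    // (Hypothetical helper, not from the ImpossibLe repository.)
    public static Func<ulong, ulong, ulong> BuildSumTailRecursive()
    {
        var method = new DynamicMethod(
            "SumTailRecursive", typeof(ulong), new[] { typeof(ulong), typeof(ulong) });
        ILGenerator il = method.GetILGenerator();
        Label recurse = il.DefineLabel();

        // if (n >= 1) goto recurse; compared as unsigned 64-bit values
        il.Emit(OpCodes.Ldarg_0);
        il.Emit(OpCodes.Ldc_I4_1);
        il.Emit(OpCodes.Conv_I8);
        il.Emit(OpCodes.Bge_Un, recurse);

        // Base case exit: return accum
        il.Emit(OpCodes.Ldarg_1);
        il.Emit(OpCodes.Ret);

        // Recursive case exit: return SumTailRecursive(n - 1, n + accum)
        il.MarkLabel(recurse);
        il.Emit(OpCodes.Ldarg_0);
        il.Emit(OpCodes.Ldc_I4_1);
        il.Emit(OpCodes.Conv_I8);
        il.Emit(OpCodes.Sub);
        il.Emit(OpCodes.Ldarg_0);
        il.Emit(OpCodes.Ldarg_1);
        il.Emit(OpCodes.Add);
        il.Emit(OpCodes.Tailcall);     // tail. prefix: must directly precede the call
        il.Emit(OpCodes.Call, method); // the dynamic method calls itself
        il.Emit(OpCodes.Ret);          // ret must directly follow a tail call

        return (Func<ulong, ulong, ulong>)method.CreateDelegate(typeof(Func<ulong, ulong, ulong>));
    }
}
```

`TailCallSketch.BuildSumTailRecursive()(5, 0)` returns 15, matching the 1 + 2 + 3 + 4 + 5 example. How deep the recursion can go before the stack is exhausted depends on whether the tail call is honored, so no claim is made here about large inputs.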
  85. Further Reading

    • .NET IL Assembler, by Serge Lidin
    • ECMA-335 Standard, Common Language Infrastructure
    • LLVM Language Reference Manual
  86. None
  87. Contact

    craig.stuntz@improving.com
    @craigstuntz
    http://paperswelove.org/chapter/columbus/
    https://speakerdeck.com/craigstuntz
    https://github.com/CraigStuntz/