Bite-sized ByteCode and ClassLoaders

alblue
June 09, 2020

This talk looks at how JVM classes are created, how the JVM loads classes with ClassLoaders, and the ways in which classes can be manipulated and generated at runtime. It was given as part of the London Java Community virtual meetup series. The presentation was recorded and is available on YouTube: https://www.youtube.com/watch?v=_ZF1HDTjSSY

The corresponding GitHub repository is at https://github.com/alblue/jvmulator/

If you want a more polished tool for stepping through real bytecode, use Chris Newland's JITWatch: https://github.com/AdoptOpenJDK/jitwatch

Transcript

  1. Loading and defining classes (@alblue, ©2020 Alex Blewitt)

    • The JVM builds Class instances from .class files
    • A class loader is responsible for finding (or generating) bytes
    • Class.forName("an.example") → triggers lookup if not loaded
    • A ClassLoader can be chained to a 'parent classloader'
    • Most app servers (e.g. Tomcat, Jetty) have one ClassLoader per app

    Diagram: App 1 CL, App 2 CL, App 3 CL and App 4 CL each chain to the Tomcat CL.

    ClassLoaders also load resources!
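The parent chain described on this slide can be inspected from any running program. The following is a minimal sketch; the class name `LoaderChain` and the helper `chain` are made up for illustration, and the loaders printed for application classes depend on the JVM version and on how the program is launched.

```java
import java.util.ArrayList;
import java.util.List;

public class LoaderChain {
    // Walk from the loader that defined 'type' up through its parents.
    public static List<String> chain(Class<?> type) {
        List<String> names = new ArrayList<>();
        ClassLoader loader = type.getClassLoader();
        while (loader != null) {
            names.add(loader.getClass().getName());
            loader = loader.getParent();
        }
        // The bootstrap loader is represented as null in the API.
        names.add("bootstrap");
        return names;
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Class.forName triggers a lookup if the class is not loaded yet.
        Class<?> list = Class.forName("java.util.ArrayList");
        System.out.println(list.getName() + " -> " + chain(list));
        System.out.println("LoaderChain -> " + chain(LoaderChain.class));
    }
}
```

Core classes such as `java.lang.String` report a null loader, so their chain contains only the bootstrap entry.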
  2. ClassLoaders and Classes

    • A Class is owned by its loading ClassLoader
    • A Class must be uniquely named in a ClassLoader
    • Two Class objects with the same name can exist in a JVM
    • A name is not unique; a Class+ClassLoader pair is unique

        a.getClass().getName().equals("ClassName") == true;
        (ClassName) a → ClassCastException

    • Could be caused by multiple web apps storing thread local variables
  3. Loading a class

    • The mechanisms to bring a class into existence vary
    • URLClassLoader can load classes from a URL
    • AppletClassLoader (was) used to load applets into a browser
    • ASM, ByteBuddy, Mockito etc. generate classes on the fly
    • In essence, custom class loaders boil down to:
        1. Load or generate bytes from somewhere
        2. Call defineClass()
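The two steps above can be sketched without any bytecode library by writing a minimal class file by hand and passing it to `defineClass()`. This is an illustration only, not the approach used in the talk's repository: `TinyClassDemo`, `BytesLoader` and the generated class name `Tiny` are made-up names, and the generated class is deliberately empty (no fields, methods or constructor). Defining the same bytes in two loaders also shows that a name alone does not identify a Class.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class TinyClassDemo {
    // Step 1: generate the bytes of an empty public class with the given name.
    public static byte[] generate(String internalName) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(0xCAFEBABE);                           // magic
            out.writeShort(0);                                  // minor version
            out.writeShort(52);                                 // major version (Java 8)
            out.writeShort(5);                                  // constant pool count (entries + 1)
            out.writeByte(7); out.writeShort(2);                // #1 Class -> #2
            out.writeByte(1); out.writeUTF(internalName);       // #2 Utf8 (this class name)
            out.writeByte(7); out.writeShort(4);                // #3 Class -> #4
            out.writeByte(1); out.writeUTF("java/lang/Object"); // #4 Utf8 (super class name)
            out.writeShort(0x0021);                             // ACC_PUBLIC | ACC_SUPER
            out.writeShort(1);                                  // this_class
            out.writeShort(3);                                  // super_class
            out.writeShort(0);                                  // interfaces count
            out.writeShort(0);                                  // fields count
            out.writeShort(0);                                  // methods count
            out.writeShort(0);                                  // attributes count
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Step 2: a loader whose only job is to expose defineClass().
    public static final class BytesLoader extends ClassLoader {
        public Class<?> define(String name, byte[] bytes) {
            return defineClass(name, bytes, 0, bytes.length);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = generate("Tiny");
        Class<?> first = new BytesLoader().define("Tiny", bytes);
        Class<?> second = new BytesLoader().define("Tiny", bytes);
        System.out.println(first.getName().equals(second.getName())); // same name
        System.out.println(first == second);                          // different classes
    }
}
```

Each `BytesLoader` instance defines its own `Tiny`, so the two Class objects share a name but are distinct.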
  4. Dynamic class creation

    • Classes can be created at runtime, from Java 1.3 onwards:

        final Runnable r = (Runnable) Proxy.newProxyInstance(
            getClass().getClassLoader(),
            new Class<?>[] { Runnable.class },
            (InvocationHandler) (instance, method, args) -> {
              System.out.println("Hello World!");
              return null;
            });

    • Easier to use a lambda for Java 8 and above:

        Runnable r = () -> { System.out.println("Hello World!"); };
  5. Dynamic class creation

    Diagram components: Tool Provider, System JavaC, File Manager, Source File, Class File, Class Loader
  6. Dynamic class creation

    • Can also compile and load classes programmatically:

        // Contents of /tmp/Test.java
        public class Test implements Runnable {
          public void run() { System.out.println("Hello World!"); }
        }

        // Compile the source file; Test.class is written next to it
        var javac = javax.tools.ToolProvider.getSystemJavaCompiler();
        var fileMgr = javac.getStandardFileManager(null, null, null);
        var srcs = fileMgr.getJavaFileObjects("/tmp/Test.java");
        javac.getTask(null, fileMgr, null, null, null, srcs).call();

        // Anonymous loader that exposes defineClass()
        var cl = new ClassLoader() {
          public Class<?> load(final byte[] bytes) {
            return defineClass(null, bytes, 0, bytes.length);
          }
        };

        // Read the compiled bytes, define the class and run it
        final var bytes = Files.readAllBytes(Path.of("/tmp/Test.class"));
        ((Runnable) cl.load(bytes).getDeclaredConstructor().newInstance()).run();

    https://github.com/alblue/jvmulator/blob/master/src/main/java/com/bandlem/jvm/jvmulator/compiler/
  7. Class file format

    Header: Magic 0xcafebabe · Minor 0 · Major 55
    Class: Constant Pool count · Flags (public) · This · Super · Fields count · Methods count · Class Attributes count

    Constant pool entries shown in the diagram:
    • UTF8 – count followed by bytes (e.g. count 2, "[Z")
    • Int 0x48656c6f · Float 0x7f800000 · Long 0x416c2042_6c756521 · Double 0xfff00000_00000000
    • Class → UTF8
    • String → UTF8
    • Field, Method, IMethod → Class + NaT
    • NaT → Name + Type

    Fields and methods: Flags (public) · Name · Type · Attributes count
    Attribute: Name · Data Length · Data

    • Attributes allow for extensions
    • The file format hasn't changed for decades
    • Few constant types have been added
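The first eight bytes of the layout above are easy to read back. The sketch below assumes a hypothetical helper class `ClassHeader`; it checks the magic number and returns the minor and major versions. For Java 5 and later, the major version minus 44 gives the Java release (55 is Java 11).

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class ClassHeader {
    // Returns {minor, major}; rejects input that does not start with 0xcafebabe.
    public static int[] version(byte[] classBytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(classBytes))) {
            if (in.readInt() != 0xCAFEBABE) {
                throw new IllegalArgumentException("not a class file");
            }
            int minor = in.readUnsignedShort();
            int major = in.readUnsignedShort();
            return new int[] { minor, major };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the running JDK exposes Object.class as a resource.
        try (InputStream in = Object.class.getResourceAsStream("Object.class")) {
            if (in == null) {
                System.out.println("class bytes not available");
                return;
            }
            int[] v = version(in.readAllBytes());
            System.out.println("minor " + v[0] + ", major " + v[1]
                    + " (Java " + (v[1] - 44) + ")");
        }
    }
}
```

The values printed by `main` depend on the JDK that runs it, since `Object.class` is compiled for that release.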
  8. Attributes

    • Class files can be adorned with many attributes
    • Extensible, stringly-typed array of bytes
    • Code → contains bytecode for a method's execution
    • Exceptions → list of exceptions that can be thrown from a method
    • Runtime(In)visibleAnnotations → set of key/value pairs
    • NestHost/NestMembers → new support for nest mates in Java 11 (JEP 181)
    • Attributes are optional; e.g. native/abstract methods have no "Code" attribute
  9. Special methods

    • Classes can have 'special' or synthetic methods
    • <clinit> – method that runs the first time the class is accessed
    • <init> – special name for constructors
    • Accessor methods generated for inner classes
    • Have ACC_SYNTHETIC set, so they don't show up in tools
  10. Displaying bytecode

    • Java has a built-in disassembler for Java code

        javap -c [-p[rivate]] [-v[erbose]] [-cp classpath] com.Example[.class]

        -c                    Disassemble byte code
        -p[rivate]            Show all private members (c.f. protected, package, public)
        -v[erbose]            Display constant pool and attributes
        -cp classpath         Classpath (or --module-path) of class name
        com.Example[.class]   Class name (optional .class extension)
  11. JavaP – displaying bytecode

    $ javap -v -c java.lang.Object
    public class java.lang.Object
      minor version: 0
      major version: 55
      flags: (0x0021) ACC_PUBLIC, ACC_SUPER
      this_class: #17                         // java/lang/Object
      super_class: #0
      interfaces: 0, fields: 0, methods: 14, attributes: 1
    Constant pool:
       #1 = Class              #63            // java/lang/StringBuilder
       #2 = Methodref          #1.#64         // java/lang/StringBuilder."<init>":()V
       #3 = Methodref          #17.#65        // java/lang/Object.getClass:()Ljava/lang/Class;
       #4 = Methodref          #66.#67        // java/lang/Class.getName:()Ljava/lang/String;
       ...
       #6 = String             #69            // @
       ...
      #17 = Class              #80            // java/lang/Object
       ...
      #34 = Utf8               equals
      #35 = Utf8               (Ljava/lang/Object;)Z
       ...
      #80 = Utf8               java/lang/Object

    Notes:
    • major version 55: compiled against Java 11
    • super_class #0: all other classes will have a super_class which is not 0
    • #6: constant used in default 'toString' method
    • #34, #35: used to define equals(Object) method
  12. JavaP – displaying bytecode

    $ javap -v -c java.lang.Object
    public class java.lang.Object
      public boolean equals(java.lang.Object);
        descriptor: (Ljava/lang/Object;)Z
        flags: (0x0001) ACC_PUBLIC
        Code:
          stack=2, locals=2, args_size=2
             0: aload_0
             1: aload_1
             2: if_acmpne     9
             5: iconst_1
             6: goto          10
             9: iconst_0
            10: ireturn
          LineNumberTable:
            line 158: 0
          LocalVariableTable:
            Start  Length  Slot  Name   Signature
                0      11     0  this   Ljava/lang/Object;
                0      11     1   obj   Ljava/lang/Object;
    SourceFile: Object.java

    Notes:
    • descriptor: used to define equals(Object) method
    • Code: Code attribute
    • LineNumberTable: (nested) attribute; line number 158 of Object.java
    • LocalVariableTable: (nested) attribute; slot 0 usually contains 'this', slot 1 contains the first argument obj
    • SourceFile: Source attribute
  13. Bytecode for equals()

    Code:
      stack=2, locals=2, args_size=2
         0: aload_0
         1: aload_1
         2: if_acmpne     9
         5: iconst_1
         6: goto          10
         9: iconst_0
        10: ireturn

    Locals: 0 = this, 1 = other
    Stack: (empty)
  14. Bytecode for equals()

    Code:
      stack=2, locals=2, args_size=2
         0: aload_0
         1: aload_1
         2: if_acmpne     9
         5: iconst_1
         6: goto          10
         9: iconst_0
        10: ireturn

    Locals: 0 = this, 1 = other
    Stack: this
  15. Bytecode for equals()

    Code:
      stack=2, locals=2, args_size=2
         0: aload_0
         1: aload_1
         2: if_acmpne     9
         5: iconst_1
         6: goto          10
         9: iconst_0
        10: ireturn

    Locals: 0 = this, 1 = other
    Stack: this, other
  16. Bytecode for equals()

    Code:
      stack=2, locals=2, args_size=2
         0: aload_0
         1: aload_1
         2: if_acmpne     9
         5: iconst_1
         6: goto          10
         9: iconst_0
        10: ireturn

    Locals: 0 = this, 1 = other
    Stack: (empty)
  17. Bytecode for equals()

    Code:
      stack=2, locals=2, args_size=2
         0: aload_0
         1: aload_1
         2: if_acmpne     9
         5: iconst_1
         6: goto          10
         9: iconst_0
        10: ireturn

    Locals: 0 = this, 1 = other
    Stack: 0
  18. Bytecode for equals()

    Code:
      stack=2, locals=2, args_size=2
         0: aload_0
         1: aload_1
         2: if_acmpne     9
         5: iconst_1
         6: goto          10
         9: iconst_0
        10: ireturn

    ➡ 0
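The walkthrough in the preceding slides can be replayed with a tiny hand-written interpreter. This is a sketch for these seven instructions only, not the emulator from the talk's repository; `EqualsInterpreter` and `run` are made-up names, and the operand stack is a plain array so that null references can be pushed.

```java
public class EqualsInterpreter {
    // Executes the bytecode of Object.equals(Object) shown above and
    // returns the int left on the stack by ireturn (1 = true, 0 = false).
    public static int run(Object thisRef, Object other) {
        Object[] locals = { thisRef, other }; // locals=2
        Object[] stack = new Object[2];       // stack=2
        int sp = 0;                           // next free stack slot
        int pc = 0;                           // offset of the next instruction
        while (true) {
            switch (pc) {
                case 0:  // aload_0: push local 0 ('this')
                    stack[sp++] = locals[0]; pc = 1; break;
                case 1:  // aload_1: push local 1 (the argument)
                    stack[sp++] = locals[1]; pc = 2; break;
                case 2: { // if_acmpne 9: pop two references, branch if they differ
                    Object second = stack[--sp];
                    Object first = stack[--sp];
                    pc = (first != second) ? 9 : 5;
                    break;
                }
                case 5:  // iconst_1: push int 1
                    stack[sp++] = 1; pc = 6; break;
                case 6:  // goto 10
                    pc = 10; break;
                case 9:  // iconst_0: push int 0
                    stack[sp++] = 0; pc = 10; break;
                case 10: // ireturn: pop the int and return it
                    return (Integer) stack[--sp];
                default:
                    throw new IllegalStateException("unexpected offset " + pc);
            }
        }
    }

    public static void main(String[] args) {
        Object a = new Object();
        System.out.println(run(a, a));            // same reference
        System.out.println(run(a, new Object())); // different references
    }
}
```

Passing two different references takes the branch to offset 9, which matches the slides where the stack ends up holding 0.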
  19. Bytecode

    • Most bytecodes are encoded as a single byte (hence the name)
    • Some bytecodes take additional operands, but most operate on the stack
    • Bytecodes can:
        • Consume values from the stack
        • Push a value onto the stack
        • Transfer from the stack to a local variable (and vice versa)
        • Load constants from the class' constant pool
  20. Reference and object bytecodes

    • new <type> – push a new instance of the class from the constant pool
    • newarray <type> – push a new array with a primitive type (Z B C S I J F D)
    • anewarray <type> – push a new array of reference types
    • multianewarray <type> <dim> – push a multi-dimensional array
    • arraylength – push the length of the array
    • checkcast <type> – throw if top of stack is not of the specified type
    • instanceof <type> – push true if top of stack is of specified type

    Array of booleans is [Z
    Array of array of char is [[C
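The array descriptors mentioned above are visible from plain Java through `Class.getName()`. A small sketch follows; the class name `ArrayNames` and the helper `name` are made up for illustration. Note that `getName()` uses dots in reference type names, whereas descriptors inside a class file use slashes.

```java
public class ArrayNames {
    // Returns the JVM-style name of a type, e.g. "[Z" for boolean[].
    public static String name(Class<?> type) {
        return type.getName();
    }

    public static void main(String[] args) {
        System.out.println(name(boolean[].class)); // [Z
        System.out.println(name(char[][].class));  // [[C
        System.out.println(name(long[].class));    // [J
        System.out.println(name(String[].class));  // [Ljava.lang.String;
    }
}
```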
  21. Calling methods

    • invokestatic <method> – call a static method (constant contains class)
    • invokevirtual <method> – call instance methods of ToS (with inheritance)
    • invokespecial <method> – call super constructor/methods of ToS
    • invokeinterface <method> – call an interface method on ToS
    • invokedynamic <method> – invokes a dynamic method (since Java 7)
        → Used for implementing Lambda operations
  22. Mathematics

    • {i,l,f,d}neg – negates the top of stack
    • {i,l,f,d}add/sub – adds/subtracts two numbers
    • {i,l,f,d}mul/div – multiplies/divides one number by the other
    • {i,l,f,d}rem – remainder when divided by (modulus)
    • {i,l}and/or/xor – performs bitwise and/or/xor on two numbers
    • {i,l}shl/shr/ushr – arithmetic shift left/right or unsigned (bitwise) shift right

    neg consumes and pushes a single element on the stack; the other operations consume the top two stack items and push the result onto the stack.
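The difference between the arithmetic and unsigned shifts, and the sign behaviour of the remainder, can be seen from the Java operators that javac compiles to these opcodes. A minimal sketch with made-up names (`ShiftAndRem` and its helpers); the values are passed as parameters so the compiler cannot fold them into constants.

```java
public class ShiftAndRem {
    public static int shr(int value, int bits)  { return value >> bits; }  // ishr
    public static int ushr(int value, int bits) { return value >>> bits; } // iushr
    public static int rem(int a, int b)         { return a % b; }          // irem
    public static int div(int a, int b)         { return a / b; }          // idiv

    public static void main(String[] args) {
        System.out.println(shr(-8, 1));  // -4: the sign bit is copied in
        System.out.println(ushr(-8, 1)); // 2147483644: a zero is shifted in
        System.out.println(rem(-7, 2));  // -1: result takes the dividend's sign
        System.out.println(div(-7, 2));  // -3: division truncates towards zero
    }
}
```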
  23. Constants

    • {i,l,f,d}const_{0,1} – push 0 or 1 onto the stack as integer/long/float/double
    • iconst_{2,3,4,5,m1} – push 2,3,4,5 or -1 onto the stack as an integer
    • {b,s}ipush <byte/short> – push the next byte/short onto the stack
    • ldc{,_w,2_w} <constant> – push a constant from the pool onto the stack
    • aconst_null – push 'null' on to the stack
  24. Conversions

    Types (width in bits): long 64 · int 32 · byte 8 · short 16 · char 16 · float 32 · double 64 · boolean

    • From int: i2l, i2f, i2d, i2b, i2s, i2c
    • From long: l2i, l2f, l2d
    • From float: f2i, f2l, f2d
    • From double: d2i, d2l, d2f
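Each conversion opcode corresponds to a Java cast between the two types. The sketch below uses made-up names (`Conversions` and its helpers) and takes its inputs as parameters so that the casts are performed at runtime rather than folded by the compiler.

```java
public class Conversions {
    public static byte toByte(int v)     { return (byte) v; }  // i2b
    public static short toShort(int v)   { return (short) v; } // i2s
    public static char toChar(int v)     { return (char) v; }  // i2c
    public static int toInt(long v)      { return (int) v; }   // l2i
    public static int toInt(double v)    { return (int) v; }   // d2i

    public static void main(String[] args) {
        System.out.println(toByte(200));          // -56: keeps the low 8 bits, sign-extended
        System.out.println(toShort(70000));       // 4464: keeps the low 16 bits
        System.out.println((int) toChar(-1));     // 65535: char is unsigned 16-bit
        System.out.println(toInt(4294967297L));   // 1: keeps the low 32 bits
        System.out.println(toInt(3.99));          // 3: rounds towards zero
        System.out.println(toInt(Double.NaN));    // 0: NaN converts to zero
        System.out.println(toInt(1e20));          // 2147483647: saturates at Integer.MAX_VALUE
    }
}
```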
  25. Loading and storing

    • {b,s,c,i,f,l,d,a}aload/astore – load/store element from/into array at index
    • {i,l,f,d,a}load/store{<local>,_0,_1,_2,_3} – load/store from variable at index
    • iinc <local> <amount> – increment local variable by constant byte
    • getfield/putfield <field> – get/put a field in an instance on ToS
    • getstatic/putstatic <field> – get/put a static field in a class
  26. Comparisons

    • {f,d}cmpg – compares two floats/doubles, pushes 1 on NaN
    • {f,d}cmpl – compares two floats/doubles, pushes -1 on NaN
    • lcmp – compares two longs, pushes 1, 0 or -1
    • if{eq,ne,gt,ge,lt,le} <±jump> – branch if =, ≠, >, ≥, <, ≤ 0
    • if_icmp{eq,ne,gt,ge,lt,le} <±jump> – branch if =, ≠, >, ≥, <, ≤ other number
    • if_acmp{eq,ne} <±jump> – branch if references are equal or not equal
    • if{,non}null <±jump> – branch if (non) null

    IEEE754 floating point spec uses 'Not a Number' to represent conditions such as 0/0 or sqrt(-1)
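The two float comparison variants exist so that any ordered comparison involving NaN can come out false. The sketch below models the opcodes as plain Java methods (`fcmpg` and `fcmpl` here are illustrative helpers in a made-up class `NanCompare`, not JVM APIs) and shows the resulting behaviour of the `<` and `>` operators.

```java
public class NanCompare {
    // Model of fcmpg: 1, 0 or -1, and 1 if either operand is NaN.
    public static int fcmpg(float a, float b) {
        if (a > b) return 1;
        if (a == b) return 0;
        if (a < b) return -1;
        return 1; // at least one operand is NaN
    }

    // Model of fcmpl: identical except that NaN yields -1.
    public static int fcmpl(float a, float b) {
        if (a > b) return 1;
        if (a == b) return 0;
        if (a < b) return -1;
        return -1; // at least one operand is NaN
    }

    public static boolean less(float a, float b)    { return a < b; }
    public static boolean greater(float a, float b) { return a > b; }

    public static void main(String[] args) {
        float nan = 0.0f / 0.0f; // 0/0 produces NaN
        System.out.println(fcmpg(nan, 1f));   // 1
        System.out.println(fcmpl(nan, 1f));   // -1
        System.out.println(less(nan, 1f));    // false
        System.out.println(greater(nan, 1f)); // false
    }
}
```

With two variants available, a compiler can pick the one whose NaN result makes the following branch treat the comparison as false.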
  27. Control flow

    • {lookup,table}switch <table…> – continue execution from table (switch)
    • {,i,l,f,d,a}return – return a void/int/long/float/double/reference
    • goto{,_w} <±jump> – jump to another bytecode (do not push address)
    • athrow – throw the (Throwable) reference on top of the stack
    • jsr{,_w} <±jump> – jump to another part of the method (push address)
    • ret <local> – return (from a jsr) to an address specified in local var
  28. Stack manipulation

    • swap – swap the top two int/float values on the stack
    • pop{,2} – pop (drop) one or two slots from the stack
    • dup – duplicate the top int/float on the stack
    • dup_x{1,2} – duplicate the top int/float on the stack, put it 1 or 2 below
    • dup2 – duplicate the top long/double on the stack
    • dup2_x{1,2} – duplicate the top long/double on the stack, put it 1 or 2 below
  29. Miscellaneous

    • nop – no operation
    • monitor{enter,exit} – synchronized blocks
    • breakpoint – breakpoint for debuggers
    • impdep{1,2} – implementation dependent operations for debuggers
    • wide – treat the next bytecode as having wider argument
        • iinc <byte> <byte> → wide iinc <short> <short>
        • *load/*store/ret <byte> → wide *load/*store/ret <short>
  30. Summary

    • Java class files define a class, along with methods and fields
    • ClassLoader instances load a class from somewhere (disk, URL, …) as a Class
    • Method implementations are bytecodes stored in Code attributes
    • Bytecode operates on a stack, with a number of 'local' variables
    • The stack and locals operate on int/long/float/double/reference types
    • Conversions between data types are handled with opcodes
    • Some opcodes take operands but the majority do not