Bite-sized ByteCode and ClassLoaders

alblue
June 09, 2020

This talk looks at how JVM classes are created, how the JVM loads classes with ClassLoaders and the ways in which classes can be manipulated and generated at runtime. It was run as part of the London Java Community virtual meetup series. The presentation was recorded, and is available at YouTube: https://www.youtube.com/watch?v=_ZF1HDTjSSY

The corresponding GitHub repository is at https://github.com/alblue/jvmulator/

If you want a more polished tool for stepping through real bytecode, use Chris Newland's JITWatch: https://github.com/AdoptOpenJDK/jitwatch

Transcript

  1. @alblue
    ©2020 Alex Blewitt
    Bite Sized Bytecode
    Classes and ClassLoaders

  2. Overview
    • ClassLoaders

    • Classes

    • Bytecode

  3. Class Loaders and Class files

  4. Loading and defining classes
    • The JVM builds Class instances from .class files

    • A class loader is responsible for finding (or generating) bytes

    • Class.forName("an.example") → triggers lookup if not loaded

    • A ClassLoader can be chained to a 'parent classloader'

    • Most app servers (e.g. Tomcat, Netty) have one ClassLoader per app
    [Diagram: App 1, App 2, App 3 and App 4 each have their own ClassLoader (CL); all four sit under the Tomcat ClassLoader]
    Note: ClassLoaders also load resources!
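
As a minimal sketch of the lookup and parent chaining described on this slide, the following walks from a class loader up through its parents. The class name `LoaderChain` and the `chain` helper are illustrative and not part of the talk; the bootstrap loader is represented by `null` in the JDK, so the walk stops there.

```java
import java.util.ArrayList;
import java.util.List;

class LoaderChain {
    // Collects the given loader and all of its parents, stopping at the
    // bootstrap loader (which getParent() reports as null).
    static List<ClassLoader> chain(ClassLoader start) {
        List<ClassLoader> loaders = new ArrayList<>();
        for (ClassLoader cl = start; cl != null; cl = cl.getParent()) {
            loaders.add(cl);
        }
        return loaders;
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Class.forName triggers a lookup if the class is not loaded yet.
        Class<?> list = Class.forName("java.util.ArrayList");
        System.out.println(list.getName() + " loaded by " + list.getClassLoader());

        // Print the loader of this class followed by each of its parents.
        for (ClassLoader cl : chain(LoaderChain.class.getClassLoader())) {
            System.out.println(cl);
        }
    }
}
```

The names printed for each loader depend on the JDK version and on how the program is launched, so no particular output should be assumed.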

  5. ClassLoaders and Classes
    • A Class is owned by its loading ClassLoader

    • A Class must be uniquely named in a ClassLoader

    • Two Class objects with the same name can exist in a JVM

    • A name is not unique; a Class+ClassLoader pair is unique

    a.getClass().getName().equals("ClassName") == true;
    (ClassName) a → ClassCastException
    • Could be caused by multiple web apps storing thread local variables

  6. Loading a class
    • The mechanisms to bring a class into existence vary

    • URLClassLoader can load classes from a URL

    • AppletClassLoader (was) used to load applets into a browser

    • ASM, ByteBuddy, Mockito etc. generating classes on the fly

    • In essence, custom class loaders boil down to:

    1. Load or generate bytes from somewhere

    2. Call defineClass()
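
The two steps above can be sketched as a small custom loader. This is an illustration under stated assumptions rather than code from the talk: the class name `GeneratingLoader`, the `generated.` package prefix and the `loadFresh` helper are all hypothetical, and the bytes are hand-assembled in memory instead of being read from disk or a URL. Loading the same name through two loader instances also shows that a class name alone is not unique.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class GeneratingLoader extends ClassLoader {
    // Step 1: produce bytes. This assembles a minimal class file for an
    // empty public class that extends java.lang.Object.
    static byte[] generate(String internalName) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeInt(0xCAFEBABE);                           // magic
        out.writeShort(0);                                  // minor version
        out.writeShort(52);                                 // major version (Java 8)
        out.writeShort(5);                                  // constant pool count (entries + 1)
        out.writeByte(7); out.writeShort(2);                // #1 Class -> #2
        out.writeByte(1); out.writeUTF(internalName);       // #2 Utf8, e.g. generated/Hello
        out.writeByte(7); out.writeShort(4);                // #3 Class -> #4
        out.writeByte(1); out.writeUTF("java/lang/Object"); // #4 Utf8
        out.writeShort(0x0021);                             // ACC_PUBLIC | ACC_SUPER
        out.writeShort(1);                                  // this_class = #1
        out.writeShort(3);                                  // super_class = #3
        for (int i = 0; i < 4; i++) out.writeShort(0);      // no interfaces, fields, methods, attributes
        return buf.toByteArray();
    }

    // Step 2: call defineClass(). loadClass() asks the parent first and only
    // falls back to findClass() when the parent cannot find the class.
    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        if (!name.startsWith("generated.")) {
            throw new ClassNotFoundException(name);
        }
        try {
            byte[] bytes = generate(name.replace('.', '/'));
            return defineClass(name, bytes, 0, bytes.length);
        } catch (IOException e) {
            throw new ClassNotFoundException(name, e);
        }
    }

    // Hypothetical convenience wrapper: loads the name through a fresh loader.
    static Class<?> loadFresh(String name) {
        try {
            return new GeneratingLoader().loadClass(name);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Class<?> first = loadFresh("generated.Hello");
        Class<?> second = loadFresh("generated.Hello");
        System.out.println(first.getName().equals(second.getName())); // compares names
        System.out.println(first == second);                          // compares Class objects
        System.out.println(loadFresh("java.lang.String") == String.class); // parent delegation
    }
}
```

Overriding `findClass` rather than `loadClass` keeps the usual parent-first delegation, which is why `java.lang.String` still resolves to the one class defined by the bootstrap loader.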

  7. Dynamic class creation
    • Classes can be created at runtime, from Java 1.3 onwards:

    // Needs java.lang.reflect.Proxy and java.lang.reflect.InvocationHandler
    final Runnable r = (Runnable) Proxy.newProxyInstance(
        getClass().getClassLoader(),
        new Class<?>[] { Runnable.class },
        (InvocationHandler) (instance, method, args) -> {
            System.out.println("Hello World!");
            return null;
        }
    );
    • Easier to use a lambda for Java 8 and above:

    Runnable r = () -> { System.out.println("Hello World!"); };

  8. Dynamic class creation
    [Diagram labels: ToolProvider, System JavaC, File Manager, Source File, Class File, Class Loader]

  9. Dynamic class creation
    • Can also compile and load classes programmatically:

    import java.nio.file.Files;
    import java.nio.file.Path;

    // Compile /tmp/Test.java; with no output directory set, Test.class is written next to the source
    var javac = javax.tools.ToolProvider.getSystemJavaCompiler();
    var fileMgr = javac.getStandardFileManager(null, null, null);
    var srcs = fileMgr.getJavaFileObjects("/tmp/Test.java");
    javac.getTask(null, fileMgr, null, null, null, srcs).call();
    // Anonymous loader that exposes the protected defineClass()
    var cl = new ClassLoader() {
        public Class<?> load(final byte[] bytes) {
            return defineClass(null, bytes, 0, bytes.length);
        }
    };
    final var bytes = Files.readAllBytes(Path.of("/tmp/Test.class"));
    ((Runnable) cl.load(bytes).getDeclaredConstructor().newInstance()).run();

    // Contents of /tmp/Test.java:
    public class Test implements Runnable {
        public void run() {
            System.out.println("Hello World!");
        }
    }
    https://github.com/alblue/jvmulator/blob/master/src/main/java/com/bandlem/jvm/jvmulator/compiler/

  10. Class file format
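    Class file layout (in order):
    • Magic: 0xcafebabe
    • Minor: 0
    • Major: 55
    • Constant Pool: count, then entries
    • Flags: public
    • This, Super
    • Fields: count, then entries
    • Methods: count, then entries
    • Class Attributes: count, then entries

    Constant pool entry types (tag number in parentheses where shown on the slide):
    • UTF8 (1): Count, then bytes, e.g. "Example" or "[Z" (Count 2)
    • Int (3): 0x48656c6f
    • Float (4): 0x7f800000
    • Long (5): 0x416c2042_6c756521
    • Double (6): 0xfff00000_00000000
    • Class (7): UTF8 index, e.g. 1
    • String (8): UTF8 index, e.g. 1
    • Field: Class index (e.g. 2), NaT index (e.g. 3)
    • Method: Class index (e.g. 2), NaT index (e.g. 4)
    • IMethod: Class index (e.g. 2), NaT index (e.g. 4)
    • NaT: Name index (e.g. 1), Type index (e.g. 6)

    Field and method entries:
    • Name (e.g. 1), Type (e.g. 6), Flags (public), Attributes (count, then entries)

    Attribute entries:
    • Name (e.g. 1), Length, Data

    Notes:
    • Attributes allow for extensions
    • The file format hasn't changed for decades
    • Few constant types have been added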
  11. Attributes
    • Class files can be adorned with many attributes

    • Extensible, stringly-typed array of bytes

    • Code → contains bytecode for a method's execution

    • Exceptions → list of exceptions that can be thrown from a method

    • Runtime(In)visibleAnnotations → set of key/value pairs

    • NestHost/NestMembers → new support for nest mates in Java 11 (JEP 181)

    • Attributes are optional; e.g. native/abstract methods have no "Code" attribute

  12. Special methods
    • Classes can have 'special' or synthetic methods

    • <clinit> – method that runs when the class is accessed for the first time

    • <init> – special name for constructors

    • Accessor methods generated for inner classes

    • Have ACC_SYNTHETIC set, so they don't show up in tools
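
A small sketch of how the ACC_SYNTHETIC flag can be observed through reflection. It assumes a compiler that, like javac, turns a lambda body into a private synthetic method; the generated method name is compiler-specific, and `SyntheticDemo` is an illustrative class name rather than anything from the talk.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

class SyntheticDemo {
    // The lambda body below is expected to be compiled into a synthetic
    // method on this class (javac uses names starting with lambda$).
    static final Runnable HELLO = () -> System.out.println("Hello World!");

    // Returns the names of all declared methods that carry ACC_SYNTHETIC.
    static List<String> syntheticMethods(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Method m : type.getDeclaredMethods()) {
            if (m.isSynthetic()) {
                names.add(m.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        HELLO.run();
        System.out.println(syntheticMethods(SyntheticDemo.class));
    }
}
```

The special methods `<init>` and `<clinit>` never appear in `getDeclaredMethods()`; constructors are exposed through `getDeclaredConstructors()` and the class initializer is not exposed at all.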

  13. Displaying bytecode
    • Java has a built-in disassembler for Java code

    javap -c [-p[rivate]] [-v[erbose]] [-cp classpath] com.Example[.class]
      -c             disassemble byte code
      -p[rivate]     show all private members (c.f. protected, package, public)
      -v[erbose]     display constant pool and attributes
      -cp classpath  classpath (or --module-path) of class name
      com.Example    class name (optional .class extension)

  14. JavaP – displaying bytecode
    $ javap -v -c java.lang.Object
    public class java.lang.Object
    minor version: 0
    major version: 55
    flags: (0x0021) ACC_PUBLIC, ACC_SUPER
    this_class: #17 // java/lang/Object
    super_class: #0
    interfaces: 0, fields: 0, methods: 14, attributes: 1
    Constant pool:
    #1 = Class #63 // java/lang/StringBuilder
    #2 = Methodref #1.#64 // java/lang/StringBuilder."<init>":()V
    #3 = Methodref #17.#65 // java/lang/Object.getClass:()Ljava/lang/Class;
    #4 = Methodref #66.#67 // java/lang/Class.getName:()Ljava/lang/String;
    ...
    #6 = String #69 // @
    ...
    #17 = Class #80 // java/lang/Object

    ...

    #34 = Utf8 equals
    #35 = Utf8 (Ljava/lang/Object;)Z
    ...
    #80 = Utf8 java/lang/Object
    Annotations:
    • super_class: #0 → all other classes will have a super_class which is not 0
    • major version: 55 → compiled against Java 11
    • #6 (String "@") → constant used in the default 'toString' method
    • #34 and #35 → used to define the equals(Object) method

  15. JavaP – displaying bytecode
    $ javap -v -c java.lang.Object
    public class java.lang.Object
    public boolean equals(java.lang.Object);
    descriptor: (Ljava/lang/Object;)Z
    flags: (0x0001) ACC_PUBLIC
    Code:
    stack=2, locals=2, args_size=2
    0: aload_0
    1: aload_1
    2: if_acmpne 9
    5: iconst_1
    6: goto 10
    9: iconst_0
    10: ireturn
    LineNumberTable:
    line 158: 0
    LocalVariableTable:
    Start Length Slot Name Signature
    0 11 0 this Ljava/lang/Object;
    0 11 1 obj Ljava/lang/Object;
    SourceFile: Object.java
    Annotations:
    • descriptor → used to define the equals(Object) method
    • Code: → Code attribute
    • LineNumberTable: → nested attribute; line number 158 of Object.java
    • LocalVariableTable: → nested attribute; slot 0 usually contains 'this', slot 1 contains the first argument obj
    • SourceFile: → SourceFile attribute

  16. Bytecode for equals()
    Code:
    stack=2, locals=2, args_size=2
    0: aload_0
    1: aload_1
    2: if_acmpne 9
    5: iconst_1
    6: goto 10
    9: iconst_0
    10: ireturn
    Locals: 0 = this, 1 = other
    Stack: (empty)

  17. Bytecode for equals()
    Code:
    stack=2, locals=2, args_size=2
    0: aload_0
    1: aload_1
    2: if_acmpne 9
    5: iconst_1
    6: goto 10
    9: iconst_0
    10: ireturn
    Locals: 0 = this, 1 = other
    Stack: this (pushed by aload_0)

  18. Bytecode for equals()
    Code:
    stack=2, locals=2, args_size=2
    0: aload_0
    1: aload_1
    2: if_acmpne 9
    5: iconst_1
    6: goto 10
    9: iconst_0
    10: ireturn
    Locals: 0 = this, 1 = other
    Stack: this, other (other pushed by aload_1)

  19. Bytecode for equals()
    Code:
    stack=2, locals=2, args_size=2
    0: aload_0
    1: aload_1
    2: if_acmpne 9
    5: iconst_1
    6: goto 10
    9: iconst_0
    10: ireturn
    Locals: 0 = this, 1 = other
    Stack: (empty; if_acmpne has popped both references)

  20. Bytecode for equals()
    Code:
    stack=2, locals=2, args_size=2
    0: aload_0
    1: aload_1
    2: if_acmpne 9
    5: iconst_1
    6: goto 10
    9: iconst_0
    10: ireturn
    Locals: 0 = this, 1 = other
    Stack: 0 (pushed by iconst_0 after the branch to 9 was taken)

  21. Bytecode for equals()
    Code:
    stack=2, locals=2, args_size=2
    0: aload_0
    1: aload_1
    2: if_acmpne 9
    5: iconst_1
    6: goto 10
    9: iconst_0
    10: ireturn

    Result: 0 (false) is returned by ireturn; the locals and stack of the method are discarded
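
The walkthrough on the preceding slides can be reproduced with a toy interpreter. This is a minimal sketch, not the JVMulator implementation: `EqualsInterpreter` and its helpers are illustrative names, and only the six opcodes used by `Object.equals` are handled. The opcode values are the ones defined by the JVM specification.

```java
class EqualsInterpreter {
    // Opcode values as defined by the JVM specification.
    static final int ICONST_0 = 0x03, ICONST_1 = 0x04, ALOAD_0 = 0x2a, ALOAD_1 = 0x2b,
            IF_ACMPNE = 0xa6, GOTO = 0xa7, IRETURN = 0xac;

    // The bytecode of Object.equals. Branch operands are signed 16-bit
    // offsets relative to the address of the branching instruction.
    static final int[] EQUALS = {
        ALOAD_0,               //  0: aload_0
        ALOAD_1,               //  1: aload_1
        IF_ACMPNE, 0x00, 0x07, //  2: if_acmpne 9   (2 + 7)
        ICONST_1,              //  5: iconst_1
        GOTO, 0x00, 0x04,      //  6: goto 10       (6 + 4)
        ICONST_0,              //  9: iconst_0
        IRETURN                // 10: ireturn
    };

    // Decodes the two operand bytes following the opcode at pc.
    static int branchOffset(int[] code, int pc) {
        return (short) ((code[pc + 1] << 8) | code[pc + 2]);
    }

    // Executes the code with the given locals and returns the int left
    // on the stack when ireturn is reached.
    static int run(int[] code, int maxStack, Object... locals) {
        Object[] stack = new Object[maxStack];
        int sp = 0; // index of the next free stack slot
        int pc = 0; // index of the current instruction
        while (true) {
            switch (code[pc]) {
                case ALOAD_0:  stack[sp++] = locals[0]; pc++; break;
                case ALOAD_1:  stack[sp++] = locals[1]; pc++; break;
                case ICONST_0: stack[sp++] = 0; pc++; break;
                case ICONST_1: stack[sp++] = 1; pc++; break;
                case IF_ACMPNE: {
                    Object right = stack[--sp];
                    Object left = stack[--sp];
                    pc += (left != right) ? branchOffset(code, pc) : 3;
                    break;
                }
                case GOTO: pc += branchOffset(code, pc); break;
                case IRETURN: return (Integer) stack[--sp];
                default: throw new IllegalStateException("unsupported opcode at " + pc);
            }
        }
    }

    public static void main(String[] args) {
        Object a = new Object();
        Object b = new Object();
        // stack=2 and locals=2 match the values javap reports for equals().
        System.out.println(run(EQUALS, 2, a, a));
        System.out.println(run(EQUALS, 2, a, b));
    }
}
```

A plain array is used for the operand stack because the maximum depth is known up front from the `stack=2` value in the Code attribute, and because the stack must be able to hold `null` references.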

  22. Bytecode

  23. Bytecode
    • Most bytecodes are encoded as a single byte (hence the name)

    • Some bytecodes take additional operands, but most operate on the stack

    • Bytecodes can:

    • Consume values from the stack

    • Push a value onto the stack

    • Transfer from the stack to a local variable (and vice versa)

    • Load constants from the class' constant pool

  24. Reference and object bytecodes
    • new – push a new instance of the class from the constant pool

    • newarray – push a new array with a primitive type (Z C F D B S I J)

    • anewarray – push a new array of reference types

    • multianewarray – push a multi-dimensional array

    • arraylength – push the length of the array

    • checkcast – throw if top of stack is not of the specified type

    • instanceof – push true if top of stack is of specified type
    Note: an array of booleans is [Z; an array of arrays of char is [[C
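
The array type names mentioned above can be checked from plain Java, since `Class.getName()` returns the descriptor-style name for array classes. This is a small illustrative sketch; `ArrayDescriptors` is a made-up class name.

```java
class ArrayDescriptors {
    public static void main(String[] args) {
        System.out.println(boolean[].class.getName()); // [Z
        System.out.println(char[][].class.getName());  // [[C
        System.out.println(long[].class.getName());    // [J
        System.out.println(String[].class.getName());  // [Ljava.lang.String;
    }
}
```

Note that long uses `J`, because `L` introduces a reference type. `Class.getName()` separates packages with dots, whereas the class file itself uses slashes (`[Ljava/lang/String;`).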

  25. Calling methods
    • invokestatic – call a static method (constant contains class)

    • invokevirtual – call instance methods of ToS (with inheritance)

    • invokespecial – call super constructor/methods of ToS

    • invokeinterface – call an interface method on ToS

    • invokedynamic – invokes a dynamic method (since Java 1.7)

    → Used for implementing Lambda operations

  26. Mathematics
    • {i,l,f,d}neg – negates the top of stack (consumes and pushes a single element on the stack)

    • {i,l,f,d}add/sub – adds/subtracts two numbers

    • {i,l,f,d}mul/div – multiplies two numbers/divides one number by the other

    • {i,l,f,d}rem – remainder when divided by (modulus)

    • {i,l}and/or/xor – performs bitwise and/or/xor on two numbers

    • {i,l}shl/shr/ushr – arithmetic shift left/right or unsigned (logical) shift right

    Note: the two-operand operations consume the top two stack items and push the result onto the stack
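
The difference between the arithmetic and unsigned shifts, and the sign behaviour of the remainder, is easiest to see with negative operands. The sketch below uses the Java operators that compile to these opcodes for int operands; `ArithmeticOps` and its method names are illustrative.

```java
class ArithmeticOps {
    static int shr(int value, int bits)  { return value >> bits; }  // ishr: keeps the sign bit
    static int ushr(int value, int bits) { return value >>> bits; } // iushr: shifts in zeros
    static int rem(int a, int b)         { return a % b; }          // irem: sign follows the dividend
    static int neg(int value)            { return -value; }         // ineg

    public static void main(String[] args) {
        System.out.println(shr(-8, 1));             // -4
        System.out.println(ushr(-8, 1));            // 2147483644
        System.out.println(rem(-7, 3));             // -1
        System.out.println(neg(Integer.MIN_VALUE)); // -2147483648 (overflow wraps)
    }
}
```

Because the result of `irem` takes the sign of the dividend, it is a remainder rather than a mathematical modulus for negative inputs.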

  27. Constants
    • {i,l,f,d}const_{0,1} – push 0 or 1 onto the stack as integer/long/float/double

    • iconst_{2,3,4,5,m1} – push 2,3,4,5 or -1 onto the stack as an integer

    • {b,s}ipush – push the next byte/short onto the stack

    • ldc{,_w,2_w} – push a constant from the pool onto the stack

    • aconst_null – push 'null' on to the stack

  28. Conversions
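    Types and widths in bits: byte (8), short (16), char (16), int (32), float (32), long (64), double (64); boolean has no conversion opcodes
    • int ↔ long: i2l, l2i
    • int ↔ float: i2f, f2i
    • int ↔ double: i2d, d2i
    • long ↔ float: l2f, f2l
    • long ↔ double: l2d, d2l
    • float ↔ double: f2d, d2f
    • int → byte/short/char: i2b, i2s, i2c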
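
As an illustration of what the narrowing conversions do to values that do not fit, the following sketch uses Java casts, which compile to the corresponding conversion opcodes. The class name `Conversions` is illustrative; the method names simply mirror the opcode names.

```java
class Conversions {
    static byte i2b(int i)   { return (byte) i; }  // keeps the low 8 bits, sign-extended
    static short i2s(int i)  { return (short) i; } // keeps the low 16 bits, sign-extended
    static char i2c(int i)   { return (char) i; }  // keeps the low 16 bits, zero-extended
    static int l2i(long l)   { return (int) l; }   // keeps the low 32 bits
    static int d2i(double d) { return (int) d; }   // rounds toward zero, saturates, NaN becomes 0

    public static void main(String[] args) {
        System.out.println(i2b(200));          // -56
        System.out.println(i2s(70000));        // 4464
        System.out.println(i2c(65));           // A
        System.out.println(l2i((1L << 32) | 5)); // 5
        System.out.println(d2i(3.99));         // 3
        System.out.println(d2i(1e20));         // 2147483647
        System.out.println(d2i(Double.NaN));   // 0
    }
}
```

There are no direct opcodes from long, float or double to byte, short or char, so a cast such as `(byte) someLong` is compiled as two steps: `l2i` followed by `i2b`.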

  29. Loading and storing
    • {b,s,c,i,f,l,d,a}aload/astore – load/store element into array at index

    • {i,l,f,d,a}load/store{,_0,_1,_2,_3} – load/store from variable at index

    • iinc – increment local variable by constant byte

    • getfield/putfield <field> – get/put a field in an instance on ToS

    • getstatic/putstatic <field> – get/put a static field in a class

  30. Comparisons
    • {f,d}cmpg – compares two floats/doubles, pushes 1 on NaN

    • {f,d}cmpl – compares two floats/doubles, pushes -1 on NaN

    • lcmp – compares two longs, pushes 1, 0 or -1

    • if{eq,ne,gt,ge,lt,le} <±jump> – branch if =, ≠, >, ≥, <, ≤ 0

    • if_icmp{eq,ne,gt,ge,lt,le} <±jump> – branch if =, ≠, >, ≥, <, ≤ other number

    • if_acmp{eq,ne} <±jump> – branch if references are equal or not equal

    • if{,non}null <±jump> – branch if (non) null
    Note: the IEEE 754 floating point spec uses 'Not a Number' (NaN) to represent the result of invalid operations such as 0.0/0.0 or sqrt(-1)
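
The reason two variants of the floating point comparison exist is that any ordered comparison involving NaN must be false. A compiler can pick whichever variant makes NaN fall through to the false branch; javac typically uses dcmpg for `<` and dcmpl for `>`, though this sketch does not depend on that. `NanCompare` is an illustrative class name.

```java
class NanCompare {
    static boolean less(double a, double b)    { return a < b; }
    static boolean greater(double a, double b) { return a > b; }

    public static void main(String[] args) {
        double nan = 0.0 / 0.0;                    // an invalid operation produces NaN
        System.out.println(Double.isNaN(nan));     // true
        System.out.println(less(nan, 1.0));        // false
        System.out.println(greater(nan, 1.0));     // false
        System.out.println(nan == nan);            // false
        System.out.println(1.0 / 0.0);             // Infinity
    }
}
```

Dividing a non-zero value by zero yields an infinity rather than NaN, which is why the example uses `0.0 / 0.0`.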

  31. Control flow
    • {lookup,table}switch – continue execution from table (switch)

    • {,i,l,f,d,a}return – return a void/int/long/float/double/reference

    • goto{,_w} <±jump> – jump to another bytecode (do not push address)

    • athrow – throw the (Throwable) reference on top of the stack

    • jsr{,_w} <±jump> – jump to another part of the method (push address)

    • ret – return (from a jsr) to an address specified in local var

  32. Stack manipulation
    • swap – swap the top two int/float values on the stack

    • pop{,2} – pop (drop) one or two slots from the stack

    • dup – duplicate the top int/float on the stack

    • dup_x{1,2} – duplicate the top int/float on the stack, put it 1 or 2 below

    • dup2 – duplicate the top long/double on the stack

    • dup2_x{1,2} – duplicate the top long/double on the stack, put it 1 or 2 below

  33. Miscellaneous
    • nop – no operation

    • monitor{enter,exit} – synchronized blocks

    • breakpoint – breakpoint for debuggers

    • impdep{1,2} – implementation dependent operations for debuggers

    • wide – treat the next bytecode as having wider argument

    • iinc → wide iinc

    • *load/*store/ret → wide *load/*store/ret

  34. Demo

  35. JVMulator
    https://github.com/alblue/jvmulator

  36. JVMulator
    https://github.com/alblue/jvmulator

  37. Summary
    • Java class files define a class, along with methods and fields

    • ClassLoader instances load a class from somewhere (disk, URL, …) as a Class

    • Method implementations are bytecode stored in Code attributes

    • Bytecode operates on a stack, with a number of 'local' variables

    • The stack and locals operate on int/long/float/double/reference types

    • Conversions between data types are handled with opcodes

    • Some opcodes take operands but the majority do not

  38. Thank you
    https://alblue.bandlem.com

    https://twitter.com/alblue

    https://github.com/alblue

    https://vimeo.com/alblue

    https://speakerdeck.com/alblue
