Looking for Smoking Guns in a Haystack: JVM Heap Analysis with Neo4J

We describe how and why we used the Neo4J graph database to analyse the runtime behaviour of a proprietary embedded JVM deployed in one of the most widely used consumer products in the UK. We will demonstrate (via live coding) some queries that uncovered surprising aspects of our code, the runtime platform and Oracle’s Java compiler. We hope this talk will convince you that graph databases can easily be used to build powerful software development and analysis tools for specific applications or niche platforms and languages.

Nat Pryce

April 10, 2014

Transcript

  1. Image CC BY 2.0 Alex Indigo 2008
    Looking for Smoking Guns in a Haystack
    JVM Heap Analysis with Neo4J
    Nat Pryce
    James Richardson

  2. PVR Platform Stack
    Electronic Programme Guide
    Java Platform Adaptors
    Middleware
    Linux
    Clean-Room JVM
    MIPS or ARM
    Java
    C

  3. Platform Limitations
    Top-end PVR has:
    ● 400 MHz MIPS CPU
    ● 15 MB Java Heap
    ● Limited flash memory for native code, Java bytecode & resources
    Older boxes are slower and smaller.

  4. JVM for Embedded Systems

  5. Embedded in a C Process
    JVM provided as a C library
    JVM runtime classes defined by the library, so cannot be inspected in the IDE

  6. No JIT
    To save memory, we don’t use the vendor’s JIT
    Every bytecode counts, performance-wise
    E.g. invokevirtual faster than if statements
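
One way to read the invokevirtual remark: on an interpreter without a JIT, a chain of comparisons costs several bytecodes per case tested, while a virtual call is a single dispatch instruction. The sketch below contrasts the two styles. All class and method names are hypothetical and not taken from the talk, and whether dispatch actually wins depends on the particular interpreter.

```java
// Hypothetical example: names are illustrative, not from the talk.

// Conditional style: the interpreter runs a load/compare/branch
// sequence for every case it tests before reaching the right one.
final class ConditionalKey {
    static final int UP = 0, DOWN = 1, SELECT = 2;

    static String describe(int key) {
        if (key == UP) return "move up";
        else if (key == DOWN) return "move down";
        else if (key == SELECT) return "select";
        return "ignore";
    }
}

// Polymorphic style: calling describe() on a Key reference compiles to
// one invokevirtual, which selects the behaviour by the receiver's class.
abstract class Key {
    abstract String describe();
}

final class UpKey extends Key {
    String describe() { return "move up"; }
}

final class DownKey extends Key {
    String describe() { return "move down"; }
}

final class SelectKey extends Key {
    String describe() { return "select"; }
}
```

Both styles return the same strings; only the bytecode executed to choose between them differs.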

  7. Slow Garbage Collection
    Especially full GC and compaction
    Have to control GC to ensure responsive UI

  8. No Escape Analysis
    Common Java style creates lots of objects on the heap
    ● For-each loop creates iterators
    ● Auto-boxing
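
The two bullets can be illustrated with a small sketch. The class and method names are hypothetical. The first method shows the common style, in which the for-each loop obtains an Iterator object from the list and the elements are boxed Integers; the second keeps the data in a primitive array, so the loop itself allocates nothing.

```java
import java.util.List;

// Hypothetical example: names are illustrative, not from the talk.
final class AllocationFreeLoops {
    // Common style: for-each over a List calls list.iterator(), which
    // creates an Iterator, and every element is an Integer that must be
    // unboxed. Without escape analysis these objects live on the heap.
    static int sumForEach(List<Integer> values) {
        int total = 0;
        for (Integer v : values) {
            total += v;
        }
        return total;
    }

    // Allocation-free style: an indexed loop over a primitive array
    // needs no iterator and no boxing. 'count' is the number of used slots.
    static int sumIndexed(int[] values, int count) {
        int total = 0;
        for (int i = 0; i < count; i++) {
            total += values[i];
        }
        return total;
    }
}
```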

  9. Limited Tooling
    No JVMTI
    No JMX and diagnostic MBeans
    Remote debugging requires valuable memory
    Proprietary heap-dump format
    No heap-dump analysis tools

  10. Problem: heap fragmentation

  11. A Fragmented Heap

  12. Heaps are graphs!
    Image CC BY-SA 2.0 Kreg Steppe 2008

  13. Relational Database

  14. Graph Database

  15. Import Process
    Heap Dump -> Python script -> Cypher -> neo4j-shell -> neo4j server

  16. Much Too Slow
    Image CC BY 2.0 Peter Megyeri 2007

  17. Import Process
    Heap Dump -> Java Program -> BatchInserter
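
The shape of such an import program might look like the sketch below. Everything in it is an assumption made for illustration: the real heap-dump format is proprietary and not shown in the talk, so the `OBJ`/`REF` line format is invented, and `GraphSink` is a hypothetical stand-in for the database writer so that the example runs without Neo4j on the classpath. In a real importer its two methods would delegate to Neo4j's BatchInserter.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the database writer; not a Neo4j type.
interface GraphSink {
    long createNode(String label, Map<String, Object> properties);
    void createRelationship(long from, long to, String type, Map<String, Object> properties);
}

// In-memory sink that records what would have been inserted.
final class RecordingSink implements GraphSink {
    final List<String> nodes = new ArrayList<>();
    final List<String> relationships = new ArrayList<>();

    public long createNode(String label, Map<String, Object> properties) {
        nodes.add(label + properties);
        return nodes.size() - 1; // node id = position in the list
    }

    public void createRelationship(long from, long to, String type, Map<String, Object> properties) {
        relationships.add(from + "-[" + type + properties + "]->" + to);
    }
}

// Reads an invented line-oriented dump format:
//   OBJ <id> <type>            one heap object
//   REF <fromId> <field> <toId> one field reference between objects
final class HeapDumpImporter {
    // Maps object ids in the dump to node ids returned by the sink.
    private final Map<String, Long> nodeIds = new HashMap<>();
    private final GraphSink sink;

    HeapDumpImporter(GraphSink sink) {
        this.sink = sink;
    }

    // Assumes every OBJ line appears before any REF line that mentions it.
    void importLine(String line) {
        String[] parts = line.trim().split("\\s+");
        if (parts[0].equals("OBJ")) {
            Map<String, Object> props = new HashMap<>();
            props.put("type", parts[2]);
            nodeIds.put(parts[1], sink.createNode("Object", props));
        } else if (parts[0].equals("REF")) {
            Map<String, Object> props = new HashMap<>();
            props.put("name", parts[2]);
            sink.createRelationship(nodeIds.get(parts[1]), nodeIds.get(parts[3]), "FIELD", props);
        }
    }
}
```

Feeding the importer one line at a time keeps memory use flat regardless of dump size, apart from the id map.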

  18. Very Fast!
    Image CC BY 2.0 Flickr user poorboy1225 2011

  19. Q: How to document the schema?
    Image CC BY 2.0 William Warby, 2010

  20. Explore the Fine Graph?

  21. A: Our Own Graphical Notation

  22. Example Queries
    and some surprising discoveries

  23. Which classes use large Strings?
    MATCH (t:Type) -- (o) -[f:FIELD]-> (s:String)
    RETURN
    t.name as owner_type,
    f.name as owner_field,
    s.length as length,
    s.value as value
    ORDER BY length DESCENDING

  24. How many descriptions? How big?
    MATCH (s:String) <-[f:FIELD]- ()
    WHERE f.name =~ '.*\\.description'
    WITH DISTINCT s AS d
    RETURN
    COUNT(d) AS count,
    SUM(d.length)*2 AS total_bytes

  25. What about arrays?
    MATCH
    (o) -[f:FIELD]-> (a:Array),
    (o) -- (ot:Type),
    (a) --> (at:Type)
    RETURN
    ot.name as owner_type,
    f.name as owner_field,
    a.length as array_length,
    at.name as array_type
    ORDER BY
    array_length DESCENDING,
    owner_type, owner_field ASCENDING

  26. A Memory Hogging JSON Parser
    package org.json.simple.parser;
    class Yylex {
    private static final int ZZ_BUFFERSIZE = 16384;
    private static final int[] ZZ_LEXSTATE;
    private static final java.lang.String ZZ_CMAP_PACKED = "\t\u0000\u0001\u0007...
    private static final char[] ZZ_CMAP;
    private static final int[] ZZ_ACTION;
    private static final java.lang.String ZZ_ACTION_PACKED_0 = "\u0002\u0000...
    private static final int[] ZZ_ROWMAP;
    private static final java.lang.String ZZ_ROWMAP_PACKED_0 = "\u0000\u0000...
    private static final int[] ZZ_TRANS;
    private static final java.lang.String[] ZZ_ERROR_MSG;
    private static final int[] ZZ_ATTRIBUTE;
    private static final java.lang.String ZZ_ATTRIBUTE_PACKED_0 = "\u0002\u0000...;
    ...

  27. What about the huge BitSets?
    MATCH
    (o:Object) -[f:FIELD]-> (bs:Object),
    (bs) -[:TYPE]-> (bst:Type),
    (o) -[:TYPE]-> (ot:Type),
    (bs) -[:FIELD]-> (bits:Array)
    WHERE bst.name = 'java/util/BitSet'
    RETURN
    ot.name as owner_type,
    f.name as owner_field,
    bits.length as bitset_size
    ORDER BY bitset_size DESCENDING

  28. Bloated Bitsets
    enum EPGInfoBit {
        MonauralSoundType,
        SimpleStereoSoundType,
        SurroundSoundSoundType,
        DigitalSurroundSsoundType
        ...
        public final int asInt;
        private EPGInfoBit() {
            asInt = 1 << ordinal();
        }
    }
    BitSet epgInfoBitsMask = new BitSet();
    BitSet epgInfoBits = new BitSet();
    void addBit(EPGInfoBit bit, boolean set) {
        epgInfoBitsMask.set(bit.asInt);
        if (set) {
            epgInfoBits.set(bit.asInt);
        }
    }
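
The bloat follows from how the slide's code indexes the BitSet: `asInt` is a mask value (`1 << ordinal()`), but `BitSet.set` takes a bit index, so the set must grow to hold that many bits. The sketch below, with hypothetical names, compares the two ways of indexing using the standard `java.util.BitSet`. The exact growth policy of the embedded JVM's runtime library is not known from the talk; the numbers in the note after the code only assume that a set must be able to hold the highest index it is given.

```java
import java.util.BitSet;

// Hypothetical example: names are illustrative, not from the talk.
final class BitSetSizing {
    // Mirrors the slide: the mask value (1 << ordinal) is passed where
    // BitSet expects a bit index, so the set grows to cover that index.
    static BitSet indexedByMask(int ordinal) {
        BitSet bits = new BitSet();
        bits.set(1 << ordinal);
        return bits;
    }

    // Uses the ordinal itself as the bit index, so the set stays small.
    static BitSet indexedByOrdinal(int ordinal) {
        BitSet bits = new BitSet();
        bits.set(ordinal);
        return bits;
    }
}
```

For an enum constant with ordinal 20, the mask-indexed set must hold bit 1,048,576, which needs a backing array of at least 128 KiB, while the ordinal-indexed set needs only enough room for 21 bits.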

  29. What are those SwitchMap arrays?
    MATCH
    (o) -[f:FIELD]-> (a:Array),
    (o) -- (ot:Type),
    (a) --> (at:Type)
    WHERE
    f.name =~ '.*SwitchMap.*'
    RETURN
    a.length as array_length,
    ot.name as owner_type,
    f.name as owner_field
    ORDER BY
    array_length DESCENDING

  30. Java Language Specification, SE7
    Chapter 13. Binary Compatibility

    13.4.26. Evolution of Enums
    Adding or reordering constants in an enum type will not break compatibility with pre-existing binaries.
    Bytecode cannot switch directly on enum ordinals.

  31. Switching on Enums is Expensive
    Javac generates an additional synthetic class for every class that switches on an enum.
    The synthetic class contains a "switch map" int array
    ● Maps enum ordinals to jump table indices.
    ● Length is the number of enum elements
    ● Initialised when the class is loaded
    But we don't change enums and client code independently
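
A hand-written approximation of this translation is sketched below. The enum and class names are hypothetical; the real synthetic class is named after the enclosing class (for example `Outer$1`) and its field is named `$SwitchMap$` followed by the enum's mangled name. The sketch shows the general shape javac produced at the time of the talk, not its exact output.

```java
// Hypothetical example: names are illustrative, not from the talk.
enum SoundType { MONO, STEREO, SURROUND }

// Approximates the synthetic class: one int per enum constant, filled in
// when the class is initialised. Each assignment is wrapped in a
// try/catch so that a constant missing at run time does not stop
// initialisation, which is what keeps old binaries compatible.
final class SoundTypeSwitchMap {
    static final int[] SWITCH_MAP = new int[SoundType.values().length];

    static {
        try { SWITCH_MAP[SoundType.MONO.ordinal()] = 1; } catch (NoSuchFieldError e) { }
        try { SWITCH_MAP[SoundType.STEREO.ordinal()] = 2; } catch (NoSuchFieldError e) { }
    }
}

final class SoundDescriber {
    // Source form: a switch directly on the enum.
    static int channelsWithSwitch(SoundType type) {
        switch (type) {
            case MONO: return 1;
            case STEREO: return 2;
            default: return 6;
        }
    }

    // Approximate translated form: the switch is on the mapped int, so the
    // case labels do not depend on the ordinals of the enum constants.
    static int channelsDesugared(SoundType type) {
        switch (SoundTypeSwitchMap.SWITCH_MAP[type.ordinal()]) {
            case 1: return 1;
            case 2: return 2;
            default: return 6;
        }
    }
}
```

Each class containing such a switch carries its own array of this kind, which is why many small arrays with `SwitchMap` in their field names show up in the heap.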

  32. Solution
    Sponsor development of enum optimisations in ProGuard

  33. Take-Away Lessons

  34. Take-Away Lessons
    Line-oriented data format -> Parse & Insert -> Graph Database -> Explore, Visualise, Analyse, Transform
    (Niche) Development Environment

  35. sky.com/jobs

  36. Nat Pryce
    natpryce.com
    @natpryce
    Questions?
    James Richardson
    time4tea.net
    @richajam
    sky.com/jobs
