Looking for Smoking Guns in a Haystack: JVM Heap Analysis with Neo4J

We describe how and why we used the Neo4J graph database to analyse the runtime behaviour of a proprietary embedded JVM deployed in one of the most widely used consumer products in the UK. We will demonstrate (via live coding) some queries that uncovered surprising aspects of our code, the runtime platform and Oracle’s Java compiler. We hope this talk will convince you that graph databases can easily be used to build powerful software development and analysis tools for specific applications or niche platforms and languages.

Nat Pryce

April 10, 2014

Transcript

  1. Image CC BY 2.0 Alex Indigo 2008
    Looking for Smoking Guns in a Haystack
    JVM Heap Analysis with Neo4J
    Nat Pryce
    James Richardson


  2. Sky PVR


  3. PVR Platform Stack
    Electronic Programme Guide
    Java Platform Adaptors
    Middleware
    Linux
    Clean-Room JVM
    MIPS or ARM
    Java
    C


  4. Platform Limitations
    Top-end PVR has:
    ● 400 MHz MIPS CPU
    ● 15 MB Java Heap
    ● Limited flash memory for native code, Java bytecode & resources
    Older boxes are slower and smaller.


  5. JVM for Embedded Systems


  6. Embedded in a C Process
    JVM provided as a C library
    JVM runtime classes defined by the library, so cannot be inspected in the IDE


  7. No JIT
    To save memory, we don’t use the vendor’s JIT
    Every bytecode counts, performance-wise
    E.g. invokevirtual faster than if statements
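
The contrast on this slide can be sketched as follows. All class and method names are hypothetical, and nothing here measures performance: the relative cost of branches and virtual calls is the talk's observation about its own interpreter-only JVM, not a general rule.

```java
// Hypothetical illustration; names are not taken from the talk.
public class DispatchSketch {
    // Branching style: every call evaluates the condition in bytecode.
    public static String labelWithIf(boolean recorded) {
        if (recorded) {
            return "Recorded";
        } else {
            return "Live";
        }
    }

    // Polymorphic style: the decision is made once, when the object is created.
    // Each later call through the abstract class is a single invokevirtual.
    public abstract static class Programme {
        public abstract String label();
    }

    public static final class Recorded extends Programme {
        @Override
        public String label() {
            return "Recorded";
        }
    }

    public static final class Live extends Programme {
        @Override
        public String label() {
            return "Live";
        }
    }

    public static void main(String[] args) {
        Programme programme = new Recorded();
        System.out.println(labelWithIf(true));
        System.out.println(programme.label());
    }
}
```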


  8. Slow Garbage Collection
    Especially full GC and compaction
    Have to control GC to ensure responsive UI


  9. No Escape Analysis
    Common Java style creates lots of objects on the heap
    ● For-each loop creates iterators
    ● Auto-boxing
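
The two allocation sources listed above can be seen side by side in a small sketch. The class and method names are assumptions made for illustration; the point is only that the first method relies on an `Iterator` and boxed `Integer` values, while the second needs neither.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; names are not taken from the talk.
public class AllocationSketch {
    // A for-each loop over a List compiles to a call to iterator(), which
    // allocates an Iterator object. Without escape analysis that object
    // lives on the heap. Each element is also unboxed from an Integer.
    public static int sumWithForEach(List<Integer> values) {
        int total = 0;
        for (Integer value : values) {
            total += value;
        }
        return total;
    }

    // An indexed loop over a primitive array uses no iterator and no boxes.
    public static int sumWithArray(int[] values) {
        int total = 0;
        for (int i = 0; i < values.length; i++) {
            total += values[i];
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> boxed = new ArrayList<Integer>();
        int[] primitive = new int[1000];
        for (int i = 0; i < primitive.length; i++) {
            boxed.add(i); // auto-boxing: may allocate an Integer per element
            primitive[i] = i;
        }
        System.out.println(sumWithForEach(boxed));
        System.out.println(sumWithArray(primitive));
    }
}
```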


  10. Limited Tooling
    No JVMTI
    No JMX and diagnostic MBeans
    Remote debugging requires valuable memory
    Proprietary heap-dump format
    No heap-dump analysis tools


  11. Problem: heap fragmentation


  12. A Fragmented Heap


  13. Heaps are graphs!
    Image CC BY-SA 2.0 Kreg Steppe 2008
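
To make the idea concrete, here is a minimal in-memory sketch that treats objects as nodes and fields as labelled edges. It is not the schema used in the talk; the class, its methods and the object ids are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of a heap as a graph; not the talk's schema.
public class HeapGraphSketch {
    // object id -> (field name -> id of the referenced object)
    private final Map<String, Map<String, String>> fields =
            new TreeMap<String, Map<String, String>>();

    // Record that owner.field points at target (one edge in the graph).
    public void addReference(String owner, String field, String target) {
        Map<String, String> ownerFields = fields.get(owner);
        if (ownerFields == null) {
            ownerFields = new TreeMap<String, String>();
            fields.put(owner, ownerFields);
        }
        ownerFields.put(field, target);
    }

    // Follow edges backwards: which "owner.field" slots refer to target?
    public List<String> referrersOf(String target) {
        List<String> result = new ArrayList<String>();
        for (Map.Entry<String, Map<String, String>> owner : fields.entrySet()) {
            for (Map.Entry<String, String> field : owner.getValue().entrySet()) {
                if (field.getValue().equals(target)) {
                    result.add(owner.getKey() + "." + field.getKey());
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        HeapGraphSketch heap = new HeapGraphSketch();
        heap.addReference("epg", "title", "string1");
        heap.addReference("cache", "key", "string1");
        heap.addReference("epg", "next", "cache");
        System.out.println(heap.referrersOf("string1"));
    }
}
```

A question such as "who holds a reference to this string?" is a backwards edge traversal, which is the kind of query a graph database is built for.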


  14. Relational Database


  15. Graph Database


  16. Import Process
    Heap Dump → Python script → Cypher → neo4j-shell → neo4j server


  17. Much Too Slow
    Image CC BY 2.0 Peter Megyeri 2007


  18. Import Process
    Heap Dump → Java Program → BatchInserter


  19. Very Fast!
    Image CC BY 2.0 Flickr user poorboy1225 2011


  20. Q: How to document the schema?
    Image CC BY 2.0 William Warby, 2010


  21. Explore the Fine Graph?


  22. Maths?


  23. UML?


  24. A: Our Own Graphical Notation


  25. Example Queries
    and some surprising discoveries


  26. Which classes use large Strings?
    MATCH (t:Type) -- (o) -[f:FIELD]-> (s:String)
    RETURN
    t.name as owner_type,
    f.name as owner_field,
    s.length as length,
    s.value as value
    ORDER BY length DESCENDING


  27. How many descriptions? How big?
    MATCH (s:String) <-[f:FIELD]- ()
    WHERE f.name =~ '.*\\.description'
    WITH DISTINCT s AS d
    RETURN
    COUNT(d) AS count,
    SUM(d.length)*2 AS total_bytes


  28. What about arrays?
    MATCH
    (o) -[f:FIELD]-> (a:Array),
    (o) -- (ot:Type),
    (a) --> (at:Type)
    RETURN
    ot.name as owner_type,
    f.name as owner_field,
    a.length as array_length,
    at.name as array_type
    ORDER BY
    array_length DESCENDING,
    owner_type, owner_field ASCENDING


  29. A Memory Hogging JSON Parser
    package org.json.simple.parser;
    class Yylex {
    private static final int ZZ_BUFFERSIZE = 16384;
    private static final int[] ZZ_LEXSTATE;
    private static final java.lang.String ZZ_CMAP_PACKED = "\t\u0000\u0001\u0007...
    private static final char[] ZZ_CMAP;
    private static final int[] ZZ_ACTION;
    private static final java.lang.String ZZ_ACTION_PACKED_0 = "\u0002\u0000...
    private static final int[] ZZ_ROWMAP;
    private static final java.lang.String ZZ_ROWMAP_PACKED_0 = "\u0000\u0000...
    private static final int[] ZZ_TRANS;
    private static final java.lang.String[] ZZ_ERROR_MSG;
    private static final int[] ZZ_ATTRIBUTE;
    private static final java.lang.String ZZ_ATTRIBUTE_PACKED_0 = "\u0002\u0000...;
    ...


  30. What about the huge BitSets?
    MATCH
    (o:Object) -[f:FIELD]-> (bs:Object),
    (bs) -[:TYPE]-> (bst:Type),
    (o) -[:TYPE]-> (ot:Type),
    (bs) -[:FIELD]-> (bits:Array)
    WHERE bst.name = 'java/util/BitSet'
    RETURN
    ot.name as owner_type,
    f.name as owner_field,
    bits.length as bitset_size
    ORDER BY bitset_size DESCENDING


  31. Bloated Bitsets
    enum EPGInfoBit {
        MonauralSoundType,
        SimpleStereoSoundType,
        SurroundSoundSoundType,
        DigitalSurroundSsoundType
        ...
        public final int asInt;
        private EPGInfoBit() {
            asInt = 1 << ordinal();
        }
    }
    BitSet epgInfoBitsMask = new BitSet();
    BitSet epgInfoBits = new BitSet();
    void addBit(EPGInfoBit bit, boolean set)
    {
        epgInfoBitsMask.set(bit.asInt);
        if (set) {
            epgInfoBits.set(bit.asInt);
        }
    }
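
One plausible reading of the slide is that `asInt` is a bit mask (`1 << ordinal()`), while `BitSet.set(int)` expects a bit index, so each `BitSet` must grow to cover index 2^ordinal. The sketch below shows that effect in isolation. The class and method names are hypothetical, and the exact sizes reported by `BitSet.size()` depend on the JVM's `BitSet` implementation.

```java
import java.util.BitSet;

// Hypothetical illustration of the effect; not code from the talk.
public class BitSetBloatSketch {
    // Passing a mask as the index: the backing storage must cover bit
    // number (1 << ordinal), so it grows exponentially with the ordinal.
    public static int bitsAllocatedUsingMask(int ordinal) {
        BitSet bits = new BitSet();
        bits.set(1 << ordinal);
        return bits.size(); // number of bits of storage actually in use
    }

    // Passing the ordinal itself as the index keeps the BitSet small.
    public static int bitsAllocatedUsingOrdinal(int ordinal) {
        BitSet bits = new BitSet();
        bits.set(ordinal);
        return bits.size();
    }

    public static void main(String[] args) {
        int ordinal = 20;
        System.out.println("mask as index:    " + bitsAllocatedUsingMask(ordinal) + " bits");
        System.out.println("ordinal as index: " + bitsAllocatedUsingOrdinal(ordinal) + " bits");
    }
}
```

With an ordinal of 20, the mask version needs storage for more than a million bits, which is over 128 KB per `BitSet` and a large share of a 15 MB heap.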


  32. What are those SwitchMap arrays?
    MATCH
    (o) -[f:FIELD]-> (a:Array),
    (o) -- (ot:Type),
    (a) --> (at:Type)
    WHERE
    f.name =~ '.*SwitchMap.*'
    RETURN
    a.length as array_length,
    ot.name as owner_type,
    f.name as owner_field
    ORDER BY
    array_length DESCENDING


  33. Java Language Specification, SE7
    Chapter 13. Binary Compatibility

    13.4.26. Evolution of Enums
    Adding or reordering constants in an enum type will not break compatibility with pre-existing binaries.
    Bytecode cannot switch directly on enum ordinals.


  34. Switching on Enums is Expensive
    Javac generates an additional synthetic class for every class that switches on an enum.
    The synthetic class contains a "switch map" int array
    ● Maps enum ordinals to jump table indices.
    ● Length is the number of enum elements
    ● Initialised when the class is loaded
    But we don't change enums and client code independently
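
The mechanism described above can be approximated by hand. The sketch below is a hand-written stand-in for the compiler-generated switch map, not actual javac output: the enum, the `SwitchMap` holder class and the case values are all hypothetical, and real generated code differs in naming and detail.

```java
// Hypothetical hand-written equivalent of a compiler-generated switch map.
public class SwitchMapSketch {
    public enum SoundType { MONO, STEREO, SURROUND }

    // Stand-in for the synthetic class: an int array with one slot per enum
    // constant, filled in when the holder class is initialised.
    static final class SwitchMap {
        static final int[] SOUND_TYPE = new int[SoundType.values().length];

        static {
            // Ordinals are looked up at run time, so reordering the enum's
            // constants later does not break this already-compiled class.
            SOUND_TYPE[SoundType.MONO.ordinal()] = 1;
            SOUND_TYPE[SoundType.STEREO.ordinal()] = 2;
            SOUND_TYPE[SoundType.SURROUND.ordinal()] = 3;
        }
    }

    // The switch runs on the mapped int, not directly on the enum ordinal.
    public static int channels(SoundType type) {
        switch (SwitchMap.SOUND_TYPE[type.ordinal()]) {
            case 1: return 1;
            case 2: return 2;
            case 3: return 6;
            default: return 0;
        }
    }

    public static void main(String[] args) {
        System.out.println(channels(SoundType.SURROUND));
    }
}
```

The array costs one `int` per enum constant for each class that switches on the enum, which is the overhead the SwitchMap query on the earlier slide was looking for.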


  35. Solution
    Sponsor development of enum optimisations in Proguard


  36. Take-Away Lessons


  37. Take-Away Lessons
    Line-oriented data format
    Parse & Insert
    Graph Database
    (Niche) Development Environment
    Explore, Visualise, Analyse, Transform


  38. sky.com/jobs


  39. Nat Pryce
    natpryce.com
    @natpryce
    Questions?
    James Richardson
    time4tea.net
    @richajam
    sky.com/jobs
