Looking for Smoking Guns in a Haystack: JVM Heap Analysis with Neo4J


We describe how and why we used the Neo4J graph database to analyse the runtime behaviour of a proprietary embedded JVM deployed in one of the most widely used consumer products in the UK. We will demonstrate (via live coding) some queries that uncovered surprising aspects of our code, the runtime platform and Oracle’s Java compiler. We hope this talk will convince you that graph databases can easily be used to build powerful software development and analysis tools for specific applications or niche platforms and languages.


Nat Pryce

April 10, 2014


  1. 1.

    Looking for Smoking Guns in a Haystack: JVM Heap Analysis with Neo4J

    Nat Pryce, James Richardson

    Image CC BY 2.0 Alex Indigo 2008
  2. 2.
  3. 4.

    Platform Limitations

    Top-end PVR has:
    • 400 MHz MIPS CPU
    • 15 MB Java Heap
    • Limited flash memory for native code, Java bytecode & resources

    Older boxes are slower and smaller.
  4. 6.

    Embedded in a C Process

    • JVM provided as a C library
    • JVM runtime classes defined by the library, so cannot be inspected in the IDE
  5. 7.

    No JIT

    • To save memory, we don’t use the vendor’s JIT
    • Every bytecode counts, performance-wise
    • E.g. invokevirtual faster than if statements
  6. 9.

    No Escape Analysis

    Common Java style creates lots of objects on the heap:
    • For-each loop creates iterators
    • Auto-boxing
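The two allocation sources named on this slide can be illustrated with a minimal sketch. The class and method names below are hypothetical and not taken from the talk. The for-each loop over a `List<Integer>` obtains an `Iterator` and unboxes each element, while the indexed loop over an `int[]` allocates nothing per iteration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not code from the talk.
class AllocationSketch {
    // javac compiles this for-each into values.iterator(), hasNext() and next(),
    // so each call obtains an Iterator object and unboxes every element.
    static int sumWithForEach(List<Integer> values) {
        int total = 0;
        for (Integer v : values) {
            total += v; // auto-unboxing via Integer.intValue()
        }
        return total;
    }

    // Same result over a primitive array with an indexed loop:
    // no iterator and no boxed Integer objects.
    static int sumWithIndex(int[] values) {
        int total = 0;
        for (int i = 0; i < values.length; i++) {
            total += values[i];
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> boxed = new ArrayList<>();
        int[] primitive = new int[1000];
        for (int i = 0; i < primitive.length; i++) {
            primitive[i] = i;
            boxed.add(i); // auto-boxing via Integer.valueOf(i)
        }
        System.out.println(sumWithForEach(boxed));
        System.out.println(sumWithIndex(primitive));
    }
}
```

On a JVM with a JIT and escape analysis the short-lived iterator may be optimised away; the slide's point is that this platform has neither.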
  7. 10.

    Limited Tooling

    • No JVMTI
    • No JMX and diagnostic MBeans
    • Remote debugging requires valuable memory
    • Proprietary heap-dump format
    • No heap-dump analysis tools
  8. 22.
  9. 23.
  10. 26.

    Which classes use large Strings?

        MATCH (t:Type) -- (o) -[f:FIELD]-> (s:String)
        RETURN t.name as owner_type, f.name as owner_field,
               s.length as length, s.value as value
        ORDER BY length DESCENDING
  11. 27.

    How many descriptions? How big?

        MATCH (s:String) <-[f:FIELD]- ()
        WHERE f.name =~ '.*\\.description'
        WITH DISTINCT s AS d
        RETURN COUNT(d) AS count, SUM(d.length)*2 AS total_bytes
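The factor of 2 in `total_bytes` presumably reflects two bytes per character. The sketch below assumes the JVM stores string contents as UTF-16 code units, which the slides do not state explicitly. The class name and sample string are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration; assumes UTF-16 storage of string contents.
class StringBytesSketch {
    // Bytes of character data only: String.length() counts UTF-16 code units,
    // and each code unit occupies 2 bytes.
    static int utf16Bytes(String s) {
        return s.length() * 2;
    }

    public static void main(String[] args) {
        String description = "An example programme description"; // made-up sample text
        System.out.println(utf16Bytes(description));
        // Cross-check against an explicit UTF-16 encoding without a byte-order mark.
        System.out.println(description.getBytes(StandardCharsets.UTF_16BE).length);
    }
}
```

This counts character data only. Object headers and array overhead come on top and depend on the JVM.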
  12. 28.

    What about arrays?

        MATCH (o) -[f:FIELD]-> (a:Array), (o) -- (ot:Type), (a) --> (at:Type)
        RETURN ot.name as owner_type, f.name as owner_field,
               a.length as array_length, at.name as array_type
        ORDER BY array_length DESCENDING, owner_type, owner_field ASCENDING
  13. 29.

    A Memory Hogging JSON Parser

        package org.json.simple.parser;

        class Yylex {
            private static final int ZZ_BUFFERSIZE = 16384;
            private static final int[] ZZ_LEXSTATE;
            private static final java.lang.String ZZ_CMAP_PACKED = "\t\u0000\u0001\u0007...
            private static final char[] ZZ_CMAP;
            private static final int[] ZZ_ACTION;
            private static final java.lang.String ZZ_ACTION_PACKED_0 = "\u0002\u0000...
            private static final int[] ZZ_ROWMAP;
            private static final java.lang.String ZZ_ROWMAP_PACKED_0 = "\u0000\u0000...
            private static final int[] ZZ_TRANS;
            private static final java.lang.String[] ZZ_ERROR_MSG;
            private static final int[] ZZ_ATTRIBUTE;
            private static final java.lang.String ZZ_ATTRIBUTE_PACKED_0 = "\u0002\u0000...;
            ...
  14. 30.

    What about the huge BitSets?

        MATCH (o:Object) -[f:FIELD]-> (bs:Object),
              (bs) -[:TYPE]-> (bst:Type),
              (o) -[:TYPE]-> (ot:Type),
              (bs) -[:FIELD]-> (bits:Array)
        WHERE bst.name = 'java/util/BitSet'
        RETURN ot.name as owner_type, f.name as owner_field,
               bits.length as bitset_size
        ORDER BY bitset_size DESCENDING
  15. 31.

    Bloated Bitsets

        enum EPGInfoBit {
            MonauralSoundType,
            SimpleStereoSoundType,
            SurroundSoundSoundType,
            DigitalSurroundSsoundType
            ...

            public final int asInt;

            private EPGInfoBit() {
                asInt = 1 << ordinal();
            }
        }

        BitSet epgInfoBitsMask = new BitSet();
        BitSet epgInfoBits = new BitSet();

        void addBit(EPGInfoBit bit, boolean set) {
            epgInfoBitsMask.set(bit.asInt);
            if (set) {
                epgInfoBits.set(bit.asInt);
            }
        }
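The slide's code passes a mask (`1 << ordinal()`) where `BitSet.set` expects a bit index, so the index doubles with every enum constant. The following sketch shows the effect with a hypothetical 21-constant enum. The `Flag` enum and method names are illustrative, and the ordinal-indexed variant is one possible alternative rather than the fix the authors necessarily applied.

```java
import java.util.BitSet;

// Hypothetical illustration; Flag stands in for the enum on the slide.
class BitSetSizeSketch {
    enum Flag { A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U }

    // As on the slide: the argument is a mask, but BitSet.set treats it as a bit index.
    static BitSet setUsingMask(Flag flag) {
        BitSet bits = new BitSet();
        bits.set(1 << flag.ordinal());
        return bits;
    }

    // Alternative: use the ordinal itself as the bit index.
    static BitSet setUsingOrdinal(Flag flag) {
        BitSet bits = new BitSet();
        bits.set(flag.ordinal());
        return bits;
    }

    public static void main(String[] args) {
        Flag last = Flag.U; // ordinal 20, so the mask is 1 << 20
        // BitSet.size() reports the number of bits of storage currently in use.
        System.out.println("mask index:    " + setUsingMask(last).size() + " bits");
        System.out.println("ordinal index: " + setUsingOrdinal(last).size() + " bits");
    }
}
```

With the mask as index, recording a single flag with ordinal 20 needs storage for more than a million bits (over 128 KB), which is significant against a 15 MB heap.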
  16. 32.

    What are those SwitchMap arrays?

        MATCH (o) -[f:FIELD]-> (a:Array), (o) -- (ot:Type), (a) --> (at:Type)
        WHERE f.name =~ '.*SwitchMap.*'
        RETURN a.length as array_length, ot.name as owner_type,
               f.name as owner_field
        ORDER BY array_length DESCENDING
  17. 33.

    Java Language Specification, SE7

    Chapter 13. Binary Compatibility
    …
    13.4.26. Evolution of Enums

    Adding or reordering constants in an enum type will not break compatibility with pre-existing binaries.

    Bytecode cannot switch directly on enum ordinals.
  18. 34.

    Switching on Enums is Expensive

    Javac generates an additional synthetic class for every class that switches on an enum.

    The synthetic class contains a "switch map" int array:
    • Maps enum ordinals to jump table indices
    • Length is the number of enum elements
    • Initialised when the class is loaded

    But we don't change enums and client code independently.
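The switch map described above can be approximated by hand. This is a sketch of the general shape of javac's output, not decompiled output from the project. The `Sound` enum, the `SwitchMap` holder and the returned channel counts are hypothetical. The ordinal-based variant is one possible alternative when enum and client code are always compiled together, and not necessarily what the authors did.

```java
// Hypothetical illustration of the switch-map pattern.
class SwitchMapSketch {
    enum Sound { MONO, STEREO, SURROUND }

    // Hand-written approximation of the synthetic class javac emits for a
    // class that switches on Sound (the real one holds a $SwitchMap$... field).
    static final class SwitchMap {
        // One int per enum constant, allocated when this class is initialised.
        static final int[] MAP = new int[Sound.values().length];
        static {
            // Each entry is guarded so a constant missing at run time is skipped.
            try { MAP[Sound.MONO.ordinal()] = 1; } catch (NoSuchFieldError e) { }
            try { MAP[Sound.STEREO.ordinal()] = 2; } catch (NoSuchFieldError e) { }
            try { MAP[Sound.SURROUND.ordinal()] = 3; } catch (NoSuchFieldError e) { }
        }
    }

    // Roughly what `switch (sound) { case MONO: ... }` becomes after compilation.
    static int channelsViaSwitchMap(Sound sound) {
        switch (SwitchMap.MAP[sound.ordinal()]) {
            case 1: return 1;
            case 2: return 2;
            case 3: return 6;
            default: throw new IllegalArgumentException("unknown: " + sound);
        }
    }

    // Alternative without the extra array: switch on the ordinal directly.
    // Only safe if the enum and this code are never changed independently.
    static int channelsViaOrdinal(Sound sound) {
        switch (sound.ordinal()) {
            case 0: return 1;
            case 1: return 2;
            case 2: return 6;
            default: throw new IllegalArgumentException("unknown: " + sound);
        }
    }

    public static void main(String[] args) {
        for (Sound s : Sound.values()) {
            System.out.println(s + ": " + channelsViaSwitchMap(s) + " / " + channelsViaOrdinal(s));
        }
    }
}
```

The indirection through the array is what lets pre-existing binaries survive reordered enum constants. Giving it up trades that compatibility guarantee for one fewer class and array per switching class.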
  19. 37.

    Take-Away Lessons

    • Line-oriented data format
    • Parse & Insert
    • Graph Database
    • (Niche) Development Environment: Explore, Visualise, Analyse, Transform
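The "parse and insert" step can be sketched as a translation from dump lines to Cypher statements. The heap-dump format in the talk is proprietary and not shown, so the three line shapes below (`type`, `object`, `field`) are entirely hypothetical, as is the class name. Only the `Type` and `Object` labels and the `TYPE` and `FIELD` relationships come from the queries on the slides.

```java
// Hypothetical sketch: the line format is invented for illustration.
class HeapDumpToCypherSketch {
    // Translates one line of the made-up dump format into one Cypher statement.
    static String toCypher(String line) {
        String[] parts = line.trim().split("\\s+");
        switch (parts[0]) {
            case "type": { // type <id> <name>
                long id = Long.parseLong(parts[1]);
                return "CREATE (:Type {id: " + id + ", name: '" + parts[2] + "'})";
            }
            case "object": { // object <id> <typeId>
                long id = Long.parseLong(parts[1]);
                long typeId = Long.parseLong(parts[2]);
                return "MATCH (t:Type {id: " + typeId + "}) "
                     + "CREATE (:Object {id: " + id + "})-[:TYPE]->(t)";
            }
            case "field": { // field <ownerId> <fieldName> <targetId>
                long owner = Long.parseLong(parts[1]);
                long target = Long.parseLong(parts[3]);
                return "MATCH (o:Object {id: " + owner + "}), (v:Object {id: " + target + "}) "
                     + "CREATE (o)-[:FIELD {name: '" + parts[2] + "'}]->(v)";
            }
            default:
                throw new IllegalArgumentException("unrecognised line: " + line);
        }
    }

    public static void main(String[] args) {
        String[] dump = {
            "type 1 java/util/BitSet",
            "object 10 1",
            "object 11 1",
            "field 10 next 11"
        };
        for (String line : dump) {
            System.out.println(toCypher(line));
        }
    }
}
```

A real importer would more likely use a batch insertion API and parameterised queries than string concatenation. The sketch only shows how a line-oriented format maps onto nodes and relationships.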