Looking for Smoking Guns in a Haystack: JVM Heap Analysis with Neo4J

We describe how and why we used the Neo4J graph database to analyse the runtime behaviour of a proprietary embedded JVM deployed in one of the most widely used consumer products in the UK. We will demonstrate (via live coding) some queries that uncovered surprising aspects of our code, the runtime platform and Oracle’s Java compiler. We hope this talk will convince you that graph databases can easily be used to build powerful software development and analysis tools for specific applications or niche platforms and languages.

Nat Pryce

April 10, 2014

Transcript

  1. Looking for Smoking Guns in a Haystack: JVM Heap Analysis with Neo4J
    Nat Pryce, James Richardson
    (Image CC BY 2.0 Alex Indigo 2008)
  2. Sky PVR

  3. PVR Platform Stack
    Diagram labels, in slide order: Electronic Programme Guide, Java Platform Adaptors, Middleware, Linux, Clean-Room JVM, MIPS or ARM. Legend: Java, C.
  4. Platform Limitations
    Top-end PVR has:
    • 400 MHz MIPS CPU
    • 15 MB Java Heap
    • Limited flash memory for native code, Java bytecode & resources
    Older boxes are slower and smaller.
  5. JVM for Embedded Systems

  6. Embedded in a C Process
    JVM provided as a C library
    JVM runtime classes defined by the library, so cannot be inspected in the IDE
  7. No JIT
    To save memory, we don’t use the vendor’s JIT
    Every bytecode counts, performance-wise
    E.g. invokevirtual faster than if statements
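
The invokevirtual remark on this slide can be illustrated with a minimal sketch. The class and method names below are hypothetical and do not come from the talk, and the idea that dispatch beats branching is the slide's observation about this interpreter-only JVM, not a general rule.

```java
// Hypothetical types for illustration only; none of these names are from the talk.
abstract class Channel {
    // Calls to this method on a Channel reference compile to invokevirtual.
    abstract String label();
}

final class RadioChannel extends Channel {
    String label() { return "radio"; }
}

final class TvChannel extends Channel {
    String label() { return "tv"; }
}

final class LabelLookup {
    // Conditional style: each test costs the interpreter a load, a compare and a branch.
    static String labelWithIf(int kind) {
        if (kind == 0) return "radio";
        if (kind == 1) return "tv";
        return "unknown";
    }

    // Polymorphic style: a single invokevirtual selects the implementation.
    static String labelWithDispatch(Channel channel) {
        return channel.label();
    }
}
```

Both methods give the same answer for a given channel; on an interpreter without a JIT the second executes fewer bytecodes per call, which is presumably the effect the slide refers to.
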
  8. Slow Garbage Collection
    Especially full GC and compaction
    Have to control GC to ensure responsive UI
  9. No Escape Analysis
    Common Java style creates lots of objects on the heap
    • For-each loop creates iterators
    • Auto-boxing
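
The two bullet points above can be made concrete with a small sketch. The class name is hypothetical; the point is only to contrast a common style that allocates with one that does not, assuming a JVM that cannot remove the allocations itself.

```java
import java.util.List;

// Hypothetical helper for illustration; not code from the talk.
final class LoopAllocation {
    // Common style: the List holds boxed Integer objects, and the for-each loop
    // calls values.iterator(), which allocates an Iterator on the heap.
    static int sumWithForEach(List<Integer> values) {
        int total = 0;
        for (Integer value : values) {
            total += value; // unboxes the Integer on each step
        }
        return total;
    }

    // Allocation-free style: a primitive array and an indexed loop need
    // neither an Iterator nor boxed values.
    static int sumWithIndex(int[] values) {
        int total = 0;
        for (int i = 0; i < values.length; i++) {
            total += values[i];
        }
        return total;
    }
}
```

On a desktop JVM, escape analysis can often eliminate the iterator; without it, every call to the first method leaves garbage for the slow collector described on the previous slide.
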
  10. Limited Tooling
    No JVMTI
    No JMX and diagnostic MBeans
    Remote debugging requires valuable memory
    Proprietary heap-dump format
    No heap-dump analysis tools
  11. Problem: heap fragmentation

  12. A Fragmented Heap

  13. Heaps are graphs! (Image CC BY-SA 2.0 Kreg Steppe 2008)

  14. Relational Database

  15. Graph Database

  16. Import Process: Heap Dump → Python script → Cypher → neo4j-shell → neo4j server

  17. Much Too Slow (Image CC BY 2.0 Peter Megyeri 2007)

  18. Import Process: Heap Dump → Java Program → BatchInserter
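
A rough sketch of what the parsing half of such an import program might look like follows. The real heap-dump format is proprietary and not shown in the talk, so the `OBJ` and `REF` line layout here is invented purely for illustration, and plain in-memory collections stand in for Neo4j's BatchInserter.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: the line format and all names are assumptions, not the talk's code.
final class HeapGraphSketch {
    // One FIELD relationship: owner object -> referenced object, labelled with the field name.
    static final class Field {
        final long from;
        final String name;
        final long to;

        Field(long from, String name, long to) {
            this.from = from;
            this.name = name;
            this.to = to;
        }
    }

    // Object id -> type name (stands in for object nodes and their TYPE relationships).
    final Map<Long, String> typeOfObject = new HashMap<Long, String>();
    final List<Field> fields = new ArrayList<Field>();

    // Assumed line shapes: "OBJ <id> <type>" and "REF <fromId> <fieldName> <toId>".
    void importLine(String line) {
        String[] parts = line.trim().split("\\s+");
        if (parts[0].equals("OBJ")) {
            typeOfObject.put(Long.parseLong(parts[1]), parts[2]);
        } else if (parts[0].equals("REF")) {
            fields.add(new Field(Long.parseLong(parts[1]), parts[2], Long.parseLong(parts[3])));
        }
        // Any other line is ignored in this sketch.
    }
}
```

In a real importer, each `put` and `add` would become a node or relationship creation call on the batch inserter instead of an update to a collection.
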

  19. Very Fast! (Image CC BY 2.0 Flickr user poorboy1225 2011)

  20. Q: How to document the schema? (Image CC BY 2.0 William Warby, 2010)
  21. Explore the Fine Graph?

  22. Maths?

  23. UML?

  24. A: Our Own Graphical Notation

  25. Example Queries and some surprising discoveries

  26. Which classes use large Strings?
    MATCH (t:Type) -- (o) -[f:FIELD]-> (s:String)
    RETURN t.name as owner_type, f.name as owner_field,
           s.length as length, s.value as value
    ORDER BY length DESCENDING
  27. How many descriptions? How big?
    MATCH (s:String) <-[f:FIELD]- ()
    WHERE f.name =~ '.*\\.description'
    WITH DISTINCT s AS d
    RETURN COUNT(d) AS count, SUM(d.length)*2 AS total_bytes
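
The `DISTINCT` and `*2` in this query can be read as "count each String object once, at two bytes per character". The sketch below mirrors that arithmetic in Java. It assumes the JVM stores string contents as 16-bit chars, which the query implies but the slide does not state, and it ignores object headers; the class name is hypothetical.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Hypothetical helper for illustration; not code from the talk.
final class StringFootprint {
    // Sums character storage over distinct String objects (by identity, like
    // DISTINCT over graph nodes), assuming 2 bytes per char.
    static long charBytes(String[] references) {
        Set<String> seen = Collections.newSetFromMap(new IdentityHashMap<String, Boolean>());
        long total = 0;
        for (int i = 0; i < references.length; i++) {
            if (seen.add(references[i])) {
                total += references[i].length() * 2L;
            }
        }
        return total;
    }
}
```

Two fields pointing at the same String object contribute once, while two equal but separately allocated Strings contribute twice, which matches how the query treats String nodes.
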
  28. What about arrays?
    MATCH (o) -[f:FIELD]-> (a:Array), (o) -- (ot:Type), (a) --> (at:Type)
    RETURN ot.name as owner_type, f.name as owner_field,
           a.length as array_length, at.name as array_type
    ORDER BY array_length DESCENDING, owner_type, owner_field ASCENDING
  29. A Memory Hogging JSON Parser
    package org.json.simple.parser;
    class Yylex {
        private static final int ZZ_BUFFERSIZE = 16384;
        private static final int[] ZZ_LEXSTATE;
        private static final java.lang.String ZZ_CMAP_PACKED = "\t\u0000\u0001\u0007...
        private static final char[] ZZ_CMAP;
        private static final int[] ZZ_ACTION;
        private static final java.lang.String ZZ_ACTION_PACKED_0 = "\u0002\u0000...
        private static final int[] ZZ_ROWMAP;
        private static final java.lang.String ZZ_ROWMAP_PACKED_0 = "\u0000\u0000...
        private static final int[] ZZ_TRANS;
        private static final java.lang.String[] ZZ_ERROR_MSG;
        private static final int[] ZZ_ATTRIBUTE;
        private static final java.lang.String ZZ_ATTRIBUTE_PACKED_0 = "\u0002\u0000...;
        ...
  30. What about the huge BitSets?
    MATCH (o:Object) -[f:FIELD]-> (bs:Object), (bs) -[:TYPE]-> (bst:Type),
          (o) -[:TYPE]-> (ot:Type), (bs) -[:FIELD]-> (bits:Array)
    WHERE bst.name = 'java/util/BitSet'
    RETURN ot.name as owner_type, f.name as owner_field, bits.length as bitset_size
    ORDER BY bitset_size DESCENDING
  31. Bloated Bitsets
    enum EPGInfoBit {
        MonauralSoundType, SimpleStereoSoundType, SurroundSoundSoundType, DigitalSurroundSsoundType ...
        public final int asInt;
        private EPGInfoBit() { asInt = 1 << ordinal(); }
    }
    BitSet epgInfoBitsMask = new BitSet();
    BitSet epgInfoBits = new BitSet();
    void addBit(EPGInfoBit bit, boolean set) {
        epgInfoBitsMask.set(bit.asInt);
        if (set) { epgInfoBits.set(bit.asInt); }
    }
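
The slide does not spell out why these BitSets are bloated. One plausible reading is that `asInt` is a bit mask (`1 << ordinal()`), while `BitSet.set(int)` expects a bit index, so the set must grow to hold bit number 2^ordinal. The sketch below, with hypothetical names, contrasts the two usages.

```java
import java.util.BitSet;

// Hypothetical demonstration; only java.util.BitSet is real API here.
final class BitSetBloat {
    // Mirrors the slide: passes a mask value where BitSet expects a bit index.
    static BitSet setByMask(int ordinal) {
        BitSet bits = new BitSet();
        bits.set(1 << ordinal); // sets bit number 2^ordinal
        return bits;
    }

    // Passes the ordinal itself as the bit index.
    static BitSet setByOrdinal(int ordinal) {
        BitSet bits = new BitSet();
        bits.set(ordinal);
        return bits;
    }
}
```

For an enum constant with ordinal 20, `setByMask(20).length()` is 1048577 bits (over 128 KB of backing storage), whereas `setByOrdinal(20).length()` is 21. On a 15 MB heap a handful of such sets per object would be very visible in an array-size query.
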
  32. What are those SwitchMap arrays?
    MATCH (o) -[f:FIELD]-> (a:Array), (o) -- (ot:Type), (a) --> (at:Type)
    WHERE f.name =~ '.*SwitchMap.*'
    RETURN a.length as array_length, ot.name as owner_type, f.name as owner_field
    ORDER BY array_length DESCENDING
  33. Java Language Specification, SE7
    Chapter 13. Binary Compatibility
    …
    13.4.26. Evolution of Enums
    Adding or reordering constants in an enum type will not break compatibility with pre-existing binaries.
    Bytecode cannot switch directly on enum ordinals.
  34. Switching on Enums is Expensive
    Javac generates an additional synthetic class for every class that switches on an enum.
    The synthetic class contains a "switch map" int array
    • Maps enum ordinals to jump table indices.
    • Length is the number of enum elements
    • Initialised when the class is loaded
    But we don't change enums and client code independently
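
To make the bullets above concrete, here is a hand-written approximation of what javac produces for a switch over an enum. The enum and class names are hypothetical, and real javac output uses a synthetic class and a `$SwitchMap$...` field name rather than the readable names used here.

```java
// Hypothetical enum for illustration; not from the talk.
enum Sound { MONO, STEREO, SURROUND }

// Approximates the synthetic class: one int per enum constant, filled in at class load.
final class SoundSwitchMap {
    static final int[] SWITCH_MAP = new int[Sound.values().length];

    static {
        // Each entry is guarded so that a constant removed from the enum later
        // does not break this class when it is loaded.
        try { SWITCH_MAP[Sound.MONO.ordinal()] = 1; } catch (NoSuchFieldError e) { }
        try { SWITCH_MAP[Sound.STEREO.ordinal()] = 2; } catch (NoSuchFieldError e) { }
    }
}

final class SoundClient {
    // Source-level "switch (sound)" becomes a switch on the mapped int,
    // never on the ordinal directly.
    static int speakers(Sound sound) {
        switch (SoundSwitchMap.SWITCH_MAP[sound.ordinal()]) {
            case 1: return 1;
            case 2: return 2;
            default: return 6;
        }
    }
}
```

The array has one slot per enum constant even when the switch mentions only a few of them, and every class that switches on the enum gets its own copy, which is how these arrays become visible in a heap dump.
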
  35. Solution: Sponsor development of enum optimisations in ProGuard

  36. Take-Away Lessons

  37. Take-Away Lessons
    Line-oriented data format → Parse & Insert → Graph Database
    (Niche) Development Environment: Explore, Visualise, Analyse, Transform
  38. sky.com/jobs

  39. Questions?
    Nat Pryce: natpryce.com, info@natpryce.com, @natpryce
    James Richardson: time4tea.net, info@time4tea.net, @richajam
    sky.com/jobs