Racy JGit - A short history of time

msohn
November 17, 2019

Detecting file modifications using file meta data and how this was improved in JGit.

Transcript

  1. Racy JGit
    A short history of time
    Matthias Sohn (SAP)
    10^-7: Clock A by Martin Burgess, made to John Harrison's principles
    10^-10: First atomic caesium clock (Jack Parry and Louis Essen)

  2. Bug 544199 - Pack file removed from the pack list,
    but new list not reloaded (2019-02-06)
    GC a busy repository in Gerrit
    ➨ Packfile marked corrupt and removed from pack list
    ➨ MissingObjectExceptions
    Requires node restart (2-3 times a day)

  3. Analysis
    Packfile name is determined by contained objects
    GC may change the checksum (same list of objects)
    if pack options differ from those of the original pack
    ➨ JGit didn't detect the file modification
    but saw a different checksum
    ➨ pack marked corrupt and
    removed from the in-memory pack list
    ➨ missing objects

  4. First fix
    (2019-03-06)

  5. How does Git detect file modification?
    Comparing file content is too slow
    ➨ compare file metadata (lastModified, size, inode …)
    ➨ much faster
    $ stat Test.java
    File: Test.java
    Size: 36178 Blocks: 72 IO Block: 4096 regular file
    Device: 1000004h/16777220d Inode: 665954471 Links: 1
    Access: (0644/-rw-r--r--) Uid: ( 503/ d029788) Gid: ( 20/ staff)
    Access: 2019-11-15 21:06:23.535548393 -0800
    Modify: 2019-11-13 14:55:06.820859430 -0800
    Change: 2019-11-13 14:55:06.821155690 -0800
    Birth: 2019-11-13 14:55:06.820727648 -0800

  6. Racy Git Problem
    cached lastModified != lastModified ➨ file changed
    cached lastModified == lastModified ➨ unknown: the file may have changed within the same timestamp tick
    ➨ file timestamps have finite resolution ➨ compare content
    JGit's FileSnapshot implements this
    [Figure: timeline with ticks 1 to 5; one tick spans the timestamp resolution]
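
The rule on this slide can be sketched as a small pure function. This is only an illustration under assumptions: the class `RacyCheck`, the method `check`, and its parameters are hypothetical names and not JGit's `FileSnapshot` API. The sketch treats a file as "racily clean" when the snapshot was taken within one timestamp resolution of the file's last modification, because a later write inside the same tick would leave the timestamp unchanged.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical sketch of the racy-git decision, not JGit's FileSnapshot.
final class RacyCheck {
    enum Result { MODIFIED, UNMODIFIED, UNKNOWN_COMPARE_CONTENT }

    /**
     * @param cachedLastModified  lastModified recorded when the snapshot was taken
     * @param snapshotTaken       time at which the snapshot was taken
     * @param currentLastModified lastModified the filesystem reports now
     * @param resolution          assumed timestamp resolution of the filesystem
     */
    static Result check(Instant cachedLastModified, Instant snapshotTaken,
            Instant currentLastModified, Duration resolution) {
        if (!cachedLastModified.equals(currentLastModified)) {
            return Result.MODIFIED; // timestamps differ: file changed
        }
        // Equal timestamps: only trustworthy if the snapshot was taken
        // more than one resolution interval after the modification.
        Duration age = Duration.between(cachedLastModified, snapshotTaken);
        if (age.compareTo(resolution) <= 0) {
            return Result.UNKNOWN_COMPARE_CONTENT; // racily clean
        }
        return Result.UNMODIFIED;
    }

    public static void main(String[] args) {
        Instant t = Instant.parse("2019-11-13T22:55:06Z");
        Duration oneSecond = Duration.ofSeconds(1);
        System.out.println(check(t, t.plusMillis(200), t, oneSecond));
        System.out.println(check(t, t.plusSeconds(10), t, oneSecond));
    }
}
```

A coarser assumed resolution pushes more files into the "compare content" case, which is safe but slow; a finer one is fast but only safe if the filesystem really delivers that resolution.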

  7. Bug 546891 - Performance regression
    ➨ expensive pack file and index re-reads
    FileSnapshot used a hard-coded resolution of 2.5 sec

  8. Improve FileSnapshot
    To detect modification consider
    1. file size
    2. inode (Unix)
    3. lastModified
    4. for pack files: checksum
    Instead of millis (long) use Instant, FileTime
    ➨ use higher timestamp resolution on modern filesystems
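
As a rough illustration of the first three criteria, the sketch below records size, file key, and lastModified with the standard `java.nio.file` API. The class `MetaSnapshot` and its methods are hypothetical and far simpler than JGit's `FileSnapshot`; the pack checksum is left out. `BasicFileAttributes.fileKey()` typically encodes device and inode on Unix and may be `null` on other platforms, which the comparison tolerates.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.time.Instant;
import java.util.Objects;

// Hypothetical metadata snapshot, not JGit's FileSnapshot.
final class MetaSnapshot {
    final long size;
    final Object fileKey;       // inode-like identity, may be null
    final Instant lastModified; // Instant keeps sub-millisecond precision

    private MetaSnapshot(long size, Object fileKey, Instant lastModified) {
        this.size = size;
        this.fileKey = fileKey;
        this.lastModified = lastModified;
    }

    /** Reads the metadata of the given file in one call. */
    static MetaSnapshot of(Path path) {
        try {
            BasicFileAttributes attrs =
                    Files.readAttributes(path, BasicFileAttributes.class);
            return new MetaSnapshot(attrs.size(), attrs.fileKey(),
                    attrs.lastModifiedTime().toInstant());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** True if any recorded attribute differs from the other snapshot. */
    boolean differsFrom(MetaSnapshot other) {
        return size != other.size
                || !Objects.equals(fileKey, other.fileKey)
                || !lastModified.equals(other.lastModified);
    }
}
```

Equal metadata does not prove the file is unchanged; the racy case still needs the resolution check or a content comparison.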

  9. File Timestamp Resolution
    Filesystem              OS       Timestamp resolution
    ext3                    Linux    1 sec
    ext4, btrfs, xfs, zfs   Linux    1 ns
    HFS+                    Mac      1 sec
    APFS                    Mac      1 ns
    NTFS                    Windows  100 ns
    FAT                     Windows  2 sec

  10. Auto Tuning
    Measure per filesystem:
    • clock resolution (seen from Java)
    + file timestamp resolution (seen from Java):
    ➨ timestampResolution
    • test minimal time interval lastModified can distinguish:
    ➨ minRacyInterval
    Store in ~/.gitconfig (5.1.9 - 5.5.1)
    XDG_CONFIG_HOME/jgit/config (> 5.5.1)
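
One way to picture the measurement is the probe below: it rewrites a temporary file until the reported lastModified advances and returns the observed step. This is a simplified sketch under assumptions, not JGit's measurement code; the class `ResolutionProbe`, the method `measure`, and the 5 second limit are made up for illustration. The result is only an upper bound on the resolution as seen from Java, since it also includes write and scheduling latency.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.Duration;
import java.time.Instant;

// Hypothetical probe, not JGit's auto-tuning implementation.
final class ResolutionProbe {
    /**
     * Rewrites a probe file in dir until its lastModified changes.
     * Returns the observed step, or Duration.ZERO if no change was
     * seen within the (arbitrary) 5 second limit.
     */
    static Duration measure(Path dir) {
        try {
            Path probe = Files.createTempFile(dir, "probe", ".tmp");
            try {
                Instant first = Files.getLastModifiedTime(probe).toInstant();
                Instant next = first;
                long start = System.nanoTime();
                long limit = Duration.ofSeconds(5).toNanos();
                while (!next.isAfter(first)
                        && System.nanoTime() - start < limit) {
                    Files.write(probe, new byte[] {1});
                    next = Files.getLastModifiedTime(probe).toInstant();
                }
                return Duration.between(first, next);
            } finally {
                Files.deleteIfExists(probe);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path tmp = Paths.get(System.getProperty("java.io.tmpdir"));
        System.out.println("observed step: " + measure(tmp));
    }
}
```

Because the result depends on the filesystem and the Java runtime, it makes sense to measure once per combination and cache the value, as the slide describes.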

  11. Results
    Java       Filesystem  OS     timestampResolution  minRacyInterval
    1.8.0_212  btrfs       Linux  1 ms                 3.6 - 6.6 ms
    1.8.0_212  ext4        Linux  3 ms                 1.1 - 4.1 ms
    1.8.0_212  xfs         Linux  4 ms                 3.7 - 3.9 ms
    1.8.0_212  zfs         Linux  3 ms                 4.8 - 5.0 ms
    11.0.3+7   btrfs       Linux  3 us                 0.7 - 4.7 ms
    11.0.3+7   ext4        Linux  6 us                 0.7 - 4.7 ms
    11.0.3+7   xfs         Linux  7 us                 0.1 - 8.0 ms
    11.0.3+7   zfs         Linux  7 us                 0.7 - 5.2 ms
    1.8.0_212  APFS (SSD)  Mac    1 s                  0
    11.0.3+7   APFS (SSD)  Mac    6 us                 0

  12. Why minRacyInterval?
    • clocks on different cores are not guaranteed to be in sync
    • Java GC
    • measuring duration in Java takes > 1ns
    • JIT compilation
    • OS task scheduler
    effective resolution = max(timestampResolution, minRacyThreshold) * safetyFactor
    safetyFactor: 5/2 if < 100 ms
                  5/4 if > 100 ms
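
The formula can be worked through in a few lines of Java. This is a sketch under two assumptions that the slide does not state: the 100 ms threshold is compared against the maximum of the two measured values, and a value of exactly 100 ms gets the smaller factor 5/4. The class and method names are hypothetical.

```java
import java.time.Duration;

// Hypothetical helper that evaluates the slide's formula.
final class EffectiveResolution {
    private static final Duration THRESHOLD = Duration.ofMillis(100);

    static Duration of(Duration timestampResolution, Duration minRacyThreshold) {
        // max(timestampResolution, minRacyThreshold)
        Duration max = timestampResolution.compareTo(minRacyThreshold) >= 0
                ? timestampResolution
                : minRacyThreshold;
        // Assumption: factor 5/2 below 100 ms, 5/4 from 100 ms upwards.
        boolean small = max.compareTo(THRESHOLD) < 0;
        return small
                ? max.multipliedBy(5).dividedBy(2)
                : max.multipliedBy(5).dividedBy(4);
    }

    public static void main(String[] args) {
        // Fine-grained filesystem: 10000 ns * 5/2
        System.out.println(of(Duration.ofNanos(10_000), Duration.ZERO));
        // Coarse filesystem: 1001 ms * 5/4
        System.out.println(of(Duration.ofMillis(1001), Duration.ZERO));
    }
}
```

With a measured resolution of 10000 ns and a threshold of 0 this gives 25000 ns; with 1001 ms it gives 1251.25 ms.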

  13. Config Example
    [filesystem "my.host.name|AdoptOpenJDK|1.8.0_221|/dev/disk1s5"]
        timestampResolution = 1001 milliseconds
        minRacyThreshold = 0 nanoseconds
    [filesystem "my.host.name|SAP SE|13.0.1|/dev/disk1s5"]
        timestampResolution = 10000 nanoseconds
        minRacyThreshold = 0 nanoseconds

  14. Java Bugs File Timestamps
    high-res FS (1 ns):   14:55:06.820859430
    Java 8 (Mac):         14:55:06.000000000
    Java 8, 9 (Linux):    14:55:06.820000000
    Java < 14:            14:55:06.820859000
    Java 14 (Windows):    14:55:06.820859000
    Java 14 (POSIX):      14:55:06.820859430
    When the git index had entries written by both git and JGit,
    this caused wrong git status reports of modified files
    ➨ truncate all timestamps if trailing 000 found
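
The truncation idea can be illustrated as follows: if either timestamp ends in zeros, assume it was written with reduced precision and compare both at that coarser precision. This is a hypothetical sketch (class `TimestampCompare`, methods `sameInstant` and `granularity`), not the code JGit uses, and it accepts that a full-precision timestamp which happens to end in 000 is treated as truncated.

```java
import java.time.Instant;

// Hypothetical precision-tolerant comparison, not JGit's implementation.
final class TimestampCompare {
    /** Compares two timestamps at the coarser of their apparent precisions. */
    static boolean sameInstant(Instant a, Instant b) {
        if (a.getEpochSecond() != b.getEpochSecond()) {
            return false;
        }
        int na = a.getNano();
        int nb = b.getNano();
        if (na == nb) {
            return true;
        }
        // Trailing zeros suggest the value was truncated by its writer.
        int unit = Math.max(granularity(na), granularity(nb));
        return na / unit == nb / unit;
    }

    /** Apparent precision in nanoseconds, guessed from trailing zeros. */
    static int granularity(int nanos) {
        if (nanos == 0) {
            return 1_000_000_000; // whole seconds
        }
        if (nanos % 1_000_000 == 0) {
            return 1_000_000;     // milliseconds
        }
        if (nanos % 1_000 == 0) {
            return 1_000;         // microseconds
        }
        return 1;                 // full nanosecond precision
    }

    public static void main(String[] args) {
        Instant full = Instant.parse("2019-11-13T14:55:06.820859430Z");
        Instant micros = Instant.parse("2019-11-13T14:55:06.820859Z");
        System.out.println(sameInstant(full, micros));
    }
}
```

Using the values from the slide, the nanosecond timestamp compares equal to its microsecond, millisecond, and second truncations, while a timestamp from a different millisecond does not.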

  15. Bug 548716 - Error after pull: "couldn't lock local
    tracking ref for an update"
    Newly added refs stopped being visible
    Lock failure in Gerrit IT test
    Looked like another case of racy JGit
    • PackedBatchRefUpdate created an unsorted packed-refs file
    • Higher-resolution timestamp handling helped expose this
    Fixed by Han-Wen (2019-09-03)

  16. Summary
    • 7 months
    • 82 commits
    • 6 authors
    Chris, Han-Wen, Luca,
    Matthias, Marc, Thomas
    • many reviewers
    • countless tests
    • 22 service releases
    4.5.7, 4.7.9, 4.9.10, 4.11.8, 4.11.9,
    5.1.7 - 5.1.12, 5.2.2, 5.3.1 - 5.3.6,
    5.4.0 - 5.4.3, 5.5.0, 5.5.1
    Measuring time and detecting modified files is hard
    10^-18: NIST ytterbium lattice clock

  17. Links
    • General:
      https://amturing.acm.org/p558-lamport.pdf
      http://steveloughran.blogspot.com/2015/09/time-on-multi-core-multi-socket-servers.html
    • Subtle ways to lose data, section "Timestamp semantics and granularity":
      https://www.nayuki.io/page/subtle-ways-to-lose-data
    • Java:
      http://stas-blogspot.blogspot.com/2012/02/what-is-behind-systemnanotime.html
      https://web.archive.org/web/20160308031939/https://blogs.oracle.com/dholmes/entry/inside_the_hotspot_vm_clocks
      https://shipilev.net/blog/2014/nanotrusting-nanotime/
      https://blog.packagecloud.io/eng/2017/03/14/using-strace-to-understand-java-performance-improvement/
      https://pzemtsov.github.io/2017/07/23/the-slow-currenttimemillis.html
    • TIMERMETER: Quantifying Properties of Software Timers for System Analysis (2009), Uni Karlsruhe:
      https://sdqweb.ipd.kit.edu/publications/pdfs/kuperberg2009c.pdf
      source code in https://github.com/akara/faban
      https://github.com/akara/faban/tree/master/driver/src/com/sun/faban/driver/util/timermeter
    • Metric-based Selection of Timer Methods for Accurate Measurements:
      https://sdqweb.ipd.kit.edu/publications/pdfs/kuperberg2011
    • Linux (2006): https://bugs.java.com/bugdatabase/view_bug.do?bug_id=6458294
      Red Hat on Linux clocks: https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_MRG/2/html/Realtime_Reference_Guide/chap-Timestamping.html
    • timestamp accuracy on EXT4 (sub millsecond) (2013):
      https://stackoverflow.com/questions/14392975/timestamp-accuracy-on-ext4-sub-millsecond
    • mtime comparison considered harmful (2018): https://apenwarr.ca/log/20181113
      also see the discussion of this blog post on ycombinator: https://news.ycombinator.com/item?id=18473744
    • Windows: https://bugs.java.com/bugdatabase/view_bug.do?bug_id=6440250
