Slide 77
git at Facebook
From: Joshua Redstone fb.com>
Subject: Git performance results on a large repository
Date: 2012-02-03 14:20:06 GMT
Hi Git folks,
We (Facebook) have been investigating source control systems to meet our
growing needs. We already use git fairly widely, but have noticed it
getting slower as we grow, and we want to make sure we have a good story
going forward. We're debating how to proceed and would like to solicit
people's thoughts.
To better understand git scalability, I've built up a large, synthetic
repository and measured a few git operations on it. I summarize the
results here.
The test repo has 4 million commits, linear history, and about 1.3 million
files. The .git directory is about 15 GB and has been repacked with
'git repack -a -d -f --max-pack-size=10g --depth=100 --window=250'. This
repack took about 2 days on a beefy machine (i.e., lots of RAM and flash).
The size of the index file is 191 MB. I can share the script that
generated the repo if people are interested; it basically picks 2-5 files,
modifies a line or two, adds a few lines of random dictionary words at the
end, occasionally creates a new file, commits all the modifications, and
repeats.
I timed a few common operations with both a warm OS file cache and a cold
cache. For the cold case, I did an 'echo 3 | tee /proc/sys/vm/drop_caches'
and then ran the operation in question a few times (the first timing is
the cold timing; the next few are the warm timings). The following results
are on a server with an average hard drive (i.e., not flash) and more than
10 GB of RAM.
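As a rough illustration of that cold/warm methodology (not Redstone's
actual harness), a small Python loop can drop the caches, run the command
once for the cold number, and repeat for the warm numbers. The measured
command here, 'git status', is an assumed example, and dropping caches
requires root:

  # Sketch of a cold/warm timing loop; run as root for drop_caches.
  import subprocess
  import time

  def drop_caches():
      # Flush dirty pages, then drop the page cache, dentries, and inodes.
      subprocess.run("sync; echo 3 > /proc/sys/vm/drop_caches",
                     shell=True, check=True)

  def time_once(cmd):
      start = time.monotonic()
      subprocess.run(cmd, check=True, capture_output=True)
      return time.monotonic() - start

  def cold_and_warm(cmd, warm_runs=3):
      drop_caches()
      cold = time_once(cmd)  # first run hits the disk
      warm = [time_once(cmd) for _ in range(warm_runs)]  # cached runs
      return cold, warm

  cold, warm = cold_and_warm(["git", "status"])
  print("cold: %.1fs, warm: %s" % (cold, ["%.1fs" % t for t in warm]))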
http://thread.gmane.org/gmane.comp.version-control.git/189776