that you normally use
  ◦ add, merge, commit, log, grep, status, …
  ◦ Relatively† user-friendly
• Plumbing: the low-level commands that provide the actual implementation
  ◦ Generally, you don’t need to know these
† C’mon, it’s Git.
workshop
• Blob, tree, and commit objects form a content-addressable file system of immutable objects
• Refs, tags, and branches provide human-friendly names for objects
• All stored under .git/ at the top of the working tree
a single file
  ◦ But not filename or permissions
• Named by the SHA-1 hash of the contents
  ◦ .git/objects/01/23456789012345678901234567890123456789
• Change the contents of a file => get a new blob (different SHA)
• A stored blob is a header plus the payload, compressed together with zlib
• inflate script can show raw content of a Git object

    #!/usr/bin/env ruby
    # Inflate a zlib-compressed Git object read from standard input.
    require "zlib"
    STDIN.binmode
    puts Zlib::Inflate.inflate(STDIN.read)

• git hash-object -w path
• git cat-file -p object_id
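The naming rule above can be sketched in a few lines of Ruby. This is a minimal illustration, not Git's own code: the helper names `blob_store`, `blob_id`, and `loose_path` are made up for this example. It assumes the usual loose-object layout, where the ID is the SHA-1 of `blob <size>\0` plus the content, and the first two hex digits of the ID name the directory.

```ruby
require "digest/sha1"
require "zlib"

# Bytes that get hashed for a blob: "blob <size>\0" followed by the content.
# (Helper names here are hypothetical, chosen for this sketch.)
def blob_store(content)
  data = content.b
  "blob #{data.bytesize}\0".b + data
end

# The object ID is the SHA-1 of the uncompressed header + content.
def blob_id(content)
  Digest::SHA1.hexdigest(blob_store(content))
end

# Path of a loose object relative to .git/: the first two hex digits
# become the directory, the remaining 38 the filename.
def loose_path(oid)
  File.join("objects", oid[0, 2], oid[2..-1])
end

id = blob_id("test content\n")
puts id              # d670460b4b4aece5915caf5c68d12f560a9fe3e4
puts loose_path(id)  # objects/d6/70460b4b4aece5915caf5c68d12f560a9fe3e4

# What is written to disk is the zlib-deflated store; inflating restores it.
compressed = Zlib::Deflate.deflate(blob_store("test content\n"))
puts Zlib::Inflate.inflate(compressed) == blob_store("test content\n")
```

Because the filename never enters the hash, two files with identical contents map to the same blob.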
blobs and trees
• Each pointer has:
  ◦ File mode: 100644 (normal), 100755 (executable), 120000 (symlink)
  ◦ Type: blob or tree
  ◦ SHA-1 hash
  ◦ Filename
• A tree is a snapshot of a directory
• A tree object is also named by the SHA-1 hash of its contents
• Since trees can contain other trees, a single tree can describe an entire directory tree
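As a rough sketch of how a tree gets its name, the Ruby below serializes entries and hashes them. It assumes the on-disk entry layout `<mode> <name>\0` followed by the 20 raw bytes of the SHA-1, with subtrees stored under mode `40000`. The helpers `tree_entry` and `tree_id` are hypothetical names, and sorting by plain name is a simplification of Git's real ordering rule for directories.

```ruby
require "digest/sha1"

# One tree entry: "<mode> <name>\0" followed by the 20 raw bytes of the SHA-1.
def tree_entry(mode, name, hex_id)
  "#{mode} #{name}\0".b + [hex_id].pack("H*")
end

# Serialize the entries and hash them behind a "tree <size>\0" header.
# Simplification: entries are sorted by plain name only.
def tree_id(entries)
  body = entries.sort_by { |_mode, name, _id| name }
                .map { |mode, name, id| tree_entry(mode, name, id) }
                .join
  Digest::SHA1.hexdigest("tree #{body.bytesize}\0".b + body)
end

empty_blob = "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391" # ID of an empty blob

# A subdirectory is just another tree, referenced by its own ID.
lib  = tree_id([["100644", "a.txt", empty_blob]])
root = tree_id([["100644", "README", empty_blob], ["40000", "lib", lib]])

puts tree_id([])  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904 (the empty tree)
puts root
```

Renaming `README` changes the tree's ID but not the blob's, since the filename lives in the tree entry.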
in which every leaf node is labelled with the hash of a data block, and every non-leaf node is labelled with the cryptographic hash of the labels of its child nodes” (Wikipedia)
• (Yes, Merkle trees are also used in blockchains.)
• Files with different contents have different hashes, so it is easy to tell whether a file changed between two commits.
• Two files with the same content have the same hash, regardless of filenames and/or permissions.
• Two directories (trees) with the same hash have the same children.
• Trees, like blobs, are immutable.
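The practical payoff of the Merkle property is that a comparison can skip any subtree whose hash matches. The Ruby sketch below illustrates this on a toy model, where a String stands for a file and a Hash for a directory. The labels it hashes are simplified and are not Git's real object encoding, `node_hash` and `changed_paths` are made-up names, and hashes are recomputed on every call rather than stored as Git does.

```ruby
require "digest/sha1"

# Label a node: a String is a file, a Hash is a directory of name => node.
# Simplified labels, not Git's real object format.
def node_hash(node)
  if node.is_a?(Hash)
    labels = node.sort_by { |name, _child| name }
                 .map { |name, child| "#{name}:#{node_hash(child)}" }
    Digest::SHA1.hexdigest("tree\n" + labels.join("\n"))
  else
    Digest::SHA1.hexdigest("blob\n" + node)
  end
end

# List the paths that differ, skipping every subtree whose hashes match.
def changed_paths(old, new, prefix = "")
  return [] if node_hash(old) == node_hash(new)
  return [prefix] unless old.is_a?(Hash) && new.is_a?(Hash)
  (old.keys | new.keys).sort.flat_map do |name|
    path = prefix.empty? ? name : "#{prefix}/#{name}"
    if old.key?(name) && new.key?(name)
      changed_paths(old[name], new[name], path)
    else
      [path] # added or removed
    end
  end
end

v1 = { "README" => "hello\n",
       "lib"    => { "a.rb" => "puts 1\n", "b.rb" => "puts 2\n" },
       "docs"   => { "guide.md" => "intro\n" } }
v2 = { "README" => "hello\n",
       "lib"    => { "a.rb" => "puts 1\n", "b.rb" => "puts 3\n" },
       "docs"   => { "guide.md" => "intro\n" } }

p changed_paths(v1, v2)  # => ["lib/b.rb"]
```

Here `docs` is never descended into, because its hash is the same in both snapshots.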
tree
  ◦ Top-level tree
  ◦ Parent commit(s)
  ◦ Author/Committer (includes date)
  ◦ Commit message
• Yet another hashed object
• A commit contains a complete snapshot of the entire tree, not a delta
• If a subtree from a previous commit is unchanged, then the tree object for that subtree is reused
• git commit --amend never edits a commit in place: it creates a new commit, plus new tree(s) and blob(s) for any content that changed
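To make the fields above concrete, here is a hedged Ruby sketch that assembles the text of a commit object and hashes it. The author string, timestamp, and messages are invented for illustration, and `commit_body` and `commit_id` are hypothetical helper names. The layout assumed is the usual one: a `tree` line, zero or more `parent` lines, `author` and `committer` lines, a blank line, then the message.

```ruby
require "digest/sha1"

# Assemble the text of a commit object from its fields.
def commit_body(tree:, parents:, author:, committer:, message:)
  lines = ["tree #{tree}"]
  parents.each { |parent| lines << "parent #{parent}" }
  lines << "author #{author}"
  lines << "committer #{committer}"
  lines.join("\n") + "\n\n" + message
end

# Like blobs and trees, a commit is named by the SHA-1 of header + body.
def commit_id(body)
  Digest::SHA1.hexdigest("commit #{body.bytesize}\0".b + body.b)
end

# Illustrative values only: an invented identity and the empty tree's ID.
who  = "A U Thor <author@example.com> 1700000000 +0000"
tree = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"

first   = commit_body(tree: tree, parents: [], author: who,
                      committer: who, message: "Initial commit\n")
amended = commit_body(tree: tree, parents: [], author: who,
                      committer: who, message: "Initial commit (reworded)\n")

puts commit_id(first)
puts commit_id(amended) # a different ID, although the tree is identical
```

Rewording the message alone yields a different commit ID, which is why an amended commit is a new object even when it points at the same tree.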
are named pointers to commits
  ◦ Far more meaningful for humans than 40-hexdigit hashes
• A reference is a file in .git/refs containing a hash
• Local branches are refs in .git/refs/heads
• Remote branches are refs in .git/refs/remotes
• HEAD is a symbolic ref in .git/HEAD
• A tag refers to a specific commit; used for versioned releases
• The reflog records every change to HEAD and to branch tips in the local repo
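Resolving a ref is mostly a matter of reading small files. The Ruby sketch below follows `ref: ...` indirections from HEAD until it reaches a hash. It builds a throwaway directory that mimics the layout described above, so it does not need a real repository. `resolve_ref` is a hypothetical helper, the branch name `main` and the hash are placeholders, and the sketch only reads loose ref files, ignoring packed refs.

```ruby
require "tmpdir"
require "fileutils"

# Follow "ref: ..." indirections until a hash is found.
# Simplification: only loose ref files are read; packed refs are ignored.
def resolve_ref(git_dir, name = "HEAD", depth = 0)
  raise "too many symbolic refs" if depth > 5
  path = File.join(git_dir, name)
  return nil unless File.file?(path)
  value = File.read(path).strip
  if value.start_with?("ref: ")
    resolve_ref(git_dir, value.sub("ref: ", ""), depth + 1)
  else
    value
  end
end

# Build a fake .git layout in a temporary directory, then resolve HEAD.
Dir.mktmpdir do |dir|
  git_dir = File.join(dir, ".git")
  FileUtils.mkdir_p(File.join(git_dir, "refs", "heads"))
  File.write(File.join(git_dir, "HEAD"), "ref: refs/heads/main\n")
  File.write(File.join(git_dir, "refs", "heads", "main"),
             "0123456789012345678901234567890123456789\n")
  puts resolve_ref(git_dir) # 0123456789012345678901234567890123456789
end
```

When HEAD holds a hash directly instead of `ref: ...` (a detached HEAD), the same function returns that hash without following anything.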
Guide workshop
• A packfile stores many objects in a single file; similar objects, such as successive versions of a file, are stored as a base plus deltas.
• Git’s “smart” protocol exchanges packs between client and server when pulling and pushing.
• Packs save space on disk and on the wire.
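As a small, hedged illustration of what a pack looks like from the outside, the Ruby below reads only the 12-byte header that starts a packfile: the signature `PACK`, a version number, and the count of objects that follow, the last two as big-endian 32-bit integers. It does not decode objects or deltas. `parse_pack_header` is a hypothetical helper, and the sample bytes are constructed in the script rather than taken from a real pack.

```ruby
# Parse the 12-byte header at the start of a packfile.
# Decoding the objects and deltas that follow is out of scope for this sketch.
def parse_pack_header(bytes)
  raise ArgumentError, "header too short" if bytes.bytesize < 12
  signature, version, count = bytes.byteslice(0, 12).unpack("a4NN")
  raise ArgumentError, "not a packfile" unless signature == "PACK"
  { version: version, object_count: count }
end

# Constructed sample: a version-2 header announcing three objects.
sample = ["PACK", 2, 3].pack("a4NN")
header = parse_pack_header(sample)
puts "version #{header[:version]}, #{header[:object_count]} objects"
```

On a real repository the same function could be fed the first 12 bytes of a file under .git/objects/pack/, read in binary mode.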
10: “Git Internals”
  ◦ Chapter 3: “Git Branching”
• “A Plumber’s Guide to Git”, Alex Chan
• Building Git, James Coglan
  ◦ Recreates much of Git from scratch in Ruby
• “More Productive Git”, James Turnbull, Increment #9: Open Source
• “Comparing Git Trees in Go”, source{d} blog
• “Unpacking Git packfiles”, Aditya Mukerjee, Codewords #3