Slide 1

Git Internals
A Tour of Some Git Plumbing
@georgevreilly, 2019/08/13

Slide 2

XKCD 1597

Slide 3

Agenda
● Porcelain vs Plumbing
● Git Object Model
  ○ blob, tree, and commit objects
  ○ Refs, tags, and branches
● Packs
● Merges

Slide 4

Git Porcelain and Plumbing
● Porcelain: the Git commands that you normally use
  ○ add, merge, commit, log, grep, status, …
  ○ Relatively† user-friendly
● Plumbing: the low-level commands that provide the actual implementation
  ○ Generally, you don’t need to know these

† C’mon, it’s Git.

Slide 5

Git Object Model: blobs, trees, commits, refs

Slide 6

Some Git Plumbing
● A Plumber's Guide to Git workshop
● Blob, tree, and commit objects form a content-addressable file system of immutable objects
● Refs, tags, and branches provide human-friendly names for objects
● All stored under .git/ at the top of the working tree

Slide 7

Blob object
● A blob stores the contents of a single file
  ○ But not its filename or permissions
● Named by the SHA-1 hash of a short header plus the contents
  ○ .git/objects/01/23456789012345678901234567890123456789
● Change the contents of a file => get a new blob (different SHA)
● On disk, a blob is the header and payload, compressed together with zlib
● An inflate script can show the raw content of a Git object:
      #!/usr/bin/env ruby
      # Reads a loose object file on stdin and prints it decompressed.
      require "zlib"
      puts Zlib::Inflate.inflate(STDIN.read)
● git hash-object -w path
  ○ Hashes a file and writes the resulting blob into the object store
● git cat-file -p object_id
  ○ Pretty-prints the contents of an object
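The naming rule above can be checked by hand. The following is a minimal sketch, assuming the usual loose-object layout of "blob <size>", a NUL byte, then the file contents; the helper names (blob_store, blob_id, loose_path) are made up for this illustration and are not Git commands.

```ruby
require "digest"
require "zlib"

# Build the bytes Git hashes and stores for a blob: "blob <size>\0<content>".
def blob_store(content)
  data = content.b
  "blob #{data.bytesize}\0".b + data
end

# Object ID = SHA-1 of the header plus content, as 40 hex digits.
def blob_id(content)
  Digest::SHA1.hexdigest(blob_store(content))
end

# Path of the loose object relative to .git/: first two hex digits are the directory.
def loose_path(oid)
  File.join("objects", oid[0, 2], oid[2..-1])
end

id = blob_id("hello world\n")
puts id                 # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
puts loose_path(id)

# Loose objects are zlib-deflated on disk; inflating recovers header + content.
deflated = Zlib::Deflate.deflate(blob_store("hello world\n"))
puts Zlib::Inflate.inflate(deflated).inspect
```

The printed ID should match what `echo "hello world" | git hash-object --stdin` reports, which is a quick way to confirm that the header is part of what gets hashed.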

Slide 8

Tree object
● A tree contains sorted pointers to blobs and trees
● Each pointer has:
  ○ File Mode: 100644 (normal), 100755 (executable), 120000 (symlink), 040000 (directory)
  ○ Type: blob or tree
  ○ SHA-1 hash
  ○ Filename
● A tree is a snapshot of a directory
● A tree object is also named by the SHA-1 hash of its contents
● Since trees can contain other trees, a tree can describe an entire directory hierarchy
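To make the entry layout concrete, here is a rough sketch of how a tree's bytes could be assembled and hashed. It assumes the stored form of each entry is the mode, a space, the filename, a NUL byte, and the 20 raw bytes of the hash, with directory modes written without the leading zero ("40000"). Entry, tree_body, and tree_id are names invented for this example.

```ruby
require "digest"

# One tree entry: mode, name, and the 40-hex-digit ID of a blob or subtree.
Entry = Struct.new(:mode, :name, :oid)

# Git sorts entries by name, comparing a directory as if its name ended in "/".
def sort_key(entry)
  entry.mode == "40000" ? "#{entry.name}/" : entry.name
end

# Serialize as "<mode> <name>\0<20 raw hash bytes>" per entry, in sorted order.
def tree_body(entries)
  entries.sort_by { |e| sort_key(e).b }.map { |e|
    "#{e.mode} #{e.name}\0".b + [e.oid].pack("H*")
  }.join.b
end

# Like a blob, a tree is named by the SHA-1 of "tree <size>\0" plus its body.
def tree_id(entries)
  body = tree_body(entries)
  Digest::SHA1.hexdigest("tree #{body.bytesize}\0".b + body)
end

# An empty file named README; the ID is that of the empty blob.
readme = Entry.new("100644", "README", "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391")
puts tree_id([])          # 4b825dc642cb6eb9a060e54bf8d69288fbee4904
puts tree_id([readme])
```

Because the entries are sorted before hashing, the same set of files always yields the same tree ID regardless of the order in which they were added.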

Slide 9

Merkle Tree
● “A Merkle Tree is a tree in which every leaf node is labelled with the hash of a data block, and every non-leaf node is labelled with the cryptographic hash of the labels of its child nodes” (Wikipedia)
● (Yes, Merkle trees are also used in Blockchain.)
● Files with different contents have different hashes. Easy to tell if a file changed between two commits.
● Two files with the same content have the same hash, regardless of filenames and/or permissions.
● Two directories (trees) with the same hash have the same children.
● Trees, like blobs, are immutable.
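The properties listed above can be seen in a toy Merkle tree. This sketch deliberately uses a simplified hashing scheme rather than Git's exact object format: a nested Hash stands in for a directory, a String for file contents, and merkle_id is a hypothetical helper.

```ruby
require "digest"

# Hash a nested Hash bottom-up: Strings are file contents (leaves),
# Hashes are directories (interior nodes). Simplified; not Git's exact format.
def merkle_id(node)
  case node
  when String
    Digest::SHA1.hexdigest("blob\0" + node)
  when Hash
    listing = node.sort_by { |name, _| name }
                  .map { |name, child| "#{name}\0#{merkle_id(child)}\n" }
                  .join
    Digest::SHA1.hexdigest("tree\0" + listing)
  end
end

before = { "README" => "hi\n", "src" => { "a.rb" => "puts 1\n" }, "docs" => { "x" => "y" } }
after  = { "README" => "hi\n", "src" => { "a.rb" => "puts 2\n" }, "docs" => { "x" => "y" } }

puts merkle_id(before) == merkle_id(after)                   # false: the root changed
puts merkle_id(before["docs"]) == merkle_id(after["docs"])   # true: untouched subtree
puts merkle_id(before["src"]) == merkle_id(after["src"])     # false: contains the edit
```

Comparing two snapshots only requires descending into subtrees whose hashes differ; an equal hash means the whole subtree can be skipped.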

Slide 10

Commit object
● A commit gives context to a tree
  ○ Top-level tree
  ○ Parent commit(s)
  ○ Author/Committer (includes date)
  ○ Commit message
● Yet another hashed object
● A commit points to a complete snapshot of the entire tree, not a delta
● If a subtree from a previous commit is unchanged, then the tree object for that subtree is reused
● git commit --amend creates a new commit, plus new tree(s) and blob(s) for any content that changed
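As an illustration of the fields listed above, the sketch below assembles the text of a commit object and hashes it. It assumes the plain-text layout that `git cat-file -p` shows for a commit (tree, parent, author, and committer lines, a blank line, then the message). The identity, timestamp, and helper names are placeholders for the example.

```ruby
require "digest"

# Assemble the text of a commit object from its parts.
def commit_body(tree:, parents:, author:, committer:, message:)
  lines = ["tree #{tree}"]
  parents.each { |parent| lines << "parent #{parent}" }
  lines << "author #{author}"
  lines << "committer #{committer}"
  lines.join("\n") + "\n\n" + message
end

# A commit is named the same way as blobs and trees: SHA-1 of header + body.
def commit_id(body)
  Digest::SHA1.hexdigest("commit #{body.bytesize}\0" + body)
end

# Placeholder identity and timestamp, not taken from any real repository.
who = "A U Thor <author@example.com> 1565654400 -0700"
empty_tree = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"

root = commit_body(tree: empty_tree, parents: [], author: who,
                   committer: who, message: "Initial commit\n")
child = commit_body(tree: empty_tree, parents: [commit_id(root)], author: who,
                    committer: who, message: "Second commit\n")

puts root
puts commit_id(root)
puts child
```

Since the parent's ID is part of the child's text, changing anything in an earlier commit changes the ID of every commit after it.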

Slide 11

Refs, Tags, and Branches
● Refs, tags, and branches are named pointers to commits
  ○ Far more meaningful for humans than 40-hexdigit hashes
● A reference is a file in .git/refs containing a hash (or an entry in .git/packed-refs)
● Local branches are refs in .git/refs/heads
● Remote branches are refs in .git/refs/remotes
● HEAD is a symbolic ref in .git/HEAD
● A tag refers to a specific commit; used for versioned releases
● The reflog records every change to HEAD and branch tips in the local repo
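Following a ref by hand is mostly a matter of reading small files. The sketch below assumes loose ref files only (it does not consult .git/packed-refs) and builds a throwaway directory that mimics the layout of .git/ so that it runs anywhere. resolve_ref is a hypothetical helper, and the hash is a placeholder.

```ruby
require "tmpdir"
require "fileutils"

# Follow a ref to the hash it names. Handles symbolic refs ("ref: <name>")
# and loose ref files only; packed refs are not handled in this sketch.
def resolve_ref(git_dir, name, depth = 0)
  raise ArgumentError, "too many levels of symbolic refs" if depth > 5
  path = File.join(git_dir, name)
  return nil unless File.file?(path)
  value = File.read(path).strip
  if value.start_with?("ref: ")
    resolve_ref(git_dir, value.sub("ref: ", ""), depth + 1)
  else
    value
  end
end

Dir.mktmpdir do |git_dir|
  # Mimic .git/HEAD pointing at a branch, and the branch pointing at a commit.
  FileUtils.mkdir_p(File.join(git_dir, "refs", "heads"))
  File.write(File.join(git_dir, "HEAD"), "ref: refs/heads/main\n")
  File.write(File.join(git_dir, "refs", "heads", "main"),
             "0123456789abcdef0123456789abcdef01234567\n")  # placeholder hash
  puts resolve_ref(git_dir, "HEAD")
  puts resolve_ref(git_dir, "refs/heads/main")
end
```

In a real repository, `git rev-parse HEAD` performs this lookup and also handles packed refs and other cases this sketch ignores.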

Slide 12

Packs
● We’ve seen “loose” objects in the Plumber’s Guide workshop
● A packfile bundles many objects into a single compressed file; similar objects (such as successive versions of a file) are stored as deltas against one another
● Git’s “smart” protocol exchanges packs between client and server when pulling and pushing
● Packs save space on disk and on the wire
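A small first step in looking inside a pack is reading its header. This sketch assumes the documented 12-byte layout: the signature "PACK", then a version number and an object count, each a 32-bit big-endian integer. It parses a header built in memory rather than a real packfile, and parse_pack_header is a name made up for the example.

```ruby
# Parse the 12-byte header at the start of a packfile:
# 4-byte signature "PACK", then version and object count as
# 32-bit big-endian integers.
def parse_pack_header(bytes)
  raise ArgumentError, "pack header needs 12 bytes" if bytes.bytesize < 12
  signature, version, count = bytes.byteslice(0, 12).unpack("a4NN")
  raise ArgumentError, "not a packfile" unless signature == "PACK"
  { version: version, objects: count }
end

# A fabricated header claiming pack version 2 with 3 objects.
sample = ["PACK", 2, 3].pack("a4NN")
p parse_pack_header(sample)
```

To try it on a real pack, the first 12 bytes of a file under .git/objects/pack/ can be read with `File.binread(path, 12)`. Decoding the objects and deltas that follow the header is considerably more involved and is not attempted here.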

Slide 13

Merges
● A merge is a commit with N≥2 parents, retaining merge history
● See “Basic Branching and Merging” in Pro Git
● Rebasing keeps a linear history
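The parent count is what distinguishes a merge from an ordinary commit. The sketch below classifies a commit from its text, assuming the input looks like the output of `git cat-file -p` on a commit. The hashes and identity in the sample are placeholders, and parent_ids and commit_kind are hypothetical helpers.

```ruby
# Collect the parent IDs from the header of a commit's text
# (everything before the first blank line).
def parent_ids(commit_text)
  header = commit_text.split("\n\n", 2).first.to_s
  header.lines
        .map(&:split)
        .select { |words| words.first == "parent" }
        .map { |words| words[1] }
end

# Zero parents: root commit. One: ordinary commit. Two or more: merge.
def commit_kind(commit_text)
  case parent_ids(commit_text).size
  when 0 then :root
  when 1 then :ordinary
  else :merge
  end
end

# Placeholder commit text; none of these hashes refer to real objects.
sample = <<~COMMIT
  tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904
  parent 1111111111111111111111111111111111111111
  parent 2222222222222222222222222222222222222222
  author A U Thor <author@example.com> 1565654400 -0700
  committer A U Thor <author@example.com> 1565654400 -0700

  Merge branch 'feature'
COMMIT

p parent_ids(sample)
p commit_kind(sample)
```

A rebased branch, by contrast, produces only commits with a single parent each, which is why its history reads as a straight line.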

Slide 14

References
● Pro Git by Scott Chacon
  ○ Chapter 10: “Git Internals”
  ○ Chapter 3: “Git Branching”
● “A Plumber’s Guide to Git”, Alex Chan
● Building Git, James Coglan
  ○ Recreates much of Git from scratch in Ruby
● “More Productive Git”, James Turnbull, Increment #9: Open Source
● “Comparing Git Trees in Go”, source{d} blog
● “Unpacking Git packfiles”, Aditya Mukerjee, Codewords #3