
Git Internals

A Tour of some Git Plumbing

George V. Reilly

August 13, 2019

Transcript

Agenda
• Porcelain vs Plumbing
• Git Object Model
  ◦ blob, tree, and commit objects
  ◦ Refs, tags, and branches
• Packs
• Merges

Git Porcelain and Plumbing
• Porcelain: the Git commands that you normally use
  ◦ add, merge, commit, log, grep, status, …
  ◦ Relatively† user-friendly
• Plumbing: the low-level commands that provide the actual implementation
  ◦ Generally, you don’t need to know these
† C’mon, it’s Git.

Some Git Plumbing
• A Plumber's Guide to Git workshop
• Blob, tree, and commit objects form a content-addressable file system of immutable objects
• Refs, tags, and branches provide human-friendly names for objects
• All stored under .git/ at the top of the working tree
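
Blob object
• A blob stores the contents of a single file
  ◦ But not the filename or permissions
• Named by the SHA-1 hash of a short header ("blob <size>" plus a NUL byte) followed by the contents
  ◦ .git/objects/01/23456789012345678901234567890123456789 (the first two hex digits name the directory, the other 38 the file)
• Change the contents of a file => get a new blob (different SHA)
• On disk, a loose object is the header plus the contents, zlib-compressed together (zlib, not gzip)
• An inflate script can show the raw content of a Git object:

    #!/usr/bin/env ruby
    # inflate: decompress a loose object, e.g. inflate < .git/objects/01/23456789012345678901234567890123456789
    require "zlib"
    STDIN.binmode
    print Zlib::Inflate.inflate(STDIN.read)

• git hash-object -w path (store a file as a blob and print its hash)
• git cat-file -p object_id (pretty-print an object)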
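
To see the naming rule without shelling out to git, here is a minimal Ruby sketch in the same spirit as the inflate script. It assumes a SHA-1 repository (Git's default); `blob_header`, `blob_id`, `loose_object_bytes`, and `loose_object_path` are hypothetical helpers, not Git or library APIs. For "test content\n" it should produce the same ID that `echo 'test content' | git hash-object --stdin` prints in Pro Git.

```ruby
require "digest/sha1"
require "zlib"

# Hypothetical helper: the header Git prepends before hashing or storing a blob.
def blob_header(content)
  "blob #{content.bytesize}\0".b
end

# Hypothetical helper: a blob's object ID is the SHA-1 of header + contents.
def blob_id(content)
  Digest::SHA1.hexdigest(blob_header(content) + content.b)
end

# Hypothetical helper: the bytes of the loose object file. Header and contents
# are zlib-compressed together (zlib format, not gzip).
def loose_object_bytes(content)
  Zlib::Deflate.deflate(blob_header(content) + content.b)
end

# Hypothetical helper: two hex digits name the directory, the other 38 the file.
def loose_object_path(oid)
  File.join(".git", "objects", oid[0, 2], oid[2..-1])
end

content = "test content\n"
oid = blob_id(content)
puts oid                     # d670460b4b4aece5915caf5c68d12f560a9fe3e4
puts loose_object_path(oid)  # .git/objects/d6/70460b4b4aece5915caf5c68d12f560a9fe3e4

# Inflating the stored bytes gives back the header, a NUL byte, then the contents.
header, body = Zlib::Inflate.inflate(loose_object_bytes(content)).split("\0", 2)
puts header                  # blob 13
puts body == content         # true
```

Because the filename never enters the hash, two files with identical contents share one blob.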

Tree object
• A tree contains sorted pointers to blobs and (sub)trees
• Each pointer, as listed by git cat-file -p, has:
  ◦ File mode: 100644 (normal), 100755 (executable), 120000 (symlink), 040000 (subdirectory)
  ◦ Type: blob or tree (implied by the mode; not stored separately)
  ◦ SHA-1 hash
  ◦ Filename
• A tree is a snapshot of a directory
• A tree object is also named by the SHA-1 hash of its contents
• Since trees can contain other trees, one tree can describe an entire directory hierarchy
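
A tree can be serialized and hashed the same way as a blob. This sketch assumes Git's raw tree layout: for each entry, the mode, a space, the filename, a NUL byte, then the 20-byte binary SHA-1. `TreeEntry`, `tree_content`, and `tree_id` are hypothetical names; the empty-tree ID it prints is Git's well-known constant.

```ruby
require "digest/sha1"

# Hypothetical entry type: mode as Git writes it, filename, hex object ID.
TreeEntry = Struct.new(:mode, :name, :oid)

# Serialize entries in Git's raw tree layout: "<mode> <name>", a NUL byte,
# then the 20-byte binary SHA-1. Subdirectories use mode "40000" in the raw
# object (git cat-file -p pads it to 040000 for display).
def tree_content(entries)
  # Git sorts by name, comparing a subdirectory as if its name ended in "/".
  sorted = entries.sort_by { |e| e.mode == "40000" ? "#{e.name}/" : e.name }
  sorted.map { |e| "#{e.mode} #{e.name}\0".b + [e.oid].pack("H40") }.join.b
end

# A tree's ID is the SHA-1 of "tree <size>", a NUL byte, and the content.
def tree_id(entries)
  content = tree_content(entries)
  Digest::SHA1.hexdigest("tree #{content.bytesize}\0".b + content)
end

blob = "d670460b4b4aece5915caf5c68d12f560a9fe3e4"  # blob ID of "test content\n"
readme = TreeEntry.new("100644", "README", blob)
script = TreeEntry.new("100755", "run.sh", blob)

puts tree_id([])  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904 (Git's empty tree)
puts tree_id([readme, script]) == tree_id([script, readme])  # true: sorted first
puts tree_id([readme]) == tree_id([TreeEntry.new("100755", "README", blob)])  # false: mode matters
```

Sorting before hashing is what makes a tree's ID depend only on what the directory contains, not on the order its entries were added.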

Merkle Tree
• “A Merkle Tree is a tree in which every leaf node is labelled with the hash of a data block, and every non-leaf node is labelled with the cryptographic hash of the labels of its child nodes” (Wikipedia)
• (Yes, Merkle trees are also used in blockchains.)
• Files with different contents have different hashes, so it’s easy to tell whether a file changed between two commits
• Two files with the same content have the same hash, regardless of filenames and/or permissions
• Two directories (trees) with the same hash have the same children
• Trees, like blobs, are immutable
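
A deliberately simplified Ruby sketch makes the Merkle property concrete. It is not Git's exact object format; it only applies the quoted labelling rule, including child names in a directory's label much as Git trees include filenames. `merkle_label` and the sample directory contents are hypothetical.

```ruby
require "digest/sha1"

# Simplified Merkle labelling over a nested Hash: String values are files
# (leaves), Hash values are directories (non-leaf nodes).
def merkle_label(node)
  if node.is_a?(String)
    Digest::SHA1.hexdigest("leaf:" + node)  # leaf: hash of the data block
  else
    # Non-leaf: hash of the (sorted) child names and their labels.
    children = node.keys.sort.map { |name| "#{name}=#{merkle_label(node[name])}" }
    Digest::SHA1.hexdigest("node:" + children.join(","))
  end
end

v1 = { "docs" => { "intro.md" => "Hello" }, "src" => { "main.rb" => "puts 1" } }
v2 = { "docs" => { "intro.md" => "Hello" }, "src" => { "main.rb" => "puts 2" } }

puts merkle_label(v1) == merkle_label(v2)                  # false: one file changed
puts merkle_label(v1["docs"]) == merkle_label(v2["docs"])  # true: unchanged subtree keeps its label
puts merkle_label(v1["src"]) == merkle_label(v2["src"])    # false: the change propagates upward
```

Comparing two root labels answers "did anything change?" in one step, and descending only into children whose labels differ finds what changed.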

Commit object
• A commit gives context to a tree:
  ◦ Top-level tree
  ◦ Parent commit(s)
  ◦ Author/Committer (each includes a date)
  ◦ Commit message
• Yet another hashed object
• A commit points to a complete snapshot of the entire tree, not a delta
• If a subtree is unchanged from a previous commit, its tree object is reused
• git commit --amend creates a new commit object; new trees and blobs are created only for content that changed, and the original commit is left untouched (objects are immutable)
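
A commit object is plain text: header lines, a blank line, then the message. The sketch below assumes that layout and Git's "Name <email> <unix-seconds> <timezone>" identity format; `commit_content` and `commit_id` are hypothetical helpers, and the author, email, and timestamp are made up. It shows that rewording a message (as --amend does) yields a new commit ID while the tree it points at stays the same.

```ruby
require "digest/sha1"

# Hypothetical helper: lay out a commit's text. One "parent" line per parent,
# so a merge commit simply has two or more.
def commit_content(tree:, parents:, author:, committer:, message:)
  lines = ["tree #{tree}"]
  parents.each { |p| lines << "parent #{p}" }
  lines << "author #{author}" << "committer #{committer}"
  lines.join("\n") + "\n\n" + message
end

# Hypothetical helper: hash "commit <size>", a NUL byte, then the text.
def commit_id(**fields)
  content = commit_content(**fields).b
  Digest::SHA1.hexdigest("commit #{content.bytesize}\0".b + content)
end

empty_tree = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # Git's empty tree
who = "Ada Example <ada@example.com> 1565654400 -0700"   # hypothetical identity
original = { tree: empty_tree, parents: [], author: who, committer: who,
             message: "Initial commit\n" }
amended = original.merge(message: "Initial commit (reworded)\n")

puts commit_content(**original)
puts commit_id(**original) == commit_id(**amended)  # false: a brand-new commit object
puts original[:tree] == amended[:tree]              # true: the same tree is reused
```

Since the parent IDs are part of the hashed text, changing any ancestor changes every descendant's ID too.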

Refs, Tags, and Branches
• Refs, tags, and branches are named pointers to commits
  ◦ Far more meaningful for humans than 40-hexdigit hashes
• A reference is a file in .git/refs containing a hash (refs may also be consolidated into .git/packed-refs)
• Local branches are refs in .git/refs/heads
• Remote-tracking branches are refs in .git/refs/remotes
• HEAD is a symbolic ref in .git/HEAD (e.g. ref: refs/heads/main); a detached HEAD holds a commit hash instead
• A tag (in .git/refs/tags) refers to a specific commit; used for versioned releases
• The reflog records every update to the tips of branches and HEAD in your local repo
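
Resolving a name like HEAD is mostly file reading. A minimal sketch, assuming the loose-ref and packed-refs layouts listed above; `resolve_ref` is a hypothetical helper and the repository it builds is fake. In a real repo, git rev-parse HEAD does this for you.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical helper: resolve a ref name ("HEAD", "refs/heads/main", ...) to
# the hash it points at, reading files the way they sit under .git/.
def resolve_ref(git_dir, name, depth = 0)
  raise "too many levels of symbolic refs" if depth > 5
  path = File.join(git_dir, name)
  if File.file?(path)
    value = File.read(path).strip
    # A symbolic ref (like HEAD) holds "ref: <other ref>"; follow it.
    return resolve_ref(git_dir, value.delete_prefix("ref: "), depth + 1) if value.start_with?("ref: ")
    return value  # a loose ref (or detached HEAD) holds the hash itself
  end
  # Packed refs live one per line in .git/packed-refs: "<hash> <refname>".
  packed = File.join(git_dir, "packed-refs")
  return nil unless File.file?(packed)
  File.foreach(packed) do |line|
    next if line.start_with?("#", "^")  # header comment / peeled-tag lines
    oid, ref = line.split
    return oid if ref == name
  end
  nil
end

# Build a throwaway, fake .git directory to try it on.
Dir.mktmpdir do |git_dir|
  FileUtils.mkdir_p(File.join(git_dir, "refs", "heads"))
  File.write(File.join(git_dir, "HEAD"), "ref: refs/heads/main\n")
  File.write(File.join(git_dir, "refs", "heads", "main"), "a" * 40 + "\n")
  File.write(File.join(git_dir, "packed-refs"), "#{'b' * 40} refs/tags/v1.0\n")
  puts resolve_ref(git_dir, "HEAD")            # 40 a's, via refs/heads/main
  puts resolve_ref(git_dir, "refs/tags/v1.0")  # 40 b's, from packed-refs
end
```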

Packs
• We’ve seen “loose” objects (one zlib-compressed file per object) in the Plumber’s Guide workshop
• A packfile stores many objects in a single file; similar objects, such as successive versions of a file, can be stored as deltas against one another. A companion .idx file indexes the pack.
• Git’s “smart” protocol exchanges packs between client and server when fetching (or pulling) and pushing
• Packs save space on disk and on the wire
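
The start of a packfile can be decoded with unpack and some bit twiddling. This sketch assumes a version 2 pack: a 12-byte header ("PACK", version, object count, big-endian) followed by entries whose headers pack a 3-bit type and a variable-length size. It stops at the headers; it does not inflate the zlib data that follows or apply deltas. `parse_pack_header`, `parse_object_header`, and the sample bytes are hypothetical.

```ruby
# Object type codes used in pack entry headers.
PACK_TYPES = { 1 => :commit, 2 => :tree, 3 => :blob, 4 => :tag,
               6 => :ofs_delta, 7 => :ref_delta }.freeze

# Parse the 12-byte pack header: "PACK", version, object count (big-endian).
def parse_pack_header(bytes)
  sig, version, count = bytes.unpack("a4NN")
  raise "not a packfile" unless sig == "PACK"
  { version: version, count: count }
end

# Parse one entry header starting at `pos`. The first byte holds a
# continuation bit (0x80), a 3-bit type, and the low 4 bits of the size;
# each following byte adds 7 more size bits while its high bit is set.
def parse_object_header(bytes, pos)
  byte = bytes.getbyte(pos)
  type = (byte >> 4) & 0x7
  size = byte & 0x0f
  shift = 4
  while (byte & 0x80) != 0
    pos += 1
    byte = bytes.getbyte(pos)
    size |= (byte & 0x7f) << shift
    shift += 7
  end
  [PACK_TYPES.fetch(type), size, pos + 1]  # pos + 1: where the compressed data starts
end

header = "PACK".b + [2, 3].pack("NN")  # made-up sample: version 2, 3 objects
info = parse_pack_header(header)
puts info[:version], info[:count]      # 2, then 3
entry = [0xBC, 0x12].pack("C*")        # type 3 (blob), size 300 split across two bytes
p parse_object_header(entry, 0)        # [:blob, 300, 2]
```

For delta entries the size is that of the delta data, and a base reference (an offset or an object ID) follows the header.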

Merges
• A merge is a commit with N≥2 parents, retaining the history of each merged branch
• See “Basic Branching and Merging” in Pro Git
• Rebasing instead replays commits onto a new base, keeping history linear
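
Since a merge is just a commit with several parent lines, merge versus rebase shows up as the shape of the parent graph. A toy Ruby sketch on an in-memory graph, with commit IDs shortened to letters; `merge_commit?`, `reachable`, and `linear?` are hypothetical helpers, not Git commands.

```ruby
require "set"

# A merge commit is one with two or more parents.
def merge_commit?(graph, id)
  graph.fetch(id).size >= 2
end

# Every commit reachable from `tip` by following parent links.
def reachable(graph, tip)
  seen = Set.new
  stack = [tip]
  until stack.empty?
    id = stack.pop
    next unless seen.add?(id)  # add? returns nil if already visited
    stack.concat(graph.fetch(id))
  end
  seen
end

# History is linear when no reachable commit has more than one parent.
def linear?(graph, tip)
  reachable(graph, tip).none? { |id| merge_commit?(graph, id) }
end

# Commit ID => parent IDs. B is on main, C on a feature branch, M merges them.
merged = { "A" => [], "B" => ["A"], "C" => ["A"], "M" => ["B", "C"] }
# The same work rebased: C is replayed on top of B as a new commit, C2.
rebased = { "A" => [], "B" => ["A"], "C2" => ["B"] }

puts merge_commit?(merged, "M")   # true
puts reachable(merged, "M").size  # 4: both branches' commits are retained
puts linear?(merged, "M")         # false
puts linear?(rebased, "C2")       # true
```

The rebased commit needs a new name (C2 here) because its parent line changed, and a commit's ID covers its parents.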

References
• Pro Git, Scott Chacon
  ◦ Chapter 10: “Git Internals”
  ◦ Chapter 3: “Git Branching”
• “A Plumber’s Guide to Git”, Alex Chan
• Building Git, James Coglan
  ◦ Recreates much of Git from scratch in Ruby
• “More Productive Git”, James Turnbull, Increment #9: Open Source
• “Comparing Git Trees in Go”, source{d} blog
• “Unpacking Git packfiles”, Aditya Mukerjee, Codewords #3