Git as a Document Format — Wil Shipley

Realm
July 27, 2015

I’ll discuss the advantages and pitfalls of traditional Cocoa file formats (archives, plists, XML, and CoreData), and the traditional way of adding undo and redo to Cocoa apps. Then I’ll talk about how ‘git’ (the library, not the command-line tools) magically solves all those problems and basically gives you everything you ever wanted, for free.

This talk was presented at AltConf in June 2015.

Transcript

  1. Document File Management: Load, Save, Autosave, Undo, Redo, and Backup Using git
    Wil Shipley
  2. Who’s This Guy?

  3. 8,600!

  9. Cool Story, Bro

  10. Thoughts

  11. Thoughts on Load & Save
    • Often one or more control files that change frequently:
      • JSON or XML or plists
      • e.g.: the lines and boxes in OmniGraffle, the book titles in Delicious Library
    • Plus several huge resource files (“blobs”):
      • images, sound files, movies
      • e.g.: images in OmniGraffle, covers in Delicious Library
    • You need reading and writing the control file(s) to be fast, since you’re doing it a lot and the whole thing changes every time the user changes anything
    • You don’t want to save out all the resources every time you save the control file(s).
    • But saves should be atomic.
  12. Thoughts on Autosave
    • It’s totally great, except if you’re editing a file in Preview and it’s live-saving the changes you make, so your original file is being messed up because some dweeb decided that “Save As…” shouldn’t work any more.
  13. Thoughts on Undo & Redo
    • It probably should never corrupt your file if you undo and redo. (*cough* Xcode *cough*)
  14. Thoughts on Backups
    • Nobody will do backups unless they magically happen when you sleep, like underwear-stealing gnomes.
  15. A Brief History of Documents in Cocoa

  16. 1988: NeXTstep 0.8 - “TypedStreams”
    • Load & save: with TypedStreams
      • Good: easy to write
      • Bad: class names and variables’ order encoded in files, not future-safe
      • Bad: not particularly fast, binary, not user-editable
    • No autosave
    • No undo & redo
    • No backups
  17. 1990: NeXTstep 2 - text files
    • Load & save: convention to use custom text file format
      • Diagram! (OmniGraffle), Concurrence (Keynote)
      • Good: User-modifiable, reparable
      • Bad: Pretty slow, very large files, bad for images
    • No autosave
    • No undo & redo
    • No backups
  18. 1992: NeXTSTEP 3 - Text “Property Lists”
    • Load & save: with text plists
      • Concurrence-redux
      • Good: User-modifiable, “standard,” reparable
      • Bad: Pretty slow, very large files, bad for images
    • No autosave
    • No undo & redo
    • No backups
  19. 1992: NeXTSTEP 3 - “NSUndoManager”
    • Undo & redo: with NSUndoManager
      • -prepareWithInvocationTarget: records state changes in both directions
      • Good: fairly easy to use
      • Bad: very easy to screw up
        • miss a state change and the model is corrupted (cf. Xcode)
        • release objects and it crashes
      • Bad: not persistent
  20. 1996-2000: NeXTSTEP 4, Rhapsody, OS X Server
    • Not much changed in document APIs
    • Except we got Carbon and Blue Box
    • Yay?
  21. 2001: OS X 10.0 - “NSFileWrapper” / “NSBundle”
    • Load & save: with NSFileWrapper
      • Good: easy to implement, quickly saves control files without re-saving huge blobs they need, does atomic writes
      • Bad: good solution, still in use
      • Limits: doesn’t address format of control files
  22. 2001: OS X 10.0 - “NSKeyedArchiver”
    • Load & save: with NSKeyedArchiver
      • Good: faster than text, future-proof
      • Bad: hard for users to edit (requires Xcode)
      • Limits: shouldn’t be used for huge blobs
  23. 2002: OS X 10.1 - Binary “NSPropertyList”
    • Load & save: with NSPropertyList
      • Good: faster than text, future-proof
      • Bad: hard for users to edit (requires Xcode)
      • Limits: shouldn’t be used for huge blobs
  24. 2004: OS X 10.4 - “CoreData”
    • Load & save
      • Good: very very fast, huge files supported, stores blobs well, structured, sqlite well supported, users can sort of edit files (cf. Delicious Library 2 & 3)
      • Bad: would corrupt entire file on crash until 10.9, CoreData conventions very brittle, lots of exceptions
    • Autosave
      • Good: Saving so fast autosave is trivial
    • Undo & redo
      • Bad: Very tweaky, didn’t allow for state changes outside of undo, not persistent
    • Backup
      • Bad: Entire file changes at once, huge backups
  25. 2007: OS X 10.5 - “Time Machine”
    • Backups
      • Good: automatic, regular, complete, incremental, people actually did backups
      • Bad: no API on “Versions,” stores huge files monolithically, originally very buggy
  26. 2010: OS X 10.7 - “Versions”
    • Autosave: with NSDocument
      • Mostly voodoo, along with the multithreaded file presenter bullcrap and state restoration that didn’t work
      • Good: kind of works if you do it just right
      • Bad: inflexible, still blows away user data (cf. Preview), how do we tie into “Versions” in our software?
  27. A Perfect World
    • Load: fast, incremental (don’t load blobs we don’t need)
    • Save: fast, understands multi-part documents, doesn’t re-save large blobs that haven’t changed, saves into format users might be able to edit
    • Autosave: instant like CoreData but without losing previous states
    • Undo & redo: instant, can’t be corrupted by bad coding, are persistent, and can be pruned anywhere without losing the whole stack
    • Backup: should play nicely with Time Machine (not one huge file) and should be easy to back up to other systems
    • All these should take almost no code
    • Also I’d like a pony
  28. 2013: What to do?
    • Designing new app, know I want persistent undo stack
    • Also know I’m extremely lazy
    • Sean O’Brien suggests ‘git’
    • I believe he’s high as frick, because ‘git’ is crazy ugly and is also an SCM system, not a file format, duh
  29. git: it’s a row of trees, man
    diagram credit: https://git-scm.com/book/no-nb/v1/Git-Internals-Git-Objects

  30. git: Saving
    • Create a new local git repository for each document
    • Each object in your document has a git “GTBlob”
    • Each object exists as an entry in a git “GTTree”
    • Each object has a UUID so it can be uniquely referenced; the entry name for the blob is its UUID’s string
    • Every time an object changes you write out its containing GTTree using git, and bump the repository HEAD to point to the new commit
    • git uniques objects (identical content is stored only once) and does all the hard work
    • git compresses when needed
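
The saving scheme on this slide can be modelled without any git library at all. The sketch below is a toy, not objective-git: `ObjectStore`, `DocumentRepository`, and the FNV-1a `objectID(for:)` hash are names invented here for illustration (real git derives object IDs with SHA-1 over a header plus the content). It only aims to show why writing a tree after every change is cheap: content that has not changed maps to the same ID and is not stored again.

```swift
import Foundation

// Stand-in for git's object IDs: a 64-bit FNV-1a hash rendered as hex.
// Assumption for this sketch only; it is not how git computes IDs.
func objectID(for data: Data) -> String {
    var hash: UInt64 = 0xcbf29ce484222325
    for byte in data {
        hash ^= UInt64(byte)
        hash = hash &* 0x100000001b3
    }
    return String(hash, radix: 16)
}

// A toy object database: identical content is stored only once.
struct ObjectStore {
    private(set) var objects: [String: Data] = [:]

    mutating func write(_ data: Data) -> String {
        let id = objectID(for: data)
        objects[id] = data          // re-writing identical content changes nothing
        return id
    }
}

// A tree maps entry names (here: UUID strings) to blob IDs.
typealias Tree = [String: String]

struct Commit {
    let tree: Tree
    let parent: Int?                // index of the previous commit, nil for the first
    let message: String
}

struct DocumentRepository {
    var store = ObjectStore()
    private(set) var commits: [Commit] = []
    private(set) var head: Int? = nil

    // Write every object as a blob, build a tree, and advance HEAD.
    mutating func save(_ objects: [UUID: Data], message: String) {
        var tree: Tree = [:]
        for (uuid, data) in objects {
            tree[uuid.uuidString] = store.write(data)
        }
        commits.append(Commit(tree: tree, parent: head, message: message))
        head = commits.count - 1
    }
}

var repo = DocumentRepository()
let wallA = UUID(), wallB = UUID()
var model: [UUID: Data] = [wallA: Data("wall 0,0 -> 4,0".utf8),
                           wallB: Data("wall 4,0 -> 4,3".utf8)]
repo.save(model, message: "Add two walls")

model[wallB] = Data("wall 4,0 -> 4,5".utf8)   // the user edits one wall
repo.save(model, message: "Move wall")

// Two commits, three stored blobs: the unchanged wall was not stored twice.
print(repo.commits.count, repo.store.objects.count)
```

Both commits reference the same blob ID for the untouched wall, which is the property that keeps frequent saves small.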
  31. git: Loading
    • read the HEAD commit, which points to a GTTree
    • read the GTBlob entries inside the GTTree, assign them the UUIDs from their filenames, and create your real objects for the blobs
    • Not fun part: manually rebuild the NSUndoManager’s stack to match the git repository, since there are no decent APIs on NSUndoManager.
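
As a rough illustration of the loading steps, the sketch below rebuilds model objects from a tree whose entry names are UUID strings. It deliberately avoids objective-git: `TreeSnapshot` is a plain dictionary standing in for the tree that the HEAD commit points to, and the `Wall` layout, the JSON encoding, and `loadWalls(from:)` are assumptions made for the example rather than anything taken from the talk.

```swift
import Foundation

// Hypothetical model object. Its identity is not stored inside the blob;
// it is recovered from the tree entry name when loading.
struct Wall: Codable, Equatable {
    var length: Double
    var nameUUID: UUID? = nil

    // Only `length` is encoded into the blob.
    enum CodingKeys: String, CodingKey { case length }
}

// Stand-in for the tree behind the HEAD commit: entry name -> blob bytes.
typealias TreeSnapshot = [String: Data]

enum LoadError: Error { case badEntryName(String) }

func loadWalls(from tree: TreeSnapshot) throws -> [Wall] {
    var walls: [Wall] = []
    // Sort by entry name so the load order is deterministic.
    for (name, blob) in tree.sorted(by: { $0.key < $1.key }) {
        guard let uuid = UUID(uuidString: name) else {
            throw LoadError.badEntryName(name)
        }
        var wall = try JSONDecoder().decode(Wall.self, from: blob)
        wall.nameUUID = uuid        // the UUID comes from the "filename"
        walls.append(wall)
    }
    return walls
}

let id = UUID()
let tree: TreeSnapshot = [id.uuidString: try! JSONEncoder().encode(Wall(length: 3.5))]
let walls = try! loadWalls(from: tree)
print(walls[0].length, walls[0].nameUUID == id)
```

Keeping the UUID in the entry name rather than in the blob means two objects with identical content still get distinct identities while sharing one stored blob.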
  32. git: Autosave
    • simply save after every top-level undo is closed, so every user change is remembered
    • around 0.01s per save in Dwelling
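
"Save after every top-level undo is closed" amounts to counting nested edit groups and committing when the outermost one ends. The sketch below shows that bookkeeping in plain Swift. `EditGroupTracker` is a hypothetical helper invented for this example; in a Cocoa app the same signal would presumably come from NSUndoManager's group-closing notification together with its `groupingLevel`, which the slides do not spell out.

```swift
// Tracks nested edit groups and fires a callback when the outermost one closes.
final class EditGroupTracker {
    private var depth = 0
    private let onTopLevelClose: () -> Void

    init(onTopLevelClose: @escaping () -> Void) {
        self.onTopLevelClose = onTopLevelClose
    }

    func beginGroup() { depth += 1 }

    func endGroup() {
        precondition(depth > 0, "endGroup() without matching beginGroup()")
        depth -= 1
        if depth == 0 { onTopLevelClose() }   // one user-visible change finished
    }
}

var saveCount = 0
let tracker = EditGroupTracker { saveCount += 1 }   // a real app would commit here

tracker.beginGroup()        // the user drags a wall...
tracker.beginGroup()        // ...which also moves an attached door (nested group)
tracker.endGroup()          // inner group closes: no save yet
tracker.endGroup()          // top-level group closes: save exactly once

print(saveCount)            // 1
```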
  33. git: Undo
    • move the HEAD to the previous commit
    • re-load the GTTree like you would when opening a file
  34. git: Redo
    • move the HEAD to the next commit
    • re-load the GTTree like you would when opening a file
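
Undo and redo as "move HEAD, then reload" can be sketched with an array of snapshots and an index. Everything here (`Snapshot`, `History`) is invented for illustration. Two points are assumptions rather than claims from the talk: a new edit after an undo discards the redo branch, and the array index hides a detail of real git, where commits only point at their parents, so finding "the next commit" needs some reference to the newest one (the sample code's `endOfUndoReferenceName` constant looks like it serves that purpose).

```swift
// One saved document state; `state` stands in for the commit's tree.
struct Snapshot: Equatable {
    let message: String
    let state: [String: String]
}

struct History {
    private(set) var commits: [Snapshot]
    private(set) var head: Int          // index of the commit HEAD points at

    init(initial: Snapshot) {
        commits = [initial]
        head = 0
    }

    var current: Snapshot { commits[head] }
    var canUndo: Bool { head > 0 }
    var canRedo: Bool { head < commits.count - 1 }

    // Assumption: a new edit after an undo discards the old redo branch.
    mutating func commit(_ snapshot: Snapshot) {
        commits.removeSubrange((head + 1)...)
        commits.append(snapshot)
        head += 1
    }

    @discardableResult
    mutating func undo() -> Snapshot? {
        guard canUndo else { return nil }
        head -= 1
        return current      // the caller reloads its model from this snapshot
    }

    @discardableResult
    mutating func redo() -> Snapshot? {
        guard canRedo else { return nil }
        head += 1
        return current
    }
}

var history = History(initial: Snapshot(message: "Empty Nest", state: [:]))
history.commit(Snapshot(message: "Add wall", state: ["wall-1": "4m"]))
history.commit(Snapshot(message: "Resize wall", state: ["wall-1": "5m"]))

history.undo()
print(history.current.message)   // Add wall
history.redo()
print(history.current.message)   // Resize wall
```

Because undo never mutates a snapshot, there is no inverse operation to get wrong, which is the contrast with NSUndoManager that the earlier slides draw.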
  35. git: Backup
    • normal users can use Time Machine and it won’t have to back up a single huge file like a database
    • advanced users can push their local file repositories to GitHub or their own servers
  36. Other free things with git
    • if the file gets too big or the user wants privacy, delete ANY commits (except for the HEAD) and the rest of the undo / redo stack is still completely valid
    • can implement a branching system so two users could edit the same document separately, pull each other’s changes, and then merge them together.
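
The reason any commit other than HEAD can be dropped is that each commit refers to a complete snapshot, not a delta against its neighbour. The sketch below models that with plain Swift values; `Commit`, `UndoStack`, and `prune(id:)` are hypothetical names for this example. It skips one real-world detail: in git, removing a commit from the middle of a chain means re-creating its descendants with new parent pointers, which a real implementation would have to handle.

```swift
struct Commit {
    let id: Int                      // stand-in for a commit hash
    let tree: [String: String]       // complete snapshot: entry name -> content
}

enum PruneError: Error { case noSuchCommit, cannotPruneHead }

struct UndoStack {
    private(set) var commits: [Commit]
    private(set) var head: Int       // index of the current commit

    init(commits: [Commit], head: Int) {
        precondition(commits.indices.contains(head), "HEAD must point at a commit")
        self.commits = commits
        self.head = head
    }

    var current: Commit { commits[head] }

    // Remove one commit; every remaining commit is still a full snapshot.
    mutating func prune(id: Int) throws {
        guard let index = commits.firstIndex(where: { $0.id == id }) else {
            throw PruneError.noSuchCommit
        }
        guard index != head else { throw PruneError.cannotPruneHead }
        commits.remove(at: index)
        if index < head { head -= 1 }   // keep HEAD on the same commit
    }
}

var stack = UndoStack(commits: [
    Commit(id: 1, tree: ["note": "draft"]),
    Commit(id: 2, tree: ["note": "draft with a secret"]),
    Commit(id: 3, tree: ["note": "final"]),
], head: 2)

try! stack.prune(id: 2)                 // drop the middle state
print(stack.commits.map({ $0.id }))     // [1, 3]
print(stack.current.tree["note"]!)      // final
```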
  37. Things git doesn’t solve
    • still need to pick a file format for your control files
      • text plist, XML, JSON all fine ideas
    • if control files are too big then your file will grow huge when it’s been edited a lot
  38. A Perfect World
    • Load: fast, incremental (don’t load blobs we don’t need)
    • Save: fast, understands multi-part documents, doesn’t re-save large blobs that haven’t changed, saves into format users might be able to edit
    • Autosave: instant like CoreData but without losing previous states
    • Undo & redo: instant, can’t be corrupted by bad coding, are persistent, and can be pruned anywhere without losing the whole stack
    • Backup: should play nicely with Time Machine (not one huge file) and should be easy to back up to other systems
    • All these should take almost no code
    • Still no pony
  39. demo

  40. libgit2 and objective-git
    • GPLv2 with a special Linking Exception
    • https://github.com/libgit2/libgit2
    • https://github.com/libgit2/objective-git
  41. Sample code hastily ported to Swift:

        class GitDocument: NSDocument {
            static let endOfUndoReferenceName = "refs/heads/master", headReferenceName = "HEAD"

            // MARK: properties
            var repository: GTRepository!
            var currentCommit: GTCommit!
            var walls: [Wall] = []

            // MARK: NSDocument
            override func readFromURL(url: NSURL, ofType typeName: String) throws {
                // File loading
                do {
                    repository = try GTRepository(URL: url)
                } catch {
                    // create new file
                    do {
                        try GTRepository.initializeEmptyRepositoryAtURL(url)
                        repository = try GTRepository(URL: url)
                    } catch {
                        throw NSError() // MISSING I'm too lazy to make an error here, you do it
                    }
                    checkpointFileWithMessage("Empty Nest", "commit name for an empty file")
                }

                // MISSING: here we'd set up the NSUndoManager to have the old undos from the file, and also set "currentCommit"
                try loadTreeFromCurrentCommit()
            }

            // we’ll handle this ourselves, thanks
            override class func autosavesInPlace() -> Bool { return false }

            // Nope! We save down at the model level, after every event, so we'll just ignore these messages,
            // so we don't get prompted to save changes and we don't get a dirty window
            override func updateChangeCount(change: NSDocumentChangeType) { }

            // MARK: private methods
            private func loadTreeFromCurrentCommit() throws {
                let wallsTree = try GTTree.objectWithTreeEntry(currentCommit.tree, entryWithName: "connections")
                walls = []
                for entryIndex in 0..<wallsTree.entryCount {
                    let wallEntry = wallsTree.entryAtIndex(entryIndex)
                    let wallBlob = try GTBlob.objectWithTreeEntry(wallEntry)
                    let wall = Wall(dataRepresentation: wallBlob.data)
                    wall.nameUUID = NSUUID(UUIDString: wallEntry.name)
                    walls.append(wall)
                }
            }
        }

    this Swift code was based on my real, very old Obj-C code but won’t compile as-is and is missing pieces, sorry