The Future is Immutable (CodeMash 2016)

John Daily
January 07, 2016

Word processors. POSIX filesystems. Relational databases. For decades, data storage has been largely predicated on the need to twiddle bits in files. Seagate is projecting that in 2020 we’ll store 12 zettabytes of data. There aren’t enough monkeys with keyboards in the known universe to twiddle that many bits. This is mostly immutable data. The idea of immutable record-keeping long predates computers, but is undergoing a resurgence as programmers cope with the complexities of distributed systems and large data sets. It’s even changing the way storage hardware is designed. We’ll talk about the ongoing evolution of hardware, software, and programming languages.

(Because my slides are quite spartan, I have included my occasionally-clunky presenters' notes.)

Transcript

  1. A @macintux joint: The Future is Immutable …an opinionated look at the world as it is … or should be

    Thank CodeMash, the volunteers, and the sponsors. David is my proctor. I’ve had a small taste of organizing a conference: very rewarding, but very, very, very hard. Diet Mountain Dew. I’m John Daily, macintux on Twitter. Please interject questions, and don’t be alarmed if I wave off something as too trivial to talk about, because that really means I have no idea.
  2. Courtesy domo.com

    Quote from the tweet that introduced me to this infographic: More Data Created in Last 24 Months Than All Prior Human History. And the Internet of Things, once known more prosaically as sensor data, will only continue this trend.
  3. Fragment of an infographic from XO Communications

    Cisco says that this year a zettabyte of data will be transferred across the Internet. I’ve seen estimates that in 2013 over 4 zettabytes were stored globally. The cheaper disks get, the more data companies want to store. One of the reasons NoSQL has become so popular is that the relational model has challenges when scaling beyond a single server, and there ain’t no server big enough to handle a zettabyte of data.
  4. const char * hello

    Explain difference. The latter is a guarantee, and guarantees are wonderful things.
  5. const char * hello
  6. Guarantees are liberating

    Talk about Bryan’s analogy. What we have is a guarantee of security for yourself…and also a constraint on your actions. Both of these are liberating. If you’ve been here for the pre-compiler sessions, you may have encountered a programming exercise. If I asked you to write a program, you’d probably stare at me blankly. If I give you a set of constraints (solve this problem, with these parameters), then you’ve got something to chew on. If I give you a deadline, now you’ve really got a problem to solve on top of a problem to solve. Constraints and guarantees are liberating.
  7. Constraints are liberating. Guarantees are liberating.
  8. “How many people here have used a complex system that manages its data immutably?” “How many people have used git for version control?”
  9. Image snagged from https://wiki.phpbb.com/Git

    This is the phpBB branching model. Yes, git is crazy complicated, but how reassuring is it to know that all of your history is in there somewhere? Notice the long down arrow on the left: time. Time is a vital part of the immutability story. It’s not that an immutable data set never changes. Instead, new data can arrive and change the overall picture, but time becomes an explicit parameter and can be rolled back. Talk about liberation! Anyone here used NFS? How about Dropbox? I can’t speak to what may have happened in the last decade or so with NFS, but when I used it, it was awful. Server crashed? All clients became unusable. Dropbox, on the other hand, has a very different distributed file model which operates fine in disconnected mode, and much like git, keeps versions of each file. Immutable objects, changeable big picture.
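The "immutable objects, changeable big picture" idea can be sketched in a few lines of Python. This is a toy and not git's actual object format: the `Store` class, its `commit` and `history` methods, and the JSON encoding are all assumptions made for illustration. Objects are stored under the hash of their content and never modified; only the branch pointer moves.

```python
import hashlib
import json


class Store:
    """Toy content-addressed store (hypothetical, not git's real format)."""

    def __init__(self):
        self._objects = {}  # hash -> serialized object; entries are never overwritten
        self.refs = {}      # branch name -> hash; the only mutable part

    def put(self, obj):
        data = json.dumps(obj, sort_keys=True).encode("utf-8")
        key = hashlib.sha1(data).hexdigest()
        # Identical content always hashes to the same key, so this never
        # changes an existing object.
        self._objects.setdefault(key, data)
        return key

    def get(self, key):
        return json.loads(self._objects[key])

    def commit(self, branch, content):
        # A commit records its content plus a pointer to its parent commit.
        parent = self.refs.get(branch)
        key = self.put({"content": content, "parent": parent})
        self.refs[branch] = key  # move the branch pointer forward
        return key

    def history(self, branch):
        # Walk parent pointers from the branch tip back to the first commit.
        key = self.refs.get(branch)
        while key is not None:
            obj = self.get(key)
            yield obj["content"]
            key = obj["parent"]


store = Store()
first = store.commit("main", "draft")
second = store.commit("main", "final")
```

In this sketch, rolling back is just pointing the branch at an older hash, for example `store.refs["main"] = first`. The newer commit is still in the store; it is simply no longer reachable from that branch.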
  10. Functional Programming

    At the heart of software is the idea of data transformation. Joining tables in a database, calculating the driving time between two points on a map, figuring out whether your snowboarding sprite just collided with that rock: these are all data transformations. No style of programming, with the possible exception of stack-based languages, places data transformation front and center like functional programming. If programming is applied math, functional programming hews closest to the platonic ideal.
  11. x=x+1

    Stealing from myself: if you attend my Erlang talk tomorrow, you’ll see this slide again. Any language that allows you to do this is doing it wrong. Any mathematician will tell you X can never be equal to X + 1.
  12. So if you can’t change data, how can you write software?

    To picture how immutable data structures are managed and copied, let’s walk through this simple illustration of a tree (or trie).
  13. We also need to update the orange nodes to create a new path through the tree.
  14. So picture two threads sharing v1 of this tree. One of them needs to make changes; the important aspect here is that it CAN, without impacting the other thread in any way. The other still sees v1 of this data. And this can be a very efficient copy strategy, because only the nodes along the changed path are copied; everything else is shared between versions.
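The tree walkthrough on these slides can be sketched in Python. This is a minimal illustration that assumes a plain unbalanced binary search tree; the `Node` type and the `insert` and `lookup` helpers are hypothetical names, and production persistent structures (such as the tries in Clojure) are more elaborate. An update copies only the nodes on the path from the root to the change and shares every other subtree with the old version.

```python
from typing import NamedTuple, Optional


class Node(NamedTuple):
    """An immutable tree node; tuples cannot be modified after creation."""
    key: int
    value: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(node, key, value):
    """Return a new tree containing key -> value; the input tree is untouched."""
    if node is None:
        return Node(key, value)
    if key < node.key:
        # Copy this node with a new left child; the right subtree is shared.
        return node._replace(left=insert(node.left, key, value))
    if key > node.key:
        # Copy this node with a new right child; the left subtree is shared.
        return node._replace(right=insert(node.right, key, value))
    return node._replace(value=value)


def lookup(node, key):
    while node is not None:
        if key == node.key:
            return node.value
        node = node.left if key < node.key else node.right
    return None


# Build v1, then derive v2 by changing a single entry.
v1 = None
for k, v in [(50, "root"), (30, "a"), (70, "b"), (20, "c"), (40, "d")]:
    v1 = insert(v1, k, v)
v2 = insert(v1, 40, "d-updated")
```

A thread holding `v1` still sees `"d"` at key 40, while a thread holding `v2` sees `"d-updated"`. Only three nodes (50, 30, 40) were copied to make `v2`; the subtrees rooted at 70 and 20 are the very same objects in both versions.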
  15. Functional Programming

    Lisp, Clojure, ClojureScript, Erlang + other BEAM languages, Haskell. Hybrids like Scala I’m personally very wary of. I like opinionated languages, ones that offer serious constraints in the pursuit of useful guarantees. Having said that, I also have a deep, abiding love for Perl, so my hypocrisy knows no bounds.
  16. Functional Programming • Thread safety • Referential transparency • Efficient, safe deep “copy”
  17. Distributed systems

    Many of the challenges in distributed systems come down to making sure data is consistent. If a piece of data exists on three different machines, how do you cope with updates? Coordination, the art of making sure an update occurs everywhere, is expensive and hard to do correctly. Many papers have been written on the topic, with Paxos, Raft, and ZooKeeper among the better-known names of protocols and systems for doing this. This applies not only at the distributed systems layer: increasingly, computers are distributed systems internally. If you obtain a lock on a multicore system, the cost, in terms of work the other cores are not doing, can be very high.
  18. Distributed systems • Trivial coordination of updates to distributed data
  19. Databases

    Relational database: tables are effectively just a cache of the immutable write-ahead log (WAL), which is also used for log shipping. Talk about characteristics of a log in this sense. You peasants get the snapshot; the landowners, the database authors themselves, get the full history. What are some limits of relational databases? Hadoop: what do you do when you don’t have enough data for your long-running job? (Will return to this later.) Normalization: why is it necessary? Event sourcing: expose the WAL in an actionable format. Here’s an immutable log of all changes. Accountants don’t use erasers; they just add a new entry overriding earlier mistakes.
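The "no erasers" rule behind event sourcing can be sketched as an append-only ledger in Python. This is a minimal sketch under stated assumptions: the `Entry` record and the `append` and `balance` helpers are hypothetical names, amounts are integer cents, and the log is an in-memory tuple rather than a durable WAL. A mistake is corrected by appending a reversing entry, and current state is derived by replaying the log, optionally only up to a chosen sequence number.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One immutable ledger line (hypothetical record layout)."""
    seq: int      # position in the log, starting at 1
    account: str
    amount: int   # cents; negative for money going out
    note: str


def append(log, account, amount, note):
    """Return a new log with one more entry; existing entries are never edited."""
    return log + (Entry(len(log) + 1, account, amount, note),)


def balance(log, account, as_of=None):
    """Replay the log to compute a balance, optionally as of a sequence number."""
    return sum(
        e.amount
        for e in log
        if e.account == account and (as_of is None or e.seq <= as_of)
    )


log = ()
log = append(log, "checking", 100000, "deposit")
log = append(log, "checking", -2500, "rent, entered incorrectly")
log = append(log, "checking", 2500, "reverse entry 2")
log = append(log, "checking", -25000, "rent, corrected")
```

Because nothing is overwritten, the sketch can answer both "what is the balance now?" with `balance(log, "checking")` and "what did we believe after entry 2?" with `balance(log, "checking", as_of=2)`.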
  20. Datomic architecture

    Datomic: Hickey - reinvent the database as a persistent data structure. Query any fact, any time. Simplified view of Datomic architecture. Describe. Talk about Hadoop again.
  21. Databases • Reads without write locks • Queries against any point in time • Denormalization for faster reads

    AFTER Talk more about the Log (Kreps is next)
  22. Filesystems

    POSIX filesystem model is horribly non-scalable. HDFS, log-structured filesystem. Important caveat: data corruption due to buggy OS/hardware faults is always a possibility.
  23. Filesystems • Data will not be modified by other entities • Trivial distribution of data
  24. https://www.usenix.org/system/files/conference/inflow14/inflow14-yang.pdf

    Paper from 2014. SSDs use something called Flash Translation Layer (FTL) which is a data mapping layer that operates much like a log-structured file. Turns out that layering a log-structured filesystem and log-structured data atop this is a bad idea because there are now multiple incompatible GC and metadata systems messing with the optimal flash management strategy. So, immutability isn’t always a good thing. You have to use it wisely. Seagate, KINETIC
  25. Infrastructure

    Custom AMIs. AWS APIs. Docker. Unikernels. The appeal of immutable infrastructure is roughly proportional to the scale of the service. Mirrors the separation of concerns in FP.
  26. Infrastructure • Greater confidence in ability to restore from backups • Predictable behavior of application in a known environment
  27. A treatise concerning eternal and immutable morality (Wikimedia Commons). John R. Daily, http://tinyurl.com/cm-immutable, @macintux. Datomic, git, Kinetic, HDFS, Clojure, Erlang, Docker, Om.

    Book published in 1731.