
Bringing the Unix Philosophy to Big Data

Bryan Cantrill
December 18, 2013

My presentation from FutureStack 13. Video: http://www.youtube.com/watch?v=S0mviKhVmBI

Transcript

  1. Bringing the Unix
    Philosophy to Big Data
    Bryan Cantrill
    SVP, Engineering
    [email protected]
    @bcantrill

  2. Unix
    • When Unix appeared in the early 1970s, it was not just a
    new system, but a new way of thinking about systems
    • Instead of a sealed monolith, the operating system was
    a collection of small, easily understood programs
    • First Edition Unix (1971) contained many programs that
    we still use today (ls, rm, cat, mv)
    • Its very name conveyed this minimalist aesthetic: Unix is
    a homophone of “eunuchs” — a castrated Multics
    We were a bit oppressed by the big system mentality. Ken
    wanted to do something simple. — Dennis Ritchie

  3. Unix: Let there be light
    • In 1969, Doug McIlroy had the idea of connecting
    different components:
    At the same time that Thompson and Ritchie were sketching
    out a file system, I was sketching out how to do data
    processing on the blackboard by connecting together
    cascades of processes
    • This was the primordial pipe, but it took three years to
    persuade Thompson to adopt it:
    And one day I came up with a syntax for the shell that went
    along with the piping, and Ken said, “I’m going to do it!”

  4. Unix: ...and there was light
    And the next morning we had this
    orgy of one-liners. — Doug McIlroy

  5. The Unix philosophy
    • The pipe — coupled with the small-system aesthetic —
    gave rise to the Unix philosophy, as articulated by Doug
    McIlroy:
    • Write programs that do one thing and do it well
    • Write programs to work together
    • Write programs that handle text streams, because
    that is a universal interface
    • Four decades later, this philosophy remains the single
    most important revolution in software systems thinking!

  6. Doug McIlroy v. Don Knuth: FIGHT!
    • In 1986, Jon Bentley posed the challenge that became
    the Epic Rap Battle of computer science history:
    Read a file of text, determine the n most frequently used
    words, and print out a sorted list of those words along with
    their frequencies.
    • Don Knuth’s solution: an elaborate program in WEB, a
    Pascal-like literate programming system of his own
    invention, using a purpose-built algorithm
    • Doug McIlroy’s solution shows the power of the Unix
    philosophy:
    tr -cs A-Za-z '\n' | tr A-Z a-z | \
    sort | uniq -c | sort -rn | sed ${1}q

  7. Big Data: History repeats itself?
    • The original Google MapReduce paper (Dean and Ghemawat,
    OSDI ’04) poses a problem disturbingly similar to
    Bentley’s challenge nearly two decades prior:
    Count of URL Access Frequency: The map function processes
    logs of web page requests and outputs ⟨URL, 1⟩. The
    reduce function adds together all values for the same URL
    and emits a ⟨URL, total count⟩ pair
    • But the solutions do not adhere to the Unix philosophy...
    • ...nor do they make use of the substantial Unix
    foundation for data processing
    • e.g., Appendix A of the OSDI ’04 paper has a 71-line
    word count in C++ — with nary a wc in sight
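    For contrast, a minimal Unix-idiom sketch of the same
    URL-frequency count, assuming an access log in common log
    format (where the requested URL is the seventh
    whitespace-delimited field; adjust $7 for other formats):
    # emit each URL once per request, then count and rank
    awk '{ print $7 }' access.log | sort | uniq -c | sort -rn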

  8. Big Data: Challenges
    • Must be able to scale storage to allow for “big data” —
    quantities of data that dwarf a single machine
    • Must allow for massively parallel execution
    • Must allow for multi-tenancy
    • To make use of both the Unix philosophy and its toolset,
    must be able to virtualize the operating system

  9. Scaling storage
    • There are essentially three protocols for scalable
    storage: block, file and object
    • Block (i.e., a SAN) is far too low an abstraction — and
    notoriously expensive to scale
    • File (i.e., NAS) is too permissive an abstraction — it
    implies a coherent store for arbitrary (partial) writes,
    trying (and failing) to be both C and A in CAP
    • Object (e.g., S3) is similar “enough” to a file-based
    abstraction, but by not allowing partial writes, allows for
    proper CAP tradeoffs

  10. Object storage
    • Object storage systems do not allow for partial updates
    • For both durability and availability, objects are generally
    erasure encoded across spindles on different nodes
    • A different approach is to have a highly reliable local file
    system that erasure encodes across local spindles —
    with entire objects duplicated across nodes for
    availability
    • ZFS pioneered both reliability and efficiency of this
    model with RAID-Z — and has refined it over the past
    decade of production use
    • ZFS is one of the four foundational technologies in
    Joyent’s open source SmartOS
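    A minimal sketch of this model with ZFS RAID-Z (device names
    are illustrative): a single-parity pool that erasure encodes
    across four local spindles and survives the loss of any one
    of them:
    # create a RAID-Z pool across four local disks
    zpool create tank raidz c0t0d0 c0t1d0 c0t2d0 c0t3d0
    # show pool layout and health
    zpool status tank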

  11. Virtualizing the operating system?
    • Historically — since the 1960s — systems have been
    virtualized at the level of hardware
    • Hardware virtualization has its advantages, but it’s
    heavyweight: operating systems are not designed to
    share resources like DRAM, CPU, I/O devices, etc.
    • One can instead virtualize at the level of the operating
    system: a single OS kernel that creates lightweight
    containers — on the metal, but securely partitioned
    • Pioneered by BSD’s jails; taken to a logical extreme by
    zones found in Joyent’s SmartOS
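    A hedged sketch of the zone lifecycle using the traditional
    illumos tooling (zone name and path are placeholders; SmartOS
    typically drives this via higher-level tools):
    # configure, install, and boot a lightweight container
    zonecfg -z web01 'create; set zonepath=/zones/web01'
    zoneadm -z web01 install
    zoneadm -z web01 boot
    # get a shell inside the running zone
    zlogin web01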

  12. Idea: ZFS + Zones?
    • Can we combine the efficiency and reliability of ZFS
    with the abstraction provided by zones to develop an
    object store that has compute as a first-class citizen?
    • ZFS rollback allows for zones to be trashed — simply
    roll back the zone after compute completes on an object
    • Add a job scheduling system that allows for both map
    and reduce phases of distributed work
    • Would allow for the Unix toolset to be used on arbitrarily
    large amounts of data — unlocking big data one-liners
    • If it perhaps seems obvious now, it wasn’t at the time...
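    A minimal sketch of the rollback trick (dataset and snapshot
    names are placeholders):
    # snapshot the zone’s dataset while it is pristine
    zfs snapshot zones/web01@pristine
    # ...run a job’s compute inside the zone...
    # then discard everything the job wrote
    zfs rollback -r zones/web01@pristine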

  13. Idea: ZFS + Zones?

  14. Manta: ZFS + Zones!
    • Building a sophisticated distributed system on top of
    ZFS and zones, we have created Manta, an internet-facing
    object storage system offering in situ compute
    • That is, the description of compute can be brought to
    where objects reside instead of having to backhaul
    objects to transient compute
    • The abstractions made available for computation are
    anything that can run on the OS...
    • ...and as a reminder, the OS — Unix — was built around
    the notion of ad hoc unstructured data processing, and
    allows for remarkably terse expressions of computation
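    A hedged sketch of basic interaction with Manta via the
    node-manta CLI (login name and paths are illustrative):
    # store a local file as an object, list it, fetch it back
    mput -f words.txt /bcantrill/stor/words.txt
    mls /bcantrill/stor
    mget /bcantrill/stor/words.txt > words.copy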

  15. Manta: Unix for Big Data
    • Manta allows for an arbitrarily scalable variant of
    McIlroy’s solution to Bentley’s challenge:
    mfind -t o /bcantrill/public/v7/usr/man | \
    mjob create -o -m "tr -cs A-Za-z '\n' | \
    tr A-Z a-z | sort | uniq -c" -r \
    "awk '{ x[\$2] += \$1 }
    END { for (w in x) { print x[w] \" \" w } }' | \
    sort -rn | sed ${1}q"
    • This description is not only terse, it is high performing:
    data is left at rest — with the “map” phase doing heavy
    reduction of the data stream
    • As such, Manta — like Unix — is not merely syntactic
    sugar; it converges compute and data in a new way
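    As with McIlroy’s script, the ${1} implies this lives in a
    shell script whose first argument is n; assuming it is saved
    as “topwords” (a hypothetical name):
    # print the 25 most frequent words across all of the
    # Seventh Edition manual pages stored in Manta
    sh topwords 25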

  16. Manta: CAP tradeoffs
    • Eventual consistency represents the wrong CAP
    tradeoffs for most; we prefer consistency over
    availability for writes (but still availability for reads)
    • Many more details:
    http://dtrace.org/blogs/dap/2013/07/03/fault-tolerance-in-manta/
    • Celebrity endorsement: (image on the original slide)

  17. Manta: Other design principles
    • Hierarchical storage is an excellent idea (ht: Multics);
    Manta implements proper directories, delimited with a
    forward slash
    • Manta implements a snapshot/link hybrid dubbed a
    snaplink; it can be used to effect versioning
    • Manta has full support for CORS headers
    • Manta uses SSH-based HTTP auth for client-side
    tooling (IETF draft-cavage-http-signatures-00)
    • Manta SDKs exist for node.js, Java, Ruby, and Python
    • “npm install manta” for the command-line interface
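    A hedged sketch of snaplink-based versioning with the mln
    tool from the node-manta CLI (paths are illustrative):
    # install the CLI, then preserve the current version of
    # an object via a snaplink before overwriting it
    npm install -g manta
    mln /bcantrill/stor/config.json /bcantrill/stor/config.json.v1
    mput -f config.json /bcantrill/stor/config.json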

  18. Manta and the future of big data
    • We believe compute/data convergence to be the future
    of big data: stores of record must support computation
    as a first-class, in situ operation
    • We believe that Unix is a natural way of expressing this
    computation — and that the OS is the right level at
    which to virtualize to support this securely
    • We believe that ZFS is the only sane storage substrate
    for such a system
    • Manta will surely not be the only system to represent
    the confluence of these — but it is the first
    • We are actively retooling our software stack in terms of
    Manta — Manta is changing the way we develop
    software!

  19. Manta: More information
    • Product page:
    http://joyent.com/products/manta
    • node.js module:
    https://github.com/joyent/node-manta
    • Manta documentation:
    http://apidocs.joyent.com/manta/
    • IRC, e-mail, Twitter, etc.:
    #manta on freenode, [email protected], @mcavage,
    @dapsays, @yunongx, @joyent
    • Here’s to the orgy of big data one-liners!
