Bringing the Unix Philosophy to Big Data

Bryan Cantrill (@bcantrill)
SVP, Engineering
[email protected]

Unix
• When Unix appeared in the early 1970s, it was not just a new system, but a new way of thinking about systems
• Instead of a sealed monolith, the operating system was a collection of small, easily understood programs
• First Edition Unix (1971) contained many programs that we still use today (ls, rm, cat, mv)
• Its very name conveyed this minimalist aesthetic: Unix is a homophone of “eunuchs”, that is, a castrated Multics
“We were a bit oppressed by the big system mentality. Ken wanted to do something simple.” (Dennis Ritchie)

Unix: Let there be light
• In 1969, Doug McIlroy had the idea of connecting different components:
  “At the same time that Thompson and Ritchie were sketching out a file system, I was sketching out how to do data processing on the blackboard by connecting together cascades of processes”
• This was the primordial pipe, but it took three years to persuade Thompson to adopt it:
  “And one day I came up with a syntax for the shell that went along with the piping, and Ken said, ‘I’m going to do it!’”

Unix: ...and there was light
“And the next morning we had this orgy of one-liners.” (Doug McIlroy)

The Unix philosophy
• The pipe, coupled with the small-system aesthetic, gave rise to the Unix philosophy, as articulated by Doug McIlroy:
  • Write programs that do one thing and do it well
  • Write programs to work together
  • Write programs that handle text streams, because that is a universal interface
• Four decades later, this philosophy remains the single most important revolution in software systems thinking!

Doug McIlroy v. Don Knuth: FIGHT!
• In 1986, Jon Bentley posed the challenge that became the Epic Rap Battle of computer science history:
  “Read a file of text, determine the n most frequently used words, and print out a sorted list of those words along with their frequencies.”
• Don Knuth’s solution: an elaborate program in WEB, a Pascal-like literate programming system of his own invention, using a purpose-built algorithm
• Doug McIlroy’s solution shows the power of the Unix philosophy:

    tr -cs A-Za-z '\n' | tr A-Z a-z | \
    sort | uniq -c | sort -rn | sed ${1}q
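To make the pipeline above easy to try, here is a minimal sketch that wraps it in a shell function and annotates each stage. The function name `top_words` and the sample sentence are illustrative assumptions, not part of McIlroy's original; the stages themselves are unchanged.

```shell
#!/bin/sh
# top_words N: read text on stdin, print the N most frequent words.
# The wrapper and its name are illustrative; the pipeline is McIlroy's.
top_words() {
    tr -cs A-Za-z '\n' |   # turn every run of non-letters into one newline
    tr A-Z a-z |           # fold everything to lower case
    sort |                 # bring identical words together
    uniq -c |              # collapse each run into "count word"
    sort -rn |             # order by count, highest first
    sed "${1}q"            # quit after N lines
}

# Example: the two most frequent words in a short sample sentence.
printf 'the cat and the dog and the bird\n' | top_words 2
```

Each stage does one thing and hands a plain text stream to the next, which is why the whole program fits in six commands. The exact column padding of the counts comes from `uniq -c` and may vary between implementations.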

Big Data: History repeats itself?
• The original Google MapReduce paper (Dean and Ghemawat, OSDI ’04) poses a problem disturbingly similar to Bentley’s challenge nearly two decades prior:
  “Count of URL Access Frequency: The map function processes logs of web page requests and outputs ⟨URL, 1⟩. The reduce function adds together all values for the same URL and emits a ⟨URL, total count⟩ pair”
• But the solutions do not adhere to the Unix philosophy...
• ...nor do they make use of the substantial Unix foundation for data processing
• e.g., Appendix A of the OSDI ’04 paper has a 71-line word count in C++, with nary a wc in sight
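For comparison, the URL access frequency problem can be sketched with the standard Unix toolset on a single machine. This sketch assumes the logs are in Common Log Format, where the request path is the 7th whitespace-separated field; the function name `url_counts` and the sample log lines are made up for illustration.

```shell
#!/bin/sh
# url_counts: read access-log lines on stdin, print "count URL" pairs,
# most requested first. Assumes Common Log Format (request path in
# field 7); adjust the field number for other log formats.
url_counts() {
    awk '{ print $7 }' |   # "map": emit one URL per request
    sort |                 # group identical URLs together
    uniq -c |              # "reduce": count each group
    sort -rn               # most requested first
}

# Three made-up requests: two for /index.html, one for /about.html.
printf '%s\n' \
    '10.0.0.1 - - [03/Jul/2013:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 512' \
    '10.0.0.2 - - [03/Jul/2013:10:00:01 +0000] "GET /about.html HTTP/1.1" 200 128' \
    '10.0.0.1 - - [03/Jul/2013:10:00:02 +0000] "GET /index.html HTTP/1.1" 200 512' |
    url_counts
```

The `awk` stage plays the role of the paper's map function and `sort | uniq -c` the role of its reduce function; what this local sketch lacks is the ability to spread the work across many machines.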

Big Data: Challenges
• Must be able to scale storage to allow for “big data”: quantities of data that dwarf a single machine
• Must allow for massively parallel execution
• Must allow for multi-tenancy
• To make use of both the Unix philosophy and its toolset, must be able to virtualize the operating system

Scaling storage
• There are essentially three protocols for scalable storage: block, file and object
• Block (i.e., a SAN) is far too low an abstraction, and notoriously expensive to scale
• File (i.e., NAS) is too permissive an abstraction: it implies a coherent store for arbitrary (partial) writes, trying (and failing) to be both C and A in CAP
• Object (e.g., S3) is similar “enough” to a file-based abstraction, but by not allowing partial writes, allows for proper CAP tradeoffs

Object storage
• Object storage systems do not allow for partial updates
• For both durability and availability, objects are generally erasure encoded across spindles on different nodes
• A different approach is to have a highly reliable local file system that erasure encodes across local spindles, with entire objects duplicated across nodes for availability
• ZFS pioneered both reliability and efficiency of this model with RAID-Z, and has refined it over the past decade of production use
• ZFS is one of the four foundational technologies in Joyent’s open source SmartOS

Virtualizing the operating system?
• Historically (since the 1960s) systems have been virtualized at the level of hardware
• Hardware virtualization has its advantages, but it’s heavyweight: operating systems are not designed to share resources like DRAM, CPU, I/O devices, etc.
• One can instead virtualize at the level of the operating system: a single OS kernel that creates lightweight containers, on the metal but securely partitioned
• Pioneered by BSD’s jails; taken to a logical extreme by zones found in Joyent’s SmartOS

Idea: ZFS + Zones?
• Can we combine the efficiency and reliability of ZFS with the abstraction provided by zones to develop an object store that has compute as a first-class citizen?
• ZFS rollback allows for zones to be trashed: simply roll back the zone after compute completes on an object
• Add a job scheduling system that allows for both map and reduce phases of distributed work
• Would allow for the Unix toolset to be used on arbitrarily large amounts of data, unlocking big data one-liners
• If it perhaps seems obvious now, it wasn’t at the time...

Manta: ZFS + Zones!
• Building a sophisticated distributed system on top of ZFS and zones, we have built Manta, an internet-facing object storage system offering in situ compute
• That is, the description of compute can be brought to where objects reside instead of having to backhaul objects to transient compute
• The abstractions made available for computation are anything that can run on the OS...
• ...and as a reminder, the OS (Unix) was built around the notion of ad hoc unstructured data processing, and allows for remarkably terse expressions of computation

Manta: Unix for Big Data
• Manta allows for an arbitrarily scalable variant of McIlroy’s solution to Bentley’s challenge:

    mfind -t o /bcantrill/public/v7/usr/man | \
        mjob create -o -m "tr -cs A-Za-z '\n' | \
        tr A-Z a-z | sort | uniq -c" -r \
        "awk '{ x[\$2] += \$1 } END { for (w in x) { print x[w] \" \" w } }' | \
        sort -rn | sed ${1}q"

• This description is not only terse, it is high performing: data is left at rest, with the “map” phase doing heavy reduction of the data stream
• As such, Manta, like Unix, is not merely syntactic sugar; it converges compute and data in a new way
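The split between the map and reduce phases in the job above can be tried on a single machine. The sketch below is a local simulation only and invokes no Manta tooling; the function names `map_phase` and `reduce_phase`, the scratch directory and the two sample files are assumptions made for illustration.

```shell
#!/bin/sh
# Local simulation of the two phases of the word-count job.
# No Manta commands are run; files in a scratch directory stand in
# for objects. All names here are illustrative.

# Map: runs once per object and emits "count word" for that object alone.
map_phase() {
    tr -cs A-Za-z '\n' | tr A-Z a-z | sort | uniq -c
}

# Reduce: receives the concatenated map outputs, sums the partial
# counts for each word, then ranks them and keeps the top N.
reduce_phase() {
    awk '{ x[$2] += $1 } END { for (w in x) { print x[w] " " w } }' |
    sort -rn | sed "${1}q"
}

# Two stand-in "objects".
dir=$(mktemp -d)
printf 'the pipe and the shell\n' > "$dir/a.txt"
printf 'the shell\n' > "$dir/b.txt"

# Map each object independently, then feed every result to one reducer.
for f in "$dir"/*.txt; do
    map_phase < "$f"
done | reduce_phase 2

rm -rf "$dir"
```

The `awk` step is needed because each map invocation only sees one object, so the same word can arrive at the reducer several times with partial counts. Since each mapper emits one line per distinct word rather than one per occurrence, the stream reaching the reducer is already much smaller than the input.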

Manta: CAP tradeoffs
• Eventual consistency represents the wrong CAP tradeoffs for most; we prefer consistency over availability for writes (but still availability for reads)
• Many more details: http://dtrace.org/blogs/dap/2013/07/03/fault-tolerance-in-manta/
• Celebrity endorsement: (image not included)

Manta: Other design principles
• Hierarchical storage is an excellent idea (ht: Multics); Manta implements proper directories, delimited with a forward slash
• Manta implements a snapshot/link hybrid dubbed a snaplink; can be used to effect versioning
• Manta has full support for CORS headers
• Manta uses SSH-based HTTP auth for client-side tooling (IETF draft-cavage-http-signatures-00)
• Manta SDKs exist for node.js, Java, Ruby, Python
• “npm install manta” for command line interface

Manta and the future of big data
• We believe compute/data convergence to be the future of big data: stores of record must support computation as a first-class, in situ operation
• We believe that Unix is a natural way of expressing this computation, and that the OS is the right level at which to virtualize to support this securely
• We believe that ZFS is the only sane storage substrate for such a system
• Manta will surely not be the only system to represent the confluence of these, but it is the first
• We are actively retooling our software stack in terms of Manta; Manta is changing the way we develop software!

Manta: More information
• Product page: http://joyent.com/products/manta
• node.js module: https://github.com/joyent/node-manta
• Manta documentation: http://apidocs.joyent.com/manta/
• IRC, e-mail, Twitter, etc.: #manta on freenode, [email protected], @mcavage, @dapsays, @yunongx, @joyent
• Here’s to the orgy of big data one-liners!
