
MongoDB + Hadoop: Taming the Elephant in the Room

June 21, 2012

10gen has released v1.0.0 of the Hadoop plugin for MongoDB. In this session, Brendan will go through how to integrate MongoDB with Hadoop for large-scale distributed data processing. Using tools like MapReduce, Pig and Streaming, you will learn how to do analytics and ETL on large datasets, with the ability to load and save data against MongoDB. With Hadoop MapReduce, Java and Scala programmers will find a native solution for using MapReduce to process their data with MongoDB. Programmers of all kinds will find a new way to work with ETL, using Pig to extract and analyze large datasets and persist the results to MongoDB. Python and Ruby programmers can rejoice as well in a new way to write native Mongo MapReduce using the Hadoop Streaming interfaces.


mongodb


Transcript

  1. Brendan McAdams 10gen, Inc. brendan@10gen.com @rit Taming The Elephant In

    The Room with MongoDB + Hadoop Integration
  2. Big Data at a Glance • Big Data can be

    gigabytes, terabytes, petabytes or exabytes • An ideal big data system scales up and down around various data sizes – while providing a uniform view • Major concerns • Can I read & write this data efficiently at different scale? • Can I run calculations on large portions of this data? Large Dataset Primary Key as “username”
  3. Storing & Scaling Big Data MongoDB and Hadoop

  4. Big Data at a Glance • Systems like Google File

    System (which inspired Hadoop’s HDFS) and MongoDB’s Sharding handle the scale problem by chunking • Break up pieces of data into smaller chunks, spread across many data nodes • Each data node contains many chunks • If a chunk gets too large or a node overloaded, data can be rebalanced Large Dataset Primary Key as “username” a b c d e f g h s t u v w x y z ...
  5. Big Data at a Glance Large Dataset Primary Key as

    “username” a b c d e f g h s t u v w x y z
  6. Big Data at a Glance Large Dataset Primary Key as

    “username” a b c d e f g h s t u v w x y z MongoDB Sharding (as well as HDFS) breaks data into chunks (~64 MB)
  7. Large Dataset Primary Key as “username” Scaling Data Node 1

    25% of chunks Data Node 2 25% of chunks Data Node 3 25% of chunks Data Node 4 25% of chunks a b c d e f g h s t u v w x y z Representing data as chunks allows many levels of scale across n data nodes
  8. Scaling Data Node 1 Data Node 2 Data Node 3

    Data Node 4 Data Node 5 a b c d e f g h s t u v w x y z The set of chunks can be evenly distributed across n data nodes
  9. Add Nodes: Chunk Rebalancing Data Node 1 Data Node 2

    Data Node 3 Data Node 4 Data Node 5 a b c d e f g h s t u v w x y z The goal is equilibrium - an equal distribution. As nodes are added (or even removed) chunks can be redistributed for balance.
  10. Writes Routed to Appropriate Chunk Data Node 1 Data Node

    2 Data Node 3 Data Node 4 Data Node 5 a b c d e f g h s t u v w x y z
  11. Writes Routed to Appropriate Chunk Data Node 1 Data Node

    2 Data Node 3 Data Node 4 Data Node 5 a b c d e f g h s t u v w x y z Write to key“ziggy” z Writes are efficiently routed to the appropriate node & chunk
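The routing shown above can be sketched in a few lines of Python. This is a toy illustration, not MongoDB's actual implementation: the chunk table, node names, and `route` helper are hypothetical stand-ins for the config metadata a mongos router consults.

```python
import bisect

# Hypothetical chunk table: sorted lower bounds of each chunk's key range,
# and the data node that owns the chunk starting at each bound.
chunk_bounds = ["a", "i", "q", "z"]
chunk_owner = {"a": "node1", "i": "node2", "q": "node3", "z": "node4"}

def route(key):
    """Return the node owning the chunk whose key range contains `key`."""
    idx = max(bisect.bisect_right(chunk_bounds, key) - 1, 0)
    return chunk_owner[chunk_bounds[idx]]

print(route("ziggy"))    # falls in the "z" chunk -> node4
print(route("brendan"))  # falls in the "a"-"i" chunk -> node1
```

Because the bounds are kept sorted, lookup is a binary search; both reads and writes can be routed the same way.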
  12. Chunk Splitting & Balancing Data Node 1 Data Node 2

    Data Node 3 Data Node 4 Data Node 5 a b c d e f g h s t u v w x y z Write to key “ziggy” z If a chunk gets too large (default in MongoDB: 64 MB per chunk), it is split into two new chunks
  13. Chunk Splitting & Balancing Data Node 1 Data Node 2

    Data Node 3 Data Node 4 Data Node 5 a b c d e f g h s t u v w x y z z If a chunk gets too large (default in MongoDB: 64 MB per chunk), it is split into two new chunks
  14. Chunk Splitting & Balancing Data Node 1 Data Node 2

    Data Node 3 Data Node 4 Data Node 5 a b c d e f g h s t u v w x y z z If a chunk gets too large (default in MongoDB: 64 MB per chunk), it is split into two new chunks
  15. Chunk Splitting & Balancing Data Node 1 Data Node 2

    Data Node 3 Data Node 4 Data Node 5 a b c d e f g h s t u v w x y z z2 If a chunk gets too large (default in MongoDB: 64 MB per chunk), it is split into two new chunks z1
  16. Chunk Splitting & Balancing Data Node 1 Data Node 2

    Data Node 3 Data Node 4 Data Node 5 a b c d e f g h s t u v w x y z z2 If a chunk gets too large (default in MongoDB: 64 MB per chunk), it is split into two new chunks z1
  17. Chunk Splitting & Balancing Data Node 1 Data Node 2

    Data Node 3 Data Node 4 Data Node 5 a b c d e f g h s t u v w x y z z2 z1 Each new part of the Z chunk (left & right) now contains half of the keys
  18. Chunk Splitting & Balancing Data Node 1 Data Node 2

    Data Node 3 Data Node 4 Data Node 5 a b c d e f g h s t u v w x y z z2 z1 As chunks continue to grow and split, they can be rebalanced to keep an equal share of data on each server.
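The split step above can be sketched minimally in Python, assuming keys are sorted within a chunk. The threshold and helper below are toy stand-ins for MongoDB's ~64 MB chunk size and its internal split logic:

```python
MAX_CHUNK = 4  # toy threshold standing in for the ~64 MB default

def split_if_needed(chunk):
    """Split an oversized chunk into two new chunks around its median key."""
    if len(chunk) <= MAX_CHUNK:
        return [chunk]
    mid = len(chunk) // 2
    return [chunk[:mid], chunk[mid:]]  # the "z1" and "z2" halves

z_chunk = sorted(["ziggy", "zoe", "zack", "zelda", "zane", "zara"])
z1, z2 = split_if_needed(z_chunk)
print(z1, z2)  # each new chunk holds half of the keys
```

Each resulting half covers a contiguous, non-overlapping key range, so either half can later be moved to another node independently during rebalancing.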
  19. Reads with Key Routed Efficiently Data Node 1 Data Node

    2 Data Node 3 Data Node 4 Data Node 5 a b c d e f g h s t u v w x y z1 Read Key “xavier” Reading a single value by Primary Key Read routed efficiently to specific chunk containing key z2
  20. Reads with Key Routed Efficiently Data Node 1 Data Node

    2 Data Node 3 Data Node 4 Data Node 5 a b c d e f g h s t u v w x y Read Key “xavier” Reading a single value by Primary Key Read routed efficiently to specific chunk containing key z1 z2
  21. Reads with Key Routed Efficiently Data Node 1 Data Node

    2 Data Node 3 Data Node 4 Data Node 5 a b c d e f g h s t u v w x y Read Keys “T”->”X” Reading multiple values by Primary Key Reads routed efficiently to specific chunks in range t u v w x z1 z2
  22. Processing Big Data MongoDB and Hadoop

  23. Processing Scalable Big Data •Just as we must be able

    to scale our storage of data (from gigabytes through exabytes and beyond), we must be able to process it. • We had two questions, one of which we’ve answered... • Can I read & write this data efficiently at different scale? • Can I run calculations on large portions of this data?
  24. Processing Scalable Big Data •Just as we must be able

    to scale our storage of data (from gigabytes through exabytes and beyond), we must be able to process it. • We had two questions, one of which we’ve answered... • Can I read & write this data efficiently at different scale? • Can I run calculations on large portions of this data?
  25. Processing Scalable Big Data • The answer to calculating big

    data is much the same as storing it • We need to break our data into bite sized pieces • Build functions which can be composed together repeatedly on partitions of our data • Process portions of the data across multiple calculation nodes • Aggregate the results into a final set of results
  26. Processing Scalable Big Data • These pieces are not chunks

    – rather, the individual data points that make up each chunk • Chunks also make useful data transfer units for processing • Transfer chunks as “Input Splits” to calculation nodes, allowing for scalable parallel processing • The most common application of these techniques is MapReduce • Based on a Google whitepaper, it works with two primary functions – map and reduce – to calculate against large datasets
  27. MapReduce to Calculate Big Data • MapReduce is designed to

    effectively process data at varying scales • Composable function units can be reused repeatedly for scaled results • MongoDB supports MapReduce with JavaScript • There are limitations on its scalability • In addition to the HDFS storage component, Hadoop is built around MapReduce for calculation • MongoDB can be integrated with Hadoop to MapReduce data • No HDFS storage needed - data moves directly between MongoDB and Hadoop’s MapReduce engine
  28. MapReduce to Calculate Big Data • MapReduce is made up of

    a series of phases, the primary of which are • Map • Shuffle • Reduce • Let’s look at a typical MapReduce job • Email records • Count # of times a particular user has received email
  29. MapReducing Email to: tyler from: brendan subject: Ruby Support to:

    brendan from: tyler subject: Re: Ruby Support to: mike from: brendan subject: Node Support to: brendan from: mike subject: Re: Node Support to: mike from: tyler subject: COBOL Support to: tyler from: mike subject: Re: COBOL Support (WTF?)
  30. Map Step to: tyler from: brendan subject: Ruby Support to:

    brendan from: tyler subject: Re: Ruby Support to: mike from: brendan subject: Node Support to: brendan from: mike subject: Re: Node Support to: mike from: tyler subject: COBOL Support to: tyler from: mike subject: Re: COBOL Support (WTF?) key: tyler value: {count: 1} key: brendan value: {count: 1} key: mike value: {count: 1} key: brendan value: {count: 1} key: mike value: {count: 1} key: tyler value: {count: 1} map function emit(k, v) map function breaks each document into a key (grouping) & value
  31. Group/Shuffle Step key: tyler value: {count: 1} key: brendan value:

    {count: 1} key: mike value: {count: 1} key: brendan value: {count: 1} key: mike value: {count: 1} key: tyler value: {count: 1} Group like keys together, creating an array of their distinct values (Automatically done by M/R frameworks)
  32. Group/Shuffle Step key: brendan values: [{count: 1}, {count: 1}] key:

    mike values: [{count: 1}, {count: 1}] key: tyler values: [{count: 1}, {count: 1}] Group like keys together, creating an array of their distinct values (Automatically done by M/R frameworks)
  33. Reduce Step key: brendan values: [{count: 1}, {count: 1}] key:

    mike values: [{count: 1}, {count: 1}] key: tyler values: [{count: 1}, {count: 1}] For each key reduce function flattens the list of values to a single result reduce function aggregate values return (result) key: tyler value: {count: 2} key: mike value: {count: 2} key: brendan value: {count: 2}
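The whole walkthrough above fits in a few lines of in-process Python. This is a toy single-machine sketch of the three phases, not a distributed job — the framework normally performs the shuffle and fans the map and reduce work out across nodes:

```python
from collections import defaultdict

emails = [
    {"to": "tyler",   "from": "brendan", "subject": "Ruby Support"},
    {"to": "brendan", "from": "tyler",   "subject": "Re: Ruby Support"},
    {"to": "mike",    "from": "brendan", "subject": "Node Support"},
    {"to": "brendan", "from": "mike",    "subject": "Re: Node Support"},
    {"to": "mike",    "from": "tyler",   "subject": "COBOL Support"},
    {"to": "tyler",   "from": "mike",    "subject": "Re: COBOL Support (WTF?)"},
]

def map_fn(doc):
    # Break each document into a key (grouping) & value: emit(k, v)
    yield doc["to"], {"count": 1}

def reduce_fn(key, values):
    # Flatten the list of values for a key down to a single result
    return {"count": sum(v["count"] for v in values)}

# Shuffle step: group like keys together (M/R frameworks do this for us)
groups = defaultdict(list)
for doc in emails:
    for k, v in map_fn(doc):
        groups[k].append(v)

results = {k: reduce_fn(k, vs) for k, vs in groups.items()}
print(results)  # each user received 2 emails
```

Because `reduce_fn` only ever sees one key's values at a time, the reduce phase parallelizes naturally: each key group can be handled by a different reducer.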
  34. Processing Scalable Big Data • MapReduce provides an effective system

    for calculating and processing our large datasets (from gigabytes through exabytes and beyond) • MapReduce is supported in many places including MongoDB & Hadoop • We have effective answers for both of our concerns. • Can I read & write this data efficiently at different scale? • Can I run calculations on large portions of this data?
  35. Processing Scalable Big Data • MapReduce provides an effective system

    for calculating and processing our large datasets (from gigabytes through exabytes and beyond) • MapReduce is supported in many places including MongoDB & Hadoop • We have effective answers for both of our concerns. • Can I read & write this data efficiently at different scale? • Can I run calculations on large portions of this data?
  36. Integrating MongoDB + Hadoop

  37. Separation of Concern • Data storage and data processing are

    often separate concerns • MongoDB has limited ability to aggregate and process large datasets (JavaScript parallelism - alleviated some with New Aggregation Framework) • Hadoop is built for scalable processing of large datasets
  38. MapReducing in MongoDB - Single Server Large Dataset (single mongod)

    Primary Key as “username” Only one MapReduce thread available
  39. MapReducing in MongoDB - Sharding

    Data Node 1 Data Node 2 Data Node 3 Data Node 4 Data Node 5 a b c d e f g h s t u v w x y z z ... One MapReduce thread per shard (no per-chunk parallelism). Architecturally, the number of processing nodes is limited to our number of data storage nodes.
  40. The Right Tool for the Job • JavaScript isn’t always

    the ideal language for many types of calculations • Slow • Limited datatypes • No access to complex analytics libraries available on the JVM • Rich, powerful ecosystem of tools on the JVM + Hadoop • Hadoop has machine learning, ETL, and many other tools which are much more flexible than the processing tools in MongoDB
  41. Being a Good Neighbor • Integration with Customers’ Existing Stacks

    & Toolchains is Crucial • Many users & customers already have Hadoop in their stacks • They want us to “play nicely” with their existing toolchains • Different groups in companies may mandate all data be processable in Hadoop
  42. Capabilities

  43. Introducing the MongoDB Hadoop Connector • Recently, we released v1.0.0

    of this integration: the MongoDB Hadoop Connector • Read/Write between MongoDB + Hadoop (Core MapReduce) in Java • Write Pig (ETL) jobs’ output to MongoDB • Write MapReduce jobs in Python via Hadoop Streaming • Collect massive amounts of logging output into MongoDB via Flume
  44. Hadoop Connector Capabilities • Split large datasets into smaller chunks

    (“Input Splits”) for parallel Hadoop processing • Without splits, only one mapper can run • Connector can split both sharded & unsharded collections • Sharded: Read individual chunks from config server into Hadoop • Unsharded: Create splits, similar to how sharding chunks are calculated
  45. MapReducing MongoDB + Hadoop - Single Server

    Large Dataset (single mongod) Primary Key as “username” a b c d e f g h s t u v w x y z Each Hadoop node runs a processing task per core.
  46. MapReducing MongoDB + Hadoop - Sharding

    Data Node 1 Data Node 2 Data Node 3 Data Node 4 Data Node 5 a b c d e f g h s t u v w x y z z Each Hadoop node runs a processing task per core.
  47. Parallel Processing of Splits • Ship “Splits” to Mappers as

    hostname, database, collection, & query • Each Mapper reads the relevant documents in • Parallel processing for high performance • Speaks BSON between all layers!
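A hedged sketch of what one such split might look like as plain data — the field names and `make_split` helper below are illustrative stand-ins, not the connector's actual wire format:

```python
def make_split(host, db, collection, lower, upper):
    """Describe one chunk as a self-contained unit of work for one mapper."""
    return {
        "host": host,                      # where the chunk's data lives
        "ns": "%s.%s" % (db, collection),  # namespace to read from
        # Range query selecting only this chunk's documents:
        "query": {"_id": {"$gte": lower, "$lt": upper}},
    }

split = make_split("shard1.example.com", "enron_mail", "messages", "a", "i")
# A mapper would open a cursor against split["host"], run split["query"]
# over split["ns"], and process the resulting BSON documents -- in
# parallel with the mappers handling every other split.
print(split["ns"], split["query"])
```

Because each split's query covers a disjoint key range, every mapper reads only its own slice of the collection and no document is processed twice.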
  48. MongoDB Hadoop Connector In Action

  49. Python Streaming •The Hadoop Streaming interface is much easier to

    demo (it’s also my favorite feature, and was the hardest to implement) • Java gets a bit ... “verbose” on slides versus Python • Java Hadoop + MongoDB integrates cleanly though for those inclined • Map functions get an initial key of type Object and value of type BSONObject • Represent _id and the full document, respectively •Processing 1.75 gigabytes of the Enron Email Corpus (501,513 emails) • I ran this test on a 6 node Hadoop cluster • Grab your own copy of this dataset at: http://goo.gl/fSleC
  50. A Sample Input Doc

    {
      "_id" : ObjectId("4f2ad4c4d1e2d3f15a000000"),
      "body" : "Here is our forecast\n\n ",
      "subFolder" : "allen-p/_sent_mail",
      "mailbox" : "maildir",
      "filename" : "1.",
      "headers" : {
        "X-cc" : "",
        "From" : "phillip.allen@enron.com",
        "Subject" : "",
        "X-Folder" : "\\Phillip_Allen_Jan2002_1\\Allen, Phillip K.\\'Sent Mail",
        "Content-Transfer-Encoding" : "7bit",
        "X-bcc" : "",
        "To" : "tim.belden@enron.com",
        "X-Origin" : "Allen-P",
        "X-FileName" : "pallen (Non-Privileged).pst",
        "X-From" : "Phillip K Allen",
        "Date" : "Mon, 14 May 2001 16:39:00 -0700 (PDT)",
        "X-To" : "Tim Belden ",
        "Message-ID" : "<18782981.1075855378110.JavaMail.evans@thyme>",
        "Content-Type" : "text/plain; charset=us-ascii",
        "Mime-Version" : "1.0"
      }
    }
  51. Setting up Hadoop Streaming

    • Install the Python support module on each Hadoop node:

        $ sudo pip install pymongo_hadoop

    • Build (or download) the Streaming module for the Hadoop adapter:

        $ git clone http://github.com/mongodb/mongo-hadoop.git
        $ ./sbt mongo-hadoop-streaming/assembly
  52. Mapper Code

    #!/usr/bin/env python
    import sys
    sys.path.append(".")
    from pymongo_hadoop import BSONMapper

    def mapper(documents):
        i = 0
        for doc in documents:
            i = i + 1
            if 'headers' in doc and 'To' in doc['headers'] and 'From' in doc['headers']:
                from_field = doc['headers']['From']
                to_field = doc['headers']['To']
                recips = [x.strip() for x in to_field.split(',')]
                for r in recips:
                    yield {'_id': {'f': from_field, 't': r}, 'count': 1}

    BSONMapper(mapper)
    print >> sys.stderr, "Done Mapping."
  53. Reducer Code

    #!/usr/bin/env python
    import sys
    sys.path.append(".")
    from pymongo_hadoop import BSONReducer

    def reducer(key, values):
        print >> sys.stderr, "Processing from/to %s" % str(key)
        _count = 0
        for v in values:
            _count += v['count']
        return {'_id': key, 'count': _count}

    BSONReducer(reducer)
  54. Running the MapReduce

    hadoop jar mongo-hadoop-streaming-assembly-1.0.0.jar \
        -mapper /home/ec2-user/enron_map.py \
        -reducer /home/ec2-user/enron_reduce.py \
        -inputURI mongodb://test_mongodb:27020/enron_mail.messages \
        -outputURI mongodb://test_mongodb:27020/enron_mail.sender_map
  55. Results!

    mongos> db.streaming.output.find({"_id.t": /^kenneth.lay/})
    { "_id" : { "t" : "kenneth.lay@enron.com", "f" : "15126-1267@m2.innovyx.com" }, "count" : 1 }
    { "_id" : { "t" : "kenneth.lay@enron.com", "f" : "2586207@www4.imakenews.com" }, "count" : 1 }
    { "_id" : { "t" : "kenneth.lay@enron.com", "f" : "40enron@enron.com" }, "count" : 2 }
    { "_id" : { "t" : "kenneth.lay@enron.com", "f" : "a..davis@enron.com" }, "count" : 2 }
    { "_id" : { "t" : "kenneth.lay@enron.com", "f" : "a..hughes@enron.com" }, "count" : 4 }
    { "_id" : { "t" : "kenneth.lay@enron.com", "f" : "a..lindholm@enron.com" }, "count" : 1 }
    { "_id" : { "t" : "kenneth.lay@enron.com", "f" : "a..schroeder@enron.com" }, "count" : 1 }
    { "_id" : { "t" : "kenneth.lay@enron.com", "f" : "a..shankman@enron.com" }, "count" : 2 }
    { "_id" : { "t" : "kenneth.lay@enron.com", "f" : "aaron.berutti@enron.com" }, "count" : 1 }
    { "_id" : { "t" : "kenneth.lay@enron.com", "f" : "adnan.patel@enron.com" }, "count" : 2 }
    { "_id" : { "t" : "kenneth.lay@enron.com", "f" : "adriana.wynn@enron.com" }, "count" : 1 }
    { "_id" : { "t" : "kenneth.lay@enron.com", "f" : "adventurehf@pdq.net" }, "count" : 3 }
    { "_id" : { "t" : "kenneth.lay@enron.com", "f" : "aeplager@yahoo.com" }, "count" : 1 }
    { "_id" : { "t" : "kenneth.lay@enron.com", "f" : "agatha.tran@enron.com" }, "count" : 3 }
    { "_id" : { "t" : "kenneth.lay@enron.com", "f" : "ahampshire-cowan@howard.edu" }, "count" : 4 }
    { "_id" : { "t" : "kenneth.lay@enron.com", "f" : "ahaws@austin.rr.com" }, "count" : 1 }
    { "_id" : { "t" : "kenneth.lay@enron.com", "f" : "alberto.gude@enron.com" }, "count" : 6 }
    { "_id" : { "t" : "kenneth.lay@enron.com", "f" : "alfredo@dvinci.net" }, "count" : 3 }
    { "_id" : { "t" : "kenneth.lay@enron.com", "f" : "amanda.day@enron.com" }, "count" : 2 }
    { "_id" : { "t" : "kenneth.lay@enron.com", "f" : "amepya2@hotmail.com" }, "count" : 1 }
    has more
  56. Parallelism is Good The Input Data was split into 44

    pieces for parallel processing... ... coincidentally, there were exactly 44 chunks on my sharded setup. Even with an unsharded collection, MongoHadoop can calculate splits!
  57. We aren’t restricted to Python •For Mongo-Hadoop 1.0, Streaming only

    shipped Python support •Currently in git master, and due to be released with 1.1 is support for two additional languages • Ruby (Tyler Brock - @tylerbrock) • Node.JS (Mike O’Brien - @mpobrien) •The same Enron MapReduce job can be accomplished with either of these languages as well
  58. Ruby + Mongo-Hadoop Streaming •As there isn’t an official release

    for Ruby support yet, you’ll need to build the gem by hand from git • Like with Python, make sure you install this gem on each of your Hadoop nodes • Once the gem is built & installed, you’ll have access to the mongo-hadoop module from Ruby
  59. Enron Map from Ruby

    #!/usr/bin/env ruby
    require 'mongo-hadoop'

    MongoHadoop.map do |document|
      if document.has_key?('headers')
        headers = document['headers']
        if ['To', 'From'].all? { |header| headers.has_key?(header) }
          to_field = headers['To']
          from_field = headers['From']
          recipients = to_field.split(',').map { |recipient| recipient.strip }
          recipients.map { |recipient| {:_id => {:f => from_field, :t => recipient}, :count => 1} }
        end
      end
    end
  60. Enron Reduce from Ruby

    #!/usr/bin/env ruby
    require 'mongo-hadoop'

    MongoHadoop.reduce do |key, values|
      count = values.reduce(0) { |sum, current| sum + current['count'] }
      { :_id => key, :count => count }
    end
  61. Running the Ruby MapReduce

    hadoop jar mongo-hadoop-streaming-assembly-1.0.0.jar \
        -mapper examples/enron/enron_map.rb \
        -reducer examples/enron/enron_reduce.rb \
        -inputURI mongodb://127.0.0.1/enron_mail.messages \
        -outputURI mongodb://127.0.0.1/enron_mail.output
  62. Node.JS + Mongo-Hadoop Streaming •As there isn’t an official release

    for Node.JS support yet, you’ll need to build the Node module by hand from git • Like with Python, make sure you install this module on each of your Hadoop nodes • Once the module is built & installed, you’ll have access to the node_mongo_hadoop module from Node.JS
  63. Enron Map from Node.JS

    #!/usr/bin/env node
    var node_mongo_hadoop = require('node_mongo_hadoop')

    var trimString = function(str){
      return String(str).replace(/^\s+|\s+$/g, '');
    }

    function mapFunc(doc, callback){
      if(doc.headers && doc.headers.From && doc.headers.To){
        var from_field = doc['headers']['From']
        var to_field = doc['headers']['To']
        to_field.split(',').forEach(function(to){
          callback( {'_id': {'f': from_field, 't': trimString(to)}, 'count': 1} )
        });
      }
    }

    node_mongo_hadoop.MapBSONStream(mapFunc);
  64. Enron Reduce from Node.JS

    #!/usr/bin/env node
    var node_mongo_hadoop = require('node_mongo_hadoop')

    function reduceFunc(key, values, callback){
      var count = 0;
      values.forEach(function(v){ count += v.count });
      callback( {'_id': key, 'count': count} );
    }

    node_mongo_hadoop.ReduceBSONStream(reduceFunc);
  65. Running the Node.JS MapReduce

    hadoop jar mongo-hadoop-streaming-assembly-1.0.0.jar \
        -mapper examples/enron/enron_map.js \
        -reducer examples/enron/enron_reduce.js \
        -inputURI mongodb://127.0.0.1/enron_mail.messages \
        -outputURI mongodb://127.0.0.1/enron_mail.output
  66. [Joining the Hive mind]

  67. Hive + MongoDB • Over the past weekend, I ended

    up with a few spare hours and started playing with a frequently requested feature • Hive is a Hadoop-based data warehousing system, providing a SQL-like language (dubbed “QL”) • Designed for large datasets stored on HDFS • Lots of SQL-like facilities such as data summarization, aggregation and analysis; all compile down to Hadoop MapReduce tasks • Custom user-defined functions can even replace inefficient Hive queries with raw MapReduce • Many users have requested support for this with MongoDB data
  68. Sticking BSON in the Hive • Step 1 involved teaching

    Hive to read MongoDB Backup files - essentially, raw BSON • While there are some APIs that we can use to talk directly to MongoDB, we haven’t explored that yet • With this code, it is possible to load a .bson file (typically produced by mongodump) directly into Hive and query it • No conversion needed to a “native” Hive format - BSON is read directly • Still needs some polish and tweaking, but this is now slated to be included in the upcoming 1.1 release
  69. Loading BSON into Hive • As Hive emulates a Relational

    Database, tables need a schema (we’re evaluating ways to ‘infer’ schemas to make this more automatic) • Let’s load some MongoDB collections into Hive and play with the data!
  70. Loading BSON into Hive

  71. Loading BSON into Hive • We have BSON Files to

    load, now we need to instruct Hive about their Schemas ...
  72. Loading BSON into Hive

  73. Defining Hive Schemas

    • We’ve given some instructions to Hive about the structure as well as storage of our MongoDB files. Let’s look at “scores” closer:

        CREATE TABLE scores (
          student int,
          name string,
          score int
        )
        ROW FORMAT SERDE "com.mongodb.hadoop.hive.BSONSerde"
        STORED AS
          INPUTFORMAT "com.mongodb.hadoop.hive.input.BSONFileInputFormat"
          OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
        LOCATION "/Users/brendan/code/mongodb/mongo-hadoop/hive/demo/meta/scores";

    • The first line defines the structure – with a column for ‘student’, ‘name’ and ‘score’, each having a SQL-like datatype.
    • ROW FORMAT SERDE instructs Hive to use a SerDe of ‘BSONSerde’.
    • In Hive, a SerDe is a special codec that explains how to read and write (serialize and deserialize) a custom data format containing Hive rows.
    • We also need to tell Hive to use an INPUTFORMAT of ‘BSONFileInputFormat’, which tells it how to read BSON files off of disk into individual rows (the SerDe is instructions for how to turn individual lines of BSON into a Hive-friendly format).
    • Finally, we specify where Hive should store the metadata, etc. with LOCATION.
  74. Loading Data to Hive

    • Finally, we need to load data into the Hive table from our raw BSON file:

        hive> LOAD DATA LOCAL INPATH "dump/training/scores.bson" INTO TABLE scores;

    • Now we can query!
  75. Querying Hive

  76. Querying Hive • Most standard SQL-like queries work - though

    I’m not going to enumerate the ins and outs of HiveQL today • What we can do with Hive that we can’t with MongoDB ... is joins • In addition to the scores data, I also created a collection of student ids and randomly generated names. Let’s look at joining these to our scores in Hive
  77. Joins from BSON + Hive

  78. Joins from BSON + Hive

    hive> SELECT u.firstName, u.lastName, u.sex, s.name, s.score FROM
        > scores s JOIN students u ON u.studentID = s.student
        > ORDER BY s.score DESC;

    DELPHIA DOUIN Female exam 99
    DOMINIQUE SUAZO Male essay 99
    ETTIE BETZIG Female exam 99
    ADOLFO PIRONE Male exam 99
    IVORY NETTERS Male essay 99
    RAFAEL HURLES Male essay 99
    KRISTEN VALLERO Female exam 99
    CONNIE KNAPPER Female quiz 99
    JEANNA DIVELY Female exam 99
    TRISTAN SEGAL Male exam 99
    WILTON TRULOVE Male essay 99
    THAO OTSMAN Female essay 99
    CLARENCE STITZ Male quiz 99
    LUIS GUAMAN Male exam 99
    WILLARD RUSSAK Male quiz 99
    MARCOS HOELLER Male quiz 99
    TED BOTTCHER Male essay 99
    LAKEISHA NAGAMINE Female essay 99
    ALLEN HITT Male exam 99
    MADELINE DAWKINS Female essay 99
  79. This is just the beginning...

  80. Looking Forward • Mongo Hadoop Connector 1.0.0 is released and

    available • Docs: http://api.mongodb.org/hadoop/ • Downloads & Code: http://github.com/mongodb/mongo-hadoop
  81. Looking Forward •Lots More Coming; 1.1.0 expected in Summer 2012

    • Support for reading from Multiple Input Collections (“MultiMongo”) • Static BSON Support... Read from and Write to Mongo Backup files! • S3 / HDFS stored, mongodump format • Great for big offline batch jobs (this is how Foursquare does it) • Pig input (Read from MongoDB into Pig) • Performance improvements (e.g. pipelining BSON for streaming) • Future: Expanded Ecosystem support (Cascading, Oozie, Mahout, etc)
  82. Looking Forward • We are looking to grow our integration

    with Big Data • Not only Hadoop, but other data processing systems our users want such as Storm, Disco and Spark. • Initial Disco support (Nokia’s Python MapReduce framework) is almost complete; look for it this summer • If you have other data processing toolchains you’d like to see integration with, let us know!
  83. http://linkd.in/joinmongo @mongodb facebook.com/mongodb Did I Mention We’re Hiring?

    http://www.10gen.com/careers (Jobs of all sorts, all over the world!) [Download the Hadoop Connector] http://github.com/mongodb/mongo-hadoop [Docs] http://api.mongodb.org/hadoop/ *Contact Me* brendan@10gen.com (twitter: @rit)