allows for the distributed processing of large data sets across clusters of computers using simple programming models.
• Terabyte and petabyte datasets
• Data warehousing
• Advanced analytics
Catalogs • Security & Fraud • Internet of Things • Mobile Apps • Customer Data Mgmt • Single View • Social • Churn Analysis • Recommender • Warehouse & ETL • Risk Modeling • Trade Surveillance • Predictive Analytics • Ad Targeting • Sentiment Analysis
file system that stores data on commodity machines in a Hadoop cluster
• YARN – Resource management platform responsible for managing and scheduling compute resources in a Hadoop cluster
union, distinct, sample, intersection, and more
– foreach, count, collect, take, and many more

Spark is a fast and powerful engine for processing Hadoop data. It is designed to perform both general data processing (similar to MapReduce) and new workloads like streaming, interactive queries, and machine learning.
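To make a few of the operations named above concrete, here is a minimal plain-Python sketch of what union, distinct, intersection, and take compute. It is not Spark code and runs on ordinary lists; the helper functions and the sample data are hypothetical stand-ins, and the only behavior borrowed from Spark is that union keeps duplicates while intersection drops them.

```python
# Plain-Python stand-ins for a few RDD operations; this is NOT Spark code.
# In PySpark the counterparts are rdd.union(other), rdd.distinct(),
# rdd.intersection(other), rdd.take(n), rdd.count() and rdd.collect().

def union(a, b):
    """Concatenate two datasets; as with RDD.union, duplicates are kept."""
    return list(a) + list(b)

def distinct(a):
    """Drop duplicates. First-seen order is kept here; Spark promises no order."""
    return list(dict.fromkeys(a))

def intersection(a, b):
    """Elements present in both datasets, without duplicates."""
    in_b = set(b)
    return [x for x in distinct(a) if x in in_b]

def take(a, n):
    """Return the first n elements as a list."""
    return list(a)[:n]

# Hypothetical sample data.
ratings_a = [1, 2, 2, 3]
ratings_b = [3, 4]

combined = union(ratings_a, ratings_b)       # duplicates survive
unique = distinct(combined)                  # duplicates removed
shared = intersection(ratings_a, ratings_b)  # common elements only

print(len(combined), take(unique, 2), shared)
```

On a real cluster the same calls would run over partitioned data, so any ordering shown here should not be relied on.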
Output formats
• Computes splits to read data
• Support for
  – Filtering data with MongoDB queries
  – Authentication
  – Reading directly from shard primaries
  – Read preferences and replica set tags
  – Appending to existing collections
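As a rough illustration of the query filtering mentioned above, the sketch below assembles the kind of configuration dictionary a job might hand to the connector. The helper `mongo_hadoop_conf`, the output collection name, and the example query are hypothetical; the property names `mongo.input.uri`, `mongo.output.uri`, and `mongo.input.query` are the ones I understand the connector to read, so check them against the connector documentation for your version.

```python
import json

def mongo_hadoop_conf(input_uri, output_uri, query=None):
    """Build a connector configuration dict (hypothetical helper).

    The query, if given, is serialized to JSON so that only matching
    documents are read from the input collection.
    """
    conf = {
        "mongo.input.uri": input_uri,
        "mongo.output.uri": output_uri,
    }
    if query is not None:
        conf["mongo.input.query"] = json.dumps(query)
    return conf

conf = mongo_hadoop_conf(
    "mongodb://host:27017/test.users",
    "mongodb://host:27017/test.users_out",  # hypothetical output collection
    query={"age": {"$gte": 21}},            # hypothetical filter
)

for key in sorted(conf):
    print(key, "=", conf[key])
```

In a PySpark job, a dictionary like this would typically be passed as the `conf` argument of `SparkContext.newAPIHadoopRDD`, together with the connector's input format class.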
age int)
STORED BY "com.mongodb.hadoop.hive.MongoStorageHandler"
WITH SERDEPROPERTIES("mongo.columns.mapping"="_id,name,age")
TBLPROPERTIES("mongo.uri" = "mongodb://host:27017/test.users")
• Access collections as Hive tables
• Use with MongoStorageHandler or BSONStorageHandler
into Spark app
• Train model from existing ratings
• Create user-movie pairings
• Predict ratings for all pairings
• Write predictions to MongoDB collection
• Web application exposes recommendations
• Repeat the process weekly
MovieWeb Workflow
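The middle steps of this workflow (train, pair, predict) can be sketched in plain Python. This is only an illustration under stated assumptions: the sample ratings, the field names `userid`, `movieid`, and `rating`, and all function names are hypothetical, and the "model" is a deliberately simple per-movie mean rather than whatever model the Spark job actually trains. Reading from and writing to MongoDB are left out.

```python
from collections import defaultdict
from itertools import product

# Hypothetical sample data: (user_id, movie_id, rating) triples.
ratings = [
    ("u1", "m1", 4.0),
    ("u1", "m2", 2.0),
    ("u2", "m1", 5.0),
    ("u3", "m3", 3.0),
]

def train_model(ratings):
    """Stand-in model: mean rating per movie (an assumption, not the real model)."""
    totals = defaultdict(lambda: [0.0, 0])  # movie -> [sum, count]
    for _user, movie, rating in ratings:
        totals[movie][0] += rating
        totals[movie][1] += 1
    return {movie: total / count for movie, (total, count) in totals.items()}

def user_movie_pairings(ratings):
    """Every (user, movie) combination, rated or not."""
    users = sorted({user for user, _movie, _rating in ratings})
    movies = sorted({movie for _user, movie, _rating in ratings})
    return list(product(users, movies))

def predict_all(model, pairings):
    """One document per pairing, shaped for insertion into a collection."""
    return [
        {"userid": user, "movieid": movie, "rating": model[movie]}
        for user, movie in pairings
    ]

model = train_model(ratings)
pairings = user_movie_pairings(ratings)
predictions = predict_all(model, pairings)

print(len(pairings), predictions[0])
```

Because every user is paired with every movie, the number of predictions grows as users times movies, which is one reason to run this step as a distributed batch job on a schedule.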
Started with MongoDB and Hadoop – http://docs.mongodb.org/ecosystem/tutorial/getting-started-with-hadoop/
• MongoDB-Spark Demo – http://github.com/crcsmnky/mongodb-spark-demo