Slide 1

Slide 1 text

NoSQL: The New Normal for Big Data and Beyond
LUCA OLIVARI
@lucaolivari

Slide 2

Slide 2 text

The Big Data Unknown

Slide 3

Slide 3 text

Top Big Data Challenges?
Translation? Most struggle to know what Big Data is, how to manage it, and who can manage it.
Source: Gartner

Slide 4

Slide 4 text

Understanding Big Data – It’s Not Very “Big”
From Big Data Executive Summary – 50+ top executives from Government and F500 firms
•  64% - Ingest diverse, new data in real-time
•  15% - More than 100TB of data
•  20% - Less than 100TB (average of all? <20TB)

Slide 5

Slide 5 text

Innovation As Iteration

Slide 6

Slide 6 text

“I have not failed. I've just found 10,000 ways that won't work.” ― Thomas A. Edison

Slide 7

Slide 7 text

Still Using These Things?

Slide 8

Slide 8 text

What about Relational Databases?

Slide 9

Slide 9 text

RDBMS Makes Development Hard
Diagram labels: Relational Database, Object Relational Mapping, Application Code, XML Config, DB Schema

Slide 10

Slide 10 text

And Even Harder To Iterate
Diagram labels: New Table, New Table, New Column, Name, Pet, Phone, Email, New Column
3 months later…

Slide 11

Slide 11 text

RDBMS Is Expensive To Scale
“Clients can also opt to run zEC12 without a raised datacenter floor -- a first for high-end IBM mainframes.”
IBM Press Release, 28 Aug 2012

Slide 12

Slide 12 text

Big Data != Big Upfront Payment

Slide 13

Slide 13 text

From Complexity to Simplicity
RDBMS
MongoDB
{
  _id : ObjectId("4c4ba5e5e8aabf3"),
  employee_name : "Dunham, Justin",
  department : "Marketing",
  title : "Product Manager, Web",
  report_up : "Neray, Graham",
  pay_band : "C",
  benefits : [
    { type : "Health", plan : "PPO Plus" },
    { type : "Dental", plan : "Standard" }
  ]
}
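A minimal sketch in Python of what the document model on this slide buys: the employee record is one nested structure, so reading the benefits needs no join, and a later iteration can add a field to a record without a schema change. It uses a plain dictionary rather than a MongoDB driver; the helper `plans_by_type` and the `languages` field are hypothetical and not part of the original slide.

```python
# The slide's employee document as a plain Python dict (no driver, no server).
employee = {
    "_id": "4c4ba5e5e8aabf3",  # the slide wraps this value in ObjectId(...)
    "employee_name": "Dunham, Justin",
    "department": "Marketing",
    "title": "Product Manager, Web",
    "report_up": "Neray, Graham",
    "pay_band": "C",
    "benefits": [
        {"type": "Health", "plan": "PPO Plus"},
        {"type": "Dental", "plan": "Standard"},
    ],
}


def plans_by_type(doc):
    """Map each benefit type to its plan, read straight from the embedded list."""
    return {benefit["type"]: benefit["plan"] for benefit in doc.get("benefits", [])}


# A later iteration adds a field to this one record; no table or column is altered.
employee["languages"] = ["English", "Italian"]  # hypothetical field for illustration

print(plans_by_type(employee))
print(employee.get("languages", []))
```

Readers that do not know about the new field keep working, because they only look up the keys they need.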

Slide 14

Slide 14 text

Spoiled for choice
Source: http://db-engines.com/en/blog_post/22
Rank  DBMS                  Type      Score    Changes
1.    Oracle                RDBMS     1617.19  +33.35
2.    MySQL                 RDBMS     1254.27  -77.07
3.    Microsoft SQL Server  RDBMS     1234.46  +27.45
4.    PostgreSQL            RDBMS     190.83   +13.82
5.    DB2                   RDBMS     165.90   -9.93
6.    MongoDB               Document  161.87   +12.40
7.    Microsoft Access      RDBMS     41.6     -0.89
8.    SQLite                RDBMS     78.78    +0.90
9.    Sybase                RDBMS     77.75    +4.09
10.   Teradata              RDBMS     60.12    +5.70
DB-Engines Ranking now covers more than 200 database management systems

Slide 15

Slide 15 text

Remember the Long Tail?

Slide 16

Slide 16 text

It Didn’t Work Out So Well

Slide 17

Slide 17 text

Use Popular, Well-Known Technologies
Source: Silicon Angle, 2012

Slide 18

Slide 18 text

Ask the Right Questions…
“Organizations already have people who know their own data better than mystical data scientists… Learning Hadoop [or MongoDB] is easier than learning the company’s business.” (Gartner, 2012)

Slide 19

Slide 19 text

Leverage Existing Skills

Slide 20

Slide 20 text

Search as a Sign?

Slide 21

Slide 21 text

When To Use Hadoop, NoSQL

Slide 22

Slide 22 text

Enterprise Big Data Stack
Applications: CRM, ERP, Collaboration, Mobile, BI
Data Management (Online Data, Offline Data): RDBMS, RDBMS, EDW, Hadoop
Infrastructure: OS & Virtualization, Compute, Storage, Network
Management & Monitoring
Security & Auditing

Slide 23

Slide 23 text

Consideration – Online vs. Offline
Online
•  Real-time
•  Low-latency
•  High availability
Offline
•  Long-running
•  High-latency
•  Availability is lower priority

Slide 24

Slide 24 text

Consideration – Online vs. Offline

Slide 25

Slide 25 text

Hadoop Is Good for…
•  Risk Modeling
•  Churn Analysis
•  Recommendation Engine
•  Ad Targeting
•  Transaction Analysis
•  Trade Surveillance
•  Network Failure Prediction
•  Search Quality
•  Data Lake

Slide 26

Slide 26 text

MongoDB/NoSQL Is Good for…
•  360° View of the Customer
•  Mobile & Social Apps
•  Fraud Detection
•  User Data Management
•  Content Management & Delivery
•  Reference Data
•  Product Catalogs
•  Machine to Machine Apps
•  Data Hub

Slide 27

Slide 27 text

How To Use The Two Together?

Slide 28

Slide 28 text

Finding Waldo
http://whereswaldo.com/

Slide 29

Slide 29 text

Customer example: Online Travel
Travel
•  Flights, hotels and cars
•  Real-time offers
•  User profiles, reviews
•  User metadata (previous purchases, clicks, views)
•  User segmentation
•  Offer recommendation engine
•  Ad serving engine
•  Bundling engine
Algorithms
MongoDB Connector for Hadoop

Slide 30

Slide 30 text

Predictive Analytics
Government
•  Predictive analytics system for crime, health issues
•  Diverse, unstructured (incl. geospatial) data from 30+ agencies
•  Correlate data in real-time
•  Long-form trend analysis
•  MongoDB data dumped into Hadoop, analyzed, re-inserted into MongoDB for better real-time response
Algorithms
MongoDB + Hadoop
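The dump, analyze, re-insert loop described on this slide can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the customer's system: the online store is an in-memory dict standing in for MongoDB, the batch step is a simple count standing in for a Hadoop job, and every name (`online_store`, `export_events`, `analyze`, `reinsert`, the sample events) is hypothetical.

```python
from collections import Counter

# Stand-in for the online store: collection name -> list of documents.
online_store = {
    "events": [
        {"agency": "police", "district": "north", "kind": "burglary"},
        {"agency": "health", "district": "north", "kind": "flu_report"},
        {"agency": "police", "district": "south", "kind": "burglary"},
    ],
    "district_summaries": [],
}


def export_events(store):
    """Step 1: dump a snapshot of the raw events for offline processing."""
    return [dict(doc) for doc in store["events"]]


def analyze(events):
    """Step 2: offline batch analysis, here just an event count per district."""
    counts = Counter(event["district"] for event in events)
    return [{"_id": district, "event_count": n} for district, n in sorted(counts.items())]


def reinsert(store, summaries):
    """Step 3: write the results back so online reads become a cheap lookup."""
    store["district_summaries"] = summaries


reinsert(online_store, analyze(export_events(online_store)))
lookup = {s["_id"]: s["event_count"] for s in online_store["district_summaries"]}
print(lookup)
```

The point of the pattern is that the expensive pass over all events runs offline, while the online side only serves the small precomputed summaries.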

Slide 31

Slide 31 text

Data Hub
Insurance
•  Insurance policies
•  Demographic data
•  Customer web data
•  Call center data
•  Real-time churn detection
•  Customer action analysis
•  Churn prediction algorithms
Churn Analysis
MongoDB Connector for Hadoop

Slide 32

Slide 32 text

Machine Learning
Ad-Serving
•  Catalogs and products
•  User profiles
•  Clicks
•  Views
•  Transactions
•  User segmentation
•  Recommendation engine
•  Prediction engine
Algorithms
MongoDB Connector for Hadoop

Slide 33

Slide 33 text

MongoDB + Hadoop Connector
•  Makes MongoDB a Hadoop-enabled file system
•  Read and write to live data, in-place
•  Copy data between Hadoop and MongoDB
•  Full support for data processing
   –  Hive
   –  MapReduce
   –  Pig
   –  Streaming
   –  EMR
MongoDB Connector for Hadoop
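To make the MapReduce item above concrete, here is a small pure-Python sketch of the map and reduce phases applied to document-shaped records. It does not use the connector or any Hadoop API; the sample documents and the names `map_phase` and `reduce_phase` are hypothetical, chosen only to show the shape of the computation a real job would distribute across a cluster.

```python
from collections import defaultdict

# Hypothetical click-stream documents, shaped like records read from a collection.
docs = [
    {"user": "u1", "action": "click"},
    {"user": "u2", "action": "view"},
    {"user": "u1", "action": "click"},
    {"user": "u2", "action": "click"},
]


def map_phase(doc):
    """Emit one (user, 1) pair per click document and nothing for other actions."""
    if doc["action"] == "click":
        yield doc["user"], 1


def reduce_phase(pairs):
    """Sum the emitted values per key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


pairs = [pair for doc in docs for pair in map_phase(doc)]
clicks_per_user = reduce_phase(pairs)
print(clicks_per_user)
```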

Slide 34

Slide 34 text

No content