NoSQL: The New Normal for Big Data and Beyond by Luca Olivari at Big Data Spain 2013

What started as a way for web giants to solve problems of serious scale has become the default way all enterprises manage Big Data. Despite its catchy, if inaccurate, title, "NoSQL" is not really a coherent category, nor is there a simple future for the range of NoSQL databases.

Session presented at Big Data Spain 2013 Conference
7th Nov 2013
Kinépolis Madrid
http://www.bigdataspain.org
Event promoted by http://www.paradigmatecnologico.com
Abstract: http://www.bigdataspain.org/2013/conference/nosql-the-new-normal-for-big-data-and-beyond

Big Data Spain

November 14, 2013
Transcript

  1. NoSQL: The New Normal for Big Data and Beyond

    Luca Olivari @lucaolivari
  2. The Big Data Unknown

  3. Top Big Data Challenges?

    Translation? Most struggle to know what Big Data is, how to manage it and who can manage it.
    Source: Gartner
  4. Understanding Big Data – It’s Not Very “Big”

    From Big Data Executive Summary – 50+ top executives from Government and F500 firms
    • 64% - Ingest diverse, new data in real-time
    • 15% - More than 100TB of data
    • 20% - Less than 100TB (average of all? <20TB)
  5. Innovation As Iteration

  6. “I have not failed. I've just found 10,000 ways that won't work.” ― Thomas A. Edison
  7. Still Using These Things?

  8. What about Relational Databases?

  9. RDBMS Makes Development Hard

    Relational Database, Object Relational Mapping, Application Code, XML Config, DB Schema
  10. And Even Harder To Iterate

    New Table, New Table, New Column, Name, Pet, Phone, Email, New Column
    3 months later…
  11. RDBMS Is Expensive To Scale

    “Clients can also opt to run zEC12 without a raised datacenter floor -- a first for high-end IBM mainframes.”
    IBM Press Release, 28 Aug 2012
  12. Big Data != Big Upfront Payment

  13. RDBMS From Complexity to Simplicity MongoDB

    {
      _id : ObjectId("4c4ba5e5e8aabf3"),
      employee_name : "Dunham, Justin",
      department : "Marketing",
      title : "Product Manager, Web",
      report_up : "Neray, Graham",
      pay_band : "C",
      benefits : [
        { type : "Health", plan : "PPO Plus" },
        { type : "Dental", plan : "Standard" }
      ]
    }
  14. Spoiled for choice

    Source: http://db-engines.com/en/blog_post/22

    Rank  DBMS                  Type      Score    Changes
    1.    Oracle                RDBMS     1617.19  +33.35
    2.    MySQL                 RDBMS     1254.27  -77.07
    3.    Microsoft SQL Server  RDBMS     1234.46  +27.45
    4.    PostgreSQL            RDBMS     190.83   +13.82
    5.    DB2                   RDBMS     165.90   -9.93
    6.    MongoDB               Document  161.87   +12.40
    7.    Microsoft Access      RDBMS     41.6     -0.89
    8.    SQLite                RDBMS     78.78    +0.90
    9.    Sybase                RDBMS     77.75    +4.09
    10.   Teradata              RDBMS     60.12    +5.70

    DB-Engines Ranking now covers more than 200 database management systems
  15. Remember the Long Tail?

  16. It Didn’t Work Out So Well

  17. Use Popular, Well-Known Technologies

    Source: Silicon Angle, 2012

  18. Ask the Right Questions…

    “Organizations already have people who know their own data better than mystical data scientists… Learning Hadoop [or MongoDB] is easier than learning the company’s business.” (Gartner, 2012)
  19. Leverage Existing Skills

  20. Search as a Sign?

  21. When To Use Hadoop, NoSQL

  22. Enterprise Big Data Stack

    • Applications: CRM, ERP, Collaboration, Mobile, BI
    • Data Management: Online Data, Offline Data (RDBMS, RDBMS, EDW, Hadoop)
    • Infrastructure: OS & Virtualization, Compute, Storage, Network
    • Management & Monitoring
    • Security & Auditing
  23. Consideration – Online vs. Offline

    Online
    • Real-time
    • Low-latency
    • High availability

    Offline
    • Long-running
    • High-latency
    • Availability is lower priority

  24. Consideration – Online vs. Offline

  25. Hadoop Is Good for…

    • Risk Modeling
    • Churn Analysis
    • Recommendation Engine
    • Ad Targeting
    • Transaction Analysis
    • Trade Surveillance
    • Network Failure Prediction
    • Search Quality
    • Data Lake
  26. MongoDB/NoSQL Is Good for…

    • 360° View of the Customer
    • Mobile & Social Apps
    • Fraud Detection
    • User Data Management
    • Content Management & Delivery
    • Reference Data
    • Product Catalogs
    • Machine to Machine Apps
    • Data Hub
  27. How To Use The Two Together?

  28. Finding Waldo http://whereswaldo.com/

  29. Customer example: Online Travel

    Travel
    • Flights, hotels and cars
    • Real-time offers
    • User profiles, reviews
    • User metadata (previous purchases, clicks, views)
    • User segmentation
    • Offer recommendation engine
    • Ad serving engine
    • Bundling engine

    Algorithms
    MongoDB Connector for Hadoop
  30. Predictive Analytics

    Government
    • Predictive analytics system for crime, health issues
    • Diverse, unstructured (incl. geospatial) data from 30+ agencies
    • Correlate data in real-time
    • Long-form trend analysis
    • MongoDB data dumped into Hadoop, analyzed, re-inserted into MongoDB for better real-time response

    Algorithms
    MongoDB + Hadoop
  31. Data Hub

    Insurance
    • Insurance policies
    • Demographic data
    • Customer web data
    • Call center data
    • Real-time churn detection
    • Customer action analysis
    • Churn prediction algorithms

    Churn Analysis
    MongoDB Connector for Hadoop
  32. Machine Learning

    Ad-Serving
    • Catalogs and products
    • User profiles
    • Clicks
    • Views
    • Transactions
    • User segmentation
    • Recommendation engine
    • Prediction engine

    Algorithms
    MongoDB Connector for Hadoop
  33. MongoDB + Hadoop Connector

    • Makes MongoDB a Hadoop-enabled file system
    • Read and write to live data, in-place
    • Copy data between Hadoop and MongoDB
    • Full support for data processing
      – Hive
      – MapReduce
      – Pig
      – Streaming
      – EMR

    MongoDB Connector for Hadoop
  34. None
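
Slide 13 shows an employee record as a single MongoDB document, while slides 9 and 10 describe the tables and columns a relational schema needs for the same kind of data. The contrast can be sketched in plain Python, assuming ordinary dicts stand in for documents and no database is involved. The helper names `to_relational_rows` and `plans_by_type`, the two-table layout, and the phone number are illustrative assumptions, not part of the talk.

```python
# Employee document from slide 13, written as a plain Python dict.
# Assumption: _id is kept as a string here; in MongoDB it would be an ObjectId.
employee = {
    "_id": "4c4ba5e5e8aabf3",
    "employee_name": "Dunham, Justin",
    "department": "Marketing",
    "title": "Product Manager, Web",
    "report_up": "Neray, Graham",
    "pay_band": "C",
    "benefits": [
        {"type": "Health", "plan": "PPO Plus"},
        {"type": "Dental", "plan": "Standard"},
    ],
}


def to_relational_rows(doc):
    """Split one document into rows for two hypothetical tables.

    Returns (employee_row, benefit_rows): the flat fields go in one row,
    and each embedded benefit becomes a row keyed back to the employee.
    """
    employee_row = {k: v for k, v in doc.items() if k != "benefits"}
    benefit_rows = [
        {"employee_id": doc["_id"], **benefit}
        for benefit in doc.get("benefits", [])
    ]
    return employee_row, benefit_rows


def plans_by_type(doc):
    """Read the embedded benefits directly from the document, no join needed."""
    return {b["type"]: b["plan"] for b in doc.get("benefits", [])}


employee_row, benefit_rows = to_relational_rows(employee)

# Slide 10's "new column" becomes a new key on one document.
# The phone number is made up for illustration; the original dict is unchanged.
employee_v2 = dict(employee, phone="555-0100")
```

In the document form, the benefits travel with the employee record, so reading them is a single lookup. The relational form needs a second set of rows and a key to join on, and a new attribute such as `phone` would mean a schema change rather than one extra key.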