
Introduction to NoSQL with MongoDB - SQLi


Generic presentation done with SQLi Lyon to introduce NoSQL.

Tugdual Grall

March 05, 2015


Transcript

  1. Tugdual “Tug” Grall {“about” : “me”}

    • MongoDB – Technical Evangelist
    • Couchbase – Technical Evangelist
    • eXo – CTO
    • Oracle – Developer/Product Manager – Mainly Java/SOA
    • Developer in consulting firms
    • Web – @tgrall – http://blog.grallandco.com – tgrall
    • NantesJUG cofounder
    • Pet Project – http://www.resultri.com
    • [email protected] • [email protected]
  2. Living in the Post-transactional Future

    • Order-processing systems largely “done” (RDBMS); new focus on better search and recommendations or adapting prices on the fly (NoSQL)
    • Vast majority of its engineering is focused on recommending better movies (NoSQL), not processing monthly bills (RDBMS)
    • Easy part is processing the credit card (RDBMS). Hard part is making it location aware, so it knows where you are and what you’re buying (NoSQL)
  3. Stay up!

    [Diagram]
    • Application tier: scale out, add more “Web” servers
    • RDBMS tier: scale up, get a bigger server
  4. NoSQL to Scale out!

    [Diagram]
    • Application tier: scale out, add more “Web” servers
    • NoSQL tier: scale out, add more servers
  5. And makes things hard to change

    [Diagram: a table with columns Name, Age, Phone, Email, annotated with “New Column” and “New Table” changes]
  6. Relational Database Challenges

    • Data Types: unstructured data, semi-structured data, polymorphic data
    • Agile Development: iterative, short development cycles, new workloads
    • Volume of Data: petabytes of data, trillions of records, millions of queries/sec
    • New Architectures: horizontal scaling, commodity servers, cloud computing
  7. Baseball Bat

    • -3 length to weight ratio
    • 2-5/8" barrel diameter
    • Two-piece construction
    • R2 alloy barrel provides outstanding durability, performance and "pop"
    • R2 composite handle shifts weight into the bat's knob for ultra-fast swing speeds
    • Rifle Barrel design removes weight from the barrel for thinner wall thickness
    • Acoustic barrel offers that sweet-sounding "ping"
    • Contact grip helps eliminate sting and vibration
    • AIR Elite is RIP-IT's® fastest BBCOR bat and the one with the most performance
    • BBCOR certified - approved for high school and collegiate play
    • Includes RIP-IT's® "Love It Or Return It" 30 Day Refund Policy with free return shipping
    • Manufacturer's warranty: 400 days
    • Made in the USA
    • Model: B1403E
  8. Bat Product Table

    Category | Model | Name | Brand | Length to weight ratio | Barrel Dia | Type | Barrel | Handle | Cert. | Country | Price
    Bat | B1403E | Air Elite | RIP-IT | -3 | 2 5/8 | Composite | R2 Alloy | R2 composite | BBCOR | USA | $399.99
    Bat | B1403 | Prototype | RIP-IT | -3 | 2 5/8 | One-piece | R1 Alloy | R1 Alloy | BBCOR | USA | $199.99
    Bat | MCB1B | One | Marucci | -3 | 2 5/8 | One-piece | AZ3000 aluminum | AZ3000 aluminum | BBCOR | Imported | $199.99
    Bat | BB14S1 | S1 | Easton | -3 | 2 5/8 | Composite | IMX | SIC Black Carbon | BBCOR | China | $399.99
  9. Let's Add Gloves

    • Size: 12"
    • Infield/Outfield/Pitcher model
    • 2-Piece Web pattern
    • Most popular MLB® pattern among pitchers
    • Pro Stock® American steerhide leather offers rugged durability and a superior feel
    • Dual-Welting™ on "exposed edges" of the fingers helps maintain pocket shape and durability
    • Pro Stock™ hand-designed pattern for unbeatable craftsmanship
    • Dri-Lex® ultra-breathable wrist lining repels moisture from your hand
    • Black leather with rich brown embellishments
    • Pattern: B212
    • Model: WTA2000BBB212
    • Wilson
  10. Bat and Glove Product Table

    Category | Model | Name | Brand | Length to weight ratio | Barrel Dia | Type | Barrel | Handle | Cert. | Country | Price
    Bat | B1403E | Air Elite | RIP-IT | -3 | 2 5/8 | Composite | R2 Alloy | R2 composite | BBCOR | USA | $399.99
    Bat | B1403 | Prototype | RIP-IT | -3 | 2 5/8 | One-piece | R1 Alloy | R1 Alloy | BBCOR | USA | $199.99
    Bat | MCB1B | One | Marucci | -3 | 2 5/8 | One-piece | AL | AL | BBCOR | Imported | $199.99
    Bat | BB14S1 | S1 | Easton | -3 | 2 5/8 | Composite | IMX | SIC Black Carbon | BBCOR | China | $399.99

    Category | Model | Name | Brand | Size | Position | Pattern | Web Pattern | Material | Color | Country | Price
    Glove | WTA2000BBB212 | A2000 | Wilson | 12" | Infield | B212 | 2-piece | Leather | black | Vietnam | $299.99
    Glove | PRO112PT | HOH Pro | Rawlings | 11.25" | Outfield | Pro taper | Modified Trap-Eze | Horween Leather | black | China | $229.99
  11. Add some baseballs

    • Cover: Full grain leather for excellent durability
    • Core: Cushioned cork core
    • Additions/Technologies: Made to the exact specifications of MLB
    • Stitching/Seams: 108 classic red stitches/Rawlings® Major League seaming
    • League/Certification(s): MLB
    • Balls included per purchase: individual
    • Recommended Age: All ages
    • Model: ROMLB
    • Rawlings
  12. Bat and Glove Product Table

    Category | Model | Name | Brand | Length to weight ratio | Barrel Dia | Type | Barrel | Handle | Cert. | Country | Price
    Bat | B1403E | Air Elite | RIP-IT | -3 | 2 5/8 | Composite | R2 Alloy | R2 composite | BBCOR | USA | $399.99
    Bat | B1403 | Prototype | RIP-IT | -3 | 2 5/8 | One-piece | R1 Alloy | R1 Alloy | BBCOR | USA | $199.99
    Bat | MCB1B | One | Marucci | -3 | 2 5/8 | One-piece | AL | AL | BBCOR | Imported | $199.99
    Bat | BB14S1 | S1 | Easton | -3 | 2 5/8 | Composite | IMX | SIC Black Carbon | BBCOR | China | $399.99

    Category | Model | Name | Brand | Size | Position | Pattern | Web Pattern | Material | Color | Country | Price
    Glove | WTA2000BBB212 | A2000 | Wilson | 12" | Infield | B212 | 2-piece | Leather | black | Vietnam | $299.99
    Glove | PRO112PT | HOH Pro | Rawlings | 11.25" | Outfield | Pro taper | Modified Trap-Eze | Horween Leather | black | China | $229.99

    Category | Model | Name | Brand | Color | Cover | Core | Cert. | Country | Price
    Baseball | DICRLLB1PBG | Little League | Rawlings | white | Leather | Cork rubber | Little League | China | $4.99
    Baseball | ROML | MLB | Rawlings | white | Leather | cork | | China | $6.99
  13. Sparse Table

    Category | Model | Name | Brand | Length to weight ratio | Barrel Dia | Type | Barrel | Handle | Certification | Country | Price | Size | Position | Pattern | Web Pattern | Material | Color | Cover | Core
    Bat | B1403E | Air Elite | RIP-IT | -3 | 2 5/8 | Composite | R2 Alloy | R2 composite | BBCOR | USA | $399.99 | | | | | | | |
    Bat | B1403 | Prototype | RIP-IT | -3 | 2 5/8 | One-piece | R1 Alloy | R1 Alloy | BBCOR | USA | $199.99 | | | | | | | |
    Bat | MCB1B | One | Marucci | -3 | 2 5/8 | One-piece | AZ3000 aluminum | AZ3000 aluminum | BBCOR | Imported | $199.99 | | | | | | | |
    Bat | BB14S1 | S1 | Easton | -3 | 2 5/8 | Composite | IMX | SIC Black Carbon | BBCOR | China | $399.99 | | | | | | | |
    Glove | WTA2000BBB212 | A2000 | Wilson | | | | | | | Vietnam | $299.99 | 12" | Infield | B212 | 2-piece | Leather | black | |
    Glove | PRO112PT | HOH Pro | Rawlings | | | | | | | China | $229.99 | 11.25" | Outfield | Pro taper | Modified Trap-Eze | Horween Leather | black | |
    Baseball | DICRLLB1PBG | Little League | Rawlings | | | | | | Little League | China | $4.99 | | | | | | white | Leather | cork and rubber
    Baseball | ROML | MLB | Rawlings | | | | | | | China | $6.99 | | | | | | white | Leather | cork

    Continue adding columns as you add new products
  14. Maybe this design will work better

    prodID | Category | Model | Name | Brand | Country | Price
    1 | Bat | B1403E | Air Elite | RIP-IT | USA | $399.99
    2 | Bat | B1403 | Prototype | RIP-IT | USA | $199.99
    3 | Bat | MCB1B | One | Marucci | Imported | $199.99
    4 | Bat | BB14S1 | S1 | Easton | China | $399.99
    5 | Glove | WTA2000BBB212 | A2000 | Wilson | Vietnam | $299.99
    6 | Glove | PRO112PT | HOH Pro | Rawlings | China | $229.99
    7 | Baseball | DICRLLB1PBG | Little League | Rawlings | China | $4.99
    8 | Baseball | ROML | MLB | Rawlings | China | $6.99

    prodID | property | value
    1 | length/weight | -3
    1 | barrel dia | 2 5/8
    1 | type | composite
    1 | certification | BBCOR
    …
    5 | size | 12
    5 | position | infield
    5 | pattern | B212
    5 | material | leather
    5 | color | black
    …
    8 | color | white
    8 | cover | leather
    8 | core | cork
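The cost of the property/value design is that every read has to stitch a product back together from two tables. A minimal sketch of that reassembly in plain JavaScript, assuming in-memory copies of a few rows from the slide (the array names and the `assembleProduct` helper are hypothetical, not part of the deck):

```javascript
// Hypothetical in-memory copies of a few rows from the two tables on the slide.
const products = [
  { prodID: 1, category: "Bat", model: "B1403E", name: "Air Elite", brand: "RIP-IT", country: "USA", price: 399.99 },
  { prodID: 5, category: "Glove", model: "WTA2000BBB212", name: "A2000", brand: "Wilson", country: "Vietnam", price: 299.99 },
];

const properties = [
  { prodID: 1, property: "length/weight", value: "-3" },
  { prodID: 1, property: "barrel dia", value: "2 5/8" },
  { prodID: 1, property: "type", value: "composite" },
  { prodID: 1, property: "certification", value: "BBCOR" },
  { prodID: 5, property: "size", value: "12" },
  { prodID: 5, property: "position", value: "infield" },
  { prodID: 5, property: "pattern", value: "B212" },
  { prodID: 5, property: "material", value: "leather" },
  { prodID: 5, property: "color", value: "black" },
];

// Rebuild one product by joining its base row with all of its property rows.
function assembleProduct(prodID) {
  const base = products.find((p) => p.prodID === prodID);
  if (!base) return null;
  const assembled = { ...base };
  for (const row of properties) {
    if (row.prodID === prodID) assembled[row.property] = row.value;
  }
  return assembled;
}

console.log(assembleProduct(5));
```

Note that every value in the property table ends up as a string, so type information is lost along the way.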
  15. MongoDB uses “Documents”

    {
      category: "glove",
      model: "PRO112PT",
      name: "Air Elite",
      brand: "Rawlings",
      price: 229.99,
      available: Date("2013-03-31")
    }

    Fields and values: field values are typed (string, number, date)
  16. Documents are rich structures

    {
      category: "glove",
      model: "PRO112PT",
      name: "Air Elite",
      brand: "Rawlings",
      price: 229.99,
      available: Date("2013-03-31"),
      position: ["infield", "outfield", "pitcher"]
    }

    Fields can contain arrays
  17. Documents are rich structures

    {
      category: "glove",
      model: "PRO112PT",
      name: "Air Elite",
      brand: "Rawlings",
      price: 229.99,
      available: Date("2013-03-31"),
      position: ["infield", "outfield", "pitcher"],
      endorsed: {name: "Ryan Howard", team: "Phillies", position: "first base"}
    }

    Fields can contain sub-documents
  18. Documents are rich structures

    {
      category: "glove",
      model: "PRO112PT",
      name: "Air Elite",
      brand: "Rawlings",
      price: 229.99,
      available: Date("2013-03-31"),
      position: ["infield", "outfield", "pitcher"],
      endorsed: {name: "Ryan Howard", team: "Phillies", position: "first base"},
      history: [{date: Date("2013-03-31"), price: 279.99},
                {date: Date("2013-06-01"), price: 259.79},
                {date: Date("2013-08-15"), price: 229.99}]
    }

    Fields can contain an array of sub-documents
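To make the "rich structure" idea concrete, here is a small sketch that treats the glove document as a plain JavaScript object and reads its array, sub-document, and array of sub-documents. It runs without a database; `new Date(...)` stands in for the slide's `Date(...)`, and the `latestPrice` helper is a hypothetical name introduced for illustration.

```javascript
// The glove document from the slides, as a plain JavaScript object.
const glove = {
  category: "glove",
  model: "PRO112PT",
  name: "Air Elite",
  brand: "Rawlings",
  price: 229.99,
  available: new Date("2013-03-31"),
  position: ["infield", "outfield", "pitcher"],
  endorsed: { name: "Ryan Howard", team: "Phillies", position: "first base" },
  history: [
    { date: new Date("2013-03-31"), price: 279.99 },
    { date: new Date("2013-06-01"), price: 259.79 },
    { date: new Date("2013-08-15"), price: 229.99 },
  ],
};

// Return the price of the most recent history entry,
// falling back to the top-level price when there is no history.
function latestPrice(doc) {
  const newestFirst = [...(doc.history || [])].sort((a, b) => b.date - a.date);
  return newestFirst.length > 0 ? newestFirst[0].price : doc.price;
}

console.log(glove.position.includes("pitcher")); // array field
console.log(glove.endorsed.team);                // sub-document field
console.log(latestPrice(glove));                 // array of sub-documents
```

Everything needed to answer these questions lives in one document, so no join is involved.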
  19. Variation is easy with document model

    {
      category: "bat",
      model: "B1403E",
      name: "Air Elite",
      brand: "Rip-IT",
      price: 399.99,
      diameter: "2 5/8",
      barrel: "R2 Alloy",
      handle: "R2 Composite",
      type: "composite"
    }

    {
      category: "glove",
      model: "PRO112PT",
      name: "Air Elite",
      brand: "Rawlings",
      price: 229.99,
      size: 11.25,
      position: "outfield",
      pattern: "Pro taper",
      material: "leather",
      color: "black"
    }

    {
      category: "ball",
      model: "ROML",
      name: "MLB",
      brand: "Rawlings",
      price: 6.99,
      cover: "leather",
      core: "cork",
      color: "white"
    }
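A short sketch of what this variation means in practice: the three product shapes sit side by side in one collection (modelled here as a plain array), and code can still work on the fields they share. The helper names `sharedFields` and `cheaperThan` are hypothetical, and prices are assumed to be stored as numbers.

```javascript
// Three differently shaped product documents in one "collection" (a plain array here).
const catalog = [
  { category: "bat", model: "B1403E", name: "Air Elite", brand: "Rip-IT", price: 399.99,
    diameter: "2 5/8", barrel: "R2 Alloy", handle: "R2 Composite", type: "composite" },
  { category: "glove", model: "PRO112PT", name: "Air Elite", brand: "Rawlings", price: 229.99,
    size: 11.25, position: "outfield", pattern: "Pro taper", material: "leather", color: "black" },
  { category: "ball", model: "ROML", name: "MLB", brand: "Rawlings", price: 6.99,
    cover: "leather", core: "cork", color: "white" },
];

// Fields present in every document, in the order of the first document.
function sharedFields(docs) {
  if (docs.length === 0) return [];
  return Object.keys(docs[0]).filter((field) => docs.every((doc) => field in doc));
}

// Models priced below a limit, regardless of each document's other fields.
function cheaperThan(docs, limit) {
  return docs.filter((doc) => doc.price < limit).map((doc) => doc.model);
}

console.log(sharedFields(catalog));
console.log(cheaperThan(catalog, 300));
```

Category-specific fields such as `barrel` or `cover` simply do not exist on documents that do not need them, so there are no empty columns to carry around.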
  20. Document Data Model (Relational vs. MongoDB)

    {
      "_id" : 45218468309,
      "date" : ISODate("2015-01-28T09:40:50.615Z"),
      "customer" : { "id" : 654321, "name" : "John Doe" },
      "ship_to" : { "name" : "John Doe", "street" : "Rue du Code", "city" : "69000 Lyon" },
      "items" : [
        { "sku" : "WA34R", "description" : "Wireless Qwerty Keyboard", "quantity" : 1, "unit_price" : 41.5, "price" : 41.5, "vat" : 20 },
        { "sku" : "MW003", "description" : "MiWatch", "quantity" : 2, "unit_price" : 245, "price" : 490, "vat" : 20 }
      ],
      "price" : { "total" : 531.5, "vat" : 106.3 }
    }
  21. Document Data Model (Relational vs. MongoDB)

    {
      first_name: 'Paul',
      surname: 'Miller',
      city: 'London',
      location: [45.123, 47.232],
      cars: [
        { model: 'Bentley', year: 1973, value: 100000, … },
        { model: 'Rolls Royce', year: 1965, value: 330000, … }
      ]
    }
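Because the order document on the first Document Data Model slide embeds its line items, the stored totals can be recomputed from a single document. A minimal sketch in plain JavaScript, assuming `vat` is a percentage applied to each line's `price` (an interpretation that matches the slide's totals); the `computeTotals` helper is hypothetical.

```javascript
// The order document from the slide, trimmed to the fields used here.
const order = {
  _id: 45218468309,
  customer: { id: 654321, name: "John Doe" },
  items: [
    { sku: "WA34R", description: "Wireless Qwerty Keyboard", quantity: 1, unit_price: 41.5, price: 41.5, vat: 20 },
    { sku: "MW003", description: "MiWatch", quantity: 2, unit_price: 245, price: 490, vat: 20 },
  ],
  price: { total: 531.5, vat: 106.3 },
};

// Recompute the order totals from the embedded line items.
// The VAT sum is accumulated in percent-units first and divided once at the end
// to limit floating-point drift.
function computeTotals(doc) {
  let total = 0;
  let vatTimes100 = 0;
  for (const item of doc.items) {
    total += item.price;
    vatTimes100 += item.price * item.vat;
  }
  return { total, vat: vatTimes100 / 100 };
}

console.log(computeTotals(order));
```

In a normalized relational design the same check would need a join between an orders table and an order-lines table.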
  22. Document Model Benefits

    • Agility and flexibility: data model supports business change; rapidly iterate to meet new requirements
    • Intuitive, natural data representation: eliminates ORM layer; developers are more productive
    • Reduces the need for joins, disk seeks: programming is simpler; performance delivered at scale

    {
      _id : ObjectId("4c4ba5e5e8aabf3"),
      employee_name: "Dunham, Justin",
      department : "Marketing",
      title : "Product Manager, Web",
      report_up: "Neray, Graham",
      pay_band: "C",
      benefits : [
        { type : "Health", plan : "PPO Plus" },
        { type : "Dental", plan : "Standard" }
      ]
    }
  23. Drivers & Ecosystem

    Support for the most popular languages and frameworks: Java, Python, Perl, Ruby, Morphia, MEAN Stack
  24. THE LARGEST ECOSYSTEM

    • 9,000,000+ MongoDB Downloads
    • 200,000+ Online Education Registrants
    • 35,000+ MongoDB User Group Members
    • 35,000+ MongoDB Management Service (MMS) Users
    • 750+ Technology and Services Partners
    • 2,000+ Customers Across All Industries
  25. High Availability

    • Replica Set – two or more copies
    • Self-healing shard
    • Addresses availability considerations: High Availability, Disaster Recovery, Maintenance
    • Deployment Flexibility: data locality to users; workload isolation (operational & analytics)
  26. Single Data Center

    • Automated failover
    • Tolerates server failures
    • Tolerates rack failures
    • Number of replicas defines failure tolerance
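The last point can be made concrete with a little arithmetic. A minimal sketch, assuming that electing a primary requires a strict majority of voting members (as in MongoDB replica sets); the function names are hypothetical.

```javascript
// Votes needed to elect a primary: a strict majority of the voting members.
function majority(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// How many members can fail while a majority is still reachable.
function faultTolerance(votingMembers) {
  return votingMembers - majority(votingMembers);
}

for (const members of [2, 3, 4, 5, 7]) {
  console.log(`${members} members: majority ${majority(members)}, tolerates ${faultTolerance(members)} failure(s)`);
}
```

Under this assumption an even member count adds no tolerance over the next lower odd count (four members tolerate one failure, just like three), which is why odd-sized replica sets are the usual choice.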
  27. MongoDB Ops Manager: The Best Way to Manage MongoDB In Your Data Center

    • Single-click provisioning, scaling & upgrades, admin tasks
    • Monitoring, with charts, dashboards and alerts on 100+ metrics
    • Backup and restore, with point-in-time recovery, support for sharded clusters
    • Up to 95% Reduction in Operational Overhead
  28. How MongoDB Ops Manager helps you

    • Scale Easily
    • Meet SLAs
    • Best Practices, Automated
    • Cut Management Overhead
  29. How Ops Manager Works

    [Diagram: Ops Manager sends a new config to the Agent running alongside each of three mongod processes]
  30. Enterprise-Grade Security

    BUSINESS NEEDS | SECURITY FEATURES
    Authentication | SCRAM, LDAP*, Kerberos*, x.509 Certificates
    Authorization | Built-in Roles, User-Defined Roles, Field-Level Redaction
    Auditing | Admin, DML, DDL, Role-based
    Encryption | Network: SSL (with FIPS 140-2)*, Disk: Partner Solutions

    *Included with MongoDB Enterprise Advanced
  31. Scale

    • Performance: 250M Ticks/Sec; 300K+ Ops/Sec; 500K+ Ops/Sec
    • Cluster: 1,400 Servers; 1,000+ Servers; 250+ Servers
    • Data: Petabytes; 10s of billions of objects; 13B documents
    • Customers shown: Fed Agency, Entertainment Co., Asian Internet Co.
  32. Example: MongoDB Management Service

    • Cloud service for managing MongoDB systems
    • 100+ system metrics visualized and alerted
    • 35,000+ MongoDB systems submitting data every 60 seconds
    • 90% updates, 10% reads
    • ~30,000 updates/second
    • ~3.2B operations/day
    • Eight x86-64 servers
  33. MongoDB Performance*

    | Top 5 Marketing Firm | Government Agency | Top 5 Investment Bank
    Data | Key/value | 10+ fields, arrays, nested documents | 20+ fields, arrays, nested documents
    Queries | Key-based; 1 – 100 docs/query; 80/20 read/write | Compound queries; Range queries; MapReduce; 20/80 read/write | Compound queries; Range queries; 50/50 read/write
    Servers | ~250 | ~50 | 4
    Ops/sec | 1,200,000 | 500,000 | 30,000

    * These figures are provided as examples. Your application governs your performance.
  34. For More Information

    Resource | Location
    Case Studies | mongodb.com/customers
    Presentations | mongodb.com/presentations
    Free Online Training | education.mongodb.com
    Webinars and Events | mongodb.com/events
    Documentation | docs.mongodb.org
    MongoDB Downloads | mongodb.com/download
    Additional Info | [email protected]
  35. Dynamic Schema

    {
      policyNum: 123,
      type: "auto",
      customerId: "abc",
      payment: 899,
      deductible: 500,
      make: "Ford",
      model: "Taurus",
      VIN: "123ABC456"
    }

    { policyNum: 456, type: "life", customerId: "efg", payment: 240,
  36. Comparing Data Models

    | MongoDB | Key/Value | Relational
    Rich Data Model | Yes | No | No
    Dynamic Schema | Yes | Yes | No
    Typed Data | Yes | No | Yes
    Data Locality | Yes | Yes | No
    Field Updates | Yes | No | Yes
    Easy for Programmers | Yes | Yes | No
  37. Indexes

    // Index nested documents
    > db.customers.ensureIndex( { "policies.agent": 1 } )
    > db.customers.find( { "policies.agent": "Fred" } )

    // geospatial index
    > db.customers.ensureIndex( { "property.location": "2d" } )
    > db.customers.find( { "property.location": { $near: [22, 42] } } )

    // text index
    > db.customers.ensureIndex( { "policies.notes": "text" } )
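To show what indexing a nested field such as `policies.agent` buys, here is a toy in-memory model: a lookup table from each agent name to the ids of the customers whose policies mention it. This is only an illustration of the idea, not how MongoDB implements its indexes; the sample customers and the helper names are hypothetical.

```javascript
// Collect every value reachable at a dotted path, descending into arrays on the way.
function valuesAtPath(doc, path) {
  let current = [doc];
  for (const key of path.split(".")) {
    const next = [];
    for (const node of current) {
      if (node === null || typeof node !== "object") continue;
      const child = node[key];
      if (Array.isArray(child)) next.push(...child);
      else if (child !== undefined) next.push(child);
    }
    current = next;
  }
  return current;
}

// Build a value -> [document ids] lookup table for one dotted path.
function buildIndex(docs, path) {
  const index = new Map();
  for (const doc of docs) {
    for (const value of new Set(valuesAtPath(doc, path))) {
      if (!index.has(value)) index.set(value, []);
      index.get(value).push(doc._id);
    }
  }
  return index;
}

// Hypothetical sample data shaped like the customers collection in the slide.
const customers = [
  { _id: 1, last: "Rogers", policies: [{ agent: "Fred" }, { agent: "Mary" }] },
  { _id: 2, last: "Smith", policies: [{ agent: "Fred" }] },
  { _id: 3, last: "Jones", policies: [] },
];

const byAgent = buildIndex(customers, "policies.agent");
console.log(byAgent.get("Fred"));
```

A lookup by agent now touches only the matching entries instead of scanning every customer and every policy.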
  38. Query Operators

    Conditional Operators
    $all, $exists, $mod, $ne, $in, $nin, $nor, $or, $size, $type
    $lt, $lte, $gt, $gte

    // find customers with any claims
    > db.customers.find( { claims: { $exists: true } } )

    // find customers matching a regular expression
    > db.customers.find( { last: /^rog*/i } )

    // count customers by city
    > db.customers.find( { city: 'Philadelphia' } ).count()
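For readers without a MongoDB instance at hand, the semantics of a few of these operators can be approximated in plain JavaScript. This is a simplified sketch covering top-level fields only, not MongoDB's query engine; the `matches` helper and the sample customers are hypothetical.

```javascript
// Simplified matcher for a handful of query operators on top-level fields.
function matches(doc, query) {
  return Object.entries(query).every(([field, cond]) => {
    const value = doc[field];
    if (cond instanceof RegExp) {
      return typeof value === "string" && cond.test(value);
    }
    if (cond !== null && typeof cond === "object") {
      return Object.entries(cond).every(([op, arg]) => {
        switch (op) {
          case "$exists": return (value !== undefined) === arg;
          case "$gt": return value > arg;
          case "$gte": return value >= arg;
          case "$lt": return value < arg;
          case "$lte": return value <= arg;
          case "$ne": return value !== arg;
          case "$in": return arg.includes(value);
          default: throw new Error(`unsupported operator: ${op}`);
        }
      });
    }
    return value === cond; // plain equality
  });
}

// Hypothetical sample data.
const customers = [
  { last: "Rogers", city: "Philadelphia", claims: [{ amount: 1200 }] },
  { last: "Roberts", city: "Boston" },
  { last: "Smith", city: "Philadelphia" },
];

const withClaims = customers.filter((c) => matches(c, { claims: { $exists: true } }));
const byRegex = customers.filter((c) => matches(c, { last: /^rog*/i }));
const inPhiladelphia = customers.filter((c) => matches(c, { city: "Philadelphia" })).length;

console.log(withClaims.length, byRegex.length, inPhiladelphia);
```

The pattern `/^rog*/i` from the slide means "ro" followed by zero or more "g", so it matches "Roberts" as well as "Rogers".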
  39. Comparing Query Models

    | MongoDB | Key/Value | Relational
    Key/Value | Yes | Yes | Yes
    Secondary Indexes | Yes | No | Yes
    Index Intersection | Yes | No | Yes
    Range Queries | Yes | No | Yes
    Geospatial | Yes | No | Yes
    Text Search | Yes | No | Yes
    Aggregation | Yes | No | Yes
    MapReduce | Yes | Yes | No
  40. Comparing Operational Capabilities

    | MongoDB | Key/Value | Relational
    Automatic Failover | Yes | Limited | Yes
    Data Center Awareness | Yes | No | Expensive Add-Ons
    Continuous Backup | Yes | No | Yes
    Point in Time Recovery | Yes | No | Yes
    Caching Layer Needed | No | No | Often
    Automatic Sharding | Yes | Yes | No
  41. GridFS

    • Store files larger than 16MB, e.g. video, images
    • Load chunks without reading entire file into memory
    • Atomically sync files with their metadata
    • Shard and distribute around the cluster

    [Diagram: doc.jpg passes through the driver's GridFS API; its metadata is stored in fs.files and its chunks, doc.jpg (1) onward, in fs.chunks]
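The chunking idea behind GridFS can be sketched without a database: a file becomes one metadata document plus a numbered series of chunk documents. The code below is an in-memory illustration for Node.js only, not the driver's GridFS API. The field names follow the `fs.files` / `fs.chunks` layout, while the helper names, the use of the filename as the id, and the default chunk size of 255 KiB are assumptions made for the example.

```javascript
// Split a Buffer into a metadata document and numbered chunk documents.
function splitIntoChunks(filename, data, chunkSize = 255 * 1024) {
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new RangeError("chunkSize must be a positive integer");
  }
  const filesId = filename; // assumption: reuse the filename as the file id
  const chunks = [];
  for (let offset = 0, n = 0; offset < data.length; offset += chunkSize, n += 1) {
    chunks.push({ files_id: filesId, n, data: data.subarray(offset, offset + chunkSize) });
  }
  const fileDoc = { _id: filesId, filename, length: data.length, chunkSize };
  return { fileDoc, chunks };
}

// Rebuild the file contents by concatenating the chunks in order of n.
function reassemble(chunks) {
  const ordered = [...chunks].sort((a, b) => a.n - b.n);
  return Buffer.concat(ordered.map((chunk) => chunk.data));
}

const original = Buffer.from("0123456789");
const { fileDoc, chunks } = splitIntoChunks("doc.jpg", original, 4);

console.log(fileDoc);
console.log(chunks.map((chunk) => chunk.data.toString()));
console.log(reassemble(chunks).equals(original));
```

Because each chunk is addressable by its number `n`, a reader can fetch only the chunks covering the byte range it needs rather than the whole file.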
  42. MongoDB & Hadoop

    Applications powered by MongoDB
    • Low latency
    • Rich fast querying
    • Flexible indexing
    • Ad hoc aggregations in database
    • Known data relationships
    • Great at looking at any subset of data

    Analysis powered by Hadoop
    • Longer jobs and queries
    • Analytical processing
    • Often highly partitionable
    • Unknown data relationships
    • Great at looking at all of data

    MongoDB Connector for Hadoop
  43. Analytics Landscape

    • Batch / Predictive / Ad Hoc (mins – hours)
    • Real-Time Dashboards / Scoring (<30 ms)
    • Planned Reporting (secs – mins)
    • Experimental
    • Legacy
  44. Analytics Landscape

    [Comparison table; the product names and rating symbols are not recoverable. Columns: Response, Data Supported, Maturity, Analytical Capabilities, Ease of Use. Response values by row: Real-Time, Batch, Batch, Interactive, Interactive, Interactive.]
  45. MongoDB Use Cases

    • Single View
    • Internet of Things
    • Mobile
    • Real-Time Analytics
    • Catalog
    • Personalization
    • Content Management
  46. Challenge: Achieve Cross Asset View

    [Diagram: Customers, Payments and Products systems feed Data Marts by batch; the Data Marts feed the Datawarehouse by batch; Reporting reads from the Datawarehouse]

    Issues
    • Yesterday’s data
    • Details lost
    • Inflexible schema
    • Slow performance

    Impact
    • What happened today?
    • Worse customer satisfaction
    • Missed opportunities
    • Lost revenue
  47. Solution: Use New Database

    [Diagram: Customers, Payments, Products and other systems feed an Operational Data Layer, which serves the Customers Service, Operational Reporting and an Open Data API; the Datawarehouse serves Strategic Reporting]

    Benefits
    • Real-time
    • Complete details
    • Agile
    • Higher customer retention
    • New products
    • …
  48. Single View of Customer

    Insurance leader generates coveted 360-degree view of customers in 90 days – “The Wall”

    Problem
    • No single view of customer
    • 145 yrs of policy data, 70+ systems, 15+ apps
    • 2 years, $25M in failing to aggregate in RDBMS
    • Poor customer experience

    Why MongoDB
    • Agility – prototype in 9 days
    • Dynamic schema & rich querying – combine disparate data into one data store
    • Hot tech to attract top talent

    Results
    • Production in 90 days with 70 feeders
    • Unified customer view available to all channels
    • Increased call center productivity
    • Better customer experience, reduced churn, more upsell opps
    • Dozens more projects on same data platform
  49. Product Catalog

    Serves variety of content and user services on multiple platforms to 7M web and mobile users

    Problem
    • MySQL reached scale ceiling – could not cope with performance and scalability demands
    • Metadata management too challenging with relational model
    • Hard to integrate external data sources

    Why MongoDB
    • Unrivaled performance
    • Simple scalability and high availability
    • Intuitive mapping
    • Eliminated 6B+ rows of attributes – instead creates single document per user / piece of content

    Results
    • Supports 115,000+ queries per second
    • Saved £2M+ over 3 yrs.
    • “Lead time for new implementations is cut massively”
    • MongoDB is default choice for all new projects
  50. Personalization Server: Accelerate Time To Market

    Problem
    • Expensive Oracle-based solution
    • 20 people, 16 months
    • Performance issues
    • 3 iterations
    • Cannot take new requirements

    Why MongoDB
    • Mature Technology
    • Dynamic Schema
    • Fault Tolerance
    • Performance

    Results
    • 4 Developers
    • 4 months
    • Add new features
    • Faster
    • Smaller
    • Easier
  51. Reference Data Distribution: Global Bank

    Distribute reference data globally in real-time for fast local accessing and querying

    Problem
    • Delays up to 36 hours in distributing data by batch
    • Charged multiple times globally for same data
    • Incurring regulatory penalties from missing SLAs
    • Had to manage 20 distributed systems with same data

    Why MongoDB
    • Dynamic schema: easy to load initially & over time
    • Auto-replication: data distributed in real-time, read locally
    • Both cache and database: cache always up-to-date
    • Simple data modeling & analysis: easy changes and understanding

    Results
    • Will avoid about $40,000,000 in costs and penalties over 5 years
    • Only charged once for data
    • Data in sync globally and read locally
    • Capacity to move to one global shared data service
  52. Reference Data Distribution

    Challenge: Ref data difficult to change and distribute

    [Diagram: a Golden Copy distributed to eight destinations, each by batch]
  53. Reference Data Distribution

    Solution: Persistent dynamic cache replicated globally

    [Diagram: eight destinations, each updated in real-time]
  54. Mobile / Open Data API

    PIM Database
    • Legacy Application
    • Product Information

    NoSQL
    • REST API
    • Product Data
    • Additional Metadata
  55. Polyglot Persistence

    Big Data/Analysis
    • Log Capture
    • Recommendations
    • Predictions
    • Ad Campaign

    Document
    • Products
    • User Profiles
    • Game Actions
    • Sessions
    • Shopping Cart

    RDBMS
    • Financial Data
    • Reporting