GeeCon 2011 - NoSQL and In Memory Data Grids from a developer perspective

GeeCon 2011 : NoSQL and In Memory Data Grids from a developer perspective by Cyrille Le Clerc and Michael Figuière - Xebia

Cyrille Le Clerc

April 15, 2012

Transcript

  1. NoSQL & DataGrids from a Developer Perspective

    Cyrille Le Clerc - Michaël Figuière
    Sunday, April 15, 12
  2. On the Web side

    • Huge amount of data
    • High availability
    • Fault tolerance
    • Scalability on commodity hardware

    Similar needs for Web giants:
    - Amazon: created Dynamo, < 40 min of unavailability per year
    - Google: created BigTable & MapReduce, stores every web page of the Internet
  3. Amazon: the birth of Dynamo

    Order steps: Fill cart, Checkout, Payment, Process order, Prepare, Send
    - Requires high availability, key-value store is enough
    - Requires complex requests, temporary unavailability is acceptable
  4. On the Financial side

    • Very low latency
    • Rich queries & transactions
    • Scalability
    • Data consistency

    Needs within financial market:
    - Released Coherence in 2001, started as a distributed cache
    - Released Gigaspaces XAP in 2001, routes the request inside the data
  5. Use Case: Train Ticketing System

    With trains, stations, seats, booking and passengers
  6. Store everything in a Mainframe!

    IBM z11
    - Up to 3 TB of RAM!
    - More than $1,000,000
  7. Data Partitioning

    Split data for scalability
    - From a mainframe to small servers
    - Partition alpha, Partition beta, Partition gamma
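Routing a record to one of the partitions can be sketched as a hash of its key modulo the number of partitions. The following is a minimal illustration under stated assumptions, not the scheme of any product named in this deck: `PartitionRouter`, `partitionFor`, and the sample train codes are hypothetical names introduced here.

```java
import java.util.List;

// Hypothetical helper: maps a key (for example a train code) to one partition.
class PartitionRouter {
    private final List<String> partitions;

    PartitionRouter(List<String> partitions) {
        if (partitions.isEmpty()) {
            throw new IllegalArgumentException("at least one partition is required");
        }
        this.partitions = List.copyOf(partitions);
    }

    // floorMod keeps the index non-negative even when hashCode() is negative.
    String partitionFor(String key) {
        int index = Math.floorMod(key.hashCode(), partitions.size());
        return partitions.get(index);
    }

    public static void main(String[] args) {
        PartitionRouter router = new PartitionRouter(List.of("alpha", "beta", "gamma"));
        // Illustrative train codes, not taken from the slides.
        for (String trainCode : List.of("TGV-6201", "TGV-6203", "TER-17763")) {
            System.out.println(trainCode + " -> partition " + router.partitionFor(trainCode));
        }
    }
}
```

A plain modulo reshuffles most keys when a partition is added or removed, which is why real stores often prefer consistent hashing or a fixed number of virtual partitions.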
  8. Data Replication

    Duplicate data for high availability and scalability
    - Partition alpha is synchronized across Node 1, Node 2 and Node 3
  9. Partitioned Data Modeling

    Typical relational data model:
    - TrainStop (date)
    - TrainStation (code, name)
    - Train (code, type)
    - Seat (number, price)
    - Booking (reduction)
    - Passenger (name)
  10. Partitioned Data Modeling

    Find the root entity and denormalize
    - Root entity: Train (code, type)
    - Partitioning-ready entities tree: TrainStop (date), Seat (number, price), Booking (reduction), Passenger (name)
    - Reference data, duplicated in each partition: TrainStation (code, name)
  11. Partitioned Data Modeling

    Remove unused data
    - TrainStop (date)
    - Seat (number, price, booked)
    - Booking (reduction)
    - Passenger (name)
    - TrainStation (code, name)
    - Train (code, type)
  12. Partitioned Data Modeling

    Sharding-ready data structure:
    - TrainStop (date)
    - TrainStation (code, name)
    - Seat (number, price, booked)
    - Train (code, type)
  13. Data Consistency with replicas

    Write to all, read from one (Node 1, Node 2, Node 3)
    { "name": "Barbie Computer", "price": 15.50, "tags" : [ "doll", "barbie" ]}
  14. Data Consistency with replicas

    Write to one, read from all (Node 1, Node 2, Node 3)
    { "name": "Barbie Computer", "price": 15.50, "tags" : [ "doll", "barbie" ]}
  15. Data Consistency with replicas

    • You can adjust the balance between number of writes and number of reads
    • See Eventual Consistency
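The balance between writes and reads is commonly expressed with three numbers: N replicas, W replicas that must acknowledge a write, and R replicas consulted on a read. When R + W > N, every read set overlaps every write set, so a read reaches at least one replica holding the latest acknowledged write. The sketch below only checks that inequality; `ReplicaQuorum` and `readsSeeLatestWrite` are hypothetical names, and the N/W/R notation is borrowed from Dynamo-style stores rather than from the slides.

```java
// Hypothetical helper: n replicas, w acknowledged writes, r replicas read.
class ReplicaQuorum {
    // Returns true when every read set overlaps every write set (r + w > n).
    static boolean readsSeeLatestWrite(int n, int w, int r) {
        if (n < 1 || w < 1 || r < 1 || w > n || r > n) {
            throw new IllegalArgumentException("expected 1 <= w, r <= n");
        }
        return r + w > n;
    }

    public static void main(String[] args) {
        System.out.println(readsSeeLatestWrite(3, 3, 1)); // write to all, read from one
        System.out.println(readsSeeLatestWrite(3, 1, 3)); // write to one, read from all
        System.out.println(readsSeeLatestWrite(3, 2, 2)); // quorum on both sides
        System.out.println(readsSeeLatestWrite(3, 1, 1)); // cheapest, but reads may be stale
    }
}
```

The last configuration is the one that leads to eventual consistency: reads and writes are cheap, but a read may hit a replica that has not received the write yet.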
  16. Data Consistency with Multiple Data Centers

    West Coast: { "name": "Barbie Computer", "price": 15.50, "tags" : [ "doll", "barbie" ]}
    East Coast: { "name": "Barbie Computer", "price": 15.50, "tags" : [ "doll", "barbie" ]}
  17. Data Consistency with Multiple Data Centers

    West Coast / East Coast
    - set price to $ 20.00: { "name": "Barbie Computer", "price": 20.00, "tags" : [ "doll", "barbie" ]}
    - other data center: { "name": "Barbie Computer", "price": 15.50, "tags" : [ "doll", "barbie" ]}
    - propagation delay!
  18. Data Consistency with Multiple Data Centers

    West Coast / East Coast
    - set price to $ 20.00: { "name": "Barbie Computer", "price": 20.00, "tags" : [ "doll", "barbie" ]}
    - add tag "girl": { "name": "Barbie Computer", "price": 15.50, "tags" : [ "doll", "barbie", "girl" ]}
    - reconciliation API needed!
  19. Data Consistency with Multiple Data Centers

    West Coast / East Coast
    - set price to $ 20.00: { "name": "Barbie Computer", "price": 20.00, "tags" : [ "doll", "barbie" ]}
    - add tag "girl": { "name": "Barbie Computer", "price": 15.50, "tags" : [ "doll", "barbie", "girl" ]}
    - Network partitioning
  20. Data Consistency with Multiple Data Centers

    Tokyo, New York, London
    Worldwide replication for financial market
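The diverging "Barbie Computer" copies in the multi-data-center slides above show why a reconciliation step is needed. One possible policy, sketched here purely as an assumption, is to keep the most recently written price and the union of both tag lists. `ItemVersion`, `reconcile`, and the `priceUpdatedAt` timestamp are hypothetical; the slides do not say how any particular store reconciles.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical item version: priceUpdatedAt is an assumed per-field timestamp.
final class ItemVersion {
    final String name;
    final double price;
    final long priceUpdatedAt;
    final Set<String> tags;

    ItemVersion(String name, double price, long priceUpdatedAt, Set<String> tags) {
        this.name = name;
        this.price = price;
        this.priceUpdatedAt = priceUpdatedAt;
        this.tags = new TreeSet<>(tags);
    }

    // Keeps the most recently written price and the union of both tag sets.
    static ItemVersion reconcile(ItemVersion a, ItemVersion b) {
        ItemVersion newest = a.priceUpdatedAt >= b.priceUpdatedAt ? a : b;
        Set<String> mergedTags = new TreeSet<>(a.tags);
        mergedTags.addAll(b.tags);
        return new ItemVersion(newest.name, newest.price, newest.priceUpdatedAt, mergedTags);
    }

    public static void main(String[] args) {
        // Replica A received the price change, replica B received the new tag.
        ItemVersion replicaA = new ItemVersion("Barbie Computer", 20.00, 2L, Set.of("doll", "barbie"));
        ItemVersion replicaB = new ItemVersion("Barbie Computer", 15.50, 1L, Set.of("doll", "barbie", "girl"));
        ItemVersion merged = reconcile(replicaA, replicaB);
        System.out.println(merged.price + " " + merged.tags);
    }
}
```

This policy is deliberately naive: a union can never remove a tag, and timestamps from different data centers are only as trustworthy as their clocks, so real systems often rely on version vectors or application-specific merge logic instead.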
  21. CAP Theorem

    Consistency, Availability, Partition Tolerance
    Only 2 of these 3 properties can be achieved in a storage system
  22. Request Driven Data Modeling

    • Relational data modeling is business driven
    • With partitioning, data modeling had to be adapted for requests
    • NoSQL & DataGrids data modeling is request driven

    - Adaptation to requests comes with tuning
    - Because network latency matters
    - Two requests may require storing the data twice
  23. Example with a user profile

    Key: johndoe
    Value: user profile as byte[]
    Similar to a Java HashMap
  24. Write Example with Riak

    RiakClient riak = new RiakClient("http://server1:8098/riak");
    RiakObject userProfileObj = new RiakObject("bucket", "johndoe", serializer.serialize(userProfile));
    riak.store(userProfileObj);

    Inserts a user profile into Riak
  25. Read Example with Riak

    FetchResponse response = riak.fetch("bucket", "johndoe");
    if (response.hasObject()) {
        userProfileObj = response.getObject();
    }

    Fetches a user profile using its key in Riak
  26. Column Families Store

    Relational DB vs. column families DB
    - For each Row ID we have a list of key-value pairs
    - Key-value pairs are sorted by keys
  27. Example with a shopping cart

    - johndoe: 17:21 Iphone, 17:32 DVD Player, 17:44 MacBook
    - willsmith: 6:10 Camera, 8:29 Ipad
    - pitdavis: 14:45 PlayStation, 15:01 Asus EEE, 15:03 Iphone
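The column-family layout described above (a row ID pointing to key-value pairs sorted by key) behaves much like a map of sorted maps. The following in-memory sketch uses only `java.util` to show why slice reads are cheap in that layout; `ColumnFamilySketch`, `insert`, and `slice` are hypothetical names and this is not the Cassandra or Hector API.

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical in-memory model of one column family: row ID -> sorted columns.
class ColumnFamilySketch {
    private final Map<String, TreeMap<String, String>> rows = new TreeMap<>();

    void insert(String rowId, String columnName, String value) {
        rows.computeIfAbsent(rowId, id -> new TreeMap<>()).put(columnName, value);
    }

    // Returns at most `limit` columns whose names are >= `start`, in key order.
    SortedMap<String, String> slice(String rowId, String start, int limit) {
        SortedMap<String, String> result = new TreeMap<>();
        TreeMap<String, String> columns = rows.get(rowId);
        if (columns == null) {
            return result;
        }
        for (Map.Entry<String, String> column : columns.tailMap(start, true).entrySet()) {
            if (result.size() >= limit) {
                break;
            }
            result.put(column.getKey(), column.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        ColumnFamilySketch cart = new ColumnFamilySketch();
        // Inserted out of order on purpose: the columns still come back sorted.
        cart.insert("johndoe", "17:44", "MacBook");
        cart.insert("johndoe", "17:21", "Iphone");
        cart.insert("johndoe", "17:32", "DVD Player");
        System.out.println(cart.slice("johndoe", "17:30", 10));
    }
}
```

Because column names are compared as strings here, times such as "6:10" would need zero-padding ("06:10") to sort chronologically; choosing a column name that sorts the way requests read it is part of request-driven modeling.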
  28. Write Example with Cassandra

    Cluster cluster = HFactory.getOrCreateCluster("cluster", new CassandraHostConfigurator("server1:9160"));
    Keyspace keyspace = HFactory.createKeyspace("EcommerceKeyspace", cluster);
    Mutator<String> mutator = HFactory.createMutator(keyspace, stringSerializer);
    mutator.insert("johndoe", "ShoppingCartColumnFamily", HFactory.createStringColumn("14:21", "Iphone"));

    Inserts a column into the ShoppingCartColumnFamily
  29. Read Example with Cassandra

    SliceQuery<String, String, String> query = HFactory.createSliceQuery(keyspace, stringSerializer, stringSerializer, stringSerializer);
    query.setColumnFamily("ShoppingCartColumnFamily")
         .setKey("johndoe")
         .setRange("", "", false, 10);
    QueryResult<ColumnSlice<String, String>> result = query.execute();

    Reads a slice of 10 columns from ShoppingCartColumnFamily
  30. Example with an item of a catalog

    item_1:
    { "name": "Iphone", "price": 559.0, "vendor": "Apple", "rating": 4.6, "tags": [ "phone", "touch" ] }

    The database is aware of the document's fields and can offer complex queries
  31. Write Example with MongoDB

    Mongo mongo = new Mongo("mongos_1", 27017);
    DB db = mongo.getDB("Ecommerce");
    DBCollection catalog = db.getCollection("Catalog");
    BasicDBObject doc = new BasicDBObject();
    doc.put("name", "Iphone");
    doc.put("price", 559.0);
    catalog.insert(doc);

    Inserts an item document into MongoDB
  32. Read Example with MongoDB

    BasicDBObject query = new BasicDBObject();
    query.put("price", new BasicDBObject("$lt", 600));
    DBCursor cursor = catalog.find(query);
    while (cursor.hasNext()) {
        System.out.println(cursor.next());
    }

    Queries for all items with a price lower than 600
  33. Example with train booking with IBM eXtreme Scale

    With Data Grids, sub entities can have cross relations

    @Entity(schemaRoot=true)
    public class Train {
        @Id String code;
        @Index @Basic String name;
        @OneToMany(cascade=CascadeType.ALL)
        List<Seat> seats = new ArrayList<Seat>();
        @Version int version;
        ...
    }

    Entities: TrainStop (date), Seat (number, price, booked), Train (code, type)
  34. Write Example with IBM eXtreme Scale

    void persist(Train train) {
        entityManager.persist(train);
    }

    Inserts a train into eXtreme Scale
    eXtreme Scale provides a JPA-style API
  35. Read Example with IBM eXtreme Scale

    /** Find by key */
    Train findById(String id) {
        return (Train) entityManager.find(Train.class, id);
    }

    /** Query Language */
    Train findByTrain(String code) {
        Query q = entityManager.createQuery("select t from Train t where t.code=:code");
        q.setParameter("code", code);
        return (Train) q.getSingleResult();
    }

    Simple and complex queries with eXtreme Scale
  36. More APIs

    • Another Java EE versus Spring battle? JSR 347 Data Grids vs. Spring Data
      - Unified API on top of relational, document, column, key-value?
      - Object to tuple projection API
  37. Transactions

    • NoSQL usually means NO transactions
    • Except when it means eXtreme Transactions!
  38. Transactions Concurrency

    Warehouse stocks (diagram): 231, 264, 2, 637, 121, 311, 12
    Place order: canon-eos: 1, ipod: 1, headphone: 1, iphone: 1, ...
    Concurrent carts: (ipad: 1, iphone: 1), (barbie: 1, iphone: 1, cabbage-doll: 1)
    - concurrency on iphone
    - cancel order if one product is missing
  39. SQL Transactions

    Warehouse stocks (diagram): 231, 264, 2, 637, 121, 311, 12
    Place order: canon-eos: 1, ipod: 1, headphone: 1, iphone: 1, ...
    Concurrent carts: (ipad: 1, iphone: 1), (barbie: 1, iphone: 1, cabbage-doll: 1)

    begin
      for each shoppingCart.product
        select for update ...
        update ...
    commit

    - lock duration = f(shoppingcart.length)
    - if too many locks on the rows, then lock table!
  40. SQL Transactions

    Warehouse stocks (diagram): 231, 264, 2, 637, 121, 311, 12
    Place order: canon-eos: 1, ipod: 1, headphone: 1, iphone: 1, ...
    Concurrent carts: (ipad: 1, iphone: 1), (barbie: 1, iphone: 1, cabbage-doll: 1)

    select for update ...

    - lock duration = f(shoppingcart.length)
    - if too many locks on the rows, then lock table!
  41. SQL Transactions

    Warehouse stocks (diagram): 231, 264, 2, 637, 121, 311, 12
    Place order: canon-eos: 1, ipod: 1, headphone: 1, iphone: 1, ...
    Concurrent carts: (ipad: 1, iphone: 1), (barbie: 1, iphone: 1, cabbage-doll: 1)

    select for update ...

    - lock duration = f(shoppingcart.length)
    - if too many locks on the rows, then lock table!
  42. Transactions with Manual Compensation

    Code "do", "undo" and the chain

    Warehouse stocks (diagram): 231, 264, 2, 637, 121, 311, 12
    Place order: canon-eos: 1, ipod: 1, headphone: 1, iphone: 1, ...

    Each product in the cart carries the same DO / UNDO pair (stock decremented by 1 at each step):

    DO:   if (stock - quantity >= 0) { stock = stock - quantity; } else { throw exception(); }
    UNDO: stock = stock + quantity;
  43. Transactions with Manual Compensation

    Warehouse stocks (diagram): 231, 264, 0, 637, 121, 311, 12
    Place order: barbie: 1, iphone: 1, cabbage-doll: 1
    Each product carries the DO / UNDO pair shown on slide 42; the diagram shows -1 applied to the stocks
  44. Transactions with Manual Compensation

    Warehouse stocks (diagram): 231, 264, 0, 636, 121, 311, 12
    Place order: barbie: 1, iphone: 1, cabbage-doll: 1
    Each product carries the DO / UNDO pair shown on slide 42; the diagram shows -1 applied to the stocks
  45. Transactions with Manual Compensation

    Warehouse stocks (diagram): 231, 264, 0, 636, 121, 311, 12
    Place order: barbie: 1, iphone: 1, cabbage-doll: 1
    Each product carries the DO / UNDO pair shown on slide 42
    - no more iphone!
  46. Transactions with Manual Compensation

    Warehouse stocks (diagram): 231, 264, 0, 636, 121, 311, 12
    Place order: barbie: 1, iphone: 1, cabbage-doll: 1
    Each product carries the DO / UNDO pair shown on slide 42
    - chain interrupted, order cancelled
  47. Transactions with Manual Compensation

    Warehouse stocks (diagram): 231, 264, 0, 636, 121, 311, 12
    Place order: barbie: 1, iphone: 1, cabbage-doll: 1
    Each product carries the DO / UNDO pair shown on slide 42
    - chain interrupted, order cancelled
    - undo: +1 applied to the stock that was already decremented
  48. Transactions with Manual Compensation

    • Code "do" & "undo" & chain execution
    • What about interrupted chain execution? Data corruption?
  49. Transactions with Manual Compensation

    • Code "do" & "undo" & chain execution
    • What about interrupted chain execution? Data corruption?
      - data store managed transaction chain execution
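The "do", "undo" and chain pattern from the compensation slides can be sketched in plain Java as follows. This is a single-process illustration under stated assumptions: `CompensatingOrder`, `reserve`, `release`, `placeOrder`, and the stock values are hypothetical, and a real data store would also need atomic per-key updates.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory warehouse used to illustrate do/undo chaining.
class CompensatingOrder {
    private final Map<String, Integer> stocks = new HashMap<>();

    CompensatingOrder(Map<String, Integer> initialStocks) {
        stocks.putAll(initialStocks);
    }

    int stockOf(String product) {
        return stocks.getOrDefault(product, 0);
    }

    // DO: take one unit, or fail when the product is out of stock.
    private void reserve(String product) {
        int stock = stockOf(product);
        if (stock - 1 < 0) {
            throw new IllegalStateException("no more " + product);
        }
        stocks.put(product, stock - 1);
    }

    // UNDO: give the unit back.
    private void release(String product) {
        stocks.put(product, stockOf(product) + 1);
    }

    // Runs the DO steps in order; on failure, runs UNDO for the completed steps in reverse order.
    boolean placeOrder(List<String> products) {
        Deque<String> done = new ArrayDeque<>();
        try {
            for (String product : products) {
                reserve(product);
                done.push(product);
            }
            return true;
        } catch (IllegalStateException outOfStock) {
            while (!done.isEmpty()) {
                release(done.pop());
            }
            return false;
        }
    }

    public static void main(String[] args) {
        // Illustrative stock values, not the ones from the diagrams.
        CompensatingOrder warehouse =
                new CompensatingOrder(Map.of("barbie", 5, "iphone", 0, "cabbage-doll", 3));
        boolean placed = warehouse.placeOrder(List.of("barbie", "iphone", "cabbage-doll"));
        System.out.println(placed + ", barbie stock = " + warehouse.stockOf("barbie"));
    }
}
```

The sketch does not answer the question raised on the slide: if the process dies between a DO and its UNDO, the stock stays decremented, which is the argument for letting the data store manage the chain execution.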
  50. Key-Value Store

    • Get and Set by key
      - Simple but enough for a lot of use cases
    • Riak and Voldemort provide great scalability
      - Great to persist continuously growing datasets
    • Memcached and Redis offer low overhead and latency
      - Great for cache and live data
  51. Column Families Store

    • Get and Set by key of a list of columns
      - Makes it possible to fetch and update partial data
    • Queries are simple, but column slice fetching is possible
      - Great for pagination
    • Data model is too low level for many complex data modeling needs
      - Should typically be used for the largest scalability needs
  52. Document Store

    • Schema-less
      - Great for continuously updated schemas
    • Complex queries are available
      - Necessary for filtering and search
    • Scalability may be limited if not querying using the partition key
      - Can be handled using multiple storage and limited queries
  53. In Memory Data Grid

    • Very Low Latency & eXtreme Transaction Processing (XTP)
      - Investment banking, booking & inventory systems
    • In Memory - No Persistence
      - Most of the time backed with a database
    • High budget and Developer skills required
      - Some Open Source alternatives are appearing
  54. Polyglot storage for eCommerce Application

    Stores: Solr, MongoDB, Cassandra, Coherence
    Data: Products search, Warehouse inventory, Product catalog, User account and Shopping cart
  55. Why NoSQL & DataGrids matter?

    • Polyglot Storage: databases that fit the needs of every type of data
    • Linear Scalability: being able to handle any further business requirements
    • High Availability: multi-servers and multi-datacenters
    • Elasticity: natural integration with Cloud Computing philosophy
    • Some new use cases now available