• High availability
• Fault tolerance
• Scalability on commodity hardware

Similar needs for Web giants:
- Amazon: created Dynamo; < 40 min of unavailability per year
- Google: created BigTable & MapReduce; stores every web page of the Internet

Sunday, April 15, 12
Process order, Prepare, Send
- Requires high availability; a key-value store is enough
- Requires complex queries; temporary unavailability is acceptable
queries & transactions
• Scalability
• Data consistency

Needs within the financial market:
- Released Coherence in 2001; started as a distributed cache
- Released GigaSpaces XAP in 2001; routes the request to the data
Find the root entity and denormalize
- Root entity: Train (code, type), the root of a partitioning-ready entity tree
- Reference data: TrainStation (code, name), duplicated in each partition
- Passenger (name)
driven
• With partitioning, data modeling has to be adapted to the requests
• NoSQL & data grid data modeling is request driven

Adaptation to requests comes with tuning
Because network latency matters
Two requests may require storing the same data twice
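The partitioning idea from these slides can be sketched in a few lines. This is a minimal in-memory illustration, not how any particular product works: the class `PartitionRouter` and its methods are hypothetical names, hashing the root key is one assumed routing strategy among others, and entity trees are stored as plain strings to keep the sketch short.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: route each denormalized entity tree by its root key,
// and copy reference data into every partition.
class PartitionRouter {
    // One map per partition, holding entity trees keyed by root entity key.
    private final List<Map<String, String>> entityTrees = new ArrayList<>();
    // One map per partition, holding the duplicated reference data.
    private final List<Map<String, String>> referenceData = new ArrayList<>();

    PartitionRouter(int partitionCount) {
        for (int i = 0; i < partitionCount; i++) {
            entityTrees.add(new HashMap<>());
            referenceData.add(new HashMap<>());
        }
    }

    // Assumed routing rule: hash of the root key modulo the partition count.
    int partitionFor(String rootKey) {
        return Math.floorMod(rootKey.hashCode(), entityTrees.size());
    }

    // The whole tree (root plus sub entities) lives in a single partition.
    void storeTree(String rootKey, String denormalizedTree) {
        entityTrees.get(partitionFor(rootKey)).put(rootKey, denormalizedTree);
    }

    // A lookup by root key touches exactly one partition.
    String loadTree(String rootKey) {
        return entityTrees.get(partitionFor(rootKey)).get(rootKey);
    }

    // Reference data is written to every partition so lookups stay local.
    void storeReference(String key, String value) {
        for (Map<String, String> partition : referenceData) {
            partition.put(key, value);
        }
    }

    // Counts how many partitions hold a copy of the given reference entry.
    int referenceCopies(String key) {
        int copies = 0;
        for (Map<String, String> partition : referenceData) {
            if (partition.containsKey(key)) {
                copies++;
            }
        }
        return copies;
    }

    public static void main(String[] args) {
        PartitionRouter router = new PartitionRouter(4);
        router.storeTree("TGV-6201", "Train{code=TGV-6201, seats=[...]}");
        router.storeReference("PAR", "TrainStation{code=PAR, name=Paris}");
        System.out.println(router.loadTree("TGV-6201"));
        System.out.println(router.referenceCopies("PAR"));
    }
}
```

The trade-off is the one the slide names: reads by root key stay on a single node, at the price of storing reference data once per partition.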
userProfileObj = new RiakObject("bucket", "johndoe", serializer.serialize(userProfile));
riak.store(userProfileObj);

Inserts a user profile into Riak
"price": 559.0, "vendor": "Apple", "rating": 4.6, "tags": [ "phone", "touch" ] } item_1 The database is aware of document’s fields and can offers complex queries Sunday, April 15, 12
DB db = mongo.getDB("Ecommerce");
DBCollection catalog = db.getCollection("Catalog");
BasicDBObject doc = new BasicDBObject();
doc.put("name", "Iphone");
doc.put("price", 559.0);
catalog.insert(doc);

Inserts an item document into MongoDB
new BasicDBObject("$lt", 600));
DBCursor cursor = catalog.find(query);
while (cursor.hasNext()) {
    System.out.println(cursor.next());
}

Queries for all items with a price lower than 600
sub entities can have cross relations

@Entity(schemaRoot=true)
public class Train {
    @Id String code;
    @Index @Basic String name;
    @OneToMany(cascade=CascadeType.ALL)
    List<Seat> seats = new ArrayList<Seat>();
    @Version int version;
    ...
}

Entities: Train (code, type), TrainStop (date), Seat (number, price, booked)
Place order: cancel the order if one product is missing
[Diagram: three shopping carts, (ipod: 1, headphone: 1, iphone: 1, ...), (ipad: 1, iphone: 1) and (barbie: 1, iphone: 1, cabbage-doll: 1), with concurrency on iphone; figures shown: 121, 311, 12]
Place order
[Diagram: three shopping carts, (ipod: 1, headphone: 1, iphone: 1, ...), (ipad: 1, iphone: 1) and (barbie: 1, iphone: 1, cabbage-doll: 1); figures shown: 121, 311, 12]

begin
  for each shoppingCart.product
    select for update ...
    update ...
commit

lock duration = f(shoppingcart.length)
If there are too many locks on the rows, then the table is locked!
Place order
[Diagram: three shopping carts, (ipod: 1, headphone: 1, iphone: 1, ...), (ipad: 1, iphone: 1) and (barbie: 1, iphone: 1, cabbage-doll: 1); figures shown: 121, 311, 12]

select for update ...

lock duration = f(shoppingcart.length)
If there are too many locks on the rows, then the table is locked!
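The "place order" transaction can be approximated in memory to show why lock duration grows with the cart length. This is a sketch under stated assumptions, not a database implementation: `Inventory` and its methods are hypothetical names, one `ReentrantLock` per product stands in for a row lock taken by `select for update`, and acquiring locks in sorted product order is an added precaution against deadlocks that the slides do not mention.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: all-or-nothing order placement with one lock per product.
class Inventory {
    private final Map<String, Integer> stock = new ConcurrentHashMap<>();
    private final Map<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    void setStock(String product, int quantity) {
        stock.put(product, quantity);
    }

    int stockOf(String product) {
        return stock.getOrDefault(product, 0);
    }

    // Returns true if the order was placed, false if it was cancelled
    // because at least one product is missing.
    boolean placeOrder(Map<String, Integer> cart) {
        // Sorted order so that two concurrent carts lock products consistently.
        List<String> products = new ArrayList<>(cart.keySet());
        Collections.sort(products);
        List<ReentrantLock> held = new ArrayList<>();
        try {
            // "select for update": one lock per cart line, held until the end,
            // so the time spent holding locks grows with the cart length.
            for (String product : products) {
                ReentrantLock lock = locks.computeIfAbsent(product, k -> new ReentrantLock());
                lock.lock();
                held.add(lock);
            }
            // Cancel the whole order if any product is short.
            for (String product : products) {
                if (stockOf(product) < cart.get(product)) {
                    return false;
                }
            }
            // "update": decrement every line only once all checks passed.
            for (String product : products) {
                stock.put(product, stockOf(product) - cart.get(product));
            }
            return true;
        } finally {
            // "commit": release every lock that was acquired.
            for (ReentrantLock lock : held) {
                lock.unlock();
            }
        }
    }

    public static void main(String[] args) {
        Inventory inventory = new Inventory();
        inventory.setStock("iphone", 1);
        inventory.setStock("ipad", 3);
        Map<String, Integer> cart = new ConcurrentHashMap<>();
        cart.put("ipad", 1);
        cart.put("iphone", 1);
        System.out.println(inventory.placeOrder(cart));
        System.out.println(inventory.placeOrder(cart));
    }
}
```

Every cart that contains a popular product such as the iphone queues on the same lock, which is the contention the diagram illustrates.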
and Voldemort provide great scalability
• Memcached and Redis offer low overhead and latency

Simple, but enough for a lot of use cases
Great for persisting continuously growing datasets
Great for caches and live data
a list of columns
• Queries are simple, but fetching a slice of columns is possible
• The data model is too low level for many complex data modeling needs

Makes it possible to fetch and update partial data
Great for pagination
Should typically be used for the largest scalability needs
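Column slice fetching can be illustrated with a sorted map per row. The sketch below is an in-memory analogy only and does not use any real column store client: `ColumnSliceStore`, `put` and `slice` are hypothetical names, and the assumption is that columns are kept sorted by name, which is what makes a slice (and therefore a page) cheap to read.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical sketch: each row is a sorted list of columns, read slice by slice.
class ColumnSliceStore {
    // Row key -> columns sorted by column name.
    private final Map<String, NavigableMap<String, String>> rows = new HashMap<>();

    void put(String rowKey, String column, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(column, value);
    }

    // Returns up to 'limit' column names of the row, starting at 'startColumn' (inclusive).
    List<String> slice(String rowKey, String startColumn, int limit) {
        List<String> page = new ArrayList<>();
        NavigableMap<String, String> columns = rows.get(rowKey);
        if (columns == null) {
            return page;
        }
        for (String name : columns.tailMap(startColumn, true).keySet()) {
            if (page.size() >= limit) {
                break;
            }
            page.add(name);
        }
        return page;
    }

    public static void main(String[] args) {
        ColumnSliceStore store = new ColumnSliceStore();
        for (int day = 1; day <= 5; day++) {
            store.put("johndoe", "2012-04-0" + day, "event " + day);
        }
        // One page of two columns, starting at 2012-04-02.
        System.out.println(store.slice("johndoe", "2012-04-02", 2));
    }
}
```

To paginate, the caller asks for one column more than the page size and passes the extra column name as the start of the next slice.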
• Scalability may be limited when queries do not use the partition key

Great for continuously updated schemas
Necessary for filtering and search
Can be handled using multiple stores and limited queries
Transaction Processing (XTP)
• In memory: no persistence
• High budget and developer skills required

Investment banking, booking & inventory systems
Most of the time backed by a database
Some open source alternatives are appearing
that fit the needs of every type of data
• Linear scalability: being able to handle any further business requirements
• High availability: multi-server and multi-datacenter
• Elasticity: natural integration with the cloud computing philosophy
• Some new use cases now available