Save 37% off PRO during our Black Friday Sale! »

NoSE: Schema Design for NoSQL Applications

NoSE: Schema Design for NoSQL Applications

Database design is critical for high performance in relational databases and many tools exist to aid application designers in selecting an appropriate schema. While the problem of schema optimization is also highly relevant for NoSQL databases, existing tools for relational databases are inadequate for this setting. Application designers wishing to use a NoSQL database instead rely on rules of thumb to select an appropriate schema. We present a system for recommending database schemas for NoSQL applications. Our cost-based approach uses a novel binary integer programming formulation to guide the mapping from the application's conceptual data model to a database schema.

We implemented a prototype of this approach for the Cassan-dra extensible record store. Our prototype, the NoSQL Schema Evaluator (NoSE) is able to capture rules of thumb used by expert designers without explicitly encoding the rules. Automating the design process allows NoSE to produce efficient schemas and to examine more alternatives than would be possible with a manual rule-based approach.

C1e4202cb329b1e3343a571dff94c68c?s=128

Michael Mior

May 17, 2016
Tweet

Transcript

  1. NoSE: Schema Design for NoSQL Applications Michael J. Mior, Kenneth

    Salem, Ashraf Aboulnaga, Rui Liu
  2. NoSE • NoSQL App Development • Problem Formulation • NoSE

    Design and Implementation • Evaluation
  3. NoSQL • Eventually consistent, horizontally scalable, flexible schema • Many

    different types of NoSQL databases ◦ Document stores ◦ Key-value stores ◦ Graph databases ◦ … ◦ Extensible record stores
  4. Extensible Record Store Data Model CREATE COLUMNFAMILY "ReservationsByGuest"( "GuestID" uuid,

    "ResID" uuid, "ResStartDate" timestamp, "RoomID" uuid, PRIMARY KEY(("GuestID"), "ResStartDate", "ResID", "RoomID") ); Partitioning key Clustering key
  5. Database Application Development 1. Define application requirements 2. Decide on

    a data model for the target system 3. Implement the application according to the model a. Database access b. Application logic
  6. Database Application Development 1. Define application requirements 2. Decide on

    a data model for the target system 3. Implement the application according to the model a. Database access b. Application logic }NoSE
  7. Schema Design Best Practices Source: http://www.ebaytechblog.com/2012/07/16/cassandra-data-modeling-best-practices-part-1/

  8. Schema Design Best Practices Model column families around query patterns

    Source: http://www.ebaytechblog.com/2012/07/16/cassandra-data-modeling-best-practices-part-1/
  9. Schema Design Best Practices Model column families around query patterns

    But start your design with entities and relationships, if you can Source: http://www.ebaytechblog.com/2012/07/16/cassandra-data-modeling-best-practices-part-1/
  10. Schema Design Best Practices Model column families around query patterns

    But start your design with entities and relationships, if you can De-normalize and duplicate for read performance Source: http://www.ebaytechblog.com/2012/07/16/cassandra-data-modeling-best-practices-part-1/
  11. Schema Design Best Practices Model column families around query patterns

    But start your design with entities and relationships, if you can De-normalize and duplicate for read performance But don’t de-normalize if you don’t need to Source: http://www.ebaytechblog.com/2012/07/16/cassandra-data-modeling-best-practices-part-1/
  12. Schema Design Best Practices Model column families around query patterns

    But start your design with entities and relationships, if you can De-normalize and duplicate for read performance But don’t de-normalize if you don’t need to Leverage wide rows for ordering, grouping, and filtering Source: http://www.ebaytechblog.com/2012/07/16/cassandra-data-modeling-best-practices-part-1/
  13. Schema Design Best Practices Model column families around query patterns

    But start your design with entities and relationships, if you can De-normalize and duplicate for read performance But don’t de-normalize if you don’t need to Leverage wide rows for ordering, grouping, and filtering But don’t go too wide Source: http://www.ebaytechblog.com/2012/07/16/cassandra-data-modeling-best-practices-part-1/
  14. Schema Design Example For a given guest, return the cities

    that guest has stayed in CREATE COLUMNFAMILY "CitiesByGuest" ("GuestID" uuid, "City" text, PRIMARY KEY(("GuestID"), "City")); CREATE COLUMNFAMILY "HotelsByGuest" ("GuestID" uuid, "HotelID" uuid, PRIMARY KEY(("GuestID"), "HotelID")); CREATE COLUMNFAMILY "HotelsByID" ("HotelID" uuid, "HotelCity" text, PRIMARY KEY(("HotelID"), "HotelCity"));
  15. NoSE Overview Input Output Conceptual schema Workload Selected column families

    Query implementation plans NoSE
  16. Application Conceptual Model Hotel HotelID HotelName HotelPhone HotelAddress HotelCity HotelState

    HotelZip Room RoomID RoomNumber RoomRate RoomFloor Reservation ResID ResStartDate ResEndDate Guest GuestID GuestName GuestEmail Point of Interest POIID POIName POIDescription Amenity AmenityID AmenityDescription
  17. Application Workload For a given guest, return the cities that

    guest has stayed in SELECT Hotel.HotelCity FROM Hotel.Room.Reservation.Guest WHERE Guest.GuestID = ? Hotel HotelID HotelCity Room RoomID Reservation ResID Guest GuestID
  18. NoSE Architecture NoSE Input Output Candidate Enumeration Query Planning Schema

    Optimization Plan Recommendation Conceptual schema Workload Selected column families Query implementation plans
  19. Query Planning Example SELECT Name FROM Hotel WHERE Hotel.State =

    ‘NY’ AND Hotel.Reservation.Room.Guest.GuestID = ? ORDER BY Name GuestID ↓ RoomID RoomID ↓ HotelID HotelID ↓ Name, State Name State
  20. Schema Optimization Construct a linear program to optimize execution time

    Cost of using column family j to answer query i Use of column family j for query i in the final plan Presence of column family j in final schema Size of column family j
  21. Schema Optimization Add constraints to ensure each query has a

    valid plan Minimize the cost Ensure column families used are present Limit maximum storage space
  22. Updates • Updates make denormalization more expensive • Add statements

    to update conceptual entities • New column families are added to support updates • Costs for updates are added to the linear program
  23. Evaluation • Application defined by the RUBiS online auction benchmark

    • Generate a schema and query plans recommended by NoSE • Two schemas for comparison ◦ Normalized (as much as possible) ◦ Expert-selected
  24. Evaluation - Schema Performance

  25. Conclusion • NoSE automates schema design for NoSQL applications •

    Conforms to best practices without requiring expertise • Schemas are better than those produced manually with an average of 1.8x and up to 125x performance improvement
  26. Questions? git.io/nose-icde