Upgrade to Pro — share decks privately, control downloads, hide ads and more …

NoSE: Schema Design for NoSQL Applications

NoSE: Schema Design for NoSQL Applications

Database design is critical for high performance in relational databases and many tools exist to aid application designers in selecting an appropriate schema. While the problem of schema optimization is also highly relevant for NoSQL databases, existing tools for relational databases are inadequate for this setting. Application designers wishing to use a NoSQL database instead rely on rules of thumb to select an appropriate schema. We present a system for recommending database schemas for NoSQL applications. Our cost-based approach uses a novel binary integer programming formulation to guide the mapping from the application's conceptual data model to a database schema.

We implemented a prototype of this approach for the Cassan-dra extensible record store. Our prototype, the NoSQL Schema Evaluator (NoSE) is able to capture rules of thumb used by expert designers without explicitly encoding the rules. Automating the design process allows NoSE to produce efficient schemas and to examine more alternatives than would be possible with a manual rule-based approach.

Michael Mior

May 17, 2016
Tweet

More Decks by Michael Mior

Other Decks in Research

Transcript

  1. NoSE: Schema Design for
    NoSQL Applications
    Michael J. Mior, Kenneth Salem, Ashraf Aboulnaga, Rui Liu

    View full-size slide

  2. NoSE
    ● NoSQL App Development
    ● Problem Formulation
    ● NoSE Design and Implementation
    ● Evaluation

    View full-size slide

  3. NoSQL
    ● Eventually consistent, horizontally scalable, flexible schema
    ● Many different types of NoSQL databases
    ○ Document stores
    ○ Key-value stores
    ○ Graph databases
    ○ …
    ○ Extensible record stores

    View full-size slide

  4. Extensible Record Store Data Model
    CREATE COLUMNFAMILY "ReservationsByGuest"(
    "GuestID" uuid, "ResID" uuid,
    "ResStartDate" timestamp,
    "RoomID" uuid, PRIMARY KEY(("GuestID"),
    "ResStartDate", "ResID", "RoomID")
    );
    Partitioning key
    Clustering key

    View full-size slide

  5. Database Application Development
    1. Define application requirements
    2. Decide on a data model for the target system
    3. Implement the application according to the model
    a. Database access
    b. Application logic

    View full-size slide

  6. Database Application Development
    1. Define application requirements
    2. Decide on a data model for the target system
    3. Implement the application according to the model
    a. Database access
    b. Application logic
    }NoSE

    View full-size slide

  7. Schema Design Best Practices
    Source: http://www.ebaytechblog.com/2012/07/16/cassandra-data-modeling-best-practices-part-1/

    View full-size slide

  8. Schema Design Best Practices
    Model column families around query patterns
    Source: http://www.ebaytechblog.com/2012/07/16/cassandra-data-modeling-best-practices-part-1/

    View full-size slide

  9. Schema Design Best Practices
    Model column families around query patterns
    But start your design with entities and relationships, if you can
    Source: http://www.ebaytechblog.com/2012/07/16/cassandra-data-modeling-best-practices-part-1/

    View full-size slide

  10. Schema Design Best Practices
    Model column families around query patterns
    But start your design with entities and relationships, if you can
    De-normalize and duplicate for read performance
    Source: http://www.ebaytechblog.com/2012/07/16/cassandra-data-modeling-best-practices-part-1/

    View full-size slide

  11. Schema Design Best Practices
    Model column families around query patterns
    But start your design with entities and relationships, if you can
    De-normalize and duplicate for read performance
    But don’t de-normalize if you don’t need to
    Source: http://www.ebaytechblog.com/2012/07/16/cassandra-data-modeling-best-practices-part-1/

    View full-size slide

  12. Schema Design Best Practices
    Model column families around query patterns
    But start your design with entities and relationships, if you can
    De-normalize and duplicate for read performance
    But don’t de-normalize if you don’t need to
    Leverage wide rows for ordering, grouping, and filtering
    Source: http://www.ebaytechblog.com/2012/07/16/cassandra-data-modeling-best-practices-part-1/

    View full-size slide

  13. Schema Design Best Practices
    Model column families around query patterns
    But start your design with entities and relationships, if you can
    De-normalize and duplicate for read performance
    But don’t de-normalize if you don’t need to
    Leverage wide rows for ordering, grouping, and filtering
    But don’t go too wide
    Source: http://www.ebaytechblog.com/2012/07/16/cassandra-data-modeling-best-practices-part-1/

    View full-size slide

  14. Schema Design Example
    For a given guest, return the cities that guest has stayed in
    CREATE COLUMNFAMILY "CitiesByGuest" ("GuestID" uuid,
    "City" text, PRIMARY KEY(("GuestID"), "City"));
    CREATE COLUMNFAMILY "HotelsByGuest" ("GuestID" uuid,
    "HotelID" uuid, PRIMARY KEY(("GuestID"), "HotelID"));
    CREATE COLUMNFAMILY "HotelsByID" ("HotelID" uuid,
    "HotelCity" text, PRIMARY KEY(("HotelID"), "HotelCity"));

    View full-size slide

  15. NoSE Overview
    Input Output
    Conceptual
    schema
    Workload
    Selected column
    families
    Query
    implementation
    plans
    NoSE

    View full-size slide

  16. Application Conceptual Model
    Hotel
    HotelID
    HotelName
    HotelPhone
    HotelAddress
    HotelCity
    HotelState
    HotelZip
    Room
    RoomID
    RoomNumber
    RoomRate
    RoomFloor
    Reservation
    ResID
    ResStartDate
    ResEndDate
    Guest
    GuestID
    GuestName
    GuestEmail
    Point of Interest
    POIID
    POIName
    POIDescription
    Amenity
    AmenityID
    AmenityDescription

    View full-size slide

  17. Application Workload
    For a given guest, return the cities that guest has stayed in
    SELECT Hotel.HotelCity FROM Hotel.Room.Reservation.Guest
    WHERE Guest.GuestID = ?
    Hotel
    HotelID
    HotelCity
    Room
    RoomID
    Reservation
    ResID
    Guest
    GuestID

    View full-size slide

  18. NoSE Architecture
    NoSE
    Input Output
    Candidate
    Enumeration
    Query Planning
    Schema
    Optimization
    Plan
    Recommendation
    Conceptual
    schema
    Workload
    Selected column
    families
    Query
    implementation plans

    View full-size slide

  19. Query Planning Example
    SELECT Name FROM Hotel WHERE Hotel.State = ‘NY’ AND
    Hotel.Reservation.Room.Guest.GuestID = ? ORDER BY Name
    GuestID

    RoomID
    RoomID

    HotelID
    HotelID

    Name, State
    Name
    State

    View full-size slide

  20. Schema Optimization
    Construct a linear program to optimize execution time
    Cost of using column family j to answer query i
    Use of column family j for query i in the final plan
    Presence of column family j in final schema
    Size of column family j

    View full-size slide

  21. Schema Optimization
    Add constraints to ensure each query has a valid plan
    Minimize the cost
    Ensure column families used are present
    Limit maximum storage space

    View full-size slide

  22. Updates
    ● Updates make denormalization more expensive
    ● Add statements to update conceptual entities
    ● New column families are added to support updates
    ● Costs for updates are added to the linear program

    View full-size slide

  23. Evaluation
    ● Application defined by the RUBiS online auction benchmark
    ● Generate a schema and query plans recommended by NoSE
    ● Two schemas for comparison
    ○ Normalized (as much as possible)
    ○ Expert-selected

    View full-size slide

  24. Evaluation - Schema Performance

    View full-size slide

  25. Conclusion
    ● NoSE automates schema design for NoSQL applications
    ● Conforms to best practices without requiring expertise
    ● Schemas are better than those produced manually with an
    average of 1.8x and up to 125x performance improvement

    View full-size slide

  26. Questions?
    git.io/nose-icde

    View full-size slide