Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Elegant and Efficient Database Design

Avatar for Becky Sweger Becky Sweger
January 10, 2009

Elegant and Efficient Database Design

A presentation that Tim Allen, a colleague, and I gave to Wharton Computing. We cover RDBMS fundamentals such as design, indexing, etc.

Avatar for Becky Sweger

Becky Sweger

January 10, 2009
Tweet

More Decks by Becky Sweger

Other Decks in Technology

Transcript

  1.  Joshua  Jana  John-David  Jill  Jessa

     Jinger  Josiah  Joy-Anna  Jedidiah  Jeremiah  Jason  James  Justin  Jackson  Johannah  Jennifer
  2.  Pick one and enforce it: camelCase, PascalCase, under_scores 

    Why?  Cleaner code  Logical joins  Sanity of future developers and future you  Explicitly name constraints  Avoid keywords as column names  WRDS Fail: group, client, school, subscriber, and institution are all used for the same data entity
  3.  Avoid data duplication  Let end users make their

    own changes  Avoid data anomalies  Third-party tools rely on normalized data
  4. Indexes in SQL Server are organized as B-trees member name

    clustered index (image from Clustered Indexes vs. Nonclustered Indexes in SQL Server:http://tr.im/AeU5)
  5.  Unique  Full-text  Included columns  Indexed views

     XML  Filtered (new for 2008)  Spatial (new for 2008) http://msdn.microsoft.com/en-us/library/ms175049.aspx
  6.  Clustered index on every table member name non-clustered index

    (image from Clustered Indexes vs. Nonclustered Indexes in SQL Server:http://tr.im/AeU5)
  7.  Integer primary key on every table /* take checkpoint,

    clear buffers & cache */ SELECT s.term, s.section_id, COUNT(penn_id) FROM flat_section s JOIN flat_enrollment e ON s.section_id = e.section_id AND s.term = e.term GROUP BY s.term, s.section_id
  8.  Columns you join on: indexed integers are your friend!

     How are the columns used in queries?  Cardinality: 1:1, 1:many, many:many  Data type  Indexing multiple columns: moderation  Goal 1: performance!  Goal 2: smallest index file possible.
  9.  TINY: 8 bits (0 – 255): 01010101  SMALL

    INTEGER: 16 bits (0 – 65536): 0101010101010101  INTEGER: 32 bits (0 – 16777215): 01010101010101010101010101010101  VARCHAR ‘philadelphia’: 104 bits, at least (encoding UTF-8): 01010101010101010101010101010101010101010101010101010101 010101010101010101010101010101010101010101010101  Consider joining 4 tables on ‘philadelphia’: 4 initial lookups on indexes, 8 times as bulky and less cacheable as a small integer.  Index on VARCHARs only when needed  Keeps index files smaller and less chance of fragmentation; fragmented index files make Matt Frew’s life hellish (no, that is NOT a positive!)  Consider purpose: don’t index for a one time script or report
  10.  Large table: millions of rows  A record for

    each WRDS query  VARCHAR columns that should be INTEGERS  Report requests for subscribers asking number of queries for a date range group by library and file  Before indexing, full table scan: 30 secs per query  Index added: subscriber, query date, library, file  After indexing, without table scan: 0.02 secs per query
  11.  From the Query menu:  SET STATISTICS TIME, SET

    STATISTICS IO  Include Actual Execution Plan
  12.  Don’t skimp on column length: Yes/No? Maybe. Open/Closed? Under

    construction. Black/White? Grey.  Know required level of precision, and leave yourself room to grow.  Accuracy to the day, minute, millisecond?
  13.  Foreign Keys  Unique indexes  Check constraints 

    Default constraints  Triggers image courtesy of My New Filing Technique is Unstoppable by David Rees: http://www.mnftiu.cc/2002/11/26/filing-009/(nsfw)
  14.  Solicit feedback on database design before coding starts, not

    after.  Ask for opinions, and share your opinions!  More eyes = better database design  More ideas = better database design  Anyone have any tips… or questions?
  15. 1. SQL Server Books Online http://msdn.microsoft.com/en-us/library/ms130214.aspx 2. SQL Server 2008

    Query Performance Turning Distilled by Grant Fritchey and Sajal Dam 3. Comparing Tables Organized with Clustered Indexes versus Heaps http://technet.microsoft.com/en-us/library/cc917672.aspx 4. MS Index Design Guidelines http://msdn.microsoft.com/en-us/library/ms191195.aspx 5. Clustered Indexes vs. Nonclustered Indexes in SQL Server http://digcode.com/default.aspx?page=ed51cde3-d979-4daf-afae-fa6192562ea9&article=443e9774-7d26-422d-a2f1- dbcafbb1e1fc&pc=5 6. SQL Server Execution Plans (free e-book, registration required) http://www.sqlservercentral.com/articles/books/65831/ 7. Uncover Hidden Data to Optimize Application Performance http://msdn.microsoft.com/en-us/magazine/cc135978.aspx 8. My New Filing Technique is Unstoppable (NSFW) http://www.mnftiu.cc/2002/11/26/filing-001/ 9. SQL in the Wild http://sqlinthewild.co.za/ 10. Journey to SQL Authority With Pinal Dave http://blog.sqlauthority.com/