Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Elegant and Efficient Database Design

Becky Sweger
January 10, 2009

Elegant and Efficient Database Design

A presentation that Tim Allen, a colleague, and I gave to Wharton Computing. We cover RDBMS fundamentals such as design, indexing, etc.

Becky Sweger

January 10, 2009
Tweet

More Decks by Becky Sweger

Other Decks in Technology

Transcript

  1.  Joshua  Jana  John-David  Jill  Jessa

     Jinger  Josiah  Joy-Anna  Jedidiah  Jeremiah  Jason  James  Justin  Jackson  Johannah  Jennifer
  2.  Pick one and enforce it: camelCase, PascalCase, under_scores 

    Why?  Cleaner code  Logical joins  Sanity of future developers and future you  Explicitly name constraints  Avoid keywords as column names  WRDS Fail: group, client, school, subscriber, and institution are all used for the same data entity
  3.  Avoid data duplication  Let end users make their

    own changes  Avoid data anomalies  Third-party tools rely on normalized data
  4. Indexes in SQL Server are organized as B-trees member name

    clustered index (image from Clustered Indexes vs. Nonclustered Indexes in SQL Server:http://tr.im/AeU5)
  5.  Unique  Full-text  Included columns  Indexed views

     XML  Filtered (new for 2008)  Spatial (new for 2008) http://msdn.microsoft.com/en-us/library/ms175049.aspx
  6.  Clustered index on every table member name non-clustered index

    (image from Clustered Indexes vs. Nonclustered Indexes in SQL Server:http://tr.im/AeU5)
  7.  Integer primary key on every table /* take checkpoint,

    clear buffers & cache */ SELECT s.term, s.section_id, COUNT(penn_id) FROM flat_section s JOIN flat_enrollment e ON s.section_id = e.section_id AND s.term = e.term GROUP BY s.term, s.section_id
  8.  Columns you join on: indexed integers are your friend!

     How are the columns used in queries?  Cardinality: 1:1, 1:many, many:many  Data type  Indexing multiple columns: moderation  Goal 1: performance!  Goal 2: smallest index file possible.
  9.  TINY: 8 bits (0 – 255): 01010101  SMALL

    INTEGER: 16 bits (0 – 65536): 0101010101010101  INTEGER: 32 bits (0 – 16777215): 01010101010101010101010101010101  VARCHAR ‘philadelphia’: 104 bits, at least (encoding UTF-8): 01010101010101010101010101010101010101010101010101010101 010101010101010101010101010101010101010101010101  Consider joining 4 tables on ‘philadelphia’: 4 initial lookups on indexes, 8 times as bulky and less cacheable as a small integer.  Index on VARCHARs only when needed  Keeps index files smaller and less chance of fragmentation; fragmented index files make Matt Frew’s life hellish (no, that is NOT a positive!)  Consider purpose: don’t index for a one time script or report
  10.  Large table: millions of rows  A record for

    each WRDS query  VARCHAR columns that should be INTEGERS  Report requests for subscribers asking number of queries for a date range group by library and file  Before indexing, full table scan: 30 secs per query  Index added: subscriber, query date, library, file  After indexing, without table scan: 0.02 secs per query
  11.  From the Query menu:  SET STATISTICS TIME, SET

    STATISTICS IO  Include Actual Execution Plan
  12.  Don’t skimp on column length: Yes/No? Maybe. Open/Closed? Under

    construction. Black/White? Grey.  Know required level of precision, and leave yourself room to grow.  Accuracy to the day, minute, millisecond?
  13.  Foreign Keys  Unique indexes  Check constraints 

    Default constraints  Triggers image courtesy of My New Filing Technique is Unstoppable by David Rees: http://www.mnftiu.cc/2002/11/26/filing-009/(nsfw)
  14.  Solicit feedback on database design before coding starts, not

    after.  Ask for opinions, and share your opinions!  More eyes = better database design  More ideas = better database design  Anyone have any tips… or questions?
  15. 1. SQL Server Books Online http://msdn.microsoft.com/en-us/library/ms130214.aspx 2. SQL Server 2008

    Query Performance Turning Distilled by Grant Fritchey and Sajal Dam 3. Comparing Tables Organized with Clustered Indexes versus Heaps http://technet.microsoft.com/en-us/library/cc917672.aspx 4. MS Index Design Guidelines http://msdn.microsoft.com/en-us/library/ms191195.aspx 5. Clustered Indexes vs. Nonclustered Indexes in SQL Server http://digcode.com/default.aspx?page=ed51cde3-d979-4daf-afae-fa6192562ea9&article=443e9774-7d26-422d-a2f1- dbcafbb1e1fc&pc=5 6. SQL Server Execution Plans (free e-book, registration required) http://www.sqlservercentral.com/articles/books/65831/ 7. Uncover Hidden Data to Optimize Application Performance http://msdn.microsoft.com/en-us/magazine/cc135978.aspx 8. My New Filing Technique is Unstoppable (NSFW) http://www.mnftiu.cc/2002/11/26/filing-001/ 9. SQL in the Wild http://sqlinthewild.co.za/ 10. Journey to SQL Authority With Pinal Dave http://blog.sqlauthority.com/