Elegant and Efficient Database Design

Tim Allen Becky Sweger

 Joshua  Jana  John-David  Jill  Jessa
 Jinger  Josiah  Joy-Anna  Jedidiah  Jeremiah  Jason  James  Justin  Jackson  Johannah  Jennifer

 Pick one and enforce it: camelCase, PascalCase, under_scores 
Why?  Cleaner code  Logical joins  Sanity of future developers and future you  Explicitly name constraints  Avoid keywords as column names  WRDS Fail: group, client, school, subscriber, and institution are all used for the same data entity

The key, the whole key, and nothing but the key.
So help me Codd.

http://en.wikipedia.org/wiki/File:Insertion_anomaly.svg

 Avoid data duplication  Let end users make their
own changes  Avoid data anomalies  Third-party tools rely on normalized data

Find a book in Van Pelt without a card catalog…

Indexes in SQL Server are organized as B-trees member name
clustered index (image from Clustered Indexes vs. Nonclustered Indexes in SQL Server:http://tr.im/AeU5)

member id non-clustered index (image from Clustered Indexes vs. Nonclustered
Indexes in SQL Server:http://tr.im/AeU5)

 Unique  Full-text  Included columns  Indexed views
 XML  Filtered (new for 2008)  Spatial (new for 2008) http://msdn.microsoft.com/en-us/library/ms175049.aspx

 Primary key = unique index (clustered or nonclustered)

 Clustered index on every table member name non-clustered index
(image from Clustered Indexes vs. Nonclustered Indexes in SQL Server:http://tr.im/AeU5)

 Integer primary key on every table /* take checkpoint,
clear buffers & cache */ SELECT s.term, s.section_id, COUNT(penn_id) FROM flat_section s JOIN flat_enrollment e ON s.section_id = e.section_id AND s.term = e.term GROUP BY s.term, s.section_id

 Results

 Columns you join on: indexed integers are your friend!
 How are the columns used in queries?  Cardinality: 1:1, 1:many, many:many  Data type  Indexing multiple columns: moderation  Goal 1: performance!  Goal 2: smallest index file possible.

 TINY: 8 bits (0 – 255): 01010101  SMALL
INTEGER: 16 bits (0 – 65536): 0101010101010101  INTEGER: 32 bits (0 – 16777215): 01010101010101010101010101010101  VARCHAR ‘philadelphia’: 104 bits, at least (encoding UTF-8): 01010101010101010101010101010101010101010101010101010101 010101010101010101010101010101010101010101010101  Consider joining 4 tables on ‘philadelphia’: 4 initial lookups on indexes, 8 times as bulky and less cacheable as a small integer.  Index on VARCHARs only when needed  Keeps index files smaller and less chance of fragmentation; fragmented index files make Matt Frew’s life hellish (no, that is NOT a positive!)  Consider purpose: don’t index for a one time script or report

 Large table: millions of rows  A record for
each WRDS query  VARCHAR columns that should be INTEGERS  Report requests for subscribers asking number of queries for a date range group by library and file  Before indexing, full table scan: 30 secs per query  Index added: subscriber, query date, library, file  After indexing, without table scan: 0.02 secs per query

 From the Query menu:  SET STATISTICS TIME, SET
STATISTICS IO  Include Actual Execution Plan

 Dynamic Management Views & Functions: http://msdn.microsoft.com/en-us/magazine/cc135978.aspx

 Analyzes workloads

Blame the SQL Admins!

 Don’t skimp on column length: Yes/No? Maybe. Open/Closed? Under
construction. Black/White? Grey.  Know required level of precision, and leave yourself room to grow.  Accuracy to the day, minute, millisecond?

Logical deletes vs. physical deletes

 Foreign Keys  Unique indexes  Check constraints 
Default constraints  Triggers image courtesy of My New Filing Technique is Unstoppable by David Rees: http://www.mnftiu.cc/2002/11/26/filing-009/(nsfw)

 “Temporary” projects  Balance between today’s pragmatism and tomorrow’s
pain  Code review sooner

Experiment in SQL Server Management Studio to improve your times
and execution plans

 Solicit feedback on database design before coding starts, not
after.  Ask for opinions, and share your opinions!  More eyes = better database design  More ideas = better database design  Anyone have any tips… or questions?

1. SQL Server Books Online http://msdn.microsoft.com/en-us/library/ms130214.aspx 2. SQL Server 2008
Query Performance Turning Distilled by Grant Fritchey and Sajal Dam 3. Comparing Tables Organized with Clustered Indexes versus Heaps http://technet.microsoft.com/en-us/library/cc917672.aspx 4. MS Index Design Guidelines http://msdn.microsoft.com/en-us/library/ms191195.aspx 5. Clustered Indexes vs. Nonclustered Indexes in SQL Server http://digcode.com/default.aspx?page=ed51cde3-d979-4daf-afae-fa6192562ea9&article=443e9774-7d26-422d-a2f1- dbcafbb1e1fc&pc=5 6. SQL Server Execution Plans (free e-book, registration required) http://www.sqlservercentral.com/articles/books/65831/ 7. Uncover Hidden Data to Optimize Application Performance http://msdn.microsoft.com/en-us/magazine/cc135978.aspx 8. My New Filing Technique is Unstoppable (NSFW) http://www.mnftiu.cc/2002/11/26/filing-001/ 9. SQL in the Wild http://sqlinthewild.co.za/ 10. Journey to SQL Authority With Pinal Dave http://blog.sqlauthority.com/

Elegant and Efficient Database Design

Elegant and Efficient Database Design

Becky Sweger

More Decks by Becky Sweger

Other Decks in Technology

Featured

Transcript

Tim Allen Becky Sweger

 Joshua  Jana  John-David  Jill  Jessa

 Pick one and enforce it: camelCase, PascalCase, under_scores 

The key, the whole key, and nothing but the key.

http://en.wikipedia.org/wiki/File:Insertion_anomaly.svg

 Avoid data duplication  Let end users make their

Find a book in Van Pelt without a card catalog…

Indexes in SQL Server are organized as B-trees member name

member id non-clustered index (image from Clustered Indexes vs. Nonclustered

 Unique  Full-text  Included columns  Indexed views

 Primary key = unique index (clustered or non- clustered)

 Clustered index on every table member name non-clustered index

 Integer primary key on every table /* take checkpoint,

 Results

 Columns you join on: indexed integers are your friend!

 TINY: 8 bits (0 – 255): 01010101  SMALL

 Large table: millions of rows  A record for

 From the Query menu:  SET STATISTICS TIME, SET

 Dynamic Management Views & Functions: http://msdn.microsoft.com/en-us/magazine/cc135978.aspx

 Analyzes workloads

Blame the SQL Admins!

 Don’t skimp on column length: Yes/No? Maybe. Open/Closed? Under

Logical deletes vs. physical deletes

 Foreign Keys  Unique indexes  Check constraints 

 “Temporary” projects  Balance between today’s pragmatism and tomorrow’s

Experiment in SQL Server Management Studio to improve your times

 Solicit feedback on database design before coding starts, not

1. SQL Server Books Online http://msdn.microsoft.com/en-us/library/ms130214.aspx 2. SQL Server 2008