Elegant and Efficient Database Design

by Becky Sweger

Slide 1

Slide 1 text

Tim Allen Becky Sweger

Slide 2

Slide 2 text

No content

Slide 3

Slide 3 text

 Joshua  Jana  John-David  Jill  Jessa  Jinger  Josiah  Joy-Anna  Jedidiah  Jeremiah  Jason  James  Justin  Jackson  Johannah  Jennifer

Slide 4

Slide 4 text

 Pick one and enforce it: camelCase, PascalCase, under_scores  Why?  Cleaner code  Logical joins  Sanity of future developers and future you  Explicitly name constraints  Avoid keywords as column names  WRDS Fail: group, client, school, subscriber, and institution are all used for the same data entity

Slide 5

Slide 5 text

No content

Slide 6

Slide 6 text

The key, the whole key, and nothing but the key. So help me Codd.

Slide 7

Slide 7 text

http://en.wikipedia.org/wiki/File:Insertion_anomaly.svg

Slide 8

Slide 8 text

 Avoid data duplication  Let end users make their own changes  Avoid data anomalies  Third-party tools rely on normalized data

Slide 9

Slide 9 text

Find a book in Van Pelt without a card catalog…

Slide 10

Slide 10 text

No content

Slide 11

Slide 11 text

Indexes in SQL Server are organized as B-trees member name clustered index (image from Clustered Indexes vs. Nonclustered Indexes in SQL Server:http://tr.im/AeU5)

Slide 12

Slide 12 text

member id non-clustered index (image from Clustered Indexes vs. Nonclustered Indexes in SQL Server:http://tr.im/AeU5)

Slide 13

Slide 13 text

 Unique  Full-text  Included columns  Indexed views  XML  Filtered (new for 2008)  Spatial (new for 2008) http://msdn.microsoft.com/en-us/library/ms175049.aspx

Slide 14

Slide 14 text

 Primary key = unique index (clustered or nonclustered)

Slide 15

Slide 15 text

 Clustered index on every table member name non-clustered index (image from Clustered Indexes vs. Nonclustered Indexes in SQL Server:http://tr.im/AeU5)

Slide 16

Slide 16 text

 Integer primary key on every table /* take checkpoint, clear buffers & cache */ SELECT s.term, s.section_id, COUNT(penn_id) FROM flat_section s JOIN flat_enrollment e ON s.section_id = e.section_id AND s.term = e.term GROUP BY s.term, s.section_id

Slide 17

Slide 17 text

 Results

Slide 18

Slide 18 text

No content

Slide 19

Slide 19 text

 Columns you join on: indexed integers are your friend!  How are the columns used in queries?  Cardinality: 1:1, 1:many, many:many  Data type  Indexing multiple columns: moderation  Goal 1: performance!  Goal 2: smallest index file possible.

Slide 20

Slide 20 text

No content

Slide 21

Slide 21 text

 TINY: 8 bits (0 – 255): 01010101  SMALL INTEGER: 16 bits (0 – 65536): 0101010101010101  INTEGER: 32 bits (0 – 16777215): 01010101010101010101010101010101  VARCHAR ‘philadelphia’: 104 bits, at least (encoding UTF-8): 01010101010101010101010101010101010101010101010101010101 010101010101010101010101010101010101010101010101  Consider joining 4 tables on ‘philadelphia’: 4 initial lookups on indexes, 8 times as bulky and less cacheable as a small integer.  Index on VARCHARs only when needed  Keeps index files smaller and less chance of fragmentation; fragmented index files make Matt Frew’s life hellish (no, that is NOT a positive!)  Consider purpose: don’t index for a one time script or report

Slide 22

Slide 22 text

No content

Slide 23

Slide 23 text

 Large table: millions of rows  A record for each WRDS query  VARCHAR columns that should be INTEGERS  Report requests for subscribers asking number of queries for a date range group by library and file  Before indexing, full table scan: 30 secs per query  Index added: subscriber, query date, library, file  After indexing, without table scan: 0.02 secs per query

Slide 24

Slide 24 text

 From the Query menu:  SET STATISTICS TIME, SET STATISTICS IO  Include Actual Execution Plan

Slide 25

Slide 25 text

 Dynamic Management Views & Functions: http://msdn.microsoft.com/en-us/magazine/cc135978.aspx

Slide 26

Slide 26 text

 Analyzes workloads

Slide 27

Slide 27 text

Blame the SQL Admins!

Slide 28

Slide 28 text

 Don’t skimp on column length: Yes/No? Maybe. Open/Closed? Under construction. Black/White? Grey.  Know required level of precision, and leave yourself room to grow.  Accuracy to the day, minute, millisecond?

Slide 29

Slide 29 text

Logical deletes vs. physical deletes

Slide 30

Slide 30 text

 Foreign Keys  Unique indexes  Check constraints  Default constraints  Triggers image courtesy of My New Filing Technique is Unstoppable by David Rees: http://www.mnftiu.cc/2002/11/26/filing-009/(nsfw)

Slide 31

Slide 31 text

No content

Slide 32

Slide 32 text

 “Temporary” projects  Balance between today’s pragmatism and tomorrow’s pain  Code review sooner

Slide 33

Slide 33 text

Experiment in SQL Server Management Studio to improve your times and execution plans

Slide 34

Slide 34 text

 Solicit feedback on database design before coding starts, not after.  Ask for opinions, and share your opinions!  More eyes = better database design  More ideas = better database design  Anyone have any tips… or questions?

Slide 35

Slide 35 text

1. SQL Server Books Online http://msdn.microsoft.com/en-us/library/ms130214.aspx 2. SQL Server 2008 Query Performance Turning Distilled by Grant Fritchey and Sajal Dam 3. Comparing Tables Organized with Clustered Indexes versus Heaps http://technet.microsoft.com/en-us/library/cc917672.aspx 4. MS Index Design Guidelines http://msdn.microsoft.com/en-us/library/ms191195.aspx 5. Clustered Indexes vs. Nonclustered Indexes in SQL Server http://digcode.com/default.aspx?page=ed51cde3-d979-4daf-afae-fa6192562ea9&article=443e9774-7d26-422d-a2f1- dbcafbb1e1fc&pc=5 6. SQL Server Execution Plans (free e-book, registration required) http://www.sqlservercentral.com/articles/books/65831/ 7. Uncover Hidden Data to Optimize Application Performance http://msdn.microsoft.com/en-us/magazine/cc135978.aspx 8. My New Filing Technique is Unstoppable (NSFW) http://www.mnftiu.cc/2002/11/26/filing-001/ 9. SQL in the Wild http://sqlinthewild.co.za/ 10. Journey to SQL Authority With Pinal Dave http://blog.sqlauthority.com/