Slide 1

Slide 1 text

I N D E X M A N A G E M E N T I N D E P T H A n d re a G i u l i a n o @ b i t _ s h a r k

Slide 2

Slide 2 text

D ATA B A S E M A N A G E M E N T S Y S T E M S

Slide 3

Slide 3 text

T H E A R C H I T E C T U R E D A TA B A S E M A N A G E M E N T S Y S T E M S web forms Application Front Ends SQL Interface SQL COMMANDS Plan Executor Parser Optimizer Operator Evaluator Access File Manager SQL Engine Buffer Manager Disk Manager Transaction Manager Lock Manager Recovery Manager Concurrency Control DBMS

Slide 4

Slide 4 text

S Q L E N G I N E A R C H I T E C T U R E O F A D B M S web forms Application Front Ends SQL Interface SQL COMMANDS Plan Executor Parser Optimizer Operator Evaluator Access File Manager SQL Engine Buffer Manager Disk Manager Transaction Manager Lock Manager Recovery Manager Concurrency Control DBMS

Slide 5

Slide 5 text

A C C E S S F I L E M A N A G E R A R C H I T E C T U R E O F A D B M S web forms Application Front Ends SQL Interface SQL COMMANDS Plan Executor Parser Optimizer Operator Evaluator Access File Manager SQL Engine Buffer Manager Disk Manager Transaction Manager Lock Manager Recovery Manager Concurrency Control DBMS

Slide 6

Slide 6 text

B U F F E R M A N A G E R A R C H I T E C T U R E O F A D B M S web forms Application Front Ends SQL Interface SQL COMMANDS Plan Executor Parser Optimizer Operator Evaluator Access File Manager SQL Engine Buffer Manager Disk Manager Transaction Manager Lock Manager Recovery Manager Concurrency Control DBMS

Slide 7

Slide 7 text

D I S K M A N A G E R A R C H I T E C T U R E O F A D B M S web forms Application Front Ends SQL Interface SQL COMMANDS Plan Executor Parser Optimizer Operator Evaluator Access File Manager SQL Engine Buffer Manager Disk Manager Transaction Manager Lock Manager Recovery Manager Concurrency Control DBMS

Slide 8

Slide 8 text

U N I T O F I N F O R M AT I O N Page size ranges from 2Kb to 64Kb I/O of pages dominates the cost of the operations

Slide 9

Slide 9 text

F I L E O R G A N I Z AT I O N method of arranging the records in a file

Slide 10

Slide 10 text

–Sars, Roebuck, and Co, Consumers’ Guide, 1897 “If you don’t find it in the index, look very carefully through the entire catalog.”

Slide 11

Slide 11 text

I N D E X E S organize data records on disk to optimize retrieval operations

Slide 12

Slide 12 text

I N D E X E S data entry record stored in an index file data record record stored in a database file

Slide 13

Slide 13 text

P R O P E R T I E S O F A N I N D E X 1. Structure of data entries 2. Clustered/unclustered 3. Primary/secondary 4. Dense/sparse 5. Organization of the index

Slide 14

Slide 14 text

S T R U C T U R E O F D ATA E N T R I E S k*: data entry whose search key value is k 1. k* is a data record -extreme case 2. k* is a pair (k, rid) -the index file is independent from the data file 3. k* is a pair -the index file is independent from the data file -better use of space but variable-length data entries P R O P E R T I E S O F A N I N D E X

Slide 15

Slide 15 text

C L U S T E R E D / U N C L U S T E R E D I N D E X unclustered clustered data record file index file data records data entries data entries index entries data records P R O P E R T I E S O F A N I N D E X

Slide 16

Slide 16 text

P E R F O R M A N C E data record file index file data records data entries Clustered only few pages have to be retrieved Unclustered as many data page I/Os as the number of data entries C L U S T E R E D / U N C L U S T E R E D I N D E X

Slide 17

Slide 17 text

P R I M A RY A N D S E C O N D A RY I N D E X P R O P E R T I E S O F A N I N D E X

Slide 18

Slide 18 text

10 20 30 40 10 20 30 40 page pointers page in index file page in data file P R O P E R T I E S O F A N I N D E X D E N S E I N D E X

Slide 19

Slide 19 text

10 30 50 70 10 20 30 40 50 60 70 80 page pointers page in index file page in data file P R O P E R T I E S O F A N I N D E X S PA R S E I N D E X

Slide 20

Slide 20 text

P R O P E R T I E S O F A N I N D E X O R G A N I Z AT I O N O F T H E I N D E X sorted index the index is a sorted file tree-based index the index is a tree hash-based index the index is a function from search key values to record addresses

Slide 21

Slide 21 text

I S A M A N D B + - T R E E T R E E - B A S E D I N D E X ISAM used when the relation is static: no insertion and deletion on the tree b+-tree effective in dynamic situations with insertion and deletion

Slide 22

Slide 22 text

Data Pages Index Pages Overflow Pages Page Allocation in ISAM … … Non-leaf pages Leaf pages Primary pages Overflow page I S A M T R E E - B A S E D I N D E X

Slide 23

Slide 23 text

I S A M : O P E R AT I O N S T R E E - B A S E D I N D E X search identical to b+-tree (more on this soon). insertion find the right position on the tree and write the key (possible overflow pages). deletion remove the entry and the empty overflow page if needed. Leave the empty primary page as it is.

Slide 24

Slide 24 text

B + - T R E E T R E E - B A S E D I N D E X Index entries Data entries Index file

Slide 25

Slide 25 text

12 78 3 9 19 56 86 94 33 44 Daniel, 22, 6003 Ashby, 25, 3000 Basu, 33, 4003 Rossi, 44, 3000 Bianchi, 50, 5004 B + - T R E E T R E E - B A S E D I N D E X … … … … … … … …

Slide 26

Slide 26 text

12 78 3 9 19 56 86 94 33 44 Daniel, 22, 6003 Ashby, 25, 3000 Basu, 33, 4003 Rossi, 44, 3000 Bianchi, 50, 5004 … … … … … … … … B + - T R E E : S E A R C H T R E E - B A S E D I N D E X Start search

Slide 27

Slide 27 text

B + - T R E E : S E A R C H T R E E - B A S E D I N D E X 12 78 3 9 19 56 86 94 33 44 Daniel, 22, 6003 Ashby, 25, 3000 Basu, 33, 4003 Rossi, 44, 3000 Bianchi, 50, 5004 … … … … … … … … Start search A B L1 L2 L3 find all data entries with 24 < key < 50

Slide 28

Slide 28 text

12 78 3 9 19 56 33 44 Daniel, 22, 6003 Ashby, 25, 3000 Basu, 33, 4003 Rossi, 44, 3000 Bianchi, 50, 5004 … … … … … … Start search A B L1 L2 L3 S E A R C H : C O S T T R E E - B A S E D I N D E X f: fanout h: height m: leaves (f h) Cost of a search [logF m]

Slide 29

Slide 29 text

T R E E - B A S E D I N D E X 12 78 3 9 19 56 33 44 Daniel, 22, 6003 Ashby, 25, 3000 Basu, 33, 4003 Rossi, 44, 3000 Bianchi, 50, 5004 … … … … … … Start search A B L1 L2 L3 S E A R C H : C O S T f = 3 h = 3 m = 27 I/Os [log3 27] = 3

Slide 30

Slide 30 text

B + - T R E E : I N S E R T T R E E - B A S E D I N D E X 13 17 24 30 2* 3* 5* 7* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39* Insertion of a data record with search key value 8

Slide 31

Slide 31 text

2* 3* 5* 7* 8* B + - T R E E : I N S E R T T R E E - B A S E D I N D E X … 5 data entry is copied up

Slide 32

Slide 32 text

B + - T R E E : I N S E R T T R E E - B A S E D I N D E X 17 data entry is pushed up 5 13 24 30

Slide 33

Slide 33 text

B + - T R E E : I N S E R T T R E E - B A S E D I N D E X 17 5* 7* 8* 14 * 16 * 19 * 20 * 22 * 24 * 27 * 29 * 33 * 34 * 38 * 39 * 5 13 24 30 2* 3* The resulting tree after the insertion of a data record with search key value 8.

Slide 34

Slide 34 text

B + - T R E E : D E L E T E T R E E - B A S E D I N D E X 17 5* 7* 8* 14 * 16 * 22 24 27 29 33 * 34 * 38 * 39 * 5 13 27 30 2* 3* The resulting tree after deleting entries 19* and 20*

Slide 35

Slide 35 text

No content

Slide 36

Slide 36 text

- 2.000.000 records - 200.000 pages - 10 data record in a page - 200 records with the same value of the attribute cost (on average) - dense unclustered b+-tree index with search key cost - alternative 2 (k*, rid) Book code author cost publisher E X A M P L E D E N S E U N C L U S T E R E D I N D E X

Slide 37

Slide 37 text

SELECT code, author, publisher FROM Book WHERE cost = %cost% - 2.000.000 records - 200.000 pages - 10 data record in a page - 200 records with the same value of the attribute cost (on average) - dense unclustered b+-tree index with search key cost - alternative 2 (k*, rid) Book code author cost publisher E X A M P L E D E N S E U N C L U S T E R E D I N D E X

Slide 38

Slide 38 text

SELECT code, author, publisher FROM Book WHERE cost = %cost% - 2.000.000 records - 200.000 pages - 10 data record in a page - 200 records with the same value of the attribute cost (on average) - dense unclustered b+-tree index with search key cost - alternative 2 (k*, rid) Book code author cost publisher how many page accesses do we need to answer to the query? E X A M P L E D E N S E U N C L U S T E R E D I N D E X

Slide 39

Slide 39 text

E X A M P L E D E N S E U N C L U S T E R E D I N D E X - 2.000.000 records - 200.000 pages - 10 data record in a page - 200 records with the same value of the attribute cost (on average) - dense unclustered b+-tree index with search key cost - alternative 2 (k*, rid) Book code author cost publisher Let’s build the index structure each record has 4 field so in each page there are 40 fields 20 data entries fit in one leaf page of the index the tree has a fan-out of 20

Slide 40

Slide 40 text

E X A M P L E D E N S E U N C L U S T E R E D I N D E X fanout: 20 occupancy factor of 67% leads to 13 data entries in the leaves How many leaves are there in the tree? 2.000.000/13 = 153.846

Slide 41

Slide 41 text

log20 (153.846) = 4 I/O page accesses E X A M P L E D E N S E U N C L U S T E R E D I N D E X How many I/Os are needed to go to the leaves? leaves: 153.846 fanout: 20

Slide 42

Slide 42 text

E X A M P L E D E N S E U N C L U S T E R E D I N D E X - 200 records with the same value of the attribute cost (on average) - 13 data entries in the leaves - dense unclustered b+-tree index with search key cost Book code author cost publisher SELECT code, author, publisher FROM Book WHERE cost = %cost% - 15 pages (200/13) to visit for reaching data records with the same cost value (on average)

Slide 43

Slide 43 text

~ 3 sec E X A M P L E D E N S E U N C L U S T E R E D I N D E X Costs path to the leaves: 4 I/Os leaves access: 15 I/Os data records: 200 I/Os Total cost 4 + 15 + 200 = 219 I/Os

Slide 44

Slide 44 text

- 2.000.000 records - 200.000 pages - 10 data record in a page - 200 records with the same value of the attribute cost (on average) - sparse clustered b+-tree index with search key cost - alternative 2 (k*, rid) Book code author cost publisher E X A M P L E S PA R S E C L U S T E R E D I N D E X

Slide 45

Slide 45 text

- 2.000.000 records - 200.000 pages - 10 data record in a page - 200 records with the same value of the attribute cost (on average) - sparse clustered b+-tree index with search key cost - alternative 2 (k*, rid) Book code author cost publisher E X A M P L E S PA R S E C L U S T E R E D I N D E X SELECT code, author, publisher FROM Book WHERE cost = %cost%

Slide 46

Slide 46 text

- 2.000.000 records - 200.000 pages - 10 data record in a page - 200 records with the same value of the attribute cost (on average) - sparse clustered b+-tree index with search key cost - alternative 2 (k*, rid) Book code author cost publisher E X A M P L E S PA R S E C L U S T E R E D I N D E X SELECT code, author, publisher FROM Book WHERE cost = %cost% how many page accesses do we need to answer to the query?

Slide 47

Slide 47 text

- 2.000.000 records - 200.000 pages - 10 data record in a page - 200 records with the same value of the attribute cost (on average) - sparse clustered b+-tree index with search key cost - alternative 2 (k*, rid) Book code author cost publisher E X A M P L E S PA R S E C L U S T E R E D I N D E X Let’s build the index structure each record has 4 field so in each page there are 40 fields 20 data entries fit in one leaf page of the index the tree has a fan-out of 20

Slide 48

Slide 48 text

E X A M P L E S PA R S E C L U S T E R E D I N D E X fanout: 20 How many pages store 2.000.000 data records? 2.000.000/10 = 200.000 each data entry points to a data record page

Slide 49

Slide 49 text

E X A M P L E S PA R S E C L U S T E R E D I N D E X fanout: 20 occupancy factor of 67% leads to 13 data entries in the leaves How many leaves are there in the tree? 200.000/13 = 15.384

Slide 50

Slide 50 text

log20 (15.384) = 3 I/O page accesses E X A M P L E S PA R S E C L U S T E R E D I N D E X How many I/Os are needed to go to the leaves? leaves: 15.384 fanout: 20 fanout: 20

Slide 51

Slide 51 text

E X A M P L E S PA R S E C L U S T E R E D I N D E X - 200 records with the same value of the attribute cost (on average) - 10 data records in a page - sparse clustered b+-tree index with search key cost Book code author cost publisher SELECT code, author, publisher FROM Book WHERE cost = %cost% - 20 pages (200/10) of data records to visit 
 (on average)

Slide 52

Slide 52 text

~ 0.3 sec E X A M P L E S PA R S E C L U S T E R E D I N D E X Costs path to the leaves: 3 I/Os data records: 20 I/Os Total cost 3 + 20 = 23 I/Os

Slide 53

Slide 53 text

E X A M P L E A VA R I A N T What if the attributes were part of the search key? SELECT code, author, publisher FROM Book WHERE cost = %cost%

Slide 54

Slide 54 text

E X A M P L E W I T H O U T I N D E X In the worst case we have to visit 2.000.000 records SELECT code, author, publisher FROM Book WHERE cost = %cost% ~ 50 min

Slide 55

Slide 55 text

No content

Slide 56

Slide 56 text

$ W H O A M I Andrea Giuliano @bit_shark www.andreagiuliano.it

Slide 57

Slide 57 text

joind.in/13333 Please rate the talk!

Slide 58

Slide 58 text

Ramakrishan, Gehrke “Database Management Systems” Assets: https://farm4.staticflickr.com/3577/3492185538_a39dbb4511_b_d.jpg https://farm3.staticflickr.com/2852/10740309163_12f6a671cc_k_d.jpg https://farm3.staticflickr.com/2456/3835365695_5e515a3492_b_d.jpg https://farm5.staticflickr.com/4048/4332381194_3cfbed7f8e_b_d.jpg https://farm5.staticflickr.com/4117/4800819674_3cf963deaa_b_d.jpg https://farm5.staticflickr.com/4141/4772464179_0672159bbd_b.jpg https://farm9.staticflickr.com/8530/8574154090_bd14f9ccbf_o_d.jpg https://farm8.staticflickr.com/7370/10847923014_2b3fc30cea_k_d.jpg R E F E R E N C E S