Slide 1

Slide 1 text

Index management in shallow depth Andrea Giuliano @bit_shark

Slide 2

Slide 2 text

Architecture of a DBMS Andrea Giuliano @bit_shark

Slide 3

Slide 3 text

Disk manager The disk manager provides the following commands for a page • allocate a page • deallocate a page • read a page • write a page The size of a page is chosen to be the size of a disk block and pages are stored as disk blocks so that reading or writing a page can be done in one disk I/O Andrea Giuliano @bit_shark

Slide 4

Slide 4 text

Buffer manager is the software layer responsible for bringing pages from disk to main memory as needed and manages the available main memory by partitioning it into a collection of pages the buffer pool Andrea Giuliano @bit_shark

Slide 5

Slide 5 text

Mysql indexes are stored! in B-trees

Slide 6

Slide 6 text

B-tree structure data entry: record stored in an index file data record: record stored in a database file Andrea Giuliano @bit_shark

Slide 7

Slide 7 text

Clustered index data entries and data records have the same (or close) order Andrea Giuliano @bit_shark

Slide 8

Slide 8 text

Unclustered index data entries and data records have the different order criteria Andrea Giuliano @bit_shark

Slide 9

Slide 9 text

Dense index An index is dense if every value of the search key that appears in the data file appears also in at least one data entry of the index Andrea Giuliano @bit_shark

Slide 10

Slide 10 text

An index is sparse if every value of the search key that appears in the data entry points to a page of the data record Sparse index Andrea Giuliano @bit_shark

Slide 11

Slide 11 text

Search in a b-tree Cost: logF n fan-out n leaves Andrea Giuliano @bit_shark

Slide 12

Slide 12 text

Insert and delete ..too deep Insert and delete operations must keep the tree balanced towards split, redistribution and coalesce techniques. Andrea Giuliano @bit_shark

Slide 13

Slide 13 text

How can I compute I/O accesses? Andrea Giuliano @bit_shark

Slide 14

Slide 14 text

ask for code, author, publisher af all books with a given cost how many page accesses do we need to answer to the query? Book case Query: • 2.000.000 records (tuples) • 200.000 pages • 10 data record in a page • 200 records with the same value of the attribute cost (on average) • dense non-clustering B+-tree index with search key cost BOOK code author cost publisher Andrea Giuliano @bit_shark

Slide 15

Slide 15 text

Let’s build the index structure ! • we know that each tuple has 4 field so in each page there are 40 fields • we can infer that 20 data entries fit in one leaf page of the index • so we have a fan-out of 20 Book case • 2.000.000 records (tuples) • 200.000 pages • 10 data record in a page • 200 records with the same value of the attribute cost (on average) • dense non-clustering B+-tree index with search key cost BOOK code author cost publisher Andrea Giuliano @bit_shark

Slide 16

Slide 16 text

We know there is an occupancy factor of 67% we have 13 data entries in the leaves (each of which can contain 20 data entries) How many leaves are there in the tree?
 2.000.000/13 = 153.846 leaves Book case … fan-out: 20 Andrea Giuliano @bit_shark

Slide 17

Slide 17 text

There are 153.846 leaves In order to go to the leaves we need Book case log20 (153.846) = 4 I/O page accesses … fan-out: 20 Andrea Giuliano @bit_shark

Slide 18

Slide 18 text

Book case Remember, we have on average 200 records with the same value of the attribute cost therefore 200/13 = 15 pages (on average)
 We need to visit these leaves because the index is dense and for each tuple we have to access the 200 data record in order to obtain the other attributes … fan-out: 20 Andrea Giuliano @bit_shark

Slide 19

Slide 19 text

The total cost is: 
 4 + 15 + 200 = 219 I/O accesses Book case ~ 3 sec … fan-out: 20 Andrea Giuliano @bit_shark

Slide 20

Slide 20 text

• 2.000.000 records (tuples) • 200.000 pages • 10 data record in a page • 200 records with the same value of the attribute cost (on average) • sparse clustering B+-tree index with search key cost ask for code, author, publisher af all books with a given cost how many page accesses do we need to answer to the query? BOOK code author cost publisher Book case Query: Andrea Giuliano @bit_shark

Slide 21

Slide 21 text

Let’s build the index structure ! • we know that each tuple has 4 field so in each page there are 40 fields • we can infer that 20 data entries fit in one leaf page of the index • so we have a fan-out of 20 Book case • 2.000.000 records (tuples) • 200.000 pages • 10 data record in a page • 200 records with the same value of the attribute cost (on average) • sparse clustering B+-tree index with search key cost BOOK code author cost publisher Andrea Giuliano @bit_shark

Slide 22

Slide 22 text

We know there is an occupancy factor of 67% we have 13 data entries in the leaves (each of which can contain 20 data entries)
 BUT each data entry points to a data record page (and not to a tuple) How many data record pages do we have?
 2.000.000/10 = 200.000 data record pages Book case … fan-out: 20 Andrea Giuliano @bit_shark

Slide 23

Slide 23 text

We know there is an occupancy factor of 67% we have 13 data entries in the leaves (each of which can contain 20 data entries)
 BUT each data entry points to a data record (and not to a tuple) How many leaves there are in the tree?
 200.000/13 = 15.384 leaves Book case … fan-out: 20 Andrea Giuliano @bit_shark

Slide 24

Slide 24 text

There are 15.384 leaves In order to go to the leaves we need Book case log20 (15.384) = 3 I/O pages accesses Remember, we have on average 200 data records with the same value of the attribute cost therefore 200/10 = 20 data record pages to visit … fan-out: 20 Andrea Giuliano @bit_shark

Slide 25

Slide 25 text

The total cost is: 
 3 + 20 = 23 I/O accesses Book case ~ 0.3 sec … fan-out: 20 Andrea Giuliano @bit_shark

Slide 26

Slide 26 text

And what if the attributes we want
 were part of the search key? Book case Andrea Giuliano @bit_shark

Slide 27

Slide 27 text

In the worst case we have to visit all the 2.000.000 tuples Book case without index ~ 50 min Andrea Giuliano @bit_shark

Slide 28

Slide 28 text

Ο λογος δηλοι οτι Think before doing

Slide 29

Slide 29 text

? Thanks! Andrea Giuliano @bit_shark

Slide 30

Slide 30 text

References: https://www.flickr.com/photos/james_wheeler/9340597900/sizes/o/in/photolist-fep1ko-bQByHk- duQ4Qr-82aKA9-82aL6y-8Tn6uc-iPzADZ-99etoQ-cZy6e9-jyqdnW-bxHjLf-8gP59X-cZDq3h-cZDq9d- cZDq8N-cZDq7d-cZDqwy-cZDqym-cZDqCf-cZDqAo-cZDqsS-cZDqnQ-cZDqey-cZDqkA-cZDqkJ- cZDqLh-7Dg8pp-a7f1QC-a7c8rK-7Dg7n6-gCbBVr-9FZ4J1-e6XCpX-aZnsGv-ecTv5D-atFACM- gjXozL-9LBjtC-knoEf8-8LGGqw-a8Hw3M-gvL3bp-a7gmG6-aju6p2- brQ76S-7Ckbm1-85XaXe-8JBcwN-9oYU3p-a3VsvR-atFAup/ http://www.woking.gov.uk/images/instances/00004A290FD4.C0A801BA.000079A7.0015.jpg http://assets.20bits.com/20080513/b-tree.png, http://dblab.cs.toronto.edu/courses/443/2014/basic-index/dense-index.png http://dblab.cs.toronto.edu/courses/443/2014/basic-index/sparse-index.png http://www.geeky-gadgets.com/wp-content/uploads/2008/10/insert-delete_cufflinks.jpg, http:// www.geeky-gadgets.com/wp-content/uploads/2008/10/insert-delete_cufflinks.jpg http://webhostinggeeks.com/blog/wp-content/uploads/2012/07/611157_small.jpg https://farm7.staticflickr.com/6237/6230474283_50d1f0f4ac_b.jpg, https://www.flickr.com/photos/ javiercosio/6230474283/sizes/l/in/photolist- auyNWp-9GRjnM-9GRjmZ-9GRjFz-9GRjvV-9GUcPq-9GRjz8-9GRjm4-9GUcTb-9GRjCt-9GUcQC-9GUcY m-9GRjZD-9GRk1Z-9GUcMU-9GUcGh-9GRjsg-9GRjYZ-9GRjTF-9GRjGe-9GRjNe-9GRjBz-9GUcNs-9GU d25-9GUcKS-9GRjPn-9GRjRg-9GUcBh-9GUcVf-9GUcxj-9GUcuu-brLe7G-e8s4Cw- fyi4Rj-83LyYW-83HuFg-83LyLm-83LzUY-83Htrv-83Hv2H-83LBBb-83LAg9-83LBhQ-83Hw8t-83HtKD-83H sYk-afT6uk-cwVhL1-ceVgGC-8tFezr-8SeW9d/