These slides afford in shallow depth the index management question. There are some example on how your choice can change your relation in terms of I/O accesses.
a page • allocate a page • deallocate a page • read a page • write a page The size of a page is chosen to be the size of a disk block and pages are stored as disk blocks so that reading or writing a page can be done in one disk I/O Andrea Giuliano @bit_shark
from disk to main memory as needed and manages the available main memory by partitioning it into a collection of pages the buffer pool Andrea Giuliano @bit_shark
given cost how many page accesses do we need to answer to the query? Book case Query: • 2.000.000 records (tuples) • 200.000 pages • 10 data record in a page • 200 records with the same value of the attribute cost (on average) • dense non-clustering B+-tree index with search key cost BOOK code author cost publisher Andrea Giuliano @bit_shark
each tuple has 4 field so in each page there are 40 fields • we can infer that 20 data entries fit in one leaf page of the index • so we have a fan-out of 20 Book case • 2.000.000 records (tuples) • 200.000 pages • 10 data record in a page • 200 records with the same value of the attribute cost (on average) • dense non-clustering B+-tree index with search key cost BOOK code author cost publisher Andrea Giuliano @bit_shark
have 13 data entries in the leaves (each of which can contain 20 data entries) How many leaves are there in the tree? 2.000.000/13 = 153.846 leaves Book case … fan-out: 20 Andrea Giuliano @bit_shark
the same value of the attribute cost therefore 200/13 = 15 pages (on average) We need to visit these leaves because the index is dense and for each tuple we have to access the 200 data record in order to obtain the other attributes … fan-out: 20 Andrea Giuliano @bit_shark
record in a page • 200 records with the same value of the attribute cost (on average) • sparse clustering B+-tree index with search key cost ask for code, author, publisher af all books with a given cost how many page accesses do we need to answer to the query? BOOK code author cost publisher Book case Query: Andrea Giuliano @bit_shark
each tuple has 4 field so in each page there are 40 fields • we can infer that 20 data entries fit in one leaf page of the index • so we have a fan-out of 20 Book case • 2.000.000 records (tuples) • 200.000 pages • 10 data record in a page • 200 records with the same value of the attribute cost (on average) • sparse clustering B+-tree index with search key cost BOOK code author cost publisher Andrea Giuliano @bit_shark
have 13 data entries in the leaves (each of which can contain 20 data entries) BUT each data entry points to a data record page (and not to a tuple) How many data record pages do we have? 2.000.000/10 = 200.000 data record pages Book case … fan-out: 20 Andrea Giuliano @bit_shark
have 13 data entries in the leaves (each of which can contain 20 data entries) BUT each data entry points to a data record (and not to a tuple) How many leaves there are in the tree? 200.000/13 = 15.384 leaves Book case … fan-out: 20 Andrea Giuliano @bit_shark
leaves we need Book case log20 (15.384) = 3 I/O pages accesses Remember, we have on average 200 data records with the same value of the attribute cost therefore 200/10 = 20 data record pages to visit … fan-out: 20 Andrea Giuliano @bit_shark