Slide 1

Slide 1 text

Make and Learn RDBMS Yuichiro Kaneko(@yui-knk) 2019/03/22 Rails Developers Meetup 2019 Day 1

Slide 2

Slide 2 text

Thank you !!

Slide 3

Slide 3 text

4FMG*OUSPEVDUJPO • Yuichiro Kaneko • Arm Treasure Data • CDP team (Writing Rails application) • CRuby Committer 2015/12~ • GitHub (yui-knk)

Slide 4

Slide 4 text

https://www.treasuredata.com/company/careers/jobs/?team=Engineering 8FBSFIJSJOH

Slide 5

Slide 5 text

https://www.treasuredata.com/company/careers/jobs/?team=Engineering 8FBSFIJSJOH

Slide 6

Slide 6 text

5PEBZTUPQJD • Implementation of RDBMS, especially PostgreSQL • How to create DB step by step • Will not talk about CRuby/Rails!

Slide 7

Slide 7 text

/PUDPWFSFE • How to use PostgreSQL efficiently in your Rails app • What RDBMS is best for your Rails app • How to design perfect database/table schema

Slide 8

Slide 8 text

#BDLHSPVOE • Our customer can store data and execute queries to their data by Treasure Data • It can be said we are developing each part of DB • Knowledge of how DB works may be useful to understand/guess what other team members do • Creating a subset of DB helps me understand DB

Slide 9

Slide 9 text

5PCFIPOFTU +VTUGPS'VO

Slide 10

Slide 10 text

5PEBZT(PBM • Grasp the whole picture of DB • Understand which part of DB we can omit when we create DB

Slide 11

Slide 11 text

*NDSFBUJOH • https://github.com/yui-knk/minidb • About 3400 lines • Implemented by Rust • Use PostgreSQL as reference • d06fe6ce2c79420fd19ac89ace81b66579f08493 (tag: REL_11_1) • First commit date is 2018/12/31

Slide 12

Slide 12 text

4VQQPSUFE • SELECT/INSERT/DELETE/WHERE/COUNT/ORDER BY • Storage Manager • Support multiple pages • Buffer Manager • Has parser • Signed 4 bytes integer is only supported type • Use as command line tool

Slide 13

Slide 13 text

/PU4VQQPSUFE • UPDATE/JOIN • Projection • Aggregation • Transaction/Concurrency control • Vacuum • Optimizer • Multiple segments • INDEX • Launch as server • ...

Slide 14

Slide 14 text

8IBUDBO*EPXJUIlNJOJECz $ minidb execute 'insert into db1.table1 (id, age) values (1, 12), (2, 13), (3, 12), (4, 20), (5, 21)' $ minidb execute 'select * from db1.table1' "1" "12" "2" "13" "3" "12" "4" "20" "5" "21" $ minidb execute 'select count() from db1.table1' Count: 5

Slide 15

Slide 15 text

8IBUDBO*EPXJUIlNJOJECz $ minidb execute 'select * from db1.table1 order by age' "1" "12" "3" "12" "2" "13" "4" "20" "5" "21" $ minidb execute 'select count() from db1.table1 where age = 12' Count: 2 $ minidb execute 'delete from db1.table1 where age = 12' Deleted records: 2 $ minidb execute 'select count() from db1.table1 where age = 12' Count: 0

Slide 16

Slide 16 text

+---------------+ | Shared Memory | +---------------+ +------------+ | Postmaster | +------------+ +---------+ | Backend | +---------+ +---------+ | Backend | +---------+ +---------+ | Backend | +---------+ +-------------------+ | Background Writer | +-------------------+ | Checkpointer | +-------------------+ | WAL Writer | +-------------------+ | WAL Receiver | +-------------------+ | Checkpointer | +-------------------+ +----------------------+ | Autovacuum | +----------------------+ | WAL Archiver | +----------------------+ | Statistics Collector | +----------------------+ | System Logger | +----------------------+ fork fork

Slide 17

Slide 17 text

+-------------------+ | Background Writer | +-------------------+ | Checkpointer | +-------------------+ | WAL Writer | +-------------------+ | WAL Receiver | +-------------------+ | Checkpointer | +-------------------+ +----------------------+ | Autovacuum | +----------------------+ | WAL Archiver | +----------------------+ | Statistics Collector | +----------------------+ | System Logger | +----------------------+ +------------------+ | main | | * log | | * write to disk | +------------------+

Slide 18

Slide 18 text

"HFOEB • Part 1: The system catalogs and init command • Part 2: Storage Manager and Buffer Manager • Part 3: Query Execution • Part 4: Conclusion

Slide 19

Slide 19 text

Part 1: The system catalogs and init command

Slide 20

Slide 20 text

%JSFDUPSZTUSVDUVSFPG 1PTUHSF42-EBUBCBTFEJSFDUPSZ $ ls -l total 104 -rw------- 1 yuichirokaneko admin 3 1 17 22:19 PG_VERSION drwx------ 7 yuichirokaneko admin 224 1 24 21:05 base drwx------ 61 yuichirokaneko admin 1952 3 10 10:44 global drwx------ 2 yuichirokaneko admin 64 1 17 22:19 pg_commit_ts ... drwx------ 4 yuichirokaneko admin 128 1 17 22:19 pg_wal drwx------ 3 yuichirokaneko admin 96 1 17 22:19 pg_xact -rw------- 1 yuichirokaneko admin 88 1 17 22:19 postgresql.auto.conf -rw------- 1 yuichirokaneko admin 23780 1 17 22:19 postgresql.conf -rw------- 1 yuichirokaneko admin 80 1 17 22:20 postmaster.opts -rw------- 1 yuichirokaneko admin 90 3 10 10:43 postmaster.pid

Slide 21

Slide 21 text

%JSFDUPSZTUSVDUVSFPG 1PTUHSF42-EBUBCBTFEJSFDUPSZ $ ls -l total 104 -rw------- 1 yuichirokaneko admin 3 1 17 22:19 PG_VERSION drwx------ 7 yuichirokaneko admin 224 1 24 21:05 base drwx------ 61 yuichirokaneko admin 1952 3 10 10:44 global drwx------ 2 yuichirokaneko admin 64 1 17 22:19 pg_commit_ts ... drwx------ 4 yuichirokaneko admin 128 1 17 22:19 pg_wal drwx------ 3 yuichirokaneko admin 96 1 17 22:19 pg_xact -rw------- 1 yuichirokaneko admin 88 1 17 22:19 postgresql.auto.conf -rw------- 1 yuichirokaneko admin 23780 1 17 22:19 postgresql.conf -rw------- 1 yuichirokaneko admin 80 1 17 22:20 postmaster.opts -rw------- 1 yuichirokaneko admin 90 3 10 10:43 postmaster.pid I’m talking about PostgreSQL

Slide 22

Slide 22 text

%JSFDUPSZTUSVDUVSFPG 1PTUHSF42-EBUBCBTFEJSFDUPSZ • Go to database directory • "global": "Subdirectory containing cluster-wide tables, such as pg_database" [1] • "base": "Subdirectory containing per-database subdirectories" [1] • [1]: https://www.postgresql.org/docs/11/storage-file-layout.html $ ls -l drwx------ 7 224 1 24 21:05 base drwx------ 61 1952 3 10 10:44 global

Slide 23

Slide 23 text

0JE • Oid means "Object ID” • `typedef unsigned int Oid;`

Slide 24

Slide 24 text

$ ls -l global total 1160 -rw------- 1 yuichirokaneko admin 8192 1 17 22:19 1136 -rw------- 1 yuichirokaneko admin 24576 1 17 22:19 1136_fsm -rw------- 1 yuichirokaneko admin 8192 1 17 22:19 1136_vm ... -rw------- 1 yuichirokaneko admin 8192 1 24 21:10 1262 -rw------- 1 yuichirokaneko admin 24576 1 17 22:19 1262_fsm -rw------- 1 yuichirokaneko admin 8192 1 17 22:19 1262_vm ... -rw------- 1 yuichirokaneko admin 22088 3 10 10:44 pg_internal.init # select oid, relname from pg_class where oid = 1262 order by oid asc; oid | relname ------+------------- 1262 | pg_database (1 row) Oid of table

Slide 25

Slide 25 text

CBTFEJSFDUPSZ • Contains directories whose name is same with Oid • Directory 16388 is for lusiadas database $ ls -l base/ total 0 drwx------ 296 9472 2 10 21:15 1 drwx------ 296 9472 1 17 22:19 13363 drwx------ 296 9472 3 9 22:12 13364 drwx------ 297 9504 2 10 21:15 16384 drwx------ 299 9568 3 10 10:44 16388 # select oid, datname from pg_database order by oid asc; oid | datname -------+------------ 1 | template1 13363 | template0 13364 | postgres 16384 | example-db 16388 | lusiadas (5 rows)

Slide 26

Slide 26 text

CBTFEJSFDUPSZ • Directories contain data files • File 16389 is for films table $ ls -l base/16388 total 15784 -rw------- 1 8192 1 24 21:05 112 -rw------- 1 8192 1 24 21:05 113 -rw------- 1 73728 1 24 21:10 1247 -rw------- 1 24576 1 24 21:05 1247_fsm -rw------- 1 8192 1 24 21:10 1247_vm ... -rw------- 1 8192 2 23 22:31 16389 ... # select oid, relname from pg_class where oid = 16389 order by oid asc; oid | relname -------+--------- 16389 | films (1 row) Oid of table

Slide 27

Slide 27 text

*NQMFNFOUJOJUDPNNBOE $ minidb --base_dir /tmp/minidb init /tmp/minidb !"" base #"" global !"" mini_attribute $ #"" data !"" mini_class $ #"" data !"" mini_database $ #"" data #"" oid Columns Tables Databases

Slide 28

Slide 28 text

*NQMFNFOUJOJUDPNNBOE • Creating some system catalogs which are the place RDBMS stores schema metadata • "mini_database": Database information • "mini_class": Table information • "mini_attribute": Column information • Simply creating directories and files

Slide 29

Slide 29 text

*NQMFNFOUDSFBUFEBUBCBTF DPNNBOE $ minidb --base_dir /tmp/minidb create_db db1 /tmp/minidb !"" base $ #"" 10000 #"" global !"" mini_attribute $ #"" data !"" mini_class $ #"" data !"" mini_database $ #"" data #"" oid $ cat global/mini_database/data # Oid,db_name 10000,db1

Slide 30

Slide 30 text

*NQMFNFOUDSFBUFEBUBCBTF DPNNBOE • Insert a database record to "mini_database/data" file • Do not use binary format for system catalog to easy to implement • The format of this data file is “Oid,database_name" • Create directory of Oid name (10000) under the base directory

Slide 31

Slide 31 text

$ minidb --base_dir /tmp/minidb create_table db1 table1 /tmp/minidb !"" base $ #"" 10000 $ #"" 10001 #"" global !"" mini_attribute $ #"" data !"" mini_class $ #"" data !"" mini_database $ #"" data #"" oid *NQMFNFOUDSFBUFUBCMFDPNNBOE $ cat /tmp/minidb/global/mini_class/data # Oid,table_name,db_Oid 10001,table1,10000 $ cat /tmp/minidb/global/mini_attribute/data # column_name,db_Oid,table_Oid,type,length id,10000,10001,1,4 age,10000,10001,1,4 Only support `id integer, age integer` as table schema

Slide 32

Slide 32 text

*NQMFNFOUDSFBUFUBCMFDPNNBOE • Insert a table record to "mini_class/data" file • The format of this data file is “Oid,table_name,database_Oid" • Insert column records to "mini_attribute/data" file • The format of this data file is “column_name,database_Oid,table_Oid,type,length" • Type 1 means Signed 4 bytes integer • User can not specify schema of the table! • Only support `id integer, age integer` as table schema

Slide 33

Slide 33 text

)PXUPJOUFSBDUXJUIpMFT • Read the catalog file • Load data from the catalog file into an array • Add new catalog data to the array • Write the array to catalog file

Slide 34

Slide 34 text

*NQPSUBOUQPJOUT • Use simple data format, e.g. JSON, CSV … • Simply interact with catalog files

Slide 35

Slide 35 text

Part 2: Storage Manager and Buffer Manager

Slide 36

Slide 36 text

%BUBpMFTUSVDUVSFJOQH • "fork": Use same oid with different suffix for metadata files • "main": (0) / "fsm": (1) / "vm": (2) / "init": (3) • Free Space Map / Visibility Map • "segment": Divide a fork into multiple files to avoid OS' limit on file size (see: RELSEG_SIZE) • "block": Data file is divided into logical blocks, default 8KB (see: BLCKSZ) $ ls -hl base/16388/24576* -rw------- 1.0G 3 10 16:38 base/16388/24576 <- fork: 0, segment: 0 -rw------- 25M 3 10 16:38 base/16388/24576.1 <- fork: 0, segment: 1 -rw------- 280K 3 10 16:38 base/16388/24576_fsm <- fork: 1, segment: 0

Slide 37

Slide 37 text

%BUBpMFTUSVDUVSFJOQH • Internal structure of "base/16388/24576" file looks like below • It has fixed-size blocks $ ls -hl base/16388/24576* -rw------- 1.0G 3 10 16:38 base/16388/24576 -rw------- 25M 3 10 16:38 base/16388/24576.1 -rw------- 280K 3 10 16:38 base/16388/24576_fsm +---------+ | Block 0 | <- 8KB +---------+ | Block 1 | <- 8KB +---------+ | Block 2 | <- 8KB +---------+ | Block 3 | <- 8KB +---------+ ...

Slide 38

Slide 38 text

#MPDLTUSVDUVSFJOQH /* * +----------------+---------------------------------+ * | PageHeaderData | linp1 linp2 linp3 ... | * +-----------+----+---------------------------------+ * | ... linpN | | * +-----------+--------------------------------------+ * | ^ pd_lower | * | | * | v pd_upper | * +-------------+------------------------------------+ * | | tupleN ... | * +-------------+------------------+-----------------+ * | ... tuple3 tuple2 tuple1 | "special space" | * +--------------------------------+-----------------+ * ^ pd_special */ 'SPNTSDJODMVEFTUPSBHFCVGQBHFI Header 32bit +---------+ | Block 0 | <- 8KB +---------+ | Block 1 | <- 8KB +---------+ | Block 2 | <- 8KB +---------+ | Block 3 | <- 8KB +---------+ ...

Slide 39

Slide 39 text

#MPDLTUSVDUVSFJOQH • PageHeader is a header • linp has a ItemIdData type, 32bit. This holds info of each tuple, e.g. offset to the tuple, length of the tuple • Maybe linp means “line pointer” • tuple has a HeapTupleHeaderData type. This holds data of the row

Slide 40

Slide 40 text

*NQMFNFOUJOTFSUDPNNBOE • We should implement insert before select :) • Only support `id integer, age integer` as table schema • `minidb insert_into db_name table_name id age` $ minidb insert_into db1 table1 1 12 $ minidb insert_into db1 table1 2 13 $ minidb insert_into db1 table1 3 12

Slide 41

Slide 41 text

*NQMFNFOUJOTFSUDPNNBOE • Open data file • Load data into memory, this is called 'page' • Add tuple to page • Write data into data file

Slide 42

Slide 42 text

pub fn execute(&self, dbname: &str, table_name: &str, key_values: Vec) -> Result<(), String> { let rm: RecordManeger = RecordManeger::build_from_config("mini_attribute".to_string(), self.config).unwrap(); let attrs = rm.attributes(dbname, table_name); let mut slot = TupleTableSlot::new(rm.attributes_clone(dbname, table_name)); let path = self.config.data_file_path(dbname, table_name); let mut page = if path.exists() { Page::load(&path) } else { Page::new(DEFAULT_BLOCK_SIZE) }; if attrs.len() != key_values.len() { return Err(format!("Length not match. attrs: {}, key_values: {}", attrs.len(), key_values.len())); } *NQMFNFOUJOTFSUDPNNBOE open file & load data

Slide 43

Slide 43 text

for ((i, kv), attr) in key_values.iter().enumerate().zip(attrs.iter()) { if kv.key != attr.name { return Err(format!("Name not match. attrs: {}, key_values: {}", attr.name, kv.key)); } let t = ty::build_ty(&attr.type_name, &kv.value)?; slot.set_column(i, t.as_ref()); } page.add_tuple_slot_entry(&slot).unwrap(); let f = File::create(path).unwrap(); page.write_bytes(f); Ok(()) } *NQMFNFOUJOTFSUDPNNBOE add tuple to page write data to file

Slide 44

Slide 44 text

*NQMFNFOUTFMFDUDPNNBOE • `minidb select_from db_name table_name` $ minidb select_from db1 table1 "1" # <- id of 1st record "12" # <- age of 1st record "2" # <- id of 2nd record "13" # <- age of 2nd record "3" # <- id of 3rd record "12" # <- age of 3rd record

Slide 45

Slide 45 text

*NQMFNFOUTFMFDUDPNNBOE • Open data file • Load data into page • Copy tuple data into tuple slot (TupleTableSlot) • Print each tuple • Print each column

Slide 46

Slide 46 text

pub fn execute(&self, dbname: &str, table_name: &str, key: &str, value: &str) -> Result<(), String> { let rm: RecordManeger = RecordManeger::build_from_config("mini_attribute".to_string(), self.config).unwrap(); let attrs = rm.attributes_clone(dbname, table_name); let attrs_len = attrs.iter().fold(0, |acc, attr| acc + attr.len) as u32; let mut slot = TupleTableSlot::new(attrs); let path = self.config.data_file_path(dbname, table_name); let mut page = if path.exists() { Page::load(&path) } else { Page::new(DEFAULT_BLOCK_SIZE) }; *NQMFNFOUTFMFDUDPNNBOE open file & load data

Slide 47

Slide 47 text

for i in 0..(page.entry_count()) { slot.load_data(page.get_entry_pointer(i).unwrap(), attrs_len); println!("attrs_count: {:?}", slot.attrs_count()); for j in 0..(slot.attrs_count()) { let ty = slot.get_column(j); println!("{:?}", ty.as_string()); } } Ok(()) } *NQMFNFOUTFMFDUDPNNBOE copy data to tuple iterate each tuple print each column iterate each column

Slide 48

Slide 48 text

*NQPSUBOUQPJOUT • Need not to clean code of select/insert commands at present • We will introduce Query Execution logic at a later step

Slide 49

Slide 49 text

#VU • Each command read/write files • Each command allocate memory (page) • Introduce Buffer Manager and Storage Manager to extract these logic

Slide 50

Slide 50 text

+------------------+ | Other components | +------------------+ Return buffer identifiers (Buffer) ^ | (ReadBufferExtended, etc.) | v Give me block 0 of films table, lusiadas db ForkNumber 0 +----------------+ | Buffer Manager | +----------------+ load ^ | | v dump +----------------+ | Disk | +----------------+ #V⒎FS.BOBHFS

Slide 51

Slide 51 text

#V⒎FS.BOBHFS • Buffer Manager manages data in memory • All data in files are read/written via Buffer Manager • In many case we can not load all data into Buffer Manager (memory size <<<< data file size) • “Block" may be called "Page" in Buffer Manager context • User of Buffer Manager passes info of Relation(database & table) and BlockNumber to get a page

Slide 52

Slide 52 text

pages: +------+------+ | 8KB | 8KB | +------+------+ ^ ^ | | v v buffer_descriptors: +------+------+ | db1 | db1 | | tbl1 | tbl2 | | blk0 | blk0 | | flag | flag | +------+------+ buffer_hash: +------+------+ | db1 | db1 | | tbl1 | tbl2 | | blk0 | blk0 | +------+------+ | | v v +------+------+ | 0 | 1 | +------+------+ Get (db1, tbl2, blk0) )PXUPVTF#V⒎FS.BOBHFS

Slide 53

Slide 53 text

pages: +------+------+ | 8KB | 8KB | +------+------+ ^ ^ | | v v buffer_descriptors: +------+------+ | db1 | db1 | | tbl1 | tbl2 | | blk0 | blk0 | | flag | flag | +------+------+ buffer_hash: +------+------+ | db1 | db1 | | tbl1 | tbl2 | | blk0 | blk0 | +------+------+ | | v v +------+------+ | 0 | 1 | +------+------+ Get (db1, tbl2, blk0) )PXUPVTF#V⒎FS.BOBHFS

Slide 54

Slide 54 text

pages: +------+------+ | 8KB | 8KB | +------+------+ ^ ^ | | v v buffer_descriptors: +------+------+ | db1 | db1 | | tbl1 | tbl2 | | blk0 | blk0 | | flag | flag | +------+------+ buffer_hash: +------+------+ | db1 | db1 | | tbl1 | tbl2 | | blk0 | blk0 | +------+------+ | | v v +------+------+ | 0 | 1 | +------+------+ Get (db1, tbl2, blk0) Buffer (1) )PXUPVTF#V⒎FS.BOBHFS

Slide 55

Slide 55 text

pages: +------+------+ | 8KB | 8KB | +------+------+ ^ ^ | | v v buffer_descriptors: +------+------+ | db1 | db1 | | tbl1 | tbl2 | | blk0 | blk0 | | flag | flag | +------+------+ buffer_hash: +------+------+ | db1 | db1 | | tbl1 | tbl2 | | blk0 | blk0 | +------+------+ | | v v +------+------+ | 0 | 1 | +------+------+ Get (db1, tbl2, blk0) Buffer (1) Get page of Buffer (1) )PXUPVTF#V⒎FS.BOBHFS

Slide 56

Slide 56 text

pages: +------+------+ | 8KB | 8KB | +------+------+ ^ ^ | | v v buffer_descriptors: +------+------+ | db1 | db1 | | tbl1 | tbl2 | | blk0 | blk0 | | flag | flag | +------+------+ buffer_hash: +------+------+ | db1 | db1 | | tbl1 | tbl2 | | blk0 | blk0 | +------+------+ | | v v +------+------+ | 0 | 1 | +------+------+ Get (db1, tbl2, blk0) Buffer (1) Get page of Buffer (1) Reference of page (1) )PXUPVTF#V⒎FS.BOBHFS

Slide 57

Slide 57 text

Get (db1, tbl1, blk1) )PXUPVTF#V⒎FS.BOBHFS pages: +------+------+ | 8KB | 8KB | +------+------+ ^ ^ | | v v buffer_descriptors: +------+------+ | db1 | db1 | | tbl1 | tbl2 | | blk0 | blk0 | | flag | flag | +------+------+ buffer_hash: +------+------+ | db1 | db1 | | tbl1 | tbl2 | | blk0 | blk0 | +------+------+ | | v v +------+------+ | 0 | 1 | +------+------+

Slide 58

Slide 58 text

Get (db1, tbl1, blk1) )PXUPVTF#V⒎FS.BOBHFS pages: +------+------+------+ | 8KB | 8KB | 8KB | +------+------+------+ ^ ^ ^ | | | v v v buffer_descriptors: +------+------+------+ | db1 | db1 | db1 | | tbl1 | tbl2 | tbl1 | | blk0 | blk0 | blk1 | | flag | flag | flag | +------+------+------+ buffer_hash: +------+------+------+ | db1 | db1 | db1 | | tbl1 | tbl2 | tbl1 | | blk0 | blk0 | blk1 | +------+------+------+ | | | v v v +------+------+------+ | 0 | 1 | 2 | +------+------+------+

Slide 59

Slide 59 text

Get (db1, tbl1, blk1) )PXUPVTF#V⒎FS.BOBHFS pages: +------+------+------+ | 8KB | 8KB | 8KB | +------+------+------+ ^ ^ ^ | | | v v v buffer_descriptors: +------+------+------+ | db1 | db1 | db1 | | tbl1 | tbl2 | tbl1 | | blk0 | blk0 | blk1 | | flag | flag | flag | +------+------+------+ buffer_hash: +------+------+------+ | db1 | db1 | db1 | | tbl1 | tbl2 | tbl1 | | blk0 | blk0 | blk1 | +------+------+------+ | | | v v v +------+------+------+ | 0 | 1 | 2 | +------+------+------+ Buffer (2)

Slide 60

Slide 60 text

*NQMFNFOU#V⒎FS.BOBHFS • Page: Malloced block_size memory • BufferTag: Relation(database & table) and BlockNumber // We malloc c memory for header, lines and tuples. pub struct Page { header: *mut PageHeaderData, } struct BufferTag { rnode: RelFileNode, block_num: BlockNumber, } 8 KB

Slide 61

Slide 61 text

*NQMFNFOU#V⒎FS.BOBHFS • pages: Array of Page • buffer_descriptors: Array of info of Page (BufferTag, dirty flag etc.) • buffer_hash: Map from BufferTag to buffer identifier (usize) pub struct BufferManager { config: Rc, smgr: StorageManager, buffer_descriptors: Vec, pages: Vec, // Hash from BufferTag to index of descriptor and page // See LocalBufHash in pg. buffer_hash: HashMap, }

Slide 62

Slide 62 text

-JNJUBUJPOTPG#V⒎FS.BOBHFS • No eviction logic • Can not load file into memory if all slots are used • It’s acceptable if we implement DB as command • It’s may not acceptable if we implement DB as server • No lock mechanism

Slide 63

Slide 63 text

*NQPSUBOUQPJOUT • Dump data to files by "destructor" (drop trait) • Be careful to return cached page if the page is cached • I embedded a bug because I forgot this…

Slide 64

Slide 64 text

4UPSBHF.BOBHFSJOQH • Manages file descriptors • "src/backend/storage/smgr/smgr.c" • "src/backend/storage/smgr/md.c" • Take care of segments

Slide 65

Slide 65 text

+------------------+ | Other components | +------------------+ ^ | (ReadBufferExtended, etc.) | v Give me block 0 of films table, lusiadas db, ForkNumber 0 +----------------+ | Buffer Manager | +----------------+ ^ | (smgrread, etc.) | v Give me block 0 of films table, lusiadas db, ForkNumber 0 +----------------+ | Storage Manager| +----------------+ ^ | (mdread, etc.) | v Read block size data from an appropriate file to a buf +----------------+ | Disk | +----------------+

Slide 66

Slide 66 text

typedef enum ForkNumber { InvalidForkNumber = -1, MAIN_FORKNUM = 0, FSM_FORKNUM, VISIBILITYMAP_FORKNUM, INIT_FORKNUM } ForkNumber; #define MAX_FORKNUM INIT_FORKNUM typedef struct _MdfdVec { File mdfd_vfd; /* fd number in fd.c's pool */ BlockNumber mdfd_segno; /* segment number, from 0 */ } MdfdVec; typedef struct SMgrRelationData { RelFileNodeBackend smgr_rnode; /* relation physical identifier */ struct SMgrRelationData **smgr_owner; BlockNumber smgr_targblock; /* current insertion target block */ BlockNumber smgr_fsm_nblocks; /* last known size of fsm fork */ BlockNumber smgr_vm_nblocks; /* last known size of vm fork */ int smgr_which; /* storage manager selector */ int md_num_open_segs[MAX_FORKNUM + 1]; struct _MdfdVec *md_seg_fds[MAX_FORKNUM + 1]; struct SMgrRelationData *next_unowned_reln; } SMgrRelationData; • md_seg_fds is an array of `_MdfdVec *` • see: `_mdfd_getseg` function

Slide 67

Slide 67 text

md_seg_fds: main fsm vm init +------------+--------------+--------------+--------------+ | | | | | +------------+--------------+--------------+--------------+ | | | | v v v v +------------+--------------+ | fd 24576 | fd 24576_fsm | | seg 0 | seg 0 | +------------+--------------+ | fd 24576.1 | | seg 1 | +------------+ base/16388/24576 <- fork: 0, segment: 0 base/16388/24576.1 <- fork: 0, segment: 1 base/16388/24576_fsm <- fork: 1, segment: 0 Forks Segments

Slide 68

Slide 68 text

*NQMFNFOU4UPSBHF.BOBHFS • cache: Map from RelFileNode to SMgrRelationData • SMgrRelationData: This has fd pub struct SMgrRelationData { config: Rc, pub smgr_rnode: RelFileNode, file: Option, // current insertion target block pub smgr_targblock: BlockNumber, } pub struct StorageManager { config: Rc, cache: HashMap> }

Slide 69

Slide 69 text

-JNJUBUJPOTPG4UPSBHF.BOBHFS • Fork is not supported • Segment is not supported $ ls -hl base/16388/24576* -rw------- 1.0G 3 10 16:38 base/16388/24576 <- fork: 0, segment: 0 -rw------- 25M 3 10 16:38 base/16388/24576.1 <- fork: 0, segment: 1 -rw------- 280K 3 10 16:38 base/16388/24576_fsm <- fork: 1, segment: 0

Slide 70

Slide 70 text

3FXSJUFJOTFSUDPNNBOE • Problems were • Each command read/write files • Each command allocate memory (page) • Insert command does not open file directly • Insert command does not allocate memory directly

Slide 71

Slide 71 text

pub struct InsertState<'a> { slot: &'a TupleTableSlot, ss_currentRelation: &'a RelFileNode, } impl<'a> InsertState<'a> { pub fn new(rnode: &'a RelFileNode, slot: &'a TupleTableSlot) -> InsertState<'a> { InsertState { ss_currentRelation: rnode, slot: slot, } } pub fn exec_insert(&mut self, bufmrg: &mut BufferManager) { let buf = bufmrg.read_buffer(self.ss_currentRelation.clone(), 0); let page = bufmrg.get_page_mut(buf); page.add_tuple_slot_entry(self.slot).unwrap(); } } get page from buffer manager write tuple to page

Slide 72

Slide 72 text

Part 3: Query Execution

Slide 73

Slide 73 text

5BCMFFYBNQMF =# \d films Table "public.films" Column | Type | Collation | Nullable | Default -----------+-------------------------+-----------+----------+--------- code | character(5) | | not null | title | character varying(40) | | not null | did | integer | | not null | date_prod | date | | | kind | character varying(10) | | | len | interval hour to minute | | | Indexes: "firstkey" PRIMARY KEY, btree (code) =# select * from films; code | title | did | date_prod | kind | len -------+-------+-----+-----------+------+----- c1 | T1 | 101 | | | c2 | T2 | 102 | | | c3 | T3 | 103 | | | (3 rows)

Slide 74

Slide 74 text

+-----------------------+ | Query string (char *) | +-----------------------+ parse (pg_parse_query) | v +---------------------------------+ | Raw parse tree (struct RawStmt) | +---------------------------------+ analyze/rewrite (pg_analyze_and_rewrite) | v +---------------------------+ | Query Tree (struct Query) | +---------------------------+ optimize/generate tree of plan node (pg_plan_queries) | v +-------------------------+ | Plan tree (struct Plan) | +-------------------------+ setup query execution (ExecInitNode) | v +-----------------------------------+ | PlanState tree (struct PlanState) | +-----------------------------------+ execute query (ExecutorRun)

Slide 75

Slide 75 text

1BSTF • Generate AST from input text • Each node represents user input

Slide 76

Slide 76 text

1BSTF select * from films where did = 1; ParseState: parsetree_list RawStmt: stmt SelectStmt: targetList ResTarget: name: , indirection: val ColumnRef: fields: * fromClause RangeVar: catalogname: , schemaname: , relname: films whereClause A_Expr: kind: OP, name: = lexpr ColumnRef: fields: did rexpr A_Const: val: 1

Slide 77

Slide 77 text

"OBMZ[F3FXSJUF • Generate Query Tree • Replace user input attributes to internal representations • Semantic analysis

Slide 78

Slide 78 text

"OBMZ[F3FXSJUF Query: SELECT(id: 0) rtable (# 1) RangeTblEntry: rtekind: RELATION, relid: 16389 jointree FromExpr: quals OpExpr: opno: 96, opfuncid: 65 Var: varno: 1, varattno: 3 Const: consttype: 23, constlen: 4, constvalue: 1 target_list (# 6) TargetEntry: resno: 1, resname: code Var: varno: 1, varattno: 1 TargetEntry: resno: 2, resname: title Var: varno: 1, varattno: 2 TargetEntry: resno: 3, resname: did Var: varno: 1, varattno: 3 … ParseState: parsetree_list RawStmt: stmt SelectStmt: targetList ResTarget: name: , indirection: val ColumnRef: fields: * fromClause RangeVar: relname: films whereClause A_Expr: kind: OP, name: = lexpr ColumnRef: fields: did rexpr A_Const: val: 1 Expanded

Slide 79

Slide 79 text

"OBMZ[F3FXSJUF Query: SELECT(id: 0) rtable (# 1) RangeTblEntry: rtekind: RELATION, relid: 16389 jointree FromExpr: quals OpExpr: opno: 96, opfuncid: 65 Var: varno: 1, varattno: 3 Const: consttype: 23, constlen: 4, constvalue: 1 target_list (# 6) TargetEntry: resno: 1, resname: code Var: varno: 1, varattno: 1 TargetEntry: resno: 2, resname: title Var: varno: 1, varattno: 2 TargetEntry: resno: 3, resname: did Var: varno: 1, varattno: 3 … ParseState: parsetree_list RawStmt: stmt SelectStmt: targetList ResTarget: name: , indirection: val ColumnRef: fields: * fromClause RangeVar: relname: films whereClause A_Expr: kind: OP, name: = lexpr ColumnRef: fields: did rexpr A_Const: val: 1 Internal representations where did = 1

Slide 80

Slide 80 text

"OBMZ[F3FXSJUF Query: SELECT(id: 0) rtable (# 1) RangeTblEntry: rtekind: RELATION, relid: 16389 jointree FromExpr: quals OpExpr: opno: 96, opfuncid: 65 Var: varno: 1, varattno: 3 Const: consttype: 23, constlen: 4, constvalue: 1 target_list (# 6) TargetEntry: resno: 1, resname: code Var: varno: 1, varattno: 1 TargetEntry: resno: 2, resname: title Var: varno: 1, varattno: 2 TargetEntry: resno: 3, resname: did Var: varno: 1, varattno: 3 … # select oid, relname from pg_class where oid = 16389 order by oid asc; oid | relname -------+--------- 16389 | films (1 row) films

Slide 81

Slide 81 text

"OBMZ[F3FXSJUF Query: SELECT(id: 0) rtable (# 1) RangeTblEntry: rtekind: RELATION, relid: 16389 jointree FromExpr: quals OpExpr: opno: 96, opfuncid: 65 Var: varno: 1, varattno: 3 Const: consttype: 23, constlen: 4, constvalue: 1 target_list (# 6) TargetEntry: resno: 1, resname: code Var: varno: 1, varattno: 1 TargetEntry: resno: 2, resname: title Var: varno: 1, varattno: 2 TargetEntry: resno: 3, resname: did Var: varno: 1, varattno: 3 … where did = 1 /* catalog/pg_operator.dat */ { oid => ’96', oid_symbol => ‘Int4EqualOperator' oprname => ‘=', oprleft => ‘int4', oprright => ‘int4', oprresult => 'bool' }, /* catalog/pg_type.dat */ { oid => ’23', descr => '-2 billion to 2 billion integer, 4-byte storage', typname => ‘int4', typlen => '4' }, select * from films where did = 1;

Slide 82

Slide 82 text

0QUJNJ[F(FOFSBUFUSFFPGQMBO OPEF • We would have many way to calculate result tuples • How to get tuples from disk • Read files in order (SeqScan) / Use index (IndexScan) / Use only index (IndexOnlyScan) • How to aggregate tuples • Use internal hash table / sort tuples before aggregation • How to join tables • Nested Loops Join / (Sort-)Merge Join / Hash Join

Slide 83

Slide 83 text

0QUJNJ[F(FOFSBUFUSFFPGQMBO OPEF Query: SELECT(id: 0) rtable (# 1) RangeTblEntry: relid: 16389 jointree FromExpr: quals OpExpr: opno: 96, opfuncid: 65 Var: varno: 1, varattno: 3 Const: consttype: 23, constlen: 4, constvalue: 1 target_list (# 6) TargetEntry: resno: 1, resname: code Var: varno: 1, varattno: 1 TargetEntry: resno: 2, resname: title Var: varno: 1, varattno: 2 TargetEntry: resno: 3, resname: did Var: varno: 1, varattno: 3 … PlannedStmt: command_type: SELECT SeqScan: targetlist TargetEntry: resno: 1, resname: code Var: varno: 1, varattno: 1 TargetEntry: resno: 2, resname: title Var: varno: 1, varattno: 2 TargetEntry: resno: 3, resname: did Var: varno: 1, varattno: 3 ... qual OpExpr: opno: 96, opfuncid: 65 Var: varno: 1, varattno: 3 Const: consttype: 23, constlen: 4, constvalue: 1

Slide 84

Slide 84 text

• Sort gets tuple from lefttree, SeqScan in this case select * from films where did = 1 order by code; PlannedStmt: command_type: SELECT Sort: num_cols: 1, sort_col_idx: [1], sort_operators: [1058] targetlist TargetEntry: resno: 1, resname: code Var: varno: 65001, varattno: 1 TargetEntry: resno: 2, resname: title Var: varno: 65001, varattno: 2 ... qual lefttree SeqScan: targetlist TargetEntry: resno: 1, resname: Var: varno: 1, varattno: 1 TargetEntry: resno: 2, resname: Var: varno: 1, varattno: 2 ... qual OpExpr: opno: 96, opfuncid: 65 Var: varno: 1, varattno: 3 Const: consttype: 23, constlen: 4, constvalue: 1

Slide 85

Slide 85 text

*NQMFNFOUDPVOUDPNNBOE • In pg this is implemented as aggregation • But aggregation is difficult… $ minidb select_from_count db1 table1 Count: 5

Slide 86

Slide 86 text

*NQMFNFOUDPVOUDPNNBOE +-------------+ | Count | | * counter | | * scanner | +-------------+ +---------+ | SeqScan | +---------+ +------+ | Disk | +------+

Slide 87

Slide 87 text

+-----------------------+ | Query string (char *) | +-----------------------+ parse (pg_parse_query) | v +---------------------------------+ | Raw parse tree (struct RawStmt) | +---------------------------------+ analyze/rewrite (pg_analyze_and_rewrite) | v +---------------------------+ | Query Tree (struct Query) | +---------------------------+ optimize/generate tree of plan node (pg_plan_queries) | v +-------------------------+ | Plan tree (struct Plan) | +-------------------------+ setup query execution (ExecInitNode) | v +-----------------------------------+ | PlanState tree (struct PlanState) | +-----------------------------------+ execute query (ExecutorRun) • Directly create Plan tree

Slide 88

Slide 88 text

impl<'a> CountState<'a> { pub fn new(lefttree: ScanState<'a>) -> CountState<'a> { CountState { lefttree: lefttree, result: 0, } } // `ExecAgg` in pg. pub fn exec_agg(&mut self, bufmrg: &mut BufferManager) { loop { let opt = self.lefttree.exec_scan(bufmrg); match opt { Some(_slot) => { self.result = self.result + 1; }, None => break } } } } get tuple count up counter scanner

Slide 89

Slide 89 text

impl CountCommnad { ... pub fn execute(&self, dbname: &str, table_name: &str) -> Result<(), String> { ... let mut rmgr = RelationManager::new(self.config.clone()); let relation = rmgr.get_relation(db_oid, table_oid); let mut bm = BufferManager::new(1, self.config.clone()); let scan = ScanState::new(relation, &rm, &mut bm); let mut count = CountState::new(scan); count.exec_agg(&mut bm); println!("Count: {}", count.result); Ok(()) } } pass scan to count execute get result of the count

Slide 90

Slide 90 text

-JNJUBUJPOPGDPVOU • Count logic is not generalized to aggregation

Slide 91

Slide 91 text

*NQMFNFOUQBSTFS • Use lalrpop, LR(1) parser generator • Rewrite select_from / select_from_count / insert_into commands to execute command • Before: "minidb select_from db1 table1” • After: "minidb execute 'select * from db1.table1’" • I wanted some progress when I changed to use parser :) • Writing parser by parser generator is fun!!!

Slide 92

Slide 92 text

+-----------------------+ | Query string (char *) | +-----------------------+ parse (pg_parse_query) | v +---------------------------------+ | Raw parse tree (struct RawStmt) | +---------------------------------+ analyze/rewrite (pg_analyze_and_rewrite) | v +---------------------------+ | Query Tree (struct Query) | +---------------------------+ optimize/generate tree of plan node (pg_plan_queries) | v +-------------------------+ | Plan tree (struct Plan) | +-------------------------+ setup query execution (ExecInitNode) | v +-----------------------------------+ | PlanState tree (struct PlanState) | +-----------------------------------+ execute query (ExecutorRun) • Do not analyze • Do not optimize

Slide 93

Slide 93 text

-JNJUBUJPOPGRVFSZFYFDVUPS • Do not implement analyzer • Should do semantic analysis when the query is executed • Do not implement optimizer • Can not use index even if index is implemented

Slide 94

Slide 94 text

*NQMFNFOUEFMFUF $ minidb /tmp/minidb execute 'select count() from db1.table1' Count: 5 $ minidb execute 'delete from db1.table1' Deleted records: 5 $ minidb /tmp/minidb execute 'select count() from db1.table1' Count: 0

Slide 95

Slide 95 text

*NQMFNFOUEFMFUF • Add deleted flag to tuple • Should recreate data files or write migration command • delete command sets the flag • SeqScan checks the flag • Count need not to take care of the flag

Slide 96

Slide 96 text

*NQMFNFOUEFMFUF /* * +----------------+---------------------------------+ * | PageHeaderData | linp1 linp2 linp3 ... | * +-----------+----+---------------------------------+ * | ... linpN | | * +-----------+--------------------------------------+ * | ^ pd_lower | * | | * | v pd_upper | * +-------------+------------------------------------+ * | | tupleN ... | * +-------------+------------------+-----------------+ * | ... tuple3 tuple2 tuple1 | "special space" | * +--------------------------------+-----------------+ * ^ pd_special */ +---------+ | | | Data | | | +---------+ +---------+ | flag(s) | +---------+ | | | Data | | | +---------+ Before After • Add deleted flag to tuple

Slide 97

Slide 97 text

diff --git a/src/node_seqscan.rs b/src/node_seqscan.rs index 80f7cff..2edeb2b 100644 --- a/src/node_seqscan.rs +++ b/src/node_seqscan.rs @@ -135,8 +135,19 @@ impl<'a> ScanState<'a> { let dp = bufmrg.get_page(scan_desc.rs_cbuf); let mut t_self = ItemPointerData::new(); ::tuple::item_pointer_set(&mut t_self, scan_desc.rs_cblock, lineoff); + debug!("lp_len {}", dp.get_item_ref(lineoff).lp_len()); scan_desc.rs_ctup.load_without_len(dp.get_entry_pointer(lineoff).unwrap(), t_self); - return + + // Skip deleted record + if scan_desc.rs_ctup.t_data.heap_keys_updated_p() { + debug!("Skip deleted tuple {}", lineoff); + lineoff = lineoff + 1; + linesleft = linesleft - 1; + // next + } else { + debug!("Return tuple {}", lineoff); + return + } } Check delete flag in scan

Slide 98

Slide 98 text

impl<'a> CountState<'a> { pub fn new(lefttree: ScanState<'a>) -> CountState<'a> { CountState { lefttree: lefttree, result: 0, } } // `ExecAgg` in pg. pub fn exec_agg(&mut self, bufmrg: &mut BufferManager) { loop { let opt = self.lefttree.exec_scan(bufmrg); match opt { Some(_slot) => { self.result = self.result + 1; }, None => break } } } } get tuple count up • Count need not to take care of the flag • lefttree omit deleted tuples

Slide 99

Slide 99 text

-JNJUBUJPOTPGEFMFUFDPNNBOE • delete command deletes all records because we can not pass where clause • vacuum is not implemented

Slide 100

Slide 100 text

&WBMVBUJOHFYQSFTTJPOT XIFSF DMBVTF select * from films where did = 1; QueryDesc (SELECT): planstate SeqScanState: qual ExprSate: SCAN_FETCHSOME, SCAN_VAR, FUNCEXPR_STRICT, QUAL, DONE ss_ScanTupleSlot TupleTableSlot: tts_isempty: true, tts_fixedTupleDescriptor: true tts_tupleDescriptor TupleDesc: natts: 6 attrs PgAttribute: attname: code ...

Slide 101

Slide 101 text

&WBMVBUJOHFYQSFTTJPOT XIFSF DMBVTF select * from films where did = 1; QueryDesc (SELECT): planstate SeqScanState: qual ExprSate: SCAN_FETCHSOME, SCAN_VAR, FUNCEXPR_STRICT, QUAL, DONE ss_ScanTupleSlot TupleTableSlot: tts_isempty: true, tts_fixedTupleDescriptor: true tts_tupleDescriptor TupleDesc: natts: 6 attrs PgAttribute: attname: code ...

Slide 102

Slide 102 text

&WBMVBUJOHFYQSFTTJPOT XIFSF DMBVTF select * from films where did = 1; QueryDesc (SELECT): planstate SeqScanState: qual ExprSate: SCAN_FETCHSOME, SCAN_VAR, FUNCEXPR_STRICT, QUAL, DONE ss_ScanTupleSlot TupleTableSlot: tts_isempty: true, tts_fixedTupleDescriptor: true tts_tupleDescriptor TupleDesc: natts: 6 attrs PgAttribute: attname: code ... Byte codes …

Slide 103

Slide 103 text

&WBMVBUJOHFYQSFTTJPOT XIFSF DMBVTF • Expression like "where did = 1" needed to be evaluated to implement where clause • "direct threaded" or "switch threaded" or “llvmjit" in pg • https://magazine.rubyist.net/articles/0008/0008-YarvManiacs.html • It's overspec, so implement it by tree traverse

Slide 104

Slide 104 text

*NQMFNFOUXIFSFDMBVTF $ minidb /tmp/minidb execute 'select count() from db1.table1' Count: 5 $ minidb /tmp/minidb execute 'select count() from db1.table1 where true' Count: 5 $ minidb /tmp/minidb execute 'select count() from db1.table1 where false' Count: 0 $ minidb /tmp/minidb execute 'select count() from db1.table1 where true = false' Count: 0 $ minidb /tmp/minidb execute 'select * from db1.table1' "1" "12" "2" "13" "3" "12" "4" "20" "5" "21" $ minidb /tmp/minidb execute 'select count() from db1.table1 where age = 12' Count: 2

Slide 105

Slide 105 text

*NQMFNFOUXIFSFDMBVTF pub enum Expr { Bool(bool), Number(i32), OpEq(Box, Box), // "=" ColumnRef(String), // column name } struct ExprEvaluator<'a> { currentTuple: &'a TupleTableSlot, expr: &'a Expr, } where did = 1 +------+ | OpEq | +------+ | | +------------------+ +-| ColumnRef("did") | | +------------------+ | | +-----------+ +-| Number(1) | +-----------+ has current tuple

Slide 106

Slide 106 text

impl<'a> ScanState<'a> { fn exec(&mut self) -> Option<&TupleTableSlot> { loop { self.seq_next(); if self.ss_currentScanDesc.rs_finished { return None; } if self.exec_qual() { return Some(self.ss_ScanTupleSlot.as_ref()); } // next tuple } } fn exec_qual(&self) -> bool { if self.qual.is_none() { // Always condition is met return true; } let evaluator = ExprEvaluator::new(self.ss_ScanTupleSlot.as_ref(), self.qual.as_ref().unwrap().as_ref()); evaluator.eval() } fetch next tuple from page check where clause set current tuple to evaluator then call eval

Slide 107

Slide 107 text

impl<'a> ExprEvaluator<'a> { fn eval_rec(&self, expr: &Expr) -> Box { match expr { Expr::Bool(b) => { Box::new(Value::Bool(b.clone())) }, Expr::Number(n) => { Box::new(Value::String(n.to_string())) }, Expr::OpEq(e1, e2) => { let v1 = self.eval_rec(e1); let v2 = self.eval_rec(e2); Box::new(Value::Bool(self.op_eq(v1.as_ref(), v2.as_ref()))) }, Expr::ColumnRef(col_name) => { let i = self.currentTuple.get_index_from_name(col_name); let c = self.currentTuple.get_column(i); Box::new(Value::String(c.as_string())) }, } } simple case eval operands then compare get value from current tuple

Slide 108

Slide 108 text

"EEXIFSFUPEFMFUFDPNNBOE diff --git a/src/ast.rs b/src/ast.rs @@ -4,8 +4,8 @@ pub enum Stmt { SelectStmt(Box, String, String, Option>), // dbname, tablename, keys, values InsertStmt(String, String, Vec, Vec>), - // dbname, tablename - DeleteStmt(String, String), + // dbname, tablename, where_clause + DeleteStmt(String, String, Option>), } diff --git a/src/parser.lalrpop b/src/parser.lalrpop @@ -12,8 +12,8 @@ pub Statement: Stmt = { Stmt::InsertStmt(fi.0, fi.1, li1, li2) }, // DELETE FROM dbname "." tablename - Delete_T From_T => { - Stmt::DeleteStmt(fi.0, fi.1) + Delete_T From_T => { + Stmt::DeleteStmt(fi.0, fi.1, ow) }, } Add where member to Node Change syntax to accept where

Slide 109

Slide 109 text

"EEXIFSFUPEFMFUFDPNNBOE diff --git a/src/dml.rs b/src/dml.rs @@ -122,7 +122,7 @@ impl DeleteCommnad { } } - pub fn execute(&self, dbname: &str, table_name: &str, cmgr: &CatalogManager) -> Result<(), String> { + pub fn execute(&self, dbname: &str, table_name: &str, cmgr: &CatalogManager, qual: &Option>) -> Result<(), String> { let db_oid = cmgr.database_rm.find_mini_database_oid(dbname) .expect(&format!("{} database should be defined.", dbname)); let table_oid = cmgr.class_rm.find_mini_class_oid(db_oid, table_name) @@ -131,7 +131,7 @@ impl DeleteCommnad { let mut rmgr = RelationManager::new(self.config.clone()); let relation = rmgr.get_relation(db_oid, table_oid); let mut bm = BufferManager::new(1, self.config.clone()); - let scan = ScanState::new(relation, &rm, &mut bm, &None); + let scan = ScanState::new(relation, &rm, &mut bm, qual); let mut delete = DeleteState::new(relation, scan); delete.exec_delete(&mut bm); Pass where expressions to scan

Slide 110

Slide 110 text

-JNJUBUJPOTPGXIFSFDMBVTF • Support only few basic operations

Slide 111

Slide 111 text

Part 4: Conclusion

Slide 112

Slide 112 text

4UFQTUPDSFBUJOHEC 1. Implement init command 2. Implement create database / create table command 3. Implement insert command / select command 4. Implement buffer manager / storage manager • Rewrite insert command to use these managers 5. Implement count command 6. Implement parser 7. Implement delete 8. Implement where clause • Add where to delete

Slide 113

Slide 113 text

-JNJUBUJPOTPGFBDIDPNQPOFOU • The system catalogs • Use simple data format, e.g. JSON, CSV, not binary format • Create table • Signed 4 bytes integer is only supported type • Only support `id integer, age integer` as table schema • Buffer Manager • No eviction logic • No lock mechanism

Slide 114

Slide 114 text

-JNJUBUJPOTPGFBDIDPNQPOFOU • Storage Manager • Fork is not supported • Segment is not supported • Count command • Count logic is not generalized to aggregation • Delete command • Vacuum is not implemented • Where clause • Support only few basic operations

Slide 115

Slide 115 text

-JNJUBUJPOTPGFBDIDPNQPOFOU • Query Execution • Do not implement analyzer • Should do semantic analysis when the query is executed • Do not implement optimizer • Can not use index even if index is implemented

Slide 116

Slide 116 text

3FGFSFODFT • Raghu Ramakrishnan, Johannes Gehrke: "Database Management Systems” • Hector Garcia-Molina, Jeffrey D. Ullman, Jennifer Widom: “Database Systems: Pearson New International Edition: The Complete Book” • উມஐ੒, ࠤഢণथ, ݪాొࢤ: “ʦվగ৽൛ʧ಺෦ߏ଄͔ΒֶͿ PostgreSQL ઃܭɾӡ༻ܭըͷమଇ” • “The Internals of PostgreSQL” http://www.interdb.jp/pg/index.html • “PostgreSQL ͷ֮͑ॻ͖ͷΠϯσοΫε” http://www.nminoru.jp/ ~nminoru/postgresql/

Slide 117

Slide 117 text

"DLOPXMFEHNFOUT • @takeshinoda and @hkdnet • Reviewing the slide • Oracle Corporation Japan • Giving me the chance to talk about DB in Oracle!!!

Slide 118

Slide 118 text

Thank you !!