*Data Reproducibility*
‣ Important
‣ Not easy to achieve
‣ Even giving others the same data you have is troublesome enough
# It may span multiple types of data sources
‣ Data sometimes needs to be strictly identical
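One common way to check that two copies of a dataset are strictly identical is to compare cryptographic digests of the files. The sketch below uses only Python's standard `hashlib`; the function names `sha256_of` and `verify` and the file name `train.csv` are hypothetical, chosen for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file's contents."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # Read fixed-size chunks so large files never have to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_sha256: str) -> bool:
    """True if the digest matches, i.e. (barring a hash collision) the bytes are identical."""
    return sha256_of(path) == expected_sha256


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp) / "train.csv"  # hypothetical dataset file
        data.write_text("x,y\n1,2\n")
        recorded = sha256_of(data)      # digest shared alongside the data
        print(verify(data, recorded))   # True
        data.write_text("x,y\n1,3\n")   # change a single value
        print(verify(data, recorded))   # False
```

Publishing the digest next to the data lets anyone confirm they hold exactly the same bytes, although it says nothing about how those bytes were produced.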
Documented in a notebook
(sets of scripts, manual operations)
‣ Bothersome (both to document and to use)
‣ Prone to human error
keras.datasets.mnist
‣ Easy: can be reproduced instantly
‣ Less likely to be usable in real work
Levels of Data Abstraction
Fetch
‣ Download the data and put it in a specific place
Load
‣ Load the data into a script
(or any other training environment)
Preprocessing
‣ Convert, reshape, split, …
Batching
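The four levels can be sketched as a chain of plain Python functions, one per level. This is only an illustration under assumptions: the CSV format, the `data_dir` location, and the function names are hypothetical, and the "remote" source is simulated with an in-memory string so the sketch runs offline.

```python
import csv
import random
import tempfile
from pathlib import Path
from typing import Iterator, List, Sequence, Tuple

# Stand-in for a remote source (hypothetical); a real fetch would download this.
REMOTE_CSV = "x,label\n" + "\n".join(f"{i},{i % 2}" for i in range(10)) + "\n"

Row = Tuple[float, int]


def fetch(data_dir: Path) -> Path:
    """Fetch: obtain the raw data and put it in a specific place."""
    data_dir.mkdir(parents=True, exist_ok=True)
    target = data_dir / "raw.csv"
    if not target.exists():  # fetch only once; later runs reuse the local copy
        target.write_text(REMOTE_CSV)
    return target


def load(path: Path) -> List[dict]:
    """Load: read the raw file into in-memory records for the training script."""
    with path.open(newline="") as f:
        return list(csv.DictReader(f))


def preprocess(records: List[dict], test_ratio: float = 0.2,
               seed: int = 0) -> Tuple[List[Row], List[Row]]:
    """Preprocessing: convert types, shuffle with a fixed seed, split train/test."""
    rows = [(float(r["x"]), int(r["label"])) for r in records]
    random.Random(seed).shuffle(rows)  # seeded so the split is reproducible
    n_test = int(len(rows) * test_ratio)
    return rows[n_test:], rows[:n_test]


def batches(rows: Sequence[Row], batch_size: int) -> Iterator[List[Row]]:
    """Batching: yield consecutive mini-batches (the last one may be smaller)."""
    for start in range(0, len(rows), batch_size):
        yield list(rows[start:start + batch_size])


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        train, test = preprocess(load(fetch(Path(tmp) / "data")))
        print(len(train), len(test))  # 8 2
        for batch in batches(train, batch_size=3):
            print(len(batch))         # 3, then 3, then 2
```

Keeping the levels separate makes it visible which of them a ready-made loader covers: `keras.datasets.mnist.load_data()`, for example, downloads the file and returns arrays already split into training and test sets in a single call.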
[Figure: keras.datasets.mnist placed against the levels Fetch, Load, Preprocessing, Batching]
What I (or we) need
[Figure: the four levels again (Fetch, Load, Preprocessing, Batching)]
BLBHJ
There might be a demo
akagi
‣ Make it easier to access multiple types of data sources
# MySQL, Amazon Redshift, Amazon S3, Google Spreadsheets, FTP servers, …
‣ Specify the data with runnable Python code
# Use it and document it at the same time
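To illustrate what specifying the data with runnable Python code can look like, here is a minimal sketch. It is not akagi's API: `DataSpec`, `sqlite_fetcher`, and the `events` table are hypothetical, and an in-memory SQLite database stands in for a real source such as MySQL so the example runs anywhere. The same object serves as documentation (`describe`) and as the way to obtain the data (`fetch`).

```python
import sqlite3
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class DataSpec:
    """A data specification that doubles as documentation (hypothetical, not akagi's API)."""
    name: str
    source: str                             # human-readable description of where the data lives
    query: str                              # the exact query that selects the data
    fetch_fn: Callable[[str], List[tuple]]  # how to run that query

    def describe(self) -> str:
        """The 'document' side: a readable summary of what the data is."""
        return f"{self.name}: {self.query!r} from {self.source}"

    def fetch(self) -> List[tuple]:
        """The 'use' side: running the same spec produces the data."""
        return self.fetch_fn(self.query)


def sqlite_fetcher(conn: sqlite3.Connection) -> Callable[[str], List[tuple]]:
    """Wrap a DB connection so a spec can run its query (SQLite stands in for MySQL etc.)."""
    return lambda query: conn.execute(query).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER, kind TEXT)")
    conn.executemany("INSERT INTO events VALUES (?, ?)",
                     [(1, "click"), (2, "view"), (3, "click")])
    clicks = DataSpec(
        name="clicks",
        source="events table (in-memory SQLite standing in for a real database)",
        query="SELECT id FROM events WHERE kind = 'click' ORDER BY id",
        fetch_fn=sqlite_fetcher(conn),
    )
    print(clicks.describe())
    print(clicks.fetch())  # [(1,), (3,)]
```

Because the query lives in code rather than in a notebook, the description and the data can no longer drift apart.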
akagi
‣ akagi introduces an abstraction layer over data
# It could make common operations applicable to all of them
# A data registry?
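The data registry is left as an open question here; one possible reading is sketched below. This is not akagi's implementation: `DataSource`, `InMemorySource`, `CsvTextSource`, `Registry`, `count`, and `head` are hypothetical names. Each source type implements one small interface, so common operations are written once and work on every registered source.

```python
import csv
import io
import itertools
from abc import ABC, abstractmethod
from typing import Dict, Iterable, Iterator, List


class DataSource(ABC):
    """Hypothetical abstraction layer: every source yields records as dicts."""

    @abstractmethod
    def __iter__(self) -> Iterator[dict]:
        """Return a fresh iterator over the source's records."""


class InMemorySource(DataSource):
    """A source backed by a Python list (e.g. results already fetched)."""

    def __init__(self, records: Iterable[dict]) -> None:
        self._records = list(records)

    def __iter__(self) -> Iterator[dict]:
        return iter(self._records)


class CsvTextSource(DataSource):
    """A source backed by CSV text, standing in for a file from S3, FTP, etc."""

    def __init__(self, text: str) -> None:
        self._text = text

    def __iter__(self) -> Iterator[dict]:
        # Re-parse on every iteration so the source can be read more than once.
        return iter(csv.DictReader(io.StringIO(self._text)))


# Common operations, written once against the interface.
def count(source: DataSource) -> int:
    return sum(1 for _ in source)


def head(source: DataSource, n: int) -> List[dict]:
    return list(itertools.islice(source, n))


class Registry:
    """Maps names to sources so datasets are looked up instead of re-described."""

    def __init__(self) -> None:
        self._sources: Dict[str, DataSource] = {}

    def register(self, name: str, source: DataSource) -> None:
        if name in self._sources:
            raise ValueError(f"data source {name!r} is already registered")
        self._sources[name] = source

    def get(self, name: str) -> DataSource:
        return self._sources[name]

    def names(self) -> List[str]:
        return sorted(self._sources)


if __name__ == "__main__":
    registry = Registry()
    registry.register("users", InMemorySource([{"id": 1}, {"id": 2}]))
    registry.register("events", CsvTextSource("id,kind\n1,click\n2,view\n3,click\n"))
    for name in registry.names():
        print(name, count(registry.get(name)))  # events 3, then users 2
```

A new backend (a database table, a spreadsheet) would only need its own `__iter__`; `count`, `head`, and the registry stay unchanged.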