Make your data Grab-and-Go

Slide 1

Slide 1 text

Make your data Grab-’n’-Go ayemos @ Cookpad Inc.

Slide 2

Slide 2 text

`whoami` ‣ 'Yuichiro Someya'.split.last.reverse.downcase ‣ github.com/ayemos ‣ twitter.com/ayemos_y ‣ www.ayemos.me

Slide 3

Slide 3 text

medium.com/@ayemos ayemos.me

Slide 4

Slide 4 text

medium.com/@ayemos ayemos.me

Slide 5

Slide 5 text

github.com/ayemos/akagi

Slide 6

Slide 6 text

Make your Data Grab-’n’-Go

Slide 7

Slide 7 text

Make your Data Grab-’n’-Go *Data Reproducibility*

Slide 8

Slide 8 text

*Data Reproducibility* ‣ Important ‣ Not easy to achieve

Slide 9

Slide 9 text

‣ Giving others the same data you have is troublesome enough # it may span multiple types of data sources ‣ Data sometimes needs to be strictly identical
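One common way to check that two copies of a dataset are strictly identical is to compare cryptographic digests of the files. The following is a minimal sketch of that idea, not something taken from the talk; the helper names `sha256_of_file` and `is_identical` are hypothetical.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large datasets need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_identical(path_a, path_b):
    """True when both files hash to the same SHA-256 digest."""
    return sha256_of_file(path_a) == sha256_of_file(path_b)
```

Publishing the digest next to the data lets a colleague confirm that the copy they fetched matches yours.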

Slide 10

Slide 10 text

Slide 11

Slide 11 text

Slide 12

Slide 12 text

Document on notebook (Sets of scripts, manual ops) ‣ Bothersome (to document / to use) ‣ Human-error Prone

Slide 13

Slide 13 text

Slide 14

Slide 14 text

Slide 15

Slide 15 text

Document on notebook (Sets of scripts, manual ops) ‣ Bothersome (to document / to use) ‣ Human-error Prone
keras.datasets.mnist

Slide 16

Slide 16 text

Document on notebook (Sets of scripts, manual ops) ‣ Bothersome (to document / to use) ‣ Human-error Prone
keras.datasets.mnist ‣ Easy, can instantly be reproduced

Slide 17

Slide 17 text

Document on notebook (Sets of scripts, manual ops) ‣ Bothersome (to document / to use) ‣ Human-error Prone
keras.datasets.mnist ‣ Easy, can instantly be reproduced ‣ Less likely to be usable in real work
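For reference, this is roughly what the Keras one-liner looks like in practice. It is a minimal sketch assuming Keras is installed; the wrapper name `load_mnist` is hypothetical, and the import sits inside the function so the snippet can be read or imported without Keras present.

```python
def load_mnist():
    """Fetch (on first call) and load MNIST as NumPy arrays."""
    # Imported lazily so this sketch does not require Keras until it is called.
    from keras.datasets import mnist

    # load_data() downloads the dataset on the first call and caches it locally.
    (x_train, y_train), (x_test, y_test) = mnist.load_data()
    return (x_train, y_train), (x_test, y_test)
```

Calling `load_mnist()` needs network access the first time, since the dataset is downloaded before it is loaded.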

Slide 18

Slide 18 text

Levels of Data Abstraction

Slide 19

Slide 19 text

Slide 20

Slide 20 text

Preprocessing Batching Fetch Load
‣ Fetch: download the data and put it in a specific place
‣ Load: load the data into the script (or any other training dev)
‣ Preprocessing: convert, reshape, split, …
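The four levels can be sketched as four small functions chained together. This is an illustrative sketch only: the ordering Fetch, Load, Preprocessing, Batching is an assumption, the function names are hypothetical, and `fetch` writes a generated stand-in file instead of downloading anything.

```python
import csv
import os
import random
import tempfile


def fetch(dest_dir):
    """Fetch: put the raw data at a specific place (here, a generated stand-in file)."""
    path = os.path.join(dest_dir, "raw.csv")
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["x", "y"])
        for i in range(10):
            writer.writerow([i, i % 2])
    return path


def load(path):
    """Load: read the fetched file into the script as a list of dict rows."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


def preprocess(rows, test_ratio=0.2, seed=0):
    """Preprocessing: convert types, shuffle, then split into train and test sets."""
    samples = [(float(r["x"]), int(r["y"])) for r in rows]
    random.Random(seed).shuffle(samples)
    n_test = int(len(samples) * test_ratio)
    return samples[n_test:], samples[:n_test]


def batches(samples, batch_size):
    """Batching: yield fixed-size chunks (the last one may be smaller)."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]


# Run the whole chain once on the stand-in data.
with tempfile.TemporaryDirectory() as tmp:
    train, test = preprocess(load(fetch(tmp)))

train_batches = list(batches(train, batch_size=3))
```

Each function covers one level, so a tool can take over only the lower levels and leave the rest to the training script.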

Slide 21

Slide 21 text

Preprocessing Batching Fetch Load

Slide 22

Slide 22 text

Preprocessing Batching Fetch Load keras.datasets.mnist

Slide 23

Slide 23 text

Preprocessing Batching Fetch Load keras.datasets.mnist What I (or we) need

Slide 24

Slide 24 text

Preprocessing Batching Fetch Load akagi

Slide 25

Slide 25 text

There might be a demo

Slide 26

Slide 26 text

Slide 27

Slide 27 text

Slide 28

Slide 28 text

akagi ‣ Makes it easier to access multiple types of data sources # MySQL, Amazon Redshift, Amazon S3, Google Spreadsheets, FTP Servers, … ‣ Specify the data with runnable Python code # Use and document at the same time

Slide 29

Slide 29 text

akagi ‣ akagi introduces an abstraction layer over data # Has the potential to apply common operations over it # Data registry?
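To illustrate what an abstraction layer over data could look like, here is a minimal sketch. It is NOT akagi's actual API: `DataSource`, `CSVTextSource`, and `InMemorySource` are hypothetical names made up for this example. The point is only that once every source exposes the same interface, operations such as `head` or `count` can be written once and applied to all of them.

```python
import csv
import io
import itertools
from abc import ABC, abstractmethod


class DataSource(ABC):
    """Hypothetical common interface over data sources (not akagi's API)."""

    @abstractmethod
    def rows(self):
        """Yield each record as a dict."""

    def head(self, n):
        """Common operation: the first n records of any source."""
        return list(itertools.islice(self.rows(), n))

    def count(self):
        """Common operation: the number of records in any source."""
        return sum(1 for _ in self.rows())


class CSVTextSource(DataSource):
    """Records parsed from CSV text (stand-in for a file or object store)."""

    def __init__(self, text):
        self.text = text

    def rows(self):
        yield from csv.DictReader(io.StringIO(self.text))


class InMemorySource(DataSource):
    """Records already held in memory (stand-in for a database query result)."""

    def __init__(self, records):
        self.records = list(records)

    def rows(self):
        yield from self.records


# The code that declares the sources also documents where the data comes from.
sources = [
    CSVTextSource("id,name\n1,rice\n2,miso\n"),
    InMemorySource([{"id": "3", "name": "tofu"}]),
]
total = sum(source.count() for source in sources)
```

A registry of such source definitions would be one way to read the "Data registry?" note, though that is speculation on top of the slide.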