Make your data Grab-and-Go

Yuichiro Someya

May 17, 2017

Transcript

  1. Make your data Grab-’n’-Go ayemos @ Cookpad Inc.

  2. `whoami` ‣ 'Yuichiro Someya'.split.last.reverse.downcase ‣ github.com/ayemos ‣ twitter.com/ayemos_y ‣ www.ayemos.me
  3. medium.com/@ayemos ayemos.me

  4. medium.com/@ayemos ayemos.me

  5. github.com/ayemos/akagi

  6. Make your Data Grab-’n’-Go

  7. Make your Data Grab-’n’-Go *Data Reproducibility*

  8. *Data Reproducibility* ‣ Important ‣ Not easy to achieve

  9. ‣ Sharing the same data with others is troublesome enough # It may span multiple types of data sources ‣ Data sometimes needs to be strictly identical
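
One way to check the "strictly identical" requirement is to compare content digests of two copies of a dataset. The sketch below is only an illustration using the Python standard library; the function names `sha256_of_file` and `same_data` are hypothetical and do not come from the talk or from akagi.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # Read fixed-size chunks so large datasets do not need to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_data(a: Path, b: Path) -> bool:
    """True when both files have the same digest, i.e. the same bytes."""
    return sha256_of_file(a) == sha256_of_file(b)
```

Publishing the digest next to a dataset lets a colleague confirm that the copy they fetched matches yours byte for byte.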
  10.  

  11.  

  12. Document on notebook (sets of scripts, manual ops) ‣ Bothersome (to document / to use) ‣ Prone to human error
  13.  

  14.  

  15. keras.datasets.mnist / Document on notebook (sets of scripts, manual ops) ‣ Bothersome (to document / to use) ‣ Prone to human error
  16. keras.datasets.mnist ‣ Easy, can be reproduced instantly / Document on notebook (sets of scripts, manual ops) ‣ Bothersome (to document / to use) ‣ Prone to human error
  17. keras.datasets.mnist ‣ Easy, can be reproduced instantly ‣ Less likely to be usable in real work / Document on notebook (sets of scripts, manual ops) ‣ Bothersome (to document / to use) ‣ Prone to human error
  18. Levels of Data Abstraction

  19. keras.datasets.mnist ‣ Easy, can be reproduced instantly ‣ Less likely to be usable in real work / Document on notebook (sets of scripts, manual ops) ‣ Bothersome (to document / to use) ‣ Prone to human error
  20. Preprocessing Batching Fetch Load ‣ Load the data into the script (or any other training dev) ‣ Convert, reshape, split, … ‣ Download the data and put it in a specific place
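
The four levels named on this slide (fetch, load, preprocessing, batching) can be sketched as four small functions. This is a minimal standard-library illustration, not the speaker's code: the function names are hypothetical, a local file copy stands in for the download step, and the data is assumed to be a CSV whose last column is an integer label.

```python
import csv
import random
import shutil
from pathlib import Path
from typing import Iterator, List, Tuple

Example = Tuple[List[float], int]  # (features, label)


def fetch(source: Path, cache_dir: Path) -> Path:
    """Fetch: put the raw file in a specific local place (copy stands in for download)."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / source.name
    if not target.exists():
        shutil.copyfile(source, target)
    return target


def load(path: Path) -> List[List[str]]:
    """Load: read the cached file into the script as rows of strings."""
    with path.open(newline="") as f:
        return [row for row in csv.reader(f) if row]


def preprocess(
    rows: List[List[str]], test_ratio: float = 0.25, seed: int = 0
) -> Tuple[List[Example], List[Example]]:
    """Preprocessing: convert types, shuffle with a fixed seed, split into train/test."""
    examples = [([float(v) for v in row[:-1]], int(row[-1])) for row in rows]
    random.Random(seed).shuffle(examples)
    n_test = int(len(examples) * test_ratio)
    return examples[n_test:], examples[:n_test]


def batches(examples: List[Example], batch_size: int) -> Iterator[List[Example]]:
    """Batching: yield consecutive slices of at most batch_size examples."""
    for start in range(0, len(examples), batch_size):
        yield examples[start:start + batch_size]
```

Chaining them as `batches(preprocess(load(fetch(src, cache)))[0], 32)` gives training batches; fixing the seed keeps the split reproducible across machines.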
  21.   Preprocessing Batching Fetch Load

  22. Preprocessing Batching Fetch Load keras.datasets.mnist

  23. Preprocessing Batching Fetch Load keras.datasets.mnist What I (or we) need
  24. Preprocessing Batching Fetch Load akagi

  25.  There might be a demo 

  26.  

  27.  

  28. akagi ‣ Makes it easier to access multiple types of data sources # MySQL, Amazon Redshift, Amazon S3, Google Spreadsheets, FTP servers, … ‣ Specify the data with runnable Python code # Use and document at the same time
  29. akagi ‣ akagi introduces an abstraction layer over data # Has the potential to apply common operations over it # Data registry?
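
The idea of an abstraction layer with common operations and a registry can be sketched as follows. This is not akagi's API: `DataSource`, `CsvSource`, `SqliteSource`, `REGISTRY` and `register` are hypothetical names, and a local CSV file and SQLite database stand in for sources such as S3 or MySQL so that the example needs only the standard library.

```python
import csv
import sqlite3
from abc import ABC, abstractmethod
from contextlib import closing
from pathlib import Path
from typing import Dict, Iterator, Tuple


class DataSource(ABC):
    """Hypothetical common interface: every source yields rows as tuples of strings."""

    @abstractmethod
    def rows(self) -> Iterator[Tuple[str, ...]]:
        ...

    def count(self) -> int:
        """A common operation that works for any source, whatever backs it."""
        return sum(1 for _ in self.rows())


class CsvSource(DataSource):
    """Rows from a local CSV file (stand-in for file-like sources)."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def rows(self) -> Iterator[Tuple[str, ...]]:
        with self.path.open(newline="") as f:
            for row in csv.reader(f):
                yield tuple(row)


class SqliteSource(DataSource):
    """Rows from a SQL query (stand-in for database sources)."""

    def __init__(self, db_path: Path, query: str) -> None:
        self.db_path = db_path
        self.query = query

    def rows(self) -> Iterator[Tuple[str, ...]]:
        with closing(sqlite3.connect(str(self.db_path))) as conn:
            for row in conn.execute(self.query):
                yield tuple(str(value) for value in row)


# A toy "data registry": named, runnable definitions of where the data lives.
REGISTRY: Dict[str, DataSource] = {}


def register(name: str, source: DataSource) -> DataSource:
    REGISTRY[name] = source
    return source
```

Because each entry is runnable code, the registry both documents where a dataset comes from and provides the way to read it.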