John Lin - IPython Notebook - PyDSLA meetup - Nov 2014

Data Science LA
November 05, 2014

Transcript

  1. IPython Notebook for Data Analysis
     John Lin
     johnclin.com (TrueCar Data Scientist – we’re hiring!)
  2. A little about me …
     • Working at TrueCar on both data analytics and data engineering projects.
     • Experimental economist by training
       – Caltech and University of Michigan.
       – Lots of stats/econometrics.
       – Game theory/mechanism design.
     • Programmer
       – Built web-based financial markets at Caltech and Michigan.
       – Building robust analytical data ETLs at TrueCar, with small and Big data. Lots and lots of data …
  3. BIG Picture
     • IPython Notebook is:
       – Easy to install.
       – A powerful environment in its own right.
       – The foundational environment for other Python data packages:
         • Pandas
         • Matplotlib
       – A very good tool.
  4. Overview
     • Diving straight in.
     • Installing the IPython Notebook.
     • Some interesting features.
     • Pros and Gotchas of Using the IPython Notebook.
     • How to learn more?
  5. Installing and Running
     • Installing the IPython Notebook
       – pip install ipython pyzmq jinja2 tornado
         • pyzmq takes a bit longer to build on a Mac
     • Running/launching the IPython notebook
       – ipython notebook
       – Note that ipython is a shell; ipython notebook is a browser-based interface
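The install command on slide 5 names four packages. As a rough sketch that is not part of the talk, the standard-library snippet below checks whether each one is importable in the current environment. The pip names and import names differ for two of them (`ipython` is imported as `IPython`, `pyzmq` as `zmq`). The helper name `check_notebook_deps` is hypothetical.

```python
import importlib.util

# pip package name -> import name (pyzmq is imported as "zmq").
NOTEBOOK_DEPS = {
    "ipython": "IPython",
    "pyzmq": "zmq",
    "jinja2": "jinja2",
    "tornado": "tornado",
}


def check_notebook_deps(deps=NOTEBOOK_DEPS):
    """Return {pip_name: bool}, True when the module can be located."""
    return {
        pip_name: importlib.util.find_spec(module_name) is not None
        for pip_name, module_name in deps.items()
    }


if __name__ == "__main__":
    for pip_name, found in check_notebook_deps().items():
        status = "ok" if found else "missing (pip install " + pip_name + ")"
        print(f"{pip_name:8s} {status}")
```

Running this before `ipython notebook` gives a quick hint about which of the four packages still needs installing.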
  6. Pros of Using the IPython Notebook
     – IPython Notebook is interactive. Great for data analysis!
       • This may not seem like a big deal at first if you haven’t done a lot of data processing work, but it is!
       • Imagine the alternative:
         – Edit the program file.
         – Run the program and look at the output text in a text editor.
         – Repeat endless times.
         – And how do you visualize the data? Output to file and click to show on browser?
     – The IPython Notebook, along with pandas and matplotlib, provides a powerful combination of tools to iteratively examine, process, and visualize data.
  7. Gotchas of Using the IPython Notebook
     • The raw IPython notebook is not very readable as it contains a lot of HTML formatting code.
     • Hard to read the code in GitHub.
       – Though it is easy to convert an IPython notebook to other formats (html, python code) using ‘ipython nbconvert’
     • Diffs (‘diff’ or ‘git diff’) are a lot less helpful when comparing IPython notebooks.
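A notebook file is stored as a JSON document, which is part of why raw views and line-based diffs are noisy. As a minimal sketch of the idea behind converting a notebook to plain Python, the snippet below pulls only the code-cell source out of a notebook dictionary. It assumes the nbformat 4 layout (a top-level `cells` list whose entries carry `cell_type` and `source`); notebooks saved by older IPython versions nest their cells differently. The function name `code_cells` and the sample notebook are invented for illustration, and `ipython nbconvert` remains the tool the slide recommends.

```python
import json


def code_cells(notebook):
    """Return the source of each code cell, one string per cell.

    Assumes nbformat 4: notebook["cells"] is a list of dicts, and each
    cell's "source" is either a string or a list of line strings.
    """
    sources = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell.get("source", "")
        if isinstance(source, list):
            source = "".join(source)
        sources.append(source)
    return sources


# Hypothetical two-cell notebook, serialized the way it would sit on disk.
raw = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 0,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Notes\n"]},
        {"cell_type": "code", "metadata": {}, "execution_count": 1,
         "outputs": [], "source": ["x = 1\n", "print(x)"]},
    ],
})

extracted = code_cells(json.loads(raw))
print("\n\n".join(extracted))
```

Comparing the extracted text instead of the raw file keeps outputs and metadata out of the diff.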
  8. Gotchas of Using the IPython Notebook
     • Because it encourages interactive coding, it is easy to pollute the namespace.
     • This makes the code hard to debug because you may have overwritten a variable and forgotten about it.
       – When in doubt, restart the kernel, and run the process through one step at a time from the top.
       – Rename variables after a transformation step.
       – Break your code into separate cells.
       – Leverage methods and classes as appropriate.
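To make the namespace advice concrete, here is a small illustrative sketch; the variable names, the function `add_tax`, and the numbers are all invented. Reusing one name across cells makes the result depend on which cells ran and how often. Giving each step its own name, or wrapping the step in a function, removes that hidden state.

```python
# Risky pattern: each "cell" overwrites the same name, so re-running
# the second cell silently applies the transformation again.
prices = [100.0, 250.0, 400.0]         # cell 1
prices = [p * 1.1 for p in prices]     # cell 2 (each re-run scales prices again)

# Safer: rename after each transformation step, so every cell can be
# re-run without changing its own input.
raw_prices = [100.0, 250.0, 400.0]
taxed_prices = [p * 1.1 for p in raw_prices]


# Safer still: put the step in a function, which keeps intermediate
# values out of the notebook's global namespace.
def add_tax(values, rate=0.1):
    """Return a new list with `rate` tax applied; the input is not modified."""
    return [v * (1 + rate) for v in values]


assert add_tax(raw_prices) == add_tax(raw_prices)  # same input, same output
assert raw_prices == [100.0, 250.0, 400.0]         # original data untouched
```

Restarting the kernel and running from the top, as the slide suggests, is the quickest way to confirm that no cell depends on a stale variable.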
  9. How to Learn More
     • http://iPython.org (The Mothership.)
     • https://github.com/ipython/ipython/wiki/A-gallery-of-interesting-IPython-Notebooks (Repository of IPython notebooks.)
     • http://continuum.io/wakari (Online hosting of IPython notebooks.)