Big data and reproducibility

Jeff L.
June 28, 2014

Talk at JHU summer institute.

Transcript

  1. What went wrong? Expertise. They used silly prediction rules: Pr(FEC) = 5/8 [Pr(F) + Pr(E) + Pr(C)] - 1/4
  2. What went wrong? Expertise. Their predictions weren't locked down. Today: Pr(FEC) = 0.8. Tomorrow: Pr(FEC) = 0.1.
  3. At the end of the day, the Potti analysis was fully reproducible. The problem is that the analysis was wrong.
  4. The goal: a result that is reproducible (the code and data can be used to recreate the results) and replicable (you can perform the experiment again and get the same answer).
  6. Who Reproduces Research? [Diagram. Groups shown: Original Investigator, Reproducers, Scientists, General Public. Statements shown: "The truth is A", "I don't care", "The truth is B", "The truth is not A", "The truth is A", "???".] Slide courtesy R. Peng
  7. What is Data Analysis? Raw data -> Cleaning / validation -> Pre-processing -> Exploratory data analysis -> Statistical model development -> Sensitivity analysis -> Finalize results / report. Callout: "Statistics!" Slide courtesy R. Peng
  8. 1. Reproducibility by data sharing
     2. Big data is not just statistics
     3. Analysis is often an afterthought
     4. Traditional ideas still matter
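
The prediction rule quoted on slide 1 can be checked with a few lines of arithmetic. The sketch below evaluates Pr(FEC) = 5/8 [Pr(F) + Pr(E) + Pr(C)] - 1/4 at the extreme input values and flags outputs that fall outside [0, 1]. The function names are hypothetical and chosen for illustration; the transcript does not say how the three inputs were defined or estimated, so this only demonstrates one way the formula can misbehave, not necessarily the objection raised in the talk.

```python
from itertools import product


def combined_probability(p_f, p_e, p_c):
    """Rule as quoted on slide 1: 5/8 * [Pr(F) + Pr(E) + Pr(C)] - 1/4."""
    return 5 / 8 * (p_f + p_e + p_c) - 1 / 4


def is_valid_probability(p):
    """A probability must lie in the closed interval [0, 1]."""
    return 0.0 <= p <= 1.0


# Evaluate the rule at every corner of the unit cube (each input 0 or 1).
for p_f, p_e, p_c in product([0.0, 1.0], repeat=3):
    value = combined_probability(p_f, p_e, p_c)
    status = "ok" if is_valid_probability(value) else "NOT a probability"
    print(f"Pr(F)={p_f}, Pr(E)={p_e}, Pr(C)={p_c} -> Pr(FEC)={value} ({status})")
```

With all three inputs at 0 the rule returns -0.25, and with all three at 1 it returns 1.625, so the output is not guaranteed to be a probability even when every input is one.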