Big data and reproducibility

Jeff L.
June 28, 2014

Talk at JHU summer institute.

Transcript

  1. 2.
  2. 9.
  3. 10.
  4. 11.
  5. 12.
  6. 16.

    What went wrong? Expertise. They used silly prediction rules: Pr(FEC) = 5/8 [Pr(F) + Pr(E) + Pr(C)] - 1/4
  7. 18.

    What went wrong? Expertise. Their predictions weren’t locked down. Today: Pr(FEC) = 0.8. Tomorrow: Pr(FEC) = 0.1.
  8. 19.

    At the end of the day the Potti analysis was fully reproducible. The problem is that the analysis was wrong.
  9. 21.

    The goal: a result that is reproducible (the code and data can be used to recreate the results) and replicable (you can perform the experiment again and get the same answer).
  10. 22.

    The goal: a result that is reproducible (the code and data can be used to recreate the results) and replicable (you can perform the experiment again and get the same answer).
  11. 23.

    Who Reproduces Research? (Diagram. Groups: Original Investigator, Reproducers, Scientists, General Public. Statements: "The truth is A", "I don’t care", "The truth is B", "The truth is not A", "The truth is A", "???".) Slide courtesy R. Peng
  12. 26.

    What is Data Analysis? Raw Data; Cleaning / Validation; Pre-processing; Exploratory data analysis; Statistical model development; Sensitivity analysis; Finalize results / report. Statistics! Slide courtesy R. Peng
  13. 29.
  14. 32.
  15. 33.
  16. 34.
  17. 35.

    1. Reproducibility by data sharing
    2. Big data is not just statistics
    3. Analysis is often an afterthought
    4. Traditional ideas still matter
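
To make the slide 16 point concrete, here is a minimal Python sketch of the rule Pr(FEC) = 5/8 [Pr(F) + Pr(E) + Pr(C)] - 1/4. It shows one reason the rule could be called silly: its output can fall outside the range 0 to 1, so it is not always a probability. The function names `silly_rule` and `is_valid_probability` and the chosen inputs are assumptions made for illustration; the transcript does not say what F, E, and C stand for, and nothing here comes from the original analysis code.

```python
from fractions import Fraction


def silly_rule(p_f, p_e, p_c):
    """Apply the rule quoted on slide 16 to three input probabilities.

    Hypothetical helper: only the formula comes from the slide.
    """
    return Fraction(5, 8) * (p_f + p_e + p_c) - Fraction(1, 4)


def is_valid_probability(p):
    """A probability must lie between 0 and 1 inclusive."""
    return 0 <= p <= 1


# Illustrative inputs (assumed, not data from the talk).
all_zero = silly_rule(Fraction(0), Fraction(0), Fraction(0))
all_half = silly_rule(Fraction(1, 2), Fraction(1, 2), Fraction(1, 2))
all_one = silly_rule(Fraction(1), Fraction(1), Fraction(1))

for label, value in [("all 0", all_zero), ("all 1/2", all_half), ("all 1", all_one)]:
    print(label, value, "valid" if is_valid_probability(value) else "NOT a probability")
```

With all three inputs at 0 the rule gives -1/4, and with all three at 1 it gives 13/8, so both extremes fall outside the valid range even though every input is a legitimate probability. Exact fractions are used so that the arithmetic can be checked by hand.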