We are all statisticians now

Jeff L.
May 29, 2014

Talk at the 2014 Data Intensive Biology Conference, Johns Hopkins Department of Biology.

Transcript

  1. We are all statisticians now

  3. N = SAMPLE SIZE

  4. N = ($ YOU HAVE) / ($ PER SAMPLE)

  5. [Figure: $ per (human) genome, by year]

  6. RNA-seq: 2008 N≈2; 2010 N≈70; 2013 N≈900 (PMIDs: 19056941, 20220758, 24092820)

  7. www.geni.com

  8. http://erlichlab.wi.mit.edu/familinx/index.html

  10. https://github.com/jtleek/swfdr

  12. research about me blogging teaching

  13. jtleek.com about me simplystatistics.org jhudatascience.org

  15. from: jtleek@gmail.com
      Roger let me know you gave him a ballpark figure for the number of students registered for his course "Computing for Data Analysis". Could you give me an idea of how many have registered for my course "Data Analysis"?

  16. from: pangwei@coursera.org
      Hi Jeff, 7,000 students! It's pretty awesome. (You'll be able to check this out yourself next week, once the class sites are up.)

  17. from: rdpeng@gmail.com
      You are f**ed. -roger

  18. 9 classes, 1 month long, every month

  19. [Figure: cumulative enrollment]

  20. Dude, we get it: data/statistics is everywhere (what is the big deal?)

  24. what went wrong? 2 things

  25. what went wrong? transparency: The data/code weren’t reproducible

  26. what went wrong? transparency: There was a lack of cooperation

  27. what went wrong? expertise: They used silly prediction rules (Pr(FEC) = 5/8 [Pr(F) + Pr(E) + Pr(C)] - 1/4)

  28. what went wrong? expertise: They had study design problems (batch effects)

  29. what went wrong? expertise: Their predictions weren’t locked down. Today: Pr(FEC) = 0.8; tomorrow: Pr(FEC) = 0.1

  30. At the end of the day, the Potti analysis was fully reproducible. The problem is that the analysis was wrong.

  34. https://github.com/jtleek/datasharing

  35. Reproducibility = Solved (You’re welcome)

  36. um, actually no

  37. 1st Discussion Point: Statistical thinking is (often) an afterthought

  38. http://bit.ly/OgW3xv

  41. http://biomickwatson.wordpress.com/2013/04/23/a-guide-for-the-lonely-bioinformatician/

  42. “Yes, we are witnessing the birth of Yet another ‘pet bioinformatician’. What I mean by this term is a single bioinformatician employed within a laboratory based group.”
      http://biomickwatson.wordpress.com/2013/04/23/a-guide-for-the-lonely-bioinformatician/

  43. 2nd Discussion Point: Most data analysts are untrained

  44. Med school entrance requirements: one year of biology, one year of physics, one year of English, two years of chemistry (through organic chemistry).
      https://www.aamc.org/students/applying/requirements/

  45. http://www.nejm.org/doi/full/10.1056/NEJMoa1400029

  46. To: jtleek@gmail.com
      I am a postdoctoral fellow in [redacted] group. I collected data on [redacted] … Preliminary analysis has pulled out some interesting things but we need some professional assistance … We want to submit at the end of next month.

  47. 3rd Discussion Point: Statistics is not math (and data analysis isn’t statistics)

  48. association between shoe size and literacy

  56. 4th Discussion Point: How do we balance skepticism & excitement?

  58. http://gking.harvard.edu/files/gking/files/0314policyforumff.pdf

  59. 1. Statistical thinking is (often) an afterthought
      2. Most data analysts are untrained
      3. Statistics is not math (and data analysis isn’t statistics)
      4. How do we balance skepticism & excitement?

  60. jtleek.com/talks
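Slide 4's relation, N = ($ you have) / ($ per sample), is plain division. Below is a minimal Python sketch of it. It assumes N is rounded down to whole samples. The function name and the budget and cost figures are hypothetical and do not come from the talk.

```python
def affordable_sample_size(budget: float, cost_per_sample: float) -> int:
    """Hypothetical helper: N = budget / cost_per_sample, rounded down to whole samples."""
    if cost_per_sample <= 0:
        raise ValueError("cost_per_sample must be positive")
    return int(budget // cost_per_sample)


# Hypothetical numbers: a fixed budget buys more samples as the per-sample cost falls.
budget = 100_000
for cost in (10_000, 1_000, 100):
    print(f"${cost} per sample -> N = {affordable_sample_size(budget, cost)}")
```

With the budget held fixed, a tenfold drop in cost per sample gives a tenfold larger N. This is presumably the link slides 3 to 6 draw between falling per-genome costs and the growing RNA-seq sample sizes.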
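One way to see why slide 27 calls the prediction rule silly is to evaluate it at the edges of its inputs. The sketch below simply codes the formula as written on the slide, Pr(FEC) = 5/8 [Pr(F) + Pr(E) + Pr(C)] - 1/4, using exact fractions. The function name and inputs are hypothetical. If each input probability lies in [0, 1], the output ranges from -1/4 to 13/8, so the "probability" it returns can fall outside [0, 1].

```python
from fractions import Fraction


def pr_fec(pr_f: Fraction, pr_e: Fraction, pr_c: Fraction) -> Fraction:
    """Evaluate the rule quoted on slide 27: 5/8 * [Pr(F) + Pr(E) + Pr(C)] - 1/4."""
    return Fraction(5, 8) * (pr_f + pr_e + pr_c) - Fraction(1, 4)


# Hypothetical inputs at the extremes of the valid [0, 1] range.
print(pr_fec(Fraction(1), Fraction(1), Fraction(1)))  # 13/8, which is above 1
print(pr_fec(Fraction(0), Fraction(0), Fraction(0)))  # -1/4, which is below 0
```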
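Slide 28 lists batch effects as a study design problem. The worst case is a design where processing batch lines up exactly with the biological groups being compared, so a batch effect and a group effect cannot be told apart.

Below is a minimal check for that case. It assumes a batch label (for example a run or processing date) and a group label are recorded for every sample. The function name and labels are hypothetical. The check only flags complete confounding, not partial imbalance.

```python
from collections import defaultdict


def batch_confounded(batches: list[str], groups: list[str]) -> bool:
    """Return True if there are 2+ groups and every batch holds samples from only one group.

    In that design, any batch effect is indistinguishable from the group effect.
    """
    if len(batches) != len(groups):
        raise ValueError("batches and groups must have the same length")
    groups_per_batch = defaultdict(set)
    for batch, group in zip(batches, groups):
        groups_per_batch[batch].add(group)
    return len(set(groups)) > 1 and all(len(g) == 1 for g in groups_per_batch.values())


# Hypothetical designs: each run processes one group vs. both groups.
confounded_design = batch_confounded(
    ["run1", "run1", "run2", "run2"],
    ["resistant", "resistant", "sensitive", "sensitive"],
)
balanced_design = batch_confounded(
    ["run1", "run1", "run2", "run2"],
    ["resistant", "sensitive", "resistant", "sensitive"],
)
print(confounded_design, balanced_design)  # True False
```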
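Slide 29 contrasts "Today: Pr(FEC) = 0.8" with "tomorrow: Pr(FEC) = 0.1". Here "locked down" is taken to mean that a rule's parameters are fixed before it is applied to new samples. Below is a minimal sketch of one way to make that checkable. It assumes a hypothetical linear score; the class name, weights and intercept are invented for illustration. A frozen dataclass blocks in-place edits, and a SHA-256 fingerprint of the parameters can be recorded up front so that any later change is detectable.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class LockedPredictor:
    """Hypothetical linear score; frozen=True makes attribute assignment raise an error."""

    weights: tuple[float, ...]
    intercept: float

    def fingerprint(self) -> str:
        """SHA-256 of the parameters, meant to be recorded before new data arrive."""
        payload = json.dumps({"weights": list(self.weights), "intercept": self.intercept})
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def score(self, features: tuple[float, ...]) -> float:
        """Intercept plus weighted sum of the features."""
        if len(features) != len(self.weights):
            raise ValueError("feature length does not match weights")
        return self.intercept + sum(w * x for w, x in zip(self.weights, features))


model = LockedPredictor(weights=(0.5, -0.25), intercept=0.1)
published = model.fingerprint()  # recorded when the rule is locked down

# A quietly re-tuned rule produces a different fingerprint, so the change is visible.
tweaked = LockedPredictor(weights=(0.5, -0.20), intercept=0.1)
print(published == model.fingerprint())    # True
print(published == tweaked.fingerprint())  # False
```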
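The association between shoe size and literacy on slide 48 is a standard illustration of confounding. The usual explanation is that age drives both; this is assumed here, since the slide text does not state it. The sketch below uses small, hand-built, hypothetical data in which shoe size and a reading score are strongly correlated overall. After each variable is regressed on age, the residuals are uncorrelated. Adjusting through residuals of a simple linear regression is one common approach, not necessarily the one used in the talk. All names are hypothetical.

```python
from statistics import fmean


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def residuals(y: list[float], z: list[float]) -> list[float]:
    """Residuals from a simple least-squares regression of y on z."""
    mz, my = fmean(z), fmean(y)
    slope = sum((a - mz) * (b - my) for a, b in zip(z, y)) / sum((a - mz) ** 2 for a in z)
    intercept = my - slope * mz
    return [b - (intercept + slope * a) for a, b in zip(z, y)]


# Hypothetical data: age drives both variables; the noise terms are built to be
# unrelated to age and to each other.
ages = [5.0] * 4 + [10.0] * 4 + [15.0] * 4
shoe_noise = [0.5, -0.5, 0.5, -0.5] * 3
read_noise = [1.0, 1.0, -1.0, -1.0] * 3
shoe_size = [0.5 * a + e for a, e in zip(ages, shoe_noise)]
reading = [10.0 * a + e for a, e in zip(ages, read_noise)]

print(round(pearson(shoe_size, reading), 3))  # strong marginal association
print(round(pearson(residuals(shoe_size, ages), residuals(reading, ages)), 3))  # about 0 after adjusting for age
```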