BMMB554: Talks, Papers, Grants

What is the reality of Academia?

Anton Nekrutenko

April 12, 2017

Transcript

  1. Talks | Papers | Grants
    Your future

  3. Talks
    ‣ Most talks are boring
    ‣ Most slides are horrible
    ‣ Talks are the best medium to get your
    point across
    ‣ https://youtu.be/WAwDvbIfkos
    ‣ http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.0030077

  4. Papers
    ‣ Authorship matters
    ‣ Make things clear
    ‣ Read more non-scientific literature
    ‣ http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004205

  5. The following slides are
    shamelessly stolen from
    Sergei Kosakovsky Pond

  6. Data visualization
    ❖ A picture is worth a thousand words
    ❖ A critical part of being a successful
    scientist
    ❖ Should be done at all stages of the
    scientific process
      ❖ Data exploration
      ❖ Data analysis
      ❖ Final presentation
    ❖ Do NOT be the guy/girl that makes
    plots like this for a paper or ends up
    at http://wtfviz.net

  7. Charles Joseph Minard

  8. Do NOT abuse 3D
    Roeder K (1994) DNA fingerprinting: A review of the
    controversy (with discussion). Statistical Science
    9:222-278, Figure 4
    http://www.biostat.wisc.edu/~kbroman/topten_worstgraphs/

  9. What not to do
    Other than not using Excel…

  10. The beautiful empty
    space
    Wittke-Thompson JK, Pluzhnikov A, Cox NJ (2005)
    Rational inferences about departures from Hardy-
    Weinberg equilibrium. American Journal of Human
    Genetics 76:967-986, Figure 1
    http://www.biostat.wisc.edu/~kbroman/topten_worstgraphs/

  11. What a beautiful
    straight line
    Epstein MP, Satten GA (2003) Inference on haplotype
    effects in case-control studies using unphased
    genotype data. American Journal of Human Genetics
    73:1316-1329, Figure 1
    http://www.biostat.wisc.edu/~kbroman/topten_worstgraphs/

  12. How many data
    points is that?
    Hummer BT, Li XL, Hassel BA (2001) Role for p53 in
    gene induction by double-stranded RNA. J Virol
    75:7774-7777, Figure 4
    http://www.biostat.wisc.edu/~kbroman/topten_worstgraphs/

  13. Pie charts are evil
    Cawley S, et al. (2004) Unbiased mapping of
    transcription factor binding sites along human
    chromosomes 21 and 22 points to widespread
    regulation of noncoding RNAs. Cell 116:499-509,
    Figure 1
    http://www.biostat.wisc.edu/~kbroman/topten_worstgraphs/

  14. Odd choice of graph
    type
    Kim OY, et al. (2012) Higher levels of serum
    triglyceride and dietary carbohydrate intake are
    associated with smaller LDL particle size in healthy
    Korean women. Nutrition Research and Practice
    6:120-125, Figure 1
    http://www.biostat.wisc.edu/~kbroman/topten_worstgraphs/

  15. So much wasted ink
    Jorgenson E, et al. (2005) Ethnicity and human
    genetic linkage maps. American Journal of Human
    Genetics 76:276-290, Figure 2
    http://www.biostat.wisc.edu/~kbroman/topten_worstgraphs/

  16. Kids' blocks?
    Cotter DJ, et al. (2004) Hematocrit was not validated
    as a surrogate endpoint for survival among epoetin-
    treated hemodialysis patients. Journal of Clinical
    Epidemiology 57:1086-1095, Figure 2
    http://www.biostat.wisc.edu/~kbroman/topten_worstgraphs/

  17. 3:1 is better than
    1:3?
    Broman KW, Murray JC, Sheffield VC, White RL,
    Weber JL (1998) Comprehensive human genetic
    maps: Individual and sex-specific variation in
    recombination. American Journal of Human Genetics
    63:861-869, Figure 1
    http://www.biostat.wisc.edu/~kbroman/topten_worstgraphs/

  19. Less is more
    Maximize data-to-ink ratio (Edward Tufte)

  20. Making a good line plot.
    ❖ Remove the unnecessary
    http://vis4.net/blog/posts/doing-the-line-charts-right/

  21. Judicious use of space for legends
    ❖ In-chart legends often waste space
    http://vis4.net/blog/posts/doing-the-line-charts-right/

  22. Choose the baseline well

  23. Aspect ratio
    ❖ “In his text Visualizing Data, William
    Cleveland demonstrates how the
    aspect ratio of a line chart can affect
    an analyst's perception of trends in
    the data. Cleveland proposes an
    optimization technique for
    computing the aspect ratio such
    that the average absolute
    orientation of line segments in the
    chart is equal to 45 degrees. This
    technique, called banking to 45
    degrees, is designed to maximize the
    discriminability of the orientations
    of the line segments in the chart.”
    http://vis.berkeley.edu/papers/banking/
    Two plots of monthly atmospheric carbon dioxide measurements, taken from 1959 to 1990. The first plot, with an aspect ratio of 1.17, reveals an accelerating increase in CO2 levels. The second plot, with an aspect ratio of 7.87, facilitates closer inspection of seasonal fluctuations, revealing a gradual attack followed by a steeper decay. These aspect ratios were automatically determined using multi-scale banking.
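
The banking idea quoted above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `bank_to_45` is hypothetical, the aspect ratio is taken to be width divided by height, and a plain bisection search stands in for Cleveland's optimization. It is not the multi-scale banking used for the CO2 plots.

```python
import math

def bank_to_45(xs, ys):
    """Return a width/height aspect ratio whose mean absolute segment
    orientation on screen is 45 degrees (hypothetical helper)."""
    rx = max(xs) - min(xs)  # data range on x
    ry = max(ys) - min(ys)  # data range on y
    if rx <= 0 or ry <= 0:
        raise ValueError("both axes need a non-zero data range")

    # Absolute slope of every line segment, in data units.
    points = list(zip(xs, ys))
    slopes = [abs((y1 - y0) / (x1 - x0))
              for (x0, y0), (x1, y1) in zip(points, points[1:])
              if x1 != x0]

    def mean_orientation(alpha):
        # On screen, a data slope s is drawn with slope s * rx / (alpha * ry).
        return sum(math.atan(s * rx / (alpha * ry)) for s in slopes) / len(slopes)

    # Wider charts flatten every segment, so the mean orientation falls
    # as alpha grows; bisect on a log scale until it reaches 45 degrees.
    lo, hi = 1e-6, 1e6
    for _ in range(200):
        mid = math.sqrt(lo * hi)
        if mean_orientation(mid) > math.pi / 4:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)

# A zigzag with unit slopes over an x range of 4 and a y range of 1
# should be drawn about four times wider than tall.
print(round(bank_to_45([0, 1, 2, 3, 4], [0, 1, 0, 1, 0]), 3))
```

The returned value could be passed to whatever plotting tool is in use to set the figure's width relative to its height.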

  24. Do not exceed resolution of visual acuity
    Krzywinski, M., Brol, I., Jones, S., & Marra, M. (2012). Getting into visualization of large biological data sets: 20 imperatives of information design. Poster presented at 2nd IEEE
    Symposium on Biological Data Visualization (BioVis 2012), Seattle, WA.
    ❖ Human eye acuity is ~50 cycles/degree, or about 1/200 inch
    (0.3 pt) at 10 inches
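
As a rough check on the acuity figure, the size of one resolvable cycle can be worked out from the viewing distance. The sketch below assumes 72 typographic points per inch and simple flat-page geometry; the function name `cycle_size_pt` is made up for this example.

```python
import math

POINTS_PER_INCH = 72  # typographic points per inch (assumed unit)

def cycle_size_pt(cycles_per_degree, distance_in):
    """Size in points of one resolvable cycle at a viewing distance in inches."""
    # Length on the page subtended by one degree of visual angle.
    inches_per_degree = distance_in * math.tan(math.radians(1.0))
    return inches_per_degree / cycles_per_degree * POINTS_PER_INCH

# 50 cycles/degree viewed from 10 inches
print(round(cycle_size_pt(50, 10), 2))  # 0.25
```

The result, about a quarter of a point, is in line with the slide's rounded figure of roughly 0.3 pt: detail finer than this is lost to the reader.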

  26. Show variation with statistics
    Krzywinski, M., Brol, I., Jones, S., & Marra, M. (2012). Getting into visualization of large biological data sets: 20 imperatives of information design. Poster presented at 2nd IEEE
    Symposium on Biological Data Visualization (BioVis 2012), Seattle, WA.
    Approaches to encoding min/avg/max values of downsampled data. In the top hi-low trace, the vertical
    bars are perceived as a separate layer and effectively show variance without obscuring trends in the
    average.
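
One way to prepare data for such a hi-low trace is to collapse each window of samples into its minimum, average, and maximum. The following is a minimal sketch, not the poster's method; the function name and the fixed-size windowing are assumptions made for illustration.

```python
def downsample_min_avg_max(values, window):
    """Collapse each run of `window` values into a (min, mean, max) triple."""
    if window < 1:
        raise ValueError("window must be at least 1")
    summary = []
    for start in range(0, len(values), window):
        chunk = values[start:start + window]  # the last chunk may be shorter
        summary.append((min(chunk), sum(chunk) / len(chunk), max(chunk)))
    return summary

# Made-up signal, summarized four samples at a time.
signal = [3, 7, 5, 1, 9, 4, 6, 2]
for low, avg, high in downsample_min_avg_max(signal, 4):
    print(low, avg, high)
```

The averages would be drawn as the trend line and each (min, max) pair as a vertical bar behind it.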

  27. Use non-linear scales when needed
    Krzywinski, M., Brol, I., Jones, S., & Marra, M. (2012). Getting into visualization of large biological data sets: 20 imperatives of information design. Poster presented at 2nd IEEE
    Symposium on Biological Data Visualization (BioVis 2012), Seattle, WA.
    When drawing the position and size of densely packed genes, encode the gene’s size using a non-linear
    mapping. When the number of data values is large, such as in the OMIM gene track, hollow glyphs are
    effective. For even greater number of points, a density map is preferred.
    Figure: chr 1 (axis ticks at 50, 100, 150, 200 Mb) with CANCER CENSUS, SNP, and OMIM gene tracks; gene size (kb) legend: <10, 10-30, 30-50, 50-100, 100-200, >200.
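
The size legend in this figure amounts to a non-linear mapping from gene size to a small set of glyph classes. A minimal sketch of that mapping follows; the bin edges come from the legend, while the function name and the choice to put boundary values in the upper class are assumptions.

```python
from bisect import bisect_right

# Bin edges (kb) and labels taken from the figure's size legend.
EDGES_KB = [10, 30, 50, 100, 200]
LABELS = ["<10", "10-30", "30-50", "50-100", "100-200", ">200"]

def size_class(size_kb):
    """Map a gene size in kb to its legend class.

    Assumption: a size that sits exactly on an edge goes to the upper class.
    """
    return LABELS[bisect_right(EDGES_KB, size_kb)]

# Hypothetical gene sizes, for illustration only.
for size in (5, 42, 75, 250):
    print(size, "kb ->", size_class(size))
```

Each class would then be drawn at a fixed glyph width, so very large genes do not crowd out small ones.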

  28. Aggregate data
    Krzywinski, M., Brol, I., Jones, S., & Marra, M. (2012). Getting into visualization of large biological data sets: 20 imperatives of information design. Poster presented at 2nd IEEE
    Symposium on Biological Data Visualization (BioVis 2012), Seattle, WA.
    What is communicated? (A) The raw data imparts no clear message. (B) Binning indicates ranges, not individual values, are important. (C) Frequency distribution suggests that there is a shortage of medium-sized values. (D) Individual data points can be removed to emphasize trend and significance.
    Figure: panel A lists the raw values 12 54 82 29 25 22 67 61 23 79; panel B groups them into the bins 0-30, 31-60, and 61-100; panels C and D use an axis marked at 30 and 60.
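
The binning step on this slide is easy to reproduce with the ten raw values and the three ranges shown in the figure. This is a small sketch; the function name `bin_counts` is hypothetical, and the ranges are treated as inclusive on both ends.

```python
RAW = [12, 54, 82, 29, 25, 22, 67, 61, 23, 79]  # raw values from the slide
BINS = [(0, 30), (31, 60), (61, 100)]           # ranges from the slide

def bin_counts(values, bins):
    """Count how many values fall in each inclusive (low, high) range."""
    counts = {f"{low}-{high}": 0 for low, high in bins}
    for value in values:
        for low, high in bins:
            if low <= value <= high:
                counts[f"{low}-{high}"] += 1
                break
    return counts

print(bin_counts(RAW, BINS))  # {'0-30': 5, '31-60': 1, '61-100': 4}
```

The single value in the middle bin is the shortage of medium-sized values that the raw list hides.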
