Presenting Effectively with Data (in a Hurry)

Short talk given in various forms as part of CRSP 413 over the past few years. My goal here is to show and tell you a little about presenting effectively with data, and setting up your research life so that presenting effectively becomes easier.


Thomas E. Love

August 26, 2020


  1. Presenting Effectively with Data, when you’re in a hurry (… You’re always in a hurry)

    CRSP 413: Communication in Clinical Research Seminar 2020-08-26 Thomas E. Love, Ph.D.
  2. Presenting Research

    • Usually, this is highly abridged
      – Slide shows
      – Abstracts
      – Journal articles
      – Books
      – Websites
    • Announce the findings and try to convince us that the results are correct.

    Christopher Gandrud’s ideas, mostly – his book is Reproducible Research with R and RStudio
  4. You have Ten Minutes?

    • No time for subtlety.
    • Round, a lot.
    • Edit, ruthlessly.
      – One pass through software (“default options”) is never enough.
      – Better for people to leave the table hungry than stuffed.
    • Have something to say, and say it clearly.
    • Some possibilities are never a good choice.

    I am deeply in Andrew Gelman’s debt – see
  5. All graphs are comparisons. All of statistics are comparisons.

    I am deeply in Andrew Gelman’s debt – see
  6. First Law of Statistics: DTDP

    • Draw
    • The
    • D@$%
    • Picture

    A picture is worth a lot of numbers...
  7. Karl Broman, “Creating Effective Figures and Tables” at

  8. What’s wrong with this picture?

  11. Which of these three bar graphs describes the same data as pie graph A?
  12. Stay away from the pie



  16. Clearly Communicating Quantitative Information

    • Are the most important elements or relationships visually most prominent?
    • Are the elements, symbol shapes and colors consistent with their use in previous graphs?
    • Are all of the graphical elements necessary to convey the relationships?
    • Are the graphical elements accurately positioned and scaled?
  17. Don’t clutter each plot…

  18. Small multiples

  20. What are you trying to do?

    • Is this information visualization (grabby, visually striking – dramatize the problem to draw the casual viewer in deeper)?
    • Or statistical graphics (reveal patterns and discrepancies for viewers who are already interested in the problem)?
    • Make tradeoffs carefully – meaningful choices.
  21. From Karl Broman… Karl Broman, “Creating Effective Figures and Tables”

  22. Don’t sort alphabetically Karl Broman, “Creating Effective Figures and Tables”
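
The advice not to sort alphabetically can be illustrated with a minimal Python sketch. It is not taken from Broman's slides: the category names and counts are invented for illustration, and `ordered_bars` is a hypothetical helper. It orders categories by value, so the comparison the reader cares about is the one the eye lands on first.

```python
# Hypothetical counts, invented purely for illustration.
counts = {"Asthma": 12, "COPD": 31, "Diabetes": 45, "Hypertension": 27}


def ordered_bars(data, width=30):
    """Return text bars sorted from largest to smallest value."""
    largest = max(data.values())
    label_width = max(len(label) for label in data)
    lines = []
    # Sort by the value, not by the label, so the ranking is visible.
    for label, value in sorted(data.items(), key=lambda kv: kv[1], reverse=True):
        bar = "#" * round(width * value / largest)
        lines.append(f"{label:<{label_width}} {bar} {value}")
    return lines


for line in ordered_bars(counts):
    print(line)
```

An alphabetical listing would put the smallest group (Asthma) first here, which is an accident of spelling rather than a feature of the data.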

  23. Karl Broman, “Creating Effective Figures and Tables” at

  28. Reproducible Research?

    • The goal of reproducible research is to tie specific instructions to data analysis so that scholarship can be recreated, better understood and verified.
    • This is usually facilitated by literate programming – a document that combines content and data analytic code.
    • Software? R and RStudio, mostly…
  30. Goals of Reproducible Analysis

    • Be able to reproduce your own results
    • Allow others to reproduce your results
    • Reproduce an entire report, manuscript, thesis, book, website with a single system command when changes occur in:
      – Operating system, stat software, graphics engines, source data, derived variables, analysis, interpretation
    • Save time
    • Provide the ultimate documentation of work done for a paper, etc.

    /HarrellScottTutorial-useR2012.pdf
  31. Five Practical Tips for Reproducible Research

    1. Document everything
    2. Everything is a (text) file
    3. All files should be human-readable
    4. Explicitly tie your files together
    5. Have a plan to organize, store, and make your files available.
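
As a rough illustration of tips 2 through 4, here is a minimal Python sketch in which a single function call rebuilds a plain-text report from a plain-text data file. The talk itself points to R and RStudio, so this is a swapped-in stand-in and not the workflow shown on the slides. The file layout (`data/raw.csv`, `report.md`), the `glucose` column, the numbers, and the `build` function are all assumptions made up for this example.

```python
import csv
import statistics
import tempfile
from pathlib import Path


def build(project: Path) -> Path:
    """Rebuild the report from the raw data in one call."""
    raw = project / "data" / "raw.csv"   # source data, plain text (assumed layout)
    report = project / "report.md"       # output, plain text (assumed layout)
    with raw.open(newline="") as fh:
        values = [float(row["glucose"]) for row in csv.DictReader(fh)]
    mean = round(statistics.mean(values), 1)
    # The report names its own source file, tying the two files together.
    report.write_text(
        "# Glucose summary\n\n"
        f"Source: {raw.relative_to(project).as_posix()}\n\n"
        f"n = {len(values)}, mean = {mean}\n"
    )
    return report


# Demonstration with invented numbers in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp)
    (project / "data").mkdir()
    (project / "data" / "raw.csv").write_text("glucose\n95\n110\n102\n")
    report_text = build(project).read_text()

print(report_text)
```

If the source data change, rerunning `build` regenerates the report with no hand-editing, which is the point of tying the files together explicitly.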
  32. Why we do this…

  33. But other people will use my data and code to compete with me?

    • True.
    • But competition means that strangers will read your papers, try to learn from them, cite them, and try to do even better.
    • If you prefer obscurity, why are you publishing?

    Donoho DL 2010
  34. A book about how to be a scientist the modern, open-source way.
  35. FiveThirtyEight (forecast 2020-08-25)

  38. FiveThirtyEight (forecast 2020-08-25)

  41. Chatfield’s Six Rules for Data Analysis

    1. Do not attempt to analyze the data until you understand what is being measured and why.
    2. Find out how the data were collected.
    3. Look at the structure of the data.
    4. Carefully examine the data in an exploratory way, before attempting a more sophisticated analysis.
    5. Use your common sense at all times.
    6. Report the results in a clear, self-explanatory way.

    From Problem Solving: A Statistician’s Guide by Chris Chatfield, 2nd Edition, Chapman & Hall.

  43. Howard Wainer, Visual Revelations

    Diagram of all of the places where the planes were damaged the most
  44. be-wrong/ww2-survivorship-bias-560x333/

  45. You have Ten Minutes?

    • No time for subtlety.
    • Round, a lot.
    • Edit, ruthlessly.
      – One pass through software (“default options”) is never enough.
      – Better for people to leave the table hungry than stuffed.
    • Have something to say, and say it clearly.
    • Stay away from the pie.
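
One way to act on "Round, a lot" is to cut every reported number to two significant figures before it reaches a slide. The sketch below is only an illustration: `round_sig` is a hypothetical helper, the sample values are invented, and two significant figures is an assumed default, not a rule from the talk.

```python
from math import floor, log10


def round_sig(x, sig=2):
    """Round x to `sig` significant figures."""
    if x == 0:
        return 0.0
    # Position of the leading digit decides how many decimals to keep.
    digits = sig - 1 - floor(log10(abs(x)))
    return round(x, digits)


# Invented values standing in for raw software output.
raw = [0.0123456, 3.14159, 1234.567]
print([round_sig(v) for v in raw])
```

Software defaults tend to print every digit they have; a helper like this makes the ruthless edit a single step.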
  46. Statistics is too important to be left to statisticians. See also: Karl Broman’s “Creating Effective Figures and Tables” (slides) at
  47. On being “approximately right rather than exactly wrong” John Tukey

  48. Source: Hermann Brenner, "Long-term survival rates of cancer patients achieved by the end of the 20th century: a period analysis," The Lancet, 360 (October 12, 2002), 1131-1135.

  50. Slopegraphs!


  52. In addition to slopegraphs, consider sparklines: intense, simple, word-sized graphics

    The most common data display is a noun accompanied by a number. For example, a medical patient's current level of glucose is reported in a clinical record as a word and number:
  53. sparklines: intense, simple, word-sized graphics

    Placed in the relevant context, a single number gains meaning. Thus, the most recent measurement of glucose should be compared with earlier measurements for the patient. This data-line shows the path of the last 80 readings of glucose:
  54. sparklines: intense, simple, word-sized graphics

    Lacking a scale of measurement, this free-floating line is de-quantified. At least we do know the value of the line’s right-most data point, which corresponds to the most recent value of glucose, the number recorded at far right. Both representations of the most recent reading are tied together with a color accent:
  55. sparklines: intense, simple, word-sized graphics

    Some useful context is provided by showing the normal range of glucose, here as a gray band. Compared to normal limits, readings above the band horizon are elevated, those below reduced:
  56. sparklines: intense, simple, word-sized graphics

    For clinical analysis, the task is to detect quickly and assess wayward deviations from normal limits, shown here by visual deviations outside the gray band. Multiplying this format brings in additional data from the medical record; a stack, which can show hundreds of variables and thousands of measurements, allows fast effective parallel comparisons:
  57. sparklines: intense, simple, word-sized graphics

    These little data lines, because of their active quality over time, are named sparklines – small, high-resolution graphics usually embedded in a full context of words, numbers, images. Sparklines are datawords: data-intense, design-simple, word-sized graphics.
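
The sparkline idea described on these slides can be roughed out in plain text. The Python sketch below is only an approximation: Unicode block characters stand in for a drawn line, the readings are invented, and the 70 to 110 band is a placeholder for the gray "normal range" band, not a clinical reference range. `sparkline` and `out_of_range` are hypothetical helper names.

```python
BLOCKS = "▁▂▃▄▅▆▇█"  # eight heights, lowest to highest


def sparkline(values):
    """Map each value to a block character scaled between min and max."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # avoid dividing by zero for a flat series
    top = len(BLOCKS) - 1
    return "".join(BLOCKS[int((v - lo) / span * top)] for v in values)


def out_of_range(values, low, high):
    """Return positions of readings outside the band [low, high]."""
    return [i for i, v in enumerate(values) if v < low or v > high]


# Invented readings; the last one is the "current" value.
readings = [92, 98, 105, 131, 140, 118, 101, 88, 67, 95]
line = sparkline(readings)
flagged = out_of_range(readings, 70, 110)

# Noun, word-sized graphic, and most recent number on one line.
print(f"glucose {line} {readings[-1]}")
print("readings outside the assumed band at positions:", flagged)
```

Printing the noun, the line, and the latest number together mirrors the layout described above, where the most recent reading anchors an otherwise unscaled line.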