DST4L 2015 - Ch1

James Davenport

February 06, 2015

Transcript

  1. What do I want to cover today? • About me, my blog, my varied interests • The two methods for coming up with blog posts • Visualization & its role in asking questions • What is the Quantified Self? • Hands-on examples of my analysis • Discussion: the role of a library for the modern data-driven researcher?
  2. 6

  3. 7

  4. Kepler: Stellar Flare Machine • Long continuous light curves (up to ~4 years) • Very precise photometry (~0.01%) • Enormous sample (>100,000 solar-type stars) • Look for outliers (super-flares) • Complete samples!
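One simple way to "look for outliers" in a light curve is to flag points that sit far above the median, measured in units of a robust scatter estimate. The sketch below is an illustration under stated assumptions, not the method used in the talk: the function name `flag_flares`, the threshold of 5, and the toy flux values are all hypothetical.

```python
from statistics import median

def flag_flares(flux, threshold=5.0):
    """Return indices of points more than `threshold` robust sigmas above the median."""
    med = median(flux)
    # Median absolute deviation (MAD), scaled to approximate a Gaussian sigma.
    mad = median(abs(f - med) for f in flux)
    sigma = 1.4826 * mad
    if sigma == 0:
        return []  # perfectly flat curve: nothing to flag
    # Only positive excursions count, since flares are brightenings.
    return [i for i, f in enumerate(flux) if (f - med) / sigma > threshold]

# Toy light curve (made-up values): relative flux near 1.0, one brightening at index 6.
flux = [1.0000, 1.0001, 0.9999, 1.0002, 0.9998,
        1.0001, 1.0100, 1.0000, 0.9999, 1.0001]
flares = flag_flares(flux)
print(flares)
```

Using the median and MAD rather than the mean and standard deviation keeps the flare itself from inflating the scatter estimate.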
  5. 2 methods of idea generation • 1: have Q, look for data to answer it • more “scientific” • 2: have interesting data, look for a Q to ask or story to tell using it • I find this more common in practice now
  6. Q: What are the most common first names for lawyers vs scientists? Tough to get data on this! (have to steal it)
  7. Another source of first names: The Social Security Administration’s “Baby Names” dataset • Every first name (and gender) for SSA applicants, 1890s - Present
  8. Ingredients… • a UW Seattle campus map • locations of all coffee shops • average human walking speed
  9. Coffee: Always within a 2 minute walk [Map labels: 1 min, 2 min]
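The three ingredients combine into a walk-time estimate: the distance to the nearest coffee shop divided by walking speed. The sketch below assumes straight-line walking on a flat local grid measured in meters, a walking speed of 1.4 m/s (a commonly quoted figure, not taken from the talk), and made-up shop positions; `walk_minutes_to_nearest` is a hypothetical name.

```python
import math

WALKING_SPEED_M_PER_S = 1.4  # assumed typical walking speed, not from the talk

def walk_minutes_to_nearest(point, shops, speed=WALKING_SPEED_M_PER_S):
    """Minutes to walk in a straight line from `point` to the closest shop."""
    nearest_m = min(math.dist(point, shop) for shop in shops)
    return nearest_m / speed / 60.0

# Hypothetical shop positions in meters on a local x/y grid.
shops = [(0.0, 0.0), (300.0, 400.0)]
minutes = walk_minutes_to_nearest((84.0, 0.0), shops)
print(f"{minutes:.1f} min to the nearest shop")
```

Evaluating this function over every grid cell of a campus map would give the kind of walk-time contours the slide describes; real paths around buildings would take longer than the straight-line estimate.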
  10. 80% of the US lives within 20 miles of a Starbucks ifweassume.com @jradavenport
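A "within 20 miles" claim rests on the great-circle distance between a location and the nearest store. The sketch below uses the haversine formula on a spherical Earth; it is an illustration, not the analysis behind the slide. The function names, the Earth radius constant, and the approximate coordinates are assumptions chosen for the example.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; spherical approximation

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))

def within_miles(point, stores, limit=20.0):
    """True if any store lies within `limit` miles of `point`."""
    return any(haversine_miles(*point, *store) <= limit for store in stores)

# Hypothetical store locations (lat, lon), roughly Seattle and Portland.
stores = [(47.61, -122.33), (45.52, -122.68)]
near = within_miles((47.66, -122.31), stores)  # a few miles from the first store
far = within_miles((46.60, -120.50), stores)   # roughly 100 miles from either
print(near, far)
```

Turning this into a population percentage would additionally require weighting each location by how many people live there.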
  11. [Map labels: Bermuda, Cuba, Haiti, Dominican Republic, Puerto Rico, Jamaica, Cayman Islands, Bahamas, Cancún, Sable Island Nat’l Park]
  12. People engage with dynamic range & detail • Encourage the viewer to discover for themselves
  13. Sometimes the best question I can ask is: “What does it look like?” [Diagram labels: data, questions, Data Science]
  14. Often I find a data source so interesting that I look hard for a “story”
  15. Problems with reading CSV files (comma separated values) • Easy to read:

        a, b, c, d
        e, f, g, h
        1, 2, , 4
        5, 6, 7, 8
        red, blue, green, black
        up, down, left, right
        a, b, b, a
  16. Problems with reading CSV files (comma separated values) • Hard to read:

        a, b, c, d
        e, f, g, h
        1, 2, "Bob, or Mary", 4
        5, 6, 7, 8
        red, blue, green, black
        up, down, left, right
        a, b, b, a
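The "hard to read" row is hard because a comma inside a quoted field looks like a field separator to naive parsing. A minimal sketch, using rows from the slides with straight quotes assumed around the quoted field, compares a plain `split` with Python's standard `csv` module:

```python
import csv
import io

# Rows from the slides; the straight quotes around the third field are an assumption.
raw = '''a, b, c, d
1, 2, , 4
1, 2, "Bob, or Mary", 4
'''

# Naive approach: splitting on every comma breaks the quoted field in two.
naive = [line.split(",") for line in raw.splitlines()]

# csv.reader honors the quotes. skipinitialspace=True drops the blank after
# each comma, which is what lets the opening quote be recognized as a quote.
rows = list(csv.reader(io.StringIO(raw), skipinitialspace=True))

print(len(naive[2]), "fields from split,", len(rows[2]), "fields from csv.reader")
print(rows[2][2])        # the quoted field, kept whole
print(repr(rows[1][2]))  # the missing value arrives as an empty string
```

Note that the missing value in `1, 2, , 4` is read as an empty string, so deciding how to treat blanks (zero, skip, or "unknown") is still up to the analyst.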
  18. The Dimensions of Art • 65,000 pieces of art from the Tate Modern [Plot axes: Width, Height]
  19. Top 10 Best-Seller Book Covers Over Time • data from USA Today [Figure axis: years 2000 to 2011]
  20. [Figure: year labels, 2000 to 2011]
  21. Cool visualizations, but now I have more questions than when I started! e.g. genre, average color vs rank, average color over time…
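A follow-up question such as "average color over time" reduces to averaging the RGB pixel values of each cover, then averaging those per year. The sketch below is one hedged way to set that up, not the author's analysis: it uses plain tuples in place of real images, the pixel values are made up, and the function names are hypothetical. In practice the pixels would come from an image library.

```python
from collections import defaultdict

def average_color(pixels):
    """Mean (R, G, B) of a list of (R, G, B) tuples."""
    n = len(pixels)
    return tuple(sum(p[c] for p in pixels) / n for c in range(3))

def average_color_by_year(covers):
    """covers: iterable of (year, pixels) pairs -> {year: mean (R, G, B)}.

    Each cover counts equally within its year, regardless of image size.
    """
    by_year = defaultdict(list)
    for year, pixels in covers:
        by_year[year].append(average_color(pixels))
    return {year: average_color(colors) for year, colors in sorted(by_year.items())}

# Made-up two-pixel "covers" for illustration only.
covers = [
    (2000, [(255, 0, 0), (0, 0, 0)]),
    (2000, [(0, 0, 255), (0, 0, 0)]),
    (2001, [(10, 20, 30)]),
]
result = average_color_by_year(covers)
print(result)
```

Averaging per cover first is a design choice: it stops one high-resolution image from dominating its year.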