Hacking Data Visualisations

A look at data visualisations through the ages, and an introduction to R

Melinda Seckington

September 19, 2014

Transcript

  1. “I feel that everyday, all of us now are being blasted by information design. It's being poured into our eyes through the Web, and we're all visualizers now; we're all demanding a visual aspect to our information. There's something almost quite magical about visual information. It's effortless, it literally pours in. And if you're navigating a dense information jungle, coming across a beautiful graphic or a lovely data visualization, it's a relief, it's like coming across a clearing in the jungle.”
    DAVID MCCANDLESS - THE BEAUTY OF DATA VISUALIZATION
    @mseckington
  2. A BRIEF HISTORY OF DATA VISUALISATION
    Theatrum Orbis Terrarum, May 20, 1570
    The first modern atlas, collected by Abraham Ortelius.
    This was a first attempt to gather all maps that were known to man at the time and bind them together.
  3. Bills of Mortality
    From 1603, London parish clerks collected health-related population data in order to monitor plague deaths, publishing the London Bills of Mortality on a weekly basis.
    John Graunt amalgamated 50 years of information from the bills, producing the first known tables of public health data.
    BEAUTIFUL SCIENCE AT THE BRITISH LIBRARY - THE GUARDIAN
  4. 1644: First known graph of statistical data
    MICHAEL VAN LANGREN - ESTIMATES OF DISTANCE IN LONGITUDE BETWEEN TOLEDO AND ROME
  5. 1786: First bar chart
    William Playfair
    Exports and imports of Scotland to and from different parts for one Year from Christmas 1780 to Christmas 1781
  6. Street map of cholera deaths in Soho
    1853, John Snow
    Snow's 'ghost map' shows deaths from cholera around Broad Street between 19 August and 30 September 1854. Snow simplified the street layout, highlighting the 13 water pumps serving the area and representing each death as a black bar. His map demonstrates how cholera was spreading, not by a 'miasma' rising from the Thames, but in water contaminated by human waste.
    BEAUTIFUL SCIENCE AT THE BRITISH LIBRARY - THE GUARDIAN
  7. Diagram of the Causes of Mortality in the Army in the East
    1858, Florence Nightingale
    In her seminal ‘rose diagram’, Nightingale demonstrated that far more soldiers died from preventable epidemic diseases (blue) than from wounds inflicted on the battlefield (red) or other causes (black) during the Crimean War (1853-56).
    BEAUTIFUL SCIENCE AT THE BRITISH LIBRARY - THE GUARDIAN
  8. A QUICK INTRO TO R
    What is R?
    R is a free programming language and environment for statistical computing and graphics.
  9. What is R?
    R is a free programming language and environment for statistical computing and graphics.
    Created by statisticians for statisticians.
  10. What is R?
    R is a free programming language and environment for statistical computing and graphics.
    Created by statisticians for statisticians.
    Comes with a lot of facilities for data manipulation, calculation, data analysis and graphical display.
  11. What is R?
    R is a free programming language and environment for statistical computing and graphics.
    Created by statisticians for statisticians.
    Comes with a lot of facilities for data manipulation, calculation, data analysis and graphical display.
    Highly and easily extensible.
  12. > data()
    # lists all available datasets
    > data(movies)
    # loads the movies data into a variable named movies
    # (data() returns only the dataset's name, so its result is not assigned)
  13. > data()
    # lists all available datasets
    > data(movies)
    # loads the movies data into a variable named movies
    # (data() returns only the dataset's name, so its result is not assigned)
    > dim(movies)
    [1] 58788    24
  14. > data()
    # lists all available datasets
    > data(movies)
    # loads the movies data into a variable named movies
    # (data() returns only the dataset's name, so its result is not assigned)
    > dim(movies)
    [1] 58788    24
    > names(movies)
     [1] "title"       "year"        "length"      "budget"      "rating"      "votes"
     [7] "r1"          "r2"          "r3"          "r4"          "r5"          "r6"
    [13] "r7"          "r8"          "r9"          "r10"         "mpaa"        "Action"
    [19] "Animation"   "Comedy"      "Drama"       "Documentary" "Romance"     "Short"
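The loading and inspection steps above can be tried on any bundled dataset. The following is a minimal sketch that assumes the built-in `mtcars` dataset as a stand-in, because in current ggplot2 releases the `movies` data has moved to the separate ggplot2movies package. The variable name `loaded` is only illustrative.

```r
# data() loads a dataset into the workspace under its own name.
# Its return value is just the dataset's name, not the data itself.
loaded <- data(mtcars)

print(loaded)         # the name of the dataset, as a character string
print(dim(mtcars))    # number of rows, then number of columns
print(names(mtcars))  # the column labels
```

This is why assigning the result of `data()` to a variable of the same name would replace the data frame with a plain string.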
  15. > movies[7079,]
                        title year length   budget rating votes
    7079 Bourne Identity, The 2002    119 75000000    7.3 29871
          r1  r2  r3  r4  r5   r6   r7   r8   r9 r10  mpaa
         4.5 4.5 4.5 4.5 4.5 14.5 24.5 34.5 14.5 4.5 PG-13
         Action Animation Comedy Drama Documentary Romance Short
              1         0      0     1           0       0     0
    # returns 1 row => all the data for 1 movie
  16. > movies[7079,]
                        title year length   budget rating votes
    7079 Bourne Identity, The 2002    119 75000000    7.3 29871
          r1  r2  r3  r4  r5   r6   r7   r8   r9 r10  mpaa
         4.5 4.5 4.5 4.5 4.5 14.5 24.5 34.5 14.5 4.5 PG-13
         Action Animation Comedy Drama Documentary Romance Short
              1         0      0     1           0       0     0
    # returns 1 row => all the data for 1 movie
    > movies[1:10,]
    . . .
    # returns rows 1 to 10
  17. > movies[,1]
    . . .
    # returns 1 column => titles of all movies
  18. > movies[,1]
    . . .
    # returns 1 column => titles of all movies
    > movies$title
    . . .
    # same as movies[,1]: returns the column with the label 'title'
  19. > movies[,1]
    . . .
    # returns 1 column => titles of all movies
    > movies$title
    . . .
    # same as movies[,1]: returns the column with the label 'title'
    > movies[,1:10]
    . . .
    # returns columns 1 to 10
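The row and column selections above follow one pattern, `df[rows, columns]`, with either side left empty to mean "all". A minimal sketch, assuming a small hypothetical data frame named `films` in place of the full movies data:

```r
# A small hypothetical data frame standing in for the movies dataset.
films <- data.frame(
  title  = c("Alpha", "Beta", "Gamma"),
  year   = c(1999, 2002, 2010),
  rating = c(6.1, 7.3, 8.0),
  stringsAsFactors = FALSE
)

one_row    <- films[2, ]    # row 2, all columns (still a data frame)
first_rows <- films[1:2, ]  # rows 1 to 2
by_index   <- films[, 1]    # column 1, returned as a plain vector
by_name    <- films$title   # the same column, selected by its label
two_cols   <- films[, 1:2]  # columns 1 to 2 (a data frame again)

print(one_row)
print(by_name)
```

Selecting by label with `$` tends to be more robust than selecting by position, since it keeps working if the column order changes.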
  20. > hist(movies$year)
    [Plot: "Histogram of movies$year"; x-axis: movies$year, y-axis: Frequency]
  21. > hist(movies$year)
    > hist(movies$rating)
    [Plot: "Histogram of movies$rating"; x-axis: movies$rating, y-axis: Frequency]
  22. > hist(movies$year)
    > hist(movies$rating)
    > library(ggplot2)
    > qplot(rating, data=movies, geom="histogram")
  23. > hist(movies$year)
    > hist(movies$rating)
    > library(ggplot2)
    > qplot(rating, data=movies, geom="histogram")
    > qplot(rating, data=movies, geom="histogram", binwidth=1)
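To see what a bin width does to a histogram, it can help to look at the counts rather than the picture. The sketch below uses base R's `hist()` with `plot = FALSE` as a stand-in for the ggplot2 call, and a hypothetical `ratings` vector. Note that ggplot2 may place its bin boundaries differently, so this illustrates the idea and does not reproduce `qplot()` output.

```r
# Hypothetical ratings on a 0-10 scale.
ratings <- c(2.5, 3.1, 5.0, 5.5, 5.9, 6.2, 6.8, 7.3, 7.4, 9.0)

# Bins of width 1 covering 0 to 10; plot = FALSE returns counts without drawing.
narrow <- hist(ratings, breaks = seq(0, 10, by = 1), plot = FALSE)

# Bins of width 5: fewer, wider bars over the same data.
coarse <- hist(ratings, breaks = seq(0, 10, by = 5), plot = FALSE)

print(narrow$counts)
print(coarse$counts)
```

Both calls count the same ten values. Only the grouping changes, which is why a poorly chosen bin width can hide or exaggerate the shape of a distribution.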
  24. > m = ggplot(movies, aes(rating))
    > m + geom_histogram()
  25. > m = ggplot(movies, aes(rating))
    > m + geom_histogram()
    > m + geom_histogram(aes(fill = ..count..))
  26. > m = ggplot(movies, aes(rating))
    > m + geom_histogram()
    > m + geom_histogram(aes(fill = ..count..))
    > m + geom_histogram(colour = "darkgreen", fill = "white", binwidth = 0.5)
  27. > m = ggplot(movies, aes(rating))
    > m + geom_histogram()
    > m + geom_histogram(aes(fill = ..count..))
    > m + geom_histogram(colour = "darkgreen", fill = "white", binwidth = 0.5)
    > x = m + geom_histogram(binwidth = 0.5)
    > x + facet_grid(Action ~ Comedy)
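The layered style above builds a plot object first and adds geoms and facets to it afterwards. A minimal sketch, assuming ggplot2 is installed and using a small hypothetical `films` data frame whose `Action` and `Comedy` columns are 0/1 flags like those in the movies data:

```r
library(ggplot2)

# Hypothetical data standing in for the movies dataset.
films <- data.frame(
  rating = c(5.2, 6.1, 7.4, 3.8, 8.0, 6.6, 4.9, 7.1),
  Action = c(0, 1, 0, 1, 0, 1, 0, 1),
  Comedy = c(0, 0, 1, 1, 0, 0, 1, 1)
)

# Base plot: maps the rating column to the x-axis, draws nothing yet.
m <- ggplot(films, aes(rating))

# Add a histogram layer, then split it into a grid of panels:
# one row per Action value, one column per Comedy value.
p <- m + geom_histogram(binwidth = 0.5) + facet_grid(Action ~ Comedy)

# print(p) would draw the plot; it is left out so the sketch runs without a display.
```

Because `m` holds only the data and the mapping, the same base object can be reused with different layers, as the slides do.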
  28. > library(twitteR)
    > setup_twitter_oauth("API key", "API secret", "Access token", "Access secret")
  29. > fl = read.csv("futurelearn_dataset.csv", header=TRUE)
    > source_table = table(fl$age)
    > pie(source_table)
  30. > fl = read.csv("futurelearn_dataset.csv", header=TRUE)
    > source_table = table(fl$age)
    > pie(source_table)
    > pie(source_table, radius=0.6, col=rainbow(8))
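The pie chart steps above rely on `table()` to turn a column of raw values into counts per category. A minimal sketch, assuming a hypothetical `age` vector in place of the `fl$age` column, since the CSV file itself is not available here:

```r
# Hypothetical age bands standing in for the fl$age column.
age <- c("18-25", "26-35", "26-35", "36-45", "18-25", "26-35")

# table() counts how often each distinct value occurs.
age_table <- table(age)
print(age_table)

# Draw to a null PDF device so the sketch also runs in non-interactive sessions;
# drop the pdf(NULL) and dev.off() lines to see the chart on screen.
pdf(NULL)
pie(age_table, radius = 0.6, col = rainbow(length(age_table)))
invisible(dev.off())
```

Using `length(age_table)` for the number of colours, rather than a fixed number, keeps the palette in step with however many categories the data contains.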
  31. > library(twitteR)
    > setup_twitter_oauth("API key", "API secret", "Access token", "Access secret")
    > tweets <- searchTwitter('futurelearn', n=100)
  32. > library(twitteR)
    > setup_twitter_oauth("API key", "API secret", "Access token", "Access secret")
    > tweets <- searchTwitter('futurelearn', n=100)
    > library("tm")
    > tweet_text <- sapply(tweets, function(x) x$getText())
    > tweet_corpus <- Corpus(VectorSource(tweet_text))
  33. > library(twitteR)
    > setup_twitter_oauth("API key", "API secret", "Access token", "Access secret")
    > tweets <- searchTwitter('futurelearn', n=100)
    > library("tm")
    > tweet_text <- sapply(tweets, function(x) x$getText())
    > tweet_corpus <- Corpus(VectorSource(tweet_text))
    > tweet_corpus <- tm_map(tweet_corpus, content_transformer(tolower))
    > tweet_corpus <- tm_map(tweet_corpus, removePunctuation)
    > tweet_corpus <- tm_map(tweet_corpus, function(x) removeWords(x, stopwords()))
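The three cleaning steps above (lower-casing, removing punctuation, removing stop words) can be approximated in base R to see what each one does to the text. This is only a sketch of the idea, not the tm implementation: the sample tweets, the short `stop_words` list and the helper `clean_text()` are all hypothetical, and the real `stopwords()` list in tm is much longer.

```r
# Hypothetical sample tweets; real input would come from searchTwitter().
tweet_text <- c(
  "Loving the new FutureLearn course!",
  "Is this the best MOOC? Yes, it is."
)

# A tiny illustrative stop-word list.
stop_words <- c("the", "is", "this", "it")

clean_text <- function(text, stop_words) {
  text  <- tolower(text)                    # step 1: lower-case everything
  text  <- gsub("[[:punct:]]", "", text)    # step 2: strip punctuation
  words <- strsplit(text, "[[:space:]]+")   # split each tweet into words
  # step 3: drop stop words (and any empty strings left by the split)
  kept  <- lapply(words, function(w) w[!(w %in% stop_words) & nzchar(w)])
  vapply(kept, paste, character(1), collapse = " ")
}

cleaned <- clean_text(tweet_text, stop_words)
print(cleaned)
```

Lower-casing comes first so that a word such as "Is" matches the lower-case stop-word list in the final step.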