Slide 1

Slide 1 text

Hacking Data Visualisations MELINDA SECKINGTON @MSECKINGTON

Slide 2

Slide 2 text

@mseckington

Slide 3

Slide 3 text

No content

Slide 4

Slide 4 text

No content

Slide 5

Slide 5 text

No content

Slide 6

Slide 6 text

Hacking data visualisations @mseckington

Slide 7

Slide 7 text

Why?

Slide 8

Slide 8 text

https://www.flickr.com/photos/laurenmanning/6632168961/

Slide 9

Slide 9 text

https://www.flickr.com/photos/jamjar/5491205608

Slide 10

Slide 10 text

“I feel that every day, all of us now are being blasted by information design. It's being poured into our eyes through the Web, and we're all visualizers now; we're all demanding a visual aspect to our information. There's something almost quite magical about visual information. It's effortless, it literally pours in. And if you're navigating a dense information jungle, coming across a beautiful graphic or a lovely data visualization, it's a relief, it's like coming across a clearing in the jungle.” DAVID MCCANDLESS - THE BEAUTY OF DATA VISUALIZATION

Slide 11

Slide 11 text

Tor Norretranders THE BANDWIDTH OF OUR SENSES @mseckington

Slide 12

Slide 12 text

No content

Slide 13

Slide 13 text

No content

Slide 14

Slide 14 text

No content

Slide 15

Slide 15 text

No content

Slide 16

Slide 16 text

A brief history of data visualisations

Slide 17

Slide 17 text

A BRIEF HISTORY OF DATA VISUALISATION Theatrum Orbis Terrarum, May 20, 1570. The first modern atlas, collected by Abraham Ortelius. This was a first attempt to gather all maps that were known to man at the time and bind them together.

Slide 18

Slide 18 text

https://www.flickr.com/photos/smailtronic/2361594300

Slide 19

Slide 19 text

A BRIEF HISTORY OF DATA VISUALISATION Bills of Mortality. From 1603, London parish clerks collected health-related population data in order to monitor plague deaths, publishing the London Bills of Mortality on a weekly basis. John Graunt amalgamated 50 years of information from the bills, producing the first known tables of public health data. BEAUTIFUL SCIENCE AT THE BRITISH LIBRARY - THE GUARDIAN

Slide 20

Slide 20 text

A BRIEF HISTORY OF DATA VISUALISATION 1644: First known graph of statistical data. MICHAEL VAN LANGREN - ESTIMATES OF DISTANCE IN LONGITUDE BETWEEN TOLEDO AND ROME

Slide 21

Slide 21 text

A BRIEF HISTORY OF DATA VISUALISATION

Slide 22

Slide 22 text

A BRIEF HISTORY OF DATA VISUALISATION 1786: first bar chart, William Playfair. Exports and imports of Scotland to and from different parts for one Year from Christmas 1780 to Christmas 1781

Slide 23

Slide 23 text

A BRIEF HISTORY OF DATA VISUALISATION Street map of cholera deaths in Soho, 1854, John Snow. Snow's 'ghost map' shows deaths from cholera around Broad Street between 19 August and 30 September 1854. Snow simplified the street layout, highlighting the 13 water pumps serving the area and representing each death as a black bar. His map demonstrates how cholera was spreading, not by a 'miasma' rising from the Thames, but in water contaminated by human waste. BEAUTIFUL SCIENCE AT THE BRITISH LIBRARY - THE GUARDIAN

Slide 24

Slide 24 text

A BRIEF HISTORY OF DATA VISUALISATION Diagram of the Causes of Mortality in the Army in the East, 1858, Florence Nightingale. In her seminal ‘rose diagram’, Nightingale demonstrated that far more soldiers died from preventable epidemic diseases (blue) than from wounds inflicted on the battlefield (red) or other causes (black) during the Crimean War (1853-56). BEAUTIFUL SCIENCE AT THE BRITISH LIBRARY - THE GUARDIAN

Slide 25

Slide 25 text

How?

Slide 26

Slide 26 text

HOW? https://www.flickr.com/photos/jdhancock/8031897271

Slide 27

Slide 27 text

https://www.flickr.com/photos/laurenmanning/5658951917/

Slide 28

Slide 28 text

HOW? @mseckington

Slide 29

Slide 29 text

HOW? @mseckington

Slide 30

Slide 30 text

HOW? @mseckington

Slide 31

Slide 31 text

HOW? @mseckington

Slide 32

Slide 32 text

HOW? @mseckington

Slide 33

Slide 33 text

A quick intro to R

Slide 34

Slide 34 text

A QUICK INTRO TO R What is R?

Slide 35

Slide 35 text

A QUICK INTRO TO R What is R?
R is a free programming language and environment for statistical computing and graphics.

Slide 36

Slide 36 text

A QUICK INTRO TO R What is R?
R is a free programming language and environment for statistical computing and graphics.
Created by statisticians for statisticians.

Slide 37

Slide 37 text

A QUICK INTRO TO R What is R?
R is a free programming language and environment for statistical computing and graphics.
Created by statisticians for statisticians.
Comes with a lot of facilities for data manipulation, calculation, data analysis and graphical display.

Slide 38

Slide 38 text

A QUICK INTRO TO R What is R?
R is a free programming language and environment for statistical computing and graphics.
Created by statisticians for statisticians.
Comes with a lot of facilities for data manipulation, calculation, data analysis and graphical display.
Highly and easily extensible.

Slide 39

Slide 39 text

A QUICK INTRO TO R

Slide 40

Slide 40 text

No content

Slide 41

Slide 41 text

> data()
# list all datasets available

Slide 42

Slide 42 text

> data()
# list all datasets available
> data(movies)
# load the movies data into a variable named movies
# (data() returns only the dataset's name, so assigning its result with = or <- would overwrite the data)

Slide 43

Slide 43 text

> data()
# list all datasets available
> data(movies)
# load the movies data into a variable named movies
# (data() returns only the dataset's name, so assigning its result with = or <- would overwrite the data)
> dim(movies)
[1] 58788    24

Slide 44

Slide 44 text

> data()
# list all datasets available
> data(movies)
# load the movies data into a variable named movies
# (data() returns only the dataset's name, so assigning its result with = or <- would overwrite the data)
> dim(movies)
[1] 58788    24
> names(movies)
 [1] "title" "year" "length" "budget" "rating" "votes"
 [7] "r1" "r2" "r3" "r4" "r5" "r6"
[13] "r7" "r8" "r9" "r10" "mpaa" "Action"
[19] "Animation" "Comedy" "Drama" "Documentary" "Romance" "Short"
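
A minimal sketch of the same inspection steps on `mtcars`, a dataset that ships with base R, so it runs without extra packages. Using `mtcars` is an assumption made for illustration; in current ggplot2 releases the `movies` data from the slides is distributed in the separate ggplot2movies package.

```r
# Load a built-in dataset. data() creates the variable itself
# and returns only the dataset's name.
loaded <- data(mtcars)

# loaded holds the name; mtcars holds the data frame
print(loaded)

# Number of rows and columns
print(dim(mtcars))

# Column labels
print(names(mtcars))
```

This is why the dataset is loaded with a bare `data(...)` call rather than by assigning its return value.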

Slide 45

Slide 45 text

> movies[7079,]
                    title year length   budget rating votes
7079 Bourne Identity, The 2002    119 75000000    7.3 29871
      r1  r2  r3  r4  r5   r6   r7   r8   r9 r10  mpaa
     4.5 4.5 4.5 4.5 4.5 14.5 24.5 34.5 14.5 4.5 PG-13
     Action Animation Comedy Drama Documentary Romance Short
          1         0      0     1           0       0     0
# returns 1 row => all the data for 1 movie

Slide 46

Slide 46 text

> movies[7079,]
                    title year length   budget rating votes
7079 Bourne Identity, The 2002    119 75000000    7.3 29871
      r1  r2  r3  r4  r5   r6   r7   r8   r9 r10  mpaa
     4.5 4.5 4.5 4.5 4.5 14.5 24.5 34.5 14.5 4.5 PG-13
     Action Animation Comedy Drama Documentary Romance Short
          1         0      0     1           0       0     0
# returns 1 row => all the data for 1 movie
> movies[1:10,]
. . .
# returns rows 1 to 10

Slide 47

Slide 47 text

> movies[,1]
. . .
# returns 1 column => titles of all movies

Slide 48

Slide 48 text

> movies[,1]
. . .
# returns 1 column => titles of all movies
> movies$title
. . .
# same as movies[,1]: returns the column with the label 'title'

Slide 49

Slide 49 text

> movies[,1]
. . .
# returns 1 column => titles of all movies
> movies$title
. . .
# same as movies[,1]: returns the column with the label 'title'
> movies[,1:10]
. . .
# returns columns 1 to 10
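
The row and column selections above can be tried on any data frame. The sketch below uses a hypothetical three-row table (`films`, with made-up titles and values) in place of the movies data, so it runs on its own.

```r
# Hypothetical three-row table standing in for the movies data
films <- data.frame(
  title  = c("Alpha", "Beta", "Gamma"),
  year   = c(1999, 2002, 2010),
  rating = c(6.1, 7.3, 5.8),
  stringsAsFactors = FALSE
)

one_row   <- films[2, ]     # a single row, still a data frame
some_rows <- films[1:2, ]   # rows 1 to 2
by_pos    <- films[, 1]     # first column as a plain vector
by_name   <- films$title    # the same column, selected by label
two_cols  <- films[, 1:2]   # columns 1 to 2, still a data frame

print(one_row)
print(identical(by_pos, by_name))
```

The pattern is `[rows, columns]`: leaving one side empty selects everything along that dimension.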

Slide 50

Slide 50 text

> hist(movies$year)

Slide 51

Slide 51 text

> hist(movies$year)
[Figure: histogram of movies$year; x-axis movies$year (1900 to 2000), y-axis Frequency (0 to 8000)]

Slide 52

Slide 52 text

> hist(movies$year)
> hist(movies$rating)

Slide 53

Slide 53 text

> hist(movies$year)
> hist(movies$rating)
[Figure: histogram of movies$rating; x-axis movies$rating (2 to 10), y-axis Frequency (0 to 8000)]
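
To see what `hist()` computes before it draws anything, the binning can be requested without a plot. A minimal sketch, assuming a short made-up `ratings` vector in place of `movies$rating`:

```r
# Hypothetical ratings standing in for movies$rating
ratings <- c(2.1, 3.4, 5.0, 5.5, 6.2, 6.8, 7.1, 7.3, 8.0, 9.4)

# plot = FALSE returns the binning without drawing anything
h <- hist(ratings, breaks = seq(0, 10, by = 2), plot = FALSE)

print(h$breaks)  # bin edges
print(h$counts)  # number of observations in each bin
```

The bar heights in the plotted histogram are exactly these counts.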

Slide 54

Slide 54 text

> hist(movies$year)
> hist(movies$rating)
> library(ggplot2)

Slide 55

Slide 55 text

> hist(movies$year)
> hist(movies$rating)
> library(ggplot2)
> qplot(rating,
    data=movies,
    geom="histogram")

Slide 56

Slide 56 text

> hist(movies$year)
> hist(movies$rating)
> library(ggplot2)
> qplot(rating,
    data=movies,
    geom="histogram")
> qplot(rating,
    data=movies,
    geom="histogram",
    binwidth=1)

Slide 57

Slide 57 text

> m = ggplot(movies, aes(rating))
> m + geom_histogram()

Slide 58

Slide 58 text

> m = ggplot(movies, aes(rating))
> m + geom_histogram()
> m + geom_histogram(
    aes(fill = ..count..))

Slide 59

Slide 59 text

> m = ggplot(movies, aes(rating))
> m + geom_histogram()
> m + geom_histogram(
    aes(fill = ..count..))
> m + geom_histogram(
    colour = "darkgreen",
    fill = "white",
    binwidth = 0.5)

Slide 60

Slide 60 text

> m = ggplot(movies, aes(rating))
> m + geom_histogram()
> m + geom_histogram(
    aes(fill = ..count..))
> m + geom_histogram(
    colour = "darkgreen",
    fill = "white",
    binwidth = 0.5)
> x = m + geom_histogram(
    binwidth = 0.5)
> x + facet_grid(Action ~ Comedy)
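
A sketch of the same layered approach on made-up data, assuming the ggplot2 package (version 3.3.0 or later) is installed. The `films` table and its values are hypothetical. Recent ggplot2 releases spell `..count..` as `after_stat(count)`, which is what this version uses.

```r
library(ggplot2)

# Hypothetical data standing in for the movies table
set.seed(1)
films <- data.frame(
  rating = round(runif(200, min = 1, max = 10), 1),
  Action = rbinom(200, 1, 0.3),
  Comedy = rbinom(200, 1, 0.4)
)

m <- ggplot(films, aes(rating))

# Colour each bar by its height; after_stat(count) replaces ..count..
p_fill <- m + geom_histogram(aes(fill = after_stat(count)), binwidth = 0.5)

# One panel per Action/Comedy combination
p_facet <- m + geom_histogram(binwidth = 0.5) + facet_grid(Action ~ Comedy)

# layer_data() returns the computed bins without opening a graphics device
bins <- layer_data(p_fill)
print(sum(bins$count))
```

Printing `p_fill` or `p_facet` at the console draws the plot; storing them in variables first keeps the pieces reusable.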

Slide 61

Slide 61 text

> library(twitteR)
> setup_twitter_oauth(
    "API key", "API secret", "Access token", "Access secret")

Slide 62

Slide 62 text

FUTURELEARN STATS

Slide 63

Slide 63 text

> fl = read.csv(
    "futurelearn_dataset.csv",
    header=TRUE)

Slide 64

Slide 64 text

> fl = read.csv(
    "futurelearn_dataset.csv",
    header=TRUE)
> source_table = table(fl$age)
> pie(source_table)

Slide 65

Slide 65 text

> fl = read.csv(
    "futurelearn_dataset.csv",
    header=TRUE)
> source_table = table(fl$age)
> pie(source_table)
> pie(source_table,
    radius=0.6,
    col=rainbow(8))
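
The `table()` then `pie()` step can be sketched without the CSV file. The `age` vector below is made up and stands in for `fl$age`; the real column's values are not shown in the slides.

```r
# Hypothetical age bands standing in for fl$age
age <- c("18-25", "26-35", "26-35", "36-45", "26-35", "18-25", "46-55")

# table() counts how often each value occurs
age_table <- table(age)
print(age_table)

# Shares of the whole, which is what the pie slices encode
print(round(prop.table(age_table), 2))

# Draw to a null PDF device so the sketch also runs without a display
pdf(NULL)
pie(age_table, radius = 0.6, col = rainbow(length(age_table)))
invisible(dev.off())
```

Using `rainbow(length(age_table))` instead of a fixed number keeps one colour per slice whatever the number of categories.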

Slide 66

Slide 66 text

No content

Slide 67

Slide 67 text

> library(twitteR)
> setup_twitter_oauth(
    "API key", "API secret", "Access token", "Access secret")
> tweets <- searchTwitter('futurelearn', n=100)

Slide 68

Slide 68 text

No content

Slide 69

Slide 69 text

> library(twitteR)
> setup_twitter_oauth(
    "API key", "API secret", "Access token", "Access secret")
> tweets <- searchTwitter('futurelearn', n=100)
> library("tm")
> tweet_text <- sapply(tweets, function(x) x$getText())
> tweet_corpus <- Corpus(VectorSource(tweet_text))

Slide 70

Slide 70 text

> library(twitteR)
> setup_twitter_oauth(
    "API key", "API secret", "Access token", "Access secret")
> tweets <- searchTwitter('futurelearn', n=100)
> library("tm")
> tweet_text <- sapply(tweets, function(x) x$getText())
> tweet_corpus <- Corpus(VectorSource(tweet_text))
> tweet_corpus <- tm_map(tweet_corpus,
    content_transformer(tolower))
> tweet_corpus <- tm_map(tweet_corpus, removePunctuation)
> tweet_corpus <- tm_map(tweet_corpus,
    function(x) removeWords(x, stopwords()))
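
The three clean-up steps (lower-casing, removing punctuation, removing stop words) can be illustrated in base R without Twitter credentials or the tm package. This is a stand-in sketch, not the tm pipeline itself: the tweets and the short stop-word list are made up.

```r
# Hypothetical tweets standing in for the searchTwitter() results
tweet_text <- c(
  "Loving the new FutureLearn course!",
  "The course on data is great, thanks FutureLearn.",
  "Is the data course open to all?"
)

# Tiny illustrative stop-word list (tm's stopwords() is much longer)
stop_words <- c("the", "on", "is", "to", "all", "new")

clean <- tolower(tweet_text)                 # lower-case everything
clean <- gsub("[[:punct:]]", "", clean)      # drop punctuation
words <- unlist(strsplit(clean, "\\s+"))     # split into words
words <- words[!words %in% stop_words]       # drop stop words

# Word frequencies, most common first
freq <- sort(table(words), decreasing = TRUE)
print(freq)
```

A word cloud is a picture of a frequency table like `freq`: the more often a word occurs, the larger it is drawn.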

Slide 71

Slide 71 text

No content

Slide 72

Slide 72 text

> library(wordcloud)
> wordcloud(tweet_corpus)

Slide 73

Slide 73 text

> library(wordcloud)
> wordcloud(tweet_corpus)

Slide 74

Slide 74 text

What next?

Slide 75

Slide 75 text

A QUICK INTRO TO R

Slide 76

Slide 76 text

A QUICK INTRO TO R

Slide 77

Slide 77 text

No content

Slide 78

Slide 78 text

WHAT NEXT? @mseckington

Slide 79

Slide 79 text

https://www.flickr.com/photos/jamjar/5491205608

Slide 80

Slide 80 text

@mseckington

Slide 81

Slide 81 text

Recap

Slide 82

Slide 82 text

Data visualisations are awesome @mseckington

Slide 83

Slide 83 text

R is awesome @mseckington

Slide 84

Slide 84 text

Any questions? @mseckington