Where's the Data? (2)

James Eggers
February 17, 2014

Transcript

  1. Hi

  2. • I’m James
     • I’m 19
     • I do Computer Science, here.
     • Have a weird obsession with computers
     @james_eggers

  3. • Entered Young Scientist twice, won my category both times, plus the EMC “Data-Hero” award.
     • I made “Better Examinations.ie”, which won two Irish Web Awards.
     • Spoken at the Dublin Web Summit on the Main Stage, & at the Central Statistics Office.

  4. What is data, really?

  5. Data is…

  6. “Data” are lots of individual bits of info that aren’t so useful by themselves, but become very useful when you put them together.

  7. Some examples of “data”

  8. Your shoe sizes

  9. The texts you send every day

  10. Your tweets

  11. The books you read

  12. The foods you like

  13. The number of times Kim Kardashian will get married

  14. The amount of horse-meat in your lunch

  15. Life is full of data. Too much to comprehend.

  16. If you had all the data, you could predict anything.

  17. Think of life as an equation: A² + B² = C². If you know just two variables, say A and B, then you know C too.

  18. A “life” equation: if you had enough “data”, you could theoretically predict anything in life with a really large equation.

  19. Obviously not so possible (yet)

  20. But, we can roughly get the probability of something happening

  21. Twitter can be used to roughly predict stock market fluctuations

  22. Twitter can be used to predict who might win the X-Factor

  23. You see tweets about earthquakes before you feel them. The speed of light is faster than the speed of sound.

  24. Not just predicting things

  25. Data helps us understand the past and present, too

  26. What are a company’s customers saying about their products?

  27. Obama’s team constantly monitored Twitter for public opinion (= data = lots of individual opinions accumulated).

  28. So really, data = lots of information (= power?)

  29. Some of the things I’ve done with big data

  30. The Vibes of Ireland

  31. Real-Time Mood Monitoring via tweets

  32. Probably your tweets too

  33. An algorithm analyses millions of tweets and marks them as “happy”, “unhappy”, or “neutral”.

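The slides do not say how the algorithm decides on a label. As a minimal sketch only, assuming a simple word-list approach (the word lists and the `classify_tweet` name are hypothetical, not taken from the talk), the three-way marking could look like this:

```python
import re

# Hypothetical word lists, for illustration only. The talk does not say
# which words or which technique the real system used.
HAPPY_WORDS = {"happy", "great", "love", "brilliant", "delighted"}
UNHAPPY_WORDS = {"sad", "awful", "hate", "terrible", "miserable"}


def classify_tweet(text: str) -> str:
    """Label a tweet "happy", "unhappy", or "neutral" by counting mood words."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in HAPPY_WORDS for w in words) - sum(w in UNHAPPY_WORDS for w in words)
    if score > 0:
        return "happy"
    if score < 0:
        return "unhappy"
    return "neutral"
```

A tweet with more happy words than unhappy words is marked “happy”, the reverse is marked “unhappy”, and a tie (including no mood words at all) is marked “neutral”.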
  34. It was a big part of the Science Gallery’s HAPPY? exhibition in May

  35. It also won its category in the BT Young Scientist & Technology Exhibition 2011.

  36. Mood of Ireland on an Average Day (Oct 2010 – Dec 2010)

  37. Results

  38. People are happiest on a Friday evening, and least happy early on a Thursday morning.

  39. There is a definite dip in the mood during the middle of the week.

  40. On an average day, people are happiest at about 18:00 (6pm) and least happy early in the morning, 04:00 – 08:00.

  41. I also found that the east coast is generally in a worse mood than the west coast. When Budget 2011 was being read, there was a dip in the overall mood.

  42. Average mood of all people in Ireland over an average week:

  43. Average mood of all people in Ireland over an average day:

  44. Average mood of all people in Ireland over an average day, west coast vs. east coast:

  45. People are nearly always happier on the west coast. The east coast seems to consistently lag behind in terms of overall happiness.

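Averages like the ones in these results can be produced by grouping labelled tweets by time of day. The sketch below is an assumption about how that might be done, not the method used in the project: it scores “happy” as +1, “neutral” as 0 and “unhappy” as -1 (a hypothetical scale) and averages the scores per hour.

```python
from collections import defaultdict
from datetime import datetime
from typing import Iterable

# Hypothetical numeric scale for the three labels.
SCORES = {"happy": 1, "neutral": 0, "unhappy": -1}


def average_mood_by_hour(labelled_tweets: Iterable[tuple[datetime, str]]) -> dict[int, float]:
    """Return {hour of day: mean mood score} for (timestamp, label) pairs."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for timestamp, label in labelled_tweets:
        totals[timestamp.hour] += SCORES[label]
        counts[timestamp.hour] += 1
    return {hour: totals[hour] / counts[hour] for hour in sorted(counts)}
```

Grouping on `timestamp.weekday()` instead of `timestamp.hour` would give the per-day view, and splitting the input by region first would give an east vs. west comparison.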
  46. Better Examinations.ie

  47. Most of you have had the misfortune of having to use examinations.ie

  48. It is one of the most backward websites on the internet.

  49. No.

  50. But, there is a lot of data available to use.

  51. Because algorithms were trained to store every word in every exam paper, students can now search all the papers for specific questions.

  52. Searching for questions that relate to Martin Luther
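
Storing every word of every paper so that it can be searched is commonly done with an inverted index. The following is a rough sketch under that assumption; the function names and the paper names used to call it are made up for illustration and are not from Better Examinations.ie.

```python
import re
from collections import defaultdict


def build_index(papers: dict[str, str]) -> dict[str, set[str]]:
    """Map every word to the set of paper names that contain it."""
    index = defaultdict(set)
    for name, text in papers.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(name)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the papers that contain every word in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results
```

Searching for “Martin Luther” then returns only the papers in which both words appear.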

  53. The Dept. of Education publishes statistics about past exams, which is cool.

  54. But they all look like this

  55. Not so easy to understand

  56. But, computers can understand.

  57. So, I made a better way

  58. Comparing four years of exam results of Art vs. Biology.

  59. Chemistry vs. Biology

  60. English vs. Irish

  61. Exam Paper Filter

  62. Better Examinations.ie has been used by hundreds of students all over Ireland.

  63. It uses the data the Department of Education makes available to make life easier for students.

  64. In this case, the data are the exam papers and all of the statistics the Dept. of Education creates.

  68. Freeflow: real-time, automatic road traffic detection.

  69. • Automatically gathered real-time road traffic data with traffic cameras and via Twitter
      • Dispensed structured road traffic info via a web app
      • Also displayed data about ice levels on roads, average road temperature and air temperature from the National Roads Authority.

  71. So, instead of this, to deal with traffic reports

  72. We can just let computers do all the work for us

  73. How does it detect cars in an image?

  74. It’s a difficult problem

  75. Eventually I came up with something simple

  76. Instead of getting the computer to look for a car in an image, it looks for the absence of a car.

  77. So essentially, the more road it can see = the less traffic.

  78. O’Connell Bridge. Obviously the road is all a similar shade of grey.

  79. The computer simply counts up the 4x4 pixel areas of black colour. Red = area of empty space the computer can see.

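The block-counting idea can be sketched as follows. This is an illustration under assumptions rather than the Freeflow code: it takes a greyscale image as a plain 2D list of values from 0 to 255, assumes the image has already been processed so that empty road appears black, and uses a hypothetical `tolerance` to decide what counts as black.

```python
def empty_road_fraction(image, road_shade=0, tolerance=30, block=4):
    """Return the fraction of block x block areas that look like empty road.

    image: 2D list of greyscale values (0-255).
    road_shade, tolerance: assumed values, not taken from the talk.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    total = 0
    empty = 0
    # Walk over the image in non-overlapping 4x4 areas.
    for top in range(0, height - block + 1, block):
        for left in range(0, width - block + 1, block):
            total += 1
            # An area counts as empty only if every pixel is close to the road shade.
            if all(
                abs(image[y][x] - road_shade) <= tolerance
                for y in range(top, top + block)
                for x in range(left, left + block)
            ):
                empty += 1
    return empty / total if total else 0.0
```

A higher fraction means more visible road and so less traffic, which matches the “more road = less traffic” rule on the earlier slide.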
  80. • Twitter was a great source of information
      • Tweets were analysed in real time, looking for words like “accident” or “delays”
      • Location was also found by searching for words like “on” or “at”.

  81. • Bing Maps was then used to complete the address and convert it to a latitude/longitude pair to map.
      • Tweets that mentioned an incident were kept for 4 hours before being cleared from the system.

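The tweet handling described on these two slides can be sketched roughly like this. The keyword list, the regular expression and the function names are assumptions made for illustration, and the Bing Maps lookup is left out because the slides give no detail about how it was called.

```python
import re
from datetime import datetime, timedelta

# Words from the slide that suggest a traffic incident.
INCIDENT_WORDS = ("accident", "delays")
# Assumed pattern: take the text after "on" or "at", up to the next punctuation mark.
LOCATION_PATTERN = re.compile(r"\b(?:on|at)\s+([^,.;!?]+)", re.IGNORECASE)
# Incidents are kept for 4 hours, as stated on the slide.
KEEP_FOR = timedelta(hours=4)


def parse_incident(text: str, seen_at: datetime):
    """Return an incident record for a tweet, or None if it mentions no incident."""
    lowered = text.lower()
    if not any(word in lowered for word in INCIDENT_WORDS):
        return None
    match = LOCATION_PATTERN.search(text)
    location = match.group(1).strip() if match else None
    return {"text": text, "location": location, "expires": seen_at + KEEP_FOR}


def active_incidents(incidents, now: datetime):
    """Drop incidents whose 4-hour window has passed."""
    return [incident for incident in incidents if incident["expires"] > now]
```

The extracted location text is what would then be passed to a geocoding service to obtain a latitude/longitude pair.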
  86. Working with data without programming is easy, too.

  87. Step one is to find a source (or sources) of data you want to analyse.

  88. There’s data all over the internet, just waiting to be analysed

  89. Find a set of statistics you wish to use, and copy it into Excel.

  90. Excel will handle most of what you throw at it.

  91. Compare datasets to other datasets.

  92. Compare populations of different countries to their average exam scores.

  93. Compare shoe sizes to the number of people who have a disease.

  94. You could find that people with bigger feet have a greater chance of getting disease x.

  95. Do remember though, correlation is not causation.

  96. Just because one thing seems to cause the other, it doesn’t mean it does.

  97. E.g. the frequency with which Lindsay Lohan finds herself in jail may be correlated with the rate of increasing deforestation, but that doesn’t mean the two events have an effect on each other.

  98. Make sure to have additional evidence too, just to be sure of your hypothesis.

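For anyone who does want to check a comparison in code, the usual measure of how strongly two datasets move together is the Pearson correlation coefficient, which is also what Excel's `CORREL` function computes. The sketch below is a plain-Python version; the `pearson` name is just an illustrative choice.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list never changes")
    return covariance / sqrt(var_x * var_y)
```

A result near 1 or -1 means the two datasets rise and fall together (or in opposite directions), and a result near 0 means little linear relationship. As the slides say, even a value close to 1 does not show that one thing causes the other.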
  99. Thanks for listening! Questions? @james_eggers