Where's the data?

James Eggers
February 04, 2013

I gave this talk as a lecture to a 3rd year class of students studying Business in Trinity College Dublin.

Transcript

  1. •  I’m James •  I’m 18 •  Doing the Leaving Cert, take pity •  Have a weird obsession with computers @james_eggers
  2. •  Entered Young Scientist twice, won my category both times & the EMC “Data-Hero” award. •  I made “Better Examinations.ie”, which won an Irish Web Award. •  I’ve spoken on the Main Stage at the Dublin Web Summit, & at the Central Statistics Office re: data.
  3. What I want to talk to you about: 1. What data really is 2. How can data affect your life 3. Some things I’ve done 4. Where is all that data, anyway? 5. How can you analyse the data, too?
  4. “Data” are lots of individual bits of info that aren’t so useful by themselves, but become very useful when you put them together.
  5. Think of life as an equation: A² + B² = C². If you know just two variables, say A and B, then you know C too.
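
As a small worked example of the equation above, here is a Python sketch that recovers C from A and B. The function name `solve_c` and the values 3 and 4 are chosen purely for illustration.

```python
import math

def solve_c(a: float, b: float) -> float:
    """Return the positive C that satisfies A^2 + B^2 = C^2."""
    return math.sqrt(a * a + b * b)

# Illustrative values only: knowing A and B is enough to pin down C.
c = solve_c(3, 4)
print(c)
```

The same idea scales up: the more known variables you have, the more unknowns you can solve for.
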
  6. A “life” equation: If you had enough “data”, you could theoretically predict anything in life with a really large equation.
  7. •  1990s •  The static web •  Websites were always the same, rarely changed. •  Information was stagnant and outdated.
  8. •  2000+: the web becomes more real-time and more widely used. •  Facebook is set up in 2004, which sets the stage for massive amounts of social information moving across the internet. •  Imagine it like an information superhighway.
  9. •  APIs for accessing this information widely + easily available to everybody (almost). •  Massive datasets full of information to be accessed and analysed. •  Many avenues of analytics on this data yet to be explored + many ongoing creative experiments.
  10. Facebook: 3.2 billion Likes + Comments per day. Twitter: 500+ million Tweets per day. LinkedIn: 200+ million people.
  11. •  Over 160 million people using Twitter. •  Collectively these people create 500 million Tweets per day. •  Each Tweet contains meta information (location, time, names of people mentioned in the Tweet, info about the user account etc). •  Accessing 2-3% of these Tweets is free. •  Data from Twitter is widely used in research and statistical projects; it’s proven to work well. •  Experiments such as predicting stock movements have proven very possible with Twitter data.
  12. Your lights, kettle, fridge, car, your chair (Facebook: “James just sat down, again”) and anything else you can think of
  13. Your kettle would love to be able to send you a notification when it’s boiled
  14. If you just think, there are so many ways to use and harness the data around you.
  15. There’s a lot of data about you online •  Your age •  Relations •  Friends •  Likes •  What you look like •  What you say •  Who you talk to •  Occupation
  16. On an average day, people are happiest at about 18:00 (6pm) and least happy early in the morning 04:00 – 08:00.
  17. I also found that the East Coast is generally in a worse mood than the West Coast. When Budget 2011 was being read, there was a dip in the overall mood.
  18. People are nearly always happier on the West Coast. The East Coast seems to consistently lag behind in terms of overall happiness.
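
Findings like the ones above come down to grouping mood scores and averaging them. Here is a minimal Python sketch, assuming each message has already been given a mood score between -1 and 1. The talk does not say how mood was measured, so the scores, the sample records and the helper `average_by` are all made up for illustration.

```python
from collections import defaultdict
from statistics import mean

# Made-up sample records: (hour of day, region, mood score from -1 to 1).
# How the mood score is produced is an assumption, not something from the talk.
SAMPLE = [
    (5, "east", -0.4), (5, "west", -0.2),
    (12, "east", 0.0), (12, "west", 0.2),
    (18, "east", 0.3), (18, "west", 0.6),
]

def average_by(records, key_index):
    """Group records by the field at key_index and average their mood scores."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key_index]].append(record[2])
    return {key: mean(scores) for key, scores in groups.items()}

by_hour = average_by(SAMPLE, 0)      # average mood for each hour of the day
by_region = average_by(SAMPLE, 1)    # average mood for each coast
happiest_hour = max(by_hour, key=by_hour.get)
print(happiest_hour, by_region)
```

With real data the records would come from a stream of messages rather than a hard-coded list, but the grouping step would look much the same.
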
  19. No.

  20. By training algorithms to store every word in every exam paper, students can now search all the papers for specific questions.
  21. In this case, the data are the exam papers and all of the statistics the Dept. of Education creates.
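
One common way to make every word in a set of papers searchable is an inverted index, which maps each word to the papers that contain it. The talk does not say which technique the site uses, so treat this Python sketch as one possible approach. The paper names and text are invented.

```python
import re
from collections import defaultdict

# Hypothetical exam papers; names and text are invented for illustration.
PAPERS = {
    "maths-2011-paper1": "Solve the quadratic equation and sketch the curve.",
    "maths-2012-paper1": "Differentiate the function and find the slope of the curve.",
    "physics-2012": "State the equation for kinetic energy.",
}

def build_index(papers):
    """Map each lowercase word to the set of papers that contain it."""
    index = defaultdict(set)
    for name, text in papers.items():
        for word in re.findall(r"[a-z]+", text.lower()):
            index[word].add(name)
    return index

def search(index, query):
    """Return the papers containing every word in the query, sorted by name."""
    words = re.findall(r"[a-z]+", query.lower())
    if not words:
        return []
    matches = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(matches)

index = build_index(PAPERS)
print(search(index, "equation"))
print(search(index, "the curve"))
```

Building the index once up front means each search only has to look up a few words instead of rereading every paper.
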
  22. .

  23. Automatically gathered real-time road traffic data with traffic cameras and via Twitter. Dispensed structured road traffic info via web app. Also displayed data about ice levels on roads, average road temperature and air temperature from the National Roads Authority.
  25. Instead of getting the computer to look for a car in an image, it looks for the absence of a car.
  26. The computer simply counts up the 4x4 pixel areas of black colour. Red = area of empty space the computer can see.
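
The counting idea above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the original code: the image is taken to be a grayscale grid of values from 0 to 255, the brightness cutoff `DARK_BELOW` is a guess, and the blocks are non-overlapping.

```python
BLOCK = 4          # block size from the slide
DARK_BELOW = 50    # assumed brightness cutoff for "black" (0-255 grayscale)

def count_empty_blocks(image):
    """Count non-overlapping 4x4 blocks whose pixels are all dark."""
    height, width = len(image), len(image[0])
    empty = 0
    for top in range(0, height - BLOCK + 1, BLOCK):
        for left in range(0, width - BLOCK + 1, BLOCK):
            if all(
                image[y][x] < DARK_BELOW
                for y in range(top, top + BLOCK)
                for x in range(left, left + BLOCK)
            ):
                empty += 1
    return empty

def empty_fraction(image):
    """Share of whole blocks that look like empty road (0.0 to 1.0)."""
    total = (len(image) // BLOCK) * (len(image[0]) // BLOCK)
    return count_empty_blocks(image) / total

# Toy 8x8 frame: dark road everywhere except a bright "car" in the top-left block.
frame = [[10] * 8 for _ in range(8)]
for y in range(1, 3):
    for x in range(1, 3):
        frame[y][x] = 200

print(count_empty_blocks(frame), empty_fraction(frame))
```

A low empty fraction suggests a busy road and a high one suggests a clear road, without ever having to recognise what a car looks like.
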
  27. Twitter was a great source of information. Tweets were analyzed in real-time, looking for words like “accident” or “delays”. Location was also found by searching for words like “on” or “at”.
  28. Then used Bing Maps to complete the address, and convert to latitude/longitude pair to map. Tweets that mentioned an incident were kept for 4 hours before being cleared from the system.
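
The Tweet handling described above (keyword matching, pulling out a location, and clearing old incidents) might look roughly like this in Python. The regular expression, the function names and the sample Tweets are assumptions made for illustration, and the Bing Maps address lookup is left out entirely.

```python
import re
from datetime import datetime, timedelta

INCIDENT_WORDS = {"accident", "delays"}   # example words given in the talk
KEEP_FOR = timedelta(hours=4)             # retention period from the talk

# Assumed pattern: the text after "on" or "at", up to punctuation or end of Tweet.
LOCATION_RE = re.compile(r"\b(?:on|at)\s+([^.,!?]+)", re.IGNORECASE)

def parse_incident(text, seen_at):
    """Return an incident dict if the Tweet mentions an incident word, else None."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    if not words & INCIDENT_WORDS:
        return None
    match = LOCATION_RE.search(text)
    location = match.group(1).strip() if match else None
    return {"text": text, "location": location, "seen_at": seen_at}

def drop_expired(incidents, now):
    """Keep only incidents seen within the last four hours."""
    return [i for i in incidents if now - i["seen_at"] < KEEP_FOR]

# Invented sample Tweets.
start = datetime(2013, 2, 4, 8, 0)
incident = parse_incident("Accident on the M50 northbound, expect delays", start)
ignored = parse_incident("Lovely morning at the beach", start)
print(incident["location"], ignored)
print(len(drop_expired([incident], start + timedelta(hours=5))))
```

The extracted location string is what would then be handed to a mapping service to get a latitude/longitude pair.
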
  30. If you do CS + Business: •  Twitter’s APIs (dev.twitter.com) •  Facebook’s APIs (developers.facebook.com) •  Read up on machine learning techniques •  Learn Python, it’s really good for this type of stuff. •  Find unstructured data like road traffic images from cameras, and convert it to structured data to make something cool. •  Take data from anywhere and everywhere (as long as it’s legal), put it all together and see what you get.
  31. You could find that people with bigger feet have a greater chance of getting disease x.
  32. E.g. the frequency with which Lindsay Lohan finds herself in jail may be correlated with the rate of increasing deforestation, but that doesn’t mean the two events have an effect on each other.
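
To see how easily two unrelated things can look linked, here is a short Python sketch that computes the Pearson correlation between two invented series. Both simply rise over time, and none of the numbers are real data.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Invented yearly figures: both happen to rise over time for unrelated reasons.
jail_stays = [1, 2, 2, 3, 4, 5]
hectares_cleared = [10, 12, 15, 15, 19, 22]

r = pearson(jail_stays, hectares_cleared)
print(round(r, 2))  # close to 1, even though neither series drives the other
```

A high value here only says the two series move together. Deciding whether one causes the other takes extra evidence that the correlation alone cannot give you.
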