Big Data in Action

Sumin Byeon
September 03, 2012

Transcript

  1. Big Data In Action
    University of Arizona ISTA 520
    Sumin Byeon
  2. Smartrek
    • Solution to urban traffic congestion problem
    • Incentivizes users to take alternative routes
    • Route that minimizes congestion -- less carbon footprint
  3. Smartrek architecture
    • Frontend: Android, iOS, web app
    • User guidance -- navigation
    • Data collection
    • Backend: The brain
  4. Data collection
    • Location, timestamp, speed and direction at regular intervals
    • e.g. every 3 sec or every 5 m, whichever comes first
    • Historical & real-time traffic data
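The sampling rule on this slide (a sample every 3 seconds or every 5 m, whichever comes first) could be sketched as follows. This is only an illustration, not Smartrek's actual client code: the `Fix` record, `haversine_m`, and `should_sample` are hypothetical names, and using great-circle distance between fixes is an assumption.

```python
import math
from dataclasses import dataclass

MAX_INTERVAL_S = 3.0  # from the slide: every 3 sec
MAX_DISTANCE_M = 5.0  # from the slide: every 5 m


@dataclass
class Fix:
    """Hypothetical GPS fix; field names are illustrative only."""
    lat: float        # degrees
    lon: float        # degrees
    timestamp: float  # seconds


def haversine_m(a: Fix, b: Fix) -> float:
    """Great-circle distance between two fixes, in meters."""
    earth_radius_m = 6_371_000.0
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlmb = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_m * math.asin(math.sqrt(h))


def should_sample(last: Fix, current: Fix) -> bool:
    """Record a new sample when either threshold is reached first."""
    elapsed_s = current.timestamp - last.timestamp
    return elapsed_s >= MAX_INTERVAL_S or haversine_m(last, current) >= MAX_DISTANCE_M
```

A stationary device would therefore still report every 3 seconds, while a fast-moving one reports more often because the 5 m threshold triggers first.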
  5. What we do with data
    • Imputation -- Fill in missing data
    • Transform into traffic information
    • Find the optimal route
  6. Missing data http://maps.google.com

  7. Imputation
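The slides do not say which imputation method is used. As one simple possibility, gaps in a series of per-link observations could be filled by linear interpolation between the nearest known values. The function name `impute_linear` and the use of `None` to mark missing entries are assumptions for this sketch.

```python
from typing import List, Optional


def impute_linear(values: List[Optional[float]]) -> List[Optional[float]]:
    """Fill None gaps by linear interpolation between known neighbours.

    Leading and trailing gaps copy the nearest known value. If nothing is
    known, the input is returned unchanged (as a copy).
    """
    known = [i for i, v in enumerate(values) if v is not None]
    out = list(values)
    if not known:
        return out

    # Gaps before the first and after the last known value.
    for i in range(known[0]):
        out[i] = values[known[0]]
    for i in range(known[-1] + 1, len(values)):
        out[i] = values[known[-1]]

    # Interior gaps: interpolate between each pair of adjacent known points.
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            out[i] = values[left] + step * (i - left)
    return out
```

For example, `impute_linear([None, 10.0, None, None, 16.0, None])` fills the two interior gaps with values on the line from 10.0 to 16.0 and copies the edge values outward.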

  8. Intersection matrix
          O1    O2    O3    O4
    I1    C11   C12   C13   C14
    I2    C21         C23
    I3    C31   C32         C34
    I4                C43
    Diagram labels: I1, O1, O2, I2
  9. Finding shortest path
    • Famous example: Dijkstra’s algorithm
    http://en.wikipedia.org/wiki/File:Dijkstra_Animation.gif
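For reference, a minimal sketch of Dijkstra's algorithm with a binary heap is shown below. The adjacency-list representation (`Graph`) and the node names in the usage note are illustrative assumptions, not taken from the slides. Edge costs must be non-negative.

```python
import heapq
from typing import Dict, List, Tuple

# Hypothetical graph layout: node -> list of (neighbor, non-negative cost).
Graph = Dict[str, List[Tuple[str, float]]]


def dijkstra(graph: Graph, source: str) -> Dict[str, float]:
    """Return the cheapest known cost from source to every reachable node."""
    dist: Dict[str, float] = {source: 0.0}
    heap: List[Tuple[float, str]] = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry; a cheaper path was already found
        for neighbor, cost in graph.get(node, []):
            candidate = d + cost
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist
```

Calling `dijkstra(graph, "A")` on a small road graph returns a dictionary of costs keyed by node; unreachable nodes are simply absent from the result.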

  10. Time-dependent shortest path

  11. Link cost table
    Time     Cost
    0:00     5
    0:15     4
    ..
    7:45     17
    ..
    23:45    5
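One way to combine the link cost table with a shortest-path search is to look up each link's cost at the time the vehicle would enter it. The sketch below makes several assumptions that the slides do not state: costs are travel times in minutes, each link has one list of 96 costs (one per 15-minute bracket), and links are FIFO (leaving later never means arriving earlier), which is the condition under which a Dijkstra-style search remains correct. `CostTable`, `link_cost`, and `earliest_arrival` are hypothetical names.

```python
import heapq
from typing import Dict, List, Optional, Tuple

BRACKET_MIN = 15                           # 15-minute brackets
BRACKETS_PER_DAY = 24 * 60 // BRACKET_MIN  # 96 rows per link

# Hypothetical layout: (from_node, to_node) -> 96 costs, assumed to be minutes.
CostTable = Dict[Tuple[str, str], List[float]]


def link_cost(table: CostTable, link: Tuple[str, str], minute_of_day: float) -> float:
    """Cost of a link when entered at the given time (wraps past midnight)."""
    bracket = int(minute_of_day // BRACKET_MIN) % BRACKETS_PER_DAY
    return table[link][bracket]


def earliest_arrival(table: CostTable, source: str, target: str,
                     depart_min: float) -> Optional[float]:
    """Earliest arrival time at target, in minutes since midnight of the start day."""
    adjacency: Dict[str, List[str]] = {}
    for u, v in table:
        adjacency.setdefault(u, []).append(v)

    best: Dict[str, float] = {source: depart_min}
    heap: List[Tuple[float, str]] = [(depart_min, source)]
    while heap:
        t, node = heapq.heappop(heap)
        if node == target:
            return t
        if t > best.get(node, float("inf")):
            continue  # stale heap entry
        for nxt in adjacency.get(node, []):
            arrive = t + link_cost(table, (node, nxt), t)
            if arrive < best.get(nxt, float("inf")):
                best[nxt] = arrive
                heapq.heappush(heap, (arrive, nxt))
    return None  # target not reachable
```

With such a table, the best route can change with departure time: a link that is cheapest at 0:00 may be worth avoiding at 7:45 if its cost in that bracket is much higher.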
  12. Challenge: big data
    • In Phoenix, 8,300 links
    • In the Bay Area, 20,000 links
    • 10,000+ cities in the United States
    • 150,000,000 links in OpenStreetMap database (world-wide)
    • With 15-min brackets, 96 rows in the cost table per link, per day
    • Periodic traffic data updates & continuously incoming sensor data
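To make the scale concrete, the figures on this slide can be multiplied out: 96 cost-table rows per link per day, times the number of links. The short script below does only that arithmetic; the dictionary and function names are illustrative.

```python
BRACKETS_PER_DAY = 24 * 60 // 15  # 15-minute brackets -> 96 rows per link per day

# Link counts as quoted on the slide.
LINKS = {
    "Phoenix": 8_300,
    "Bay Area": 20_000,
    "OpenStreetMap (world-wide)": 150_000_000,
}


def rows_per_day(link_count: int) -> int:
    """Number of cost-table rows needed to cover one day for all links."""
    return link_count * BRACKETS_PER_DAY


for region, link_count in LINKS.items():
    print(f"{region}: {rows_per_day(link_count):,} rows per day")
```

For the world-wide OpenStreetMap figure this comes to 14.4 billion rows for a single day of cost tables, before any raw sensor data is counted.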
  13. MapReduce
    • Computing model for efficient distributed computing over large data sets
    • Introduced by Google in 2004
    • Open source implementation: Apache Hadoop; related open source distributed data stores: HBase, Cassandra, Hypertable, and more
    http://research.google.com/archive/mapreduce.html
  14. Things we want to do with MapReduce
    • Imputation - Fill in missing links
    • Link cost table for each link can be constructed independently
    • Intersection matrix computation
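Because each link's cost table can be built independently, the job maps naturally onto the MapReduce model. The sketch below simulates the map, shuffle, and reduce phases in a single Python process; it is not Hadoop code and does not use any Hadoop API. The observation format `(link_id, minute_of_day, travel_time)` and the choice of the mean as the aggregate are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Iterator, List, Tuple

BRACKET_MIN = 15  # 15-minute brackets

# Hypothetical observation: (link_id, minute_of_day, travel_time).
Observation = Tuple[str, int, float]
Key = Tuple[str, int]  # (link_id, bracket index)


def map_observation(obs: Observation) -> Iterator[Tuple[Key, float]]:
    """Map phase: key each observation by its link and time bracket."""
    link_id, minute_of_day, travel_time = obs
    yield (link_id, minute_of_day // BRACKET_MIN), travel_time


def reduce_bracket(key: Key, values: List[float]) -> Tuple[Key, float]:
    """Reduce phase: one cost-table cell per (link, bracket)."""
    return key, mean(values)


def run_mapreduce(observations: Iterable[Observation]) -> Dict[Key, float]:
    """Single-process stand-in for the shuffle that a real cluster performs."""
    groups: Dict[Key, List[float]] = defaultdict(list)
    for obs in observations:
        for key, value in map_observation(obs):
            groups[key].append(value)
    return dict(reduce_bracket(key, values) for key, values in groups.items())
```

Since no reduce call needs data from another key, the reducers for different links (and different brackets) can run on separate machines without coordination.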
  15. Summary: 4th paradigm
    • Not exactly making a scientific discovery by exploring a large amount of data, but...
    • Data captured by machines
    • Processed by software into information, knowledge