Big Data in Action

Sumin Byeon
September 03, 2012

Transcript

  1. Smartrek
     • Solution to urban traffic congestion problem
     • Incentivizes users to take alternative routes
     • Route that minimizes congestion -- less carbon footprint
  2. Smartrek architecture
     • Frontend: Android, iOS, web app
     • User guidance -- navigation
     • Data collection
     • Backend: The brain
  3. Data collection
     • Location, timestamp, speed and direction at regular intervals
     • e.g. every 3 s or every 5 m, whichever comes first
     • Historical & real-time traffic data
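The "every 3 s or every 5 m, whichever comes first" rule can be sketched as below. This is only an illustration, not Smartrek's client code: the `Fix` type, the planar `x`/`y` coordinates in metres, and the function names are assumptions made for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class Fix:
    """One position reading (hypothetical type for this sketch)."""
    t: float  # timestamp in seconds
    x: float  # metres east in a local planar frame (assumption)
    y: float  # metres north in a local planar frame (assumption)


def should_record(last: Fix, current: Fix,
                  max_seconds: float = 3.0, max_metres: float = 5.0) -> bool:
    """Return True when either the time or the distance threshold is reached."""
    elapsed = current.t - last.t
    moved = math.hypot(current.x - last.x, current.y - last.y)
    return elapsed >= max_seconds or moved >= max_metres


def sample(fixes):
    """Keep the first fix, then every fix that crosses a threshold
    relative to the most recently recorded one."""
    recorded = []
    for fix in fixes:
        if not recorded or should_record(recorded[-1], fix):
            recorded.append(fix)
    return recorded
```

A stationary device is then recorded on the time threshold alone, while a fast-moving one is recorded on the distance threshold before 3 s have elapsed.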
  4. What we do with data
     • Imputation -- Fill in missing data
     • Transform into traffic information
     • Find the optimal route
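The deck does not say which imputation method is used. As one simple possibility, the sketch below fills gaps in a series of per-bracket values by linear interpolation between the nearest known neighbours, and copies the nearest known value into gaps at either end. The function name and the use of `None` for a missing value are assumptions for illustration.

```python
def impute_linear(values):
    """Fill None entries by linear interpolation (a stand-in technique;
    the actual method is not described in the slides)."""
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        return list(values)  # nothing to interpolate from

    filled = list(values)

    # Gaps before the first or after the last known value: copy the nearest one.
    for i in range(known[0]):
        filled[i] = values[known[0]]
    for i in range(known[-1] + 1, len(values)):
        filled[i] = values[known[-1]]

    # Interior gaps: interpolate between the surrounding known values.
    for left, right in zip(known, known[1:]):
        span = right - left
        for i in range(left + 1, right):
            weight = (i - left) / span
            filled[i] = values[left] + weight * (values[right] - values[left])
    return filled
```

For example, `impute_linear([None, 10.0, None, None, None, 18.0, None])` returns `[10.0, 10.0, 12.0, 14.0, 16.0, 18.0, 18.0]`.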
  5. Intersection matrix

            O1   O2   O3   O4
       I1   C11  C12  C13  C14
       I2   C21       C23
       I3   C31  C32       C34
       I4             C43

     [Diagram labels: I1, O1, O2, I2]
  6. Challenge: big data
     • In Phoenix, 8,300 links
     • In the Bay Area, 20,000 links
     • 10,000+ cities in the United States
     • 150,000,000 links in OpenStreetMap database (world-wide)
     • With 15-minute brackets, 96 rows in the cost table per link per day
     • Periodic traffic data updates & continuous incoming sensor data
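To put the figures above in perspective, the following lines work out the 96 rows per link per day and multiply them by the link counts quoted on the slide. The variable names are illustrative, and the totals are derived arithmetic rather than figures stated in the deck.

```python
MINUTES_PER_DAY = 24 * 60
BRACKET_MINUTES = 15

# 1440 minutes / 15-minute brackets = 96 cost-table rows per link per day
rows_per_link_per_day = MINUTES_PER_DAY // BRACKET_MINUTES

# Link counts as quoted on the slide
link_counts = {
    "Phoenix": 8_300,
    "Bay Area": 20_000,
    "OpenStreetMap (world-wide)": 150_000_000,
}

# Cost-table rows generated per day for each region
rows_per_day = {
    region: links * rows_per_link_per_day
    for region, links in link_counts.items()
}

for region, rows in rows_per_day.items():
    print(f"{region}: {rows:,} rows/day")
```

This gives 796,800 rows per day for Phoenix, 1,920,000 for the Bay Area, and 14,400,000,000 for the world-wide OpenStreetMap link count.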
  7. MapReduce
     • Computing model for efficient distributed computing over large data sets
     • Introduced by Google in 2004
     • Open source implementation: Apache Hadoop; related distributed data stores that work with it include HBase, Cassandra, Hypertable, and more
     http://research.google.com/archive/mapreduce.html
  8. Things we want to do with MapReduce
     • Imputation -- Fill in missing links
     • Link cost table for each link can be constructed independently
     • Intersection matrix computation
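Because each link's cost table can be built independently, the job fits the map/group/reduce pattern well. The sketch below simulates that pattern in a single Python process; it is not Hadoop code and not the authors' pipeline. The record layout `(link_id, seconds_since_midnight, speed)` and the use of mean speed as the per-bracket cost are assumptions made for illustration.

```python
from collections import defaultdict

BRACKET_SECONDS = 15 * 60  # 15-minute brackets, 96 per day


def map_observation(obs):
    """Map step: emit ((link_id, bracket), speed) for one observation."""
    link_id, seconds_since_midnight, speed = obs
    bracket = int(seconds_since_midnight // BRACKET_SECONDS)
    yield (link_id, bracket), speed


def reduce_speeds(key, speeds):
    """Reduce step: average all speeds seen for one (link, bracket) key."""
    return key, sum(speeds) / len(speeds)


def run_mapreduce(records, mapper, reducer):
    """Single-process stand-in for the shuffle phase: group mapped values
    by key, then apply the reducer to each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return dict(reducer(key, values) for key, values in sorted(groups.items()))


observations = [
    ("A", 0, 40.0),
    ("A", 600, 60.0),
    ("A", 900, 30.0),
    ("B", 86399, 20.0),
]
cost_table = run_mapreduce(observations, map_observation, reduce_speeds)
```

Here `cost_table` is `{("A", 0): 50.0, ("A", 1): 30.0, ("B", 95): 20.0}`. Since no key depends on any other link, the groups could be reduced on separate machines without coordination.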
  9. Summary: 4th paradigm
     • Not exactly making a scientific discovery by exploring a large amount of data, but...
     • Data captured by machines
     • Processed by software into information and knowledge