
IHPC/A*Star Seminar on Big Data

Harish Pillay

August 22, 2013

Transcript

  1. Big Data – Why Do I Care? Harish Pillay, Head, Community Architecture and Leadership, Red Hat
  2. Big Data:
     - comes from the computational sciences
     - describes scenarios where the amount (volume) of data coming IN vastly outstrips the ability of software tools to store it, let alone process it.
  3. We now have the tech and tools to work with such data – and in real time!
  4. What if you could know how crowded the MRT train is and move away from the crowded part of the platform before boarding?
  5. What if you could improve your chances of anticipating the exam questions based on an analysis of previous exams?
  6. The Large Hadron Collider (LHC) produces millions of collisions every second in each detector, generating approximately one petabyte of data per second. None of today’s computing systems are capable of recording such rates, so sophisticated selection systems are used for a first fast electronic pre-selection, only passing one out of 10,000 events. Tens of thousands of processor cores then select 1% of the remaining events for analysis.
     • http://home.web.cern.ch/about/updates/2013/04/animation-shows-lhc-data-processing
  7. 1. Sign up on openshift.redhat.com and github.com.
     2. Install R and Hadoop.
     3. Check out data.gov.sg for data.
  8. But do keep in mind the Bonferroni Principle (https://en.wikipedia.org/wiki/Bonferroni_correction). [Homework for you]
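As a rough check on the LHC figures in slide 6, the sketch below works through the two selection stages. It assumes, as a simplification not stated in the slide, that every event carries the same amount of data, so the data rate falls in proportion to the fraction of events kept. Decimal SI units are assumed (1 PB = 10^15 bytes).

```python
# Back-of-the-envelope data rates for the LHC selection stages (slide 6).
# Assumption: every event carries the same amount of data, so the data
# rate shrinks in proportion to the fraction of events that are kept.

PETABYTE = 10**15  # bytes, decimal SI units
GIGABYTE = 10**9   # bytes

raw_rate = 1 * PETABYTE  # ~1 PB/s generated by the detectors

# Electronic pre-selection passes 1 in 10,000 events (integer maths keeps it exact).
after_first = raw_rate // 10_000

# The processor farm then keeps 1% of the remaining events.
after_second = after_first // 100

print(f"After electronic pre-selection: {after_first // GIGABYTE} GB/s")
print(f"After software selection: {after_second // GIGABYTE} GB/s")
```

Under that assumption, roughly 100 GB/s survives the first stage and about 1 GB/s reaches analysis, which shows why filtering has to happen before anything is stored.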
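Slide 7 suggests installing Hadoop. To give a feel for the programming model Hadoop is built around, here is a minimal pure-Python sketch of a MapReduce word count: a map step emits (word, 1) pairs, a shuffle sorts them by key, and a reduce step sums each group. This only imitates the model on a single machine. It does not use Hadoop's API, and the function names `mapper`, `reducer` and `word_count` are hypothetical.

```python
# Single-machine sketch of the MapReduce word-count pattern.
# It imitates map -> shuffle/sort -> reduce; it does not call Hadoop.
from itertools import groupby
from operator import itemgetter


def mapper(lines):
    """Emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def reducer(pairs):
    """Sum the counts per word; pairs must already be sorted by word."""
    for word, group in groupby(pairs, key=itemgetter(0)):
        yield word, sum(count for _, count in group)


def word_count(lines):
    # In Hadoop the framework performs this sort ("shuffle") between phases.
    shuffled = sorted(mapper(lines), key=itemgetter(0))
    return dict(reducer(shuffled))


if __name__ == "__main__":
    sample = ["big data big tools", "data in real time"]
    print(word_count(sample))
    # → {'big': 2, 'data': 2, 'in': 1, 'real': 1, 'time': 1, 'tools': 1}
```

The point of splitting the work this way is that `mapper` calls on different chunks of input are independent, and so are `reducer` calls on different keys, so a framework can spread both across many machines.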
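The page linked in slide 8 describes the Bonferroni correction. A minimal sketch of it follows: with m hypothesis tests and an overall significance level alpha, each test is run at alpha / m. The helper names and the illustrative numbers (20 tests at alpha = 0.05) are assumptions for this example. The uncorrected error formula additionally assumes the tests are independent, whereas the correction itself does not need that.

```python
# Bonferroni correction: with m tests and overall level alpha,
# test each hypothesis at alpha / m.
# The values alpha = 0.05 and m = 20 are illustrative only.


def bonferroni_threshold(alpha: float, m: int) -> float:
    """Per-test significance threshold under the Bonferroni correction."""
    return alpha / m


def family_wise_error(alpha: float, m: int) -> float:
    """Chance of at least one false positive among m independent tests,
    each run at level alpha (independence is assumed here)."""
    return 1 - (1 - alpha) ** m


alpha, m = 0.05, 20
per_test = bonferroni_threshold(alpha, m)

print(f"Uncorrected chance of >= 1 false positive: {family_wise_error(alpha, m):.3f}")
print(f"Bonferroni per-test threshold: {per_test:.4f}")
print(f"Chance of >= 1 false positive after correction: {family_wise_error(per_test, m):.4f}")
```

Without correction, 20 tests at 0.05 give roughly a 64% chance of at least one spurious "discovery"; testing each at 0.0025 brings that back under 5%. The more questions you ask of a big dataset, the more patterns you will find by chance alone.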