An Introduction to Apache Hadoop MapReduce

An introduction to Apache Hadoop MapReduce: what is it and how does it work? What is the MapReduce cycle and how are jobs managed? Why should it be used, and who are the big users and providers?

Mike Frampton

July 10, 2013

Transcript

  1. Apache Hadoop MapReduce
     • What is it?
     • Why use it?
     • How does it work?
     • Some examples
     • Big users
  2. MapReduce – What is it?
     • Processing engine of Hadoop
     • Developers create Map and Reduce jobs
     • Used for big data batch processing
     • Parallel processing of huge data volumes
     • Fault tolerant
     • Scalable
  3. MapReduce – Why use it?
     • Your data is in the Terabyte / Petabyte range
     • You have huge I/O
     • The Hadoop framework takes care of
       – Job and task management
       – Failures
       – Storage
       – Replication
     • You just write Map and Reduce jobs
  4. MapReduce – How does it work?
     Take word counting as an example, something that Google does all the time.
  5. MapReduce – How does it work?
     • Input data is split into shards
     • Split data is mapped to key,value pairs, e.g. Bear,1
     • Mapped data is shuffled/sorted by key, e.g. Bear
     • Sorted data is reduced, e.g. Bear,2
     • Final data is stored on HDFS
     • There might be an extra map layer before the shuffle
     • JobTracker controls all tasks in a job
     • TaskTracker controls map and reduce tasks
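The steps on this slide can be traced in a small, single-process Python sketch. It is a toy simulation under stated assumptions, not how Hadoop itself is implemented: the helper names (`split_input`, `map_shard`, `shuffle`, `reduce_group`, `word_count`), the round-robin sharding and the sample input are all hypothetical, and the result is returned as a list rather than written to HDFS.

```python
from collections import defaultdict


def split_input(lines, num_shards):
    """Split: deal the input lines round-robin into num_shards shards."""
    return [lines[i::num_shards] for i in range(num_shards)]


def map_shard(shard):
    """Map: turn every word in a shard into a (word, 1) pair."""
    return [(word, 1) for line in shard for word in line.split()]


def shuffle(mapped_shards):
    """Shuffle/sort: gather the values for each key, ordered by key."""
    grouped = defaultdict(list)
    for pairs in mapped_shards:
        for key, value in pairs:
            grouped[key].append(value)
    return sorted(grouped.items())


def reduce_group(key, values):
    """Reduce: collapse one key's values into a single total."""
    return key, sum(values)


def word_count(lines, num_shards=3):
    """Run the Split -> Map -> Shuffle -> Reduce cycle in one process."""
    shards = split_input(lines, num_shards)
    mapped = [map_shard(shard) for shard in shards]
    return [reduce_group(key, values) for key, values in shuffle(mapped)]


sample_lines = ["Deer Bear River", "Car Car River", "Deer Car Bear"]  # made-up input
totals = word_count(sample_lines)
```

Here every phase runs sequentially in one process; in Hadoop the map and reduce tasks run in parallel across the cluster.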
  6. MapReduce – Some examples
     A visual example with colours to show you the cycle: Split -> Map -> Shuffle -> Reduce
  7. MapReduce – Some examples
     A visual example of MapReduce with job and task trackers added to individual map and reduce jobs.
  8. Hadoop MapReduce – Big users
     • Users
       – Facebook
       – Yahoo
       – Amazon
       – eBay
     • Providers
       – Amazon
       – Cloudera
       – Hortonworks
       – MapR
  9. Contact Us
     • Feel free to contact us at
       – www.semtech-solutions.co.nz
       – [email protected]
     • We offer IT project consultancy
     • We are happy to hear about your problems
     • You pay only for the hours you need to solve your problems