An Introduction to Apache Hadoop MapReduce

An introduction to Apache Hadoop MapReduce: what is it and how does it work? What is the MapReduce cycle and how are jobs managed? Why should it be used, and who are the big users and providers?

Mike Frampton

July 10, 2013

Transcript

  1. Apache Hadoop MapReduce
     • What is it?
     • Why use it?
     • How does it work?
     • Some examples
     • Big users
  2. MapReduce – What is it?
     • Processing engine of Hadoop
     • Developers create Map and Reduce jobs
     • Used for big data batch processing
     • Parallel processing of huge data volumes
     • Fault tolerant
     • Scalable
  3. MapReduce – Why use it?
     • Your data is in the terabyte / petabyte range
     • You have huge I/O
     • The Hadoop framework takes care of
       – Job and task management
       – Failures
       – Storage
       – Replication
     • You just write Map and Reduce jobs
  4. MapReduce – How does it work?
     Take word counting as an example, something that Google does all of the time.
  5. MapReduce – How does it work?
     • Input data is split into shards
     • Split data is mapped to key,value pairs, e.g. Bear,1
     • Mapped data is shuffled/sorted by key, e.g. Bear
     • Sorted data is reduced, e.g. Bear,2
     • Final data is stored on HDFS
     • There might be an extra map layer before the shuffle
     • The JobTracker controls all tasks in a job
     • The TaskTracker controls map and reduce tasks
  6. MapReduce – Some examples
     A visual example, with colours, showing the cycle: Split -> Map -> Shuffle -> Reduce
  7. MapReduce – Some examples
     A visual example of MapReduce with job and task trackers added to individual map and reduce jobs.
  8. Hadoop MapReduce – Big users
     • Users
       – Facebook
       – Yahoo
       – Amazon
       – eBay
     • Providers
       – Amazon
       – Cloudera
       – Hortonworks
       – MapR
  9. Contact Us
     • Feel free to contact us at
       – www.semtech-solutions.co.nz
       – [email protected]
     • We offer IT project consultancy
     • We are happy to hear about your problems
     • You pay only for the hours that you need to solve your problems
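
The word-count cycle described in slide 5 (Split -> Map -> Shuffle -> Reduce) can be sketched in plain Python. This is a single-process illustration only, not Hadoop code: the function names (`split_input`, `map_shard`, `shuffle`, `reduce_groups`, `word_count`), the shard count and the sample text are all assumptions made for the example, and a real job would run the map and reduce steps as distributed tasks and write its output to HDFS.

```python
from collections import defaultdict


def split_input(text, num_shards):
    """Split step: deal the input lines out into num_shards shards."""
    lines = text.splitlines()
    return [lines[i::num_shards] for i in range(num_shards)]


def map_shard(lines):
    """Map step: emit a (word, 1) pair for every word in the shard."""
    return [(word, 1) for line in lines for word in line.split()]


def shuffle(mapped_shards):
    """Shuffle/sort step: group all values by key, with keys sorted."""
    groups = defaultdict(list)
    for pairs in mapped_shards:
        for key, value in pairs:
            groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_groups(groups):
    """Reduce step: sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(text, num_shards=3):
    """Run the whole cycle in one process (hypothetical helper)."""
    shards = split_input(text, num_shards)
    mapped = [map_shard(shard) for shard in shards]  # one map call per shard
    grouped = shuffle(mapped)
    return reduce_groups(grouped)


if __name__ == "__main__":
    # Sample input is invented for illustration.
    sample = "Deer Bear River\nCar Car River\nDeer Car Bear"
    print(word_count(sample))
```

Each map call sees only its own shard, which is what lets the framework run the map step in parallel; the shuffle is the only point at which data from different shards is brought together.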