
Introduction to Apache Hadoop


A short presentation introducing Apache Hadoop:
what is it, and what can it do? What other
products are associated with it?

Mike Frampton

July 10, 2013


Transcript

  1. Apache Hadoop
    • What is it?
    • Architecture
    • Related Projects
    • Large Users
  2. Hadoop – What Is It?
    • An open-source system developed in Java
    • Supports very large data sets
    • Supports large clusters of servers
    • Designed to run on pre-existing, low-cost hardware
    • Splits processing work across the cluster
    • Splits data storage across the cluster
    • Provides resilience through automatic failure handling
  3. Hadoop – Architecture
    Hadoop consists of:
    • Hadoop Common – common utilities that support the other Hadoop modules
    • Hadoop MapReduce – parallel processing of data stored in Hadoop
    • Hadoop YARN – job scheduler and cluster resource manager
    • Hadoop Distributed File System (HDFS) – a master/slave file system that spreads Hadoop data over a very large cluster of slave DataNodes, managed by a single NameNode
  4. Hadoop – Related Projects
    • Pig – a platform for analysing large data sets
    • Hive – a data warehouse system for Hadoop
    • Mahout – machine learning and data mining
    • Avro – a data serialization system
    • ZooKeeper – a coordination service that helps build distributed applications
    • Chukwa – data collection and analysis
  5. Hadoop – Related Projects (continued)
    • Hue – a web user interface for Hadoop
    • Oozie – a workflow scheduler
    • Hama – a bulk synchronous parallel (BSP) framework for massive scientific computations
    • Nutch – a web crawler
    • HBase – a non-relational database
  6. Hadoop – Large Users
    • Yahoo – a 10,000-core Linux cluster
    • Facebook – 100 petabytes, growing by 0.5 petabytes a day
    • Amazon – Hadoop can run on Amazon's EC2 and S3
  7. Contact Us
    • Feel free to contact us at www.semtech-solutions.co.nz or [email protected]
    • We offer IT project consultancy
    • We are happy to hear about your problems
    • You pay only for the hours you need to solve your problems
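Slide 2 says Hadoop splits storage across the cluster and handles failures automatically. The Python sketch that follows is a toy model of that idea, not HDFS itself and not its API: a file is cut into fixed-size blocks, each block is copied to several nodes, and the file can still be read after some nodes fail. The 4-byte block size, the node names, the function names and the round-robin placement are all assumptions made for illustration; real HDFS blocks are tens of megabytes or larger, and the NameNode uses a rack-aware placement policy. The replication factor of 3 matches the HDFS default.

```python
"""Toy model of HDFS-style block storage (illustration only, not the HDFS API)."""

BLOCK_SIZE = 4   # hypothetical tiny block size so the example stays readable
REPLICATION = 3  # matches the HDFS default replication factor


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut a file into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def store(data: bytes, nodes: list[str], block_size: int = BLOCK_SIZE,
          replication: int = REPLICATION):
    """Copy each block of `data` onto `replication` distinct nodes.

    Returns (storage, block_map): `storage` maps node -> {block_id: bytes}, and
    `block_map` plays the NameNode's role, mapping block_id -> nodes holding it.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    storage = {node: {} for node in nodes}
    block_map = {}
    for block_id, block in enumerate(split_into_blocks(data, block_size)):
        # Round-robin placement: a simplification of HDFS's rack-aware policy.
        holders = [nodes[(block_id + k) % len(nodes)] for k in range(replication)]
        for node in holders:
            storage[node][block_id] = block
        block_map[block_id] = holders
    return storage, block_map


def read(storage, block_map, failed=frozenset()):
    """Reassemble the file, reading each block from its first live replica."""
    parts = []
    for block_id in sorted(block_map):
        live = [n for n in block_map[block_id] if n not in failed]
        if not live:
            raise IOError(f"block {block_id} lost: every replica is on a failed node")
        parts.append(storage[live[0]][block_id])
    return b"".join(parts)


if __name__ == "__main__":
    nodes = ["dn1", "dn2", "dn3", "dn4"]
    storage, block_map = store(b"hello hadoop cluster", nodes)
    # With 3 replicas per block, any two node failures leave every block readable.
    print(read(storage, block_map, failed={"dn2", "dn3"}))  # b'hello hadoop cluster'
```

Failing a third node as well (for example dn1) leaves block 0 with no live replica, so `read` raises IOError; the replication factor is a trade-off between disk space and resilience.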
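Slide 3 lists MapReduce as Hadoop's parallel processing layer. The following Python sketch simulates the classic word-count job in a single process to show its three stages: map emits (word, 1) pairs, the shuffle groups the pairs by word, and reduce sums each group. It does not use Hadoop at all; the function names and the idea of passing one list of lines per input split are assumptions for illustration. On a real cluster the framework runs many map and reduce tasks in parallel (with YARN allocating the resources in Hadoop 2), and Hadoop Streaming allows mappers and reducers to be written in languages such as Python.

```python
"""In-process simulation of a MapReduce word count (illustration only)."""
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group intermediate values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reducer: sum the counts for one word."""
    return key, sum(values)


def word_count(splits: list[list[str]]) -> dict[str, int]:
    """Run map, shuffle and reduce over hypothetical input splits (lists of lines)."""
    # In Hadoop each split would go to its own map task, ideally on a node that
    # stores that block; here the splits are simply processed one after another.
    intermediate = (pair for split in splits for line in split for pair in map_phase(line))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in sorted(grouped.items()))


if __name__ == "__main__":
    splits = [["Hadoop stores data", "Hadoop processes data"], ["YARN schedules work"]]
    print(word_count(splits))
    # {'data': 2, 'hadoop': 2, 'processes': 1, 'schedules': 1, 'stores': 1, 'work': 1, 'yarn': 1}
```

Because each reducer call sees only one key and its values, the same logic scales out: the framework can partition keys across any number of reduce tasks.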