Introduction to Apache Hadoop

Apache Hadoop
• What is it?
• Architecture
• Related Projects
• Large users

Hadoop – What is it?
• An open source system developed in Java
• Supports very large data sets
• Supports large clusters of servers
• Designed to run on pre-existing, low-cost hardware
• Allows work to be split across the cluster
• Allows storage to be split across the cluster
• Provides resilience via automatic failure handling
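The storage and resilience points above can be illustrated with a toy model. The sketch below is not Hadoop code: the function names, the node names, the round-robin placement, and the tiny block size are all assumptions made for illustration. It only shows the general idea that data split into blocks, with each block copied to more than one server, stays readable when a single server fails.

```python
def split_into_blocks(data: bytes, block_size: int) -> list:
    """Cut the data into fixed-size blocks (the last block may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(num_blocks: int, nodes: list, replicas: int) -> dict:
    """Assign each block to `replicas` distinct nodes, round-robin.

    Hypothetical placement policy, chosen only to keep the example short.
    """
    if replicas > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block_id: [nodes[(block_id + k) % len(nodes)] for k in range(replicas)]
        for block_id in range(num_blocks)
    }


def readable_after_failure(placement: dict, failed_node: str) -> bool:
    """True if every block still has a copy on a node that has not failed."""
    return all(
        any(node != failed_node for node in holders)
        for holders in placement.values()
    )


if __name__ == "__main__":
    nodes = ["node1", "node2", "node3", "node4"]  # made-up node names
    blocks = split_into_blocks(b"x" * 1000, block_size=256)
    placement = place_blocks(len(blocks), nodes, replicas=2)
    for block_id, holders in placement.items():
        print(block_id, len(blocks[block_id]), holders)
    print(readable_after_failure(placement, "node1"))
```

With two copies per block, losing any one node leaves every block available; with a single copy, the same failure would lose data.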

Hadoop – Architecture
Hadoop consists of
• Hadoop Common: common utilities that support the other Hadoop modules
• Hadoop MapReduce: parallel processing of Hadoop data
• Hadoop YARN: scheduler and resource manager
• Hadoop Distributed File System (HDFS): a master/slave file system which spreads the Hadoop data over a very large cluster of slave data nodes controlled by a single name node
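The MapReduce idea in the list above can be sketched as a word count that runs entirely in memory. This is a minimal illustration, not the Hadoop API: the function names are invented for the example, and in a real cluster the map and reduce steps would run as separate tasks on different nodes, reading their input from HDFS.

```python
from collections import defaultdict


def map_phase(line: str):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over the input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    groups = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in groups.items())


if __name__ == "__main__":
    counts = word_count(["Hadoop stores data", "Hadoop processes data"])
    print(counts)
```

Because each line is mapped independently and each key is reduced independently, both phases can be spread over many machines, which is what makes the model suit large clusters.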

Hadoop – Related Projects
• Pig – for analysing large data sets
• Hive – data warehouse system for Hadoop
• Mahout – machine learning and data mining
• Avro – a data serialization system
• ZooKeeper – helps build distributed applications
• Chukwa – data collection and analysis

Hadoop – Related Projects
• Hue – Hadoop user interface
• Oozie – workflow scheduler
• Hama – bulk synchronous parallel framework for massive scientific computations
• Nutch – web crawler
• HBase – non-relational database

Hadoop – Large Users
• Yahoo – 10,000-core Linux cluster
• Facebook – 100 petabytes, growing at 0.5 petabytes a day
• Amazon – it's possible to run Hadoop on Amazon's EC2 and S3

Contact Us
• Feel free to contact us at
  – www.semtech-solutions.co.nz
  – [email protected]
• We offer IT project consultancy
• We are happy to hear about your problems
• You pay only for the hours you need to solve your problems

Mike Frampton
