
Hadoop 2.x HDFS Cluster Installation (VirtualBox)

This is a straightforward tutorial for those who are going to use HDFS in an academic environment on their notebooks or PCs.

Amir Sedighi

December 02, 2014

Transcript

  1. Distributed Data Processing Workshop. Shahid Beheshti University, Faculty of Computer Science and Engineering. Course: Distributed Databases. Instructor: Dr. Hadi Tabatabaei. Presented by: Abolfazl Sedighi. Azar 1393 (December 2014).
  2. Topics
     • Assumptions
     • First Node
       – Installing Java
       – Downloading and Extracting Hadoop
       – Hadoop and Java Env Variables
       – Disabling IPv6
       – Configuring Hadoop
     • Cloning
     • HDFS
       – Starting HDFS
     • HDFS Health
     • FS Commands
     • Reclaiming Space
     • Reducing Replication Factor
  3. Hadoop and Java Env Variables
     • Append the following definitions to /etc/profile or ~/.bashrc:
       export HADOOP_PREFIX="/home/amir/hadoop-2.2.0"
       export HADOOP_HOME=$HADOOP_PREFIX
       export HADOOP_COMMON_HOME=$HADOOP_PREFIX
       export HADOOP_CONF_DIR=$HADOOP_PREFIX/etc/hadoop
       export HADOOP_HDFS_HOME=$HADOOP_PREFIX
       export HADOOP_MAPRED_HOME=$HADOOP_PREFIX
       export HADOOP_YARN_HOME=$HADOOP_PREFIX
       export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
       export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
       export JAVA_HOME=/usr/java/jdk1.7.0_55
       export PATH=$PATH:$JAVA_HOME/bin:/home/amir/hadoop-2.2.0/bin:/home/amir/hadoop-2.2.0/sbin
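A quick way to confirm the definitions resolve is to source them and echo a derived variable. The sketch below writes a minimal subset to a temporary file purely for illustration; in practice you append the full list to /etc/profile or ~/.bashrc as shown above, and the /home/amir/hadoop-2.2.0 path is the deck's example layout.

```shell
# Write a minimal subset of the variables to a temp file (illustration only;
# the real target is /etc/profile or ~/.bashrc).
cat > /tmp/hadoop-env.sh <<'EOF'
export HADOOP_PREFIX="/home/amir/hadoop-2.2.0"
export HADOOP_HOME=$HADOOP_PREFIX
export HADOOP_CONF_DIR=$HADOOP_PREFIX/etc/hadoop
EOF

# Source it and check that the derived variable expands as expected.
. /tmp/hadoop-env.sh
echo "$HADOOP_CONF_DIR"   # /home/amir/hadoop-2.2.0/etc/hadoop
```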
  4. Disabling IPv6
     • $ sudo nano /etc/sysctl.conf
       # Disable IPv6
       net.ipv6.conf.all.disable_ipv6 = 1
       net.ipv6.conf.default.disable_ipv6 = 1
       net.ipv6.conf.lo.disable_ipv6 = 1
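Instead of editing with nano, the same three lines can be appended non-interactively. The sketch below writes to a temporary copy so it can be tried safely; the real target is /etc/sysctl.conf (root required), followed by `sudo sysctl -p` to apply without rebooting.

```shell
# Write the IPv6-disable settings to a temp copy for illustration;
# against the real /etc/sysctl.conf you would append with >> and sudo.
conf=/tmp/sysctl.conf.example
cat > "$conf" <<'EOF'
# Disable IPv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
EOF

# Count the settings we just wrote.
grep -c 'disable_ipv6 = 1' "$conf"   # 3

# Then apply for real with: sudo sysctl -p
```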
  5. Hadoop Configuration
     • You will need to create or modify the following files inside hadoop/etc/hadoop:
       – slaves
       – core-site.xml
       – yarn-site.xml
       – hdfs-site.xml
       – hadoop-env.sh
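Of these, the slaves file is the simplest: a plain list of worker hostnames, one per line. A minimal sketch, assuming hostnames u01 through u03 for the three-node cluster this deck builds by cloning (written to /tmp for illustration; the real location is $HADOOP_CONF_DIR/slaves):

```shell
# Example slaves file: one worker hostname per line.
# u02 and u03 are assumed clone hostnames, not taken from the deck.
cat > /tmp/slaves <<'EOF'
u01
u02
u03
EOF

wc -l < /tmp/slaves   # 3
```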
  6. core-site.xml
     • Edit core-site.xml and apply the following:
       <configuration>
         <property>
           <name>fs.defaultFS</name>
           <value>hdfs://u01/</value>
           <description>NameNode URI</description>
         </property>
       </configuration>
  7. yarn-site.xml
       <configuration>
         <property>
           <name>yarn.resourcemanager.hostname</name>
           <value>u01</value>
           <description>The hostname of the RM.</description>
         </property>
         <property>
           <name>mapreduce.framework.name</name>
           <value>yarn</value>
         </property>
         <property>
           <name>yarn.nodemanager.aux-services</name>
           <value>mapreduce_shuffle</value>
         </property>
         <property>
           <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
           <value>org.apache.hadoop.mapred.ShuffleHandler</value>
         </property>
       </configuration>
     • Note: mapreduce.framework.name is conventionally set in mapred-site.xml rather than yarn-site.xml.
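Slide 5 lists hdfs-site.xml among the files to edit, but the deck never shows its contents. A minimal sketch of what it could contain; the replication factor of 2 and the storage paths under /home/amir are assumptions for illustration, not values from the deck:

```xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
    <description>Default block replication (assumed value).</description>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///home/amir/hdfs/namenode</value>
    <description>NameNode metadata directory (assumed path).</description>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///home/amir/hdfs/datanode</value>
    <description>DataNode block storage directory (assumed path).</description>
  </property>
</configuration>
```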
  8. Cloning
     • Extend the cluster by cloning.
       – NOTE: Find the instructions here:
         • http://www.slideshare.net/AmirSedighi/distrinuted-data-processing-workshop-sbu
  9. HDFS
     • The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware.
     • It has many similarities with existing distributed file systems. However, the differences from other distributed file systems are significant.
     • HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware.
     • HDFS provides high-throughput access to application data and is suitable for applications that have large data sets.
     • HDFS relaxes a few POSIX requirements to enable streaming access to file system data.
     • HDFS was originally built as infrastructure for the Apache Nutch web search engine project. HDFS is part of the Apache Hadoop Core project.
  10. HDFS Health
      • $ jps
        – NameNode
        – DataNode
      • Check log files
      • Web UI
        – http://u01:50070
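Beyond jps and the web UI, two standard HDFS commands give a quick health summary. These assume the running cluster built in this deck and the Hadoop binaries on PATH, so they are not runnable standalone:

```shell
# Requires a running HDFS cluster.
hdfs dfsadmin -report   # live/dead DataNodes, capacity, remaining space
hdfs fsck /             # file system integrity: blocks and replication status
```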
  11. (image-only slide)

  12. Hadoop FS Commands
      • cat • chmod • chown • copyFromLocal • copyToLocal
      • cp • du • expunge • get • ls
      • mkdir • put • rm • tail
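A few of the commands above in a typical session. The paths and file names are illustrative, and the commands require a running HDFS cluster:

```shell
# Illustrative paths; requires a running HDFS.
hadoop fs -mkdir -p /user/amir/input            # create a directory tree
hadoop fs -put data.txt /user/amir/input/       # upload a local file
hadoop fs -ls /user/amir/input                  # list directory contents
hadoop fs -cat /user/amir/input/data.txt        # print a file to stdout
hadoop fs -copyToLocal /user/amir/input/data.txt ./data-copy.txt
hadoop fs -rm /user/amir/input/data.txt         # move the file to trash
```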
  13. Space Reclamation
      • Delete Files
        – $ hadoop fs -rm /filename
        – $ hadoop fs -expunge
      • Decrease Replication Factor
  14. How to Change the Replication Factor of Existing Files in HDFS
      • To set the replication of an individual file to 4:
        – $ hdfs dfs -setrep -w 4 /path/to/file
      • You can also do this recursively. To change the replication of the entire HDFS to 1:
        – $ hdfs dfs -setrep -R -w 1 /
      • Note: in Hadoop 2.x, "hadoop dfs" is deprecated in favor of "hdfs dfs".