Hadoop 2 (YARN): How to Set Up a Single Node on Ubuntu (Tutorial)

**Still a draft, but public :)**

Tutorial Requirements:

  • Hadoop 2.7.1
  • Java 8
  • Ubuntu (ubuntu-14.04.3-desktop-amd64.iso)
  • VirtualBox

As a convention:

  • We will place our development tools and products under the /opt/dev directory.
  • Each product gets its own directory, with one subdirectory per version.

1. Set JAVA_HOME

 

++ Add content
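A minimal sketch of this step. It assumes the JDK was extracted to the hypothetical directory /opt/dev/java/jdk1.8.0_60, following the /opt/dev/<product>/<version> convention above. It prints the lines to append to ~/.bashrc.

```python
from pathlib import PurePosixPath

# Root of the /opt/dev/<product>/<version> convention used in this tutorial.
DEV_ROOT = PurePosixPath("/opt/dev")


def product_home(product: str, version: str) -> PurePosixPath:
    """Return /opt/dev/<product>/<version>."""
    return DEV_ROOT / product / version


def java_env_lines(jdk_version: str) -> list[str]:
    """Lines for ~/.bashrc that set JAVA_HOME and put java on the PATH."""
    java_home = product_home("java", jdk_version)
    return [
        f"export JAVA_HOME={java_home}",
        "export PATH=$JAVA_HOME/bin:$PATH",
    ]


if __name__ == "__main__":
    # HYPOTHETICAL version directory: use the name of the JDK you unpacked.
    print("\n".join(java_env_lines("jdk1.8.0_60")))
```

After appending the lines, run `source ~/.bashrc` and check the result with `java -version`.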

 

 

2. Set HADOOP_HOME

 

++ Add content
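A minimal sketch, assuming Hadoop 2.7.1 was extracted to the hypothetical path /opt/dev/hadoop/hadoop-2.7.1. It prints the ~/.bashrc lines that export HADOOP_HOME and HADOOP_CONF_DIR and add Hadoop's bin/ and sbin/ directories to the PATH. Every account that runs Hadoop commands needs the same variables in its own environment.

```python
from pathlib import PurePosixPath

# HYPOTHETICAL install location following /opt/dev/<product>/<version>.
HADOOP_HOME = PurePosixPath("/opt/dev/hadoop/hadoop-2.7.1")


def hadoop_env_lines(hadoop_home: PurePosixPath = HADOOP_HOME) -> list[str]:
    """Lines to append to ~/.bashrc for Hadoop.

    bin/ holds the hadoop, hdfs and yarn clients; sbin/ holds the daemon
    scripts such as hadoop-daemon.sh and yarn-daemon.sh.
    """
    return [
        f"export HADOOP_HOME={hadoop_home}",
        "export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop",
        "export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH",
    ]


if __name__ == "__main__":
    print("\n".join(hadoop_env_lines()))
```

After reloading the shell, `hadoop version` should report 2.7.1.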

3. Add Hadoop users

Choose a password for each of them.
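A minimal sketch that prints the commands for this step, for you to run yourself. It assumes one shared group named "hadoop". The hdfs and yarn accounts are the ones used later in this tutorial; mapred is an optional assumption.

```python
import shlex

# ASSUMPTION: a "hadoop" group shared by the service accounts.
GROUP = "hadoop"
USERS = ("hdfs", "yarn", "mapred")


def user_setup_commands(group: str = GROUP, users: tuple[str, ...] = USERS) -> list[str]:
    """Shell commands that create the group and one account per daemon user."""
    q = shlex.quote
    cmds = [f"sudo groupadd {q(group)}"]
    for user in users:
        # -m creates a home directory, -g sets the primary group
        cmds.append(f"sudo useradd -m -g {q(group)} {q(user)}")
        # passwd prompts interactively for the new account's password
        cmds.append(f"sudo passwd {q(user)}")
    return cmds


if __name__ == "__main__":
    print("\n".join(user_setup_commands()))
```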

 

4. Data and Log directories

nn: NameNode metadata
snn: SecondaryNameNode checkpoints
dn: DataNode blocks
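A minimal sketch that prints the commands creating the three HDFS storage directories and giving them to the hdfs user. The /var/data/hadoop/hdfs/nn path comes from step 10. The snn and dn paths are assumed to follow the same pattern. Log directories are left at Hadoop's default ($HADOOP_HOME/logs) here.

```python
from pathlib import PurePosixPath

# Parent directory taken from step 10 (dfs.namenode.name.dir = .../hdfs/nn).
HDFS_ROOT = PurePosixPath("/var/data/hadoop/hdfs")
# nn = NameNode metadata, snn = SecondaryNameNode checkpoints, dn = DataNode blocks
SUBDIRS = ("nn", "snn", "dn")


def data_dir_commands(owner: str = "hdfs", group: str = "hadoop") -> list[str]:
    """mkdir each storage directory, then hand the whole tree to owner:group."""
    cmds = [f"sudo mkdir -p {HDFS_ROOT / name}" for name in SUBDIRS]
    cmds.append(f"sudo chown -R {owner}:{group} {HDFS_ROOT}")
    return cmds


if __name__ == "__main__":
    print("\n".join(data_dir_commands()))
```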

 

5. Make the yarn user the owner of the Hadoop installation directory (to be confirmed: is this strictly required?)
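One possible reason for this layout is logging. By default, both the HDFS and the YARN daemon scripts write their logs under $HADOOP_HOME/logs, so that directory must be writable by hdfs as well as yarn. The sketch below prints the commands for one such setup. It assumes the hadoop group from step 3 and the hypothetical path /opt/dev/hadoop/hadoop-2.7.1. It is one option, not a confirmed requirement.

```python
# HYPOTHETICAL install path; match it to your HADOOP_HOME.
HADOOP_HOME = "/opt/dev/hadoop/hadoop-2.7.1"


def ownership_commands(hadoop_home: str = HADOOP_HOME) -> list[str]:
    logs = f"{hadoop_home}/logs"
    return [
        # yarn owns the installation; existing permission bits are unchanged
        f"sudo chown -R yarn:hadoop {hadoop_home}",
        # default log directory for both the HDFS and the YARN daemons
        f"sudo mkdir -p {logs}",
        f"sudo chown yarn:hadoop {logs}",
        # group write lets hdfs (also in the hadoop group) create its log files
        f"sudo chmod g+w {logs}",
    ]


if __name__ == "__main__":
    print("\n".join(ownership_commands()))
```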

 

6. Configure core-site.xml

 

++ Add content
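A minimal sketch that renders core-site.xml in Hadoop's `<configuration>/<property>` format. It assumes the NameNode listens on hdfs://localhost:9000, a common single-node choice rather than a value taken from this tutorial. Redirect the output to $HADOOP_HOME/etc/hadoop/core-site.xml.

```python
import xml.etree.ElementTree as ET


def hadoop_config_xml(properties: dict[str, str]) -> str:
    """Render a Hadoop *-site.xml: <configuration> with one <property> per entry."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    ET.indent(root)  # pretty-print (Python 3.9+)
    return '<?xml version="1.0"?>\n' + ET.tostring(root, encoding="unicode") + "\n"


# ASSUMPTION: NameNode RPC address for a single-node setup.
CORE_SITE = {
    "fs.defaultFS": "hdfs://localhost:9000",
}

if __name__ == "__main__":
    print(hadoop_config_xml(CORE_SITE), end="")
```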

 

7. Configure hdfs-site.xml

NameNode: metadata server
DataNode: where the actual data blocks are stored
SecondaryNameNode: periodically merges the NameNode's edit log into a checkpoint of its metadata (it is not a standby NameNode)

 

++ Add content
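A minimal sketch of hdfs-site.xml for a single node. It assumes the nn/snn/dn directories under /var/data/hadoop/hdfs from step 4 and a replication factor of 1, since there is only one DataNode. The directories are written as file:// URIs. Plain paths such as the one in step 10 are also accepted, though Hadoop 2 logs a warning for them.

```python
import xml.etree.ElementTree as ET

HDFS_ROOT = "/var/data/hadoop/hdfs"

HDFS_SITE = {
    # Only one DataNode, so keep a single replica of each block.
    "dfs.replication": "1",
    "dfs.namenode.name.dir": f"file://{HDFS_ROOT}/nn",
    "dfs.namenode.checkpoint.dir": f"file://{HDFS_ROOT}/snn",
    "dfs.datanode.data.dir": f"file://{HDFS_ROOT}/dn",
}


def to_site_xml(props: dict[str, str]) -> str:
    """Render the properties as a Hadoop <configuration> document."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    ET.indent(root)
    return ET.tostring(root, encoding="unicode") + "\n"


if __name__ == "__main__":
    print(to_site_xml(HDFS_SITE), end="")
```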

 

8. Configure mapred-site.xml

Initially the file mapred-site.xml doesn't exist, but it can be copied from mapred-site.xml.template (both are in $HADOOP_HOME/etc/hadoop).

 

++ Add content
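After `cp mapred-site.xml.template mapred-site.xml`, the one property a single-node YARN setup typically needs is `mapreduce.framework.name=yarn`. This makes MapReduce jobs run on YARN instead of the local job runner. A minimal sketch that sets or overwrites a property in an existing configuration file's text. Note that ElementTree drops the template's XML comments when it rewrites the file.

```python
import xml.etree.ElementTree as ET


def set_property(config_xml: str, name: str, value: str) -> str:
    """Return config_xml with property `name` set to `value`, adding it if missing."""
    root = ET.fromstring(config_xml)
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            value_el.text = value
            break
    else:  # no existing <property> with that name
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    ET.indent(root)
    return ET.tostring(root, encoding="unicode") + "\n"


if __name__ == "__main__":
    # ASSUMPTION: the copied template contains an empty <configuration> element.
    template = "<configuration>\n</configuration>\n"
    print(set_property(template, "mapreduce.framework.name", "yarn"), end="")
```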

9. Configure yarn-site.xml

 

++ Add content
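A minimal sketch of yarn-site.xml for running MapReduce on YARN. It registers the shuffle as a NodeManager auxiliary service backed by `org.apache.hadoop.mapred.ShuffleHandler`. It also checks the service name: Hadoop 2.x only accepts aux-service names made of letters, digits and underscores that do not start with a digit. An older spelling such as `mapreduce.shuffle` therefore stops the NodeManager at startup.

```python
import re
import xml.etree.ElementTree as ET

YARN_SITE = {
    # NodeManager auxiliary service that serves map outputs to reducers.
    "yarn.nodemanager.aux-services": "mapreduce_shuffle",
    "yarn.nodemanager.aux-services.mapreduce_shuffle.class":
        "org.apache.hadoop.mapred.ShuffleHandler",
}

# Letters, digits and underscores only, not starting with a digit.
_VALID_SERVICE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def invalid_aux_services(props: dict[str, str]) -> list[str]:
    """Return the aux-service names the NodeManager would refuse at startup."""
    names = (n.strip() for n in props.get("yarn.nodemanager.aux-services", "").split(","))
    return [n for n in names if n and not _VALID_SERVICE.fullmatch(n)]


def to_site_xml(props: dict[str, str]) -> str:
    """Render the properties as a Hadoop <configuration> document."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    ET.indent(root)
    return ET.tostring(root, encoding="unicode") + "\n"


if __name__ == "__main__":
    bad = invalid_aux_services(YARN_SITE)
    print(f"invalid aux-service names: {bad}" if bad else to_site_xml(YARN_SITE), end="")
```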

 

10. Format HDFS

The hdfs user, which owns the NameNode directory /var/data/hadoop/hdfs/nn, must format this directory to set up a new file system.
This is the directory you set as the value of “dfs.namenode.name.dir” in $HADOOP_HOME/etc/hadoop/hdfs-site.xml.
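A minimal sketch that reads `dfs.namenode.name.dir` back from hdfs-site.xml, so you can confirm which directory will be formatted, and then prints the format command to run as the hdfs user. The install path is hypothetical. Formatting erases any existing HDFS metadata in that directory.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Optional

# HYPOTHETICAL install path; match it to your HADOOP_HOME.
HADOOP_HOME = "/opt/dev/hadoop/hadoop-2.7.1"


def get_property(site_xml: str, name: str) -> Optional[str]:
    """Return the <value> of property `name` in a Hadoop *-site.xml, or None."""
    for prop in ET.fromstring(site_xml).findall("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value")
    return None


def format_command(hadoop_home: str = HADOOP_HOME) -> list[str]:
    """`hdfs namenode -format`, run as the hdfs user that owns the nn directory."""
    return ["sudo", "-u", "hdfs", f"{hadoop_home}/bin/hdfs", "namenode", "-format"]


if __name__ == "__main__":
    site = Path(HADOOP_HOME, "etc/hadoop/hdfs-site.xml")
    if site.exists():
        print("dfs.namenode.name.dir =", get_property(site.read_text(), "dfs.namenode.name.dir"))
    print(" ".join(format_command()))
```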

 

Check that formatting succeeded by looking for this log message:

11. Start HDFS services
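A minimal sketch that builds one `hadoop-daemon.sh start` call per HDFS daemon, each run as the hdfs user. The install path is hypothetical. Because sudo resets most environment variables, this assumes JAVA_HOME is also set explicitly in $HADOOP_HOME/etc/hadoop/hadoop-env.sh.

```python
# HYPOTHETICAL install path; match it to your HADOOP_HOME.
HADOOP_HOME = "/opt/dev/hadoop/hadoop-2.7.1"
HDFS_DAEMONS = ("namenode", "secondarynamenode", "datanode")


def hdfs_start_commands(hadoop_home: str = HADOOP_HOME) -> list[list[str]]:
    """One hadoop-daemon.sh invocation per HDFS daemon, as the hdfs user."""
    script = f"{hadoop_home}/sbin/hadoop-daemon.sh"
    return [["sudo", "-u", "hdfs", script, "start", daemon] for daemon in HDFS_DAEMONS]


if __name__ == "__main__":
    for cmd in hdfs_start_commands():
        # Print only; pass cmd to subprocess.run(cmd, check=True) to execute it.
        print(" ".join(cmd))
```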

Check that each service is running by confirming that it has a PID.
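A minimal sketch that parses `jps` output (one `<pid> <main class>` pair per line) and reports which expected daemons have no PID. Run `jps` with sudo, because without root it only lists the calling user's JVMs.

```python
EXPECTED_HDFS = ("NameNode", "SecondaryNameNode", "DataNode")


def parse_jps(output: str) -> dict[str, int]:
    """Map JVM main-class name -> PID from `jps` output."""
    pids = {}
    for line in output.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0].isdigit():
            pids[parts[1]] = int(parts[0])
    return pids


def missing_daemons(output: str, expected=EXPECTED_HDFS) -> list[str]:
    """Expected daemons that do not appear in the jps output."""
    running = parse_jps(output)
    return [name for name in expected if name not in running]
```

For example, save the block as a module and pass it the text of `sudo jps`. An empty list from `missing_daemons` means every HDFS daemon has a PID. For YARN, pass `("ResourceManager", "NodeManager")` as `expected`.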

12. Start YARN services
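A minimal sketch that builds the `yarn-daemon.sh start` calls for the ResourceManager and the NodeManager, run as the yarn user. The install path is hypothetical. As with HDFS, it assumes JAVA_HOME is visible to the daemons, for example set in hadoop-env.sh or yarn-env.sh. In `jps` the two daemons appear as ResourceManager and NodeManager.

```python
# HYPOTHETICAL install path; match it to your HADOOP_HOME.
HADOOP_HOME = "/opt/dev/hadoop/hadoop-2.7.1"
YARN_DAEMONS = ("resourcemanager", "nodemanager")


def yarn_start_commands(hadoop_home: str = HADOOP_HOME) -> list[list[str]]:
    """One yarn-daemon.sh invocation per YARN daemon, as the yarn user."""
    script = f"{hadoop_home}/sbin/yarn-daemon.sh"
    return [["sudo", "-u", "yarn", script, "start", daemon] for daemon in YARN_DAEMONS]


if __name__ == "__main__":
    for cmd in yarn_start_commands():
        print(" ".join(cmd))
```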

Remark: it is almost mandatory to check that each service you started has a PID.
For example, I got no error message when starting the NodeManager.
However, I could not find a PID for the NodeManager.
So I checked its log using:
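A minimal sketch for locating a daemon's log and reading its last lines, where startup exceptions usually appear. It assumes the default log directory of a tarball install ($HADOOP_HOME/logs, hypothetical path below). It also assumes the default file naming: yarn-daemon.sh writes `yarn-<user>-<daemon>-<host>.log`, and hadoop-daemon.sh uses the `hadoop` prefix instead.

```python
import socket
from collections import deque
from pathlib import Path
from typing import Optional

# HYPOTHETICAL: default log directory of a tarball install.
LOG_DIR = "/opt/dev/hadoop/hadoop-2.7.1/logs"


def daemon_log_path(daemon: str, user: str, log_dir: str = LOG_DIR,
                    hostname: Optional[str] = None, prefix: str = "yarn") -> Path:
    """Path of <prefix>-<user>-<daemon>-<host>.log inside log_dir."""
    host = hostname or socket.gethostname()
    return Path(log_dir) / f"{prefix}-{user}-{daemon}-{host}.log"


def tail(path: Path, n: int = 50) -> list[str]:
    """Last n lines of a text file (deque keeps only the most recent lines)."""
    with path.open(errors="replace") as f:
        return list(deque(f, maxlen=n))


if __name__ == "__main__":
    log = daemon_log_path("nodemanager", "yarn")
    if log.exists():
        print("".join(tail(log)), end="")
    else:
        print("no log at", log)
```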

 

And I found this error:

I fixed it by correcting yarn-site.xml:

> From:

> To:

13. HDFS dashboard (web UI)

To check the logs:
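A minimal sketch that builds the NameNode web UI addresses and checks whether a URL answers. It assumes the Hadoop 2.x default HTTP port 50070 (Hadoop 3 moved it to 9870). The dashboard lives at /dfshealth.html and the daemon log files are browsable under /logs/.

```python
import urllib.request

# Hadoop 2.x default NameNode web UI port.
NAMENODE_HTTP_PORT = 50070


def namenode_urls(host: str = "localhost", port: int = NAMENODE_HTTP_PORT) -> dict[str, str]:
    base = f"http://{host}:{port}"
    return {
        "dashboard": f"{base}/dfshealth.html",
        "logs": f"{base}/logs/",
    }


def is_reachable(url: str, timeout: float = 3.0) -> bool:
    """True if the URL answers with an HTTP 2xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:  # URLError, HTTPError and socket timeouts are all OSErrors
        return False


if __name__ == "__main__":
    for label, url in namenode_urls().items():
        print(f"{label:9} {url} reachable={is_reachable(url)}")
```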

 

14. YARN (ResourceManager) dashboard (web UI)
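Besides the web UI on the default port 8088, the ResourceManager exposes a REST API. A minimal sketch, assuming the default http://localhost:8088 address: it queries `/ws/v1/cluster/nodes` and lists the NodeManagers in RUNNING state. This shows quickly whether the NodeManager actually registered. The `{"nodes": null}` shape for an empty cluster is handled defensively.

```python
import json
import urllib.request

# Default ResourceManager web UI / REST address in Hadoop 2.x.
RM_BASE = "http://localhost:8088"


def fetch_json(path: str, base: str = RM_BASE, timeout: float = 5.0) -> dict:
    """GET a ResourceManager REST resource and decode the JSON body."""
    req = urllib.request.Request(base + path, headers={"Accept": "application/json"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def running_node_managers(nodes_json: dict) -> list[str]:
    """Ids of RUNNING NodeManagers in a /ws/v1/cluster/nodes response."""
    nodes = (nodes_json.get("nodes") or {}).get("node", [])
    return [n["id"] for n in nodes if n.get("state") == "RUNNING"]
```

Once the daemons are up, `running_node_managers(fetch_json("/ws/v1/cluster/nodes"))` should return one id on a single-node setup.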

 

15. Test MapReduce
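A minimal sketch that builds the command submitting the bundled pi example to YARN, plus a helper that extracts the estimate from the job output. It assumes the hypothetical install path, the examples jar shipped with 2.7.1 under share/hadoop/mapreduce, and submission as the hdfs user (the HDFS superuser, so it can create its /user directory). It also assumes the final output line reads "Estimated value of Pi is …".

```python
import re
from typing import Optional

# HYPOTHETICAL install path; match it to your HADOOP_HOME.
HADOOP_HOME = "/opt/dev/hadoop/hadoop-2.7.1"
EXAMPLES_JAR = f"{HADOOP_HOME}/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar"


def pi_job_command(maps: int = 16, samples: int = 1000, user: str = "hdfs") -> list[str]:
    """`yarn jar <examples> pi <maps> <samples per map>`, run as `user`."""
    return ["sudo", "-u", user, f"{HADOOP_HOME}/bin/yarn", "jar", EXAMPLES_JAR,
            "pi", str(maps), str(samples)]


_PI_LINE = re.compile(r"Estimated value of Pi is (\d+\.\d+)")


def estimated_pi(job_output: str) -> Optional[float]:
    """The estimate printed at the end of the pi job, or None if absent."""
    match = _PI_LINE.search(job_output)
    return float(match.group(1)) if match else None


if __name__ == "__main__":
    print(" ".join(pi_job_command()))
```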

 

Result: