Thursday, February 17, 2011

How to: install and configure Hadoop to run in fully distributed mode, CentOS

To run Hadoop in fully distributed mode, we need at least two VMs. I will configure a Hadoop cluster with two DataNodes, one NameNode, and a Secondary NameNode.

Machine Name             Role
Home (192.168.209.130)   Primary NameNode + Secondary NameNode, Slave (DataNode, TaskTracker), JobTracker
LA (192.168.209.132)     Slave (DataNode, TaskTracker)

There are a few prerequisites for running Hadoop:

  1. Create a hadoop user and group on each server.
  2. On the Home node, generate an SSH key pair and copy the public key to both servers, so the hadoop user on Home can log in to each server without a password (see the sketch after this list).
  3. Install Java 1.6.
  4. Make the configuration changes (hosts file, files in the hadoop conf directory).
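
For steps 2 and 4, a minimal sketch looks like the following; the IPs come from the table above, and ssh-copy-id is just one way to push the key, so adjust to your environment.

# /etc/hosts on BOTH servers, so Home and LA can resolve each other
192.168.209.130   Home
192.168.209.132   LA

# on Home, as the hadoop user: generate a key pair and push the public key
ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
ssh-copy-id hadoop@Home
ssh-copy-id hadoop@LA
# verify passwordless login works
ssh hadoop@LA hostname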


For the configuration, there are basically three XML files, located in the hadoop/conf directory.

hdfs-site.xml holds the essential HDFS configuration. Since we only have two data nodes, change the replication factor to 2, and set the path of the NameNode metadata directory (dfs.name.dir); keep it inside the hadoop directory, or anywhere else where the hadoop user has write access.

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
     <name>dfs.permissions</name>
     <value>false</value>
  </property>
  <property>
     <!-- specify this so that running 'hadoop namenode -format' formats the right dir -->
     <name>dfs.name.dir</name>
    <value>/usr/lib/hadoop/cache/hadoop/dfs/name</value>
  </property>
</configuration>


core-site.xml assigns the default file system, i.e. the NameNode address; I use the Home server here.

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://Home:8020</value>
  </property>

  <property>
     <name>hadoop.tmp.dir</name>
     <value>/usr/lib/hadoop/cache/${user.name}</value>
  </property>
</configuration>


mapred-site.xml assigns the JobTracker address:


<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>home:8021</value>
  </property>
</configuration>

Then how do we tell the cluster who its members are? The magic is in the masters and slaves files under the conf directory.

masters
contains the secondary NameNode; I put Home here.

slaves
lists the slave (data) nodes; both servers, Home and LA.
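
With the host names above, the two files each contain one host name per line:

# conf/masters  (where the Secondary NameNode runs)
Home

# conf/slaves   (the data nodes)
Home
LA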


After that, copy the hadoop directory, preserving permissions, to the LA server.
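
A simple way to do that is scp with a recursive copy and preserved permissions; /usr/lib/hadoop is assumed from the JVM arguments shown later, so use your actual install path and make sure the hadoop user can write to the target directory on LA.

scp -rp /usr/lib/hadoop hadoop@LA:/usr/lib/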

Then on the name node (Home in this setup), run "bin/hadoop namenode -format".
Once it finishes, make sure there were no errors.
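
For reference, assuming hadoop is installed under /usr/lib/hadoop, the format step is roughly:

cd /usr/lib/hadoop
bin/hadoop namenode -format
ls cache/hadoop/dfs/name/current   # the dfs.name.dir from hdfs-site.xml should now be populated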

Time to start up the cluster.
Just run "bin/start-all.sh"; the script reads the masters and slaves files to determine each server's role and SSHes to each server to start the corresponding JVM.
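
After the script finishes, a quick way to verify which daemons came up is jps on each server:

bin/start-all.sh
jps           # on Home, expect: NameNode, SecondaryNameNode, DataNode, JobTracker, TaskTracker
ssh LA jps    # on LA, expect: DataNode, TaskTracker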


Basically, as I mentioned in the beginning, the Home server runs five roles, each as its own JVM process.
Here are the five JVM command lines:

NameNode

java  -Dhadoop.log.dir=/usr/lib/hadoop/bin/../logs -Dhadoop.log.file=hadoop-hadoop-namenode-home.log          -Dhadoop.home.dir=/usr/lib/hadoop/bin/.. -Dhadoop.id.str=hadoop -Dhadoop.root.logger=INFO,DRFA -Djava.library.path=/usr/lib/hadoop/bin/../lib/native/Linux-amd64-64 -Dhadoop.policy.file=hadoop-policy.xml -classpath org.apache.hadoop.hdfs.server.namenode.NameNode

DataNode

java  -Dhadoop.log.dir=/usr/lib/hadoop/bin/../logs -Dhadoop.log.file=hadoop-hadoop-datanode-home.log          -Dhadoop.home.dir=/usr/lib/hadoop/bin/.. -Dhadoop.id.str=hadoop -Dhadoop.root.logger=INFO,DRFA -Djava.library.path=/usr/lib/hadoop/bin/../lib/native/Linux-amd64-64 -Dhadoop.policy.file=hadoop-policy.xml -classpath  org.apache.hadoop.hdfs.server.datanode.DataNode

Secondary NameNode

java  -Dhadoop.log.dir=/usr/lib/hadoop/bin/../logs -Dhadoop.log.file=hadoop-hadoop-secondarynamenode-home.log -Dhadoop.home.dir=/usr/lib/hadoop/bin/.. -Dhadoop.id.str=hadoop -Dhadoop.root.logger=INFO,DRFA -Djava.library.path=/usr/lib/hadoop/bin/../lib/native/Linux-amd64-64 -Dhadoop.policy.file=hadoop-policy.xml -classpath org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode

JobTracker

java  -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote -Dhadoop.log.dir=/usr/lib/hadoop/bin/../logs -Dhadoop.log.file=hadoop-hadoop-jobtracker-home.log -Dhadoop.home.dir=/usr/lib/hadoop/bin/.. -Dhadoop.id.str=hadoop -Dhadoop.root.logger=INFO,DRFA -Djava.library.path=/usr/lib/hadoop/bin/../lib/native/Linux-amd64-64 -Dhadoop.policy.file=hadoop-policy.xml -classpath org.apache.hadoop.mapred.JobTracker

TaskTracker

java -Xmx1000m -Dhadoop.log.dir=/usr/lib/hadoop/bin/../logs -Dhadoop.log.file=hadoop-hadoop-tasktracker-home.log -Dhadoop.home.dir=/usr/lib/hadoop/bin/.. -Dhadoop.id.str=hadoop -Dhadoop.root.logger=INFO,DRFA -Djava.library.path=/usr/lib/hadoop/bin/../lib/native/Linux-amd64-64 -Dhadoop.policy.file=hadoop-policy.xml -classpath org.apache.hadoop.mapred.TaskTracker

Once everything is up, you can access http://home:50070 to see that the cluster is running with two live nodes.
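
If you prefer the command line, the dfsadmin report should show the same thing:

bin/hadoop dfsadmin -report   # should list two live datanodes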

More Hadoop Blogs

  1. How to: create a hadoop map/reduce job tutorial
  2. How to: install and config hadoop to run in fully distributed Mode, Centos
  3. How to : setup SSH authentication over keys, hadoop installation
