In order to run Hadoop in fully distributed mode, we need at least two VMs. I will configure a Hadoop cluster with two DataNodes, one NameNode, and a Secondary NameNode.
| Machine Name | Role |
| --- | --- |
| Home (192.168.209.130) | Primary NameNode + Secondary NameNode, Slave (DataNode), JobTracker node |
| LA (192.168.209.132) | Slave node (DataNode) |
There are some prerequisites to run Hadoop:
- Create a hadoop user and group on each server.
- On the Home node, generate an SSH key pair and export the public key to both servers, so the hadoop user on Home can log in to each server without a password (see the sketch after this list).
- Install Java 1.6.
- Configuration changes (the hosts file and the files under the hadoop conf directory).
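A minimal sketch of the SSH setup, assuming the default key location and that the hadoop user already exists on both machines:

```bash
# On Home, as the hadoop user: generate a key pair with no passphrase
ssh-keygen -t rsa -P ""

# Authorize the key on Home itself (start-all.sh also SSHes to the local node)
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

# Copy the public key to LA so the hadoop user can log in without a password
ssh-copy-id hadoop@LA

# Quick check: neither command should prompt for a password
ssh hadoop@Home hostname
ssh hadoop@LA hostname
```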
For the configuration, there are basically 3 XML files located in the hadoop/conf directory.
hdfs-site.xml is the essential configuration for HDFS. Since we only have two data nodes, change the replication factor to 2, and assign the path for the NameNode's storage directory; you'd better keep that directory inside the Hadoop directory, or somewhere else where you are sure Hadoop can update it.
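A minimal hdfs-site.xml along these lines might look like the following; the dfs.name.dir path is just an illustration, any directory the hadoop user can write to will do:

```xml
<?xml version="1.0"?>
<configuration>
  <property>
    <!-- only two data nodes, so keep two copies of each block -->
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <!-- where the NameNode keeps its metadata; assumed path, adjust to your install -->
    <name>dfs.name.dir</name>
    <value>/usr/lib/hadoop/name</value>
  </property>
</configuration>
```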
core-site.xml assigns the cluster's default file system name; I use the name server Home here.
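A minimal core-site.xml could look like this; the host is the Home name server, while port 9000 is an assumption:

```xml
<?xml version="1.0"?>
<configuration>
  <property>
    <!-- default file system: HDFS served by the NameNode on Home -->
    <name>fs.default.name</name>
    <value>hdfs://Home:9000</value>
  </property>
</configuration>
```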
mapred-site.xml configures MapReduce; it points the cluster at the JobTracker, which also runs on Home.
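A minimal mapred-site.xml pointing MapReduce at the JobTracker on Home (port 9001 is an assumption):

```xml
<?xml version="1.0"?>
<configuration>
  <property>
    <!-- the JobTracker also runs on Home -->
    <name>mapred.job.tracker</name>
    <value>Home:9001</value>
  </property>
</configuration>
```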
Then how do we tell the cluster who its members are? The magic is in the masters and slaves files under the conf directory. The masters file just lists the secondary name node host, so I put Home there; the slaves file lists both servers, Home and LA, for example:
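For this two-node cluster the two files would simply contain:

```text
# conf/masters — host that runs the Secondary NameNode
Home

# conf/slaves — hosts that run a DataNode (and TaskTracker)
Home
LA
```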
After that, copy the Hadoop directory, with permissions intact, to the LA server.
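One way to do the copy, assuming the hadoop user can write to the same path on LA (the /usr/lib/hadoop path matches the JVM arguments shown later):

```bash
# -a keeps permissions and ownership intact
rsync -a /usr/lib/hadoop/ hadoop@LA:/usr/lib/hadoop/
```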
Then on the name server (Home in this setup), format HDFS by running “bin/hadoop namenode -format”.
Once that completes, make sure there were no errors in the output.
Now it is time to start up the cluster.
How? Just run “bin/start-all.sh” on Home; the script reads the masters and slaves files for each server's role and SSHes to each server to start the corresponding JVM.
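A quick way to confirm the daemons actually came up on each node is jps, which ships with the JDK:

```bash
jps
# Expected on Home: NameNode, SecondaryNameNode, DataNode, JobTracker, TaskTracker
# Expected on LA:   DataNode, TaskTracker
```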
Basically, as I mentioned at the beginning, the Home server runs 5 roles, each as its own JVM process.
Here are the 5 JVM command lines:
| Role | JVM command line |
| --- | --- |
| NameNode | `java -Dhadoop.log.dir=/usr/lib/hadoop/bin/../logs -Dhadoop.log.file=hadoop-hadoop-namenode-home.log -Dhadoop.home.dir=/usr/lib/hadoop/bin/.. -Dhadoop.id.str=hadoop -Dhadoop.root.logger=INFO,DRFA -Djava.library.path=/usr/lib/hadoop/bin/../lib/native/Linux-amd64-64 -Dhadoop.policy.file=hadoop-policy.xml -classpath org.apache.hadoop.hdfs.server.namenode.NameNode` |
| DataNode | `java -Dhadoop.log.dir=/usr/lib/hadoop/bin/../logs -Dhadoop.log.file=hadoop-hadoop-datanode-home.log -Dhadoop.home.dir=/usr/lib/hadoop/bin/.. -Dhadoop.id.str=hadoop -Dhadoop.root.logger=INFO,DRFA -Djava.library.path=/usr/lib/hadoop/bin/../lib/native/Linux-amd64-64 -Dhadoop.policy.file=hadoop-policy.xml -classpath org.apache.hadoop.hdfs.server.datanode.DataNode` |
| Secondary NameNode | `java -Dhadoop.log.dir=/usr/lib/hadoop/bin/../logs -Dhadoop.log.file=hadoop-hadoop-secondarynamenode-home.log -Dhadoop.home.dir=/usr/lib/hadoop/bin/.. -Dhadoop.id.str=hadoop -Dhadoop.root.logger=INFO,DRFA -Djava.library.path=/usr/lib/hadoop/bin/../lib/native/Linux-amd64-64 -Dhadoop.policy.file=hadoop-policy.xml -classpath org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode` |
| JobTracker | `java -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote -Dhadoop.log.dir=/usr/lib/hadoop/bin/../logs -Dhadoop.log.file=hadoop-hadoop-jobtracker-home.log -Dhadoop.home.dir=/usr/lib/hadoop/bin/.. -Dhadoop.id.str=hadoop -Dhadoop.root.logger=INFO,DRFA -Djava.library.path=/usr/lib/hadoop/bin/../lib/native/Linux-amd64-64 -Dhadoop.policy.file=hadoop-policy.xml -classpath org.apache.hadoop.mapred.JobTracker` |
| TaskTracker | `java -Xmx1000m -Dhadoop.log.dir=/usr/lib/hadoop/bin/../logs -Dhadoop.log.file=hadoop-hadoop-tasktracker-home.log -Dhadoop.home.dir=/usr/lib/hadoop/bin/.. -Dhadoop.id.str=hadoop -Dhadoop.root.logger=INFO,DRFA -Djava.library.path=/usr/lib/hadoop/bin/../lib/native/Linux-amd64-64 -Dhadoop.policy.file=hadoop-policy.xml -classpath org.apache.hadoop.mapred.TaskTracker` |
Once everything is up, you can browse to http://home:50070 and see the cluster running with two live nodes.