Hadoop 1.2 Cluster Setup and Environment Configuration

 

1. Virtual machine environment

See my other post: http://www.cnblogs.com/xckk/p/6000881.html

A JDK is required. For installing the JDK on CentOS, see:

http://www.centoscn.com/image-text/install/2014/0827/3585.html

Note that all three machines must be configured, because at startup the namenode starts the other machines.
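start-dfs.sh starts the daemons on the other nodes over SSH, so passwordless SSH from the namenode to every node (including itself) is normally set up first. A minimal sketch, run on the namenode (hostnames follow the role table below; the root user is an assumption):

ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
ssh-copy-id root@centos1
ssh-copy-id root@centos2
ssh-copy-id root@centos    # the namenode also SSHes to itself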

 

2. Hadoop environment

hadoop-1.2.1

jdk-7u79-linux-i586.tar.gz

 

3. Setting up fully distributed Hadoop

Three machines, with roles assigned as follows:

hostname   roles
centos     namenode
centos1    datanode, secondaryNameNode
centos2    datanode
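All three hostnames must resolve on every machine, which is usually done with /etc/hosts entries. A sketch (these IPs are placeholders for illustration; substitute your own):

192.168.95.134   centos
192.168.95.135   centos1
192.168.95.136   centos2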

 

1) Hadoop configuration

   Copy hadoop-1.2.1.tar.gz to each of the three virtual machines and extract it: tar -zxvf hadoop-1.2.1.tar.gz
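A rough sketch of pushing the tarball from the namenode to the other nodes and extracting it remotely (the /opt install root and root user are assumptions; see step 3 below on keeping the path identical everywhere):

scp hadoop-1.2.1.tar.gz root@centos1:/opt/
scp hadoop-1.2.1.tar.gz root@centos2:/opt/
ssh root@centos1 'cd /opt && tar -zxvf hadoop-1.2.1.tar.gz'
ssh root@centos2 'cd /opt && tar -zxvf hadoop-1.2.1.tar.gz'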

   Configure the following files under the conf directory.

core-site.xml:

<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://centos:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/hadoop-tmp</value>
    </property>
</configuration>

1. fs.default.name sets the namenode address. Note that port 9000 is the HDFS RPC port; access from a browser uses HTTP on port 50070 instead, e.g. http://centos:50070
2. hadoop.tmp.dir defaults to /tmp/hadoop-${user.name}, and Hadoop's file data is placed under this directory by default. Because Linux clears /tmp on reboot, hadoop.tmp.dir is set explicitly here.

hadoop-env.sh:

export JAVA_HOME=/usr/local/java/jdk1.7.0_79

JAVA_HOME may already be set in the system environment from the JDK installation, but Hadoop does not pick it up; it must be set here to your actual JDK path.

hdfs-site.xml:

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
</configuration>

dfs.replication is the maximum number of replicas per block; with two datanodes configured, it is set to 2 here.

masters:

centos1

The masters file configures the secondaryNameNode. Note: the secondaryNameNode, not the nameNode.

slaves:

centos1
centos2  (one node per line)

The slaves file configures the datanodes.
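After editing these files on the namenode, they should be copied to the other nodes so that all three machines stay in sync. A sketch (the /opt/hadoop-1.2.1 install path is an assumption):

scp conf/core-site.xml conf/hdfs-site.xml conf/hadoop-env.sh conf/masters conf/slaves root@centos1:/opt/hadoop-1.2.1/conf/
scp conf/core-site.xml conf/hdfs-site.xml conf/hadoop-env.sh conf/masters conf/slaves root@centos2:/opt/hadoop-1.2.1/conf/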

 

2) Format the DFS

cd /home/hadoop-1.2/bin/

./hadoop namenode -format    (after a successful format, the metadata files are generated under /opt/hadoop-tmp/dfs/name/)
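A quick check that the format succeeded (assuming hadoop.tmp.dir=/opt/hadoop-tmp as configured above):

ls /opt/hadoop-tmp/dfs/name/current/
# expect files such as VERSION, fsimage, edits and fstime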

 

3) The hadoop path must be identical on every machine

Once configuration is complete, run the following command on the NameNode machine to start the distributed HDFS service:

[root@centos bin]# ./start-dfs.sh

 

4) Result after startup

Open http://{NameNode IP}:50070 in a browser to access the configured HDFS environment.
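The daemons can also be confirmed with jps on each node; the expected processes follow the role table above:

jps
# centos:  NameNode
# centos1: DataNode, SecondaryNameNode
# centos2: DataNode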

 

4. Common problems during configuration

1) JAVA_HOME is not set.

Error log:

[root@centos bin]# ./start-dfs.sh

starting namenode, logging to /home/alvin/hadoop-1.2.1/libexec/../logs/hadoop-root-namenode-centos.out

centos1: bash: line 0: cd: /home/alvin/hadoop-1.2.1/libexec/..: No such file or directory

centos2: bash: line 0: cd: /home/alvin/hadoop-1.2.1/libexec/..: No such file or directory

centos2: starting datanode, logging to /opt/hadoop-1.2/libexec/../logs/hadoop-root-datanode-centos2.out

centos1: starting datanode, logging to /usr/hadoop-1.2.1/libexec/../logs/hadoop-root-datanode-centos1.out

centos2: Error: JAVA_HOME is not set.

centos1: Error: JAVA_HOME is not set.

centos1: bash: line 0: cd: /home/alvin/hadoop-1.2.1/libexec/..: No such file or directory

centos1: starting secondarynamenode, logging to /usr/hadoop-1.2.1/libexec/../logs/hadoop-root-secondarynamenode-centos1.out

centos1: Error: JAVA_HOME is not set.

 

There are three possible causes:

(1) The JDK is not configured.

Fix: install and configure the JDK; see

http://www.centoscn.com/image-text/install/2014/0827/3585.html

(2) Hadoop is installed at different paths on different machines, which also produces "JAVA_HOME is not set."

The first thought is to check whether JAVA_HOME is configured on centos1 and centos2, but both machines turn out to have the JDK set up. Reading the log carefully, the real problem is "No such file or directory":

centos1: bash: line 0: cd: /home/alvin/hadoop-1.2.1/libexec/..: No such file or directory

centos2: bash: line 0: cd: /home/alvin/hadoop-1.2.1/libexec/..: No such file or directory

Fix: install hadoop at the same path on every machine.

(3) JAVA_HOME is not set in hadoop-env.sh.

Fix: JAVA_HOME may be set in the system environment from the JDK installation, but Hadoop does not pick it up; it must be set in hadoop-env.sh to your actual JDK path.

 

Root cause analysis:

On the namenode machine hadoop lives under /home/alvin/hadoop-1.2.1/, while on centos1 and centos2 it lives under different paths (/usr/hadoop-1.2.1/ and /opt/hadoop-1.2/ in the log above), so starting the datanode and secondaryNameNode daemons reports "No such file or directory".

As the log shows, centos1 and centos2 both try to read files from the namenode machine's hadoop path, which is why "No such file or directory" is reported.

 

2) After starting the DFS service, jps shows the NameNode running but the DataNode and SecondaryNameNode not started.

   The firewall has not been turned off; run service iptables stop to stop the firewall on all three machines.
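To keep the firewall from coming back after a reboot, it can also be disabled permanently (CentOS 6-style commands):

service iptables stop
chkconfig iptables off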

 

3) org.apache.hadoop.security.AccessControlException

Fix: add this property to hdfs-site.xml:

<property>
    <name>dfs.permissions</name>
    <value>false</value>
</property>
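Note that this disables HDFS permission checking, and the change only takes effect after the HDFS daemons are restarted; from the bin directory:

./stop-dfs.sh
./start-dfs.sh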

 

4) org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Incompatible namespaceIDs in /opt/hadoop-tmp/dfs/data: namenode namespaceID = 1165565627; datanode namespaceID = 1468616188

Fix: treat the name side as authoritative. Change the namespaceID in ${hadoop.tmp.dir}/dfs/data/current/VERSION to the namespaceID found in ${hadoop.tmp.dir}/dfs/name/current/VERSION on the namenode, and make the same change to the ${hadoop.tmp.dir}/dfs/data/current/VERSION file on every datanode.
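A sketch of the fix using the IDs from the log above (assuming hadoop.tmp.dir=/opt/hadoop-tmp):

grep namespaceID /opt/hadoop-tmp/dfs/name/current/VERSION
# on the namenode: namespaceID=1165565627
sed -i 's/^namespaceID=.*/namespaceID=1165565627/' /opt/hadoop-tmp/dfs/data/current/VERSION
# run the sed on each datanode, then restart the DFS service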

 

5) org.apache.hadoop.hdfs.server.datanode.DataNode: All directories in dfs.data.dir are invalid

The error occurs because the permissions on the HDFS data directory are wrong; they should be rwxr-xr-x (755).

Fix: chmod -R 755 /opt/hadoop-tmp/

 

6) ERROR security.UserGroupInformation: PriviledgedActionException as:alvin cause:java.net.ConnectException: Call to 192.168.95.134/192.168.95.134:9091 failed on connection exception: java.net.ConnectException: Connection refused

java.net.ConnectException: Call to 192.168.95.134/192.168.95.134:9091 failed on connection exception: java.net.ConnectException: Connection refused

at org.apache.hadoop.ipc.Client.wrapException(Client.java:1142)

Fix: check that the IP and port number are correct. The MapReduce port is 9001, but here it is 9091, hence the error.
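A quick way to double-check the address that is actually configured (mapred-site.xml is not covered above; mapred.job.tracker is the standard Hadoop 1.x property for the JobTracker address):

grep -A1 'mapred.job.tracker' conf/mapred-site.xml
# the value should end in :9001, not :9091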

 

7) When starting hadoop, the log contains: java.io.IOException: NameNode is not formatted.

Fix: format the namenode:

./hadoop namenode -format

 

By 秀才坤坤

Please credit the source link when reposting: http://www.cnblogs.com/xckk/p/6124553.html