Hadoop Cluster Deployment

A note up front: these notes are somewhat disorganized; when you hit a problem, just solve it and move on.

1. Confirm the IPs of the servers to deploy on.

0, 1, 2, 3 stand for the four IPs.

One more server is needed besides these, to act as the remote control machine.

2. On the control machine, run ssh-keygen to generate the local key files (skip this step if they already exist). For example, for user test, the key files live under /home/test/.ssh/

Ansible needs to be installed on the control machine.

Configure the installation repo for Ansible:

wget -O /etc/yum.repos.d/epel.repo http://mirrors.aliyun.com/repo/epel-6.repo

Install Ansible:

yum -y install ansible

Prepare the Ansible host inventory file: hosts

Contents as follows:

[hadoop_host]
0ip ansible_ssh_user=test
1ip ansible_ssh_user=test
2ip ansible_ssh_user=test 
3ip ansible_ssh_user=test

Make sure the network from the control machine to the Hadoop cluster servers is fine; at minimum, SSH must work.
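A quick way to verify this is Ansible's ping module, which tests SSH login plus remote Python rather than ICMP; -k prompts for the SSH password, since key-based login is not set up yet:
ansible -i ./hosts hadoop_host -m ping -k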

With the prerequisites in place, move on to initialization, which covers: passwordless sudo, allowing sudo in remote (non-tty) sessions, and ulimit system tuning.

1. First, allow the test user to run sudo without a password.
Prerequisite: the test user is in the wheel group.
ansible -i ./hosts hadoop_host -m shell -a "sed 's/^# %wheel.*NOPASSWD: ALL/%wheel ALL=(ALL) NOPASSWD: ALL/' -i /etc/sudoers" -s --ask-sudo-pass
--ask-sudo-pass (short form -K) is the option that prompts for the sudo password. After the password is entered, this command lets the test user run sudo without a password on the remote servers.
2. Allow the test user to run sudo in remote (non-tty) sessions.
ansible -i ./hosts hadoop_host -m shell -a "sed -i '/Defaults.*requiretty/a Defaults: test \!requiretty' /etc/sudoers" -s --ask-sudo-pass
3. Tune the ulimit parameters.
ansible -i ./hosts hadoop_host -m shell -a "sed -i '$ a fs.file-max = 65535' /etc/sysctl.conf && sudo sed -i 's/1024/65535/' /etc/security/limits.d/90-nproc.conf && sudo sed -i '$ a * soft nofile 65535\\n* hard nofile 65535' /etc/security/limits.conf" -s --ask-sudo-pass
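To confirm the initialization took effect, one possible check is to try a non-interactive sudo and read back the open-file limit on every node (the new nofile limit only shows up in a fresh login session):
ansible -i ./hosts hadoop_host -m shell -a "sudo -n true && echo sudo-ok && ulimit -n"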

Next, set up passwordless SSH from the control machine to every server in the Hadoop cluster.

See this post:

 http://www.cnblogs.com/jackchen001/p/6381270.html
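As an alternative sketch (assuming a file ips.txt on the control machine listing the four IPs one per line, which is not part of the original setup), ssh-copy-id can push the key to each server, prompting for the test password each time:
# distribute the control machine's public key to every cluster server
for ip in $(cat ips.txt); do
    ssh-copy-id -i /home/test/.ssh/id_rsa.pub test@$ip
done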

Install the JDK and configure the Java environment variables across the Hadoop cluster

Prerequisite: the passwordless SSH channel is already in place.
1. Generate the JDK environment variable file:
echo '
export JAVA_HOME=/usr/java/latest/
export PATH=$JAVA_HOME/bin:$PATH ' >> java.sh
2. Install the JDK:
ansible -i ./hosts hadoop_host -m yum -a "name=jdk state=present" -s
3. Push the JDK environment variable file:
ansible -i ./hosts hadoop_host -m copy -a "src=java.sh dest=/etc/profile.d/" -s
4. Change the owner and group of the Java install directory:
ansible -i ./hosts hadoop_host -m shell -a "chown -R hadoop.hadoop /usr/java/" -s
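A quick sanity check that the JDK and the environment file landed on every node (a fresh login shell would pick up /etc/profile.d/ by itself; here it is sourced explicitly):
ansible -i ./hosts hadoop_host -m shell -a "source /etc/profile.d/java.sh && java -version" -s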

A good article introducing Ansible modules:

http://breezey.blog.51cto.com/2400275/1555530/

Hosts mapping file for the Hadoop cluster
Generate the hosts file:
echo '0 master
1 slave1
2 slave2
3 slave3' >> /tmp/hosts
Push it to the Hadoop cluster servers:
ansible -i ./hosts hadoop_host -m copy -a "src=/tmp/hosts dest=/etc/hosts" -s
Change the hostnames:
ansible -i ./hosts hadoop_host -m shell -a "sed -i 's/.localdomain//g' /etc/sysconfig/network && service network restart " -s
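To verify the mapping and hostname change, one possible check (every node should resolve master locally):
ansible -i ./hosts hadoop_host -m shell -a "hostname && getent hosts master" -s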

 

Download and configure Hadoop

Download the Hadoop tarball:
ansible -i ./hosts hadoop_host -m get_url -a "url=http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-3.0.0-alpha2/hadoop-3.0.0-alpha2.tar.gz dest=/opt/" -s
Run this way, however, every server in the cluster downloads the tarball itself, which wastes network bandwidth.
It is better to download and configure Hadoop on the control machine and then push it to the cluster servers, as sketched below.
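For reference, the download-and-unpack step on the control machine might look like this (the /tmp staging path is an assumption):
# download once on the control machine and unpack to /opt/hadoop
wget http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-3.0.0-alpha2/hadoop-3.0.0-alpha2.tar.gz -P /tmp/
tar -xzf /tmp/hadoop-3.0.0-alpha2.tar.gz -C /opt/
mv /opt/hadoop-3.0.0-alpha2 /opt/hadoop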
The push command:
ansible -i ./hosts hadoop_host -m copy -a "src=/opt/hadoop dest=/opt/ owner=hadoop group=hadoop mode=0755" -s

Configure the Hadoop environment variables

Generate the Hadoop environment variable file:
echo '
export HADOOP_HOME=/opt/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=/opt/hadoop/lib/native/"
export HADOOP_COMMON_LIB_NATIVE_DIR="/opt/hadoop/lib/native/"
' >> hadoop.sh

Push the Hadoop environment file to the cluster:
ansible -i ./hosts hadoop_host -m copy -a "src=hadoop.sh dest=/etc/profile.d/" -s
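A quick check that the environment file arrived and Hadoop resolves on every node (on RHEL-style systems /etc/profile sources /etc/profile.d/*.sh):
ansible -i ./hosts hadoop_host -m shell -a "source /etc/profile && hadoop version" -s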

The most important task of all: the hadoop user must be able to SSH between the cluster servers without a password and execute commands there.

On the control machine:
Create the hadoop user and set its password, see
http://www.cnblogs.com/jackchen001/p/6381270.html
On the control machine:
ansible -i ./hosts hadoop_host -m shell -a "ssh-keygen -q" -s
Once the hadoop user has been created, its password set, and the passwordless SSH work finished,
run the rsync_key playbook against every cluster server
to make sure the hadoop user can freely SSH between the cluster servers and execute commands; a sketch follows below.
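The rsync_key playbook itself is not reproduced here; one ad-hoc sketch of the same idea (assuming each node's hadoop user already has a key pair at /home/hadoop/.ssh/id_rsa.pub) is to collect every node's public key and push back a merged authorized_keys:
# fetch each node's public key onto the control machine (fetch stores them under per-host subdirectories)
ansible -i ./hosts hadoop_host -m fetch -a "src=/home/hadoop/.ssh/id_rsa.pub dest=/tmp/keys/" -s
# merge the keys and push the result back as the hadoop user's authorized_keys
cat /tmp/keys/*/home/hadoop/.ssh/id_rsa.pub > /tmp/hadoop_authorized_keys
ansible -i ./hosts hadoop_host -m copy -a "src=/tmp/hadoop_authorized_keys dest=/home/hadoop/.ssh/authorized_keys owner=hadoop group=hadoop mode=0600" -s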
Let the hadoop user sudo without a password:
ansible -i ./hosts hadoop_host -m shell -a "sed -i '$ a %hadoop ALL=(ALL) NOPASSWD: ALL ' /etc/sudoers" -s

With that done, the next step is to upload the Hadoop configuration files.

http://hadoop.apache.org/docs/current/ (official Hadoop documentation)

The finished Hadoop configuration files are as follows:

core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/opt/hadoop/tmp</value>
</property>
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://master:9000</value>
</property>
<property> 
  <name>dfs.name.dir</name>           
  <value>/opt/hadoop/name</value> 
</property>

</configuration>
hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
    <name>dfs.replication</name>  
    <value>3</value>
</property>
<property>
    <name>dfs.namenode.name.dir</name>  
    <value>file:/opt/hadoop/name1,/opt/hadoop/name2</value>
</property>

<property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/opt/hadoop/data1,/opt/hadoop/data2</value>
</property>

<property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>slave1:9001</value>
</property>


</configuration>
mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->
<configuration>

    <property>  
        <name>mapreduce.framework.name</name>  
        <value>yarn</value>  
    </property>

<property>
    <name>mapred.job.tracker</name>  
    <value>master:9001</value>
</property>
<property>
    <name>mapred.system.dir</name>  
    <value>/opt/hadoop/mapred_system</value>
</property>
<property>
    <name>mapred.local.dir</name>  
    <value>/opt/hadoop/mapred_local</value>
</property>

<property>
        <name>mapreduce.application.classpath</name>
        <value>
                /opt/hadoop/etc/hadoop,
                /opt/hadoop/lib/native/*,
                /opt/hadoop/share/hadoop/common/*,
                /opt/hadoop/share/hadoop/common/lib/*,
                /opt/hadoop/share/hadoop/hdfs/*,
                /opt/hadoop/share/hadoop/hdfs/lib/*,
                /opt/hadoop/share/hadoop/mapreduce/*,
                /opt/hadoop/share/hadoop/mapreduce/lib/*,
                /opt/hadoop/share/hadoop/yarn/*,
                /opt/hadoop/share/hadoop/yarn/lib/*
        </value>
</property>

</configuration>
yarn-site.xml
<?xml version="1.0"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<configuration>

<!-- Site specific YARN configuration properties -->

    <property>  
        <name>yarn.nodemanager.aux-services</name>  
        <value>mapreduce_shuffle</value>  
    </property>  
    <property>  
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>  
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>  
    </property>  
    <property>  
        <name>yarn.resourcemanager.resource-tracker.address</name>  
        <value>master:8025</value>  
    </property>  
    <property>  
        <name>yarn.resourcemanager.scheduler.address</name>  
        <value>master:8030</value>  
    </property>  
    <property>  
        <name>yarn.resourcemanager.address</name>  
        <value>master:8040</value>  
    </property>
    <property>  
        <name>yarn.resourcemanager.admin.address</name>  
        <value>master:8033</value>  
    </property>
    <property>  
        <name>yarn.resourcemanager.webapp.address</name>  
        <value>master:8034</value>  
    </property>

</configuration>
Contents of the master file:
master
Contents of the slaves file:
slave1
slave2
slave3
Contents of the workers file:
slave1
slave2
slave3
Add the following to hadoop-env.sh:
export JAVA_HOME=/usr/java/latest

Edit all of the configuration files above on the control machine, then push them to the cluster servers.
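A sketch of that final push plus first startup (the conf/ staging directory on the control machine is an assumption; the format and start commands are the standard Hadoop ones, run as the hadoop user on master):
# push the edited configuration files to every node
ansible -i ./hosts hadoop_host -m copy -a "src=conf/ dest=/opt/hadoop/etc/hadoop/ owner=hadoop group=hadoop" -s
# then, on master: format HDFS once and start the daemons
/opt/hadoop/bin/hdfs namenode -format
/opt/hadoop/sbin/start-dfs.sh
/opt/hadoop/sbin/start-yarn.sh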

Halfway through writing this up I managed to confuse myself. How embarrassing!!!!!