Hadoop Pseudo-Distributed Installation

About Hadoop
Official site: http://hadoop.apache.org/#What+Is+Apache+Hadoop%3F

Hadoop is fairly complex; depending on your needs, it can be installed in one of three modes:
Standalone (local) mode: no daemons are required and everything runs in a single JVM. This mode is suitable for testing and debugging MapReduce during development.
Pseudo-distributed mode: all Hadoop daemons run on the local machine, simulating a small cluster.
Fully distributed mode: the Hadoop daemons run across a cluster of machines.

We will start with a pseudo-distributed installation for integrated testing.

1. Prerequisites:

  1. JDK 1.6.x
  2. SSH
  3. rsync

2. Download the Hadoop tarball and unpack it; the extracted directory can be used directly as the Hadoop home.

Add the JAVA and Hadoop directories to the environment variables (e.g. in ~/.bash_profile):

PATH=$PATH:$HOME/bin
#export PATH
# Java environment
export JAVA_HOME=/usr/local/java/jdk1.6.0_30
export PATH=$JAVA_HOME/bin:/usr/local/mysql/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
unset USERNAME
# Hadoop environment
export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
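
After editing the profile, reload it and confirm that the variables took effect (a quick sanity check; the paths above are this article's example locations):

source ~/.bash_profile
echo $JAVA_HOME $HADOOP_INSTALL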

Run hadoop version to print the Hadoop version and confirm that the installation works:

[root@test /] hadoop version
Hadoop 1.2.1
Subversion https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1503152
Compiled by mattf on Mon Jul 22 15:23:09 PDT 2013
From source with checksum 6923c86528809c4e7e6f493b6b413a9a
This command was run using /usr/local/hadoop/hadoop-core-1.2.1.jar

3. Configure SSH

First, make sure you can ssh into the local machine:

ssh localhost

If passwordless login fails, generate a key pair and append it to authorized_keys:

[root@zabbix hadoop] ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
[root@zabbix hadoop] cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
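
sshd normally requires authorized_keys to be writable only by its owner; tighten the permissions and re-test the login (a small sanity check added here, not part of the original steps):

chmod 600 ~/.ssh/authorized_keys
ssh localhost 'echo ssh ok'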

4. Format the HDFS filesystem

Before Hadoop can be used, a brand-new HDFS installation must be formatted, which creates an empty filesystem.

hadoop namenode -format
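
As a quick check that the format succeeded (assuming the Hadoop 1.x default hadoop.tmp.dir of /tmp/hadoop-${USER}, i.e. /tmp/hadoop-root when running as root), the NameNode storage directory should now exist:

ls /tmp/hadoop-root/dfs/name/current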

5. Hadoop configuration files

In the conf directory:
core-site.xml configures the Common components
hdfs-site.xml configures HDFS properties
mapred-site.xml configures MapReduce properties
By default each mode uses different values for these key properties (figure omitted: a table of the per-mode defaults for fs.default.name, dfs.replication, and mapred.job.tracker).

Edit the configuration files:

<!-- core-site.xml -->
<configuration>
  <property><name>fs.default.name</name><value>hdfs://localhost/</value></property>
</configuration>
<!-- hdfs-site.xml -->
<configuration>
  <property><name>dfs.replication</name><value>1</value></property>
</configuration>
<!-- mapred-site.xml -->
<configuration>
  <property><name>mapred.job.tracker</name><value>localhost:8021</value></property>
</configuration>
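
If xmllint is installed, the edited files can be validated for well-formedness before starting the daemons (an optional check, not in the original article):

xmllint --noout /usr/local/hadoop/conf/core-site.xml
xmllint --noout /usr/local/hadoop/conf/hdfs-site.xml
xmllint --noout /usr/local/hadoop/conf/mapred-site.xml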

6. Start the HDFS and MapReduce daemons

[root@test conf] start-dfs.sh
starting namenode, logging to /usr/local/hadoop/libexec/../logs/hadoop-root-namenode-zabbix.out
localhost: starting datanode, logging to /usr/local/hadoop/libexec/../logs/hadoop-root-datanode-zabbix.out
localhost: starting secondarynamenode, logging to /usr/local/hadoop/libexec/../logs/hadoop-root-secondarynamenode-zabbix.out
[root@test conf] start-mapred.sh
starting jobtracker, logging to /usr/local/hadoop/libexec/../logs/hadoop-root-jobtracker-zabbix.out
localhost: starting tasktracker, logging to /usr/local/hadoop/libexec/../logs/hadoop-root-tasktracker-zabbix.out
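
When you are finished, the matching stop scripts shut the daemons down again:

stop-mapred.sh
stop-dfs.sh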

7. Verify that the daemons are running

# JobTracker and NameNode web UIs
http://namenode:50030/jobtracker.jsp
http://namenode:50070/dfshealth.jsp
[root@test conf]# netstat -lnp |grep 50030
tcp 0 0 :::50030 :::* LISTEN 3071/java
[root@test conf]# netstat -lnp |grep 50070
tcp 0 0 :::50070 :::* LISTEN 2676/java
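
On a server without a browser, the web UIs can also be probed from the shell (assuming curl is available; an HTTP 200 means the page is being served):

curl -s -o /dev/null -w '%{http_code}\n' http://localhost:50070/dfshealth.jsp
curl -s -o /dev/null -w '%{http_code}\n' http://localhost:50030/jobtracker.jsp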

Check with the JDK's jps tool:

jps
3220 TaskTracker
2676 NameNode
3071 JobTracker
2803 DataNode
2962 SecondaryNameNode
3503 Jps

Common Hadoop commands

# list files in an HDFS directory
hadoop fs -ls /usr/local/log/
# create a directory
hadoop fs -mkdir /usr/local/log/test
# delete a file
hadoop fs -rm /usr/local/log/11
# upload a local file to /usr/local/log/ in HDFS
hadoop fs -put /usr/local/src/infobright-4.0.6-0-x86_64-ice.rpm /usr/local/log/
# download a file from HDFS
hadoop fs -get /usr/local/log/infobright-4.0.6-0-x86_64-ice.rpm /usr/local/src/
# view a file
hadoop fs -cat /usr/local/log/20131008_10/access.log.zabbix
# report basic HDFS usage
[root@hh ~] hadoop dfsadmin -report
Configured Capacity: 10397683712 (9.68 GB)
Present Capacity: 9388027904 (8.74 GB)
DFS Remaining: 9324613632 (8.68 GB)
DFS Used: 63414272 (60.48 MB)
DFS Used%: 0.68%
Under replicated blocks: 1
Blocks with corrupt replicas: 0
Missing blocks: 1
-------------------------------------------------
Datanodes available: 1 (1 total, 0 dead)
Name: 127.0.0.1:50010
Decommission Status : Normal
Configured Capacity: 10397683712 (9.68 GB)
DFS Used: 63414272 (60.48 MB)
Non DFS Used: 1009655808 (962.88 MB)
DFS Remaining: 9324613632(8.68 GB)
DFS Used%: 0.61%
DFS Remaining%: 89.68%
Last contact: Tue Oct 08 13:41:05 CST 2013
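
The report above shows one under-replicated block and one missing block; hadoop fsck can identify the affected files (a follow-up step added here, not in the original article; / scans the whole filesystem):

hadoop fsck / -files -blocks -locations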