Hadoop Notes
Official site: http://hadoop.apache.org/#What+Is+Apache+Hadoop%3F
Hadoop is a fairly complex system, and it can be installed in one of three modes depending on your needs:
Standalone (local) mode: no daemons are required; everything runs in a single JVM. This mode is suited to developing, testing, and debugging MapReduce jobs.
Pseudo-distributed mode: all Hadoop daemons run on the local machine, simulating a small cluster.
Fully distributed mode: the Hadoop daemons run across a real cluster.
We will install pseudo-distributed mode first and test against it.
1. Prerequisites (a quick check follows the list):
- JDK 1.6.x
- SSH
- rsync
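A minimal way to confirm the prerequisites are in place (the exact version output will vary by system):
java -version     # should report a 1.6.x JVM
ssh -V            # prints the OpenSSH version banner
rsync --version   # prints the rsync version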
2. Download the Hadoop tarball and extract it; the extracted directory becomes the Hadoop home directory.
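For example, assuming version 1.2.1 from the Apache archive (the URL and target path are just one possible choice; adjust to your environment):
wget http://archive.apache.org/dist/hadoop/common/hadoop-1.2.1/hadoop-1.2.1.tar.gz
tar -xzf hadoop-1.2.1.tar.gz
mv hadoop-1.2.1 /usr/local/hadoop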
Add the Java and Hadoop directories to the environment variables (e.g. in /etc/profile):
PATH=$PATH:$HOME/bin
#export PATH
export JAVA_HOME=/usr/local/java/jdk1.6.0_30
export PATH=$JAVA_HOME/bin:/usr/local/mysql/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
unset USERNAME
export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
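Reload the profile so the new variables take effect in the current shell (assuming they were added to /etc/profile as above):
source /etc/profile
echo $JAVA_HOME $HADOOP_INSTALL    # sanity check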
Run hadoop version to check the installed version and confirm the command works:
[root@test /] hadoop version
Hadoop 1.2.1
Subversion https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1503152
Compiled by mattf on Mon Jul 22 15:23:09 PDT 2013
From source with checksum 6923c86528809c4e7e6f493b6b413a9a
This command was run using /usr/local/hadoop/hadoop-core-1.2.1.jar
3. Configure SSH
First, make sure you can SSH into the local machine:
ssh localhost
If that login fails, generate a key pair and authorize it:
[root@zabbix hadoop] ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
[root@zabbix hadoop] cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
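If the login still prompts for a password, SSH usually refuses keys with loose permissions; a common fix (assuming the default ~/.ssh layout):
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
ssh localhost    # should now log in without a password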
4. Format the HDFS filesystem
Before Hadoop can be used, a brand-new HDFS instance must be formatted, which creates an empty filesystem:
hadoop namenode -format
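Unless configured otherwise, Hadoop 1.x keeps its data under hadoop.tmp.dir, which defaults to /tmp/hadoop-${user.name}, so you can confirm what the format step created (the path below assumes the root user and the default setting):
ls /tmp/hadoop-root/dfs/name/current/    # should contain fsimage, edits, VERSION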
5. Hadoop configuration files
In the conf directory:
core-site.xml: configures the Common components
hdfs-site.xml: configures HDFS properties
mapred-site.xml: configures MapReduce properties
Each of the three installation modes ships with its own defaults for these properties.
Edit the configuration files for pseudo-distributed operation:
//core-site.xml
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost/</value>
  </property>
</configuration>
//hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
//mapred-site.xml
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:8021</value>
  </property>
</configuration>
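Besides these three files, the Hadoop 1.x daemons read JAVA_HOME from conf/hadoop-env.sh rather than from the login shell, so it is usually necessary to set it there as well (path copied from the profile above):
# conf/hadoop-env.sh
export JAVA_HOME=/usr/local/java/jdk1.6.0_30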
6. Start the HDFS and MapReduce daemons
[root@test conf] start-dfs.sh
[root@test conf] start-mapred.sh
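The matching stop scripts shut the daemons down again when needed (they ship alongside the start scripts in the bin directory):
[root@test conf] stop-mapred.sh
[root@test conf] stop-dfs.sh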
7. Verify the daemons are running
Check the JobTracker web UI:
http://namenode:50030/jobtracker.jsp
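The NameNode serves its own status page on port 50070 (same hostname assumed as above):
http://namenode:50070/dfshealth.jsp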
Or list the running JVM processes with Java's jps:
jps
3220 TaskTracker
2676 NameNode
3071 JobTracker
2803 DataNode
2962 SecondaryNameNode
3503 Jps
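If any daemon is missing from the jps output, its log under $HADOOP_INSTALL/logs usually says why; file names follow the pattern hadoop-<user>-<daemon>-<hostname>.log (the user and daemon below are examples):
tail -n 50 $HADOOP_INSTALL/logs/hadoop-root-namenode-*.log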
Common Hadoop commands:
//List files in HDFS
hadoop fs -ls /usr/local/log/
//Create a directory
hadoop fs -mkdir /usr/local/log/test
//Delete a file
hadoop fs -rm /usr/local/log/11
//Upload a local file to /usr/local/log/ in HDFS
hadoop fs -put /usr/local/src/infobright-4.0.6-0-x86_64-ice.rpm /usr/local/log/
//Download a file from HDFS
hadoop fs -get /usr/local/log/infobright-4.0.6-0-x86_64-ice.rpm /usr/local/src/
//View a file
hadoop fs -cat /usr/local/log/20131008_10/access.log.zabbix
//Check basic HDFS usage
[root@hh ~] hadoop dfsadmin -report
Configured Capacity: 10397683712 (9.68 GB)
Present Capacity: 9388027904 (8.74 GB)
DFS Remaining: 9324613632 (8.68 GB)
DFS Used: 63414272 (60.48 MB)
DFS Used%: 0.68%
Under replicated blocks: 1
Blocks with corrupt replicas: 0
Missing blocks: 1
-------------------------------------------------
Datanodes available: 1 (1 total, 0 dead)
Name: 127.0.0.1:50010
Decommission Status : Normal
Configured Capacity: 10397683712 (9.68 GB)
DFS Used: 63414272 (60.48 MB)
Non DFS Used: 1009655808 (962.88 MB)
DFS Remaining: 9324613632 (8.68 GB)
DFS Used%: 0.61%
DFS Remaining%: 89.68%
Last contact: Tue Oct 08 13:41:05 CST 2013
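As a final end-to-end check, the examples jar bundled with Hadoop 1.2.1 can run a small MapReduce job; the pi estimator takes a map count and a sample count (jar location assumed from the install path used above):
cd /usr/local/hadoop
hadoop jar hadoop-examples-1.2.1.jar pi 2 10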