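Original article. If you repost it, please credit the original URL.

Pseudo-distributed mode is an important Hadoop mode. Hadoop runs on a single local server and starts several JVM processes to simulate a distributed cluster, which also makes it easy for programmers to debug their programs. In this article we walk through the configuration together.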

1. Hadoop Pseudo-Distributed mode

Pseudo-Distributed Operation

Hadoop can also be run on a single-node in a pseudo-distributed mode where each Hadoop daemon runs in a separate Java process.

First change into your Hadoop directory, then edit the following files in turn.

The figure below shows my directory and file layout.

Configuration

Use the following:
conf/core-site.xml:

<configuration>
     <property>
         <name>fs.default.name</name>
         <value>hdfs://localhost:9000</value>
     </property>
</configuration>

conf/hdfs-site.xml:

<configuration>
     <property>
         <name>dfs.replication</name>
         <value>1</value>
     </property>
</configuration>

conf/mapred-site.xml:

<configuration>
     <property>
         <name>mapred.job.tracker</name>
         <value>localhost:9001</value>
     </property>
</configuration>
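The three edits above can also be scripted. The block below is a minimal sketch, not the author's method. It assumes the Hadoop 1.x layout, where these files live under conf/ (for example /work/apps/hadoop-1.2.1/conf). The function names write_site_xml and write_pseudo_conf are hypothetical. The helper overwrites the files, so back up any settings you want to keep.

```shell
# Hypothetical helper: writes a Hadoop *-site.xml file that contains a
# single <property> with the given name and value (overwriting the file).
write_site_xml() {
  local file="$1" name="$2" value="$3"
  cat > "$file" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>${name}</name>
    <value>${value}</value>
  </property>
</configuration>
EOF
}

# Hypothetical helper: writes the three pseudo-distributed settings from
# this section into CONF_DIR (e.g. /work/apps/hadoop-1.2.1/conf).
write_pseudo_conf() {
  local conf_dir="$1"
  if [ -z "$conf_dir" ]; then
    echo "usage: write_pseudo_conf CONF_DIR" >&2
    return 1
  fi
  mkdir -p "$conf_dir" || return 1
  write_site_xml "$conf_dir/core-site.xml"   fs.default.name    hdfs://localhost:9000 &&
  write_site_xml "$conf_dir/hdfs-site.xml"   dfs.replication    1 &&
  write_site_xml "$conf_dir/mapred-site.xml" mapred.job.tracker localhost:9001
}
```

Usage: write_pseudo_conf /work/apps/hadoop-1.2.1/conf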

2. Configuring passwordless SSH login

Below is the original method from the Hadoop documentation:

Now check that you can ssh to the localhost without a passphrase:
$ ssh localhost

If you cannot ssh to localhost without a passphrase, execute the following commands:
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

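With the configuration above, passwordless login worked under root but not under a regular user account, as shown in the figure below:

The fix is to delete the existing files and set everything up again: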

rm -rf .ssh/

The result after deletion is shown below:

Make sure the .ssh directory no longer exists.

Then enter:

ssh sch@db

Enter the following command:

ssh-keygen -t rsa

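When prompted with "Enter file in which to save the key (/home/sch/.ssh/id_rsa):", just press Enter to accept the default.

When prompted for a passphrase, also press Enter to accept the default (an empty passphrase).

Run the following commands. The two chmod commands that tighten the file permissions seem to be the key step. With its default StrictModes setting, sshd ignores authorized_keys if ~/.ssh or the file itself is writable by group or others.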

[sch@db ~]$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

[sch@db ~]$ chmod 700 ~/.ssh

[sch@db ~]$ chmod 600 ~/.ssh/authorized_keys

[sch@db ~]$ ll -a .ssh/
total 28
drwx------ 2 sch sch 4096 Oct 1 06:09 .
drwx------ 20 sch sch 4096 Oct 1 06:00 ..
-rw------- 1 sch sch 776 Oct 1 06:16 authorized_keys
-rw------- 1 sch sch 1675 Oct 1 06:08 id_rsa
-rw-r--r-- 1 sch sch 388 Oct 1 06:08 id_rsa.pub
-rw-r--r-- 1 sch sch 384 Oct 1 06:00 known_hosts
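The permission step can be wrapped in a small check-and-fix helper. This is a sketch under stated assumptions. fix_ssh_perms is a hypothetical name. It relies on GNU stat (stat -c '%a'), as found on typical Linux systems. It takes the .ssh directory as an argument, so you can try it on a scratch directory first.

```shell
# Hypothetical helper: applies the permissions used above (700 on the .ssh
# directory, 600 on authorized_keys) and then verifies the resulting modes.
# Assumes GNU stat (stat -c '%a'), as found on typical Linux systems.
fix_ssh_perms() {
  local ssh_dir="${1:-$HOME/.ssh}"
  local keys="$ssh_dir/authorized_keys"

  if [ ! -d "$ssh_dir" ]; then
    echo "missing directory: $ssh_dir" >&2
    return 1
  fi
  if [ ! -f "$keys" ]; then
    echo "missing file: $keys" >&2
    return 1
  fi

  chmod 700 "$ssh_dir" && chmod 600 "$keys" || return 1

  # Double-check the octal modes that sshd will see.
  if [ "$(stat -c '%a' "$ssh_dir")" != "700" ] || [ "$(stat -c '%a' "$keys")" != "600" ]; then
    echo "unexpected permissions under $ssh_dir" >&2
    return 1
  fi
  echo "ok: $ssh_dir"
}
```

For the account used in this article, the call would be: fix_ssh_perms /home/sch/.ssh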

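As shown in the figure below:

Enter ssh sch@db

You can see that login now works without a password.

Copy the .ssh directory from the current (home) directory to the other servers, so that logging in to them is passwordless as well: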

scp -r .ssh/ sch@red:/home/sch

scp -r .ssh/ sch@mongdb:/home/sch

scp -r .ssh/ sch@nginx:/home/sch

Then test the logins. In our tests, passwordless login to each server worked normally.
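You can test each server by hand, or automate the check with a loop like the following. This is a sketch, not part of the original procedure. check_passwordless and the SSH_CMD override are hypothetical. The check relies on ssh's BatchMode=yes option, which makes ssh fail instead of prompting for a password.

```shell
# Hypothetical helper: checks that each host accepts a non-interactive login.
# BatchMode=yes makes ssh fail instead of asking for a password.
# SSH_CMD defaults to ssh; overriding it (e.g. with true) allows a dry run.
check_passwordless() {
  local ssh_cmd="${SSH_CMD:-ssh}" host failed=0
  for host in "$@"; do
    if $ssh_cmd -o BatchMode=yes -o ConnectTimeout=5 "$host" true </dev/null >/dev/null 2>&1; then
      echo "$host: ok"
    else
      echo "$host: FAILED"
      failed=1
    fi
  done
  return "$failed"
}
```

Usage: check_passwordless sch@db sch@red sch@mongdb sch@nginx. A FAILED line means that host still asks for a password or cannot be reached.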

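3. Setting up and starting Hadoop in pseudo-distributed mode

First, change into your Hadoop installation directory, for example:

cd /work/apps/hadoop-1.2.1

Then run the following commands in order: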

Execution

Format a new distributed-filesystem:
$ bin/hadoop namenode -format    # format the new HDFS filesystem

Start the hadoop daemons:
$ bin/start-all.sh    # start all Hadoop daemons on the local machine

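The result is shown below:

The output contains this message: localhost: Error: JAVA_HOME is not set.

To fix it, edit conf/hadoop-env.sh in the Hadoop directory. Uncomment the export JAVA_HOME line and set it to your JDK installation directory, as shown in the figure below:

Save and exit.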
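You can also make this change from the command line. The helper below is a minimal sketch that rewrites the JAVA_HOME line. set_java_home is a hypothetical name, and it assumes GNU sed (sed -i with no backup suffix). The JDK path in the usage line is only a placeholder for wherever your JDK is installed.

```shell
# Hypothetical helper: sets JAVA_HOME in a hadoop-env.sh file.
# An existing "export JAVA_HOME=..." line (commented out or not) is replaced;
# if there is none, one is appended. Assumes GNU sed.
# Note: JDK_DIR must not contain '|' or '&' (special in the sed expression).
set_java_home() {
  local env_file="$1" java_home="$2"
  if [ ! -f "$env_file" ] || [ -z "$java_home" ]; then
    echo "usage: set_java_home HADOOP_ENV_SH JDK_DIR" >&2
    return 1
  fi

  local pattern='^[[:space:]]*#?[[:space:]]*export[[:space:]]+JAVA_HOME='
  if grep -qE "$pattern" "$env_file"; then
    sed -i -E "s|${pattern}.*|export JAVA_HOME=${java_home}|" "$env_file"
  else
    echo "export JAVA_HOME=${java_home}" >> "$env_file"
  fi
}
```

Usage: set_java_home conf/hadoop-env.sh /path/to/your/jdk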

Start Hadoop again:

bin/start-all.sh

(The first start attempt failed, but some components may still have started. You may therefore need to run bin/stop-all.sh first to shut down all Hadoop processes.)
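After restarting, the JDK's jps tool shows which daemons are actually running. In pseudo-distributed mode, Hadoop 1.x should show five daemons: NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker. The helper below is a hypothetical sketch. It reads jps output on standard input and flags any daemon that is missing.

```shell
# Hypothetical helper: reads `jps` output on stdin and reports which of the
# five Hadoop 1.x pseudo-distributed daemons are running or missing.
# Returns 0 only if all five are present.
check_hadoop_daemons() {
  local jps_out daemon missing=0
  jps_out="$(cat)"
  for daemon in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
    # jps prints "<pid> <MainClassName>"; compare the whole second field,
    # so NameNode does not match SecondaryNameNode.
    if printf '%s\n' "$jps_out" | awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
      echo "$daemon: running"
    else
      echo "$daemon: MISSING"
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: jps | check_hadoop_daemons. If a daemon is missing, check its log file under ${HADOOP_HOME}/logs.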

4. Testing Hadoop pseudo-distributed mode

The hadoop daemon log output is written to the ${HADOOP_LOG_DIR} directory (defaults to ${HADOOP_HOME}/logs).

Browse the web interface for the NameNode and the JobTracker; by default they are available at:

· NameNode - http://localhost:50070/

We entered http://192.168.186.10:50070/dfshealth.jsp. We used the server's IP address instead of localhost because our Hadoop installation runs on a remote virtual machine and we were browsing from Windows 8.

· JobTracker - http://localhost:50030/

Similarly, we entered: http://192.168.186.10:50030/jobtracker.jsp

Run the following command to copy the files of a local directory (here, conf) into the input directory on HDFS:

Copy the input files into the distributed filesystem:
$ bin/hadoop fs -put conf input

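Note: make sure the input directory does not already exist in your HDFS directory. Otherwise you will get error messages.

Let's check which files were copied:

They appear in the blue area of the figure above.

The path input is relative. It is resolved against the current user's home directory in HDFS (/user/<username> in Hadoop 1.x). If the path starts with "/", the files are placed relative to the root of the HDFS filesystem instead.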
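A small helper can illustrate the path rule. This is an illustrative sketch, not a Hadoop command. resolve_hdfs_path is hypothetical and only mimics how Hadoop 1.x resolves a path against the default working directory /user/<username>.

```shell
# Hypothetical helper: mimics how Hadoop 1.x resolves an HDFS path.
# The default working directory is /user/<username>, so relative paths
# end up under it; absolute paths are used unchanged.
resolve_hdfs_path() {
  local path="$1" user="${2:-$(whoami)}"
  case "$path" in
    /*) printf '%s\n' "$path" ;;                   # absolute: used as-is
    *)  printf '/user/%s/%s\n' "$user" "$path" ;;  # relative: under /user/<user>
  esac
}
```

For example, resolve_hdfs_path input sch prints /user/sch/input. To see the real directory on HDFS, run bin/hadoop fs -ls /user/sch.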

Run some of the examples provided:
$ bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'
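To see what this example job computes, here is a local approximation using standard Unix tools. It is a sketch, not the example's implementation, and local_grep_count is a hypothetical name. It finds every match of the regular expression, counts each distinct match, and sorts by count with the highest first. Each output line has the form count<TAB>match, like the job's output. Matches with equal counts may come out in a different order than in Hadoop.

```shell
# Hypothetical helper: local approximation of the grep example job.
# Prints "count<TAB>match" for every distinct regex match in the given
# files, sorted by count (highest first), then by match text.
local_grep_count() {
  local regex="$1"
  shift
  grep -ohE -- "$regex" "$@" \
    | sort | uniq -c \
    | sort -k1,1nr -k2 \
    | awk '{ printf "%s\t%s\n", $1, $2 }'
}
```

Usage, from the Hadoop directory: local_grep_count 'dfs[a-z.]+' conf/*. You can compare its output with the job's result.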

The output is shown below:

It shows the progress of the job.

The final result:

Examine the output files:

Copy the output files from the distributed filesystem to the local filesystem and examine them:
$ bin/hadoop fs -get output output    # copy the output files from HDFS to the local filesystem
$ cat output/*    # display the local copies

The result is shown below: