Example: The WordCount Word-Frequency Program - Apache Hadoop Beginner's Tutorial

The sample input files are as follows:


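$ bin/hadoop fs -ls /user/joe/wordcount/input/
/user/joe/wordcount/input/file01
/user/joe/wordcount/input/file02

$ bin/hadoop fs -cat /user/joe/wordcount/input/file01
Hello World, Bye World!

$ bin/hadoop fs -cat /user/joe/wordcount/input/file02
Hello Hadoop, Goodbye to hadoop.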

Run the application:

$ bin/hadoop jar wc.jar WordCount2 /user/joe/wordcount/input /user/joe/wordcount/output

Next, use DistributedCache to supply a pattern file that lists the word patterns to be skipped:

$ bin/hadoop fs -cat /user/joe/wordcount/patterns.txt
\.
\,
\!
to

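Run it again, this time with more options: case sensitivity is switched off with -Dwordcount.case.sensitive=false, and the pattern file is passed with -skip: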

$ bin/hadoop jar wc.jar WordCount2 -Dwordcount.case.sensitive=false /user/joe/wordcount/input /user/joe/wordcount/output -skip /user/joe/wordcount/patterns.txt
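To see how the pieces of this command line fit together, here is a minimal, Hadoop-free sketch. In a real job, Hadoop's GenericOptionsParser takes care of the -D options and sets them on the job configuration, leaving the remaining arguments to the application. The class ArgsSketch and its fields below are hypothetical stand-ins, only the attached -Dname=value form is handled, and the default of true for wordcount.case.sensitive is an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-in for option handling; this is NOT the Hadoop API.
final class ArgsSketch {
    final Map<String, String> properties = new LinkedHashMap<>(); // from -Dname=value
    final List<String> positional = new ArrayList<>();            // input and output paths
    String skipFile;                                              // null when -skip is absent

    static ArgsSketch parse(String[] args) {
        ArgsSketch parsed = new ArgsSketch();
        for (int i = 0; i < args.length; i++) {
            String arg = args[i];
            if (arg.startsWith("-D") && arg.contains("=")) {
                // Split "-Dname=value" at the first '='.
                int eq = arg.indexOf('=');
                parsed.properties.put(arg.substring(2, eq), arg.substring(eq + 1));
            } else if (arg.equals("-skip") && i + 1 < args.length) {
                // The argument after -skip names the pattern file.
                parsed.skipFile = args[++i];
            } else {
                parsed.positional.add(arg);
            }
        }
        return parsed;
    }

    // Assumed default: case-sensitive unless the property says otherwise.
    boolean caseSensitive() {
        return Boolean.parseBoolean(
                properties.getOrDefault("wordcount.case.sensitive", "true"));
    }

    public static void main(String[] args) {
        ArgsSketch parsed = parse(new String[] {
            "-Dwordcount.case.sensitive=false",
            "/user/joe/wordcount/input",
            "/user/joe/wordcount/output",
            "-skip", "/user/joe/wordcount/patterns.txt"
        });
        System.out.println("case sensitive: " + parsed.caseSensitive());
        System.out.println("input/output:   " + parsed.positional);
        System.out.println("skip file:      " + parsed.skipFile);
    }
}
```

With the arguments from the command above, the sketch reports case sensitivity as false, the two paths as the input and output directories, and patterns.txt as the skip file.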

The output is as follows:

$ bin/hadoop fs -cat /user/joe/wordcount/output/part-r-00000
bye 1
goodbye 1
hadoop 2
hello 2
world 2
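The result can be cross-checked locally without a cluster. The following is a minimal sketch, not the tutorial's WordCount2 source: it assumes the job lowercases each line when wordcount.case.sensitive is false, deletes every match of each skip pattern (treated as a regular expression), and then splits the line on whitespace. The class WordCountSketch, the method count, and the hard-coded pattern list are hypothetical; the list includes a `to` pattern so that the result has no `to` entry, matching the listing above.

```java
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Local, Hadoop-free sketch of the counting logic; all names are hypothetical.
final class WordCountSketch {
    static Map<String, Integer> count(List<String> lines,
                                      List<String> skipPatterns,
                                      boolean caseSensitive) {
        // TreeMap keeps the words sorted, like the keys in the job's output file.
        Map<String, Integer> counts = new TreeMap<>();
        for (String raw : lines) {
            String line = caseSensitive ? raw : raw.toLowerCase();
            for (String pattern : skipPatterns) {
                // Each pattern is a regular expression; its matches are deleted.
                line = line.replaceAll(pattern, "");
            }
            StringTokenizer tokens = new StringTokenizer(line);
            while (tokens.hasMoreTokens()) {
                counts.merge(tokens.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "Hello World, Bye World!",
                "Hello Hadoop, Goodbye to hadoop.");
        // Escaped punctuation plus "to" (the latter is an assumption, see above).
        List<String> patterns = List.of("\\.", "\\,", "\\!", "to");
        count(lines, patterns, false)
                .forEach((word, n) -> System.out.println(word + " " + n));
    }
}
```

For the two sample lines this prints bye 1, goodbye 1, hadoop 2, hello 2 and world 2, one entry per line. Passing true for caseSensitive together with an empty pattern list keeps tokens such as `World,` and `World!` apart, which is why the skip patterns matter. The backslashes in the pattern file matter as well: an unescaped `.` is a regular expression that matches any character, so it would delete the whole line.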