Sample text files used as input:
/user/joe/wordcount/input/file01
/user/joe/wordcount/input/file02
$ bin/hadoop fs -cat /user/joe/wordcount/input/file01
Hello World, Bye World!
$ bin/hadoop fs -cat /user/joe/wordcount/input/file02
Hello Hadoop, Goodbye to hadoop.
Run the application:
$ bin/hadoop jar wc.jar WordCount2 /user/joe/wordcount/input /user/joe/wordcount/output
Specify the word patterns to skip in a pattern file, which is distributed to the tasks via the DistributedCache:
$ bin/hadoop fs -cat /user/joe/wordcount/patterns.txt
\.
\,
\!
to
Run it again with more options, this time passing the pattern file via -skip and switching off case sensitivity:
$ bin/hadoop jar wc.jar WordCount2 -Dwordcount.case.sensitive=false /user/joe/wordcount/input /user/joe/wordcount/output -skip /user/joe/wordcount/patterns.txt
Output:
$ bin/hadoop fs -cat /user/joe/wordcount/output/part-r-00000
bye 1
goodbye 1
hadoop 2
hello 2
world 2
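To see how the case-sensitivity option and the skip patterns combine, the per-line processing can be approximated locally without a cluster. The following is a minimal sketch, not the actual WordCount2 source: the class name WordCountSketch and the method count are hypothetical, and the pattern list assumes the three punctuation patterns plus the literal word "to". Each line is optionally lowercased, every skip pattern is removed as a Java regular expression, and the remainder is split on whitespace.

```java
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical stand-alone sketch; it does not use any Hadoop classes.
class WordCountSketch {

    // Counts words after optional lowercasing and removal of skip patterns.
    // A TreeMap keeps the words sorted, similar to the sorted job output.
    static Map<String, Integer> count(List<String> lines,
                                      List<String> skipPatterns,
                                      boolean caseSensitive) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String raw : lines) {
            String line = caseSensitive ? raw : raw.toLowerCase();
            // Each pattern is treated as a regular expression and deleted.
            for (String pattern : skipPatterns) {
                line = line.replaceAll(pattern, "");
            }
            // Split what is left on whitespace and tally each token.
            StringTokenizer tokens = new StringTokenizer(line);
            while (tokens.hasMoreTokens()) {
                counts.merge(tokens.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "Hello World, Bye World!",
                "Hello Hadoop, Goodbye to hadoop.");
        // Assumed pattern list: punctuation plus the word "to".
        List<String> skip = List.of("\\.", "\\,", "\\!", "to");
        // Equivalent in spirit to -Dwordcount.case.sensitive=false with -skip.
        count(lines, skip, false)
                .forEach((word, n) -> System.out.println(word + " " + n));
    }
}
```

With case sensitivity off, "Hadoop" and "hadoop" fall into the same bucket; passing true as the last argument keeps them as separate keys. Because patterns are removed by plain regex replacement, a pattern such as "to" would also be stripped from inside longer words, which does not matter for this sample input but is worth keeping in mind.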