Sample text files used as input:

    $ bin/hadoop fs -ls /user/joe/wordcount/input/
    /user/joe/wordcount/input/file01
    /user/joe/wordcount/input/file02

    $ bin/hadoop fs -cat /user/joe/wordcount/input/file01
    Hello World, Bye World!

    $ bin/hadoop fs -cat /user/joe/wordcount/input/file02
    Hello Hadoop, Goodbye to hadoop.

Run the application:

    $ bin/hadoop jar wc.jar WordCount2 /user/joe/wordcount/input /user/joe/wordcount/output

Next, plug in a patterns file, distributed via the DistributedCache, that lists the word patterns to ignore:

    $ bin/hadoop fs -cat /user/joe/wordcount/patterns.txt
    \.
    \,
    \!
    to
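Each line of the patterns file is treated as a regular expression, and every match is deleted from an input line before the line is split into words. A minimal Python sketch of that step, run locally rather than on Hadoop (the name `strip_patterns` is hypothetical and not part of WordCount2):

```python
import re

def strip_patterns(line, patterns):
    """Delete every match of each regex in `patterns` from `line`."""
    for pattern in patterns:
        line = re.sub(pattern, "", line)
    return line

# The three punctuation patterns from patterns.txt, one regex per line.
punctuation = [r"\.", r"\,", r"\!"]

print(strip_patterns("Hello World, Bye World!", punctuation))
# prints: Hello World Bye World
```

Without this step, tokens such as `World,` and `World!` would be counted as two different words.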

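Run it again, this time with more options: `-Dwordcount.case.sensitive=false` switches off case sensitivity, and `-skip` passes the patterns file:

    $ bin/hadoop jar wc.jar WordCount2 -Dwordcount.case.sensitive=false /user/joe/wordcount/input /user/joe/wordcount/output -skip /user/joe/wordcount/patterns.txt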

Output:

    $ bin/hadoop fs -cat /user/joe/wordcount/output/part-r-00000
    bye 1
    goodbye 1
    hadoop 2
    hello 2
    world 2
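The combined effect of the two options can be checked locally with a short Python sketch. It is not the Hadoop job itself: `count_words` is a hypothetical helper that lowercases each line when case sensitivity is off, deletes the skip patterns, splits on whitespace, and counts the tokens. The skip list here assumes the three punctuation patterns plus the literal word `to`, which the absence of `to` from the output implies is also skipped.

```python
import re
from collections import Counter

def count_words(lines, skip_patterns, case_sensitive=True):
    """Count whitespace-separated words after removing skip patterns."""
    counts = Counter()
    for line in lines:
        if not case_sensitive:
            line = line.lower()
        for pattern in skip_patterns:
            line = re.sub(pattern, "", line)
        counts.update(line.split())
    # Sort by key, mirroring the sorted order of the reducer output.
    return dict(sorted(counts.items()))

lines = [
    "Hello World, Bye World!",           # contents of file01
    "Hello Hadoop, Goodbye to hadoop.",  # contents of file02
]
skip = [r"\.", r"\,", r"\!", "to"]  # "to" is an assumption, see above

for word, n in count_words(lines, skip, case_sensitive=False).items():
    print(word, n)
```

With `case_sensitive=True`, `Hadoop` and `hadoop` are counted separately (1 each) instead of being merged into `hadoop 2`.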