In this article, we walk through a basic illustration of a MapReduce program that counts how many times each word appears in a text file. This famous Hadoop MapReduce example, often known as the "Word Count" example, is meant to serve as an introduction to the concept. Assume you have some text in a file called input.txt.
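Before looking at the distributed version, it helps to see the computation itself in plain Java. The following is a minimal local sketch (not part of the Hadoop job; the class name `WordCountLocal` and the sample sentence are illustrative) that tokenizes text and counts each word, which is exactly what the MapReduce job computes at scale:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.StringTokenizer;

// A plain-Java sketch of what the MapReduce job computes:
// split the text into tokens and count occurrences of each word.
public class WordCountLocal {

    public static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new HashMap<>();
        StringTokenizer tokenizer = new StringTokenizer(text);
        while (tokenizer.hasMoreTokens()) {
            // Add 1 to the running count for this word.
            counts.merge(tokenizer.nextToken(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts =
                count("deer bear river car car river deer car");
        System.out.println(counts); // e.g. car appears 3 times, deer 2 times
    }
}
```

Hadoop does the same thing, but splits the input across many machines: the Mapper emits intermediate pairs and the Reducer sums them up.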
In Hadoop, the Reducer takes the output of the Mapper (intermediate key-value pairs) and processes each of them to generate the final output of our MapReduce program; this is where we perform the aggregation or summation part of the computation.
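The aggregation described above is implemented by a Reduce class. A standard version for this word-count example, written against the same old `org.apache.hadoop.mapred` API used by the rest of the article and matching the `Reduce.class` the driver below refers to, looks like this (a sketch of the conventional implementation, not code from the original article):

```java
package com.developer.code.examples.hadoop.mapred.wordcount;

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class Reduce extends MapReduceBase
        implements Reducer<Text, IntWritable, Text, IntWritable> {

    // For each word, sum the 1s emitted by the Mapper and
    // emit (word, total count).
    public void reduce(Text key, Iterator<IntWritable> values,
                       OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next().get();
        }
        output.collect(key, new IntWritable(sum));
    }
}
```

Because addition is associative, the same class can also be used as the combiner, which is what the driver does below.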
The Mapper is the first piece of code that runs; it is responsible for transforming the data stored in HDFS blocks into key-value pairs.
```java
package com.developer.code.examples.hadoop.mapred.wordcount;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class Map extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    // For each input line, emit (word, 1) for every token in the line.
    public void map(LongWritable key, Text value,
                    OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            output.collect(word, one);
        }
    }
}
```

Next comes the driver class that sets up and starts the word-counting Hadoop MapReduce job. The driver defines the mapper and reducer classes, sets a number of parameters, and specifies the input and output paths. Lastly, it submits the job for execution.
```java
package com.developer.code.examples.hadoop.mapred.wordcount;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;

public class WordCount {

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("wordcount");

        // Output types of the job.
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        // Mapper, combiner, and reducer classes.
        conf.setMapperClass(Map.class);
        conf.setCombinerClass(Reduce.class);
        conf.setReducerClass(Reduce.class);

        // Input and output formats.
        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        // Input and output paths come from the command line.
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
    }
}
```

Next we need to create a Hadoop JAR (Java Archive). In Eclipse (other tools work as well), this involves building the MapReduce program, exporting it as a JAR file, and then executing it on the Hadoop cluster.
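Once the JAR is exported, the job can be submitted with the standard `hadoop jar` command. The JAR name and HDFS paths below are placeholders; substitute your own:

```shell
# Copy the input file into HDFS (paths are examples).
hadoop fs -mkdir -p /user/hadoop/wordcount/input
hadoop fs -put input.txt /user/hadoop/wordcount/input

# Run the job: JAR file, driver class, input path, output path.
hadoop jar wordcount.jar com.developer.code.examples.hadoop.mapred.wordcount.WordCount \
    /user/hadoop/wordcount/input /user/hadoop/wordcount/output

# Inspect the results.
hadoop fs -cat /user/hadoop/wordcount/output/part-00000
```

Note that the output directory must not already exist; Hadoop refuses to overwrite it and fails the job if it does.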
Hadoop is open source software developed by the Apache Software Foundation (ASF). You can download Hadoop directly from the project website at http://hadoop.apache.org. Cloudera is a company that provides support, consulting, and management tools for Hadoop; Cloudera also offers a software distribution called Cloudera’s Distribution Including Apache Hadoop (CDH). In the word-count example in this article, the term "driver" refers to the main class that orchestrates and configures the entire MapReduce job. The driver program is responsible for setting up the job parameters, specifying the input and output paths, defining the mapper and reducer classes, and managing other job-related configurations.