Talend - Map Reduce
In the previous chapter, we saw how Talend works with Big Data. In this chapter, let us understand how to use MapReduce with Talend.
Creating a Talend MapReduce Job
Let us learn how to run a MapReduce job on Talend. Here we will run a MapReduce word count example.
For this purpose, right-click Job Design and create a new job – MapreduceJob. Mention the details of the job and click Finish.
Adding Components to MapReduce Job
To add components to a MapReduce job, drag and drop five Talend components – tHDFSInput, tNormalize, tAggregateRow, tMap and tHDFSOutput – from the palette to the designer window. Right-click on tHDFSInput and create a Main link to tNormalize.
Right-click tNormalize and create a Main link to tAggregateRow. Then, right-click on tAggregateRow and create a Main link to tMap. Now, right-click on tMap and create a Main link to tHDFSOutput.
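The resulting flow in the designer window is:
tHDFSInput --(Main)--> tNormalize --(Main)--> tAggregateRow --(Main)--> tMap --(Main)--> tHDFSOutput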
Configuring Components and Transformations
In tHDFSInput, select the distribution Cloudera and its version. Note that the Namenode URI should be “hdfs://quickstart.cloudera:8020” and the username should be “cloudera”. In the file name option, give the path of your input file to the MapReduce job. Ensure that this input file is present on HDFS.
Now, select the file type, row separator, field separator and header according to your input file.
Click edit schema and add the field “line” as String type.
In tNormalize, the column to normalize will be line and the Item separator will be whitespace -> “ “. Now, click edit schema. tNormalize will have the line column and tAggregateRow will have 2 columns, word and wordcount, as shown below.
In tAggregateRow, put word as the output column in the Group by option. In Operations, put wordcount as the output column, the function as count and the Input column position as line.
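Taken together, tNormalize and tAggregateRow turn each input line into per-word counts. The following plain Java sketch mirrors that logic outside the Studio – it is not Talend-generated code, and the class name and sample lines are purely illustrative:

import java.util.HashMap;
import java.util.Map;

public class WordCountSketch {
    public static void main(String[] args) {
        // Hypothetical input lines, standing in for the rows read by tHDFSInput
        String[] lines = { "Hello Talend", "Hello MapReduce" };

        Map<String, Integer> wordCount = new HashMap<>();
        for (String line : lines) {
            // tNormalize: split the "line" column on whitespace, one word per output row
            for (String word : line.split("\\s+")) {
                if (!word.isEmpty()) {
                    // tAggregateRow: group by word and apply the count function
                    wordCount.merge(word, 1, Integer::sum);
                }
            }
        }
        wordCount.forEach((w, c) -> System.out.println(w + " " + c));
    }
}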
Now, double-click the tMap component to enter the map editor and map the input with the required output. In this example, word is mapped to word and wordcount is mapped to wordcount. In the expression column, click on […] to enter the expression builder.
Now, select StringHandling from the category list and the UPCASE function. Edit the expression to “StringHandling.UPCASE(row3.word)” and click Ok. Keep row3.wordcount in the expression column corresponding to wordcount as shown below.
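StringHandling is one of Talend's built-in routines, and UPCASE simply returns the upper-cased string. As a rough Java equivalent of the two tMap expressions (the values below are illustrative, standing in for the row3 columns):

public class UpcaseSketch {
    public static void main(String[] args) {
        // Illustrative values standing in for row3.word and row3.wordcount
        String word = "talend";
        int wordcount = 2;

        // StringHandling.UPCASE(row3.word) behaves like String.toUpperCase()
        String outWord = word.toUpperCase();
        System.out.println(outWord + " " + wordcount); // prints: TALEND 2
    }
}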
In tHDFSOutput, connect to the Hadoop cluster we created by setting the Property Type to Repository. Observe that the fields will get auto-populated. In File name, give the output path where you want to store the output. Keep the Action, row separator and field separator as shown below.
Executing the MapReduce Job
Once your configuration is successfully completed, click Run and execute your MapReduce job.
Go to your HDFS path and check the output. Note that all the words will be in uppercase with their wordcount.
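For instance, if the input file contained the two hypothetical lines “Hello Talend” and “Hello MapReduce”, the output part file would hold records along these lines (the exact layout depends on the field separator configured in tHDFSOutput):

HELLO 2
TALEND 1
MAPREDUCE 1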