Talend - Hive
  • Date: 2024-09-17


In this chapter, let us understand how to work with a Hive job in Talend.

Creating a Talend Hive Job

As an example, we will load NYSE data into a Hive table and run a basic Hive query. Right-click on Job Design and create a new job – hivejob. Mention the details of the job and click on Finish.

Hive Job

Adding Components to Hive Job

To add components to a Hive job, drag and drop five Talend components − tHiveConnection, tHiveCreateTable, tHiveLoad, tHiveInput and tLogRow − from the palette to the designer window. Then, right-click tHiveConnection and create an OnSubjobOk trigger to tHiveCreateTable. Now, right-click tHiveCreateTable and create an OnSubjobOk trigger to tHiveLoad. Right-click tHiveLoad and create an Iterate trigger on tHiveInput. Finally, right-click tHiveInput and create a Main line to tLogRow.

Adding Components

Configuring Components and Transformations

In tHiveConnection, select the distribution as Cloudera and the version you are using. Note that the connection mode will be Standalone and the Hive Service will be Hive 2. Also, check that the following parameters are set accordingly −

    Host: “quickstart.cloudera”

    Port: “10000”

    Database: “default”

    Username: “hive”

Note that the password will be auto-filled; you need not edit it. The other Hadoop properties are preset with their default values.
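For reference, these four settings map onto a standard HiveServer2 JDBC connection string. A minimal sketch, assuming the quickstart values listed above (the username and password are passed to the JDBC driver separately, not embedded in the URL):

```python
# Build the Hive JDBC URL corresponding to the tHiveConnection settings above.
# These are the Cloudera QuickStart VM defaults; adjust them for your cluster.
host = "quickstart.cloudera"   # Host parameter
port = 10000                   # default HiveServer2 (Hive 2) port
database = "default"           # Database parameter

jdbc_url = f"jdbc:hive2://{host}:{port}/{database}"
print(jdbc_url)  # jdbc:hive2://quickstart.cloudera:10000/default
```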

Configuring Components

In tHiveCreateTable, select Use an existing connection and put tHiveConnection in Component List. Give the Table Name which you want to create in the default database. Keep the other parameters as shown below.

Hive Create Table
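Under the hood, a component like tHiveCreateTable issues a HiveQL CREATE TABLE statement. The sketch below shows what such a statement might look like for a comma-delimited NYSE file; the table and column names here are illustrative assumptions, not values taken from the job:

```python
# Illustrative HiveQL DDL of the kind tHiveCreateTable generates.
# The table name and columns are assumptions for a typical NYSE dataset.
create_stmt = """
CREATE TABLE IF NOT EXISTS nyse_data (
  stock_symbol STRING,
  trade_date   STRING,
  open_price   FLOAT,
  close_price  FLOAT,
  volume       BIGINT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
""".strip()
print(create_stmt)
```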

In tHiveLoad, select Use an existing connection and put tHiveConnection in Component List. Select LOAD in Load action. In File Path, give the HDFS path of your NYSE input file. In Table Name, mention the table into which you want to load the input. Keep the other parameters as shown below.

Existing Connection
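The LOAD action corresponds to Hive's LOAD DATA statement. A minimal sketch, where the HDFS path and table name are placeholder assumptions standing in for your own values:

```python
# Equivalent HiveQL for the tHiveLoad settings: a LOAD action with an HDFS
# file path and a target table. Both values below are illustrative.
hdfs_path = "/user/cloudera/nyse_input.csv"   # assumed File Path on HDFS
table = "nyse_data"                           # assumed Table Name

load_stmt = f"LOAD DATA INPATH '{hdfs_path}' INTO TABLE {table}"
print(load_stmt)
```

Note that LOAD DATA INPATH moves (rather than copies) the file from its HDFS location into the table's warehouse directory.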

In tHiveInput, select Use an existing connection and put tHiveConnection in Component List. Click Edit schema, and add the columns and their types as shown in the schema snapshot below. Now, give the name of the table which you created in tHiveCreateTable.

In the Query option, put the query which you want to run on the Hive table. Here, we are printing all the columns of the first 10 rows in the test Hive table.

Hive Connection

Schema of tHiveInput
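The query described above is a plain HiveQL SELECT with a LIMIT clause. A small sketch of how such a query is formed ("test" is the table name mentioned in the text; the helper function is purely illustrative):

```python
def limit_query(table: str, n: int) -> str:
    """Build a HiveQL query returning all columns of the first n rows."""
    return f"SELECT * FROM {table} LIMIT {n}"

# The query used in this chapter's tHiveInput component:
print(limit_query("test", 10))  # SELECT * FROM test LIMIT 10
```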

In tLogRow, click Sync columns and select Table mode for showing the output.

Table Mode

Executing the Hive Job

Click on Run to begin the execution. If all the connections and parameters were set correctly, you will see the output of your query as shown below.

Executing Hive Job