Talend - Hive
  • Date: 2024-09-17


In this chapter, let us understand how to work with a Hive job in Talend.

Creating a Talend Hive Job

As an example, we will load NYSE data into a Hive table and run a basic Hive query. Right-click on Job Design and create a new job – hivejob. Mention the details of the job and click on Finish.

Hive Job

Adding Components to Hive Job

To add components to a Hive job, drag and drop five Talend components − tHiveConnection, tHiveCreateTable, tHiveLoad, tHiveInput and tLogRow − from the palette to the designer window. Then, right-click tHiveConnection and create an OnSubjobOk trigger to tHiveCreateTable. Now, right-click tHiveCreateTable and create an OnSubjobOk trigger to tHiveLoad. Right-click tHiveLoad and create an Iterate trigger on tHiveInput. Finally, right-click tHiveInput and create a Main line to tLogRow.

Adding Components

Configuring Components and Transformations

In tHiveConnection, select the distribution as Cloudera and the version you are using. Note that the connection mode will be Standalone and the Hive Service will be Hive 2. Also, check that the following parameters are set accordingly −

    Host: “quickstart.cloudera”

    Port: “10000”

    Database: “default”

    Username: “hive”

Note that the password will be auto-filled; you need not edit it. The other Hadoop properties are preset with their default values.
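For reference, these four settings map onto a standard HiveServer2 JDBC connection string. A minimal sketch, assuming the quickstart values listed above (the username and password are passed to the JDBC driver separately, not embedded in the URL):

```python
# Build the Hive JDBC URL corresponding to the tHiveConnection settings above.
# These are the Cloudera QuickStart VM defaults; adjust them for your cluster.
host = "quickstart.cloudera"   # Host parameter
port = 10000                   # default HiveServer2 (Hive 2) port
database = "default"           # Database parameter

jdbc_url = f"jdbc:hive2://{host}:{port}/{database}"
print(jdbc_url)  # jdbc:hive2://quickstart.cloudera:10000/default
```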

Configuring Components

In tHiveCreateTable, select Use an existing connection and put tHiveConnection in Component List. Give the Table Name which you want to create in the default database. Keep the other parameters as shown below.

Hive Create Table
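Under the hood, a component like tHiveCreateTable issues a HiveQL CREATE TABLE statement. The sketch below shows what such a statement might look like for a comma-delimited NYSE file; the table and column names here are illustrative assumptions, not values taken from the job:

```python
# Illustrative HiveQL DDL of the kind tHiveCreateTable generates.
# The table name and columns are assumptions for a typical NYSE dataset.
create_stmt = """
CREATE TABLE IF NOT EXISTS nyse_data (
  stock_symbol STRING,
  trade_date   STRING,
  open_price   FLOAT,
  close_price  FLOAT,
  volume       BIGINT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
""".strip()
print(create_stmt)
```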

In tHiveLoad, select Use an existing connection and put tHiveConnection in Component List. Select LOAD in Load action. In File Path, give the HDFS path of your NYSE input file. In Table Name, mention the table into which you want to load the input. Keep the other parameters as shown below.

Existing Connection
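The LOAD action corresponds to Hive's LOAD DATA statement. A minimal sketch, where the HDFS path and table name are placeholder assumptions standing in for your own values:

```python
# Equivalent HiveQL for the tHiveLoad settings: a LOAD action with an HDFS
# file path and a target table. Both values below are illustrative.
hdfs_path = "/user/cloudera/nyse_input.csv"   # assumed File Path on HDFS
table = "nyse_data"                           # assumed Table Name

load_stmt = f"LOAD DATA INPATH '{hdfs_path}' INTO TABLE {table}"
print(load_stmt)
```

Note that LOAD DATA INPATH moves (rather than copies) the file from its HDFS location into the table's warehouse directory.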

In tHiveInput, select Use an existing connection and put tHiveConnection in Component List. Click Edit schema, and add the columns and their types as shown in the schema snapshot below. Now, give the name of the table which you created in tHiveCreateTable.

In the Query option, put the query which you want to run on the Hive table. Here, we are printing all the columns of the first 10 rows in the test Hive table.

Hive Connection

Schema of tHiveInput
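The query described above is a plain HiveQL SELECT with a LIMIT clause. A small sketch of how such a query is formed ("test" is the table name mentioned in the text; the helper function is purely illustrative):

```python
def limit_query(table: str, n: int) -> str:
    """Build a HiveQL query returning all columns of the first n rows."""
    return f"SELECT * FROM {table} LIMIT {n}"

# The query used in this chapter's tHiveInput component:
print(limit_query("test", 10))  # SELECT * FROM test LIMIT 10
```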

In tLogRow, click Sync columns and select Table mode for showing the output.

Table Mode

Executing the Hive Job

Click on Run to begin the execution. If all the connections and parameters were set correctly, you will see the output of your query as shown below.

Executing Hive Job