Apache Tajo - Configuration Settings
Tajo’s configuration is based on Hadoop’s configuration system. This chapter explains Tajo configuration settings in detail.
Basic Settings
Tajo uses the following two config files −
catalog-site.xml − configuration for the catalog server.
tajo-site.xml − configuration for other Tajo modules.
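Because Tajo builds on Hadoop's configuration system, both files are expected to follow Hadoop's XML layout: a single configuration root element that holds one property element per setting. The sketch below shows that layout for “tajo-site.xml”, reusing the tajo.rootdir property from this chapter. The host name and port are placeholders, not defaults.

```xml
<?xml version="1.0"?>
<!-- tajo-site.xml: minimal skeleton, assuming the Hadoop-style layout. -->
<configuration>
  <property>
    <!-- Root directory for Tajo data; replace hostname and port
         with the values of your own HDFS NameNode. -->
    <name>tajo.rootdir</name>
    <value>hdfs://hostname:port/tajo</value>
  </property>
</configuration>
```

The property snippets in the rest of this chapter would go inside the configuration element of the relevant file.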
Distributed Mode Configuration
In distributed mode, Tajo runs on top of the Hadoop Distributed File System (HDFS). Follow the steps below to configure a distributed setup.
tajo-site.xml
This file is located in the /path/to/tajo/conf directory and configures the Tajo modules other than the catalog server. To run Tajo in distributed mode, add the following properties to “tajo-site.xml”.
<property>
  <name>tajo.rootdir</name>
  <value>hdfs://hostname:port/tajo</value>
</property>
<property>
  <name>tajo.master.umbilical-rpc.address</name>
  <value>hostname:26001</value>
</property>
<property>
  <name>tajo.master.client-rpc.address</name>
  <value>hostname:26002</value>
</property>
<property>
  <name>tajo.catalog.client-rpc.address</name>
  <value>hostname:26005</value>
</property>
Master Node Configuration
Tajo uses HDFS as its primary storage. Set the root directory by adding the following property to “tajo-site.xml”.
<property>
  <name>tajo.rootdir</name>
  <value>hdfs://namenode_hostname:port/path</value>
</property>
Catalog Configuration
If you want to customize the catalog service, copy $path/to/Tajo/conf/catalog-site.xml.template to $path/to/Tajo/conf/catalog-site.xml and add any of the following settings as needed.
For example, to use the Hive catalog store, the configuration should look like the following −
<property>
  <name>tajo.catalog.store.class</name>
  <value>org.apache.tajo.catalog.store.HCatalogStore</value>
</property>
To store the catalog in MySQL, apply the following changes −
<property>
  <name>tajo.catalog.store.class</name>
  <value>org.apache.tajo.catalog.store.MySQLStore</value>
</property>
<property>
  <name>tajo.catalog.jdbc.connection.id</name>
  <value><mysql user name></value>
</property>
<property>
  <name>tajo.catalog.jdbc.connection.password</name>
  <value><mysql user password></value>
</property>
<property>
  <name>tajo.catalog.jdbc.uri</name>
  <value>jdbc:mysql://<mysql host name>:<mysql port>/<database name for tajo>?createDatabaseIfNotExist=true</value>
</property>
Similarly, you can register any other Tajo-supported catalog store in the configuration file.
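As an illustration of registering another store, the sketch below adapts the MySQL settings to PostgreSQL. It assumes that your Tajo release ships an org.apache.tajo.catalog.store.PostgreSQLStore class and that the PostgreSQL JDBC driver is on Tajo's classpath; check both against your version before relying on it. The user name, password, host, port, and database name are hypothetical placeholders.

```xml
<!-- catalog-site.xml fragment: PostgreSQL-backed catalog (a sketch). -->
<!-- Assumption: PostgreSQLStore is available in your Tajo release. -->
<property>
  <name>tajo.catalog.store.class</name>
  <value>org.apache.tajo.catalog.store.PostgreSQLStore</value>
</property>
<property>
  <!-- Hypothetical database user. -->
  <name>tajo.catalog.jdbc.connection.id</name>
  <value>tajo_user</value>
</property>
<property>
  <!-- Hypothetical password; replace with your own. -->
  <name>tajo.catalog.jdbc.connection.password</name>
  <value>change-me</value>
</property>
<property>
  <!-- Hypothetical host, port, and database name. -->
  <name>tajo.catalog.jdbc.uri</name>
  <value>jdbc:postgresql://dbhost:5432/tajo_catalog</value>
</property>
```

Only the store class and the JDBC URI scheme differ from the MySQL example; the connection properties keep the same names.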
Worker Configuration
By default, the TajoWorker stores temporary data on the local file system. The storage locations are defined in the “tajo-site.xml” file as follows −
<property>
  <name>tajo.worker.tmpdir.locations</name>
  <value>/disk1/tmpdir,/disk2/tmpdir,/disk3/tmpdir</value>
</property>
To increase the number of tasks each worker can run, adjust its resource settings as follows −
<property>
  <name>tajo.worker.resource.cpu-cores</name>
  <value>12</value>
</property>
<property>
  <name>tajo.task.resource.min.memory-mb</name>
  <value>2000</value>
</property>
<property>
  <name>tajo.worker.resource.disks</name>
  <value>4</value>
</property>
To run the Tajo worker in dedicated mode, use the following configuration −
<property>
  <name>tajo.worker.resource.dedicated</name>
  <value>true</value>
</property>