TIKA - Environment
  • Date: 2024-09-17


This chapter takes you through the process of setting up Apache Tika on Windows and Linux. Administrator privileges are needed while installing Apache Tika.

System Requirements

JDK                        Java SE 2 JDK 1.6 or above
Memory                     1 GB RAM (recommended)
Disk Space                 No minimum requirement
Operating System Version   Windows XP or above, Linux

Step 1: Verifying Java Installation

To verify Java installation, open the console and execute the following java command.

OS        Task                    Command
Windows   Open command console    > java -version
Linux     Open command terminal   $ java -version

If Java has been installed properly on your system, then you should get one of the following outputs, depending on the platform you are working on.

OS        Output
Windows   java version "1.7.0_60"
          Java(TM) SE Runtime Environment (build 1.7.0_60-b19)
          Java HotSpot(TM) 64-Bit Server VM (build 24.60-b09, mixed mode)
Linux     java version "1.7.0_25"
          OpenJDK Runtime Environment (rhel-2.3.10.4.el6_4-x86_64)
          OpenJDK 64-Bit Server VM (build 23.7-b01, mixed mode)
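
The check against the JDK 1.6 requirement can also be scripted. The following is a minimal sketch, not part of the original tutorial: the helper names java_version_of and meets_minimum are hypothetical, and it assumes the first line of the output contains the quoted version string as in the samples above.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only.

# Extract the quoted version from the first line of `java -version` output,
# e.g. 'java version "1.7.0_60"' gives 1.7.0_60.
java_version_of() {
  printf '%s\n' "$1" | sed -n '1s/.*version "\([^"]*\)".*/\1/p'
}

# Succeed if the version is 1.6 or newer.
# Legacy releases are numbered 1.x; later ones start with the feature number (9, 11, ...).
meets_minimum() {
  case "$1" in
    1.*) feature=${1#1.}; feature=${feature%%.*} ;;
    *)   feature=${1%%.*} ;;
  esac
  feature=${feature%%[!0-9]*}          # drop suffixes such as "-ea"
  [ "$feature" -ge 6 ] 2>/dev/null     # fails for empty or unrecognized input
}

if command -v java >/dev/null 2>&1; then
  # java -version writes to stderr, so redirect it into the capture.
  v=$(java_version_of "$(java -version 2>&1)")
  if meets_minimum "$v"; then
    echo "Java $v satisfies the 1.6 minimum"
  else
    echo "Java '$v' is too old or could not be recognized"
  fi
else
  echo "java not found on PATH"
fi
```

The version parsing is kept separate from the call to java so that it can be tried on a pasted output line without a JDK installed.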

Step 2: Setting Java Environment

Set the JAVA_HOME environment variable to point to the base directory location where Java is installed on your machine. For example,

OS        Action
Windows   Set the environment variable JAVA_HOME to C:\Program Files\Java\jdk1.7.0_60
Linux     export JAVA_HOME=/usr/local/java-current

Append the full path of the Java compiler location to the System Path.

OS        Action
Windows   Append the string ;C:\Program Files\Java\jdk1.7.0_60\bin to the end of the system variable PATH.
Linux     export PATH=$PATH:$JAVA_HOME/bin/

Verify the installation by running java -version from the command prompt, as explained above.
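
On Linux, a common mistake is pointing JAVA_HOME at a directory that does not actually contain the JDK. As a rough sketch (the function name check_java_home is hypothetical, and it assumes the usual layout with the launcher at bin/java), the setting can be sanity-checked like this:

```shell
#!/bin/sh
# Hypothetical helper: report whether a directory looks like a usable Java home,
# i.e. it contains an executable bin/java.
check_java_home() {
  if [ -z "$1" ]; then
    echo "JAVA_HOME is not set"
    return 1
  elif [ -x "$1/bin/java" ]; then
    echo "OK: $1/bin/java"
  else
    echo "No executable bin/java under $1"
    return 1
  fi
}

# Check the current environment; do not abort the script if the check fails.
check_java_home "${JAVA_HOME:-}" || true
```

Note that the shell does not allow spaces around the = sign in an export statement, which is why the assignments above are written as JAVA_HOME=/usr/local/java-current.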

Step 3: Setting up Apache Tika Environment

Programmers can integrate Apache Tika in their environment by using

    Command line,

    Tika API,

    Command line interface (CLI) of Tika,

    Graphical User Interface (GUI) of Tika, or

    the source code.

For any of these approaches, first of all, you have to download the source code of Tika.

The source code of Tika is available at https://Tika.apache.org/download.html, where you will find two links:

    apache-tika-1.6-src.zip: contains the source code of Tika, and

    tika-app-1.6.jar: a jar file that contains the Tika application.

Download these two files. A snapshot of the official website of Tika is shown below.

[Image: snapshot of the Tika download page]

After downloading the files, set the classpath for the jar file tika-app-1.6.jar. Add the complete path of the jar file as shown in the table below.

OS        Action
Windows   Append the string C:\jars\tika-app-1.6.jar to the user environment variable CLASSPATH
Linux     export CLASSPATH=$CLASSPATH:/usr/share/jars/tika-app-1.6.jar
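
To confirm on Linux that the jar really ended up on the class path, the variable can be inspected entry by entry. This is a small sketch under the assumption that entries are separated by colons; classpath_has_jar is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: succeed if some colon-separated entry of the class path ($1)
# is the jar ($2) itself or a path ending in /<jar>.
classpath_has_jar() {
  case ":$1:" in
    *"/$2:"*|*":$2:"*) return 0 ;;
    *)                 return 1 ;;
  esac
}

if classpath_has_jar "${CLASSPATH:-}" "tika-app-1.6.jar"; then
  echo "tika-app-1.6.jar is on the class path"
else
  echo "tika-app-1.6.jar is not on the class path"
fi
```

Wrapping the value in colons lets a single pattern match the first, middle, and last entries alike, and it rejects near misses such as a file with a different extension.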

Apache provides the Tika application, a Graphical User Interface (GUI) application, which can be built using Eclipse.
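
The downloaded jar can also be run directly, without Eclipse. The wrapper below is only a sketch: the function name tika, the TIKA_JAR variable, its default location, and the file sample.pdf are all assumptions made for illustration. The --gui, --text, and --metadata options are part of the tika-app command line.

```shell
#!/bin/sh
# Assumed location of the downloaded jar; adjust to wherever you saved it.
TIKA_JAR=${TIKA_JAR:-/usr/share/jars/tika-app-1.6.jar}

# Hypothetical wrapper: run tika-app with the given arguments,
# or explain why it cannot run.
tika() {
  if [ ! -f "$TIKA_JAR" ]; then
    echo "tika-app jar not found at $TIKA_JAR"
    return 1
  fi
  java -jar "$TIKA_JAR" "$@"
}

# Typical invocations (uncomment to try; sample.pdf is a placeholder file name):
# tika --gui                   # open the graphical interface
# tika --text sample.pdf       # print the extracted plain text
# tika --metadata sample.pdf   # print the document metadata
```

Using java -jar means the jar does not have to be on CLASSPATH for these commands to work.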

Tika-Maven Build using Eclipse

[Image: table of m2e releases]

    Pick the latest version and save the URL listed in its p2 url column.

    Now return to Eclipse. In the menu bar, click Help, and choose Install New Software from the dropdown menu.

[Image: Eclipse]

    Click the Add button and type any name you like; the name is optional. Now paste the saved URL in the Location field.

    A new plugin will be added with the name you chose in the previous step. Check the checkbox in front of it, and click Next.

[Image: Install]

    Proceed with the installation. Once it completes, restart Eclipse.

    Now right-click on the project and, in the Configure option, select Convert to Maven Project.

    A new wizard for creating a new POM appears. Enter the Group Id as org.apache.tika, enter the latest version of Tika, select the packaging as jar, and click Finish.

The Maven plugin is now installed, and your project is converted into a Maven project. Now you have to configure the pom.xml file.

Configure the XML File

Get the Tika maven dependency from https://mvnrepository.com/artifact/org.apache.tika

Shown below are the Maven dependencies of Apache Tika. Each artifact needs its own dependency element.

<dependencies>
   <dependency>
      <groupId>org.apache.tika</groupId>
      <artifactId>tika-core</artifactId>
      <version>1.6</version>
   </dependency>
   <dependency>
      <groupId>org.apache.tika</groupId>
      <artifactId>tika-parsers</artifactId>
      <version>1.6</version>
   </dependency>
   <dependency>
      <!-- Top-level Tika artifact; it is published as a POM rather than a jar -->
      <groupId>org.apache.tika</groupId>
      <artifactId>tika</artifactId>
      <version>1.6</version>
      <type>pom</type>
   </dependency>
   <dependency>
      <groupId>org.apache.tika</groupId>
      <artifactId>tika-serialization</artifactId>
      <version>1.6</version>
   </dependency>
   <dependency>
      <groupId>org.apache.tika</groupId>
      <artifactId>tika-app</artifactId>
      <version>1.6</version>
   </dependency>
   <dependency>
      <groupId>org.apache.tika</groupId>
      <artifactId>tika-bundle</artifactId>
      <version>1.6</version>
   </dependency>
</dependencies>
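
Once pom.xml is configured, it can be useful to check that Maven resolves the Tika artifacts. The sketch below assumes that command-line Maven (mvn) is installed alongside Eclipse, which this chapter does not cover; show_tika_dependencies is a hypothetical helper name. It relies on the dependency:tree goal of the Maven Dependency Plugin, filtered to the org.apache.tika group.

```shell
#!/bin/sh
# Hypothetical helper: list the Tika artifacts Maven resolves for the project
# in the given directory (defaults to the current directory).
show_tika_dependencies() {
  dir=${1:-.}
  if [ ! -f "$dir/pom.xml" ]; then
    echo "no pom.xml in $dir"
    return 1
  fi
  if ! command -v mvn >/dev/null 2>&1; then
    echo "mvn not found on PATH"
    return 1
  fi
  # Run in a subshell so the caller's working directory is unchanged.
  (cd "$dir" && mvn dependency:tree -Dincludes=org.apache.tika)
}
```

Call it as show_tika_dependencies /path/to/project, replacing the path with your own project directory.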