TIKA Tutorial
TIKA Examples
TIKA Useful Resources
Selected Reading
- TIKA - GUI
- TIKA - Language Detection
- TIKA - Metadata Extraction
- TIKA - Content Extraction
- TIKA - Document Type Detection
- TIKA - File Formats
- TIKA - Referenced API
- TIKA - Environment
- TIKA - Architecture
- TIKA - Overview
- TIKA - Home
TIKA Examples
- TIKA - Extracting mp3 Files
- TIKA - Extracting mp4 Files
- TIKA - Extracting Image File
- TIKA - Extracting JAR File
- TIKA - Extracting .class File
- TIKA - Extracting XML Document
- TIKA - Extracting HTML Document
- TIKA - Extracting Text Document
- TIKA - Extracting MS-Office Files
- TIKA - Extracting ODF
- TIKA - Extracting PDF
TIKA Useful Resources
Selected Reading
- Who is Who
- Computer Glossary
- HR Interview Questions
- Effective Resume Writing
- Questions and Answers
- UPSC IAS Exams Notes
TIKA - File Formats
TIKA - File Formats
File Formats Supported by Tika
The following table shows the file formats Tika supports.
File format | Package Library | Class in Tika |
---|---|---|
XML | org.apache.tika.parser.xml | XMLParser |
HTML | org.apache.tika.parser.html and it uses Tagsoup Library | HtmlParser |
MS-Office compound document Ole2 till 2007 ooxml 2007 onwards | org.apache.tika.parser.microsoft org.apache.tika.parser.microsoft.ooxml and it uses Apache Poi pbrary |
OfficeParser(ole2) OOXMLParser (ooxml) |
OpenDocument Format openoffice | org.apache.tika.parser.odf | OpenOfficeParser |
portable Document Format(PDF) | org.apache.tika.parser.pdf and this package uses Apache PdfBox pbrary | PDFParser |
Electronic Pubpcation Format (digital books) | org.apache.tika.parser.epub | EpubParser |
Rich Text format | org.apache.tika.parser.rtf | RTFParser |
Compression and packaging formats | org.apache.tika.parser.pkg and this package uses Common compress pbrary | PackageParser and CompressorParser and its sub-classes |
Text format | org.apache.tika.parser.txt | TXTParser |
Feed and syndication formats | org.apache.tika.parser.feed | FeedParser |
Audio formats | org.apache.tika.parser.audio and org.apache.tika.parser.mp3 | AudioParser MidiParser Mp3- for mp3parser |
Imageparsers | org.apache.tika.parser.jpeg | JpegParser-for jpeg images |
Videoformats | org.apache.tika.parser.mp4 and org.apache.tika.parser.video this parser internally uses Simple Algorithm to parse flash video formats | Mp4parser FlvParser |
java class files and jar files | org.apache.tika.parser.asm | ClassParser CompressorParser |
Mobxformat (email messages) | org.apache.tika.parser.mbox | MobXParser |
Cad formats | org.apache.tika.parser.dwg | DWGParser |
FontFormats | org.apache.tika.parser.font | TrueTypeParser |
executable programs and pbraries | org.apache.tika.parser.executable | ExecutableParser |