English 中文(简体)
TIKA - File Formats
  • 时间:2024-11-03

TIKA - File Formats


Previous Page Next Page  

File Formats Supported by Tika

The following table shows the file formats Tika supports.

File format Package Library Class in Tika
XML org.apache.tika.parser.xml XMLParser
HTML org.apache.tika.parser.html and it uses Tagsoup Library HtmlParser
MS-Office compound document Ole2 till 2007 ooxml 2007 onwards

org.apache.tika.parser.microsoft

org.apache.tika.parser.microsoft.ooxml and it uses Apache Poi pbrary

OfficeParser(ole2)

OOXMLParser (ooxml)

OpenDocument Format openoffice org.apache.tika.parser.odf OpenOfficeParser
portable Document Format(PDF) org.apache.tika.parser.pdf and this package uses Apache PdfBox pbrary PDFParser
Electronic Pubpcation Format (digital books) org.apache.tika.parser.epub EpubParser
Rich Text format org.apache.tika.parser.rtf RTFParser
Compression and packaging formats org.apache.tika.parser.pkg and this package uses Common compress pbrary PackageParser and CompressorParser and its sub-classes
Text format org.apache.tika.parser.txt TXTParser
Feed and syndication formats org.apache.tika.parser.feed FeedParser
Audio formats org.apache.tika.parser.audio and org.apache.tika.parser.mp3 AudioParser MidiParser Mp3- for mp3parser
Imageparsers org.apache.tika.parser.jpeg JpegParser-for jpeg images
Videoformats org.apache.tika.parser.mp4 and org.apache.tika.parser.video this parser internally uses Simple Algorithm to parse flash video formats Mp4parser FlvParser
java class files and jar files org.apache.tika.parser.asm ClassParser CompressorParser
Mobxformat (email messages) org.apache.tika.parser.mbox MobXParser
Cad formats org.apache.tika.parser.dwg DWGParser
FontFormats org.apache.tika.parser.font TrueTypeParser
executable programs and pbraries org.apache.tika.parser.executable ExecutableParser
Advertisements