Apache POI Word Tutorial
Apache POI Word - Text Extraction
This chapter explains how to extract plain text from a Word document using Java. If you want to extract metadata from a Word document, use Apache Tika.
For .docx files, we use the class org.apache.poi.xwpf.extractor.XWPFWordExtractor, which extracts and returns the plain text of a Word file. Similar approaches exist for extracting headings, footnotes, table data, etc. from a Word file.
The following code shows how to extract simple text from a Word file −
import java.io.FileInputStream;
import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

public class WordExtractor {
   public static void main(String[] args) throws Exception {
      // Open the .docx file; the stream and the document are closed automatically
      try (FileInputStream in = new FileInputStream("createparagraph.docx");
           XWPFDocument docx = new XWPFDocument(in)) {
         // Use the XWPFWordExtractor class to read the document's plain text
         XWPFWordExtractor we = new XWPFWordExtractor(docx);
         System.out.println(we.getText());
      }
   }
}
Save the above code as WordExtractor.java. Compile and execute it from the command prompt as follows −
$javac WordExtractor.java
$java WordExtractor
It will generate the following output −
At tutorialspoint.com, we strive hard to provide quality tutorials for self-learning purpose in the domains of Academics, Information Technology, Management and Computer Programming Languages.
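To see what "simple text" means for a .docx file, it helps to know that such a file is a ZIP archive whose main body is stored in the entry word/document.xml, with the visible text held in w:t elements inside paragraphs (w:p). The following is a minimal sketch of that idea using only the JDK (Java 9 or later), with no POI dependency. It is an illustration, not how XWPFWordExtractor is implemented. The class DocxTextSketch and the helpers sampleDocx and extractText are hypothetical names, and the sample archive contains only the one XML part, so it is not a complete Word file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class DocxTextSketch {

   // Namespace of the WordprocessingML elements (w:p, w:r, w:t)
   static final String W_NS =
         "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

   // Hypothetical helper: builds an in-memory archive holding only
   // word/document.xml. It is NOT a complete, openable Word file.
   static byte[] sampleDocx() {
      String xml = "<w:document xmlns:w=\"" + W_NS + "\"><w:body>"
            + "<w:p><w:r><w:t>Hello</w:t></w:r>"
            + "<w:r><w:t xml:space=\"preserve\"> POI</w:t></w:r></w:p>"
            + "<w:p><w:r><w:t>Second paragraph</w:t></w:r></w:p>"
            + "</w:body></w:document>";
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
         zip.putNextEntry(new ZipEntry("word/document.xml"));
         zip.write(xml.getBytes(StandardCharsets.UTF_8));
         zip.closeEntry();
      } catch (IOException e) {
         throw new UncheckedIOException(e);
      }
      return bytes.toByteArray();
   }

   // Hypothetical helper: returns the text of every paragraph, one per line
   static String extractText(byte[] docx) {
      try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(docx))) {
         ZipEntry entry;
         while ((entry = zip.getNextEntry()) != null) {
            if (!entry.getName().equals("word/document.xml")) {
               continue;
            }
            // Copy the entry first so the XML parser cannot close the ZIP stream
            byte[] xml = zip.readAllBytes();
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Element root = factory.newDocumentBuilder()
                  .parse(new ByteArrayInputStream(xml)).getDocumentElement();
            StringBuilder text = new StringBuilder();
            NodeList paragraphs = root.getElementsByTagNameNS(W_NS, "p");
            for (int i = 0; i < paragraphs.getLength(); i++) {
               // Join all text runs that belong to this paragraph
               NodeList texts = ((Element) paragraphs.item(i))
                     .getElementsByTagNameNS(W_NS, "t");
               for (int j = 0; j < texts.getLength(); j++) {
                  text.append(texts.item(j).getTextContent());
               }
               text.append('\n');
            }
            return text.toString();
         }
         return ""; // no main document part found
      } catch (IOException | ParserConfigurationException | SAXException e) {
         throw new IllegalStateException("Could not read document", e);
      }
   }

   public static void main(String[] args) {
      System.out.print(extractText(sampleDocx()));
   }
}
```

Running the sketch prints the two sample paragraphs, "Hello POI" and "Second paragraph", each on its own line. For real documents, XWPFWordExtractor remains the better choice, since a hand-rolled reader like this one ignores headers, footers, footnotes and other parts that are stored outside the main document entry.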