PDFBox - Overview
The Portable Document Format (PDF) is a file format that presents data in a manner that is independent of application software, hardware, and operating systems.
Each PDF file holds a description of a fixed-layout flat document, including the text, fonts, graphics, and other information needed to display it.
Several libraries are available for creating and manipulating PDF documents programmatically, such as −
Adobe PDF Library − This library provides APIs in languages such as C++, .NET and Java; using it, we can edit, view, print and extract text from PDF documents.
Formatting Objects Processor (Apache FOP) − An open-source, output-independent print formatter driven by XSL Formatting Objects (XSL-FO). Its primary output target is PDF.
iText − This library provides APIs in languages such as Java, C# and other .NET languages; using it, we can create and manipulate PDF, RTF and HTML documents.
JasperReports − This is a Java reporting tool that generates reports as PDF documents, as well as in Microsoft Excel, RTF, ODT, comma-separated values (CSV) and XML formats.
What is PDFBox
Apache PDFBox is an open-source Java library that supports the development and conversion of PDF documents. Using this library, you can develop Java programs that create, convert and manipulate PDF documents.
In addition, PDFBox includes a command-line utility, packaged as a runnable JAR file, for performing various operations on PDF files.
Features of PDFBox
Following are the notable features of PDFBox −
Extract Text − Using PDFBox, you can extract Unicode text from PDF files.
Split & Merge − Using PDFBox, you can divide a single PDF file into multiple files and merge them back into a single file.
Fill Forms − Using PDFBox, you can fill in the form fields of a document.
Print − Using PDFBox, you can print a PDF file using the standard Java printing API.
Save as Image − Using PDFBox, you can save PDFs as image files, such as PNG or JPEG.
Create PDFs − Using PDFBox, you can create new PDF files from Java programs, including images and fonts.
Signing − Using PDFBox, you can add digital signatures to PDF files.
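To make two of these features concrete (Create PDFs and Extract Text), here is a minimal sketch that builds a one-page PDF containing a line of text, saves it to memory, reloads it and extracts the text again. It assumes PDFBox 2.0.x (artifact `org.apache.pdfbox:pdfbox`) is on the classpath; in PDFBox 3.x, documents are loaded with `Loader.loadPDF(...)` and standard fonts are constructed differently, so this exact code targets 2.x only. The class name `PdfBoxRoundTrip` and the method `createAndExtract` are hypothetical names chosen for this illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.font.PDType1Font;
import org.apache.pdfbox.text.PDFTextStripper;

// Hypothetical example class; assumes the PDFBox 2.0.x API.
public class PdfBoxRoundTrip {

    // Creates a one-page PDF showing `message`, saves it to a byte array,
    // reloads it, and returns the text that PDFTextStripper extracts.
    public static String createAndExtract(String message) throws IOException {
        byte[] pdfBytes;
        try (PDDocument doc = new PDDocument()) {
            PDPage page = new PDPage(); // default page size (US Letter)
            doc.addPage(page);

            // Write one line of text with a standard 14 font (Helvetica).
            try (PDPageContentStream content = new PDPageContentStream(doc, page)) {
                content.beginText();
                content.setFont(PDType1Font.HELVETICA, 12);
                content.newLineAtOffset(72, 700); // position in points from bottom-left
                content.showText(message);
                content.endText();
            }

            // Save to memory instead of a file to keep the sketch self-contained.
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            doc.save(out);
            pdfBytes = out.toByteArray();
        }

        // Reload the saved bytes and extract the text content.
        try (PDDocument reloaded = PDDocument.load(pdfBytes)) {
            return new PDFTextStripper().getText(reloaded);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(createAndExtract("Hello, PDFBox"));
    }
}
```

Saving to a `ByteArrayOutputStream` avoids touching the file system; in a real program you would typically call `doc.save(new File("out.pdf"))` and load from that file instead.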
Applications of PDFBox
The following are some applications that use PDFBox −
Apache Nutch − Apache Nutch is an open-source web-search software. It builds on Apache Lucene, adding web-specifics, such as a crawler, a link-graph database, parsers for HTML and other document formats, etc.
Apache Tika − Apache Tika is a toolkit for detecting and extracting metadata and structured text content from various documents using existing parser libraries.
Components of PDFBox
The following are the four main components of PDFBox −
PDFBox − This is the main part of PDFBox. It contains the classes and interfaces related to content extraction and manipulation.
FontBox − This contains the classes and interfaces related to fonts; using these classes, we can work with the fonts used in a PDF document.
XmpBox − This contains the classes and interfaces that handle XMP metadata.
Preflight − This component is used to verify PDF files against the PDF/A-1b standard.