Lucene - Analysis
In one of our previous chapters, we saw that Lucene uses IndexWriter to analyze Document(s) using an Analyzer and then creates, opens, or edits indexes as required. In this chapter, we discuss the various types of Analyzer objects and other relevant objects that are used during the analysis process. Understanding the analysis process and how analyzers work will give you insight into how Lucene indexes documents.
Following is the list of objects that we'll discuss in due course.
S.No. | Class & Description |
---|---|
1 | Token: Represents text or a word in a document along with relevant details such as its metadata (position, start offset, end offset, token type, and position increment). |
2 | TokenStream: The output of the analysis process, consisting of a series of tokens. It is an abstract class. |
3 | Analyzer: The abstract base class for every type of Analyzer. |
4 | WhitespaceAnalyzer: Splits the text in a document based on whitespace. |
5 | SimpleAnalyzer: Splits the text in a document based on non-letter characters and puts the text in lowercase. |
6 | StopAnalyzer: Works just like the SimpleAnalyzer and also removes common words like a, an, the, etc. |
7 | StandardAnalyzer: The most sophisticated analyzer, capable of handling names, email addresses, etc. It lowercases each token and removes common words and punctuation, if any. |
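To make the differences between the whitespace-based, letter-based, and stop-word-removing analyzers concrete, the following is a minimal plain-Java sketch of the splitting rules described in the table. It does not use the Lucene API: the class `AnalysisSketch`, the `Tok` holder, the helper methods, and the three-word stop list are all hypothetical names and assumptions chosen for illustration, and the real analyzers may differ in detail.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only: these helpers are hypothetical and are NOT Lucene classes.
class AnalysisSketch {

    /** Hypothetical stand-in for a token: the term plus its character offsets. */
    static final class Tok {
        final String term;
        final int start; // offset of the first character in the input
        final int end;   // offset one past the last character

        Tok(String term, int start, int end) {
            this.term = term;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return term + "[" + start + "," + end + ")";
        }
    }

    // Assumed stop-word list, taken from the examples in the table above.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("a", "an", "the"));

    // Collects every match of tokenPattern as a token, optionally lowercased.
    private static List<Tok> scan(Pattern tokenPattern, String text, boolean lowercase) {
        List<Tok> out = new ArrayList<>();
        Matcher m = tokenPattern.matcher(text);
        while (m.find()) {
            String term = lowercase ? m.group().toLowerCase(Locale.ROOT) : m.group();
            out.add(new Tok(term, m.start(), m.end()));
        }
        return out;
    }

    // Whitespace-style: runs of non-whitespace, case and punctuation preserved.
    static List<Tok> whitespace(String text) {
        return scan(Pattern.compile("\\S+"), text, false);
    }

    // Simple-style: runs of letters only, lowercased.
    static List<Tok> simple(String text) {
        return scan(Pattern.compile("\\p{L}+"), text, true);
    }

    // Stop-style: same as simple(), then common words are dropped.
    static List<Tok> stop(String text) {
        List<Tok> out = simple(text);
        out.removeIf(t -> STOP_WORDS.contains(t.term));
        return out;
    }

    // Extracts just the term text from a list of tokens.
    static List<String> terms(List<Tok> toks) {
        List<String> out = new ArrayList<>();
        for (Tok t : toks) {
            out.add(t.term);
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "The quick-brown Fox jumped over a lazy dog.";
        System.out.println(terms(whitespace(text)));
        System.out.println(terms(simple(text)));
        System.out.println(terms(stop(text)));
        System.out.println(stop(text)); // terms together with their offsets
    }
}
```

Running `main` on the sample sentence shows that the whitespace rule keeps `quick-brown` and `dog.` intact, the letter-based rule splits and lowercases them, and the stop-word rule additionally drops `the` and `a`. The offsets stored in `Tok` stay relative to the original text, which mirrors the start and end offset metadata that a Token carries.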