Lucene - Quick Guide
Lucene - Overview
Lucene is a simple yet powerful Java-based search library. It can be used in any application to add search capability. Lucene is an open-source, scalable, high-performance library that can index and search virtually any kind of text. It provides the two core operations required by any search application: indexing and searching.
How Does a Search Application Work?
A search application performs all or a few of the following operations:
Step | Title | Description |
---|---|---|
1 | Acquire Raw Content | The first step of any search application is to collect the target content on which the search is to be conducted. |
2 | Build the Document | The next step is to build the document(s) from the raw content in a form that the search application can understand and interpret easily. |
3 | Analyze the Document | Before the indexing process starts, the document is analyzed to decide which parts of the text are candidates for indexing. |
4 | Index the Document | Once documents are built and analyzed, the next step is to index them so that a document can be retrieved based on certain keys instead of its entire content. An index is similar to the index at the end of a book, where common words are listed with their page numbers so that they can be found quickly without reading the complete book. |
5 | User Interface for Search | Once a database of indexes is ready, the application can run searches. To let a user search, the application must provide a user interface where the user can enter text and start the search process. |
6 | Build Query | Once a user makes a request to search for some text, the application should prepare a Query object from that text, which can be used to query the index database for the relevant details. |
7 | Search Query | Using the Query object, the index database is checked to get the relevant details and the matching documents. |
8 | Render Results | Once the result is received, the application should decide how to show the results to the user through the user interface, for example how much information to show at first glance. |
Apart from these basic operations, a search application can also provide an administration user interface that helps administrators control the level of search based on user profiles. Analytics of search results is another important and advanced aspect of any search application.
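The book-index analogy from step 4 can be sketched in plain Java. The following is a toy inverted index, assuming whitespace/punctuation tokenization and single-word queries; the class name InvertedIndexSketch and its methods are hypothetical and are not part of Lucene, which uses far more sophisticated data structures.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy inverted index: maps each word to the ids of the documents containing it.
// Illustrative only; this is not how Lucene is implemented internally.
class InvertedIndexSketch {
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Steps 3 and 4: split the text into lowercase words and record the document id for each word.
    void addDocument(int docId, String text) {
        for (String token : text.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Steps 6 and 7: look a single word up in the index instead of scanning every document.
    List<Integer> search(String word) {
        Set<Integer> ids = postings.getOrDefault(word.toLowerCase(), Collections.emptySet());
        return new ArrayList<>(ids);
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.addDocument(1, "Lucene is a search library");
        index.addDocument(2, "A book index lists words with page numbers");
        index.addDocument(3, "Search the index, not the whole book");
        System.out.println(index.search("index"));  // [2, 3]
        System.out.println(index.search("search")); // [1, 3]
        System.out.println(index.search("tomcat")); // []
    }
}
```

The lookup cost depends on the number of distinct words rather than on the total size of the documents, which is the reason for building an index before searching.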
Lucene's Role in a Search Application
Lucene plays a role in steps 2 to 7 mentioned above and provides classes for the required operations. In a nutshell, Lucene is the heart of a search application and provides the vital operations pertaining to indexing and searching. Acquiring content and displaying the results are left to the application.
In the chapters that follow, we will build a simple search application using the Lucene search library.
Lucene - Environment Setup
This chapter will guide you on how to prepare a development environment to start your work with Lucene. It also covers how to set up the JDK and Eclipse on your machine before you set up the Lucene libraries.
Step 1 - Java Development Kit (JDK) Setup
You can download the latest version of the JDK from Oracle's Java site. You will find instructions for installing the JDK in the downloaded files; follow them to install and configure the setup. Finally, set the PATH and JAVA_HOME environment variables to refer to the directories that contain java and javac, typically java_install_dir/bin and java_install_dir respectively.

If you are running Windows and installed the JDK in C:\jdk1.6.0_15, you would put the following lines in your C:\autoexec.bat file.

    set PATH=C:\jdk1.6.0_15\bin;%PATH%
    set JAVA_HOME=C:\jdk1.6.0_15

Alternatively, on Windows NT/2000/XP, you could right-click on My Computer, select Properties, then Advanced, then Environment Variables. Then, you would update the PATH value and press the OK button.

On Unix (Solaris, Linux, etc.), if the JDK is installed in /usr/local/jdk1.6.0_15 and you use the C shell, you would put the following into your .cshrc file.

    setenv PATH /usr/local/jdk1.6.0_15/bin:$PATH
    setenv JAVA_HOME /usr/local/jdk1.6.0_15

Alternatively, if you use an Integrated Development Environment (IDE) like Borland JBuilder, Eclipse, IntelliJ IDEA, or Sun ONE Studio, compile and run a simple program to confirm that the IDE knows where you installed Java; otherwise, set it up as described in the IDE's documentation.
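A simple program of the kind mentioned above might look like the following sketch. It only uses standard JVM system properties and the JAVA_HOME environment variable; the class name JavaSetupCheck is an arbitrary choice for illustration.

```java
// Prints the Java version and installation directory seen by the running JVM.
class JavaSetupCheck {
    // Builds a one-line summary from standard JVM system properties.
    static String describeRuntime() {
        return "Java " + System.getProperty("java.version")
                + " installed at " + System.getProperty("java.home");
    }

    public static void main(String[] args) {
        System.out.println(describeRuntime());
        // JAVA_HOME is read from the environment; it prints null if the variable is not set.
        System.out.println("JAVA_HOME = " + System.getenv("JAVA_HOME"));
    }
}
```

If the program compiles and the printed installation directory matches the JDK you installed, the IDE or shell is picking up the intended Java installation.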
Step 2 - Eclipse IDE Setup
All the examples in this tutorial have been written using the Eclipse IDE, so we suggest you have the latest version of Eclipse installed on your machine.
To install the Eclipse IDE, download the latest Eclipse binaries from the Eclipse website. Once you have downloaded the installation, unpack the binary distribution into a convenient location, for example C:\eclipse on Windows or /usr/local/eclipse on Linux/Unix, and finally set the PATH variable appropriately.
Eclipse can be started by executing the following command on a Windows machine, or you can simply double-click eclipse.exe.

    %C:\eclipse\eclipse.exe

Eclipse can be started by executing the following command on a Unix (Solaris, Linux, etc.) machine.

    $/usr/local/eclipse/eclipse

After a successful startup, the Eclipse workbench should appear.
Step 3 - Setup Lucene Framework Libraries
If the startup is successful, you can proceed to set up the Lucene libraries. Following are the simple steps to download and install them on your machine.
Decide whether you want to install Lucene on Windows or Unix, and then download the .zip file for Windows or the .tgz file for Unix.
Download the suitable version of the Lucene binaries from the Apache Lucene website. At the time of writing this tutorial, lucene-3.6.2.zip was downloaded on a Windows machine; unzipping it creates the directory C:\lucene-3.6.2.
You will find all the Lucene libraries in the directory C:\lucene-3.6.2. Make sure your CLASSPATH variable includes this directory; otherwise, you will face problems while running your application. If you are using Eclipse, it is not required to set CLASSPATH because all the settings are done through Eclipse.
Once you are done with this last step, you are ready to proceed to your first Lucene example, which you will see in the next chapter.
Lucene - First Application
In this chapter, we will learn the actual programming with Lucene. Before you start writing your first example, make sure that you have set up your Lucene environment properly as explained in the Lucene - Environment Setup chapter. A working knowledge of the Eclipse IDE is recommended.
Let us now proceed by writing a simple search application which will print the number of search results found. We'll also see the list of indexes created during this process.
Step 1 - Create Java Project
The first step is to create a simple Java project using the Eclipse IDE. Follow the option File -> New -> Project and select the Java Project wizard from the wizard list. Then name your project LuceneFirstApplication in the wizard window.
Once your project is created successfully, it appears in your Project Explorer.
Step 2 - Add Required Libraries
Let us now add the Lucene core library to our project. To do this, right-click on your project name LuceneFirstApplication and then choose the following option in the context menu: Build Path -> Configure Build Path, which displays the Java Build Path window.
Now use the Add External JARs button available under the Libraries tab to add the following core JAR from the Lucene installation directory:
lucene-core-3.6.2
Step 3 - Create Source Files
Let us now create the actual source files under the LuceneFirstApplication project. First we need to create a package called com.tutorialspoint.lucene. To do this, right-click on src in the Package Explorer and follow the option New -> Package.
Next we will create LuceneTester.java and the other Java classes under the com.tutorialspoint.lucene package.
LuceneConstants.java
This class is used to provide various constants to be used across the sample application.

    package com.tutorialspoint.lucene;

    public class LuceneConstants {
        public static final String CONTENTS = "contents";
        public static final String FILE_NAME = "filename";
        public static final String FILE_PATH = "filepath";
        public static final int MAX_SEARCH = 10;
    }

TextFileFilter.java
This class is used as a .txt file filter.

    package com.tutorialspoint.lucene;

    import java.io.File;
    import java.io.FileFilter;

    public class TextFileFilter implements FileFilter {

        // accept only files whose name ends with .txt (case-insensitive)
        @Override
        public boolean accept(File pathname) {
            return pathname.getName().toLowerCase().endsWith(".txt");
        }
    }

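As a rough illustration of how such a filter behaves, the sketch below applies the same acceptance rule through File.listFiles(FileFilter) from the standard library. The class name TextFileFilterDemo, the temporary directory and the file names are made up for the example; the rule is written as a lambda so that the block is self-contained.

```java
import java.io.File;
import java.io.FileFilter;
import java.io.IOException;
import java.nio.file.Files;

class TextFileFilterDemo {
    // Same acceptance rule as TextFileFilter above, written as a lambda.
    static final FileFilter TXT_ONLY =
            file -> file.getName().toLowerCase().endsWith(".txt");

    // Counts the entries of a directory that the filter accepts.
    static int countTextFiles(File dir) {
        File[] matches = dir.listFiles(TXT_ONLY);
        // listFiles returns null when dir does not exist or is not a directory
        return matches == null ? 0 : matches.length;
    }

    public static void main(String[] args) throws IOException {
        // Create a throwaway directory with three empty files.
        File dir = Files.createTempDirectory("lucene-data").toFile();
        new File(dir, "record1.txt").createNewFile();
        new File(dir, "RECORD2.TXT").createNewFile();
        new File(dir, "notes.pdf").createNewFile();
        System.out.println(countTextFiles(dir)); // 2
    }
}
```

Note that listFiles can return null, so code that iterates over its result may want to guard against a missing data directory.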
Indexer.java
This class is used to index the raw data so that we can make it searchable using the Lucene library.

    package com.tutorialspoint.lucene;

    import java.io.File;
    import java.io.FileFilter;
    import java.io.FileReader;
    import java.io.IOException;

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.CorruptIndexException;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    public class Indexer {

        private IndexWriter writer;

        public Indexer(String indexDirectoryPath) throws IOException {
            // this directory will contain the indexes
            Directory indexDirectory =
                FSDirectory.open(new File(indexDirectoryPath));

            // create the indexer
            writer = new IndexWriter(indexDirectory,
                new StandardAnalyzer(Version.LUCENE_36), true,
                IndexWriter.MaxFieldLength.UNLIMITED);
        }

        public void close() throws CorruptIndexException, IOException {
            writer.close();
        }

        private Document getDocument(File file) throws IOException {
            Document document = new Document();

            // index file contents
            Field contentField = new Field(LuceneConstants.CONTENTS,
                new FileReader(file));
            // index file name
            Field fileNameField = new Field(LuceneConstants.FILE_NAME,
                file.getName(), Field.Store.YES, Field.Index.NOT_ANALYZED);
            // index file path
            Field filePathField = new Field(LuceneConstants.FILE_PATH,
                file.getCanonicalPath(), Field.Store.YES, Field.Index.NOT_ANALYZED);

            document.add(contentField);
            document.add(fileNameField);
            document.add(filePathField);

            return document;
        }

        private void indexFile(File file) throws IOException {
            System.out.println("Indexing " + file.getCanonicalPath());
            Document document = getDocument(file);
            writer.addDocument(document);
        }

        public int createIndex(String dataDirPath, FileFilter filter)
            throws IOException {
            // get all files in the data directory
            File[] files = new File(dataDirPath).listFiles();

            for (File file : files) {
                if (!file.isDirectory()
                    && !file.isHidden()
                    && file.exists()
                    && file.canRead()
                    && filter.accept(file)) {
                    indexFile(file);
                }
            }
            return writer.numDocs();
        }
    }

Searcher.java
This class searches the indexes created by the Indexer for the requested content.

    package com.tutorialspoint.lucene;

    import java.io.File;
    import java.io.IOException;

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.CorruptIndexException;
    import org.apache.lucene.queryParser.ParseException;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    public class Searcher {

        IndexSearcher indexSearcher;
        QueryParser queryParser;
        Query query;

        public Searcher(String indexDirectoryPath) throws IOException {
            // open the directory that contains the indexes
            Directory indexDirectory =
                FSDirectory.open(new File(indexDirectoryPath));
            indexSearcher = new IndexSearcher(indexDirectory);
            // parse queries against the contents field by default
            queryParser = new QueryParser(Version.LUCENE_36,
                LuceneConstants.CONTENTS,
                new StandardAnalyzer(Version.LUCENE_36));
        }

        public TopDocs search(String searchQuery)
            throws IOException, ParseException {
            query = queryParser.parse(searchQuery);
            return indexSearcher.search(query, LuceneConstants.MAX_SEARCH);
        }

        public Document getDocument(ScoreDoc scoreDoc)
            throws CorruptIndexException, IOException {
            return indexSearcher.doc(scoreDoc.doc);
        }

        public void close() throws IOException {
            indexSearcher.close();
        }
    }

LuceneTester.java
This class is used to test the indexing and search capability of the Lucene library.

    package com.tutorialspoint.lucene;

    import java.io.IOException;

    import org.apache.lucene.document.Document;
    import org.apache.lucene.queryParser.ParseException;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.search.TopDocs;

    public class LuceneTester {

        // backslashes must be escaped inside Java string literals
        String indexDir = "E:\\Lucene\\Index";
        String dataDir = "E:\\Lucene\\Data";
        Indexer indexer;
        Searcher searcher;

        public static void main(String[] args) {
            LuceneTester tester;
            try {
                tester = new LuceneTester();
                tester.createIndex();
                tester.search("Mohan");
            } catch (IOException e) {
                e.printStackTrace();
            } catch (ParseException e) {
                e.printStackTrace();
            }
        }

        private void createIndex() throws IOException {
            indexer = new Indexer(indexDir);
            int numIndexed;
            long startTime = System.currentTimeMillis();
            numIndexed = indexer.createIndex(dataDir, new TextFileFilter());
            long endTime = System.currentTimeMillis();
            indexer.close();
            System.out.println(numIndexed + " File indexed, time taken: "
                + (endTime - startTime) + " ms");
        }

        private void search(String searchQuery)
            throws IOException, ParseException {
            searcher = new Searcher(indexDir);
            long startTime = System.currentTimeMillis();
            TopDocs hits = searcher.search(searchQuery);
            long endTime = System.currentTimeMillis();

            System.out.println(hits.totalHits
                + " documents found. Time :" + (endTime - startTime));
            // print the stored file path of every hit
            for (ScoreDoc scoreDoc : hits.scoreDocs) {
                Document doc = searcher.getDocument(scoreDoc);
                System.out.println("File: "
                    + doc.get(LuceneConstants.FILE_PATH));
            }
            searcher.close();
        }
    }

Step 4 - Data & Index directory creation
We have used 10 text files, record1.txt to record10.txt, containing names and other details of students, and put them in the directory E:\Lucene\Data. An index directory should be created at E:\Lucene\Index. After running this program, you can see the list of index files created in that folder.
Step 5 - Running the program
Once you are done with the creation of the source, the raw data, the data directory and the index directory, you are ready to compile and run your program. To do this, keep the LuceneTester.java file tab active and use either the Run option available in the Eclipse IDE or Ctrl + F11 to compile and run your LuceneTester application. If the application runs successfully, it will print the following message in the Eclipse IDE's console:

    Indexing E:\Lucene\Data\record1.txt
    Indexing E:\Lucene\Data\record10.txt
    Indexing E:\Lucene\Data\record2.txt
    Indexing E:\Lucene\Data\record3.txt
    Indexing E:\Lucene\Data\record4.txt
    Indexing E:\Lucene\Data\record5.txt
    Indexing E:\Lucene\Data\record6.txt
    Indexing E:\Lucene\Data\record7.txt
    Indexing E:\Lucene\Data\record8.txt
    Indexing E:\Lucene\Data\record9.txt
    10 File indexed, time taken: 109 ms
    1 documents found. Time :0
    File: E:\Lucene\Data\record4.txt

Once you've run the program successfully, your index directory will contain the index files created by Lucene.
Lucene - Indexing Classes
The indexing process is one of the core functionalities provided by Lucene. IndexWriter is the most important and the core component of the indexing process.
We add Document(s) containing Field(s) to IndexWriter, which analyzes the Document(s) using the Analyzer and then creates, opens or edits indexes as required and stores or updates them in a Directory. IndexWriter is used to update or create indexes. It is not used to read indexes.
Indexing Classes
Following is a list of commonly used classes during the indexing process.
S.No. | Class | Description |
---|---|---|
1 | IndexWriter | This class acts as a core component which creates/updates indexes during the indexing process. |
2 | Directory | This class represents the storage location of the indexes. |
3 | Analyzer | This class is responsible for analyzing a document and extracting the tokens/words from the text which is to be indexed. Without analysis, IndexWriter cannot create an index. |
4 | Document | This class represents a virtual document with Fields, where a Field is an object which can contain the physical document's contents, its metadata and so on. The Analyzer can understand a Document only. |
5 | Field | This is the lowest unit or the starting point of the indexing process. It represents a key-value pair where the key identifies the value to be indexed. For example, a field used to represent the contents of a document will have the key "contents", and the value may contain part or all of the text or numeric content of the document. Lucene can index text or numeric content only. |
Lucene - Searching Classes
Searching is again one of the core functionalities provided by Lucene. Its flow is similar to that of the indexing process. A basic Lucene search can be made using the following classes, which can also be termed the foundation classes for all search-related operations.
Searching Classes
Following is a list of commonly used classes during the searching process.
S.No. | Class | Description |
---|---|---|
1 | IndexSearcher | This class acts as a core component which reads/searches indexes created by the indexing process. It takes a Directory instance pointing to the location containing the indexes. |
2 | Term | This class is the lowest unit of searching. It is similar to Field in the indexing process. |
3 | Query | Query is an abstract class that contains various utility methods and is the parent of all types of queries that Lucene uses during the search process. |
4 | TermQuery | TermQuery is the most commonly used query object and is the foundation of many complex queries that Lucene can make use of. |
5 | TopDocs | TopDocs points to the top N search results which match the search criteria. It is a simple container of pointers to the documents which are the output of a search. |
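The idea of "top N results" behind TopDocs can be illustrated with a small plain-Java sketch. The Hit class below is a hypothetical stand-in for a scored search hit (a document id plus a relevance score); it is not Lucene's ScoreDoc, and the sort-then-truncate approach is chosen for brevity, not because Lucene collects hits this way.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class TopHitsSketch {
    // Hypothetical stand-in for a scored hit: a document id plus a relevance score.
    static final class Hit {
        final int docId;
        final float score;

        Hit(int docId, float score) {
            this.docId = docId;
            this.score = score;
        }
    }

    // Returns the ids of the n best-scoring hits, best first.
    static List<Integer> topDocIds(List<Hit> hits, int n) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        List<Integer> ids = new ArrayList<>();
        for (Hit h : sorted.subList(0, Math.min(n, sorted.size()))) {
            ids.add(h.docId);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Hit> hits = Arrays.asList(
                new Hit(7, 0.2f), new Hit(3, 0.9f), new Hit(5, 0.5f));
        System.out.println(topDocIds(hits, 2)); // [3, 5]
    }
}
```

The result holds only document ids, not the documents themselves, which mirrors the description of TopDocs as a container of pointers.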
Lucene - Indexing Process
The indexing process is one of the core functionalities provided by Lucene. IndexWriter is the most important and the core component of the indexing process.
We add Document(s) containing Field(s) to IndexWriter, which analyzes the Document(s) using the Analyzer and then creates, opens or edits indexes as required and stores or updates them in a Directory. IndexWriter is used to update or create indexes. It is not used to read indexes.
We will now walk through the indexing process step by step using a basic example.
Create a document
- Create a method to get a Lucene document from a text file.
- Create various types of fields, which are key-value pairs containing keys as names and values as contents to be indexed.
- Set each field to be analyzed or not. In our case, only the contents field is analyzed, because it can contain words such as a, am, are, an etc. which are not required in search operations.
- Add the newly created fields to the document object and return it to the caller method.

    private Document getDocument(File file) throws IOException {
        Document document = new Document();

        // index file contents
        Field contentField = new Field(LuceneConstants.CONTENTS,
            new FileReader(file));
        // index file name
        Field fileNameField = new Field(LuceneConstants.FILE_NAME,
            file.getName(), Field.Store.YES, Field.Index.NOT_ANALYZED);
        // index file path
        Field filePathField = new Field(LuceneConstants.FILE_PATH,
            file.getCanonicalPath(), Field.Store.YES, Field.Index.NOT_ANALYZED);

        document.add(contentField);
        document.add(fileNameField);
        document.add(filePathField);

        return document;
    }

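The difference between an analyzed and a not-analyzed field can be sketched without Lucene as follows. This is a simplified model, assuming analysis means lowercasing, splitting on non-word characters and dropping stop words; the stop-word list and the class name AnalysisSketch are hypothetical, and Lucene's StandardAnalyzer applies its own rules.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class AnalysisSketch {
    // Hypothetical stop-word list for illustration; Lucene's analyzers ship their own.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "am", "an", "are", "is", "the"));

    // "Analyzed": lowercase the text, split it into words, drop stop words.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String word : text.toLowerCase().split("\\W+")) {
            if (!word.isEmpty() && !STOP_WORDS.contains(word)) {
                tokens.add(word);
            }
        }
        return tokens;
    }

    // "Not analyzed": the whole value is kept as one exact token.
    static List<String> keepAsIs(String value) {
        return Arrays.asList(value);
    }

    public static void main(String[] args) {
        System.out.println(analyze("Mohan is a student")); // [mohan, student]
        System.out.println(keepAsIs("record4.txt"));        // [record4.txt]
    }
}
```

In this model a file name or path is only found by its exact value, while file contents can be found by any of their remaining words, which matches the choice of field options in getDocument.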
Create an IndexWriter
The IndexWriter class acts as a core component which creates/updates indexes during the indexing process. Follow these steps to create an IndexWriter:
Step 1: Declare an IndexWriter object.
Step 2: Create a Lucene Directory which points to the location where the indexes are to be stored.
Step 3: Initialize the IndexWriter object with the index directory, a standard analyzer carrying version information, and other required/optional parameters.

    private IndexWriter writer;

    public Indexer(String indexDirectoryPath) throws IOException {
        // this directory will contain the indexes
        Directory indexDirectory =
            FSDirectory.open(new File(indexDirectoryPath));

        // create the indexer
        writer = new IndexWriter(indexDirectory,
            new StandardAnalyzer(Version.LUCENE_36), true,
            IndexWriter.MaxFieldLength.UNLIMITED);
    }

Start Indexing Process
The following method shows how to start the indexing process for a single file:

    private void indexFile(File file) throws IOException {
        System.out.println("Indexing " + file.getCanonicalPath());
        // build the document and hand it to the IndexWriter
        Document document = getDocument(file);
        writer.addDocument(document);
    }

Example Application
To test the indexing process, we need to create a Lucene test application.
Step | Description |
---|---|
1 | Create a project with the name LuceneFirstApplication under the package com.tutorialspoint.lucene as explained in the Lucene - First Application chapter. You can also reuse the project created in the Lucene - First Application chapter as is for this chapter to understand the indexing process. |
2 | Create LuceneConstants.java, TextFileFilter.java and Indexer.java as explained in the Lucene - First Application chapter. Keep the rest of the files unchanged. |
3 | Create LuceneTester.java as mentioned below. |
4 | Clean and build the application to make sure the business logic is working as per the requirements. |
LuceneTester.java
This class is used to test the indexing capability of the Lucene library.

    package com.tutorialspoint.lucene;

    import java.io.IOException;

    public class LuceneTester {

        // backslashes must be escaped inside Java string literals
        String indexDir = "E:\\Lucene\\Index";
        String dataDir = "E:\\Lucene\\Data";
        Indexer indexer;

        public static void main(String[] args) {
            LuceneTester tester;
            try {
                tester = new LuceneTester();
                tester.createIndex();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }

        private void createIndex() throws IOException {
            indexer = new Indexer(indexDir);
            int numIndexed;
            long startTime = System.currentTimeMillis();
            numIndexed = indexer.createIndex(dataDir, new TextFileFilter());
            long endTime = System.currentTimeMillis();
            indexer.close();
            System.out.println(numIndexed + " File indexed, time taken: "
                + (endTime - startTime) + " ms");
        }
    }

Data & Index Directory Creation
We have used 10 text files, record1.txt to record10.txt, containing names and other details of students, and put them in the directory E:\Lucene\Data. An index directory should be created at E:\Lucene\Index. After running this program, you can see the list of index files created in that folder.
Running the Program
Once you are done with the creation of the source, the raw data, the data directory and the index directory, you can proceed by compiling and running your program. To do this, keep the LuceneTester.java file tab active and use either the Run option available in the Eclipse IDE or Ctrl + F11 to compile and run your LuceneTester application. If your application runs successfully, it will print the following message in the Eclipse IDE's console:

    Indexing E:\Lucene\Data\record1.txt
    Indexing E:\Lucene\Data\record10.txt
    Indexing E:\Lucene\Data\record2.txt
    Indexing E:\Lucene\Data\record3.txt
    Indexing E:\Lucene\Data\record4.txt
    Indexing E:\Lucene\Data\record5.txt
    Indexing E:\Lucene\Data\record6.txt
    Indexing E:\Lucene\Data\record7.txt
    Indexing E:\Lucene\Data\record8.txt
    Indexing E:\Lucene\Data\record9.txt
    10 File indexed, time taken: 109 ms

Once you've run the program successfully, your index directory will contain the index files created by Lucene.
Lucene - Indexing Operations
In this chapter, we'll discuss the four major operations of indexing. These operations are useful at various times and are used throughout a search application.
Indexing Operations
Following is a list of commonly used operations during the indexing process.
S.No. | Operation | Description |
---|---|---|
1 | Add Document | This operation is used in the initial stage of the indexing process to create the indexes on the newly available content. |
2 | Update Document | This operation is used to update indexes to reflect the changes in the updated contents. It is similar to recreating the index. |
3 | Delete Document | This operation is used to update indexes to exclude the documents which are not required to be indexed/searched. |
4 | Field Options | Field options specify or control the ways in which the contents of a field are made searchable. |
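The first three operations can be modelled with a toy in-memory "index" keyed by file path. This is only a conceptual sketch: the class IndexOperationsSketch and its method names are hypothetical, the sample contents are made up, and a real Lucene index is not a simple map. It does reflect one documented behaviour, namely that an update replaces the old document entirely, which is why it resembles recreating the entry.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class IndexOperationsSketch {
    // Toy "index": file path (the unique key) mapped to its indexed contents.
    private final Map<String, String> docs = new LinkedHashMap<>();

    // Add: make new content available for searching.
    void addDocument(String path, String contents) {
        docs.put(path, contents);
    }

    // Update: modelled as delete-then-add, so the old entry is replaced entirely.
    void updateDocument(String path, String contents) {
        deleteDocument(path);
        addDocument(path, contents);
    }

    // Delete: exclude a document from future searches.
    void deleteDocument(String path) {
        docs.remove(path);
    }

    int numDocs() {
        return docs.size();
    }

    String contentsOf(String path) {
        return docs.get(path);
    }

    public static void main(String[] args) {
        IndexOperationsSketch index = new IndexOperationsSketch();
        index.addDocument("record1.txt", "Mohan, class 10");
        index.addDocument("record2.txt", "Sohan, class 9");
        index.updateDocument("record1.txt", "Mohan, class 11");
        index.deleteDocument("record2.txt");
        System.out.println(index.numDocs());                 // 1
        System.out.println(index.contentsOf("record1.txt")); // Mohan, class 11
    }
}
```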
Lucene - Search Operation
Searching is one of the core functionalities provided by Lucene. IndexSearcher is one of the core components of the searching process.
We first create a Directory containing the indexes and then pass it to IndexSearcher, which opens the Directory using IndexReader. Then we create a Query with a Term and run the search by passing the Query to the IndexSearcher. IndexSearcher returns a TopDocs object which contains the search details along with the document ID(s) of the Document(s) that are the result of the search operation.
We will now walk through the searching process step by step using a basic example.
Create a QueryParser
The QueryParser class parses the user's input into a query that Lucene understands. Follow these steps to create a QueryParser:
Step 1: Declare a QueryParser object.
Step 2: Initialize the QueryParser object with the version information, the name of the field on which the query is to be run, and a standard analyzer.

    QueryParser queryParser;

    public Searcher(String indexDirectoryPath) throws IOException {
        // queries are parsed against the contents field by default
        queryParser = new QueryParser(Version.LUCENE_36,
            LuceneConstants.CONTENTS,
            new StandardAnalyzer(Version.LUCENE_36));
    }

Create an IndexSearcher
The IndexSearcher class acts as a core component which searches the indexes created during the indexing process. Follow these steps to create an IndexSearcher:
Step 1: Declare an IndexSearcher object.
Step 2: Create a Lucene Directory which points to the location where the indexes are stored.
Step 3: Initialize the IndexSearcher object with the index directory.

    IndexSearcher indexSearcher;

    public Searcher(String indexDirectoryPath) throws IOException {
        // open the directory that contains the indexes
        Directory indexDirectory =
            FSDirectory.open(new File(indexDirectoryPath));
        indexSearcher = new IndexSearcher(indexDirectory);
    }

Make a Search
Follow these steps to make a search:
Step 1: Create a Query object by parsing the search expression through QueryParser.
Step 2: Run the search by calling the IndexSearcher.search() method.

    Query query;

    public TopDocs search(String searchQuery)
        throws IOException, ParseException {
        query = queryParser.parse(searchQuery);
        // return at most MAX_SEARCH top-scoring hits
        return indexSearcher.search(query, LuceneConstants.MAX_SEARCH);
    }

Get the Document
The following method shows how to get the document for a search hit.

    public Document getDocument(ScoreDoc scoreDoc)
        throws CorruptIndexException, IOException {
        // scoreDoc.doc is the id of the matching document
        return indexSearcher.doc(scoreDoc.doc);
    }

Close IndexSearcher
The following method shows how to close the IndexSearcher.

    public void close() throws IOException {
        indexSearcher.close();
    }

Example Application
Let us create a test Lucene application to test the searching process.
Step | Description |
---|---|
1 | Create a project with the name LuceneFirstApplication under the package com.tutorialspoint.lucene as explained in the Lucene - First Application chapter. You can also reuse the project created in the Lucene - First Application chapter as is for this chapter to understand the searching process. |
2 | Create LuceneConstants.java, TextFileFilter.java and Searcher.java as explained in the Lucene - First Application chapter. Keep the rest of the files unchanged. |
3 | Create LuceneTester.java as mentioned below. |
4 | Clean and build the application to make sure the business logic is working as per the requirements. |
LuceneConstants.java
This class is used to provide various constants to be used across the sample apppcation.
package com.tutorialspoint.lucene; pubpc class LuceneConstants { pubpc static final String CONTENTS = "contents"; pubpc static final String FILE_NAME = "filename"; pubpc static final String FILE_PATH = "filepath"; pubpc static final int MAX_SEARCH = 10; }
TextFileFilter.java
This class is used as a .txt file filter.
package com.tutorialspoint.lucene; import java.io.File; import java.io.FileFilter; pubpc class TextFileFilter implements FileFilter { @Override pubpc boolean accept(File pathname) { return pathname.getName().toLowerCase().endsWith(".txt"); } }
Searcher.java
This class reads the indexes made on the raw data and searches them using the Lucene library.
package com.tutorialspoint.lucene;

import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class Searcher {

    IndexSearcher indexSearcher;
    QueryParser queryParser;
    Query query;

    public Searcher(String indexDirectoryPath) throws IOException {
        Directory indexDirectory =
            FSDirectory.open(new File(indexDirectoryPath));
        indexSearcher = new IndexSearcher(indexDirectory);
        // queries are parsed against the "contents" field
        queryParser = new QueryParser(Version.LUCENE_36,
            LuceneConstants.CONTENTS,
            new StandardAnalyzer(Version.LUCENE_36));
    }

    public TopDocs search(String searchQuery)
        throws IOException, ParseException {
        query = queryParser.parse(searchQuery);
        return indexSearcher.search(query, LuceneConstants.MAX_SEARCH);
    }

    public Document getDocument(ScoreDoc scoreDoc)
        throws CorruptIndexException, IOException {
        return indexSearcher.doc(scoreDoc.doc);
    }

    public void close() throws IOException {
        indexSearcher.close();
    }
}
LuceneTester.java
This class is used to test the searching capability of the Lucene library.
package com.tutorialspoint.lucene;

import java.io.IOException;

import org.apache.lucene.document.Document;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;

public class LuceneTester {

    // backslashes must be escaped inside Java string literals
    String indexDir = "E:\\Lucene\\Index";
    String dataDir = "E:\\Lucene\\Data";
    Searcher searcher;

    public static void main(String[] args) {
        LuceneTester tester;
        try {
            tester = new LuceneTester();
            tester.search("Mohan");
        } catch (IOException e) {
            e.printStackTrace();
        } catch (ParseException e) {
            e.printStackTrace();
        }
    }

    private void search(String searchQuery)
        throws IOException, ParseException {
        searcher = new Searcher(indexDir);
        long startTime = System.currentTimeMillis();
        TopDocs hits = searcher.search(searchQuery);
        long endTime = System.currentTimeMillis();

        System.out.println(hits.totalHits +
            " documents found. Time :" + (endTime - startTime) + " ms");
        // print the stored file path of every returned hit
        for (ScoreDoc scoreDoc : hits.scoreDocs) {
            Document doc = searcher.getDocument(scoreDoc);
            System.out.println("File: " + doc.get(LuceneConstants.FILE_PATH));
        }
        searcher.close();
    }
}
Data & Index Directory Creation
We have used 10 text files named record1.txt to record10.txt, containing names and other details of the students, and put them in the directory E:\Lucene\Data. An index directory should be created at E:\Lucene\Index. After running the indexing program in the chapter Lucene - Indexing Process, you can see the list of index files created in that folder.
Running the Program
Once you are done with the creation of the source, the raw data, the data directory, the index directory and the indexes, you can proceed by compiling and running your program. To do this, keep the LuceneTester.java file tab active and use either the Run option available in the Eclipse IDE or Ctrl + F11 to compile and run your LuceneTester application. If your application runs successfully, it will print the following message in the Eclipse IDE's console −
1 documents found. Time :29 ms
File: E:\Lucene\Data\record4.txt
Lucene - Query Programming
As we saw in the previous chapter, Lucene - Search Operation, Lucene uses IndexSearcher to make searches, and IndexSearcher takes the Query object created by QueryParser as its input. In this chapter, we are going to discuss the various types of Query objects and the different ways to create them programmatically. Creating different types of Query objects gives control over the kind of search to be made.
Consider the case of Advanced Search, provided by many applications, where users are given multiple options to narrow down the search results. With Query programming, we can achieve the same very easily.
Following is the list of Query types that we'll discuss in due course.
S.No. | Class & Description |
---|---|
1 | TermQuery matches documents that contain a specific term in a given field; it is the basic building block for more complex queries. |
2 | TermRangeQuery is used when a range of textual terms is to be searched. |
3 | PrefixQuery is used to match documents whose indexed terms start with a specified string. |
4 | BooleanQuery is used to search documents which are the result of multiple queries combined using AND, OR or NOT operators. |
5 | PhraseQuery is used to search documents which contain a particular sequence of terms. |
6 | WildcardQuery is used to search documents using wildcards like * for any character sequence and ? for a single character. |
7 | FuzzyQuery is used to search documents using a fuzzy implementation, that is, an approximate search based on the edit distance algorithm. |
8 | MatchAllDocsQuery, as the name suggests, matches all the documents. |
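The matching rules described in the table can be illustrated without Lucene at all. The following is a minimal plain-Java sketch, not Lucene's implementation: the class QueryMatchSketch and its methods are hypothetical names, and each method only mimics the rule stated for prefix, range, wildcard and fuzzy (edit distance) matching on a single term.

```java
import java.util.regex.Pattern;

// Plain-Java illustration of the matching rules; not Lucene's implementation.
class QueryMatchSketch {

    // Prefix rule: the term starts with the given prefix.
    static boolean prefixMatch(String term, String prefix) {
        return term.startsWith(prefix);
    }

    // Range rule: the term lies between two bounds in lexicographic order
    // (both bounds are treated as inclusive in this sketch).
    static boolean rangeMatch(String term, String lower, String upper) {
        return term.compareTo(lower) >= 0 && term.compareTo(upper) <= 0;
    }

    // Wildcard rule: '*' stands for any character sequence, '?' for one character.
    static boolean wildcardMatch(String term, String pattern) {
        StringBuilder regex = new StringBuilder();
        for (char c : pattern.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.matches(regex.toString(), term);
    }

    // Fuzzy rule: Levenshtein edit distance, i.e. the minimum number of
    // single-character insertions, deletions and substitutions.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                        prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(prefixMatch("record3.txt", "rec"));                   // true
        System.out.println(rangeMatch("record3.txt", "record2.txt", "record5.txt")); // true
        System.out.println(wildcardMatch("record10.txt", "record?.txt"));       // false
        System.out.println(wildcardMatch("record10.txt", "record*.txt"));       // true
        System.out.println(editDistance("cord3.txt", "record3.txt"));           // 2
    }
}
```

A real query applies such a rule to the terms stored in the index and collects the documents that contain the matching terms; how a fuzzy query turns the edit distance into a score is up to Lucene and is not modelled here.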
Lucene - Analysis
In one of our previous chapters, we saw that Lucene uses IndexWriter to analyze the Document(s) using the Analyzer and then creates, opens or edits indexes as required. In this chapter, we are going to discuss the various types of Analyzer objects and other relevant objects which are used during the analysis process. Understanding the analysis process and how analyzers work will give you good insight into how Lucene indexes the documents.
Following is the list of objects that we'll discuss in due course.
S.No. | Class & Description |
---|---|
1 | Token represents a text or word in a document together with relevant details like its metadata (position, start offset, end offset, token type and position increment). |
2 | TokenStream is the output of the analysis process and consists of a series of tokens. It is an abstract class. |
3 | Analyzer is the abstract base class for every type of analyzer. |
4 | WhitespaceAnalyzer splits the text in a document based on whitespace. |
5 | SimpleAnalyzer splits the text in a document based on non-letter characters and puts the text in lowercase. |
6 | StopAnalyzer works just like the SimpleAnalyzer and also removes common words like a, an, the, etc. |
7 | StandardAnalyzer is the most sophisticated analyzer and is capable of handling names, email addresses, etc. It lowercases each token and removes common words and punctuation, if any. |
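To make the difference between the simpler analyzers concrete, the following plain-Java sketch mimics three of the behaviors described in the table: splitting on whitespace, splitting on non-letters with lowercasing, and removing common words afterwards. It is only an illustration and does not use Lucene; the class and method names are hypothetical, and the three-word stop list is an assumption taken from the examples in the table, not Lucene's default list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Plain-Java illustration of three tokenizing strategies; not Lucene code.
class AnalyzerSketch {

    // Tiny illustrative stop list (an assumption, not Lucene's default set).
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "the"));

    // Whitespace strategy: split on runs of whitespace, keep case and punctuation.
    static List<String> whitespaceTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Simple strategy: split on any non-letter character and lowercase each token.
    static List<String> simpleTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{L}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Stop strategy: same as the simple strategy, then drop the common words.
    static List<String> stopTokens(String text) {
        List<String> tokens = simpleTokens(text);
        tokens.removeAll(STOP_WORDS);
        return tokens;
    }

    public static void main(String[] args) {
        String text = "The Quick-Brown fox, an Example";
        System.out.println(whitespaceTokens(text)); // [The, Quick-Brown, fox,, an, Example]
        System.out.println(simpleTokens(text));     // [the, quick, brown, fox, an, example]
        System.out.println(stopTokens(text));       // [quick, brown, fox, example]
    }
}
```

The same analyzer should normally be used at indexing time and at query time, so that the terms produced from a query line up with the terms stored in the index.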
Lucene - Sorting
In this chapter, we will look at the order in which Lucene returns search results by default and how that order can be changed as required.
Sorting by Relevance
This is the default sorting mode used by Lucene. Lucene returns results with the most relevant hit at the top.
private void sortUsingRelevance(String searchQuery)
    throws IOException, ParseException {
    searcher = new Searcher(indexDir);
    long startTime = System.currentTimeMillis();

    // create a term to search the file name field
    Term term = new Term(LuceneConstants.FILE_NAME, searchQuery);
    // create the fuzzy query object
    Query query = new FuzzyQuery(term);
    // track per-document scores (but not the maximum score) while sorting
    searcher.setDefaultFieldSortScoring(true, false);
    // do the search
    TopDocs hits = searcher.search(query, Sort.RELEVANCE);
    long endTime = System.currentTimeMillis();

    System.out.println(hits.totalHits + " documents found. Time :"
        + (endTime - startTime) + "ms");
    for (ScoreDoc scoreDoc : hits.scoreDocs) {
        Document doc = searcher.getDocument(scoreDoc);
        System.out.print("Score: " + scoreDoc.score + " ");
        System.out.println("File: " + doc.get(LuceneConstants.FILE_PATH));
    }
    searcher.close();
}
Sorting by IndexOrder
In this sorting mode, Lucene returns results in the order in which the documents were indexed: the first document indexed is shown first in the search results.
private void sortUsingIndex(String searchQuery)
    throws IOException, ParseException {
    searcher = new Searcher(indexDir);
    long startTime = System.currentTimeMillis();

    // create a term to search the file name field
    Term term = new Term(LuceneConstants.FILE_NAME, searchQuery);
    // create the fuzzy query object
    Query query = new FuzzyQuery(term);
    // track per-document scores (but not the maximum score) while sorting
    searcher.setDefaultFieldSortScoring(true, false);
    // do the search
    TopDocs hits = searcher.search(query, Sort.INDEXORDER);
    long endTime = System.currentTimeMillis();

    System.out.println(hits.totalHits + " documents found. Time :"
        + (endTime - startTime) + "ms");
    for (ScoreDoc scoreDoc : hits.scoreDocs) {
        Document doc = searcher.getDocument(scoreDoc);
        System.out.print("Score: " + scoreDoc.score + " ");
        System.out.println("File: " + doc.get(LuceneConstants.FILE_PATH));
    }
    searcher.close();
}
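The difference between the two orderings can be sketched in plain Java without Lucene. In the example below, the scores are borrowed from this chapter's sample output, while the Hit class, the method names and the document ids (assumed to reflect the order in which the files were indexed) are hypothetical. Breaking score ties by document id is a choice made for this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Plain-Java illustration of the two orderings; this is not Lucene code.
class SortOrderSketch {

    // Hypothetical hit: document id (position in the index), file name and score.
    static final class Hit {
        final int docId;
        final String file;
        final float score;
        Hit(int docId, String file, float score) {
            this.docId = docId;
            this.file = file;
            this.score = score;
        }
    }

    // Relevance order: highest score first, ties broken by lower document id.
    static List<Hit> byRelevance(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed()
                .thenComparingInt(h -> h.docId));
        return sorted;
    }

    // Index order: ascending document id, scores are ignored.
    static List<Hit> byIndexOrder(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingInt((Hit h) -> h.docId));
        return sorted;
    }

    static List<String> files(List<Hit> hits) {
        List<String> names = new ArrayList<>();
        for (Hit h : hits) {
            names.add(h.file);
        }
        return names;
    }

    // Four sample hits; the document ids are assumed for illustration.
    static List<Hit> sampleHits() {
        return Arrays.asList(
                new Hit(3, "record3.txt", 1.3179655f),
                new Hit(1, "record10.txt", 0.2635932f),
                new Hit(0, "record1.txt", 0.790779f),
                new Hit(2, "record2.txt", 0.790779f));
    }

    public static void main(String[] args) {
        // [record3.txt, record1.txt, record2.txt, record10.txt]
        System.out.println(files(byRelevance(sampleHits())));
        // [record1.txt, record10.txt, record2.txt, record3.txt]
        System.out.println(files(byIndexOrder(sampleHits())));
    }
}
```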
Example Application
Let us create a test Lucene application to test the sorting process.
Step | Description |
---|---|
1 | Create a project with the name LuceneFirstApplication under the package com.tutorialspoint.lucene as explained in the Lucene - First Application chapter. You can also reuse the project created in the Lucene - First Application chapter as it is for this chapter to understand the sorting process. |
2 | Create LuceneConstants.java and Searcher.java as explained in the Lucene - First Application chapter. Keep the rest of the files unchanged. |
3 | Create LuceneTester.java as mentioned below. |
4 | Clean and build the application to make sure the business logic is working as per the requirements. |
LuceneConstants.java
This class provides various constants used across the sample application.
package com.tutorialspoint.lucene;

public class LuceneConstants {
    public static final String CONTENTS = "contents";
    public static final String FILE_NAME = "filename";
    public static final String FILE_PATH = "filepath";
    public static final int MAX_SEARCH = 10;
}
Searcher.java
This class reads the indexes made on the raw data and searches them using the Lucene library.
package com.tutorialspoint.lucene;

import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class Searcher {

    IndexSearcher indexSearcher;
    QueryParser queryParser;
    Query query;

    public Searcher(String indexDirectoryPath) throws IOException {
        Directory indexDirectory =
            FSDirectory.open(new File(indexDirectoryPath));
        indexSearcher = new IndexSearcher(indexDirectory);
        queryParser = new QueryParser(Version.LUCENE_36,
            LuceneConstants.CONTENTS,
            new StandardAnalyzer(Version.LUCENE_36));
    }

    // search with a query string parsed by QueryParser
    public TopDocs search(String searchQuery)
        throws IOException, ParseException {
        query = queryParser.parse(searchQuery);
        return indexSearcher.search(query, LuceneConstants.MAX_SEARCH);
    }

    // search with a programmatically built Query
    public TopDocs search(Query query)
        throws IOException, ParseException {
        return indexSearcher.search(query, LuceneConstants.MAX_SEARCH);
    }

    // search with a Query and an explicit sort order
    public TopDocs search(Query query, Sort sort)
        throws IOException, ParseException {
        return indexSearcher.search(query,
            LuceneConstants.MAX_SEARCH, sort);
    }

    public void setDefaultFieldSortScoring(boolean doTrackScores,
        boolean doMaxScores) {
        indexSearcher.setDefaultFieldSortScoring(
            doTrackScores, doMaxScores);
    }

    public Document getDocument(ScoreDoc scoreDoc)
        throws CorruptIndexException, IOException {
        return indexSearcher.doc(scoreDoc.doc);
    }

    public void close() throws IOException {
        indexSearcher.close();
    }
}
LuceneTester.java
This class is used to test the sorting capability of the Lucene library.
package com.tutorialspoint.lucene;

import java.io.IOException;

import org.apache.lucene.document.Document;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.search.FuzzyQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.TopDocs;

public class LuceneTester {

    // backslashes must be escaped inside Java string literals
    String indexDir = "E:\\Lucene\\Index";
    String dataDir = "E:\\Lucene\\Data";
    Indexer indexer;
    Searcher searcher;

    public static void main(String[] args) {
        LuceneTester tester;
        try {
            tester = new LuceneTester();
            tester.sortUsingRelevance("cord3.txt");
            tester.sortUsingIndex("cord3.txt");
        } catch (IOException e) {
            e.printStackTrace();
        } catch (ParseException e) {
            e.printStackTrace();
        }
    }

    private void sortUsingRelevance(String searchQuery)
        throws IOException, ParseException {
        searcher = new Searcher(indexDir);
        long startTime = System.currentTimeMillis();

        // create a term to search the file name field
        Term term = new Term(LuceneConstants.FILE_NAME, searchQuery);
        // create the fuzzy query object
        Query query = new FuzzyQuery(term);
        searcher.setDefaultFieldSortScoring(true, false);
        // do the search, most relevant hit first
        TopDocs hits = searcher.search(query, Sort.RELEVANCE);
        long endTime = System.currentTimeMillis();

        System.out.println(hits.totalHits + " documents found. Time :"
            + (endTime - startTime) + "ms");
        for (ScoreDoc scoreDoc : hits.scoreDocs) {
            Document doc = searcher.getDocument(scoreDoc);
            System.out.print("Score: " + scoreDoc.score + " ");
            System.out.println("File: " + doc.get(LuceneConstants.FILE_PATH));
        }
        searcher.close();
    }

    private void sortUsingIndex(String searchQuery)
        throws IOException, ParseException {
        searcher = new Searcher(indexDir);
        long startTime = System.currentTimeMillis();

        // create a term to search the file name field
        Term term = new Term(LuceneConstants.FILE_NAME, searchQuery);
        // create the fuzzy query object
        Query query = new FuzzyQuery(term);
        searcher.setDefaultFieldSortScoring(true, false);
        // do the search, in the order the documents were indexed
        TopDocs hits = searcher.search(query, Sort.INDEXORDER);
        long endTime = System.currentTimeMillis();

        System.out.println(hits.totalHits + " documents found. Time :"
            + (endTime - startTime) + "ms");
        for (ScoreDoc scoreDoc : hits.scoreDocs) {
            Document doc = searcher.getDocument(scoreDoc);
            System.out.print("Score: " + scoreDoc.score + " ");
            System.out.println("File: " + doc.get(LuceneConstants.FILE_PATH));
        }
        searcher.close();
    }
}
Data & Index Directory Creation
We have used 10 text files named record1.txt to record10.txt, containing names and other details of the students, and put them in the directory E:\Lucene\Data. An index directory should be created at E:\Lucene\Index. After running the indexing program in the chapter Lucene - Indexing Process, you can see the list of index files created in that folder.
Running the Program
Once you are done with the creation of the source, the raw data, the data directory, the index directory and the indexes, you can compile and run your program. To do this, keep the LuceneTester.java file tab active and use either the Run option available in the Eclipse IDE or Ctrl + F11 to compile and run your LuceneTester application. If your application runs successfully, it will print the following message in the Eclipse IDE's console −
10 documents found. Time :31ms
Score: 1.3179655 File: E:\Lucene\Data\record3.txt
Score: 0.790779 File: E:\Lucene\Data\record1.txt
Score: 0.790779 File: E:\Lucene\Data\record2.txt
Score: 0.790779 File: E:\Lucene\Data\record4.txt
Score: 0.790779 File: E:\Lucene\Data\record5.txt
Score: 0.790779 File: E:\Lucene\Data\record6.txt
Score: 0.790779 File: E:\Lucene\Data\record7.txt
Score: 0.790779 File: E:\Lucene\Data\record8.txt
Score: 0.790779 File: E:\Lucene\Data\record9.txt
Score: 0.2635932 File: E:\Lucene\Data\record10.txt
10 documents found. Time :0ms
Score: 0.790779 File: E:\Lucene\Data\record1.txt
Score: 0.2635932 File: E:\Lucene\Data\record10.txt
Score: 0.790779 File: E:\Lucene\Data\record2.txt
Score: 1.3179655 File: E:\Lucene\Data\record3.txt
Score: 0.790779 File: E:\Lucene\Data\record4.txt
Score: 0.790779 File: E:\Lucene\Data\record5.txt
Score: 0.790779 File: E:\Lucene\Data\record6.txt
Score: 0.790779 File: E:\Lucene\Data\record7.txt
Score: 0.790779 File: E:\Lucene\Data\record8.txt
Score: 0.790779 File: E:\Lucene\Data\record9.txt