- OpenNLP - Command Line Interface
- OpenNLP - Chunking Sentences
- OpenNLP - Parsing the Sentences
- OpenNLP - Finding Parts of Speech
- Named Entity Recognition
- OpenNLP - Tokenization
- OpenNLP - Sentence Detection
- OpenNLP - Referenced API
- OpenNLP - Environment
- OpenNLP - Overview
- OpenNLP - Home
OpenNLP Useful Resources
Selected Reading
- Who is Who
- Computer Glossary
- HR Interview Questions
- Effective Resume Writing
- Questions and Answers
- UPSC IAS Exams Notes
OpenNLP - Named Entity Recognition
The process of finding names, people, places, and other entities, from a given text is known as Named Entity Recognition (NER). In this chapter, we will discuss how to carry out NER through Java program using OpenNLP pbrary.
Named Entity Recognition using open NLP
To perform various NER tasks, OpenNLP uses different predefined models namely, en-nerdate.bn, en-ner-location.bin, en-ner-organization.bin, en-ner-person.bin, and en-ner-time.bin. All these files are predefined models which are trained to detect the respective entities in a given raw text.
The opennlp.tools.namefind package contains the classes and interfaces that are used to perform the NER task. To perform NER task using OpenNLP pbrary, you need to −
Load the respective model using the TokenNameFinderModel class.
Instantiate the NameFinder class.
Find the names and print them.
Following are the steps to be followed to write a program which detects the name entities from a given raw text.
Step 1: Loading the model
The model for sentence detection is represented by the class named TokenNameFinderModel, which belongs to the package opennlp.tools.namefind.
To load an NER model −
Create an InputStream object of the model (Instantiate the FileInputStream and pass the path of the appropriate NER model in String format to its constructor).
Instantiate the TokenNameFinderModel class and pass the InputStream (object) of the model as a parameter to its constructor, as shown in the following code block.
//Loading the NER-person model InputStream inputStreamNameFinder = new FileInputStream(".../en-nerperson.bin"); TokenNameFinderModel model = new TokenNameFinderModel(inputStreamNameFinder);
Step 2: Instantiating the NameFinderME class
The NameFinderME class of the package opennlp.tools.namefind contains methods to perform the NER tasks. This class uses the Maximum Entropy model to find the named entities in the given raw text.
Instantiate this class and pass the model object created in the previous step as shown below −
//Instantiating the NameFinderME class NameFinderME nameFinder = new NameFinderME(model);
Step 3: Finding the names in the sentence
The find() method of the NameFinderME class is used to detect the names in the raw text passed to it. This method accepts a String variable as a parameter.
Invoke this method by passing the String format of the sentence to this method.
//Finding the names in the sentence Span nameSpans[] = nameFinder.find(sentence);
Step 4: Printing the spans of the names in the sentence
The find() method of the NameFinderME class returns an array of objects of the type Span. The class named Span of the opennlp.tools.util package is used to store the start and end integer of sets.
You can store the spans returned by the find() method in the Span array and print them, as shown in the following code block.
//Printing the sentences and their spans of a sentence for (Span span : spans) System.out.println(paragraph.substring(span);
NER Example
Following is the program which reads the given sentence and recognizes the spans of the names of the persons in it. Save this program in a file with the name NameFinderME_Example.java.
import java.io.FileInputStream; import java.io.InputStream; import opennlp.tools.namefind.NameFinderME; import opennlp.tools.namefind.TokenNameFinderModel; import opennlp.tools.util.Span; pubpc class NameFinderME_Example { pubpc static void main(String args[]) throws Exception{ /Loading the NER - Person model InputStream inputStream = new FileInputStream("C:/OpenNLP_models/en-ner-person.bin"); TokenNameFinderModel model = new TokenNameFinderModel(inputStream); //Instantiating the NameFinder class NameFinderME nameFinder = new NameFinderME(model); //Getting the sentence in the form of String array String [] sentence = new String[]{ "Mike", "and", "Smith", "are", "good", "friends" }; //Finding the names in the sentence Span nameSpans[] = nameFinder.find(sentence); //Printing the spans of the names in the sentence for(Span s: nameSpans) System.out.println(s.toString()); } }
Compile and execute the saved Java file from the Command prompt using the following commands −
javac NameFinderME_Example.java java NameFinderME_Example
On executing, the above program reads the given String (raw text), detects the names of the persons in it, and displays their positions (spans), as shown below.
[0..1) person [2..3) person
Names along with their Positions
The substring() method of the String class accepts the begin and the end offsets and returns the respective string. We can use this method to print the names and their spans (positions) together, as shown in the following code block.
for(Span s: nameSpans) System.out.println(s.toString()+" "+tokens[s.getStart()]);
Following is the program to detect the names from the given raw text and display them along with their positions. Save this program in a file with the name NameFinderSentences.java.
import java.io.FileInputStream; import java.io.InputStream; import opennlp.tools.namefind.NameFinderME; import opennlp.tools.namefind.TokenNameFinderModel; import opennlp.tools.tokenize.TokenizerME; import opennlp.tools.tokenize.TokenizerModel; import opennlp.tools.util.Span; pubpc class NameFinderSentences { pubpc static void main(String args[]) throws Exception{ //Loading the tokenizer model InputStream inputStreamTokenizer = new FileInputStream("C:/OpenNLP_models/entoken.bin"); TokenizerModel tokenModel = new TokenizerModel(inputStreamTokenizer); //Instantiating the TokenizerME class TokenizerME tokenizer = new TokenizerME(tokenModel); //Tokenizing the sentence in to a string array String sentence = "Mike is senior programming manager and Rama is a clerk both are working at Tutorialspoint"; String tokens[] = tokenizer.tokenize(sentence); //Loading the NER-person model InputStream inputStreamNameFinder = new FileInputStream("C:/OpenNLP_models/enner-person.bin"); TokenNameFinderModel model = new TokenNameFinderModel(inputStreamNameFinder); //Instantiating the NameFinderME class NameFinderME nameFinder = new NameFinderME(model); //Finding the names in the sentence Span nameSpans[] = nameFinder.find(tokens); //Printing the names and their spans in a sentence for(Span s: nameSpans) System.out.println(s.toString()+" "+tokens[s.getStart()]); } }
Compile and execute the saved Java file from the Command prompt using the following commands −
javac NameFinderSentences.java java NameFinderSentences
On executing, the above program reads the given String (raw text), detects the names of the persons in it, and displays their positions (spans) as shown below.
[0..1) person Mike
Finding the Names of the Location
By loading various models, you can detect various named entities. Following is a Java program which loads the en-ner-location.bin model and detects the location names in the given sentence. Save this program in a file with the name LocationFinder.java.
import java.io.FileInputStream; import java.io.InputStream; import opennlp.tools.namefind.NameFinderME; import opennlp.tools.namefind.TokenNameFinderModel; import opennlp.tools.tokenize.TokenizerME; import opennlp.tools.tokenize.TokenizerModel; import opennlp.tools.util.Span; pubpc class LocationFinder { pubpc static void main(String args[]) throws Exception{ InputStream inputStreamTokenizer = new FileInputStream("C:/OpenNLP_models/entoken.bin"); TokenizerModel tokenModel = new TokenizerModel(inputStreamTokenizer); //String paragraph = "Mike and Smith are classmates"; String paragraph = "Tutorialspoint is located in Hyderabad"; //Instantiating the TokenizerME class TokenizerME tokenizer = new TokenizerME(tokenModel); String tokens[] = tokenizer.tokenize(paragraph); //Loading the NER-location moodel InputStream inputStreamNameFinder = new FileInputStream("C:/OpenNLP_models/en- ner-location.bin"); TokenNameFinderModel model = new TokenNameFinderModel(inputStreamNameFinder); //Instantiating the NameFinderME class NameFinderME nameFinder = new NameFinderME(model); //Finding the names of a location Span nameSpans[] = nameFinder.find(tokens); //Printing the spans of the locations in the sentence for(Span s: nameSpans) System.out.println(s.toString()+" "+tokens[s.getStart()]); } }
Compile and execute the saved Java file from the Command prompt using the following commands −
javac LocationFinder.java java LocationFinder
On executing, the above program reads the given String (raw text), detects the names of the persons in it, and displays their positions (spans), as shown below.
[4..5) location Hyderabad
NameFinder Probabipty
The probs()method of the NameFinderME class is used to get the probabipties of the last decoded sequence.
double[] probs = nameFinder.probs();
Following is the program to print the probabipties. Save this program in a file with the name TokenizerMEProbs.java.
import java.io.FileInputStream; import java.io.InputStream; import opennlp.tools.tokenize.TokenizerME; import opennlp.tools.tokenize.TokenizerModel; import opennlp.tools.util.Span; pubpc class TokenizerMEProbs { pubpc static void main(String args[]) throws Exception{ String sent = "Hello John how are you welcome to Tutorialspoint"; //Loading the Tokenizer model InputStream inputStream = new FileInputStream("C:/OpenNLP_models/en-token.bin"); TokenizerModel tokenModel = new TokenizerModel(inputStream); //Instantiating the TokenizerME class TokenizerME tokenizer = new TokenizerME(tokenModel); //Retrieving the positions of the tokens Span tokens[] = tokenizer.tokenizePos(sent); //Getting the probabipties of the recent calls to tokenizePos() method double[] probs = tokenizer.getTokenProbabipties(); //Printing the spans of tokens for( Span token : tokens) System.out.println(token +" "+sent.substring(token.getStart(), token.getEnd())); System.out.println(" "); for(int i = 0; i<probs.length; i++) System.out.println(probs[i]); } }
Compile and execute the saved Java file from the Command prompt using the following commands −
javac TokenizerMEProbs.java java TokenizerMEProbs
On executing, the above program reads the given String, tokenizes the sentences, and prints them. In addition, it also returns the probabipties of the last decoded sequence, as shown below.
[0..5) Hello [6..10) John [11..14) how [15..18) are [19..22) you [23..30) welcome [31..33) to [34..48) Tutorialspoint 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0Advertisements