OpenNLP - Sentence Detection
While processing a natural language, deciding the beginning and end of the sentences is one of the problems to be addressed. This process is known as Sentence Boundary Disambiguation (SBD) or simply sentence breaking.
The techniques used to detect the sentences in a given text depend on the language of the text.
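As a small illustration of this language dependence, the JDK ships a rule-based, locale-aware sentence iterator, java.text.BreakIterator. It is not part of OpenNLP and is not the approach this tutorial follows; the sketch below only shows how boundary rules can be selected per locale. The class name LocaleSentenceBreaker and the helper sentences() are made up for this illustration.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class LocaleSentenceBreaker {

    // Cuts the text at the sentence boundaries that BreakIterator
    // reports for the given locale and trims each piece.
    static List<String> sentences(String text, Locale locale) {
        BreakIterator it = BreakIterator.getSentenceInstance(locale);
        it.setText(text);
        List<String> result = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String piece = text.substring(start, end).trim();
            if (!piece.isEmpty()) {
                result.add(piece);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "Hi. How are you? Welcome to Tutorialspoint. "
            + "We provide free tutorials on various technologies";
        for (String s : sentences(text, Locale.ENGLISH)) {
            System.out.println(s);
        }
    }
}
```

Passing a different Locale selects the boundary rules the JDK provides for that language, if any.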
Sentence Detection Using Java
In Java, we can detect the sentences in a given text using regular expressions and a set of simple rules.
For example, if we assume that a period, a question mark, or an exclamation mark ends a sentence in the given text, we can split the text into sentences using the split() method of the String class. This method takes a regular expression in String format.
Following is the program which determines the sentences in a given text using Java regular expressions (the split() method). Save this program in a file with the name SentenceDetection_RE.java.
public class SentenceDetection_RE {
    public static void main(String args[]) {
        String sentence = " Hi. How are you? Welcome to Tutorialspoint. "
            + "We provide free tutorials on various technologies";
        String simple = "[.?!]";
        String[] splitString = sentence.split(simple);
        for (String string : splitString)
            System.out.println(string);
    }
}
Compile and execute the saved java file from the command prompt using the following commands.
javac SentenceDetection_RE.java
java SentenceDetection_RE
On executing, the above program prints the following output. The split() method discards the characters matched by the regular expression, so the sentence terminators do not appear.
Hi
How are you
Welcome to Tutorialspoint
We provide free tutorials on various technologies
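If the terminators should be kept, one possible variation is to split on the whitespace that follows a terminator instead of on the terminator itself. The sketch below uses a lookbehind for this; the class name SentenceSplitKeepPunctuation is made up for the example. It also shows where such simple rules fail: an abbreviation like "Dr." is treated as the end of a sentence, which is the kind of case a trained model is meant to handle.

```java
import java.util.Arrays;

class SentenceSplitKeepPunctuation {

    // Splits on whitespace that is preceded by '.', '?' or '!'.
    // The lookbehind (?<=...) matches without consuming the terminator,
    // so each sentence keeps its punctuation.
    static String[] split(String text) {
        return text.trim().split("(?<=[.?!])\\s+");
    }

    public static void main(String[] args) {
        String text = " Hi. How are you? Welcome to Tutorialspoint. "
            + "We provide free tutorials on various technologies";
        for (String s : split(text)) {
            System.out.println(s);
        }
        // Prints:
        // Hi.
        // How are you?
        // Welcome to Tutorialspoint.
        // We provide free tutorials on various technologies

        // The same rule wrongly breaks after an abbreviation.
        System.out.println(Arrays.toString(split("Dr. Smith arrived. He was late.")));
        // Prints: [Dr., Smith arrived., He was late.]
    }
}
```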
Sentence Detection Using OpenNLP
To detect sentences, OpenNLP uses a predefined model, a file named en-sent.bin. This predefined model is trained to detect sentences in a given raw text.
The opennlp.tools.sentdetect package contains the classes and interfaces that are used to perform the sentence detection task.
To detect sentences using the OpenNLP library, you need to −
Load the en-sent.bin model using the SentenceModel class
Instantiate the SentenceDetectorME class.
Detect the sentences using the sentDetect() method of this class.
Following are the steps to be followed to write a program which detects the sentences from the given raw text.
Step 1: Loading the model
The model for sentence detection is represented by the class named SentenceModel, which belongs to the package opennlp.tools.sentdetect.
To load a sentence detection model −
Create an InputStream object of the model (Instantiate the FileInputStream and pass the path of the model in String format to its constructor).
Instantiate the SentenceModel class and pass the InputStream (object) of the model as a parameter to its constructor as shown in the following code block −
//Loading sentence detector model
InputStream inputStream = new FileInputStream("C:/OpenNLP_models/en-sent.bin");
SentenceModel model = new SentenceModel(inputStream);
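The snippet above never closes the stream. A common Java idiom for this is try-with-resources, sketched below without the OpenNLP dependency so that it runs on its own. The class name ModelStreamSketch, the helper readModelBytes(), and the temporary stand-in file are all hypothetical; in the tutorial's program, the body of the try block would instead construct the SentenceModel from the stream.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

class ModelStreamSketch {

    // Opens the file and closes the stream automatically, even on failure.
    // Here we only count the bytes; with OpenNLP on the classpath, the try
    // body would be: return new SentenceModel(in);
    static long readModelBytes(Path modelPath) throws IOException {
        if (!Files.isRegularFile(modelPath)) {
            throw new IOException("Model file not found: " + modelPath);
        }
        try (InputStream in = Files.newInputStream(modelPath)) {
            return in.readAllBytes().length;
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical stand-in file; replace with the real path to en-sent.bin.
        Path tmp = Files.createTempFile("en-sent", ".bin");
        Files.write(tmp, new byte[] {1, 2, 3});
        System.out.println(readModelBytes(tmp)); // prints 3
        Files.delete(tmp);
    }
}
```

Checking for the file first gives a clearer error message than the exception raised when the path is simply wrong.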
Step 2: Instantiating the SentenceDetectorME class
The SentenceDetectorME class of the package opennlp.tools.sentdetect contains methods to split the raw text into sentences. This class uses a maximum entropy model to evaluate end-of-sentence characters in a string and determine whether they signify the end of a sentence.
Instantiate this class and pass the model object created in the previous step, as shown below.
//Instantiating the SentenceDetectorME class
SentenceDetectorME detector = new SentenceDetectorME(model);
Step 3: Detecting the sentence
The sentDetect() method of the SentenceDetectorME class is used to detect the sentences in the raw text passed to it. This method accepts a String variable as a parameter.
Invoke this method, passing the raw text as a String.
//Detecting the sentence
String sentences[] = detector.sentDetect(sentence);
Example
Following is the program which detects the sentences in a given raw text. Save this program in a file with the name SentenceDetectionME.java.
import java.io.FileInputStream;
import java.io.InputStream;

import opennlp.tools.sentdetect.SentenceDetectorME;
import opennlp.tools.sentdetect.SentenceModel;

public class SentenceDetectionME {

    public static void main(String args[]) throws Exception {

        String sentence = "Hi. How are you? Welcome to Tutorialspoint. "
            + "We provide free tutorials on various technologies";

        //Loading sentence detector model
        InputStream inputStream = new FileInputStream("C:/OpenNLP_models/en-sent.bin");
        SentenceModel model = new SentenceModel(inputStream);

        //Instantiating the SentenceDetectorME class
        SentenceDetectorME detector = new SentenceDetectorME(model);

        //Detecting the sentence
        String sentences[] = detector.sentDetect(sentence);

        //Printing the sentences
        for (String sent : sentences)
            System.out.println(sent);
    }
}
Compile and execute the saved Java file from the Command prompt using the following commands −
javac SentenceDetectionME.java
java SentenceDetectionME
On executing, the above program reads the given String and detects the sentences in it and displays the following output.
Hi. How are you?
Welcome to Tutorialspoint.
We provide free tutorials on various technologies
Detecting the Positions of the Sentences
We can also detect the positions of the sentences using the sentPosDetect() method of the SentenceDetectorME class.
Following are the steps to be followed to write a program which detects the positions of the sentences from the given raw text.
Step 1: Loading the model
The model for sentence detection is represented by the class named SentenceModel, which belongs to the package opennlp.tools.sentdetect.
To load a sentence detection model −
Create an InputStream object of the model (Instantiate the FileInputStream and pass the path of the model in String format to its constructor).
Instantiate the SentenceModel class and pass the InputStream (object) of the model as a parameter to its constructor, as shown in the following code block.
//Loading sentence detector model
InputStream inputStream = new FileInputStream("C:/OpenNLP_models/en-sent.bin");
SentenceModel model = new SentenceModel(inputStream);
Step 2: Instantiating the SentenceDetectorME class
The SentenceDetectorME class of the package opennlp.tools.sentdetect contains methods to split the raw text into sentences. This class uses a maximum entropy model to evaluate end-of-sentence characters in a string and determine whether they signify the end of a sentence.
Instantiate this class and pass the model object created in the previous step.
//Instantiating the SentenceDetectorME class
SentenceDetectorME detector = new SentenceDetectorME(model);
Step 3: Detecting the position of the sentence
The sentPosDetect() method of the SentenceDetectorME class is used to detect the positions of the sentences in the raw text passed to it. This method accepts a String variable as a parameter.
Invoke this method, passing the raw text as a String parameter.
//Detecting the position of the sentences in the paragraph
Span[] spans = detector.sentPosDetect(sentence);
Step 4: Printing the spans of the sentences
The sentPosDetect() method of the SentenceDetectorME class returns an array of objects of the type Span. The Span class of the opennlp.tools.util package stores the start and end offsets of a span of text.
You can store the spans returned by the sentPosDetect() method in the Span array and print them, as shown in the following code block.
//Printing the spans of the sentences
for (Span span : spans)
    System.out.println(span);
Example
Following is the program which detects the positions of the sentences in the given raw text. Save this program in a file with the name SentencePosDetection.java.
import java.io.FileInputStream;
import java.io.InputStream;

import opennlp.tools.sentdetect.SentenceDetectorME;
import opennlp.tools.sentdetect.SentenceModel;
import opennlp.tools.util.Span;

public class SentencePosDetection {

    public static void main(String args[]) throws Exception {

        String paragraph = "Hi. How are you? Welcome to Tutorialspoint. "
            + "We provide free tutorials on various technologies";

        //Loading sentence detector model
        InputStream inputStream = new FileInputStream("C:/OpenNLP_models/en-sent.bin");
        SentenceModel model = new SentenceModel(inputStream);

        //Instantiating the SentenceDetectorME class
        SentenceDetectorME detector = new SentenceDetectorME(model);

        //Detecting the position of the sentences in the raw text
        Span spans[] = detector.sentPosDetect(paragraph);

        //Printing the spans of the sentences in the paragraph
        for (Span span : spans)
            System.out.println(span);
    }
}
Compile and execute the saved Java file from the Command prompt using the following commands −
javac SentencePosDetection.java
java SentencePosDetection
On executing, the above program reads the given String and detects the sentences in it and displays the following output.
[0..16)
[17..43)
[44..93)
Sentences along with their Positions
The substring() method of the String class accepts the begin and the end offsets and returns the respective string. We can use this method to print the sentences and their spans (positions) together, as shown in the following code block.
for (Span span : spans)
    System.out.println(sen.substring(span.getStart(), span.getEnd()) + " " + span);
Following is the program to detect the sentences from the given raw text and display them along with their positions. Save this program in a file with name SentencesAndPosDetection.java.
import java.io.FileInputStream;
import java.io.InputStream;

import opennlp.tools.sentdetect.SentenceDetectorME;
import opennlp.tools.sentdetect.SentenceModel;
import opennlp.tools.util.Span;

public class SentencesAndPosDetection {

    public static void main(String args[]) throws Exception {

        String sen = "Hi. How are you? Welcome to Tutorialspoint."
            + " We provide free tutorials on various technologies";

        //Loading a sentence model
        InputStream inputStream = new FileInputStream("C:/OpenNLP_models/en-sent.bin");
        SentenceModel model = new SentenceModel(inputStream);

        //Instantiating the SentenceDetectorME class
        SentenceDetectorME detector = new SentenceDetectorME(model);

        //Detecting the position of the sentences in the paragraph
        Span[] spans = detector.sentPosDetect(sen);

        //Printing the sentences and their spans of a paragraph
        for (Span span : spans)
            System.out.println(sen.substring(span.getStart(), span.getEnd()) + " " + span);
    }
}
Compile and execute the saved Java file from the Command prompt using the following commands −
javac SentencesAndPosDetection.java
java SentencesAndPosDetection
On executing, the above program reads the given String and detects the sentences along with their positions and displays the following output.
Hi. How are you? [0..16)
Welcome to Tutorialspoint. [17..43)
We provide free tutorials on various technologies [44..93)
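The bracket notation in the output above denotes half-open ranges: the start offset is included and the end offset is excluded, which is the same convention substring() uses. The sketch below reproduces this with a small stand-in class so that it runs without OpenNLP. Offsets is a hypothetical class written for this illustration, not the library's Span, and the offsets are copied from the output above.

```java
class SpanOffsetsSketch {

    // Hypothetical stand-in for a span: a half-open range [start..end).
    static final class Offsets {
        final int start;
        final int end;

        Offsets(int start, int end) {
            this.start = start;
            this.end = end;
        }

        // Number of characters covered; the end offset is exclusive.
        int length() {
            return end - start;
        }

        String coveredText(String text) {
            return text.substring(start, end);
        }

        @Override
        public String toString() {
            return "[" + start + ".." + end + ")";
        }
    }

    public static void main(String[] args) {
        String sen = "Hi. How are you? Welcome to Tutorialspoint."
            + " We provide free tutorials on various technologies";

        // Offsets copied from the program output above.
        Offsets[] spans = {new Offsets(0, 16), new Offsets(17, 43), new Offsets(44, 93)};

        for (Offsets span : spans) {
            System.out.println(span + " length=" + span.length()
                + " text=\"" + span.coveredText(sen) + "\"");
        }
    }
}
```

The characters at offsets 16 and 43 are the spaces between sentences, which is why they belong to no span.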
Sentence Probability Detection
The getSentenceProbabilities() method of the SentenceDetectorME class returns the probabilities associated with the most recent call to the sentDetect() method.
//Getting the probabilities of the last decoded sequence
double[] probs = detector.getSentenceProbabilities();
Following is the program to print the probabilities associated with the call to the sentDetect() method. Save this program in a file with the name SentenceDetectionMEProbs.java.
import java.io.FileInputStream;
import java.io.InputStream;

import opennlp.tools.sentdetect.SentenceDetectorME;
import opennlp.tools.sentdetect.SentenceModel;

public class SentenceDetectionMEProbs {

    public static void main(String args[]) throws Exception {

        String sentence = "Hi. How are you? Welcome to Tutorialspoint. "
            + "We provide free tutorials on various technologies";

        //Loading sentence detector model
        InputStream inputStream = new FileInputStream("C:/OpenNLP_models/en-sent.bin");
        SentenceModel model = new SentenceModel(inputStream);

        //Instantiating the SentenceDetectorME class
        SentenceDetectorME detector = new SentenceDetectorME(model);

        //Detecting the sentence
        String sentences[] = detector.sentDetect(sentence);

        //Printing the sentences
        for (String sent : sentences)
            System.out.println(sent);

        //Getting the probabilities of the last decoded sequence
        double[] probs = detector.getSentenceProbabilities();

        System.out.println(" ");

        for (int i = 0; i < probs.length; i++)
            System.out.println(probs[i]);
    }
}
Compile and execute the saved Java file from the Command prompt using the following commands −
javac SentenceDetectionMEProbs.java
java SentenceDetectionMEProbs
On executing, the above program reads the given String, detects the sentences, and prints them. In addition, it prints the probabilities associated with the most recent call to the sentDetect() method, as shown below.
Hi. How are you?
Welcome to Tutorialspoint.
We provide free tutorials on various technologies

0.9240246995179983
0.9957680129995953
1.0
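One possible use of these values is to flag sentences the detector was less sure about. The sketch below assumes that the i-th probability belongs to the i-th detected sentence, as the output above suggests. It works on plain arrays copied from that output so that it runs without OpenNLP; the class name LowConfidenceSentences and the threshold of 0.95 are arbitrary choices made for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

class LowConfidenceSentences {

    // Returns the sentences whose probability is below the threshold.
    // Assumes probs[i] is the probability for sentences[i].
    static List<String> below(String[] sentences, double[] probs, double threshold) {
        if (sentences.length != probs.length) {
            throw new IllegalArgumentException("sentences and probs must have the same length");
        }
        List<String> flagged = new ArrayList<>();
        for (int i = 0; i < sentences.length; i++) {
            if (probs[i] < threshold) {
                flagged.add(sentences[i]);
            }
        }
        return flagged;
    }

    public static void main(String[] args) {
        // Values copied from the program output above.
        String[] sentences = {
            "Hi. How are you?",
            "Welcome to Tutorialspoint.",
            "We provide free tutorials on various technologies"
        };
        double[] probs = {0.9240246995179983, 0.9957680129995953, 1.0};

        for (String s : below(sentences, probs, 0.95)) {
            System.out.println(s); // prints: Hi. How are you?
        }
    }
}
```

Here the only flagged entry is the first one, where the detector kept "Hi." and "How are you?" together as a single sentence.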