Looking up words in Wordnet
What is Wordnet?
Wordnet is a large lexical database of English that was created at Princeton University, and it is available as one of the NLTK corpora. Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms called synsets, and each synset expresses a distinct meaning. Following are some use cases of Wordnet −
It can be used to look up the definition of a word
We can find synonyms and antonyms of a word
Word relations and similarities can be explored using Wordnet
Word sense disambiguation for those words having multiple uses and definitions
How to import Wordnet?
Wordnet can be imported with the following command −
from nltk.corpus import wordnet
For a more compact name, import it with an alias −
from nltk.corpus import wordnet as wn
Synset instances
Synsets are groupings of synonymous words that express the same concept. When you use Wordnet to look up a word, you get a list of Synset instances.
wordnet.synsets(word)
To get a list of Synsets, we can look up any word in Wordnet by using wordnet.synsets(word). For example, in the next Python recipe, we look up the Synset for ‘dog’ and explore some properties and methods of Synset −
Example
First, import wordnet as follows −
from nltk.corpus import wordnet as wn
Now, look up the word and take the first Synset from the returned list −
syn = wn.synsets('dog')[0]
Here, we use the name() method to get the unique name of the synset, which can be used to get the Synset directly −
syn.name()
Output: dog.n.01
Next, we use the definition() method, which gives us the definition of the word −
syn.definition()
Output: a member of the genus Canis (probably descended from the common wolf) that has been domesticated by man since prehistoric times; occurs in many breeds
Another method is examples(), which gives us example sentences that use the word −
syn.examples()
Output: ['the dog barked all night']
Complete implementation example
from nltk.corpus import wordnet as wn
syn = wn.synsets('dog')[0]
syn.name()
syn.definition()
syn.examples()
Getting Hypernyms
Synsets are organized in an inheritance-tree-like structure in which hypernyms represent more abstract terms and hyponyms represent more specific terms. This tree can be traced all the way up to a root hypernym. Let us understand the concept with the help of the following example −
from nltk.corpus import wordnet as wn
syn = wn.synsets('dog')[0]
syn.hypernyms()
Output
[Synset('canine.n.02'), Synset('domestic_animal.n.01')]
Here, we can see that canine and domestic_animal are the hypernyms of ‘dog’.
Now, we can find the hyponyms of the first of these hypernyms, ‘canine’, as follows −
syn.hypernyms()[0].hyponyms()
Output
[Synset('bitch.n.04'), Synset('dog.n.01'), Synset('fox.n.01'), Synset('hyena.n.01'), Synset('jackal.n.01'), Synset('wild_dog.n.01'), Synset('wolf.n.01')]
From the above output, we can see that ‘dog’ is only one of the many hyponyms of ‘canine’.
To find the root of all these, we can use the following command −
syn.root_hypernyms()
Output
[Synset('entity.n.01')]
From the above output, we can see that there is only one root, ‘entity’.
Complete implementation example
from nltk.corpus import wordnet as wn
syn = wn.synsets('dog')[0]
syn.hypernyms()
syn.hypernyms()[0].hyponyms()
syn.root_hypernyms()
Output
[Synset('entity.n.01')]
Lemmas in Wordnet
In linguistics, the canonical (dictionary) form of a word is called a lemma. To find the synonyms as well as the antonyms of a word, we can also look up lemmas in WordNet. Let us see how.
Finding Synonyms
By using the lemmas() method, we can get the lemmas of a Synset, i.e., its synonyms, and count them. Let us apply this method on the ‘dog’ synset −
Example
from nltk.corpus import wordnet as wn
syn = wn.synsets('dog')[0]
lemmas = syn.lemmas()
len(lemmas)
Output
3
The above output shows that the ‘dog’ synset has three lemmas.
We can get the name of the first lemma as follows −
lemmas[0].name()
Output: dog
The name of the second lemma is obtained in the same way −
lemmas[1].name()
Output: domestic_dog
And the name of the third lemma −
lemmas[2].name()
Output: Canis_familiaris
In other words, a Synset represents a group of lemmas that share a meaning, while each lemma represents a distinct word form.
Finding Antonyms
In WordNet, some lemmas also have antonyms. For example, the word ‘good’ has a total of 27 synsets, and 5 of them have lemmas with antonyms. Let us find the antonyms of ‘good’ when it is used as a noun and when it is used as an adjective.
Example 1
from nltk.corpus import wordnet as wn
syn1 = wn.synset('good.n.02')
antonym1 = syn1.lemmas()[0].antonyms()[0]
antonym1.name()
Output
evil
antonym1.synset().definition()
Output
the quality of being morally wrong in principle or practice
The above example shows that the word ‘good’, when used as a noun (the good.n.02 sense), has ‘evil’ as its first antonym.
Example 2
from nltk.corpus import wordnet as wn
syn2 = wn.synset('good.a.01')
antonym2 = syn2.lemmas()[0].antonyms()[0]
antonym2.name()
Output
bad
antonym2.synset().definition()
Output
having undesirable or negative qualities
The above example shows that the word ‘good’, when used as an adjective (the good.a.01 sense), has ‘bad’ as its first antonym.