spaCy - Container Lexeme Class
This chapter explains the Lexeme class in spaCy in detail.
Lexeme Class
A Lexeme is an entry in the vocabulary. It has no string context: as opposed to a word token, it is a word type. That is why it has no POS (part-of-speech) tag, dependency parse or lemma.
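The difference between a word type and a word token can be sketched in plain Python. The snippet below is only an illustration and does not use spaCy: ToyVocab, ToyLexeme and ToyToken are hypothetical names, and zlib.crc32 is an assumed stand-in for whatever hashing the library uses internally.

```python
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyLexeme:
    """Context-free vocabulary entry (a word type). Hypothetical class."""
    orth: int   # integer ID of the verbatim text
    text: str   # verbatim text


@dataclass(frozen=True)
class ToyToken:
    """A word in context (a word token). Hypothetical class."""
    lexeme: ToyLexeme
    index: int  # position in the sentence; the context lives here


class ToyVocab:
    """Stores one ToyLexeme per distinct string. Hypothetical class."""

    def __init__(self):
        self._entries = {}

    def __getitem__(self, text):
        key = zlib.crc32(text.encode("utf8"))  # stand-in hash, an assumption
        if key not in self._entries:
            self._entries[key] = ToyLexeme(orth=key, text=text)
        return self._entries[key]

    def __len__(self):
        return len(self._entries)


vocab = ToyVocab()
words = "the cat saw the dog".split()
tokens = [ToyToken(vocab[w], i) for i, w in enumerate(words)]

print(len(tokens))                           # 5 tokens
print(len(vocab))                            # 4 types
print(tokens[0].lexeme is tokens[3].lexeme)  # True: both "the" tokens share one entry
```

Because the shared entry knows nothing about the sentence, anything that depends on context (such as a POS tag) has to live on the token, not on the vocabulary entry.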
Attributes
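The table below explains its attributes. An attribute whose name ends with an underscore holds the string form of the integer attribute with the same name −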
NAME | TYPE | DESCRIPTION |
---|---|---|
vocab | Vocab | It represents the vocabulary of the lexeme. |
text | unicode | A Unicode attribute representing verbatim text content. |
orth | int | It is an integer type attribute that represents ID of the verbatim text content. |
orth_ | unicode | It is the Unicode Verbatim text content which is identical to Lexeme.text. This text content exists mostly for consistency with the other attributes. |
rank | int | It represents the sequential ID of the lexeme’s lexical type which is used to index into tables. |
flags | int | It represents the container of the lexeme’s binary flags. |
norm | int | It is the ID of the lexeme’s norm, that is, a normalized form of the lexeme text. |
norm_ | unicode | It is the lexeme’s norm, that is, a normalized form of the lexeme text. |
lower | int | As the name implies, it is the ID of the lowercase form of the word. |
lower_ | unicode | It is the lowercase form of the word. |
shape | int | It is the ID of a transform of the word’s string that shows its orthographic features. |
shape_ | unicode | It is a transform of the word’s string that shows its orthographic features. |
prefix | int | It is the hash value of a length-N substring from the start of the word. The default value is N=1. |
prefix_ | unicode | It is a length-N substring from the start of the word. The default value is N=1. |
suffix | int | It is the hash value of a length-N substring from the end of the word. The default value is N=3. |
suffix_ | unicode | It is a length-N substring from the end of the word. The default value is N=3. |
is_alpha | bool | This attribute indicates whether the lexeme consists of alphabetic characters. It is equivalent to lexeme.text.isalpha(). |
is_ascii | bool | This attribute indicates whether the lexeme consists of ASCII characters. It is equivalent to all(ord(c) < 128 for c in lexeme.text). |
is_digit | bool | This attribute indicates whether the lexeme consists of digits. It is equivalent to lexeme.text.isdigit(). |
is_lower | bool | This attribute indicates whether the lexeme is in lowercase. It is equivalent to lexeme.text.islower(). |
is_upper | bool | This attribute indicates whether the lexeme is in uppercase. It is equivalent to lexeme.text.isupper(). |
is_title | bool | This attribute indicates whether the lexeme is in titlecase. It is equivalent to lexeme.text.istitle(). |
is_punct | bool | This attribute indicates whether the lexeme is punctuation. |
is_left_punct | bool | This attribute indicates whether the lexeme is a left punctuation mark, e.g. ( |
is_right_punct | bool | This attribute indicates whether the lexeme is a right punctuation mark, e.g. ) |
is_space | bool | This attribute indicates whether the lexeme consists of whitespace characters. It is equivalent to lexeme.text.isspace(). |
is_bracket | bool | This attribute indicates whether the lexeme is a bracket. |
is_quote | bool | This attribute indicates whether the lexeme is a quotation mark. |
is_currency | bool | Introduced in version 2.0.8, this attribute indicates whether the lexeme is a currency symbol. |
like_url | bool | This attribute indicates whether the lexeme resembles a URL. |
like_num | bool | This attribute indicates whether the lexeme represents a number. |
like_email | bool | This attribute indicates whether the lexeme resembles an email address. |
is_oov | bool | This attribute indicates whether the lexeme is out-of-vocabulary, that is, whether it lacks a word vector. |
is_stop | bool | This attribute indicates whether the lexeme is part of a “stop list”. |
lang | int | This attribute represents the language of the parent document’s vocabulary. |
lang_ | unicode | This attribute represents the language of the parent document’s vocabulary. |
prob | float | It is the smoothed log probability estimate of the lexeme’s word type. |
cluster | int | It represents the Brown cluster ID. |
sentiment | float | It represents a scalar value that indicates the positivity or negativity of the lexeme. |
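Several of the attributes above are defined by simple string operations, so they can be approximated without spaCy. The sketch below is only an illustration: toy_shape and toy_attributes are hypothetical helper names, and toy_shape is a simplified version of the shape transform (letters become x or X, digits become d); the library's own implementation may differ in details such as the handling of long runs of the same character.

```python
def toy_shape(text):
    """Simplified orthographic shape: letters -> x/X, digits -> d, rest kept."""
    out = []
    for ch in text:
        if ch.isalpha():
            out.append("X" if ch.isupper() else "x")
        elif ch.isdigit():
            out.append("d")
        else:
            out.append(ch)
    return "".join(out)


def toy_attributes(text, prefix_len=1, suffix_len=3):
    """Collect context-free attributes of a string in a plain dict."""
    return {
        "lower_": text.lower(),
        "shape_": toy_shape(text),
        "prefix_": text[:prefix_len],    # default N=1, as in the table
        "suffix_": text[-suffix_len:],   # default N=3, as in the table
        "is_alpha": text.isalpha(),
        "is_ascii": all(ord(c) < 128 for c in text),
        "is_digit": text.isdigit(),
        "is_lower": text.islower(),
        "is_upper": text.isupper(),
        "is_title": text.istitle(),
        "is_space": text.isspace(),
    }


attrs = toy_attributes("Apple2")
print(attrs["lower_"], attrs["shape_"], attrs["prefix_"], attrs["suffix_"])
print(attrs["is_alpha"], attrs["is_ascii"], attrs["is_digit"])
```

None of these values depends on the surrounding sentence, which is why they can be stored once per vocabulary entry.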
Methods
Following are the methods used in Lexeme class −
Sr.No. | Methods & Description |
---|---|
1 | Lexeme.__init__ To construct a Lexeme object. |
2 | Lexeme.set_flag To change the value of a Boolean flag. |
3 | Lexeme.check_flag To check the value of a Boolean flag. |
4 | Lexeme.similarity To compute a semantic similarity estimate. |
Lexeme.__init__
As the name implies, this method is used to construct a Lexeme object.
Arguments
The table below explains its arguments −
NAME | TYPE | DESCRIPTION |
---|---|---|
vocab | Vocab | This argument represents the parent vocabulary. |
orth | int | It is the orth ID of the lexeme. |
Example
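An example of Lexeme.__init__ method is given below −
import spacy
from spacy.lexeme import Lexeme
nlp_model = spacy.load("en_core_web_sm")
doc = nlp_model("The website is Tutorialspoint.com.")
# doc[3] is a Token, so build the Lexeme from the vocabulary and the token's orth ID
lexeme = Lexeme(nlp_model.vocab, doc[3].orth)
lexeme.text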
Output
When you run the code, you will see the following output −
Tutorialspoint.com
Lexeme.set_flag
This method is used to change the value of a Boolean flag.
Arguments
The table below explains its arguments −
NAME | TYPE | DESCRIPTION |
---|---|---|
flag_id | int | It represents the attribute ID of the flag, which is to be set. |
value | bool | It is the new value of the flag. |
Example
An example of Lexeme.set_flag method is given below −
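import spacy
nlp_model = spacy.load("en_core_web_sm")
# Register a new flag whose value is False for every lexeme by default
New_FLAG = nlp_model.vocab.add_flag(lambda text: False)
# Switch the flag on for a single vocabulary entry
nlp_model.vocab["Tutorialspoint.com"].set_flag(New_FLAG, True)
New_FLAG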
Output
When you run the code, you will see the following output −
25
Lexeme.check_flag
This method is used to check the value of a Boolean flag.
Argument
The table below explains its argument −
NAME | TYPE | DESCRIPTION |
---|---|---|
flag_id | int | It represents the attribute ID of the flag which is to be checked. |
Example 1
An example of Lexeme.check_flag method is given below −
import spacy
nlp_model = spacy.load("en_core_web_sm")
# The flag is True only for the strings listed here
library = lambda text: text in ["Website", "Tutorialspoint.com"]
my_library = nlp_model.vocab.add_flag(library)
nlp_model.vocab["Tutorialspoint.com"].check_flag(my_library)
Output
When you run the code, you will see the following output −
True
Example 2
Given below is another example of Lexeme.check_flag method −
nlp_model.vocab["Hello"].check_flag(my_library)
Output
When you run the code, you will see the following output −
False
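The flags attribute is described in the attributes table as an integer container of binary flags. One common way to build such a container is a bit field, where each flag ID selects one bit. The sketch below shows that idea in plain Python; ToyFlags is a hypothetical class, and whether spaCy stores its flags in exactly this way is an assumption of the sketch.

```python
class ToyFlags:
    """Packs Boolean flags into a single integer. Hypothetical class."""

    def __init__(self):
        self.flags = 0

    def set_flag(self, flag_id, value):
        mask = 1 << flag_id
        if value:
            self.flags |= mask    # switch the bit on
        else:
            self.flags &= ~mask   # switch the bit off

    def check_flag(self, flag_id):
        return bool(self.flags & (1 << flag_id))


MY_FLAG = 25  # an arbitrary flag ID chosen for this sketch
entry = ToyFlags()
print(entry.check_flag(MY_FLAG))  # False
entry.set_flag(MY_FLAG, True)
print(entry.check_flag(MY_FLAG))  # True
entry.set_flag(MY_FLAG, False)
print(entry.flags)                # 0
```

Setting or checking one flag leaves every other bit untouched, so many Boolean attributes can share a single integer.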
Lexeme.similarity
This method is used to compute a semantic similarity estimate. The default is cosine over vectors.
Argument
The table below explains its argument −
NAME | TYPE | DESCRIPTION |
---|---|---|
other | - | It is the object with which the comparison will be done. By default, it will accept Doc, Span, Token, and Lexeme objects. |
Example
An example of Lexeme.similarity method is as follows −
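import spacy
# en_core_web_md is assumed here: similarity relies on word vectors,
# which the small model does not ship
nlp_model = spacy.load("en_core_web_md")
apple = nlp_model.vocab["apple"]
orange = nlp_model.vocab["orange"]
apple_orange = apple.similarity(orange)
orange_apple = orange.similarity(apple)
apple_orange == orange_apple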
Output
When you run the code, you will see the following output −
True
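The symmetry shown in the output follows from the definition of cosine similarity. The sketch below computes it in plain Python on made-up three-dimensional vectors; cosine_similarity is a hypothetical helper, not spaCy's implementation, and returning 0.0 for an all-zero vector is a convention chosen for this sketch.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen for this sketch
    return dot / (norm_u * norm_v)


# Made-up vectors, for illustration only
apple = [0.8, 0.1, 0.3]
orange = [0.7, 0.2, 0.4]

apple_orange = cosine_similarity(apple, orange)
orange_apple = cosine_similarity(orange, apple)
print(apple_orange == orange_apple)  # True
```

Swapping the arguments only swaps the factors inside each product, so the two results are identical.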
Properties
Following are the properties of Lexeme Class.
Sr.No. | Property & Description |
---|---|
1 | Lexeme.vector It will return a 1-dimensional array representing the lexeme’s semantics. |
2 | Lexeme.vector_norm It represents the L2 norm of the lexeme’s vector representation. |
Lexeme.vector
This Lexeme property represents a real-valued meaning. It will return a one-dimensional array representing the lexeme’s semantics.
Example
An example of Lexeme.vector property is given below −
import spacy
# en_core_web_md is assumed here because the small model does not ship word vectors
nlp_model = spacy.load("en_core_web_md")
apple = nlp_model.vocab["apple"]
apple.vector.dtype
Output
You will see the following output −
dtype('float32')
Lexeme.vector_norm
This Lexeme property represents the L2 norm of the lexeme’s vector representation.
Example
An example of Lexeme.vector_norm property is as follows −
import spacy
# en_core_web_md is assumed here because the small model does not ship word vectors
nlp_model = spacy.load("en_core_web_md")
apple = nlp_model.vocab["apple"]
pasta = nlp_model.vocab["pasta"]
apple.vector_norm != pasta.vector_norm
Output
You will see the following output −
True
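The L2 norm itself is the square root of the sum of the squared vector components. The sketch below computes it in plain Python on made-up two-dimensional vectors; l2_norm is a hypothetical helper and the numbers are not taken from any spaCy model.

```python
import math


def l2_norm(vector):
    """Square root of the sum of squared components."""
    return math.sqrt(sum(x * x for x in vector))


# Made-up vectors, for illustration only
apple = [3.0, 4.0]
pasta = [6.0, 8.0]

print(l2_norm(apple))                    # 5.0
print(l2_norm(pasta))                    # 10.0
print(l2_norm(apple) != l2_norm(pasta))  # True
```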