spaCy - Container Lexeme Class
  • Date: 2024-12-27


This chapter explains the Lexeme class in spaCy in detail.

Lexeme Class

A Lexeme is an entry in the vocabulary. It has no string context: as opposed to a word token, it is a word type. That is why it has no POS (part-of-speech) tag, dependency parse or lemma.
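The difference between a word token and a word type can be sketched as follows. This is a minimal illustration that assumes a blank English pipeline (spacy.blank("en")) rather than a trained model; the sentence and variable names are made up for the example.

```python
import spacy

# Assumption: a blank English pipeline is enough here, since only the
# tokenizer and the vocabulary are needed.
nlp = spacy.blank("en")
doc = nlp("apple pie and apple juice")

first, second = doc[0], doc[3]          # two tokens with the same text

# The tokens sit at different positions in the document...
different_positions = first.i != second.i

# ...but both point to the same vocabulary entry (the same word type).
shared_type = first.orth == second.orth

# The Lexeme itself is looked up in the vocabulary and carries no context.
lexeme = nlp.vocab[first.orth]
print(lexeme.text, different_positions, shared_type)
```

The two tokens differ in position, but they resolve to one and the same context-free entry in the vocabulary.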

Attributes

The table below explains its attributes −

NAME TYPE DESCRIPTION
vocab Vocab It represents the vocabulary of the lexeme.
text unicode It is the verbatim text content.
orth int It is the integer ID of the verbatim text content.
orth_ unicode It is the verbatim text content, identical to Lexeme.text. It exists mostly for consistency with the other attributes.
rank int It represents the sequential ID of the lexeme’s lexical type, which is used to index into tables.
flags int It represents the container of the lexeme’s binary flags.
norm int It is the integer ID of the lexeme’s norm, that is, a normalized form of the lexeme text.
norm_ unicode It is the lexeme’s norm as a string.
lower int It is the integer ID of the lowercase form of the word.
lower_ unicode It is the lowercase form of the word as a string.
shape int It is the integer ID of a transform of the word’s string that shows its orthographic features.
shape_ unicode It is a transform of the word’s string that shows its orthographic features.
prefix int It is the hash value of a length-N substring from the start of the word. The default value is N=1.
prefix_ unicode It is a length-N substring from the start of the word. The default value is N=1.
suffix int It is the hash value of a length-N substring from the end of the word. The default value is N=3.
suffix_ unicode It is a length-N substring from the end of the word. The default value is N=3.
is_alpha bool It indicates whether the lexeme consists of alphabetic characters. It is equivalent to lexeme.text.isalpha().
is_ascii bool It indicates whether the lexeme consists of ASCII characters. It is equivalent to all(ord(c) < 128 for c in lexeme.text).
is_digit bool It indicates whether the lexeme consists of digits. It is equivalent to lexeme.text.isdigit().
is_lower bool It indicates whether the lexeme is in lowercase. It is equivalent to lexeme.text.islower().
is_upper bool It indicates whether the lexeme is in uppercase. It is equivalent to lexeme.text.isupper().
is_title bool It indicates whether the lexeme is in titlecase. It is equivalent to lexeme.text.istitle().
is_punct bool It indicates whether the lexeme is punctuation.
is_left_punct bool It indicates whether the lexeme is a left punctuation mark, e.g. (.
is_right_punct bool It indicates whether the lexeme is a right punctuation mark, e.g. ).
is_space bool It indicates whether the lexeme consists of whitespace characters. It is equivalent to lexeme.text.isspace().
is_bracket bool It indicates whether the lexeme is a bracket.
is_quote bool It indicates whether the lexeme is a quotation mark.
is_currency bool Introduced in version 2.0.8, it indicates whether the lexeme is a currency symbol.
like_url bool It indicates whether the lexeme resembles a URL.
like_num bool It indicates whether the lexeme represents a number, e.g. "10.9", "10" or "ten".
like_email bool It indicates whether the lexeme resembles an email address.
is_oov bool It indicates whether the lexeme is out-of-vocabulary, that is, whether it lacks a word vector.
is_stop bool It indicates whether the lexeme is part of a “stop list”.
lang int It is the integer ID of the language of the parent vocabulary.
lang_ unicode It is the language of the parent vocabulary as a string.
prob float It is the smoothed log probability estimate of the lexeme’s word type.
cluster int It represents the Brown cluster ID.
sentiment float It is a scalar value that indicates the positivity or negativity of the lexeme.
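A few of these attributes can be inspected directly on a vocabulary entry. The snippet below is a small sketch that assumes a blank English pipeline; the word "Apple" is just an illustrative choice.

```python
import spacy

# Assumption: a blank English pipeline, so no trained model is required.
nlp = spacy.blank("en")
lexeme = nlp.vocab["Apple"]

print(lexeme.lower_)    # apple
print(lexeme.shape_)    # Xxxxx
print(lexeme.prefix_)   # A   (first character, N=1)
print(lexeme.suffix_)   # ple (last three characters, N=3)
print(lexeme.is_alpha)  # True
print(lexeme.is_title)  # True

# Attributes without the trailing underscore are integer IDs; the string
# store of the vocabulary maps them back to the string form.
lower_matches = nlp.vocab.strings[lexeme.lower] == lexeme.lower_
print(lower_matches)    # True
```

Each string attribute ending in an underscore has an integer counterpart without it, which is why the table lists most of them in pairs.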

Methods

The methods of the Lexeme class are as follows −

Sr.No. Methods & Description
1

Lexeme.__init__

To construct a Lexeme object.

2

Lexeme.set_flag

To change the value of a Boolean flag.

3

Lexeme.check_flag

To check the value of a Boolean flag.

4

Lexeme.similarity

To compute a semantic similarity estimate.

Lexeme.__init__

As the name implies, this method constructs a Lexeme object. In practice, lexemes are usually retrieved from the vocabulary rather than constructed directly.

Arguments

The table below explains its arguments −

NAME TYPE DESCRIPTION
vocab Vocab This argument represents the parent vocabulary.
orth int It is the orth ID of the lexeme.

Example

An example of obtaining a Lexeme object is given below −


import spacy
nlp_model = spacy.load("en_core_web_sm")
doc = nlp_model("The website is Tutorialspoint.com.")
# doc[3] is a Token; look up its vocabulary entry to get the Lexeme
lexeme = nlp_model.vocab[doc[3].orth]
lexeme.text

Output

When you run the code, you will see the following output −


 Tutorialspoint.com 
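The constructor can also be called directly with the two arguments listed above. The following is a minimal sketch, assuming a blank English pipeline and an arbitrary example word; the string is first added to the string store so that its orth ID can be resolved.

```python
import spacy
from spacy.lexeme import Lexeme

# Assumption: a blank English pipeline and the example word "coffee".
nlp = spacy.blank("en")

# Register the string and obtain its integer orth ID.
orth_id = nlp.vocab.strings.add("coffee")

# Construct the Lexeme from the parent vocabulary and the orth ID.
lexeme = Lexeme(nlp.vocab, orth_id)
print(lexeme.text)

# Looking the word up in the vocabulary yields the same entry.
same_entry = lexeme.orth == nlp.vocab["coffee"].orth
print(same_entry)
```

Looking the word up with nlp.vocab["coffee"] is the more common route and gives an equivalent object.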

Lexeme.set_flag

This method is used to change the value of a Boolean flag.

Arguments

The table below explains its arguments −

NAME TYPE DESCRIPTION
flag_id int It represents the attribute ID of the flag which is to be set.
value bool It is the new value of the flag.

Example

An example of Lexeme.set_flag method is given below −


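import spacy
nlp_model = spacy.load("en_core_web_sm")
# Register a new flag that is False for every lexeme by default
NEW_FLAG = nlp_model.vocab.add_flag(lambda text: False)
# Override the flag for a single lexeme
nlp_model.vocab["Tutorialspoint.com"].set_flag(NEW_FLAG, True)
NEW_FLAG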

Output

When you run the code, you will see the following output −


25
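The effect of set_flag is easier to see when the flag is read back before and after the call. The snippet below is a sketch that assumes a blank English pipeline; the flag name IS_SITE is a hypothetical name chosen for the example.

```python
import spacy

# Assumption: a blank English pipeline; IS_SITE is a made-up flag name.
nlp = spacy.blank("en")

# The getter returns False for every string, so the flag starts out unset.
IS_SITE = nlp.vocab.add_flag(lambda text: False)

lexeme = nlp.vocab["Tutorialspoint.com"]
before = lexeme.check_flag(IS_SITE)

# Override the flag for this one lexeme only.
lexeme.set_flag(IS_SITE, True)
after = lexeme.check_flag(IS_SITE)

# Other lexemes keep the default value from the getter.
other = nlp.vocab["Hello"].check_flag(IS_SITE)
print(before, after, other)
```

Only the lexeme on which set_flag was called changes; every other entry keeps the value produced by the getter function.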

Lexeme.check_flag

This method is used to check the value of a Boolean flag.

Argument

The table below explains its argument −

NAME TYPE DESCRIPTION
flag_id int It represents the attribute ID of the flag which is to be checked.

Example 1

An example of Lexeme.check_flag method is given below −


import spacy
nlp_model = spacy.load("en_core_web_sm")
# The flag is True only for the strings in this list
library = lambda text: text in ["Website", "Tutorialspoint.com"]
my_library = nlp_model.vocab.add_flag(library)
nlp_model.vocab["Tutorialspoint.com"].check_flag(my_library)

Output

When you run the code, you will see the following output −


True

Example 2

Given below is another example of Lexeme.check_flag method, which reuses the flag defined in Example 1 −


nlp_model.vocab["Hello"].check_flag(my_library)

Output

When you run the code, you will see the following output −


False
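The check_flag method also works with the built-in flag IDs, not only with custom ones. The sketch below assumes a blank English pipeline and uses the IS_ALPHA and IS_DIGIT constants from spacy.attrs; the example words are arbitrary.

```python
import spacy
from spacy.attrs import IS_ALPHA, IS_DIGIT

# Assumption: a blank English pipeline and two arbitrary example strings.
nlp = spacy.blank("en")
word = nlp.vocab["hello"]
number = nlp.vocab["2024"]

# Built-in flags can be queried by ID...
word_is_alpha = word.check_flag(IS_ALPHA)
number_is_digit = number.check_flag(IS_DIGIT)
word_is_digit = word.check_flag(IS_DIGIT)

# ...and agree with the corresponding boolean attributes.
consistent = word_is_alpha == word.is_alpha and number_is_digit == number.is_digit
print(word_is_alpha, number_is_digit, word_is_digit, consistent)
```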

Lexeme.similarity

This method is used to compute a semantic similarity estimate. The default is cosine over vectors.

Argument

The table below explains its argument −

NAME TYPE DESCRIPTION
other - It is the object with which the comparison will be done. By default, it will accept Doc, Span, Token, and Lexeme objects.

Example

An example of Lexeme.similarity method is as follows −


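import spacy
# A model with word vectors (e.g. en_core_web_md) is needed for meaningful scores
nlp_model = spacy.load("en_core_web_md")
apple = nlp_model.vocab["apple"]
orange = nlp_model.vocab["orange"]
apple_orange = apple.similarity(orange)
orange_apple = orange.similarity(apple)
# Similarity is symmetric
apple_orange == orange_apple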

Output

When you run the code, you will see the following output −


True
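The cosine measure mentioned above can be sketched independently of spaCy. The function below is an illustrative NumPy implementation, not spaCy's own code, and the two vectors are made-up values rather than real word vectors.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (illustrative helper)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Made-up 2-dimensional "word vectors" for the example.
vec_a = np.array([1.0, 0.0], dtype="float32")
vec_b = np.array([1.0, 1.0], dtype="float32")

score_ab = cosine_similarity(vec_a, vec_b)
score_ba = cosine_similarity(vec_b, vec_a)

print(round(score_ab, 4))    # 0.7071
print(score_ab == score_ba)  # True, the measure is symmetric
```

Because the dot product and the product of the norms do not depend on argument order, swapping the two operands gives the same score, which matches the symmetric result in the example above.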

Properties

Following are the properties of Lexeme Class.

Sr.No. Property & Description
1

Lexeme.vector

It will return a 1-dimensional array representing the lexeme’s semantics.

2

Lexeme.vector_norm

It represents the L2 norm of the lexeme’s vector representation.

Lexeme.vector

This Lexeme property represents a real-valued meaning. It will return a one-dimensional array representing the lexeme’s semantics.

Example

An example of Lexeme.vector property is given below −


import spacy
nlp_model = spacy.load("en_core_web_sm")
apple = nlp_model.vocab["apple"]
apple.vector.dtype

Output

You will see the following output −


dtype('float32')

Lexeme.vector_norm

This lexeme property represents the L2 norm of the lexeme’s vector representation.

Example

An example of Lexeme.vector_norm property is as follows −


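import spacy
# A model with word vectors (e.g. en_core_web_md) is needed; without vectors both norms are 0
nlp_model = spacy.load("en_core_web_md")
apple = nlp_model.vocab["apple"]
pasta = nlp_model.vocab["pasta"]
apple.vector_norm != pasta.vector_norm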

Output

You will see the following output −


True
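The L2 norm itself is the square root of the sum of the squared vector components. The snippet below is a small NumPy sketch with a made-up vector, shown only to make the definition concrete; it does not use spaCy.

```python
import numpy as np

# Made-up vector for illustration; real word vectors have many more dimensions.
vector = np.array([3.0, 4.0], dtype="float32")

# L2 norm computed from its definition...
manual_norm = float(np.sqrt(np.sum(vector ** 2)))

# ...and with NumPy's built-in helper.
numpy_norm = float(np.linalg.norm(vector))

print(manual_norm)  # 5.0
print(numpy_norm)   # 5.0
```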