Natural Language Toolkit - Transforming Trees
Following are the two reasons to transform trees:
To modify deep parse trees
To flatten deep parse trees
Converting Tree or Subtree to Sentence
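The first recipe we are going to discuss here converts a Tree or subtree back into a sentence or chunk string. This is very simple; the following example joins the words of the tree's leaves with spaces: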
Example
from nltk.corpus import treebank_chunk
tree = treebank_chunk.chunked_sents()[2]
# Each leaf is a (word, tag) pair; keep only the words.
' '.join([w for w, t in tree.leaves()])
Output
Rudolph Agnew , 55 years old and former chairman of Consolidated Gold Fields PLC , was named a nonexecutive director of this British industrial conglomerate .
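The same idea can be tried without downloading the treebank corpus. The following is a minimal sketch that assumes a small hand-built chunk tree; the helper name tree_to_sentence() is hypothetical and is not part of NLTK.

```python
from nltk.tree import Tree

# Hand-built chunk tree (an assumption for illustration): leaves are
# (word, tag) tuples, the same shape that chunked corpora return.
chunk_tree = Tree('S', [
    Tree('NP', [('Rudolph', 'NNP'), ('Agnew', 'NNP')]),
    ('was', 'VBD'),
    ('named', 'VBN'),
    Tree('NP', [('a', 'DT'), ('director', 'NN')]),
    ('.', '.'),
])

def tree_to_sentence(tree):
    """Hypothetical helper: join the words of a chunk tree, dropping the tags."""
    return ' '.join(word for word, tag in tree.leaves())

sentence = tree_to_sentence(chunk_tree)      # whole tree
np_phrase = tree_to_sentence(chunk_tree[0])  # first NP subtree only
print(sentence)
print(np_phrase)
```

Because leaves() works on any Tree, the same helper applies to a subtree such as a single NP chunk.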
Deep tree flattening
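Deep trees of nested phrases cannot be used to train a chunker, so we must flatten them before use. In the following example, we are going to use the 3rd parsed sentence from the treebank corpus, which is a deep tree of nested phrases.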
Example
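To achieve this, we define a function named deeptree_flat() that takes a single Tree and returns a new Tree that keeps only the lowest-level trees. To do most of the work, it uses a helper function that we named childtree_flat().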
from nltk.tree import Tree

def childtree_flat(trees):
    children = []
    for t in trees:
        if t.height() < 3:
            # A part-of-speech node: keep its (word, tag) pairs directly.
            children.extend(t.pos())
        elif t.height() == 3:
            # A lowest-level phrase: keep it as a flat subtree.
            children.append(Tree(t.label(), t.pos()))
        else:
            # A deeper subtree: recurse into its children.
            children.extend(childtree_flat([c for c in t]))
    return children

def deeptree_flat(tree):
    return Tree(tree.label(), childtree_flat([c for c in tree]))
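Now, let us call the deeptree_flat() function on the 3rd parsed sentence from the treebank corpus, which is a deep tree of nested phrases. We saved these functions in a file named deeptree.py.
from deeptree import deeptree_flat
from nltk.corpus import treebank
deeptree_flat(treebank.parsed_sents()[2])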
Output
Tree('S', [Tree('NP', [('Rudolph', 'NNP'), ('Agnew', 'NNP')]), (',', ','), Tree('NP', [('55', 'CD'), ('years', 'NNS')]), ('old', 'JJ'), ('and', 'CC'), Tree('NP', [('former', 'JJ'), ('chairman', 'NN')]), ('of', 'IN'), Tree('NP', [('Consolidated', 'NNP'), ('Gold', 'NNP'), ('Fields', 'NNP'), ('PLC', 'NNP')]), (',', ','), ('was', 'VBD'), ('named', 'VBN'), Tree('NP-SBJ', [('*-1', '-NONE-')]), Tree('NP', [('a', 'DT'), ('nonexecutive', 'JJ'), ('director', 'NN')]), ('of', 'IN'), Tree('NP', [('this', 'DT'), ('British', 'JJ'), ('industrial', 'JJ'), ('conglomerate', 'NN')]), ('.', '.')])
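To experiment with flattening without the corpus data, the two functions can be run on a small toy parse tree. This is a sketch under stated assumptions: the sentence is invented for illustration and is built with Tree.fromstring(), and the function bodies follow the description above.

```python
from nltk.tree import Tree

def childtree_flat(trees):
    children = []
    for t in trees:
        if t.height() < 3:            # part-of-speech node, e.g. (IN of)
            children.extend(t.pos())
        elif t.height() == 3:         # lowest-level phrase, e.g. (NP (DT the) (NN chairman))
            children.append(Tree(t.label(), t.pos()))
        else:                         # deeper subtree: flatten its children
            children.extend(childtree_flat(list(t)))
    return children

def deeptree_flat(tree):
    return Tree(tree.label(), childtree_flat(list(tree)))

# Toy parse tree (an invented sentence, not taken from the treebank corpus).
deep = Tree.fromstring(
    "(S (NP (NP (DT the) (NN chairman)) (PP (IN of) (NP (NNP Gold) (NNP Fields))))"
    " (VP (VBD resigned)) (. .))"
)
flat = deeptree_flat(deep)

print(deep.height(), flat.height())
print(flat)
```

The outer NP and the PP disappear, while the lowest-level NP and VP phrases survive as flat chunks. The words themselves are unchanged; only the nesting is removed.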
Building Shallow tree
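In the previous section, we flattened a deep tree of nested phrases by keeping only the lowest-level subtrees. In this section, we are going to keep only the highest-level subtrees, that is, build a shallow tree. In the following example, we are going to use the 3rd parsed sentence from the treebank corpus, which is a deep tree of nested phrases.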
Example
To achieve this, we define a function named tree_shallow() that eliminates all the nested subtrees by keeping only the top-level subtree labels.
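from nltk.tree import Tree

def tree_shallow(tree):
    children = []
    for t in tree:
        if t.height() < 3:
            # A part-of-speech node: keep its (word, tag) pairs directly.
            children.extend(t.pos())
        else:
            # A phrase: keep only its label and its flattened (word, tag) pairs.
            children.append(Tree(t.label(), t.pos()))
    return Tree(tree.label(), children)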
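Now, let us call the tree_shallow() function on the 3rd parsed sentence from the treebank corpus, which is a deep tree of nested phrases. We saved these functions in a file named shallowtree.py.
from shallowtree import tree_shallow
from nltk.corpus import treebank
tree_shallow(treebank.parsed_sents()[2])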
Output
Tree('S', [Tree('NP-SBJ-1', [('Rudolph', 'NNP'), ('Agnew', 'NNP'), (',', ','), ('55', 'CD'), ('years', 'NNS'), ('old', 'JJ'), ('and', 'CC'), ('former', 'JJ'), ('chairman', 'NN'), ('of', 'IN'), ('Consolidated', 'NNP'), ('Gold', 'NNP'), ('Fields', 'NNP'), ('PLC', 'NNP'), (',', ',')]), Tree('VP', [('was', 'VBD'), ('named', 'VBN'), ('*-1', '-NONE-'), ('a', 'DT'), ('nonexecutive', 'JJ'), ('director', 'NN'), ('of', 'IN'), ('this', 'DT'), ('British', 'JJ'), ('industrial', 'JJ'), ('conglomerate', 'NN')]), ('.', '.')])
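We can see the difference by comparing the heights of the shallow tree and the original tree:
from shallowtree import tree_shallow
from nltk.corpus import treebank
tree_shallow(treebank.parsed_sents()[2]).height()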
Output
3
from nltk.corpus import treebank
treebank.parsed_sents()[2].height()
Output
9
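The height comparison can also be reproduced on a toy tree, which makes it easier to see where the numbers come from. The sketch below assumes an invented sentence built with Tree.fromstring(); in NLTK a bare leaf counts as height 1, so a part-of-speech node over a word has height 2.

```python
from nltk.tree import Tree

def tree_shallow(tree):
    children = []
    for t in tree:
        if t.height() < 3:
            children.extend(t.pos())
        else:
            children.append(Tree(t.label(), t.pos()))
    return Tree(tree.label(), children)

# Toy parse tree (invented for illustration, not from the treebank corpus).
deep = Tree.fromstring(
    "(S (NP-SBJ (NP (DT the) (NN chairman)) (PP (IN of) (NP (NNP Gold) (NNP Fields))))"
    " (VP (VBD resigned)) (. .))"
)
shallow = tree_shallow(deep)

# Height of each top-level child of the deep tree: NP-SBJ, VP and the period.
child_heights = [child.height() for child in deep]
# Labels of the subtrees kept directly under S in the shallow tree.
top_labels = [child.label() for child in shallow if isinstance(child, Tree)]

print(deep.height(), child_heights)
print(shallow.height(), top_labels)
```

Every top-level phrase is collapsed into a single level of (word, tag) pairs, so the shallow tree always has height 3 regardless of how deep the original was.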
Tree labels conversion
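Parse trees contain a variety of Tree label types that are not present in chunk trees. When using parse trees to train a chunker, we would like to reduce this variety by converting some of the Tree labels to more common label types. For example, we have two alternative NP subtrees, namely NP-SBJ and NP-TMP. We can convert both of them into NP. Let us see how to do it in the following example.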
Example
To achieve this, we define a function named tree_convert() that takes the following two arguments:
Tree to convert
A label conversion mapping
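This function returns a new Tree in which every label found in the mapping is replaced by its mapped value; labels not in the mapping are left unchanged.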
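from nltk.tree import Tree

def tree_convert(tree, mapping):
    children = []
    for t in tree:
        if isinstance(t, Tree):
            # Convert the labels of nested subtrees recursively.
            children.append(tree_convert(t, mapping))
        else:
            # Leaves (words) are kept as they are.
            children.append(t)
    # Fall back to the original label when it is not in the mapping.
    label = mapping.get(tree.label(), tree.label())
    return Tree(label, children)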
Now, let us call the tree_convert() function on the 3rd parsed sentence from the treebank corpus, which is a deep tree of nested phrases. We saved these functions in a file named converttree.py.
from converttree import tree_convert
from nltk.corpus import treebank
mapping = {'NP-SBJ': 'NP', 'NP-TMP': 'NP'}
tree_convert(treebank.parsed_sents()[2], mapping)
Output
Tree('S', [Tree('NP-SBJ-1', [Tree('NP', [Tree('NNP', ['Rudolph']), Tree('NNP', ['Agnew'])]), Tree(',', [',']), Tree('UCP', [Tree('ADJP', [Tree('NP', [Tree('CD', ['55']), Tree('NNS', ['years'])]), Tree('JJ', ['old'])]), Tree('CC', ['and']), Tree('NP', [Tree('NP', [Tree('JJ', ['former']), Tree('NN', ['chairman'])]), Tree('PP', [Tree('IN', ['of']), Tree('NP', [Tree('NNP', ['Consolidated']), Tree('NNP', ['Gold']), Tree('NNP', ['Fields']), Tree('NNP', ['PLC'])])])])]), Tree(',', [','])]), Tree('VP', [Tree('VBD', ['was']), Tree('VP', [Tree('VBN', ['named']), Tree('S', [Tree('NP', [Tree('-NONE-', ['*-1'])]), Tree('NP-PRD', [Tree('NP', [Tree('DT', ['a']), Tree('JJ', ['nonexecutive']), Tree('NN', ['director'])]), Tree('PP', [Tree('IN', ['of']), Tree('NP', [Tree('DT', ['this']), Tree('JJ', ['British']), Tree('JJ', ['industrial']), Tree('NN', ['conglomerate'])])])])])])]), Tree('.', ['.'])])
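Note that the top-level NP-SBJ-1 label in the output above is unchanged, while the inner NP-SBJ became NP. The mapping is looked up by exact label, so an indexed label such as NP-SBJ-1 is converted only if it appears as its own key. The following sketch illustrates this on an invented toy tree; the extended mapping is an assumption for illustration.

```python
from nltk.tree import Tree

def tree_convert(tree, mapping):
    children = []
    for t in tree:
        if isinstance(t, Tree):
            children.append(tree_convert(t, mapping))
        else:
            children.append(t)
    return Tree(mapping.get(tree.label(), tree.label()), children)

# Toy parse tree (invented for illustration, not from the treebank corpus).
toy = Tree.fromstring(
    "(S (NP-SBJ-1 (NNP Agnew)) (VP (VBD left) (NP-TMP (NN today))))"
)

# Exact-match mapping: 'NP-SBJ-1' is not a key, so it is left as it is.
mapping = {'NP-SBJ': 'NP', 'NP-TMP': 'NP'}
labels = [t.label() for t in tree_convert(toy, mapping).subtrees()]

# Adding the indexed label as its own key converts it too.
extended = dict(mapping, **{'NP-SBJ-1': 'NP'})
labels_extended = [t.label() for t in tree_convert(toy, extended).subtrees()]

print(labels)
print(labels_extended)
```

Since tree_convert() builds a new Tree at every level, the original tree is not modified.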