  • Date: 2024-09-17

Natural Language Toolkit - Transforming Trees



There are two reasons to transform trees:

    To modify a deep parse tree, and

    To flatten deep parse trees

Converting Tree or Subtree to Sentence

The first recipe we discuss here converts a Tree or subtree back into a sentence or chunk string. It is very simple, as the following example shows:

Example


from nltk.corpus import treebank_chunk
# Third chunked sentence; each leaf is a (word, tag) tuple
tree = treebank_chunk.chunked_sents()[2]
' '.join([w for w, t in tree.leaves()])

Output


Rudolph Agnew , 55 years old and former chairman of Consolidated Gold Fields
PLC , was named a nonexecutive director of this British industrial
conglomerate .
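The same idea can be tried without the corpus. The sketch below builds a small chunk tree by hand (the tree and the helper name to_sentence() are illustrative assumptions, not part of the original recipe) and joins the words of the whole tree and of one subtree.

```python
from nltk.tree import Tree

# Hand-built chunk tree (an illustrative assumption, not corpus data).
# In a chunk tree, each leaf is a (word, tag) tuple.
tree = Tree('S', [
    Tree('NP', [('Rudolph', 'NNP'), ('Agnew', 'NNP')]),
    ('was', 'VBD'),
    ('named', 'VBN'),
    Tree('NP', [('a', 'DT'), ('director', 'NN')]),
    ('.', '.'),
])

def to_sentence(t):
    # Hypothetical helper: keep the word of each (word, tag) leaf
    # and join the words with single spaces.
    return ' '.join(word for word, tag in t.leaves())

sentence = to_sentence(tree)        # whole tree
first_chunk = to_sentence(tree[0])  # first NP subtree only

print(sentence)     # Rudolph Agnew was named a director .
print(first_chunk)  # Rudolph Agnew
```

Because leaves() works on any Tree, the same helper applies to a full sentence or to a single chunk.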

Deep tree flattening

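Deep trees of nested phrases cannot be used to train a chunker, so we must flatten them before use. In the following example, we use the 3rd parsed sentence from the treebank corpus, which is a deep tree of nested phrases.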

Example

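To achieve this, we define a function named deeptree_flat() that takes a single Tree and returns a new Tree that keeps only the lowest-level trees. To do most of the work, it uses a helper function that we named childtree_flat().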


from nltk.tree import Tree
def childtree_flat(trees):
   children = []
   for t in trees:
      if t.height() < 3:
         # Part-of-speech node: keep its (word, tag) pair
         children.extend(t.pos())
      elif t.height() == 3:
         # Lowest-level phrase: keep it as a flat subtree
         children.append(Tree(t.label(), t.pos()))
      else:
         # Deeper subtree: recurse into its children
         children.extend(childtree_flat([c for c in t]))
   return children
def deeptree_flat(tree):
   return Tree(tree.label(), childtree_flat([c for c in tree]))
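The height checks in childtree_flat() are easier to follow on a small tree. The sketch below uses a hand-written toy parse (an assumption for illustration, not corpus data) to show which nodes have height 2, height 3, or more, and what pos() returns.

```python
from nltk.tree import Tree

# Hand-written toy parse (an illustrative assumption, not corpus data).
toy = Tree.fromstring(
    "(S (NP (DT the) (NN cat)) "
    "(VP (VBD sat) (PP (IN on) (NP (DT the) (NN mat)))))"
)

dt = toy[0][0]  # (DT the): a part-of-speech node directly above a word
np = toy[0]     # (NP (DT the) (NN cat)): a phrase made only of POS nodes
vp = toy[1]     # a phrase that still contains nested phrases

print(dt.height())  # 2
print(np.height())  # 3
print(vp.height())  # 5
print(np.pos())     # [('the', 'DT'), ('cat', 'NN')]
```

A node of height 2 is therefore a single tagged word, a node of height 3 is a lowest-level phrase, and anything taller still needs to be descended into.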

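Now, let us call deeptree_flat() on the 3rd parsed sentence from the treebank corpus, which is a deep tree of nested phrases. We saved these functions in a file named deeptree.py.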


from deeptree import deeptree_flat
from nltk.corpus import treebank
deeptree_flat(treebank.parsed_sents()[2])

Output


Tree('S', [Tree('NP', [('Rudolph', 'NNP'), ('Agnew', 'NNP')]),
(',', ','), Tree('NP', [('55', 'CD'),
('years', 'NNS')]), ('old', 'JJ'), ('and', 'CC'),
Tree('NP', [('former', 'JJ'),
('chairman', 'NN')]), ('of', 'IN'), Tree('NP', [('Consolidated', 'NNP'),
('Gold', 'NNP'), ('Fields', 'NNP'), ('PLC', 'NNP')]),
(',', ','), ('was', 'VBD'),
('named', 'VBN'), Tree('NP-SBJ', [('*-1', '-NONE-')]),
Tree('NP', [('a', 'DT'), ('nonexecutive', 'JJ'), ('director', 'NN')]),
('of', 'IN'), Tree('NP',
[('this', 'DT'), ('British', 'JJ'),
('industrial', 'JJ'), ('conglomerate', 'NN')]), ('.', '.')])

Building Shallow tree

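In the previous section, we flattened a deep tree of nested phrases by keeping only the lowest-level subtrees. In this section, we keep only the highest-level subtrees, that is, we build a shallow tree. In the following example, we use the 3rd parsed sentence from the treebank corpus, which is a deep tree of nested phrases.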

Example

To achieve this, we define a function named tree_shallow() that eliminates all nested subtrees by keeping only the top-level subtree labels.


from nltk.tree import Tree
def tree_shallow(tree):
   children = []
   for t in tree:
      if t.height() < 3:
         children.extend(t.pos())
      else:
         children.append(Tree(t.label(), t.pos()))
   return Tree(tree.label(), children)

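Now, let us call tree_shallow() on the 3rd parsed sentence from the treebank corpus, which is a deep tree of nested phrases. We saved this function in a file named shallowtree.py.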


from shallowtree import tree_shallow
from nltk.corpus import treebank
tree_shallow(treebank.parsed_sents()[2])

Output


Tree('S', [Tree('NP-SBJ-1', [('Rudolph', 'NNP'), ('Agnew', 'NNP'), (',', ','),
('55', 'CD'), ('years', 'NNS'), ('old', 'JJ'), ('and', 'CC'),
('former', 'JJ'), ('chairman', 'NN'), ('of', 'IN'), ('Consolidated', 'NNP'),
('Gold', 'NNP'), ('Fields', 'NNP'), ('PLC', 'NNP'), (',', ',')]),
Tree('VP', [('was', 'VBD'), ('named', 'VBN'), ('*-1', '-NONE-'), ('a', 'DT'),
('nonexecutive', 'JJ'), ('director', 'NN'), ('of', 'IN'), ('this', 'DT'),
('British', 'JJ'), ('industrial', 'JJ'), ('conglomerate', 'NN')]), ('.', '.')])

We can see the difference by comparing the heights of the two trees:


from nltk.corpus import treebank
tree_shallow(treebank.parsed_sents()[2]).height()

Output


3


from nltk.corpus import treebank
treebank.parsed_sents()[2].height()

Output


9
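The same height comparison can be reproduced on a small hand-written parse when the treebank data is not at hand. This is only a sketch: the toy tree is an assumption, and tree_shallow() is repeated here so that the block runs on its own.

```python
from nltk.tree import Tree

def tree_shallow(tree):
    # Same logic as tree_shallow() above, repeated so this sketch is standalone.
    children = []
    for t in tree:
        if t.height() < 3:
            # Tagged word directly under the root: keep the (word, tag) pair
            children.extend(t.pos())
        else:
            # Top-level phrase: keep its label, flatten everything below it
            children.append(Tree(t.label(), t.pos()))
    return Tree(tree.label(), children)

# Hand-written toy parse (an illustrative assumption, not corpus data).
toy = Tree.fromstring(
    "(S (NP (DT the) (NN cat)) "
    "(VP (VBD sat) (PP (IN on) (NP (DT the) (NN mat)))) (. .))"
)
shallow = tree_shallow(toy)

print(toy.height())      # 6
print(shallow.height())  # 3
print(shallow[1])        # the VP chunk, now holding only (word, tag) leaves
```

Whatever the depth of the input, the result has height 3: the root, one layer of top-level phrases, and the tagged words.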

Tree labels conversion

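Parse trees contain a variety of Tree label types that are not present in chunk trees. When using parse trees to train a chunker, we would like to reduce this variety by converting some of the Tree labels to more common label types. For example, there are two alternative NP subtree labels, namely NP-SBJ and NP-TMP. We can convert both of them to NP. Let us see how to do this in the following example.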

Example

To achieve this, we define a function named tree_convert() that takes the following two arguments:

    Tree to convert

    A label conversion mapping

This function returns a new Tree in which every label found in the mapping is replaced by the corresponding mapped value.


from nltk.tree import Tree
def tree_convert(tree, mapping):
   children = []
   for t in tree:
      if isinstance(t, Tree):
         children.append(tree_convert(t, mapping))
      else:
         children.append(t)
   label = mapping.get(tree.label(), tree.label())
   return Tree(label, children)

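Now, let us call tree_convert() on the 3rd parsed sentence from the treebank corpus, which is a deep tree of nested phrases. We saved this function in a file named converttree.py.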


from converttree import tree_convert
from nltk.corpus import treebank
mapping = {'NP-SBJ': 'NP', 'NP-TMP': 'NP'}
tree_convert(treebank.parsed_sents()[2], mapping)

Output


Tree('S', [Tree('NP-SBJ-1', [Tree('NP', [Tree('NNP', ['Rudolph']),
Tree('NNP', ['Agnew'])]), Tree(',', [',']),
Tree('UCP', [Tree('ADJP', [Tree('NP', [Tree('CD', ['55']),
Tree('NNS', ['years'])]),
Tree('JJ', ['old'])]), Tree('CC', ['and']),
Tree('NP', [Tree('NP', [Tree('JJ', ['former']),
Tree('NN', ['chairman'])]), Tree('PP', [Tree('IN', ['of']),
Tree('NP', [Tree('NNP', ['Consolidated']),
Tree('NNP', ['Gold']), Tree('NNP', ['Fields']),
Tree('NNP', ['PLC'])])])])]), Tree(',', [','])]),
Tree('VP', [Tree('VBD', ['was']), Tree('VP', [Tree('VBN', ['named']),
Tree('S', [Tree('NP', [Tree('-NONE-', ['*-1'])]),
Tree('NP-PRD', [Tree('NP', [Tree('DT', ['a']),
Tree('JJ', ['nonexecutive']), Tree('NN', ['director'])]),
Tree('PP', [Tree('IN', ['of']), Tree('NP',
[Tree('DT', ['this']), Tree('JJ', ['British']), Tree('JJ', ['industrial']),
Tree('NN', ['conglomerate'])])])])])])]), Tree('.', ['.'])])
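
In the output above, the top-level NP-SBJ-1 label is unchanged because the mapping is looked up by exact label. The sketch below shows this on a hand-written toy tree (an assumption for illustration); tree_convert() is repeated so that the block runs on its own, and the extended mapping is just one possible way to handle indexed labels.

```python
from nltk.tree import Tree

def tree_convert(tree, mapping):
    # Same logic as tree_convert() above, repeated so this sketch is standalone.
    children = []
    for t in tree:
        if isinstance(t, Tree):
            children.append(tree_convert(t, mapping))
        else:
            children.append(t)
    # Exact lookup: labels missing from the mapping are kept as they are
    label = mapping.get(tree.label(), tree.label())
    return Tree(label, children)

# Hand-written toy parse (an illustrative assumption, not corpus data).
toy = Tree.fromstring(
    "(S (NP-SBJ-1 (NNP Agnew)) (VP (VBD left) (NP-TMP (NN today))))"
)

mapping = {'NP-SBJ': 'NP', 'NP-TMP': 'NP'}
converted = tree_convert(toy, mapping)
print(converted[0].label())     # NP-SBJ-1 (no exact match in the mapping)
print(converted[1][1].label())  # NP (was NP-TMP)

# Adding the indexed label to the mapping converts it as well.
extended = {**mapping, 'NP-SBJ-1': 'NP'}
converted_all = tree_convert(toy, extended)
print(converted_all[0].label())  # NP
```

The input tree is not modified; each call builds and returns a new Tree.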