English 中文(简体)
Scrapy - Using an Item
  • 时间:2024-09-17

Scrapy - Using an Item


Previous Page Next Page  

Description

Item objects are the regular dicts of Python. We can use the following syntax to access the attributes of the class −

>>> item = DmozItem()
>>> item[ title ] =  sample title 
>>> item[ title ]
 sample title 

Add the above code to the following example −

import scrapy

from tutorial.items import DmozItem

class MyprojectSpider(scrapy.Spider):
   name = "project"
   allowed_domains = ["dmoz.org"]
   
   start_urls = [
      "http://www.dmoz.org/Computers/Programming/Languages/Python/Books/",
      "http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/"
   ]
   def parse(self, response):
      for sel in response.xpath( //ul/p ):
         item = DmozItem()
         item[ title ] = sel.xpath( a/text() ).extract()
         item[ pnk ] = sel.xpath( a/@href ).extract()
         item[ desc ] = sel.xpath( text() ).extract()
         yield item

The output of the above spider will be −

[scrapy] DEBUG: Scraped from <200 
http://www.dmoz.org/Computers/Programming/Languages/Python/Books/>
   { desc : [u  - By David Mertz; Addison Wesley. Book in progress, full text, 
      ASCII format. Asks for feedback. [author website, Gnosis Software, Inc.
],
    pnk : [u http://gnosis.cx/TPiP/ ],
    title : [u Text Processing in Python ]}
[scrapy] DEBUG: Scraped from <200 
http://www.dmoz.org/Computers/Programming/Languages/Python/Books/>
   { desc : [u  - By Sean McGrath; Prentice Hall PTR, 2000, ISBN 0130211192, 
      has CD-ROM. Methods to build XML apppcations fast, Python tutorial, DOM and 
      SAX, new Pyxie open source XML processing pbrary. [Prentice Hall PTR]
 ],
    pnk : [u http://www.informit.com/store/product.aspx?isbn=0130211192 ],
    title : [u XML Processing with Python ]}
Advertisements