Scrapy - Using an Item
Description
Item objects behave like regular Python dicts. Their fields are read and written with the usual key syntax −
>>> item = DmozItem()
>>> item['title'] = 'sample title'
>>> item['title']
'sample title'
The following spider uses this syntax in its parse() callback to fill a DmozItem for each list entry on the page −
import scrapy
from tutorial.items import DmozItem

class MyprojectSpider(scrapy.Spider):
    name = "project"
    allowed_domains = ["dmoz.org"]
    start_urls = [
        "http://www.dmoz.org/Computers/Programming/Languages/Python/Books/",
        "http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/"
    ]

    def parse(self, response):
        # Each <li> inside a <ul> holds one directory entry.
        for sel in response.xpath('//ul/li'):
            item = DmozItem()
            # extract() returns a list of all matching strings.
            item['title'] = sel.xpath('a/text()').extract()
            item['link'] = sel.xpath('a/@href').extract()
            item['desc'] = sel.xpath('text()').extract()
            yield item
The output of the above spider will be −
[scrapy] DEBUG: Scraped from <200 http://www.dmoz.org/Computers/Programming/Languages/Python/Books/>
    {'desc': [u' - By David Mertz; Addison Wesley. Book in progress, full text, ASCII format. Asks for feedback. [author website, Gnosis Software, Inc. '],
     'link': [u'http://gnosis.cx/TPiP/'],
     'title': [u'Text Processing in Python']}
[scrapy] DEBUG: Scraped from <200 http://www.dmoz.org/Computers/Programming/Languages/Python/Books/>
    {'desc': [u' - By Sean McGrath; Prentice Hall PTR, 2000, ISBN 0130211192, has CD-ROM. Methods to build XML applications fast, Python tutorial, DOM and SAX, new Pyxie open source XML processing library. [Prentice Hall PTR] '],
     'link': [u'http://www.informit.com/store/product.aspx?isbn=0130211192'],
     'title': [u'XML Processing with Python']}
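Every field in the output above is a list, because extract() returns all matching strings rather than a single value. If one string per field is preferred, the lists can be flattened afterwards. The helper below is a hypothetical sketch in plain Python, not part of Scrapy. The title and link values are copied from the first scraped item, and the empty desc list is made up to show the fallback.

```python
def first_clean(values, default=None):
    """Return the first non-empty, whitespace-stripped string in values."""
    for value in values:
        text = value.strip()
        if text:
            return text
    return default


# title and link come from the first scraped item; the empty desc
# list is illustrative only.
scraped = {
    "title": ["Text Processing in Python"],
    "link": ["http://gnosis.cx/TPiP/"],
    "desc": [],
}

# Collapse each list of matches to a single string (or None).
flat = {key: first_clean(values) for key, values in scraped.items()}
print(flat)
```

Scrapy selectors also provide extract_first(), which returns only the first match, so the same effect can be had inside the spider itself.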