Scrapy - Command Line Tools
Description
The Scrapy command-line tool is used for controlling Scrapy, and is often referred to as the Scrapy tool. It includes the commands for various purposes, each with its own group of arguments and options.
Configuration Settings
Scrapy will find configuration settings in the scrapy.cfg file. Following are a few locations −
c:\scrapy\scrapy.cfg (Windows) or /etc/scrapy.cfg (Unix) in the system, for system-wide settings
~/.config/scrapy.cfg ($XDG_CONFIG_HOME) and ~/.scrapy.cfg ($HOME) for global settings
You can find the scrapy.cfg inside the root of the project.
Scrapy can also be configured using the following environment variables −
SCRAPY_SETTINGS_MODULE
SCRAPY_PROJECT
SCRAPY_PYTHON_SHELL
Default Structure of a Scrapy Project
The following structure shows the default file structure of the Scrapy project.
scrapy.cfg            - deploy configuration file
project_name/         - name of the project
    __init__.py
    items.py          - project's items file
    pipelines.py      - project's pipelines file
    settings.py       - project's settings file
    spiders/          - the spiders directory
        __init__.py
        spider_name.py
        . . .
The scrapy.cfg file lives in the project root directory and records the project name together with the project settings module. For instance −
[settings]
default = [name of the project].settings

[deploy]
#url = http://localhost:6800/
project = [name of the project]
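A scrapy.cfg file may also contain more than one entry under [settings]; the SCRAPY_PROJECT environment variable mentioned above selects which one is used. A hypothetical example with a separate development settings module −
[settings]
default = project_name.settings
dev = project_name.settings_dev
Running SCRAPY_PROJECT=dev scrapy crawl spider_name would then pick up project_name.settings_dev.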
Using the Scrapy Tool
Running the Scrapy tool with no arguments prints some usage help and the available commands, as follows −
Scrapy X.Y - no active project

Usage:
  scrapy <command> [options] [arguments]

Available commands:
  crawl   It puts a spider (handling the URL) to work, crawling data
  fetch   It fetches the response from the given URL
Creating a Project
You can use the following command to create a new Scrapy project −
scrapy startproject project_name
This will create a project directory called project_name. Next, go to the newly created project, using the following command −
cd project_name
Controlling Projects
You can control and manage a project using the Scrapy tool. For example, you can create a new spider using the following command −
scrapy genspider mydomain mydomain.com
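The generated spider file lands in the project's spiders directory and typically looks roughly like the following skeleton (the exact template varies slightly between Scrapy versions; the class and attribute names follow from the genspider arguments above) −
import scrapy

class MydomainSpider(scrapy.Spider):
    # Identifies the spider; used with the crawl command
    name = "mydomain"
    allowed_domains = ["mydomain.com"]
    start_urls = ["https://www.mydomain.com/"]

    def parse(self, response):
        # Extraction logic for downloaded responses goes here
        pass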
Commands such as crawl must be used inside a Scrapy project. The listing in the coming section shows which commands must run inside a project.
Scrapy contains some built-in commands, which can be used for your project. To see the list of available commands, use the following command −
scrapy -h
When you run this command, Scrapy displays the list of available commands. The global commands, which also work outside a project, are as listed −
fetch − It fetches the URL using the Scrapy downloader.
runspider − It is used to run a self-contained spider without creating a project.
settings − It displays the value of a project setting.
shell − It is an interactive scraping console for the given URL.
startproject − It creates a new Scrapy project.
version − It displays the Scrapy version.
view − It fetches the URL using the Scrapy downloader and shows the contents in a browser.
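For example, the global commands can be tried outside any project (example.com is just a placeholder URL) −
scrapy fetch --nolog http://www.example.com/
scrapy shell http://www.example.com/
scrapy view http://www.example.com/
scrapy version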
Some commands are project-specific and work only inside a Scrapy project, as listed −
crawl − It is used to crawl data using the spider.
check − It checks the items returned by the crawl command.
list − It displays the list of available spiders present in the project.
edit − You can edit the given spider using the editor.
parse − It parses the given URL with the spider.
bench − It is used to run a quick benchmark test (a benchmark tells how many pages Scrapy can crawl per minute).
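Inside a project, a typical session might chain these commands (the spider name mydomain follows the genspider example above) −
scrapy list
scrapy check mydomain
scrapy crawl mydomain
scrapy parse --spider=mydomain http://www.mydomain.com/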
Custom Project Commands
You can build custom project commands with the COMMANDS_MODULE setting in a Scrapy project. The setting defaults to an empty string. You can add a custom commands module as follows −
COMMANDS_MODULE = 'mycmd.commands'
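With this setting, Scrapy picks up command classes from modules inside the mycmd.commands package; the command name is taken from the module's file name. A minimal sketch of such a command (the file path and class body are illustrative assumptions; ScrapyCommand is the actual base class provided by scrapy.commands) −
# mycmd/commands/cmd_demo.py (hypothetical module; its file name
# becomes the command name, i.e. "scrapy cmd_demo")
from scrapy.commands import ScrapyCommand

class CmdDemo(ScrapyCommand):
    requires_project = True  # only runs inside a Scrapy project

    def syntax(self):
        return "[options]"

    def short_desc(self):
        return "Demo command that lists the project's spiders"

    def run(self, args, opts):
        # self.crawler_process is attached by the Scrapy command line
        for name in self.crawler_process.spider_loader.list():
            print(name)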
Scrapy commands can also be added from an external library, using the scrapy.commands section in the library's setup.py file, shown as follows −
from setuptools import setup, find_packages

setup(
    name = 'scrapy-module_demo',
    entry_points = {
        'scrapy.commands': [
            'cmd_demo = my_module.commands:CmdDemo',
        ],
    },
)
The above code adds the cmd_demo command via the setup.py file.
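Once such a library is installed (for example with pip install -e .), the cmd_demo command appears alongside the built-in commands and can be invoked like any other −
scrapy cmd_demo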