Scrapy - Overview
  • Date: 2024-09-17




Scrapy is a fast, open-source web crawling framework written in Python, used to extract data from web pages with the help of selectors based on XPath.

Scrapy was first released on June 26, 2008, licensed under BSD, and reached its 1.0 milestone in June 2015.

Why Use Scrapy?

    It makes large crawling projects easier to build and scale.

    It has a built-in mechanism called Selectors for extracting data from websites.

    It handles requests asynchronously, which makes it fast.

    It can automatically adjust crawling speed using its auto-throttling mechanism (the AutoThrottle extension, which must be enabled in the project settings).

    It is accessible to developers.
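
Scrapy's auto-throttling lives in its AutoThrottle extension, which is off by default and is switched on with `AUTOTHROTTLE_ENABLED = True` in the project settings. Scrapy's documentation describes the adjustment rule roughly as follows: the target delay is the response latency divided by `AUTOTHROTTLE_TARGET_CONCURRENCY`, the next delay is the average of the previous delay and that target, non-200 responses may not lower the delay, and the result stays between `DOWNLOAD_DELAY` and `AUTOTHROTTLE_MAX_DELAY`. The sketch below restates that rule in plain Python so the behaviour is easy to follow. The function name `next_delay` and the sample latencies are hypothetical, and this is not Scrapy's own code.

```python
def next_delay(
    prev_delay: float,
    latency: float,
    status: int,
    target_concurrency: float = 1.0,  # stands in for AUTOTHROTTLE_TARGET_CONCURRENCY
    min_delay: float = 0.0,           # stands in for DOWNLOAD_DELAY
    max_delay: float = 60.0,          # stands in for AUTOTHROTTLE_MAX_DELAY
) -> float:
    """Hypothetical helper: return the delay to use after a response arrives.

    Simplified restatement of the documented AutoThrottle rule,
    not Scrapy's implementation.
    """
    # Delay that would keep roughly `target_concurrency` requests in flight.
    target_delay = latency / target_concurrency
    # Move halfway from the previous delay toward the target.
    new_delay = (prev_delay + target_delay) / 2.0
    # Non-200 responses are not allowed to shrink the delay.
    if status != 200 and new_delay < prev_delay:
        new_delay = prev_delay
    # Keep the result within the configured bounds.
    return max(min_delay, min(new_delay, max_delay))


# Start from 5.0 s (AutoThrottle's default start delay) and feed in made-up responses.
delay = 5.0
delays = []
for latency, status in [(0.2, 200), (0.2, 200), (3.0, 503), (0.1, 503)]:
    delay = next_delay(delay, latency, status)
    delays.append(round(delay, 3))

print(delays)  # [2.6, 1.4, 2.2, 2.2]
```

Fast 200 responses pull the delay down step by step, a slow error response pushes it up, and a fast error response leaves it unchanged.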

Features of Scrapy

    Scrapy is an open-source, free-to-use web crawling framework.

    Scrapy generates feed exports in formats such as JSON, CSV, and XML.

    Scrapy has built-in support for selecting and extracting data from sources using either XPath or CSS expressions.

    Because Scrapy is built around crawlers (spiders), it can extract data from web pages automatically.

Advantages

    Scrapy is easily extensible, fast, and powerful.

    It is a cross-platform application framework (Windows, Linux, Mac OS, and BSD).

    Scrapy requests are scheduled and processed asynchronously.

    Scrapy has a companion service called Scrapyd, which lets you upload projects and control spiders through a JSON web service.

    It can scrape websites even when they do not provide an API for raw data access.
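
Scrapy processes requests asynchronously on top of the Twisted networking engine, so waiting for one slow response does not hold up the others. As a conceptual analogy only (not Scrapy's implementation), the sketch below uses Python's standard `asyncio` module, with `asyncio.sleep` standing in for network latency; the URLs, latencies, and the `fetch`/`crawl` helpers are made up for the example.

```python
import asyncio
import time

# Hypothetical URLs paired with simulated network latencies in seconds.
FAKE_LATENCY = {
    "https://example.com/a": 0.3,
    "https://example.com/b": 0.1,
    "https://example.com/c": 0.2,
}


async def fetch(url: str) -> str:
    # asyncio.sleep simulates waiting on the network without blocking the loop.
    await asyncio.sleep(FAKE_LATENCY[url])
    return url


async def crawl(urls):
    # Start all "requests" at once, then handle each as soon as it finishes.
    tasks = [asyncio.create_task(fetch(u)) for u in urls]
    finished = []
    for next_done in asyncio.as_completed(tasks):
        finished.append(await next_done)
    return finished


start = time.perf_counter()
order = asyncio.run(crawl(list(FAKE_LATENCY)))
elapsed = time.perf_counter() - start

print(order)            # completion order: fastest response first
print(f"{elapsed:.2f}s")  # close to the longest latency (0.3 s), not the sum (0.6 s)
```

Responses are handled in the order they complete rather than the order they were requested, and the total time tracks the slowest request instead of the sum of all of them.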

Disadvantages

    Scrapy works only with Python. Older 1.x releases supported Python 2.7, but Scrapy 2.0 and later require Python 3.

    The installation process differs across operating systems.
