Getting Started with Python
In the first chapter, we have learnt what web scraping is all about. In this chapter, let us see how to implement web scraping using Python.
Why Python for Web Scraping?
Python is a popular tool for implementing web scraping. The Python programming language is also used for other useful projects related to cyber security, penetration testing and digital forensic applications. Using only core Python and its standard library, web scraping can be performed without any third-party tool.
The Python programming language is gaining huge popularity, and the reasons that make Python a good fit for web scraping projects are as below −
Syntax Simplicity
Python has a simple, readable syntax compared to many other programming languages. This makes testing easier and lets a developer focus more on the programming task itself.
Inbuilt Modules
Another reason for using Python for web scraping is the range of useful built-in and external libraries available for it. Many web scraping tasks can be implemented with Python as the base language.
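As a rough illustration of what the built-in modules alone can do, the sketch below pulls link targets out of an HTML string using only the standard library's html.parser module. The names LinkCollector, extract_links and SAMPLE_HTML are made up for this example, and the sample markup stands in for a page that would normally be downloaded first.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(html_text):
    """Return all link targets found in an HTML string."""
    parser = LinkCollector()
    parser.feed(html_text)
    return parser.links


# Hypothetical sample markup, used here instead of a live download
SAMPLE_HTML = '<p><a href="/intro">Intro</a> and <a href="/setup">Setup</a></p>'

if __name__ == "__main__":
    print(extract_links(SAMPLE_HTML))  # prints ['/intro', '/setup']
```

For real pages, external libraries are usually more convenient, but this shows that a basic extraction step needs nothing beyond a standard Python installation.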
Open Source Programming Language
Python has huge support from the community because it is an open source programming language.
Wide Range of Applications
Python can be used for various programming tasks ranging from small shell scripts to enterprise web applications.
Installation of Python
Python distributions are available for platforms like Windows, Mac and Unix/Linux. We need to download only the binary code applicable for our platform to install Python. If the binary code for our platform is not available, we need a C compiler so that the source code can be compiled manually.
We can install Python on various platforms as follows −
Installing Python on Unix and Linux
Follow the steps given below to install Python on Unix/Linux machines −
Step 1 − Go to the link
Step 2 − Download the zipped source code available for Unix/Linux from the above link.
Step 3 − Extract the files onto your computer.
Step 4 − Use the following commands to complete the installation −
./configure
make
make install
You can find the installed Python at the standard location /usr/local/bin and its libraries at /usr/local/lib/pythonXX, where XX is the version of Python.
Installing Python on Windows
Follow the steps given below to install Python on Windows machines −
Step 1 − Go to the link
Step 2 − Download the Windows installer python-XYZ.msi file, where XYZ is the version we need to install.
Step 3 − Save the installer file to your local machine.
Step 4 − Finally, run the downloaded MSI file to bring up the Python install wizard.
Installing Python on Macintosh
A convenient way to install Python 3 on Mac OS X is Homebrew, a package manager that is itself easy to install.
Homebrew can be installed by using the following command −
$ ruby -e "$(curl -fsSL
For updating the package manager, we can use the following command −
$ brew update
With the help of the following command, we can install Python 3 on our Mac machine −
$ brew install python3
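Whichever platform was used, one way to confirm that the installation worked is to ask the interpreter about itself. The following is a minimal sketch using the standard sys and sysconfig modules; the function name describe_installation is only illustrative.

```python
import sys
import sysconfig


def describe_installation():
    """Return basic facts about the interpreter running this code."""
    return {
        # Full path of the Python binary that is executing
        "executable": sys.executable,
        # Version as major.minor.micro
        "version": "{}.{}.{}".format(*sys.version_info[:3]),
        # Directory that holds the standard library modules
        "stdlib": sysconfig.get_paths()["stdlib"],
    }


if __name__ == "__main__":
    for key, value in describe_installation().items():
        print(key, "->", value)
```

The reported paths depend on the platform and on how Python was installed, so they may differ from the standard locations mentioned above.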
Setting Up the PATH
You can use the following instructions to set up the path on various environments −
Setting Up the Path on Unix/Linux
Use the following commands for setting up paths using various command shells −
For csh shell
setenv PATH "$PATH:/usr/local/bin/python"
For bash shell (Linux)
PATH="$PATH:/usr/local/bin/python"
For sh or ksh shell
PATH="$PATH:/usr/local/bin/python"
Setting Up the Path on Windows
To set the path on Windows, we can type path %path%;C:\Python at the command prompt and then press Enter.
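To check whether the path was set as intended, the directories on PATH can be listed and searched from Python itself. This is a small sketch built on the standard os and shutil modules; path_entries and find_interpreter are hypothetical helper names, and the command names python3 and python are assumptions about how the interpreter is called on a given system.

```python
import os
import shutil


def path_entries():
    """Split the PATH environment variable into its directories."""
    raw = os.environ.get("PATH", "")
    # os.pathsep is ':' on Unix/Linux and ';' on Windows
    return [entry for entry in raw.split(os.pathsep) if entry]


def find_interpreter(names=("python3", "python")):
    """Return the location of the first command found on PATH, or None."""
    for name in names:
        location = shutil.which(name)
        if location:
            return location
    return None


if __name__ == "__main__":
    for entry in path_entries():
        print(entry)
    print("Interpreter found at:", find_interpreter())
```

If find_interpreter returns None, the directory containing Python is probably missing from PATH.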
Running Python
We can start Python using any of the following three ways −
Interactive Interpreter
Python can be started from any operating system that provides a command-line interpreter or shell, such as UNIX or DOS.
We can start coding in the interactive interpreter as follows −
Step 1 − Enter python at the command line.
Step 2 − Then, we can start coding right away in the interactive interpreter.
$python # Unix/Linux
or
python% # Unix/Linux
or
C:> python # Windows/DOS
Script from the Command-line
We can execute a Python script at the command line by invoking the interpreter on the script file, as follows −
$python script.py # Unix/Linux
or
python% script.py # Unix/Linux
or
C:> python script.py # Windows/DOS
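For reference, a file such as script.py could look like the sketch below. The greet function and its behaviour are invented for illustration; the point is the __main__ guard, which runs the code only when the file is executed from the command line rather than imported.

```python
import sys


def greet(args):
    """Build a greeting from a list of command-line arguments."""
    name = args[0] if args else "world"
    return "Hello, {}!".format(name)


if __name__ == "__main__":
    # sys.argv[0] is the script name; the rest are user-supplied arguments
    print(greet(sys.argv[1:]))
```

Running python script.py with no arguments prints Hello, world!, while python script.py scraper prints Hello, scraper!.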
Integrated Development Environment
We can also run Python from a GUI environment if the system has a GUI application that supports Python. Some IDEs that support Python on various platforms are given below −
IDE for UNIX − UNIX has the IDLE IDE for Python.
IDE for Windows − Windows has the PythonWin IDE, which also provides a GUI.
IDE for Macintosh − Macintosh has the IDLE IDE, which is downloadable as either MacBinary or BinHex'd files from the main website.