Python - Relational Databases
We can connect to relational databases to analyze data by using the pandas library together with an additional library that implements database connectivity. That package is SQLAlchemy, which provides full SQL language functionality for use in Python.
Installing SQLAlchemy
Installation is straightforward using Anaconda, which we discussed in the Environment Setup chapter. Assuming you have installed Anaconda as described in that chapter, run the following command in the Anaconda Prompt window to install the SQLAlchemy package.
conda install sqlalchemy
Reading Relational Tables
We will use SQLite3 as our relational database because it is lightweight and easy to use, although the SQLAlchemy library can connect to a variety of relational sources, including MySQL, Oracle, PostgreSQL and MSSQL. We first create a database engine with SQLAlchemy's create_engine function and then pass that engine to pandas functions such as to_sql, which use it to talk to the database.
In the example below, we create the relational table by calling to_sql on a dataframe that was already created by reading a CSV file. Then we use the read_sql_query function from pandas to execute SQL queries and capture their results.
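As a rough illustration of how one engine API can target different databases, the sketch below parses a few connection URLs with SQLAlchemy's make_url helper, which needs no database driver to be installed. The user names, passwords, hosts and database names are hypothetical placeholders, not values from this tutorial.

```python
from sqlalchemy.engine.url import make_url

# Hypothetical connection URLs; credentials, hosts and database names
# are placeholders chosen only for illustration.
urls = [
    "sqlite:///:memory:",                                 # in-memory SQLite
    "sqlite:///example.db",                               # SQLite file in the working directory
    "postgresql://user:secret@localhost:5432/exampledb",  # PostgreSQL
    "mysql://user:secret@localhost:3306/exampledb",       # MySQL
]

# Parse each URL and show the parts that create_engine would work with.
parsed = [make_url(raw) for raw in urls]
for url in parsed:
    print(url.get_backend_name(), url.host, url.port, url.database)
```

Passing any of these strings to create_engine would follow the same pattern as the SQLite example in this chapter, provided the matching database driver is installed.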
from sqlalchemy import create_engine
import pandas as pd

data = pd.read_csv('/path/input.csv')

# Create the db engine
engine = create_engine('sqlite:///:memory:')

# Store the dataframe as a table
data.to_sql('data_table', engine)

# Query 1 on the relational table
res1 = pd.read_sql_query('SELECT * FROM data_table', engine)
print('Result 1')
print(res1)
print('')

# Query 2 on the relational table
res2 = pd.read_sql_query('SELECT dept,sum(salary) FROM data_table group by dept', engine)
print('Result 2')
print(res2)
When we execute the above code, it produces the following result.
Result 1
   index  id    name  salary  start_date        dept
0      0   1    Rick  623.30  2012-01-01          IT
1      1   2     Dan  515.20  2013-09-23  Operations
2      2   3   Tusar  611.00  2014-11-15          IT
3      3   4    Ryan  729.00  2014-05-11          HR
4      4   5    Gary  843.25  2015-03-27     Finance
5      5   6   Rasmi  578.00  2013-05-21          IT
6      6   7  Pranab  632.80  2013-07-30  Operations
7      7   8    Guru  722.50  2014-06-17     Finance

Result 2
         dept  sum(salary)
0     Finance      1565.75
1          HR       729.00
2          IT      1812.30
3  Operations      1148.00
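The extra index column in Result 1 comes from the dataframe's own index, which to_sql writes by default. The sketch below is one way to avoid it, assuming a small hand-built dataframe as a stand-in for input.csv (the three rows are copied from the output above; the alias total_salary is an illustrative name).

```python
import pandas as pd
from sqlalchemy import create_engine

# Stand-in for input.csv: three of the rows shown in the output above.
data = pd.DataFrame({
    "id": [1, 2, 3],
    "name": ["Rick", "Dan", "Tusar"],
    "salary": [623.30, 515.20, 611.00],
    "start_date": ["2012-01-01", "2013-09-23", "2014-11-15"],
    "dept": ["IT", "Operations", "IT"],
})

engine = create_engine("sqlite:///:memory:")

# index=False keeps the dataframe index out of the table,
# so no "index" column appears in later query results.
data.to_sql("data_table", engine, index=False)

all_rows = pd.read_sql_query("SELECT * FROM data_table", engine)

# Aggregate salaries per department; the alias gives the column a readable name.
totals = pd.read_sql_query(
    "SELECT dept, SUM(salary) AS total_salary "
    "FROM data_table GROUP BY dept ORDER BY dept",
    engine,
)
print(all_rows)
print(totals)
```

Leaving the index out also means later INSERT statements only need to supply the five real columns.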
Inserting Data into Relational Tables
We can also insert data into relational tables using the sql.execute function available in pandas. In the code below, we use the previous CSV file as the input data set, store it in a relational table, and then insert another record using sql.execute.
from sqlalchemy import create_engine
from pandas.io import sql
import pandas as pd

data = pd.read_csv('C:/Users/Rasmi/Documents/pydatasci/input.csv')
engine = create_engine('sqlite:///:memory:')

# Store the data in a relational table
data.to_sql('data_table', engine)

# Insert another row; the first value fills the "index" column that to_sql adds
sql.execute('INSERT INTO data_table VALUES(?,?,?,?,?,?)', engine, params=[('id', 9, 'Ruby', 711.20, '2015-03-27', 'IT')])

# Read from the relational table
res = pd.read_sql_query('SELECT ID,Dept,Name,Salary,start_date FROM data_table', engine)
print(res)
When we execute the above code, it produces the following result.
   id        dept    name  salary  start_date
0   1          IT    Rick  623.30  2012-01-01
1   2  Operations     Dan  515.20  2013-09-23
2   3          IT   Tusar  611.00  2014-11-15
3   4          HR    Ryan  729.00  2014-05-11
4   5     Finance    Gary  843.25  2015-03-27
5   6          IT   Rasmi  578.00  2013-05-21
6   7  Operations  Pranab  632.80  2013-07-30
7   8     Finance    Guru  722.50  2014-06-17
8   9          IT    Ruby  711.20  2015-03-27
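The sql.execute helper may not be available in newer pandas releases, so the same insert can also be issued through SQLAlchemy itself. The following is a minimal sketch, assuming SQLAlchemy 1.4 or later and a two-row stand-in dataframe in place of input.csv; it uses named bind parameters instead of positional question marks.

```python
import pandas as pd
from sqlalchemy import create_engine, text

# Stand-in for input.csv: two of the rows from the tutorial's data.
data = pd.DataFrame({
    "id": [1, 2],
    "name": ["Rick", "Dan"],
    "salary": [623.30, 515.20],
    "start_date": ["2012-01-01", "2013-09-23"],
    "dept": ["IT", "Operations"],
})

engine = create_engine("sqlite:///:memory:")
data.to_sql("data_table", engine, index=False)

# Named bind parameters keep the values separate from the SQL text.
insert_stmt = text(
    "INSERT INTO data_table (id, name, salary, start_date, dept) "
    "VALUES (:id, :name, :salary, :start_date, :dept)"
)
new_row = {
    "id": 9,
    "name": "Ruby",
    "salary": 711.20,
    "start_date": "2015-03-27",
    "dept": "IT",
}

# engine.begin() opens a transaction and commits it when the block exits.
with engine.begin() as conn:
    conn.execute(insert_stmt, new_row)

res = pd.read_sql_query(
    "SELECT id, dept, name, salary, start_date FROM data_table ORDER BY id",
    engine,
)
print(res)
```

Listing the column names in the INSERT statement makes the row independent of the table's column order.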
Deleting Data from Relational Tables
We can also delete data from relational tables using the sql.execute function available in pandas. The code below deletes the rows that match the given condition.
from sqlalchemy import create_engine
from pandas.io import sql
import pandas as pd

data = pd.read_csv('C:/Users/Rasmi/Documents/pydatasci/input.csv')
engine = create_engine('sqlite:///:memory:')

# Store the data in a relational table
data.to_sql('data_table', engine)

# Delete the rows whose name matches the bound parameter
sql.execute('Delete from data_table where name = (?)', engine, params=[('Gary',)])

# Read the remaining rows from the relational table
res = pd.read_sql_query('SELECT ID,Dept,Name,Salary,start_date FROM data_table', engine)
print(res)
When we execute the above code, it produces the following result.
   id        dept    name  salary  start_date
0   1          IT    Rick   623.3  2012-01-01
1   2  Operations     Dan   515.2  2013-09-23
2   3          IT   Tusar   611.0  2014-11-15
3   4          HR    Ryan   729.0  2014-05-11
4   6          IT   Rasmi   578.0  2013-05-21
5   7  Operations  Pranab   632.8  2013-07-30
6   8    Finance    Guru   722.5  2014-06-17
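A delete can likewise be run through SQLAlchemy directly, which also reports how many rows were removed. This is a sketch under the same assumptions as before: SQLAlchemy 1.4 or later and a three-row stand-in dataframe instead of input.csv. The variable name deleted is illustrative.

```python
import pandas as pd
from sqlalchemy import create_engine, text

# Stand-in for input.csv: three of the rows from the tutorial's data.
data = pd.DataFrame({
    "id": [4, 5, 6],
    "name": ["Ryan", "Gary", "Rasmi"],
    "salary": [729.00, 843.25, 578.00],
    "start_date": ["2014-05-11", "2015-03-27", "2013-05-21"],
    "dept": ["HR", "Finance", "IT"],
})

engine = create_engine("sqlite:///:memory:")
data.to_sql("data_table", engine, index=False)

delete_stmt = text("DELETE FROM data_table WHERE name = :name")

# Run the delete inside a transaction and record how many rows it removed.
with engine.begin() as conn:
    result = conn.execute(delete_stmt, {"name": "Gary"})
    deleted = result.rowcount

res = pd.read_sql_query(
    "SELECT id, dept, name, salary, start_date FROM data_table ORDER BY id",
    engine,
)
print(deleted)
print(res)
```

Checking the row count is a cheap way to confirm that the condition matched what was expected before relying on the result.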