Python - Data Wrangling
Data wrangling involves processing data in various ways, such as merging, grouping and concatenating, in order to analyse it or get it ready to be used with another set of data. The Pandas library for Python has built-in functions that apply these wrangling methods to data sets to achieve the analytical goal. In this chapter we will look at a few examples describing these methods.
Merging Data
The Pandas library in Python provides a single function, merge, as the entry point for all standard database join operations between DataFrame objects −
pd.merge(left, right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False)
Let us now create two DataFrames on which merge operations can be performed.
# import the pandas library
import pandas as pd

# two frames that share the columns id, Name and subject_id
left = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
    'subject_id': ['sub1', 'sub2', 'sub4', 'sub6', 'sub5']})
right = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
    'subject_id': ['sub2', 'sub4', 'sub3', 'sub6', 'sub5']})

print(left)
print(right)
Its output is as follows −
   id    Name subject_id
0   1    Alex       sub1
1   2     Amy       sub2
2   3   Allen       sub4
3   4   Alice       sub6
4   5  Ayoung       sub5
   id   Name subject_id
0   1  Billy       sub2
1   2  Brian       sub4
2   3   Bran       sub3
3   4  Bryce       sub6
4   5  Betty       sub5
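The two frames above are only created and printed, so a short sketch of the merge step itself may help. It assumes the same left and right frames and joins them in three ways; the result names inner, left_join and both_keys are illustrative and not part of the original example. The _x and _y suffixes on overlapping column names are the pandas defaults.

```python
import pandas as pd

left = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
    'subject_id': ['sub1', 'sub2', 'sub4', 'sub6', 'sub5']})
right = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
    'subject_id': ['sub2', 'sub4', 'sub3', 'sub6', 'sub5']})

# Inner join on one key: only subject_id values found in both frames are kept.
inner = pd.merge(left, right, on='subject_id', how='inner')

# Left join: every row of `left` is kept; where `right` has no match,
# the right-hand columns (id_y, Name_y) are filled with NaN.
left_join = pd.merge(left, right, on='subject_id', how='left')

# Join on two keys at once: a row matches only if both id and subject_id agree.
both_keys = pd.merge(left, right, on=['id', 'subject_id'])

print(inner)
print(left_join)
print(both_keys)
```

Passing how='right' or how='outer' instead keeps all rows of the right frame or of both frames, respectively.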
Grouping Data
Grouping data sets is a frequent need in data analysis, where we want results broken down by the various groups present in the data set. Pandas has built-in methods that can roll the data up into such groups.
In the example below we group the data by year and then get the rows for a specific year.
# import the pandas library
import pandas as pd

ipl_data = {
    'Team': ['Riders', 'Riders', 'Devils', 'Devils', 'Kings', 'kings',
             'Kings', 'Kings', 'Riders', 'Royals', 'Royals', 'Riders'],
    'Rank': [1, 2, 2, 3, 3, 4, 1, 1, 2, 4, 1, 2],
    'Year': [2014, 2015, 2014, 2015, 2014, 2015, 2016, 2017, 2016, 2014, 2015, 2017],
    'Points': [876, 789, 863, 673, 741, 812, 756, 788, 694, 701, 804, 690]}
df = pd.DataFrame(ipl_data)

# split the rows into one group per distinct Year value
grouped = df.groupby('Year')

# select only the rows that belong to the 2014 group
print(grouped.get_group(2014))
Its output is as follows −
     Team  Rank  Year  Points
0  Riders     1  2014     876
2  Devils     2  2014     863
4   Kings     3  2014     741
9  Royals     4  2014     701
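Beyond picking out a single group, a grouped object is usually summarised. The following sketch assumes the same ipl_data and computes a row count and a mean per year; the names sizes and mean_points are illustrative and not part of the original example.

```python
import pandas as pd

ipl_data = {
    'Team': ['Riders', 'Riders', 'Devils', 'Devils', 'Kings', 'kings',
             'Kings', 'Kings', 'Riders', 'Royals', 'Royals', 'Riders'],
    'Rank': [1, 2, 2, 3, 3, 4, 1, 1, 2, 4, 1, 2],
    'Year': [2014, 2015, 2014, 2015, 2014, 2015, 2016, 2017, 2016, 2014, 2015, 2017],
    'Points': [876, 789, 863, 673, 741, 812, 756, 788, 694, 701, 804, 690]}
df = pd.DataFrame(ipl_data)
grouped = df.groupby('Year')

# Number of rows that fall into each year.
sizes = grouped.size()
print(sizes)

# Average of the Points column within each year.
mean_points = grouped['Points'].mean()
print(mean_points)

# Iterating over the grouped object yields (key, sub-DataFrame) pairs.
for year, group in grouped:
    print(year, len(group))
```

Other reductions such as sum(), max() or agg() with a list of function names can be applied in the same way.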
Concatenating Data
Pandas provides various facilities for easily combining Series and DataFrame objects. In the example below the concat function performs concatenation along an axis, which by default is the row axis. Let us create different objects and concatenate them.
import pandas as pd

one = pd.DataFrame({
    'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
    'subject_id': ['sub1', 'sub2', 'sub4', 'sub6', 'sub5'],
    'Marks_scored': [98, 90, 87, 69, 78]},
    index=[1, 2, 3, 4, 5])
two = pd.DataFrame({
    'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
    'subject_id': ['sub2', 'sub4', 'sub3', 'sub6', 'sub5'],
    'Marks_scored': [89, 80, 79, 97, 88]},
    index=[1, 2, 3, 4, 5])

# stack the rows of `two` below the rows of `one`; index labels are kept as-is
print(pd.concat([one, two]))
Its output is as follows −
     Name subject_id  Marks_scored
1    Alex       sub1            98
2     Amy       sub2            90
3   Allen       sub4            87
4   Alice       sub6            69
5  Ayoung       sub5            78
1   Billy       sub2            89
2   Brian       sub4            80
3    Bran       sub3            79
4   Bryce       sub6            97
5   Betty       sub5            88
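Note that the index labels 1 to 5 appear twice in the result above. As a rough sketch of the common ways to handle this, the code below reuses the same two frames and shows three concat options; the result names renumbered, labelled and side_by_side are illustrative and not part of the original example.

```python
import pandas as pd

one = pd.DataFrame({
    'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
    'subject_id': ['sub1', 'sub2', 'sub4', 'sub6', 'sub5'],
    'Marks_scored': [98, 90, 87, 69, 78]},
    index=[1, 2, 3, 4, 5])
two = pd.DataFrame({
    'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
    'subject_id': ['sub2', 'sub4', 'sub3', 'sub6', 'sub5'],
    'Marks_scored': [89, 80, 79, 97, 88]},
    index=[1, 2, 3, 4, 5])

# ignore_index=True drops the original labels and renumbers the rows from 0.
renumbered = pd.concat([one, two], ignore_index=True)

# keys=... adds an outer index level that records which frame a row came from.
labelled = pd.concat([one, two], keys=['one', 'two'])

# axis=1 places the frames side by side, aligning rows on their index labels.
side_by_side = pd.concat([one, two], axis=1)

print(renumbered)
print(labelled)
print(side_by_side)
```

With keys set, labelled.loc['two'] returns just the rows that originated in the second frame.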