Python - Data Operations
Python handles data of various formats mainly through two libraries, Pandas and NumPy. We have already seen the important features of these two libraries in the previous chapters. In this chapter we will see some basic examples from each library showing how to operate on data.
Data Operations in Numpy
The most important object defined in NumPy is an N-dimensional array type called ndarray. It describes the collection of items of the same type. Items in the collection can be accessed using a zero-based index. An instance of ndarray class can be constructed by different array creation routines described later in the tutorial. The basic ndarray is created using an array function in NumPy as follows −
numpy.array
The following are some examples of NumPy data handling.
Example 1
    # more than one dimension
    import numpy as np

    a = np.array([[1, 2], [3, 4]])
    print(a)
The output is as follows −
    [[1 2]
     [3 4]]
Example 2
    # minimum dimensions
    import numpy as np

    a = np.array([1, 2, 3, 4, 5], ndmin=2)
    print(a)
The output is as follows −
    [[1 2 3 4 5]]
Example 3
    # dtype parameter
    import numpy as np

    a = np.array([1, 2, 3], dtype=complex)
    print(a)
The output is as follows −
    [1.+0.j 2.+0.j 3.+0.j]
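The ndarray properties described at the start of this section (items of a single type, zero-based indexing) can be checked with a short sketch. The variable names are chosen here for illustration only; `ndim`, `shape` and `dtype` are standard ndarray attributes.

```python
import numpy as np

# A 2-D array; every item shares one dtype
a = np.array([[1, 2], [3, 4]])

ndim = a.ndim          # number of dimensions
shape = a.shape        # size along each dimension
first_row = a[0]       # zero-based index: the first row
last_item = a[1, 1]    # row 1, column 1

# Mixing integers with a float upcasts every item to one common float type
mixed = np.array([1, 2, 3.5])

print(ndim, shape)
print(first_row, last_item)
print(mixed.dtype)
```

Because all items share one type, NumPy converts the integers in `mixed` to floats rather than storing two different types side by side.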
Data Operations in Pandas
Pandas handles data through Series, DataFrame, and Panel. We will see an example of each of these.
Pandas Series
Series is a one-dimensional labeled array capable of holding data of any type (integer, string, float, python objects, etc.). The axis labels are collectively called index. A pandas Series can be created using the following constructor −
pandas.Series( data, index, dtype, copy)
Example
Here we create a Series from a NumPy array.
    # import the pandas library, aliasing it as pd
    import pandas as pd
    import numpy as np

    data = np.array(['a', 'b', 'c', 'd'])
    s = pd.Series(data)
    print(s)
Its output is as follows −
    0    a
    1    b
    2    c
    3    d
    dtype: object
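The `index` argument of the constructor shown above replaces the default 0, 1, 2, ... labels. The following is a minimal sketch, assuming the integer labels 100 to 103, which are hypothetical and picked only to make label lookup and positional lookup easy to tell apart.

```python
import numpy as np
import pandas as pd

data = np.array(['a', 'b', 'c', 'd'])

# Hypothetical labels, chosen for illustration only
s = pd.Series(data, index=[100, 101, 102, 103])

by_label = s.loc[101]      # look up by index label
by_position = s.iloc[0]    # look up by zero-based position

print(s)
print(by_label, by_position)
```

Using `.loc` for labels and `.iloc` for positions keeps the two kinds of lookup unambiguous when the labels are themselves integers.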
Pandas DataFrame
A DataFrame is a two-dimensional data structure, i.e., data is aligned in a tabular fashion in rows and columns. A pandas DataFrame can be created using the following constructor −
pandas.DataFrame( data, index, columns, dtype, copy)
Let us now create an indexed DataFrame using arrays.
    import pandas as pd

    data = {'Name': ['Tom', 'Jack', 'Steve', 'Ricky'], 'Age': [28, 34, 29, 42]}
    df = pd.DataFrame(data, index=['rank1', 'rank2', 'rank3', 'rank4'])
    print(df)
Its output is as follows (recent pandas versions keep the columns in the order given in the dict) −
            Name  Age
    rank1    Tom   28
    rank2   Jack   34
    rank3  Steve   29
    rank4  Ricky   42
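Once the rows and columns are labeled, a few basic operations follow directly. The sketch below rebuilds the same DataFrame and selects a column, a row, and a filtered subset. The variable names and the age threshold of 30 are assumptions made for illustration.

```python
import pandas as pd

data = {'Name': ['Tom', 'Jack', 'Steve', 'Ricky'], 'Age': [28, 34, 29, 42]}
df = pd.DataFrame(data, index=['rank1', 'rank2', 'rank3', 'rank4'])

ages = df['Age']                # one column, returned as a Series
row = df.loc['rank2']           # one row, selected by index label
mean_age = df['Age'].mean()     # a simple aggregate over a column
over_30 = df[df['Age'] > 30]    # rows that satisfy a condition

print(row['Name'], mean_age)
print(over_30)
```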
Pandas Panel
A panel is a 3D container of data. The term Panel data is derived from econometrics and is partially responsible for the name pandas − pan(el)-da(ta)-s.
A Panel can be created using the following constructor −
pandas.Panel(data, items, major_axis, minor_axis, dtype, copy)
In the example below we create a panel from a dict of DataFrame objects.
    # creating a panel from a dict of DataFrames
    # requires pandas older than 0.25; Panel was removed in that release
    import pandas as pd
    import numpy as np

    data = {'Item1': pd.DataFrame(np.random.randn(4, 3)),
            'Item2': pd.DataFrame(np.random.randn(4, 2))}
    p = pd.Panel(data)
    print(p)
Its output is as follows −
    <class 'pandas.core.panel.Panel'>
    Dimensions: 2 (items) x 4 (major_axis) x 3 (minor_axis)
    Items axis: Item1 to Item2
    Major_axis axis: 0 to 3
    Minor_axis axis: 0 to 2
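Panel was removed in pandas 0.25, so the constructor above is not available in current releases. One common substitute, sketched here, is a DataFrame with a two-level index built with `pd.concat`. This is a different technique from the Panel constructor, not a drop-in replacement, and the variable names are assumptions for illustration.

```python
import numpy as np
import pandas as pd

data = {'Item1': pd.DataFrame(np.random.randn(4, 3)),
        'Item2': pd.DataFrame(np.random.randn(4, 2))}

# Stack the frames; the dict keys become the outer level of the row index
stacked = pd.concat(data)

# Recover the rows that belong to one item
item1 = stacked.loc['Item1']

print(stacked.shape)
print(item1.shape)
```

Item2 has only two columns, so its rows hold NaN in the third column after stacking, much as a Panel would pad the shorter item along the minor axis.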