Python Data Science - Pandas
What is Pandas?
Pandas is an open-source Python library used for high-performance data manipulation and analysis through its data structures. Python with pandas is used in a variety of academic and commercial domains, including finance, economics, statistics, advertising, and web analytics. Using pandas, we can accomplish five typical steps in the processing and analysis of data, regardless of the origin of the data: load, organize, manipulate, model, and analyse.
Below are some of the important features of pandas that are used specifically for data processing and data analysis work.
Key Features of Pandas
Fast and efficient DataFrame object with default and customized indexing.
Tools for loading data into in-memory data objects from different file formats.
Data alignment and integrated handling of missing data.
Reshaping and pivoting of data sets.
Label-based slicing, indexing and subsetting of large data sets.
Columns from a data structure can be deleted or inserted.
Group by data for aggregation and transformations.
High performance merging and joining of data.
Time Series functionality.
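A few of the features listed above can be sketched on a small made-up table. The column names, values, and the second `managers` table below are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sales records; one amount is missing (NaN).
sales = pd.DataFrame({
    "region": ["North", "South", "North", "South"],
    "amount": [100.0, np.nan, 50.0, 70.0],
})

# Integrated handling of missing data: replace NaN in "amount" with 0.
filled = sales.fillna({"amount": 0.0})

# Group by a column and aggregate each group.
totals = filled.groupby("region")["amount"].sum()

# Merge with a second (also hypothetical) table on a shared key column.
managers = pd.DataFrame({
    "region": ["North", "South"],
    "manager": ["Asha", "Ben"],
})
merged = filled.merge(managers, on="region", how="left")

print(totals)
print(merged)
```

Whether filling missing values with 0 is appropriate depends on the data; `dropna()` or another fill value may suit other cases better.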
Pandas deals with the following two data structures −
Series
DataFrame
These data structures are built on top of Numpy array, making them fast and efficient.
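As a rough illustration of the NumPy connection, the sketch below pulls the underlying values of a Series out as a NumPy array and applies vectorized operations to it. The sample numbers are arbitrary.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.5, 2.5, 3.5])

# The values of a numeric Series can be obtained as a NumPy array.
values = s.to_numpy()
print(type(values))
print(values.dtype)

# Arithmetic and NumPy functions apply element-wise, without Python loops.
doubled = s * 2
roots = np.sqrt(s)
print(doubled)
print(roots)
```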
Dimension & Description
The best way to think of these data structures is that the higher dimensional data structure is a container of its lower dimensional data structure. For example, a DataFrame is a container of Series. (Older pandas versions also provided a three-dimensional Panel, a container of DataFrame, which has since been removed from the library.)
Data Structure | Dimensions | Description |
---|---|---|
Series | 1 | 1D labeled homogeneous array, size-immutable. |
DataFrame | 2 | General 2D labeled, size-mutable tabular structure with potentially heterogeneously typed columns. |
DataFrame is the most widely used and the most important of these data structures.
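The container relationship described above can be checked directly: selecting a single column or a single row of a DataFrame yields a Series. The small frame here is an arbitrary example.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})

col = df["a"]     # selecting one column gives a Series
row = df.iloc[0]  # selecting one row by position also gives a Series

print(type(col).__name__)
print(df.ndim, col.ndim)  # number of dimensions of each object
```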
Series
Series is a one-dimensional array-like structure with homogeneous data. For example, the following series is a collection of integers 10, 23, 56, …
10 | 23 | 56 | 17 | 52 | 61 | 73 | 90 | 26 | 72 |
Key Points of Series
Homogeneous data
Size Immutable
Values of Data Mutable
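A minimal sketch of the Series shown above, built from the same ten integers. The second Series with the labels `a`, `b`, `c` is an added illustration of a custom index and is not part of the original example.

```python
import pandas as pd

s = pd.Series([10, 23, 56, 17, 52, 61, 73, 90, 26, 72])

print(s.dtype)  # one integer dtype shared by all values (homogeneous)
print(len(s))

# Values are mutable: an element can be overwritten in place.
s.iloc[0] = 99

# A Series can also carry custom labels instead of the default 0..n-1 index.
labeled = pd.Series([10, 23, 56], index=["a", "b", "c"])
print(labeled["b"])
```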
DataFrame
DataFrame is a two-dimensional array with heterogeneous data. For example,
Name | Age | Gender | Rating |
---|---|---|---|
Steve | 32 | Male | 3.45 |
Lia | 28 | Female | 4.6 |
Vin | 45 | Male | 3.9 |
Katie | 38 | Female | 2.78 |
The table represents the data of a sales team of an organization with their overall performance rating. The data is represented in rows and columns. Each column represents an attribute and each row represents a person.
Data Type of Columns
The data types of the four columns are as follows −
Column | Type |
---|---|
Name | String |
Age | Integer |
Gender | String |
Rating | Float |
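The sales team table can be built as a DataFrame from a dictionary of columns, and `dtypes` reports the type pandas inferred for each column. The variable name `team` is an arbitrary choice. How the string columns are reported (as `object` or as a dedicated string dtype) depends on the pandas version.

```python
import pandas as pd

team = pd.DataFrame({
    "Name": ["Steve", "Lia", "Vin", "Katie"],
    "Age": [32, 28, 45, 38],
    "Gender": ["Male", "Female", "Male", "Female"],
    "Rating": [3.45, 4.6, 3.9, 2.78],
})

# One inferred dtype per column: integer for Age, float for Rating,
# and a string-like dtype for Name and Gender.
print(team.dtypes)
```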
Key Points of Data Frame
Heterogeneous data
Size Mutable
Data Mutable
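Size and data mutability can be sketched on a two-row excerpt of the same table: a column is inserted, a value is changed, and the column is deleted again. The excerpt and the variable names are illustrative only.

```python
import pandas as pd

team = pd.DataFrame({"Name": ["Steve", "Lia"], "Age": [32, 28]})

# Size mutable: insert a new column.
team["Gender"] = ["Male", "Female"]
columns_after_insert = list(team.columns)

# Data mutable: change a single value by row label and column name.
team.loc[0, "Age"] = 33

# Size mutable: delete a column (drop returns a new DataFrame).
team = team.drop(columns=["Gender"])

print(columns_after_insert)
print(team)
```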
We will see many examples of using the pandas library for data science work in the next chapters.