Python - Measuring Central Tendency
Mathematically, central tendency means measuring the center, or typical location, of the values in a data set. It gives an idea of the average value of the data, and comparing the different measures also hints at how the values are distributed around that center. That in turn helps in evaluating how well a new input fits the existing data set.
There are three main measures of central tendency, which can be calculated using the methods in the pandas Python library.
Mean - It is the average value of the data, which is the sum of the values divided by the number of values.
Median - It is the middle value in a distribution when the values are arranged in ascending or descending order. With an even number of values, it is the average of the two middle values.
Mode - It is the most commonly occurring value in a distribution.
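The three definitions above can be sketched in plain Python before turning to pandas. This is a minimal illustration, not the pandas implementation; the sample `values` list is hypothetical and chosen only to make the arithmetic easy to follow. The results are cross-checked against the standard library's `statistics` module.

```python
from collections import Counter
import statistics

# Hypothetical sample values, chosen only for illustration.
values = [4, 1, 3, 4, 2, 4, 3, 5]

# Mean: sum of the values divided by the number of values.
mean = sum(values) / len(values)

# Median: middle value of the sorted data; with an even count,
# the average of the two middle values.
ordered = sorted(values)
n = len(ordered)
mid = n // 2
median = ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2

# Mode: the value with the highest frequency.
mode, frequency = Counter(values).most_common(1)[0]

# The standard library's statistics module gives the same results.
assert mean == statistics.mean(values)
assert median == statistics.median(values)
assert mode == statistics.mode(values)

print(mean, median, mode)  # prints: 3.25 3.5 4
```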
Calculating Mean and Median
The pandas functions can be directly used to calculate these values.
import pandas as pd

# Create a dictionary of Series
d = {
    'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith',
                       'Jack', 'Lee', 'Chanchal', 'Gasper', 'Naviya', 'Andres']),
    'Age': pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46]),
    'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6,
                         3.8, 3.78, 2.98, 4.80, 4.10, 3.65]),
}

# Create a DataFrame
df = pd.DataFrame(d)

# numeric_only=True skips the non-numeric Name column
print("Mean Values in the Distribution")
print(df.mean(numeric_only=True))
print("*******************************")
print("Median Values in the Distribution")
print(df.median(numeric_only=True))
Its output is as follows −
Mean Values in the Distribution
Age       31.833333
Rating     3.743333
dtype: float64
*******************************
Median Values in the Distribution
Age       29.50
Rating     3.79
dtype: float64
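The same statistics can also be requested for a single column, or for several statistics at once. The sketch below reuses the Age and Rating values from the example above; the variable names `age_mean`, `age_median`, and `summary` are illustrative, and `DataFrame.agg` is only one of several ways to build such a summary.

```python
import pandas as pd

# Same Age and Rating values as in the example above.
df = pd.DataFrame({
    "Age": [25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46],
    "Rating": [4.23, 3.24, 3.98, 2.56, 3.20, 4.6,
               3.8, 3.78, 2.98, 4.80, 4.10, 3.65],
})

# Mean and median of a single column.
age_mean = df["Age"].mean()
age_median = df["Age"].median()
print(age_mean, age_median)

# Both statistics for every column, one row per statistic.
summary = df.agg(["mean", "median"])
print(summary)
```

Here the mean Age (about 31.83) is higher than the median Age (29.5), which suggests that a few older values pull the average upward.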
Calculating Mode
A mode may or may not exist in a distribution, depending on whether the data is continuous and whether any value occurs more often than the others. We take a simple distribution below to find the mode. Here one value has the maximum frequency in the distribution.
import pandas as pd

# Create a dictionary of Series
d = {
    'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith',
                       'Jack', 'Lee', 'Chanchal', 'Gasper', 'Naviya', 'Andres']),
    'Age': pd.Series([25, 26, 25, 23, 30, 25, 23, 34, 40, 30, 25, 46]),
}

# Create a DataFrame
df = pd.DataFrame(d)

# mode() returns every most-frequent value of each column
print(df.mode())
Its output is as follows −
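     Age      Name
0   25.0    Andres
1    NaN  Chanchal
2    NaN    Gasper
3    NaN      Jack
4    NaN     James
5    NaN       Lee
6    NaN    Naviya
7    NaN     Ricky
8    NaN     Smith
9    NaN     Steve
10   NaN       Tom
11   NaN       Vin

Age has a single mode (25), while every name occurs exactly once, so all twelve names count as modes of Name. pandas pads the shorter Age column with NaN. The column order may differ depending on the pandas version.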
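To see why a mode is reported as a collection rather than a single number, it can help to look at the value frequencies directly. The following is a small sketch using the Age values from the example above; the `tied` Series is hypothetical data added only to show what happens when two values share the maximum frequency.

```python
import pandas as pd

# Age values from the example above.
ages = pd.Series([25, 26, 25, 23, 30, 25, 23, 34, 40, 30, 25, 46])

# Frequency of each value, most frequent first.
counts = ages.value_counts()
print(counts.head(3))

# Series.mode() returns a Series because there can be several modes.
print(ages.mode().tolist())  # prints: [25]

# Hypothetical data with a tie: 1 and 2 both occur twice.
tied = pd.Series([1, 1, 2, 2, 3])
print(tied.mode().tolist())  # prints: [1, 2]
```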