Python - Measuring Variance
In statistics, variance is a measure of how far the values in a data set lie from the mean. In other words, it indicates how dispersed the values are. Dispersion is commonly reported as the standard deviation, which is the square root of the variance. A related measure, skewness, describes how asymmetric the values are around the mean.
Both of these can be calculated by using functions available in the pandas library.
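As a quick illustration of what "dispersed" means here, the sketch below compares two small series that share the same mean but differ in spread. The values are hypothetical and chosen only for this illustration; var() is the pandas method for sample variance.

```python
import pandas as pd

# Hypothetical values for illustration only; both series have a mean of 10
tight = pd.Series([9, 10, 11])
wide = pd.Series([0, 10, 20])

# Series.var() returns the sample variance (it divides by n - 1 by default)
tight_var = tight.var()
wide_var = wide.var()

print(tight.mean(), tight_var)  # 10.0 1.0
print(wide.mean(), wide_var)    # 10.0 100.0
```

The means are identical, so only the variance reveals that the second series is far more spread out.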
Measuring Standard Deviation
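Standard deviation is the square root of the variance. Variance is the average of the squared differences between the values in a data set and the mean. In Python, we calculate the standard deviation by using the std() function from the pandas library. By default, std() returns the sample standard deviation, which divides the sum of squared differences by n - 1 rather than n.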
import pandas as pd

# Create a dictionary of Series
d = {
    'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith',
                       'Jack', 'Lee', 'Chanchal', 'Gasper', 'Naviya', 'Andres']),
    'Age': pd.Series([25, 26, 25, 23, 30, 25, 23, 34, 40, 30, 25, 46]),
    'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6,
                         3.8, 3.78, 2.98, 4.80, 4.10, 3.65]),
}

# Create a DataFrame
df = pd.DataFrame(d)

# Calculate the standard deviation of the numeric columns
# (numeric_only=True skips the text column Name, which recent pandas versions require)
print(df.std(numeric_only=True))
Its output is as follows −
Age       7.265527
Rating    0.661628
dtype: float64
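To connect the definition to the number reported for Age, the following sketch recomputes the standard deviation by hand from the squared differences. It assumes the same Age values as the example above; the variable names are illustrative. It also shows the population version (dividing by n), which pandas exposes through the ddof=0 argument.

```python
import math
import pandas as pd

ages = pd.Series([25, 26, 25, 23, 30, 25, 23, 34, 40, 30, 25, 46])

# Squared difference of each value from the mean
squared_diffs = (ages - ages.mean()) ** 2

# Sample variance divides by n - 1, matching the pandas default (ddof=1)
sample_variance = squared_diffs.sum() / (len(ages) - 1)
sample_std = math.sqrt(sample_variance)

# Population variance divides by n, matching std(ddof=0)
population_std = math.sqrt(squared_diffs.sum() / len(ages))

print(round(sample_std, 6), round(ages.std(), 6))
print(round(population_std, 6), round(ages.std(ddof=0), 6))
```

The first line should print the same value twice, agreeing with the Age figure in the output above. The population figure is slightly smaller because it divides by 12 instead of 11.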
Measuring Skewness
Skewness is used to determine whether the data is symmetric or skewed. As a rule of thumb, if the skewness index is between -1 and 1, the distribution is treated as approximately symmetric. If the index is -1 or less, the distribution is skewed to the left, and if it is 1 or more, it is skewed to the right.
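import pandas as pd

# Create a dictionary of Series
d = {
    'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith',
                       'Jack', 'Lee', 'Chanchal', 'Gasper', 'Naviya', 'Andres']),
    'Age': pd.Series([25, 26, 25, 23, 30, 25, 23, 34, 40, 30, 25, 46]),
    'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6,
                         3.8, 3.78, 2.98, 4.80, 4.10, 3.65]),
}

# Create a DataFrame
df = pd.DataFrame(d)

# Calculate the skewness of the numeric columns
# (numeric_only=True skips the text column Name, which recent pandas versions require)
print(df.skew(numeric_only=True))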
Its output is as follows −
Age       1.443490
Rating   -0.153629
dtype: float64
So the distribution of rating is approximately symmetric, while the distribution of age is skewed to the right.
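The sketch below shows one way to reproduce these figures and apply the rule of thumb programmatically. It assumes that skew() returns the bias-adjusted (Fisher-Pearson) skewness coefficient, which is how the pandas result is usually described. The helper functions adjusted_skewness and describe_skew are hypothetical names introduced for this illustration and are not part of pandas.

```python
import pandas as pd

def adjusted_skewness(values):
    """Bias-adjusted skewness computed from standardized cubed deviations."""
    n = len(values)
    # Standardize with the sample standard deviation (ddof=1), then cube
    cubed_z = ((values - values.mean()) / values.std()) ** 3
    return n / ((n - 1) * (n - 2)) * cubed_z.sum()

def describe_skew(index):
    """Apply the -1 / 1 rule of thumb to a skewness index."""
    if index <= -1:
        return "skewed to the left"
    if index >= 1:
        return "skewed to the right"
    return "approximately symmetric"

ages = pd.Series([25, 26, 25, 23, 30, 25, 23, 34, 40, 30, 25, 46])
ratings = pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6,
                     3.8, 3.78, 2.98, 4.80, 4.10, 3.65])

for name, series in [("Age", ages), ("Rating", ratings)]:
    manual = adjusted_skewness(series)
    # Print the hand-computed value next to the pandas value for comparison
    print(name, round(manual, 6), round(series.skew(), 6), describe_skew(manual))
```

The thresholds of -1 and 1 are a convention rather than a strict test, so values close to either boundary deserve a look at the data itself, for example with a histogram.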