Distribution of Observations
  • Date: 2024-09-17

Seaborn - Distribution of Observations


The categorical scatter plots we dealt with in the previous chapter are limited in the information they can provide about the distribution of values within each category. Going further, let us see which plots make it easier to compare distributions within categories.

Box Plots

A box plot is a convenient way to visualize the distribution of data through its quartiles.

Box plots usually have vertical lines extending from the boxes, which are termed whiskers. These whiskers indicate variability outside the upper and lower quartiles; hence box plots are also called box-and-whisker plots or box-and-whisker diagrams. Any outliers in the data are plotted as individual points.

Example

import pandas as pd
import seaborn as sb
from matplotlib import pyplot as plt
# Load the built-in iris dataset
df = sb.load_dataset("iris")
# Draw one box per species, showing the quartiles of petal_length
sb.boxplot(x = "species", y = "petal_length", data = df)
plt.show()

Output

The dots on the plot indicate the outliers.
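
To make the quartiles, whiskers, and outlier dots described above more concrete, here is a minimal sketch of how such values could be computed by hand. It assumes the common 1.5 × IQR rule for the whiskers (the `whis` parameter below mirrors that default) and linear interpolation for the quartiles. The function name `box_plot_summary` and the sample values are hypothetical and are not taken from the iris dataset.

```python
import statistics

def box_plot_summary(values, whis=1.5):
    """Return quartiles, whisker ends, and outliers for a list of numbers."""
    data = sorted(values)
    # method="inclusive" interpolates linearly between the data points
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    # Points beyond these fences are treated as outliers
    low_fence = q1 - whis * iqr
    high_fence = q3 + whis * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        # Whiskers end at the most extreme points still inside the fences
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }

# Hypothetical sample with one unusually large value
sample = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 20.0]
summary = box_plot_summary(sample)
print(summary)
```

In this sample the box would span 3.0 to 7.0, the whiskers would reach 1.0 and 8.0, and the value 20.0 would be drawn as a separate dot.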

Violin Plots

Violin plots combine the box plot with kernel density estimates (KDE). This makes the distribution of the data easier to analyze and understand.

Let us use the tips dataset to learn more about violin plots. This dataset contains information related to the tips given by customers in a restaurant.

Example

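import pandas as pd
import seaborn as sb
from matplotlib import pyplot as plt
# Load the built-in tips dataset
df = sb.load_dataset("tips")
# Draw one violin per day, showing the distribution of total_bill
sb.violinplot(x = "day", y = "total_bill", data = df)
plt.show()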

Output

The quartile and whisker values from the box plot are shown inside the violin. As the violin plot uses KDE, the wider portion of the violin indicates higher density and the narrower region represents relatively lower density. The interquartile range of the box plot and the higher-density portion of the KDE fall in the same region of each category of the violin plot.

The above plot shows the distribution of total_bill on four days of the week. If, in addition, we want to see how the distribution behaves with respect to sex, let us explore it in the example below.
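
The link between the width of the violin and density can be illustrated with a small hand-written Gaussian kernel density estimate. This is only a sketch: the bandwidth is fixed by hand here, whereas a plotting library would normally choose one automatically, and the function name `gaussian_kde` and the bill values are hypothetical rather than taken from the tips dataset.

```python
import math

def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the density of `samples` at a point."""
    n = len(samples)
    # Normalizing constant so that the estimated density integrates to 1
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Sum one Gaussian bump centred on every observation
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density

# Hypothetical bills: most values cluster near 12, one lies far away
bills = [10.0, 11.0, 12.0, 12.5, 13.0, 30.0]
density = gaussian_kde(bills, bandwidth=2.0)

dense = density(12.0)   # inside the cluster: the violin would be wide here
sparse = density(22.0)  # between the cluster and the far value: narrow here
print(dense > sparse)
```

The violin is drawn wide where the estimated density is high and narrow where it is low, which is why the region around the cluster would appear as the thickest part.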

Example

import pandas as pd
import seaborn as sb
from matplotlib import pyplot as plt
# Load the built-in tips dataset
df = sb.load_dataset("tips")
# Draw separate violins for each sex within every day
sb.violinplot(x = "day", y = "total_bill", hue = "sex", data = df)
plt.show()

Output

Now we can clearly see the difference in spending behavior between males and females. Looking at the plot, the bills paid by men tend to be higher than those paid by women.

If the hue variable has only two classes, we can beautify the plot by splitting each violin into two halves instead of drawing two violins for a given day. Each half of the violin refers to one class of the hue variable.

Example

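import pandas as pd
import seaborn as sb
from matplotlib import pyplot as plt
# Load the built-in tips dataset
df = sb.load_dataset("tips")
# split = True draws one half-violin per sex instead of two full violins
sb.violinplot(x = "day", y = "total_bill", hue = "sex", split = True, data = df)
plt.show()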

Output
