Classification Algorithms - Naïve Bayes
Introduction to Naïve Bayes Algorithm
Naïve Bayes is a family of classification algorithms based on applying Bayes' theorem with the strong assumption that all the predictors are independent of each other. In simple words, the assumption is that the presence of a feature in a class is independent of the presence of any other feature in the same class. For example, a phone may be considered smart if it has a touch screen, internet facility, a good camera, etc. Even though these features may depend on each other, each one is treated as contributing independently to the probability that the phone is a smartphone.
In Bayesian classification, the main interest is to find the posterior probabilities, i.e. the probability of a label given some observed features, P(L | features). With the help of Bayes' theorem, we can express this in quantitative form as follows −
$$P(L \mid features) = \frac{P(L)\,P(features \mid L)}{P(features)}$$
Here, P(L | features) is the posterior probability of class.
P(L) is the prior probability of class.
P(features | L) is the likelihood, which is the probability of predictor given class.
P(features) is the prior probability of predictor.
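As a small worked illustration of the formula above, the following sketch computes posteriors for the smartphone example. All the numbers and the feature names are made up for illustration; the only thing taken from the text is Bayes' theorem together with the naïve assumption that the likelihood factorizes into a product of per-feature probabilities.

```python
import math

# Hypothetical prior probabilities P(L) for two labels.
priors = {"smart": 0.6, "basic": 0.4}

# Hypothetical per-feature likelihoods P(feature | L).
feature_likelihoods = {
    "smart": {"touch_screen": 0.9, "internet": 0.8},
    "basic": {"touch_screen": 0.2, "internet": 0.1},
}

# Naive assumption: P(features | L) is the product of the per-feature terms.
joint = {
    label: priors[label] * math.prod(feature_likelihoods[label].values())
    for label in priors
}

# P(features) is the sum of the joint terms over all labels.
evidence = sum(joint.values())

# Posterior P(L | features) for each label.
posteriors = {label: joint[label] / evidence for label in joint}
print(posteriors)
```

With these assumed numbers, the "smart" label receives almost all of the posterior mass because both observed features are far more likely under it.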
Building model using Naïve Bayes in Python
Scikit-learn is one of the most useful Python libraries for building a Naïve Bayes model. Three commonly used types of Naïve Bayes model available in scikit-learn are the following −
Gaussian Naïve Bayes
It is the simplest Naïve Bayes classifier. It assumes that the data for each label is drawn from a simple Gaussian distribution.
Multinomial Naïve Bayes
Another useful Naïve Bayes classifier is Multinomial Naïve Bayes, in which the features are assumed to be drawn from a simple multinomial distribution. This kind of Naïve Bayes is most appropriate for features that represent discrete counts.
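A minimal sketch of Multinomial Naïve Bayes on count features is given below. The count matrix, the word names in the comments, and the labels are invented for illustration; `MultinomialNB` is the scikit-learn class for this model.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Made-up word-count matrix: each row is a document, each column the count
# of one hypothetical word, e.g. ["offer", "free", "meeting", "report"].
X_counts = np.array([
    [3, 2, 0, 0],
    [2, 3, 0, 1],
    [0, 0, 2, 3],
    [0, 1, 3, 2],
])
y_labels = np.array([1, 1, 0, 0])  # 1 = spam, 0 = not spam (assumed labels)

model_MNB = MultinomialNB()
model_MNB.fit(X_counts, y_labels)

# Two new count vectors to classify.
X_test = np.array([[4, 1, 0, 0], [0, 0, 1, 4]])
pred_MNB = model_MNB.predict(X_test)
print(pred_MNB)
```

The first test row is dominated by words seen in the spam rows and the second by words seen in the non-spam rows, so the predicted labels follow the same split.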
Bernoulli Naïve Bayes
Another important model is Bernoulli Naïve Bayes, in which features are assumed to be binary (0s and 1s). Text classification with the ‘bag of words’ model can be an application of Bernoulli Naïve Bayes.
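The following sketch shows Bernoulli Naïve Bayes on binary features that record only whether a word is present. The data and the word names are made up for illustration; `BernoulliNB` is the scikit-learn class for this model.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Made-up presence/absence matrix: 1 means the hypothetical word
# (e.g. ["offer", "free", "meeting", "report"]) occurs in the document.
X_binary = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
])
y_labels = np.array([1, 1, 0, 0])  # 1 = spam, 0 = not spam (assumed labels)

model_BNB = BernoulliNB()
model_BNB.fit(X_binary, y_labels)

# Unlike the multinomial model, absent words (0s) also count as evidence.
X_test = np.array([[1, 0, 0, 0], [0, 0, 1, 1]])
pred_BNB = model_BNB.predict(X_test)
print(pred_BNB)
```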
Example
Depending on our data set, we can choose any of the Naïve Bayes models explained above. Here, we are implementing the Gaussian Naïve Bayes model in Python −
We will start with required imports as follows −
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

sns.set()  # apply seaborn's default plot styling
Now, by using the make_blobs() function of scikit-learn, we can generate blobs of points with a Gaussian distribution as follows −
from sklearn.datasets import make_blobs

# 300 two-dimensional points grouped around 2 centers
X, y = make_blobs(300, 2, centers=2, random_state=2, cluster_std=1.5)
plt.scatter(X[:, 0], X[:, 1], c=y, s=50, cmap='summer')
Next, to use the GaussianNB model, we need to import it, create an instance, and fit it to the data as follows −
from sklearn.naive_bayes import GaussianNB

model_GNB = GaussianNB()
model_GNB.fit(X, y)
Now, we have to make predictions. This can be done after generating some new data as follows −
rng = np.random.RandomState(0)
# 2000 random points spread uniformly over the area covered by the blobs
Xnew = [-6, -14] + [14, 18] * rng.rand(2000, 2)
ynew = model_GNB.predict(Xnew)
Next, we plot the new data, colored by predicted label, to see where the decision boundary lies −
plt.scatter(X[:, 0], X[:, 1], c=y, s=50, cmap='summer')
pm = plt.axis()  # remember the axis limits of the training data
plt.scatter(Xnew[:, 0], Xnew[:, 1], c=ynew, s=20, cmap='summer', alpha=0.1)
plt.axis(pm)
Now, with the help of the following lines of code, we can find the posterior probabilities of the first and second label −
yprob = model_GNB.predict_proba(Xnew)
yprob[-10:].round(3)
Output
array([[0.998, 0.002],
       [1.   , 0.   ],
       [0.987, 0.013],
       [1.   , 0.   ],
       [1.   , 0.   ],
       [1.   , 0.   ],
       [1.   , 0.   ],
       [1.   , 0.   ],
       [0.   , 1.   ],
       [0.986, 0.014]])
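To make the meaning of these columns concrete, the sketch below rebuilds the same model and checks two properties of the output: each row of posterior probabilities sums to 1, and the label returned by predict() is the class with the largest posterior. The variable names `row_sums` and `labels_from_proba` are introduced here only for illustration.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.naive_bayes import GaussianNB

# Recreate the data and model from the example above.
X, y = make_blobs(300, 2, centers=2, random_state=2, cluster_std=1.5)
model_GNB = GaussianNB().fit(X, y)

rng = np.random.RandomState(0)
Xnew = [-6, -14] + [14, 18] * rng.rand(2000, 2)

yprob = model_GNB.predict_proba(Xnew)  # one column per class
ynew = model_GNB.predict(Xnew)

# Each row holds the posteriors of both classes, so it should sum to 1.
row_sums = yprob.sum(axis=1)

# Picking the most probable class should reproduce predict().
labels_from_proba = model_GNB.classes_[yprob.argmax(axis=1)]

print(np.allclose(row_sums, 1.0))
print(np.array_equal(labels_from_proba, ynew))
```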
Pros & Cons
Pros
The following are some pros of using Naïve Bayes classifiers −
Naïve Bayes classification is easy to implement and fast.
It will converge faster than discriminative models like logistic regression.
It requires less training data.
It is highly scalable: it scales linearly with the number of predictors and data points.
It can make probabilistic predictions and can handle continuous as well as discrete data.
Naïve Bayes classification can be used for both binary and multi-class classification problems.
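The last two points can be illustrated with a short sketch that trains Gaussian Naïve Bayes on a three-class problem with continuous features. The choice of the Iris data set, the 70/30 split, and the variable names are assumptions made for this illustration and are not part of the example above.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Iris has 4 continuous features and 3 classes.
X_iris, y_iris = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X_iris, y_iris, test_size=0.3, random_state=0, stratify=y_iris
)

model_iris = GaussianNB().fit(X_train, y_train)

# One posterior probability per class for every test sample.
proba_iris = model_iris.predict_proba(X_test)
accuracy_iris = model_iris.score(X_test, y_test)

print(proba_iris.shape)
print(round(accuracy_iris, 3))
```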
Cons
The following are some cons of using Naïve Bayes classifiers −
One of the most important cons of Naïve Bayes classification is its strong feature independence assumption, because in real life it is almost impossible to have a set of features that are completely independent of each other.
Another issue with Naïve Bayes classification is the ‘zero frequency’ problem: if a categorical variable has a category that was not observed in the training data set, the Naïve Bayes model will assign it a zero probability and will be unable to make a prediction.
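The zero frequency issue is commonly handled with additive (Laplace) smoothing, which scikit-learn exposes through the `alpha` parameter of `MultinomialNB`. The sketch below uses made-up counts in which the third feature never occurs with class 0, and compares the unsmoothed estimate with the smoothed one.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Made-up counts: the third feature is never seen together with class 0.
X_counts = np.array([
    [2, 1, 0],
    [3, 0, 0],
    [0, 1, 2],
    [0, 2, 3],
])
y_labels = np.array([0, 0, 1, 1])

# Unsmoothed estimate of P(feature | class 0): the third entry is exactly 0.
class0_counts = X_counts[y_labels == 0].sum(axis=0)
raw_estimate = class0_counts / class0_counts.sum()

# With alpha=1.0, one pseudo-count is added to every feature before normalizing.
model_smoothed = MultinomialNB(alpha=1.0).fit(X_counts, y_labels)
smoothed_estimate = np.exp(model_smoothed.feature_log_prob_[0])

print(raw_estimate)
print(smoothed_estimate)
```

With smoothing, the unseen feature gets a small non-zero probability, so a test sample that contains it no longer forces the whole class probability to zero.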
Applications of Naïve Bayes classification
The following are some common applications of Naïve Bayes classification −
Real-time prediction − Due to its ease of implementation and fast computation, it can be used to do prediction in real-time.
Multi-class prediction − Naïve Bayes classification can be used to predict the posterior probability of multiple classes of the target variable.
Text classification − Due to its multi-class prediction capability, Naïve Bayes classification is well suited for text classification. That is why it is also used to solve problems like spam filtering and sentiment analysis.
Recommendation system − Together with algorithms like collaborative filtering, Naïve Bayes can be used to build a recommendation system that filters unseen information and predicts whether a user would like a given resource or not.
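As a rough sketch of the text classification use case, the following code builds a tiny spam filter from a bag-of-words representation and Multinomial Naïve Bayes. The training sentences, the labels, and the name `spam_filter` are invented for illustration; a real filter would need far more data.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny made-up training corpus.
train_texts = [
    "win a free prize now",
    "free offer claim your prize",
    "meeting agenda for monday",
    "please review the project report",
]
train_labels = ["spam", "spam", "ham", "ham"]

# CountVectorizer turns each text into word counts; MultinomialNB classifies them.
spam_filter = make_pipeline(CountVectorizer(), MultinomialNB())
spam_filter.fit(train_texts, train_labels)

predictions = spam_filter.predict(
    ["claim your free prize", "monday project meeting"]
)
print(list(predictions))
```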