Scikit Learn - Classification with Naïve Bayes
Naïve Bayes methods are a set of supervised learning algorithms based on applying Bayes’ theorem with a strong assumption that all the predictors are independent of each other given the class, i.e. the presence of a feature in a class is independent of the presence of any other feature in the same class. This assumption is naïve, which is why these methods are called Naïve Bayes methods.
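Written out with individual features $x_1, \dots, x_n$ and class $Y$ (the symbols $x_i$ are introduced here for illustration), the independence assumption lets the class-conditional probability of all the features factor into one term per feature. The classifier then picks the class with the largest product, because the denominator of Bayes’ theorem is the same for every class:

$$P(x_1, \dots, x_n \mid Y) = \prod_{i=1}^{n} P(x_i \mid Y), \qquad \hat{y} = \arg\max_{y} \; P(y) \prod_{i=1}^{n} P(x_i \mid y)$$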
Bayes’ theorem states the following relationship for the posterior probability of a class, i.e. the probability of a label given some observed features, $P(Y \mid \text{features})$:
$$P(Y \mid \text{features}) = \frac{P(Y)\,P(\text{features} \mid Y)}{P(\text{features})}$$
Here, $P(Y \mid \text{features})$ is the posterior probability of the class.
$P(Y)$ is the prior probability of the class.
$P(\text{features} \mid Y)$ is the likelihood, which is the probability of the predictor given the class.
$P(\text{features})$ is the prior probability of the predictor.
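To make the formula concrete, here is a minimal sketch that computes the posterior by hand for a toy two-class problem. The class names (`spam`, `ham`), the feature names and all the probabilities are hypothetical numbers chosen for illustration. The likelihood term uses the naïve factorization into per-feature probabilities, and $P(\text{features})$ is obtained by summing the numerator over both classes. As a simplification, only the observed features contribute; absent features are ignored.

```python
# Hypothetical priors P(Y) and per-feature likelihoods P(feature | Y)
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {
    "spam": {"offer": 0.5, "meeting": 0.1},
    "ham": {"offer": 0.1, "meeting": 0.4},
}


def naive_bayes_posterior(observed, priors, likelihoods):
    """Return P(Y | features) for every class, assuming the observed
    features are conditionally independent given the class."""
    # Numerator of Bayes' theorem: P(Y) * P(features | Y), where
    # P(features | Y) is the product of the per-feature likelihoods
    joint = {}
    for label, prior in priors.items():
        p = prior
        for feature in observed:
            p *= likelihoods[label][feature]
        joint[label] = p
    # Denominator P(features): total of the numerators over all classes
    evidence = sum(joint.values())
    return {label: p / evidence for label, p in joint.items()}


posterior = naive_bayes_posterior(["offer", "meeting"], priors, likelihoods)
print(posterior)
```

With these made-up numbers the numerators are 0.4 × 0.5 × 0.1 = 0.02 for `spam` and 0.6 × 0.1 × 0.4 = 0.024 for `ham`, so the posteriors are 5/11 ≈ 0.45 and 6/11 ≈ 0.55.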
Scikit-learn provides several Naïve Bayes classifier models, namely Gaussian, Multinomial, Complement and Bernoulli. They differ mainly in the assumption they make about the distribution of $P(\text{features} \mid Y)$, i.e. the probability of the predictor given the class.
| Sr.No | Model | Description |
|---|---|---|
| 1 | Gaussian Naïve Bayes | The Gaussian Naïve Bayes classifier assumes that the data from each label is drawn from a simple Gaussian distribution. |
| 2 | Multinomial Naïve Bayes | This model assumes that the features are drawn from a simple Multinomial distribution. |
| 3 | Bernoulli Naïve Bayes | This model assumes that the features are binary (0s and 1s) in nature. An application of Bernoulli Naïve Bayes classification is text classification with the ‘bag of words’ model. |
| 4 | Complement Naïve Bayes | It was designed to correct the severe assumptions made by the Multinomial Naïve Bayes classifier. This kind of NB classifier is suitable for imbalanced data sets. |
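To see how the four variants map onto different kinds of features, the sketch below fits each classifier from `sklearn.naive_bayes` to synthetic data of the matching type: real-valued features for `GaussianNB`, non-negative counts for `MultinomialNB` and `ComplementNB`, and 0/1 indicators for `BernoulliNB`. The data, random seed and class shifts are made up for illustration, and the scores are training accuracies, so they only confirm that each model can fit data that matches its assumption.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, ComplementNB, GaussianNB, MultinomialNB

rng = np.random.RandomState(0)

# Hypothetical toy labels: two classes with 20 samples each
y = np.array([0] * 20 + [1] * 20)

# Continuous features for GaussianNB: class 1 is shifted by +3
X_cont = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(3, 1, (20, 2))])

# Non-negative counts (e.g. word counts) for MultinomialNB and ComplementNB
X_counts = np.vstack([rng.poisson([5, 1, 1], (20, 3)),
                      rng.poisson([1, 1, 5], (20, 3))])

# Binary presence/absence features for BernoulliNB
X_bin = (X_counts > 2).astype(int)

models = {
    "GaussianNB": (GaussianNB(), X_cont),
    "MultinomialNB": (MultinomialNB(), X_counts),
    "ComplementNB": (ComplementNB(), X_counts),
    "BernoulliNB": (BernoulliNB(), X_bin),
}

scores = {}
for name, (clf, X) in models.items():
    clf.fit(X, y)
    # Accuracy on the training data: a sanity check, not a real evaluation
    scores[name] = clf.score(X, y)
    print(name, scores[name])
```

All four classes share the same `fit`/`predict`/`score` interface, so switching variants is a one-line change; the real decision is which distributional assumption matches the features.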
Building a Naïve Bayes Classifier
We can also apply a Naïve Bayes classifier to a Scikit-learn dataset. In the example below, we apply GaussianNB and fit it to Scikit-learn's breast_cancer dataset.
Example
```python
import sklearn
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Load the dataset (a dictionary-like Bunch object)
data = load_breast_cancer()
label_names = data['target_names']
labels = data['target']
feature_names = data['feature_names']
features = data['data']

print(label_names)
print(labels[0])
print(feature_names[0])
print(features[0])

# Hold out 40% of the samples as the test set
train, test, train_labels, test_labels = train_test_split(
    features, labels, test_size=0.40, random_state=42
)

# Fit a Gaussian Naive Bayes model and predict the test labels
GNBclf = GaussianNB()
model = GNBclf.fit(train, train_labels)
preds = GNBclf.predict(test)
print(preds)
```
Output
[ 1 0 0 1 1 0 0 0 1 1 1 0 1 0 1 0 1 1 1 0 1 1 0 1 1 1 1 1 1 0 1 1 1 1 1 1 0 1 0 1 1 0 1 1 1 1 1 1 1 1 0 0 1 1 1 1 1 0 0 1 1 0 0 1 1 1 0 0 1 1 0 0 1 0 1 1 1 1 1 1 0 1 1 0 0 0 0 0 1 1 1 1 1 1 1 1 0 0 1 0 0 1 0 0 1 1 1 0 1 1 0 1 1 0 0 0 1 1 1 0 0 1 1 0 1 0 0 1 1 0 0 0 1 1 1 0 1 1 0 0 1 0 1 1 0 1 0 0 1 1 1 1 1 1 1 0 0 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 0 1 1 0 1 1 1 1 1 1 0 0 0 1 1 0 1 0 1 1 1 1 0 1 1 0 1 1 1 0 1 0 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 0 1 0 0 1 1 0 1 ]
The above output consists of a series of 0s and 1s, which are the predicted tumor classes: 0 for malignant and 1 for benign, following the order in label_names.
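The printed array alone does not say how good the predictions are. One way to check is to compare `preds` with `test_labels` using `sklearn.metrics.accuracy_score`, and to inspect `predict_proba`, which returns the posterior probability of each class for every test sample. The sketch below repeats the split from the example above so that it runs on its own; the exact accuracy depends on the split.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Same data and split as in the example above
data = load_breast_cancer()
train, test, train_labels, test_labels = train_test_split(
    data['data'], data['target'], test_size=0.40, random_state=42
)

GNBclf = GaussianNB().fit(train, train_labels)
preds = GNBclf.predict(test)

# Fraction of test samples whose predicted class matches the true label
accuracy = accuracy_score(test_labels, preds)
print(accuracy)

# Posterior P(Y | features) per sample; columns follow GNBclf.classes_
# (0 = malignant, 1 = benign)
probs = GNBclf.predict_proba(test)
print(probs[:3])

# Map the first few numeric predictions back to class names
print([data['target_names'][p] for p in preds[:5]])
```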