Improving Performance of ML Model (Contd…)
Performance Improvement with Algorithm Tuning
ML models are parameterized so that their behavior can be adjusted for a specific problem. Algorithm tuning means finding the combination of these parameters that gives the best model performance. This process is sometimes called hyperparameter optimization: the settings of the algorithm itself are called hyperparameters, whereas the coefficients found by the ML algorithm during training are called parameters.
Here, we discuss some methods for algorithm parameter tuning provided by Python's Scikit-learn.
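To make the distinction concrete, here is a minimal sketch that assumes a one-feature ridge regression without an intercept. The function name fit_ridge_1d and the toy data are made up for illustration; they are not part of Scikit-learn. The value of alpha is chosen by us (a hyperparameter), while the weight w is computed from the data (a parameter).

```python
# Illustrative only: one-feature ridge regression without an intercept.
# alpha is a hyperparameter (we choose it); w is a parameter (learned from data).

def fit_ridge_1d(xs, ys, alpha):
    """Return the weight w that minimises sum((y - w*x)^2) + alpha * w^2."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + alpha)

xs = [1.0, 2.0, 3.0, 4.0]   # toy inputs (made up for this sketch)
ys = [2.0, 4.0, 6.0, 8.0]   # toy targets, exactly y = 2x

for alpha in (0.0, 1.0, 10.0):
    w = fit_ridge_1d(xs, ys, alpha)
    print(f"alpha={alpha}: learned w={w:.3f}")
```

A larger alpha shrinks the learned weight towards zero, which is why the choice of alpha affects model performance and is worth tuning.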
Grid Search Parameter Tuning
Grid search is a parameter tuning approach that methodically builds and evaluates a model for every possible combination of the algorithm parameters specified in a grid. In other words, it is an exhaustive search over the grid.
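The phrase "every possible combination" can be sketched with the standard library alone. The grid below is hypothetical: the parameter names mirror those of Ridge, but the values are arbitrary, and expand_grid is a helper written only for this illustration.

```python
from itertools import product

# Hypothetical grid: the parameter names mirror Ridge, the values are arbitrary.
param_grid = {
    "alpha": [1.0, 0.1, 0.01],
    "fit_intercept": [True, False],
}

def expand_grid(grid):
    """Yield one dict per combination of the values in the grid."""
    names = sorted(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))

combinations = list(expand_grid(param_grid))
print(len(combinations))  # 3 alpha values x 2 intercept settings
for params in combinations:
    print(params)
```

The number of candidates is the product of the list lengths, so the cost of a grid search grows quickly as more parameters or values are added.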
Example
In the following Python recipe, we perform a grid search using the GridSearchCV class of sklearn to evaluate various alpha values for the Ridge Regression algorithm on the Pima Indians diabetes dataset.
First, import the required packages as follows −
import numpy
from pandas import read_csv
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
Now, we need to load the Pima diabetes dataset as we did in previous examples −
path = r"C:pima-indians-diabetes.csv"
headernames = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
data = read_csv(path, names=headernames)
array = data.values
X = array[:,0:8]   # first eight columns are the input features
Y = array[:,8]     # last column is the target
Next, define the alpha values to evaluate as follows −
alphas = numpy.array([1,0.1,0.01,0.001,0.0001,0])
param_grid = dict(alpha=alphas)
Now, we need to apply grid search on our model −
model = Ridge()
grid = GridSearchCV(estimator=model, param_grid=param_grid)
grid.fit(X, Y)   # fits and cross-validates one model per alpha value
Print the result with the following script lines −
print(grid.best_score_)
print(grid.best_estimator_.alpha)
Output
0.2796175593129722
1.0
The above output gives us the optimal score and the set of parameters in the grid that achieved that score. The alpha value in this case is 1.0.
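For a regressor such as Ridge, the value in best_score_ is the mean cross-validated R² of the winning candidate. The following is a simplified sketch of that selection step, not the GridSearchCV implementation: it assumes a one-feature ridge model without an intercept, made-up toy data, and interleaved folds instead of the contiguous folds sklearn uses by default. All helper names are hypothetical.

```python
def fit_ridge_1d(xs, ys, alpha):
    """One-feature ridge without intercept: minimises sum((y - w*x)^2) + alpha*w^2."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + alpha)

def r2_score(ys, preds):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

def cv_score(xs, ys, alpha, n_folds=3):
    """Mean R² over n_folds held-out folds (interleaved split, a simplification)."""
    scores = []
    for fold in range(n_folds):
        test_idx = [i for i in range(len(xs)) if i % n_folds == fold]
        train_idx = [i for i in range(len(xs)) if i % n_folds != fold]
        w = fit_ridge_1d([xs[i] for i in train_idx],
                         [ys[i] for i in train_idx], alpha)
        preds = [w * xs[i] for i in test_idx]
        scores.append(r2_score([ys[i] for i in test_idx], preds))
    return sum(scores) / len(scores)

# Toy data, made up for this sketch (roughly y = 2x with small noise).
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 17.9]

alphas = [1, 0.1, 0.01, 0.001, 0.0001, 0]
results = {alpha: cv_score(xs, ys, alpha) for alpha in alphas}
best_alpha = max(results, key=results.get)   # candidate with the highest mean score
print(best_alpha, results[best_alpha])
```

Every candidate is scored on data it was not trained on, and the candidate with the highest mean score is reported.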
Random Search Parameter Tuning
Random search is a parameter tuning approach that samples the algorithm parameters from a random distribution for a fixed number of iterations.
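The idea can be sketched with the standard library. Here objective is a hypothetical stand-in for a cross-validated model score, and the seed and iteration count are arbitrary choices for this illustration.

```python
import random

def objective(alpha):
    """Hypothetical stand-in for a cross-validated score; peaks at alpha = 0.3."""
    return 1.0 - (alpha - 0.3) ** 2

rng = random.Random(7)   # fixed seed so the run is repeatable
n_iter = 50              # fixed budget of sampled candidates

best_alpha, best_score = None, float("-inf")
for _ in range(n_iter):
    alpha = rng.uniform(0.0, 1.0)   # draw a candidate from a uniform distribution
    score = objective(alpha)
    if score > best_score:
        best_alpha, best_score = alpha, score

print(best_alpha, best_score)
```

Unlike grid search, the number of evaluations is fixed by n_iter and does not depend on how many values a parameter could take.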
Example
In the following Python recipe, we perform a random search using the RandomizedSearchCV class of sklearn to evaluate different alpha values between 0 and 1 for the Ridge Regression algorithm on the Pima Indians diabetes dataset.
First, import the required packages as follows −
import numpy
from pandas import read_csv
from scipy.stats import uniform
from sklearn.linear_model import Ridge
from sklearn.model_selection import RandomizedSearchCV
Now, we need to load the Pima diabetes dataset as we did in previous examples −
path = r"C:pima-indians-diabetes.csv"
headernames = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
data = read_csv(path, names=headernames)
array = data.values
X = array[:,0:8]   # first eight columns are the input features
Y = array[:,8]     # last column is the target
Next, evaluate the various alpha values for the Ridge regression algorithm as follows −
param_grid = {'alpha': uniform()}   # uniform() samples alpha from the interval [0, 1]
model = Ridge()
random_search = RandomizedSearchCV(
   estimator=model, param_distributions=param_grid, n_iter=50, random_state=7)
random_search.fit(X, Y)
Print the result with the following script lines −
print(random_search.best_score_)
print(random_search.best_estimator_.alpha)
Output
0.27961712703051084
0.9779895119966027
The above output gives us an optimal score very close to the one found by grid search, with the best sampled alpha value at approximately 0.978.