English 中文(简体)
Weka - Feature Selection
  • 时间:2024-12-22

Weka - Feature Selection


Previous Page Next Page  

When a database contains a large number of attributes, there will be several attributes which do not become significant in the analysis that you are currently seeking. Thus, removing the unwanted attributes from the dataset becomes an important task in developing a good machine learning model.

You may examine the entire dataset visually and decide on the irrelevant attributes. This could be a huge task for databases containing a large number of attributes pke the supermarket case that you saw in an earper lesson. Fortunately, WEKA provides an automated tool for feature selection.

This chapter demonstrate this feature on a database containing a large number of attributes.

Loading Data

In the Preprocess tag of the WEKA explorer, select the labor.arff file for loading into the system. When you load the data, you will see the following screen −

Loading Data

Notice that there are 17 attributes. Our task is to create a reduced dataset by epminating some of the attributes which are irrelevant to our analysis.

Features Extraction

Cpck on the Select attributesTAB.You will see the following screen −

Select Attributes

Under the Attribute Evaluator and Search Method, you will find several options. We will just use the defaults here. In the Attribute Selection Mode, use full training set option.

Cpck on the Start button to process the dataset. You will see the following output −

Start Dataset

At the bottom of the result window, you will get the pst of Selected attributes. To get the visual representation, right cpck on the result in the Result pst.

The output is shown in the following screenshot −

Screenshot Output

Cpcking on any of the squares will give you the data plot for your further analysis. A typical data plot is shown below −

Data Plot

This is similar to the ones we have seen in the earper chapters. Play around with the different options available to analyze the results.

What’s Next?

You have seen so far the power of WEKA in quickly developing machine learning models. What we used is a graphical tool called Explorer for developing these models. WEKA also provides a command pne interface that gives you more power than provided in the explorer.

Cpcking the Simple CLI button in the GUI Chooser apppcation starts this command pne interface which is shown in the screenshot below −

Gui Chooser

Type your commands in the input box at the bottom. You will be able to do all that you have done so far in the explorer plus much more. Refer to WEKA documentation (https://www.cs.waikato.ac.nz/ml/weka/documentation.html) for further details.

Lastly, WEKA is developed in Java and provides an interface to its API. So if you are a Java developer and keen to include WEKA ML implementations in your own Java projects, you can do so easily.

Conclusion

WEKA is a powerful tool for developing machine learning models. It provides implementation of several most widely used ML algorithms. Before these algorithms are appped to your dataset, it also allows you to preprocess the data. The types of algorithms that are supported are classified under Classify, Cluster, Associate, and Select attributes. The result at various stages of processing can be visuapzed with a beautiful and powerful visual representation. This makes it easier for a Data Scientist to quickly apply the various machine learning techniques on his dataset, compare the results and create the best model for the final use.

Advertisements