Mahout - Introduction
  • Date: 2024-11-05


We live in an age where information is available in abundance. Information overload has grown to the point that it is sometimes difficult to manage even our little mailboxes! Imagine the volume of data and records that popular websites (the likes of Facebook, Twitter, and YouTube) have to collect and manage on a daily basis. It is not uncommon even for lesser-known websites to receive huge amounts of information in bulk.

Normally, we fall back on data mining algorithms to analyze bulk data, identify trends, and draw conclusions. However, no data mining algorithm can process very large datasets and deliver results quickly unless the computational tasks are run on multiple machines distributed over the cloud.

We now have frameworks that allow us to break a computational task into multiple segments and run each segment on a different machine. Mahout is one such data mining framework; it normally runs on top of the Hadoop infrastructure, which manages huge volumes of data in the background.
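To make the idea of splitting a task into segments concrete, here is a minimal sketch in plain Python. It is not Hadoop or Mahout code: the function names (`split_into_segments`, `count_words`, `merge_counts`, `distributed_word_count`) are hypothetical, and threads stand in for the separate machines that a real cluster would provide.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_segments(records, n_segments):
    """Divide the records into n_segments roughly equal slices."""
    return [records[i::n_segments] for i in range(n_segments)]


def count_words(segment):
    """The 'map' step: each worker counts words in its own segment."""
    counts = Counter()
    for line in segment:
        counts.update(line.lower().split())
    return counts


def merge_counts(partials):
    """The 'reduce' step: combine the per-segment counts into one result."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def distributed_word_count(records, n_workers=3):
    segments = split_into_segments(records, n_workers)
    # Assumption: threads play the role of separate machines in this sketch.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(count_words, segments))
    return merge_counts(partials)


records = [
    "mahout runs on hadoop",
    "hadoop stores big data",
    "mahout mines big data",
]
totals = distributed_word_count(records)
print(totals["mahout"], totals["hadoop"])
```

The merged result is the same as counting all records on one machine; the benefit of the split only shows up when each segment is large enough to justify its own worker.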

What is Apache Mahout?

A mahout is a person who drives an elephant as its master. The name reflects the project's close association with Apache Hadoop, which uses an elephant as its logo.

Hadoop is an open-source framework from Apache that allows you to store and process big data in a distributed environment across clusters of computers using simple programming models.

Apache Mahout is an open-source project that is primarily used for creating scalable machine learning algorithms. It implements popular machine learning techniques such as:

    Recommendation

    Classification

    Clustering
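As an illustration of the first technique, recommendation, the following is a small sketch of an item co-occurrence recommender in plain Python. It does not use Mahout's API; the function names and the sample `histories` data are made up for this example, and a production recommender would use more refined similarity measures.

```python
from collections import defaultdict
from itertools import permutations


def build_cooccurrence(histories):
    """Count how often each pair of items appears in the same user's history."""
    cooc = defaultdict(lambda: defaultdict(int))
    for items in histories.values():
        for a, b in permutations(set(items), 2):
            cooc[a][b] += 1
    return cooc


def recommend(user, histories, top_n=2):
    """Score unseen items by co-occurrence with items the user already has."""
    cooc = build_cooccurrence(histories)
    seen = set(histories[user])
    scores = defaultdict(int)
    for item in seen:
        for other, count in cooc[item].items():
            if other not in seen:
                scores[other] += count
    # Sort by score (descending), then by name, so ties are deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


# Hypothetical data: the places each user has visited.
histories = {
    "alice": ["cafe", "museum", "park"],
    "bob": ["cafe", "museum", "cinema"],
    "carol": ["cafe", "cinema"],
    "dave": ["cafe"],
}
print(recommend("dave", histories))  # ['cinema', 'museum']
```

Here "dave" has only visited the cafe, so the sketch suggests the places that other cafe visitors went to most often.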

Apache Mahout started in 2008 as a sub-project of Apache Lucene. In 2010, Mahout became a top-level Apache project.

Features of Mahout

The primary features of Apache Mahout are listed below.

    The algorithms of Mahout are written on top of Hadoop, so they work well in a distributed environment. Mahout uses the Apache Hadoop library to scale effectively in the cloud.

    Mahout offers the coder a ready-to-use framework for doing data mining tasks on large volumes of data.

    Mahout lets applications analyze large sets of data effectively and quickly.

    Includes several MapReduce-enabled clustering implementations such as k-means, fuzzy k-means, Canopy, Dirichlet, and Mean-Shift.

    Supports Distributed Naive Bayes and Complementary Naive Bayes classification implementations.

    Comes with distributed fitness function capabilities for evolutionary programming.

    Includes matrix and vector libraries.
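To show what one of the clustering algorithms named above does, here is a single-machine sketch of k-means in plain Python. It is only an illustration of the algorithm, not Mahout's implementation: Mahout runs these steps as MapReduce jobs over a cluster, whereas the hypothetical `k_means` function below works on a small in-memory list of points with initial centroids supplied by the caller.

```python
import math


def closest(point, centroids):
    """Index of the centroid nearest to the point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def mean(points):
    """Coordinate-wise average of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def k_means(points, centroids, max_iterations=20):
    centroids = list(centroids)
    clusters = []
    for _ in range(max_iterations):
        # Assignment step: group points by their nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[closest(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        updated = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if updated == centroids:
            break  # Converged: centroids no longer move.
        centroids = updated
    return centroids, clusters


# Hypothetical sample data: two visibly separate groups of 2-D points.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, clusters = k_means(points, [(1.0, 1.0), (8.0, 8.0)])
print(centroids)
print(clusters)
```

In a distributed setting, the assignment step maps naturally onto workers that each handle a slice of the points, while the update step combines their partial sums into new centroids.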

Applications of Mahout

    Companies such as Adobe, Facebook, LinkedIn, Foursquare, Twitter, and Yahoo use Mahout internally.

    Foursquare helps you find places, food, and entertainment available in a particular area. It uses the recommender engine of Mahout.

    Twitter uses Mahout for user interest modelling.

    Yahoo! uses Mahout for pattern mining.
