Overview of Mahout
Mahout's goal is to build scalable machine learning libraries. With scalable we mean:
- Scalable to reasonably large data sets. Our core algorithms for clustering, classification and batch based collaborative filtering are implemented on top of Apache Hadoop using the map/reduce paradigm. However we do not restrict contributions to Hadoop based implementations: Contributions that run on a single node or on a non-Hadoop cluster are welcome as well. The core libraries are highly optimized to allow for good performance also for non-distributed algorithms.
- Scalable to support your business case. Mahout is distributed under a commercially friendly Apache Software license.
- Scalable community. The goal of Mahout is to build a vibrant, responsive, diverse community to facilitate discussions not only on the project itself but also on potential use cases. Come to the mailing lists to find out more.
Currently Mahout supports mainly three use cases: Recommendation mining takes users' behaviour and from that tries to find items users might like. Clustering takes e.g. text documents and groups them into groups of topically related documents. Classification learns from existing categorized documents what documents of a specific category look like and is able to assign unlabelled documents to the (hopefully) correct category.
Purpose of this wiki
For the main user, developer and infrastructure documentation please check our main web page at http://mahout.apache.org - there you will find design docs, links to JavaDoc, information on where to check out the code, write patches, submit issues and discuss on mailing lists.
Information in this wiki is mainly for work in progress - before it gets migrated to the main web page.