Originally posted on Data Science Central
This article introduces Mahout, a library for scalable machine learning, and studies potential applications through two Mahout projects. It was written by Linda Terlouw. Linda is a computer scientist who works on Data Science (Data Analysis, Data Visualization, Process Mining).
Apache Mahout is a library for scalable machine learning. Originally a subproject of Apache Lucene (a high-performance text search engine library), Mahout has progressed to be a top-level Apache project.
While Mahout has only been around for a few years, it has established itself as a frontrunner in the field of machine learning technologies. Mahout has currently been adopted by: Foursquare, which uses Mahout with Apache Hadoop and Apache Hiveto power its recommendation engine; Twitter, which creates user interest models using Mahout; and Yahoo!, which uses Mahout in their anti-spam analytic platform. Other commercial and academic uses of Mahout have been catalogued at https://mahout.apache.org/general/powered-by-mahout.html.
This Refcard will present the basics of Mahout by studying two possible applications:
- Training and testing a Random Forest for handwriting recognition using Amazon Web Services EMR AND
- Running a recommendation engine on a standalone Spark cluster.
In this article there are 10 sections:
- Machine Learning
- Algorithms Supported in Apache Mahout
- Installing Apache Mahout
- Example of Multi-Class Classification Using Amazon Elastic MapReduce
- Getting and Preparing the Data
- Classifying From Command Line Using Amazon Elastic MapReduce
- Interpreting the Test Results
- Using Apache Mahout With Apache Spark for Recommendations
- Running Mahout from Java or Scala
To check out all this information, click here.
- Career: Training | Books | Cheat Sheet | Apprenticeship | Certification | Salary Surveys | Jobs
- Knowledge: Research | Competitions | Webinars | Our Book | Members Only | Search DSC
- Buzz: Business News | Announcements | Events | RSS Feeds
- Misc: Top Links | Code Snippets | External Resources | Best Blogs | Subscribe | For Bloggers
- What statisticians think about data scientists
- Data Science Compared to 16 Analytic Disciplines
- 10 types of data scientists
- 91 job interview questions for data scientists
- 50 Questions to Test True Data Science Knowledge
- 24 Uses of Statistical Modeling
- 21 data science systems used by Amazon to operate its business
- Top 20 Big Data Experts to Follow (Includes Scoring Algorithm)
- 5 Data Science Leaders Share their Predictions for 2016 and Beyond
- 50 Articles about Hadoop and Related Topics
- 10 Modern Statistical Concepts Discovered by Data Scientists
- Top data science keywords on DSC
- 4 easy steps to becoming a data scientist
- 22 tips for better data science
- How to detect spurious correlations, and how to find the real ones
- 17 short tutorials all data scientists should read (and practice)
- High versus low-level data science