Introduction
This quick start page shows how to run the Synthetic Control Data clustering example. The data is described here.
Steps
- Download the data at http://archive.ics.uci.edu/ml/datasets/Synthetic+Control+Chart+Time+Series.
- In $MAHOUT_HOME/, build the Job file
- The same job is used for all examples so this only needs to be done once
- mvn install
- The job will be generated in $MAHOUT_HOME/examples/target/ and its name will contain the Mahout version number. For example, with the Mahout 0.1 release, the job will be mahout-examples-0.1.job
- (Optional) Start up Hadoop: $HADOOP_HOME/bin/start-all.sh
- Put the data: $HADOOP_HOME/bin/hadoop fs -put <PATH TO DATA> testdata
- Run the Job for the clustering algorithm you want to try:
- For kmeans: $HADOOP_HOME/bin/hadoop jar $MAHOUT_HOME/examples/target/mahout-examples-<MAHOUT VERSION>.job org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
- For canopy: $HADOOP_HOME/bin/hadoop jar $MAHOUT_HOME/examples/target/mahout-examples-<MAHOUT VERSION>.job org.apache.mahout.clustering.syntheticcontrol.canopy.Job
- For dirichlet: $HADOOP_HOME/bin/hadoop jar $MAHOUT_HOME/examples/target/mahout-examples-<MAHOUT VERSION>.job org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
- For meanshift: $HADOOP_HOME/bin/hadoop jar $MAHOUT_HOME/examples/target/mahout-examples-<MAHOUT VERSION>.job org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
- Get the data out of HDFS and have a look
- All example jobs read their input from testdata and write their results to the directory output
- Use bin/hadoop fs -lsr output to view all outputs
- Output:
- KMeans results are placed into output/points
- Canopy and MeanShift results are placed into output/clustered-points
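
The commands in the steps above differ only in the algorithm name and the Mahout version, so they can be generated by a small helper. The following is a minimal dry-run sketch, not part of Mahout: the function names job_class, run_command and put_command are hypothetical, and the /path/to/... defaults are placeholders to replace with your own install locations. It only prints the command lines so they can be checked before running them.

```shell
#!/bin/sh
# Dry-run helper: prints the commands from the steps above instead of running them.
# The default paths below are placeholders (assumptions), not values from this page.
HADOOP_HOME="${HADOOP_HOME:-/path/to/hadoop}"
MAHOUT_HOME="${MAHOUT_HOME:-/path/to/mahout}"

# job_class ALGORITHM
# Prints the example driver class for one of the four algorithms listed above.
job_class() {
  case "$1" in
    kmeans|canopy|dirichlet|meanshift)
      echo "org.apache.mahout.clustering.syntheticcontrol.$1.Job" ;;
    *)
      echo "unknown algorithm: $1" >&2
      return 1 ;;
  esac
}

# run_command ALGORITHM MAHOUT_VERSION
# Prints the full 'hadoop jar' command line for the chosen algorithm.
run_command() {
  cls=$(job_class "$1") || return 1
  echo "$HADOOP_HOME/bin/hadoop jar $MAHOUT_HOME/examples/target/mahout-examples-$2.job $cls"
}

# put_command DATA_PATH
# Prints the command that copies the downloaded data into HDFS as testdata.
put_command() {
  echo "$HADOOP_HOME/bin/hadoop fs -put $1 testdata"
}

# Example: show the kmeans command for the Mahout 0.1 release mentioned above.
run_command kmeans 0.1
```

Once the printed line looks right, it can be pasted into the shell or passed to eval. Keeping the helper print-only means a typo in the algorithm name is caught before anything is submitted to Hadoop.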