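Mahout installation
Mahout is apparently a set of machine-learning libraries...
A lot of difficult material is written about it, but I'll skip that.
I brought it in because it can do recommendations.
In fact, the whole reason I installed Hadoop in the first place was so that I could use Mahout.
Hadoop itself looks pretty interesting too, though.
(I'll study Hadoop programming later.)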
What you need:
apache-maven-2.2.1-bin.tar.gz
jdk-6u20-linux-i586-rpm.bin
mahout-0.3-src.tar.gz
I'll skip the JDK installation.
Maven
(A project management tool for Java; it is needed to build Mahout.)
Download the latest version from http://maven.apache.org/download.html:
apache-maven-2.2.1-bin.tar.gz
mkdir -p /home/maven
mv apache-maven-2.2.1-bin.tar.gz /home/maven/
cd /home/maven
tar xvzf apache-maven-2.2.1-bin.tar.gz
vi ~/.bashrc
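export MAVEN_HOME=/home/maven/apache-maven-2.2.1/
export JAVA_HOME=/usr/java/jdk1.6.0_20/
export PATH=$MAVEN_HOME/bin:$JAVA_HOME/bin:$PATH
Add these lines and save the file, then run source ~/.bashrc (or open a new shell) so they take effect.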
次 Mahoutã®ãã¦ã³ãã¼ã
http://ftp.kddilabs.jp/infosystems/apache/lucene/mahout/0.3/mahout-0.3-src.tar.gz
tar xvzf mahout-0.3-src.tar.gz
cd mahout-0.3
mvn install
Warning: JAVA_HOME environment variable is not set.
[INFO] Scanning for projects...
[INFO] Reactor build order:
[INFO] Mahout Common Maven Parent
[INFO] Maven Mojo to generate code for collections
[INFO] Mahout Collections
[INFO] Mahout Math
[INFO] Mahout Core
[INFO] Mahout Taste Webapp
[INFO] Mahout Utilities
[INFO] Mahout Examples
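[INFO] Apache Lucene Mahout
...and then a long stream of build log output follows...
The Warning on the first line bothers me a little. (mvn prints it when JAVA_HOME is not set in its environment, for example when ~/.bashrc assigns JAVA_HOME without export.)
The build also runs the tests, so it takes a while.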
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO] ------------------------------------------------------------------------
[INFO] Mahout Common Maven Parent ............................ SUCCESS [2:10.181s]
[INFO] Maven Mojo to generate code for collections ........... SUCCESS [3:00.867s]
[INFO] Mahout Collections .................................... SUCCESS [35.321s]
[INFO] Mahout Math ........................................... SUCCESS [40.775s]
[INFO] Mahout Core ........................................... SUCCESS [12:22.625s]
[INFO] Mahout Taste Webapp ................................... SUCCESS [45.399s]
[INFO] Mahout Utilities ...................................... SUCCESS [17.021s]
[INFO] Mahout Examples ....................................... SUCCESS [52.285s]
[INFO] Apache Lucene Mahout .................................. SUCCESS [5.169s]
[INFO] ------------------------------------------------------------------------
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESSFUL
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 20 minutes 50 seconds
[INFO] Finished at: Wed Apr 21 16:45:53 JST 2010
[INFO] Final Memory: 68M/162M
[INFO] ------------------------------------------------------------------------
and with that, the build is done.
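About the JAVA_HOME warning at the top of the log: the mvn launcher script can only see variables that are exported into its environment. As a small check, here is a hypothetical class (the name EnvCheck is made up) that you run from the same shell. It reports what a child process such as mvn actually receives through System.getenv.

```java
import java.util.Map;

// Hypothetical helper: shows whether environment variables actually reached
// this process. Variables assigned in ~/.bashrc without "export" stay local
// to the shell, so they are reported as missing here (and mvn cannot see them either).
class EnvCheck {
    // Builds a one-line status message for a single variable.
    static String describe(Map<String, String> env, String name) {
        String value = env.get(name);
        if (value == null || value.isEmpty()) {
            return name + " is NOT set in this process's environment";
        }
        return name + "=" + value;
    }

    public static void main(String[] args) {
        Map<String, String> env = System.getenv();
        for (String name : new String[] {"JAVA_HOME", "MAVEN_HOME"}) {
            System.out.println(describe(env, name));
        }
    }
}
```

Sometimes this reports JAVA_HOME as not set even though ~/.bashrc assigns it. In that case the assignment is probably missing export, or the file has not been re-read yet.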
Now then, let's try running something, following this page:
https://cwiki.apache.org/MAHOUT/syntheticcontroldata.html
Download the data from
http://kdd.ics.uci.edu/databases/synthetic_control/synthetic_control.data
(it's just rows and rows of numbers...)
and upload it to HDFS:
hadoop fs -put synthetic_control.data testdata
(Why the name testdata? Because the quickstart says
"All example jobs use testdata as input and output to directory output".
Honestly, though, I barely understand even that simple English.)
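The downloaded file is plain text. As far as I know, it holds 600 time series (six kinds of control-chart pattern), one per line, and each line has 60 whitespace-separated numbers. Below is a minimal sketch for loading it locally so you can check its shape before running hadoop fs -put. The class name SyntheticControlReader is hypothetical, and this is not how Mahout's example job reads its input.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical reader for synthetic_control.data: one time series per line,
// values separated by runs of whitespace.
class SyntheticControlReader {
    // Parses the whole file content into one double[] per non-blank line.
    static List<double[]> parse(String text) {
        List<double[]> rows = new ArrayList<double[]>();
        for (String line : text.split("\r?\n")) {
            line = line.trim();
            if (line.isEmpty()) continue;           // skip blank lines
            String[] tokens = line.split("\\s+");   // any amount of whitespace
            double[] row = new double[tokens.length];
            for (int i = 0; i < tokens.length; i++) {
                row[i] = Double.parseDouble(tokens[i]);
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java SyntheticControlReader synthetic_control.data");
            return;
        }
        String text = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.US_ASCII);
        List<double[]> rows = parse(text);
        System.out.println(rows.size() + " rows; first row has "
                + (rows.isEmpty() ? 0 : rows.get(0).length) + " values");
    }
}
```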
kmeans
canopy
dirichlet
meanshift
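These appear to be the available clustering methods.
For k-means, running
hadoop jar mahout-examples-0.3.job org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
starts the clustering.
A long stream of log output scrolls by, and then the job finishes.
Since this is k-means, the results should end up under output/points.
Pull the output back to the local filesystem with hadoop fs -get output output.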
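To give a feel for what the k-means job computes, here is a minimal single-machine sketch of standard k-means (Lloyd's algorithm) in plain Java. Mahout runs the clustering as MapReduce iterations on Hadoop. The sketch repeats two steps until no point changes cluster: assign each point to the nearest center by squared Euclidean distance, then move each center to the mean of its points. This is not Mahout's implementation. The class name, the choice of the first k points as initial centers, and the iteration cap are all illustrative assumptions.

```java
import java.util.Arrays;

// Minimal single-machine k-means (Lloyd's algorithm); illustrative only, not Mahout's code.
class KMeansSketch {
    // Returns the cluster index assigned to each point.
    static int[] cluster(double[][] points, int k, int maxIterations) {
        if (k <= 0 || k > points.length) throw new IllegalArgumentException("need 1 <= k <= number of points");
        int dim = points[0].length;
        // Illustrative initialisation: the first k points are the starting centers.
        double[][] centers = new double[k][];
        for (int c = 0; c < k; c++) centers[c] = points[c].clone();
        int[] assignment = new int[points.length];
        Arrays.fill(assignment, -1);
        for (int iter = 0; iter < maxIterations; iter++) {
            boolean changed = false;
            // Assignment step: nearest center by squared Euclidean distance.
            for (int p = 0; p < points.length; p++) {
                int best = 0;
                double bestDist = Double.MAX_VALUE;
                for (int c = 0; c < k; c++) {
                    double d = 0;
                    for (int j = 0; j < dim; j++) {
                        double diff = points[p][j] - centers[c][j];
                        d += diff * diff;
                    }
                    if (d < bestDist) { bestDist = d; best = c; }
                }
                if (assignment[p] != best) { assignment[p] = best; changed = true; }
            }
            if (!changed) break; // converged: no point switched clusters
            // Update step: move each center to the mean of its assigned points.
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (int p = 0; p < points.length; p++) {
                counts[assignment[p]]++;
                for (int j = 0; j < dim; j++) sums[assignment[p]][j] += points[p][j];
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) continue; // an empty cluster keeps its old center
                for (int j = 0; j < dim; j++) centers[c][j] = sums[c][j] / counts[c];
            }
        }
        return assignment;
    }

    public static void main(String[] args) {
        double[][] points = {{1, 1}, {9, 9}, {1.5, 2}, {8, 8.5}, {1, 0.5}, {9.5, 8}};
        System.out.println(Arrays.toString(cluster(points, 2, 100))); // prints [0, 1, 0, 1, 0, 1]
    }
}
```

With real data you would run this on the rows of synthetic_control.data (60 dimensions per point). The Hadoop job distributes the same assign-and-average loop across the cluster.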
次ã¯ã¬ã³ã¡ã³ããåãããããã©ï½
Tasteã£ã¦ããããã¦èªåã§å®è£ ããæããªã®ãï¼ï½
mahout/ã¬ã³ã¡ã³ããµã¼ãã¬ããã®ä½ãæ¹ã¨ãããã¼ã¸ã
ãã®ã¾ã¾ä½æ¥ãã¦ã¿ãã
http://www.pwv.co.jp/~take/TakeWiki/index.php?mahout%2F%E3%83%AC%E3%82%B3%E3%83%A1%E3%83%B3%E3%83%89%E3%82%B5%E3%83%BC%E3%83%96%E3%83%AC%E3%83%83%E3%83%88%E3%81%AE%E4%BD%9C%E3%82%8A%E6%96%B9
I have no idea what this does, but I type the following command:
mvn archetype:create -DgroupId=sample.recommendWeb -DartifactId=recommendWeb -DarchetypeArtifactId=maven-archetype-webapp -Dversion=0.0.1
It seems to generate the skeleton project for you:
tree recommendWeb
recommendWeb/
|-- pom.xml
`-- src
    `-- main
        |-- resources
        `-- webapp
            |-- WEB-INF
            |   `-- web.xml
            `-- index.jsp
5 directories, 3 files
So far so good.
After adding entries to pom.xml, it looked like this:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>sample.recommendWeb</groupId>
  <artifactId>recommendWeb</artifactId>
  <packaging>war</packaging>
  <version>1.0-SNAPSHOT</version>
  <name>recommendWeb Maven Webapp</name>
  <url>http://maven.apache.org</url>
  <dependencies>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>3.8.1</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.mahout</groupId>
      <artifactId>mahout-core</artifactId>
      <version>0.3</version>
    </dependency>
    <dependency>
      <groupId>org.slf4j</groupId>
      <artifactId>slf4j-jcl</artifactId>
      <version>1.5.8</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>javax.servlet</groupId>
      <artifactId>servlet-api</artifactId>
      <version>2.5</version>
      <scope>provided</scope>
    </dependency>
  </dependencies>
  <build>
    <finalName>recommendWeb</finalName>
    <plugins>
      <plugin>
        <artifactId>maven-compiler-plugin</artifactId>
        <configuration>
          <source>1.6</source>
          <target>1.6</target>
          <encoding>UTF-8</encoding>
        </configuration>
      </plugin>
      <plugin>
        <groupId>org.mortbay.jetty</groupId>
        <artifactId>maven-jetty-plugin</artifactId>
        <version>6.1.7</version>
      </plugin>
    </plugins>
  </build>
</project>
In the recommendWeb directory:
mkdir -p src/main/java
I'm not using Eclipse this time, but the page says to run this, so I do it anyway:
mvn eclipse:eclipse -DdownloadSources=true
According to the reference wiki, the next step is adding the Java file.
But where does it go?
I don't really understand mvn either, so I had no idea.
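Apparently mvn compile compiles the .java files under src/main/java by default,
so that's where the file goes (src/main/java/RecommenderServlet.java, since the servlet class has no package).
Running mvn compile: errors everywhere, lol.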
import java.io.File;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.List;
import javax.servlet.ServletConfig;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.EuclideanDistanceSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;
After adding these import statements at the top of the file,
the build finally passed!
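The post lists only the imports. The servlet body itself is on the referenced wiki page. Judging by the class names, the pieces fit together like this:

- FileDataModel holds the userID,itemID,prefValue data.
- EuclideanDistanceSimilarity scores how alike two users are.
- NearestNUserNeighborhood picks the N most similar users.
- GenericUserBasedRecommender estimates scores for items the user has not rated, based on those neighbours.

Below is a from-scratch, stdlib-only sketch of the same user-based idea, so the moving parts are visible. It is not Mahout's code, and it assumes Java 8 or later. The similarity 1/(1 + distance) over co-rated items and the similarity-weighted average are common textbook choices that may differ in detail from what Mahout does. The class name, the method names and the toy data are hypothetical.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// From-scratch sketch of user-based collaborative filtering (NOT Mahout's code).
class UserBasedSketch {
    // Records one preference: user -> (item -> value).
    static void addPref(Map<Long, Map<Long, Double>> prefs, long user, long item, double value) {
        Map<Long, Double> row = prefs.get(user);
        if (row == null) {
            row = new HashMap<Long, Double>();
            prefs.put(user, row);
        }
        row.put(item, value);
    }

    // Similarity from the Euclidean distance over co-rated items: 1 / (1 + d).
    // Returns 0 when the two users have no items in common.
    static double similarity(Map<Long, Double> a, Map<Long, Double> b) {
        double sumSq = 0;
        int common = 0;
        for (Map.Entry<Long, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other != null) {
                double diff = e.getValue() - other;
                sumSq += diff * diff;
                common++;
            }
        }
        return common == 0 ? 0.0 : 1.0 / (1.0 + Math.sqrt(sumSq));
    }

    // Recommends up to howMany items that userID has not rated, using the n most similar users.
    static List<Long> recommend(Map<Long, Map<Long, Double>> prefs, long userID, int n, int howMany) {
        Map<Long, Double> mine = prefs.get(userID);
        if (mine == null) return Collections.emptyList();
        // Neighbourhood: other users with similarity > 0, most similar first, top n kept.
        List<Map.Entry<Long, Double>> sims = new ArrayList<Map.Entry<Long, Double>>();
        for (Map.Entry<Long, Map<Long, Double>> u : prefs.entrySet()) {
            if (u.getKey() == userID) continue;
            double s = similarity(mine, u.getValue());
            if (s > 0) sims.add(new AbstractMap.SimpleEntry<Long, Double>(u.getKey(), s));
        }
        sims.sort((x, y) -> Double.compare(y.getValue(), x.getValue()));
        List<Map.Entry<Long, Double>> neighbours = sims.subList(0, Math.min(n, sims.size()));
        // Estimate each unrated item as the similarity-weighted average of the neighbours' ratings.
        Map<Long, Double> weighted = new HashMap<Long, Double>();
        Map<Long, Double> totalSim = new HashMap<Long, Double>();
        for (Map.Entry<Long, Double> nb : neighbours) {
            for (Map.Entry<Long, Double> item : prefs.get(nb.getKey()).entrySet()) {
                if (mine.containsKey(item.getKey())) continue; // already rated by userID
                weighted.merge(item.getKey(), nb.getValue() * item.getValue(), Double::sum);
                totalSim.merge(item.getKey(), nb.getValue(), Double::sum);
            }
        }
        Map<Long, Double> estimate = new HashMap<Long, Double>();
        for (Long item : weighted.keySet()) estimate.put(item, weighted.get(item) / totalSim.get(item));
        // Highest estimate first; ties broken by item ID so the order is stable.
        List<Long> items = new ArrayList<Long>(estimate.keySet());
        items.sort((x, y) -> {
            int c = Double.compare(estimate.get(y), estimate.get(x));
            return c != 0 ? c : Long.compare(x, y);
        });
        return new ArrayList<Long>(items.subList(0, Math.min(howMany, items.size())));
    }

    public static void main(String[] args) {
        // Toy data, made up purely for illustration.
        Map<Long, Map<Long, Double>> prefs = new HashMap<Long, Map<Long, Double>>();
        addPref(prefs, 1, 101, 5.0); addPref(prefs, 1, 102, 3.0);
        addPref(prefs, 2, 101, 5.0); addPref(prefs, 2, 102, 3.0); addPref(prefs, 2, 103, 4.0);
        addPref(prefs, 3, 101, 1.0); addPref(prefs, 3, 102, 5.0); addPref(prefs, 3, 104, 5.0);
        System.out.println(recommend(prefs, 1, 1, 10)); // prints [103]
    }
}
```

With a neighbourhood of one, user 1 only borrows from user 2, whose ratings match exactly, so item 103 is the only candidate. Widening the neighbourhood brings in user 3's item 104 as well.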
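mvn jetty:run ... fails with an error.
There's no point agonizing over Jetty errors, though, so
I'm switching to Tomcat.
(I'm no expert, but maybe it's quicker to get it working with something I know a little about?)
In $TOMCAT_HOME/webapps/, run
mkdir -p recommendWeb/WEB-INF/classes
mkdir -p recommendWeb/WEB-INF/lib
and then, in $TOMCAT_HOME/webapps/recommendWeb/, create the deployment descriptor (Tomcat reads it only from WEB-INF/web.xml):
vi WEB-INF/web.xml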
<web-app xmlns="http://java.sun.com/xml/ns/javaee"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://java.sun.com/xml/ns/javaee http://java.sun.com/xml/ns/javaee/web-app_2_5.xsd"
    version="2.5">
  <display-name>Recommend Web</display-name>
  <servlet>
    <servlet-name>web-recommender</servlet-name>
    <servlet-class>RecommenderServlet</servlet-class>
  </servlet>
  <servlet-mapping>
    <servlet-name>web-recommender</servlet-name>
    <url-pattern>/RecommenderServlet</url-pattern>
  </servlet-mapping>
</web-app>
and save it.
Into recommendWeb/WEB-INF/lib go the jars the servlet needs:
the jars under mahout-0.3/examples/target/dependency/.
I couldn't tell which ones were required, so I threw all of them in.
Into recommendWeb/WEB-INF/classes goes the RecommenderServlet.class
produced by mvn compile (Maven writes it to target/classes/).
critics.csv goes directly under the WEB-INF directory. To make it,
download the "100k Ratings Data Set (.tar.gz)" link from http://www.grouplens.org/node/73 and extract it.
Extracting it produces a bunch of files; I created critics.csv from ua.base with
awk '{printf("%s,%s,%s\n",$1,$2,$3);}' ua.base > critics.csv
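As far as I know, ua.base is tab-separated with four fields per line: user id, item id, rating, timestamp. The awk command keeps the first three fields and joins them with commas. Here is a minimal Java equivalent for environments without awk. The class name ToCsv is hypothetical. Unlike awk, it skips blank or short lines instead of writing empty fields.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;

// Hypothetical Java equivalent of:
//   awk '{printf("%s,%s,%s\n",$1,$2,$3);}' ua.base > critics.csv
class ToCsv {
    // "user<TAB>item<TAB>rating<TAB>timestamp" -> "user,item,rating".
    // Returns null for blank or short lines (awk would print ",," instead).
    static String toCsvLine(String line) {
        String[] f = line.trim().split("\\s+");
        if (f.length < 3) return null;
        return f[0] + "," + f[1] + "," + f[2];
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ToCsv ua.base critics.csv
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                     new FileInputStream(args[0]), StandardCharsets.US_ASCII));
             PrintWriter out = new PrintWriter(new OutputStreamWriter(
                     new FileOutputStream(args[1]), StandardCharsets.US_ASCII))) {
            String line;
            while ((line = in.readLine()) != null) {
                String csv = toCsvLine(line);
                if (csv != null) out.print(csv + "\n"); // always "\n", like the awk version
            }
        }
    }
}
```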
Incidentally, http://lucene.apache.org/mahout/taste.html#examples
describes the data layout as follows:
First, create a DataModel of some kind. Here, we'll use a simple on based on data in a file. The file should be in CSV format, with lines of the form userID,itemID,prefValue (e.g. "39505,290002,3.5"):
So the fields are presumably user ID, item ID and a preference value (a rating?), in that order.
Probably, anyway. I can't really read the English.
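With that userID,itemID,prefValue layout, the data reduces to a map from each user to that user's item ratings. Conceptually, that is what a DataModel such as Mahout's FileDataModel provides. Below is a hedged, stdlib-only sketch of loading such lines, assuming Java 8 or later. The class name PrefsLoader is hypothetical, and FileDataModel itself is more elaborate.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical loader: "userID,itemID,prefValue" lines -> user -> (item -> value).
class PrefsLoader {
    static Map<Long, Map<Long, Double>> load(List<String> lines) {
        Map<Long, Map<Long, Double>> prefs = new HashMap<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) continue; // tolerate blank lines
            String[] f = line.split(",");
            if (f.length < 3) throw new IllegalArgumentException("expected userID,itemID,prefValue: " + raw);
            long user = Long.parseLong(f[0].trim());
            long item = Long.parseLong(f[1].trim());
            double value = Double.parseDouble(f[2].trim());
            prefs.computeIfAbsent(user, k -> new HashMap<>()).put(item, value);
        }
        return prefs;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java PrefsLoader critics.csv
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.US_ASCII);
        Map<Long, Map<Long, Double>> prefs = load(lines);
        int count = 0;
        for (Map<Long, Double> row : prefs.values()) count += row.size();
        System.out.println(prefs.size() + " users, " + count + " preferences");
    }
}
```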
Start Tomcat and open
http://hadoop:8080/recommendWeb/RecommenderServlet?userID=7
5.0 345
5.0 313
5.0 328
and the recommendations show up, so it works.
Looking up these IDs in u.item:
345=Deconstructing Harry
313=Titanic
328=Conspiracy Theory
So I guess this user likes that sort of movie.
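As far as I know, u.item is pipe-separated, with the movie ID in the first field and the title in the second. A small sketch can build an ID-to-title map, so the IDs in the servlet output can be turned into titles. The class name ItemTitles is hypothetical. I am also assuming the file is not UTF-8, so reading it as ISO-8859-1 avoids decoding errors.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: itemID -> title from lines like "<id>|<title>|<release date>|...".
class ItemTitles {
    static Map<Long, String> parse(List<String> lines) {
        Map<Long, String> titles = new HashMap<Long, String>();
        for (String line : lines) {
            String[] f = line.split("\\|", -1); // '|' must be escaped in a regex
            if (f.length < 2 || f[0].trim().isEmpty()) continue; // skip blank/odd lines
            titles.put(Long.parseLong(f[0].trim()), f[1]);
        }
        return titles;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ItemTitles u.item 345 313 328
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.ISO_8859_1);
        Map<Long, String> titles = parse(lines);
        for (int i = 1; i < args.length; i++) {
            long id = Long.parseLong(args[i]);
            System.out.println(id + "=" + titles.get(id));
        }
    }
}
```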
http://hadoop:8080/recommendWeb/RecommenderServlet?userID=7&howMany=10
Adding the howMany parameter returns more items:
5.0 328
5.0 313
5.0 302
5.0 345
4.0 295
4.0 751
4.0 315
4.0 321
4.0 340
4.0 319
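From the two requests above, the servlet takes a userID parameter and an optional howMany parameter. It prints one line per item: the estimated preference, a space, then the item ID. Below is a hypothetical sketch of that request/response handling, kept separate from the servlet API so it runs on its own. In the real servlet, HttpServletRequest.getParameter would do the query parsing, and Mahout's RecommendedItem supplies getValue() and getItemID(). The default of 3 is only inferred from the first request, which returned three items; I have not confirmed it from the servlet source.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, servlet-free sketch of the request/response handling.
class RequestSketch {
    // Assumption inferred from the output above: 3 items when howMany is omitted.
    static final int DEFAULT_HOW_MANY = 3;

    // Splits a query string such as "userID=7&howMany=10" into name/value pairs.
    static Map<String, String> parseQuery(String query) {
        Map<String, String> params = new HashMap<String, String>();
        if (query == null || query.isEmpty()) return params;
        for (String pair : query.split("&")) {
            if (pair.isEmpty()) continue;
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(decode(name), decode(value));
        }
        return params;
    }

    static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    // Required parameter: the user to recommend for.
    static long userID(Map<String, String> params) {
        String v = params.get("userID");
        if (v == null || v.isEmpty()) throw new IllegalArgumentException("userID is required");
        return Long.parseLong(v);
    }

    // Optional parameter: how many items to return.
    static int howMany(Map<String, String> params) {
        String v = params.get("howMany");
        return (v == null || v.isEmpty()) ? DEFAULT_HOW_MANY : Integer.parseInt(v);
    }

    // One output line: estimated preference, a space, then the item ID.
    static String formatLine(float value, long itemID) {
        return value + " " + itemID;
    }

    public static void main(String[] args) {
        Map<String, String> p = parseQuery("userID=7&howMany=10");
        System.out.println(userID(p) + " " + howMany(p)); // prints 7 10
        System.out.println(formatLine(5.0f, 345L));       // prints 5.0 345
    }
}
```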
It's only a sample, so it's rough, but there are two problems:
- It loads the data at startup.
- The data comes with rating values.
  (In most real cases, I suspect explicit ratings are rarely available...)
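On the second point: sometimes all you know is which items a user touched (clicks, purchases), with no ratings. The usual fallback then is a similarity that looks only at how much two users' item sets overlap. Mahout's similarity package includes a TanimotoCoefficientSimilarity for this kind of data. Below is a from-scratch sketch of the Tanimoto (Jaccard) coefficient, |A intersect B| / |A union B|. It is not Mahout's code, and the class name is hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// From-scratch Tanimoto / Jaccard coefficient for rating-free ("boolean") data.
class TanimotoSketch {
    // |A intersect B| / |A union B|; returns 0 when both sets are empty.
    static double tanimoto(Set<Long> a, Set<Long> b) {
        if (a.isEmpty() && b.isEmpty()) return 0.0;
        Set<Long> smaller = a.size() <= b.size() ? a : b;
        Set<Long> larger = (smaller == a) ? b : a;
        int intersection = 0;
        for (Long item : smaller) {
            if (larger.contains(item)) intersection++;
        }
        int union = a.size() + b.size() - intersection;
        return (double) intersection / union;
    }

    public static void main(String[] args) {
        // Toy item sets, made up for illustration.
        Set<Long> alice = new HashSet<Long>(Arrays.asList(1L, 2L, 3L));
        Set<Long> bob = new HashSet<Long>(Arrays.asList(2L, 3L, 4L, 5L));
        System.out.println(tanimoto(alice, bob)); // 2 shared / 5 distinct: prints 0.4
    }
}
```

Because only set membership matters, the data can be plain userID,itemID pairs, with no rating column.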
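By the way, I wish goo blog's 10,000-character limit were a bit higher.
Editing this down was quite a struggle,
but unless I write down every step, I won't be able to work out what I did when I get stuck later.
Unrelated, but I came across a recommendation system called easyrec (http://easyrec.org/).
It took quite a long time after my boss asked me to look into this,
but I'm glad I got Mahout running.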
Types of recommendation (the similarity implementations):
http://lucene.apache.org/mahout/javadoc/core/org/apache/mahout/cf/taste/impl/similarity/package-summary.html
I'll study this!