Trying out machine learning with Apache Mahout
Mahout series table of contents (updated as needed)
Non-distributed recommendation
Distributed recommendation
Now, on to the main text.
There is a machine learning library called Apache Mahout. For details, see something like "Apache Mahout の紹介" (an introduction to Apache Mahout).
In short, it is a class library that does things like clustering*1 and recommendation*2.
For example, Amazon's recommendations work out which products each user is likely to be interested in, based on the purchase histories of a large number of users. Facebook's "Might you know this person?" suggestion is also produced by machine learning.
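On top of that, news articles pour into Google News one after another, yet closely related stories, such as a news item and its follow-ups, are displayed together as a single thread. This news classification is not done by hand either: stories are grouped automatically by analyzing things like the frequency of the words that appear in the article text. The same technique can also decide by computation whether a given email is spam or not, which is what Gmail and similar services use it for.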
Recommendation engines and the like have been implemented in many ways in many languages, and plenty of them are published as OSS. What makes Mahout my top pick among them is its emphasis on scalability.
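Machine learning naturally produces its results by computation, and the more underlying data there is, the more plausible the results become. However, the more data there is, the faster the amount of computation grows, typically much faster than linearly. Demand for this kind of large-scale data processing has been rising in recent years. For more on this, see "Hadoopとかに入門してみる 〜 分散技術が出てきた背景 - 都元ダイスケ IT-PRESS" (an introduction to Hadoop and the background behind distributed technology).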
So Mahout is the project that sets out to implement a machine learning library that runs on top of Hadoop.
By now you probably want to see some code, right? Let's start with a recommendation. Hadoop will not make an appearance just yet; we'll keep things simple.
Prepare the data
First, we prepare the "underlying data": data about who likes what, and how much.
1,101,5.0
1,102,3.0
1,103,2.5
2,101,2.0
2,102,2.5
2,103,5.0
2,104,2.0
3,101,2.5
3,104,4.0
3,105,4.5
3,107,5.0
4,101,5.0
4,103,3.0
4,104,4.5
4,106,4.0
5,101,4.0
5,102,3.0
5,103,2.0
5,104,4.0
5,105,3.5
5,106,4.0
It looks like this. It is in CSV format, with the columns in the order user,item,preference. user and item are IDs of type long, and preference is a float. Here I made up values in the range 1.0 to 5.0 for 5 users and 7 items. The data says that user 1 gave item 101 a score of 5.0, ... (omitted) ..., and user 5 gave item 106 a score of 4.0.
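To make the format concrete, here is a minimal sketch of reading such lines into a nested map in plain Java. The class PreferenceCsvSketch and its parse method are hypothetical names made up for this illustration; they are not part of Mahout, and this is not how Mahout itself loads the file.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Mahout.
class PreferenceCsvSketch {

    // Parses "user,item,preference" lines into userID -> (itemID -> preference).
    static Map<Long, Map<Long, Float>> parse(List<String> lines) {
        Map<Long, Map<Long, Float>> prefs = new LinkedHashMap<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            String[] cols = line.trim().split(",");
            long user = Long.parseLong(cols[0].trim());     // user ID is a long
            long item = Long.parseLong(cols[1].trim());     // item ID is a long
            float pref = Float.parseFloat(cols[2].trim());  // preference is a float
            prefs.computeIfAbsent(user, u -> new LinkedHashMap<>()).put(item, pref);
        }
        return prefs;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList("1,101,5.0", "1,102,3.0", "1,103,2.5", "5,106,4.0");
        // prints {1={101=5.0, 102=3.0, 103=2.5}, 5={106=4.0}}
        System.out.println(parse(sample));
    }
}
```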
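Given this situation, recommendation answers the question: if we were to "recommend" just one "new item"*3 to user 1, which would it be? The recommendation engine only knows about items 101 to 107, so this is a choice among the four items 104 to 107, but as the amount of data grows, the results become broader and harder to predict, which is where the fun is.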
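The "choice among four" can be checked with a few lines of plain Java: take every item that appears in the data and remove the ones user 1 has already rated. CandidateItemsSketch is a hypothetical name for this illustration, and the array simply restates the user and item columns of the sample data.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration, not Mahout code: which items are still "new" to a user?
class CandidateItemsSketch {

    // {userID, itemID} pairs from the sample data (preference values omitted).
    static final long[][] RATED = {
        {1, 101}, {1, 102}, {1, 103},
        {2, 101}, {2, 102}, {2, 103}, {2, 104},
        {3, 101}, {3, 104}, {3, 105}, {3, 107},
        {4, 101}, {4, 103}, {4, 104}, {4, 106},
        {5, 101}, {5, 102}, {5, 103}, {5, 104}, {5, 105}, {5, 106}
    };

    // All known items minus the items the given user has already rated.
    static Set<Long> candidatesFor(long userId) {
        Set<Long> candidates = new TreeSet<>();
        for (long[] pair : RATED) {
            candidates.add(pair[1]);
        }
        for (long[] pair : RATED) {
            if (pair[0] == userId) {
                candidates.remove(pair[1]);
            }
        }
        return candidates;
    }

    public static void main(String[] args) {
        System.out.println(candidatesFor(1)); // prints [104, 105, 106, 107]
    }
}
```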
Prepare a Java project
To use Mahout, add this to your Maven dependencies.
<dependency>
  <groupId>org.apache.mahout</groupId>
  <artifactId>mahout-core</artifactId>
  <version>0.4</version>
</dependency>
Save the CSV we prepared earlier somewhere like src/main/resources/intro.csv.
Write the code
// Imports omitted: java.io.File, java.util.List, and the org.apache.mahout.cf.taste classes used below.
DataModel model = new FileDataModel(new File("src/main/resources/intro.csv"));
UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
// The meaning of the first argument will be explained tomorrow.
UserNeighborhood neighborhood = new NearestNUserNeighborhood(2, similarity, model);
Recommender recommender = new GenericUserBasedRecommender(model, neighborhood, similarity);
// (1, 1) means: we want one recommendation for user 1.
List<RecommendedItem> recommendations = recommender.recommend(1, 1);
for (RecommendedItem recommendation : recommendations) {
    System.out.println(recommendation);
}
That's all! Really, that's all there is to it. And here is the output.
RecommendedItem[item:104, value:4.257081]
In other words, the recommendation for user 1 is item 104, and if this user were asked to rate that item, they would probably give it about 4.25 points.
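Where could a number like 4.257081 come from? The following is a simplified sketch of user-based collaborative filtering in plain Java, not Mahout's actual implementation. It assumes that similarity is the Pearson correlation over the items two users have both rated, that the neighborhood is the 2 most similar users, and that the estimate is a similarity-weighted average of the neighbors' scores. Those assumptions are consistent with the output above, but the class UserBasedSketch and its methods are hypothetical names written for this illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified sketch of user-based collaborative filtering; NOT Mahout's implementation.
class UserBasedSketch {

    // {userID, itemID, preference} rows from the sample data.
    static final double[][] DATA = {
        {1, 101, 5.0}, {1, 102, 3.0}, {1, 103, 2.5},
        {2, 101, 2.0}, {2, 102, 2.5}, {2, 103, 5.0}, {2, 104, 2.0},
        {3, 101, 2.5}, {3, 104, 4.0}, {3, 105, 4.5}, {3, 107, 5.0},
        {4, 101, 5.0}, {4, 103, 3.0}, {4, 104, 4.5}, {4, 106, 4.0},
        {5, 101, 4.0}, {5, 102, 3.0}, {5, 103, 2.0}, {5, 104, 4.0}, {5, 105, 3.5}, {5, 106, 4.0}
    };

    // Builds userID -> (itemID -> preference) from DATA.
    static Map<Long, Map<Long, Double>> prefs() {
        Map<Long, Map<Long, Double>> m = new HashMap<>();
        for (double[] r : DATA) {
            m.computeIfAbsent((long) r[0], k -> new HashMap<>()).put((long) r[1], r[2]);
        }
        return m;
    }

    // Pearson correlation over the items both users rated; NaN when it is undefined.
    static double pearson(Map<Long, Double> a, Map<Long, Double> b) {
        List<double[]> pairs = new ArrayList<>();
        for (Map.Entry<Long, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other != null) {
                pairs.add(new double[] {e.getValue(), other});
            }
        }
        int n = pairs.size();
        double meanA = 0, meanB = 0;
        for (double[] p : pairs) {
            meanA += p[0] / n;
            meanB += p[1] / n;
        }
        double sumAB = 0, sumA2 = 0, sumB2 = 0;
        for (double[] p : pairs) {
            double da = p[0] - meanA, db = p[1] - meanB;
            sumAB += da * db;
            sumA2 += da * da;
            sumB2 += db * db;
        }
        return sumAB / Math.sqrt(sumA2 * sumB2); // 0/0 yields NaN (e.g. only one shared item)
    }

    // Similarity-weighted average of the scores given to `item` by the n users most similar to `user`.
    static double estimate(long user, long item, int n) {
        Map<Long, Map<Long, Double>> prefs = prefs();
        List<double[]> sims = new ArrayList<>(); // entries are {otherUserID, similarity}
        for (long other : prefs.keySet()) {
            if (other == user) {
                continue;
            }
            double s = pearson(prefs.get(user), prefs.get(other));
            if (!Double.isNaN(s)) {
                sims.add(new double[] {other, s});
            }
        }
        sims.sort((x, y) -> Double.compare(y[1], x[1])); // most similar first
        double weighted = 0, total = 0;
        for (double[] s : sims.subList(0, Math.min(n, sims.size()))) {
            Double p = prefs.get((long) s[0]).get(item);
            if (p != null) { // only neighbors who rated the item contribute
                weighted += s[1] * p;
                total += s[1];
            }
        }
        return weighted / total; // NaN when no neighbor rated the item
    }

    public static void main(String[] args) {
        for (long item = 104; item <= 107; item++) {
            System.out.printf("item %d -> %.4f%n", item, estimate(1, item, 2));
        }
    }
}
```

Under these assumptions, user 1's two nearest neighbors are user 4 (similarity 1.0) and user 5 (similarity about 0.9449), so the estimate for item 104 is (1.0 × 4.5 + 0.9449 × 4.0) / (1.0 + 0.9449) ≈ 4.2571, which agrees with the value Mahout printed.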
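*1: Divides a set of data into groups called clusters, so that similar data ends up in the same cluster.
*2: Presents information that a particular user is likely to be interested in as a "recommendation", based on the preferences of many users.
*3: Items 101 to 103 are already known to the user, so there is no point in recommending them.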