Mining the UCI Machine Learning Repository (Machine Learning Advent Calendar, Day 12)
Machine Learning Advent Calendar 2013 - Qiita
This is a post for the Machine Learning Advent Calendar.
Originally I had planned to cover the papers by Professor Keogh that use interesting data, which I touched on earlier in 少しでも研究に興味がある人,面白いテーマを探している人は「研究に必要なたったN個の事」とかいう記事を読まずに今すぐに"How to do good research, get it published in SIGKDD and get it cited!"を読むべき - 糞ネット弁慶. I have changed plans and will instead write a diary-style post with no equations at all. I will write up the Keogh papers some other time.
This time I will take a quick look at what kinds of data are stored in the UCI Machine Learning Repository, a site that anyone doing machine learning has probably visited at least once.
What is the UCI Machine Learning Repository?
http://archive.ics.uci.edu/ml/index.html
As the name suggests, it is a site run by the University of California, Irvine that distributes data related to machine learning and data mining.
A search on Google Scholar suggests that there are currently 5121 papers written using data from the UCI Machine Learning Repository (more precisely, papers that cite it). In practice, I would guess that several times that many users have used data distributed on this site to run some kind of analysis or R sample code.
What are the representative datasets?
Iris
UCI Machine Learning Repository: Iris Data Set
Iris, familiar from R's sample data, is a dataset about iris flowers.
The details are covered in this entry written last year (irisの正体 (R Advent Calendar 2012 6日目) - どんな鳥も).
Wine Quality
UCI Machine Learning Repository: Wine Quality Data Set
This is a dataset on wine quality. It comes up in Hiroo Yamagata's 数学が戦略を決める and, more recently, in the TV drama 「ハードナッツ! 〜数学girlの恋する事件簿〜」 starring Ai Hashimoto.
The details are covered in this entry (ワインの味(美味しさのグレード)は予測できるか?(1) - verum ipsum factum).
Are there any hidden gems?
The UCI Machine Learning Repository currently holds as many as 264 datasets. There is no time to go through them one by one, so I will look for datasets that satisfy these two properties:
- a large amount of data
- not much attention so far
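Fortunately, each distributed dataset lists its number of dimensions, its number of instances, and a Number of Web Hits field, so I will draw a scatter plot with:
- number of dimensions * number of instances on the horizontal axis, as the data volume
- Number of Web Hits on the vertical axis, as the level of attention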
The data volume appears to be skewed, so here is the same plot again with logarithms taken.
Some promising datasets are now visible, so I will look at the ones around the bottom right.
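The selection step above can be sketched roughly as follows. This is only an illustration under stated assumptions: the dimension and instance counts are the ones listed for these datasets, while every web-hit count is a made-up placeholder and the function names are hypothetical. "Bottom right" is approximated here by sorting on log volume minus log hits.

```python
import math

# (name, dimensions, instances, web_hits)
# Sizes are for the datasets discussed in this post (Iris is 4 x 150).
# ASSUMPTION: all web-hit counts below are invented placeholders.
DATASETS = [
    ("Iris", 4, 150, 1_000_000),
    ("URL Reputation", 3_231_961, 2_396_130, 20_000),
    ("YouTube Multiview Video Games", 1_000_000, 120_000, 5_000),
    ("Amazon Access Samples", 20_000, 30_000, 10_000),
]

def log_points(datasets):
    """Return (name, log10(volume), log10(hits)) for each dataset."""
    points = []
    for name, dims, instances, hits in datasets:
        volume = dims * instances  # data volume = dimensions * instances
        points.append((name, math.log10(volume), math.log10(hits)))
    return points

def hidden_gems(datasets):
    """Sort so that large-volume, low-attention datasets come first."""
    return sorted(log_points(datasets), key=lambda p: p[1] - p[2], reverse=True)

for name, log_volume, log_hits in hidden_gems(DATASETS):
    print(f"{name}: log10(volume)={log_volume:.2f}, log10(hits)={log_hits:.2f}")
```

With real hit counts scraped from the repository pages, the same two log values could be passed straight to a plotting library to reproduce the scatter plot.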
Hidden gems?
URL Reputation
UCI Machine Learning Repository: URL Reputation Data Set
This dataset consists of 2396130 instances with 3231961 dimensions.
It appears to consist of URLs and their associated features (Hostname, TLD, WHOIS info, IP prefix, and so on), intended for judging whether a URL points to a malicious site, such as a site selling counterfeit goods, a phishing site, or a malware distribution site.
The original paper written with this approach (Identifying Suspicious URLs: An Application of Large-Scale Online Learning (ICML 2009)) tackles this classification task using online algorithms such as CW.
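To give a feel for what "online" means here, the following is a minimal sketch of a mistake-driven online learner over sparse binary features. It uses a plain perceptron, not CW, and it is not the paper's implementation; the class name, the feature strings, and the four-example stream are all hypothetical.

```python
class OnlinePerceptron:
    """Online linear classifier over sparse binary features (plain perceptron, not CW)."""

    def __init__(self):
        self.weights = {}  # feature string -> weight; absent features weigh 0

    def score(self, features):
        return sum(self.weights.get(f, 0.0) for f in features)

    def predict(self, features):
        # +1 = malicious, -1 = benign; a score of exactly 0 counts as benign
        return 1 if self.score(features) > 0 else -1

    def update(self, features, label):
        """Consume one labelled example; return True if it was misclassified."""
        if self.predict(features) == label:
            return False
        for f in features:
            self.weights[f] = self.weights.get(f, 0.0) + label
        return True

# Hypothetical stream of (active features, label) pairs, seen one at a time.
stream = [
    ({"tld=com", "host=shop.example"}, -1),
    ({"tld=biz", "host=free-pills.example"}, +1),
    ({"tld=com", "host=news.example"}, -1),
    ({"tld=biz", "host=cheap-watches.example"}, +1),
]

model = OnlinePerceptron()
mistakes = sum(model.update(features, label) for features, label in stream)
print("mistakes:", mistakes)
```

Each example is processed once and then discarded, and only features that have actually appeared take up memory, which is what makes this style of learner practical when there are millions of dimensions.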
Judging from the original paper, the number of dimensions seems to be so large because the values in each dimension are treated as unique words, in a bag-of-words fashion.
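The bag-of-words style encoding can be illustrated with a small sketch. The tokenization rule, the feature-name prefixes, and the example URLs are assumptions made for illustration only and are not the feature extraction used for the actual dataset.

```python
import re
from urllib.parse import urlsplit

def url_features(url):
    """Turn a URL into a set of binary 'word' features (hypothetical scheme)."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    labels = host.split(".") if host else []
    features = set()
    if labels:
        features.add("tld=" + labels[-1])
    for token in labels:
        features.add("host_token=" + token)
    for token in re.split(r"[^A-Za-z0-9]+", parts.path):
        if token:
            features.add("path_token=" + token)
    return features

def build_index(urls):
    """Assign one column index to every distinct feature string."""
    index = {}
    for url in urls:
        for feature in sorted(url_features(url)):
            index.setdefault(feature, len(index))
    return index

urls = [
    "http://shop.example.com/login",
    "http://news.example.org/sports/today",
]
index = build_index(urls)
print(len(index), "dimensions from", len(urls), "URLs")
```

Every previously unseen token adds a new column, so the dimensionality keeps growing with the number of URLs collected, which would be consistent with a feature count in the millions.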
YouTube Multiview Video Games Dataset
UCI Machine Learning Repository: YouTube Multiview Video Games Dataset Data Set
This dataset consists of 1000000 dimensions and 120000 instances.
It was released in October of this year, and the publisher is, of course, Google.
The page says to read the README for details, but I do not have time to download 2.8 GB of data, so I will read the paper that uses this data (On Using Nearly-Independent Feature Families for High Precision and Confidence) instead.
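I have only read it at the level of the introduction, but the gist appears to be as follows. When features come from multiple sources (text, audio, video, and so on), building one learner per feature family and combining them at the end (late fusion) works better than feeding all the features together into a single learner (early fusion). Late fusion is also convenient because the relationship between the upper bound on the false positives of the fused result and the false positives of the individual learners can be shown.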
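The difference between the two fusion styles can be sketched with a toy example. Everything here is an assumption for illustration: the nearest-centroid scorer, the conjunctive (AND) combination rule, and the tiny audio/visual data are not taken from the paper. Under an AND rule, a fused false positive requires every per-view classifier to produce a false positive, so the fused false positive rate cannot exceed that of any single view.

```python
def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

class CentroidScorer:
    """Toy classifier: positive score means closer to the positive centroid."""

    def fit(self, rows, labels):
        self.pos = centroid([r for r, y in zip(rows, labels) if y == 1])
        self.neg = centroid([r for r, y in zip(rows, labels) if y == 0])
        return self

    def score(self, row):
        return sq_dist(row, self.neg) - sq_dist(row, self.pos)

# Hypothetical training data: two feature families (views) per example.
audio = [[0.9, 0.8], [0.8, 0.9], [0.1, 0.2], [0.2, 0.1]]
visual = [[1.0], [0.9], [0.0], [0.2]]
labels = [1, 1, 0, 0]

# Early fusion: concatenate all views and train a single learner.
early_model = CentroidScorer().fit([a + v for a, v in zip(audio, visual)], labels)

# Late fusion: train one learner per view and combine their decisions.
audio_model = CentroidScorer().fit(audio, labels)
visual_model = CentroidScorer().fit(visual, labels)

def early_predict(a, v):
    return early_model.score(a + v) > 0

def late_predict(a, v):
    # Conjunctive rule: positive only if every view agrees.
    return audio_model.score(a) > 0 and visual_model.score(v) > 0

# A test example whose audio looks positive but whose visual does not.
a_test, v_test = [0.9, 0.9], [0.4]
print("early:", early_predict(a_test, v_test))
print("late: ", late_predict(a_test, v_test))
```

In this toy setup the strong audio evidence outweighs the weak visual evidence under early fusion, while the conjunctive late fusion holds back, which is the kind of behavior one would want when precision matters more than recall.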
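The data itself is described in Section 3. It appears that audio and video are used to solve the task of identifying the game title from gameplay videos uploaded to YouTube (what is the point of this task...?). The paper says it collected 3000 videos per title for 30 titles, plus an additional 30000 negative examples, which matches the size of the released dataset (30 * 3000 + 30000 = 120000). As for the number of dimensions, the paper says "The end result is roughly 13000 audio features and 3000 visual features", so how this corresponds to the 1000000 dimensions of the released data is unclear.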
Amazon Access Samples
UCI Machine Learning Repository: Amazon Access Samples Data Set
The last one is a slightly unusual dataset.
It consists of 20000 dimensions and 30000 instances, and records grants of access rights inside Amazon.
It contains a large amount of information on who was given what kind of access to which information and when, along with the attributes of each person.
Amazon has also hosted a competition, Amazon.com - Employee Access Challenge | Kaggle, so it seems likely that there are internal efforts along these lines.
Summary
This time I introduced a few slightly unusual datasets from the UCI Machine Learning Repository.
Beyond the UCI Machine Learning Repository, many other people publish all sorts of data.
Rather than starting from a method, looking at interesting data may one day lead to a new idea.