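Inspired by id:naoya's article on Latent Semantic Indexing, I want to write about approximate matrix computation, which I have been looking into on and off for the past week or so. The matrices in question are things like a term-document matrix (a co-occurrence matrix of which word appears in which document), a buyer-item matrix (who bought which book, as used by recommendation engines), and a page-link matrix (which page links to, or is linked from, which page; used for ranking computations such as PageRank). When such a matrix is large, both computation time and storage are enormous, so if some of the work can be done ahead of time, we want to shrink the matrix as much as possible beforehand (and, if possible, improve accuracy as well). That is what these methods are for.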
To compress a matrix, one often factors the original matrix A (m rows, n columns) into three matrices, A = USV^T. The best-known approach, as naoya describes, is the singular value decomposition (SVD), which computes U, S and V, where S is a diagonal matrix holding the singular values of A in order. To compress the information in the matrix, keep only the top k singular values: some information is lost, but the result still approximates the original matrix. (Losing information is not purely a minus, since it can also reduce noise and act as smoothing.)
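As a small illustration of keeping only the top k singular values, here is a minimal NumPy sketch. The toy matrix, the value of k, and the helper names `truncated_svd` and `reconstruct` are assumptions made for this example; they do not come from naoya's article.

```python
import numpy as np

def truncated_svd(A, k):
    """Return the rank-k factors (U_k, s_k, Vt_k) taken from a thin SVD of A."""
    # full_matrices=False gives U as m x r and Vt as r x n, with r = min(m, n)
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # NumPy returns singular values in descending order, so keep the first k
    return U[:, :k], s[:k], Vt[:k, :]

def reconstruct(U_k, s_k, Vt_k):
    """Multiply the factors back together: A_k = U_k diag(s_k) Vt_k."""
    return (U_k * s_k) @ Vt_k

rng = np.random.default_rng(0)
A = rng.random((6, 4))                 # toy m x n matrix (m = 6, n = 4)
U_k, s_k, Vt_k = truncated_svd(A, k=2)
A_k = reconstruct(U_k, s_k, Vt_k)

# The Frobenius error of the rank-k approximation is determined by
# the singular values that were thrown away
s_all = np.linalg.svd(A, compute_uv=False)
err = np.linalg.norm(A - A_k)
```

The information that "falls off" is exactly the discarded singular values: `err` equals the square root of the sum of their squares.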
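One advantage is that, in search for example, this can be computed ahead of time independently of the query. It is also mathematically clean (principal component analysis, PCA, is essentially the same as doing an SVD), so I think it is fairly widely known (whether it is widely used I cannot say, for the reasons below). However, this approach has three problems.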
First, SVD basically costs O(min(n^2 * m, n * m^2)) (O(m^3) for a symmetric matrix). Up to a few tens of thousands of entries per side it is manageable, but beyond that a naive computation is no longer feasible. Granted, in tasks like these the matrix is usually sparse, so it is not impossible to obtain k eigenvalues efficiently with something like the Lanczos method (though in practice it is hard to estimate when the computation will finish). The Jacobi method also apparently splits fairly easily for parallel computation, so running it in parallel is possible too. But the matrices I want to handle are routinely millions by millions (Mixi, for example, has reportedly passed 10 million accounts), so fundamentally I would rather avoid this kind of computation. Of course, if the goal is LSI, its probabilistic version pLSI can be parallelized (Google reportedly does so in practice), so one could say pLSI is good enough (although, since the original matrix is sparse, it would be nice to exploit that sparsity).
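Second, SVD presumably compresses onto the important dimensions, but when a human looks at the compressed dimensions, some of them prompt a "what on earth is this?" (just as clustering sometimes produces clusters that no human would ever come up with). In other words, the compressed result is hard to interpret. It would be nice if the data still made sense to a human in its compressed state. (This wish that "a human can look at it and understand it" is not limited to this task; it is common in machine learning settings as well. Rather than an algorithm whose learned model is "incomprehensible but accurate", I prefer one where you can tell what is going on inside, with reasonable speed and accuracy, and which scales.)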
Third, the U, S and V produced by SVD do each carry meaning, but U and V are each huge dense matrices while S is a small sparse one. When the original A is sparse, having U and V turn dense is unwelcome (it eats space). We would like U and V to stay sparse (S is small to begin with, so it can be dense).
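To make the third problem concrete, the following sketch builds a small, mostly-zero matrix and compares how many entries are nonzero in A versus in the top-k singular vectors. The sizes, the 5% fill rate, and the `density` helper are assumptions for this example, and a dense NumPy array stands in for a real sparse format purely to keep the sketch short.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 200, 100, 10

# Hypothetical 0/1 matrix with roughly 5% nonzero entries
A = (rng.random((m, n)) < 0.05).astype(float)

# Thin SVD, then keep the top k left and right singular vectors
U, s, Vt = np.linalg.svd(A, full_matrices=False)
U_k, Vt_k = U[:, :k], Vt[:k, :]

def density(M, tol=1e-12):
    """Fraction of entries whose magnitude exceeds tol."""
    return float(np.mean(np.abs(M) > tol))

density_A = density(A)      # small: A is mostly zeros
density_U = density(U_k)    # large: almost every entry is nonzero
density_V = density(Vt_k)   # large as well
```

Even though A stores only a few percent of its entries, U_k and Vt_k have to store nearly all of theirs, which is the space cost described above.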
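This is where the recently proposed CUR decomposition comes in. Like SVD it factors the matrix into three, but it builds C by sampling columns from the original matrix A (and builds R likewise from rows), which clears all three problems above. Because it samples, it does not cost O(m^3); because it merely pulls columns out of the original A, a human can see at a glance how the compression was done; and if the original A is sparse, C and R can be handled as sparse matrices too.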
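The idea can be sketched in a few lines of NumPy. This is a simplified variant written for illustration, not the exact algorithm of the CUR papers: it samples without replacement with probability proportional to squared norm, and it fits the middle matrix as U = C^+ A R^+ using pseudoinverses, which requires a pass over the whole of A. The function name `cur_decompose`, the `rcond` cutoff and the toy matrix are all assumptions.

```python
import numpy as np

def cur_decompose(A, c, r, rng, rcond=1e-10):
    """Simplified CUR sketch: pick c columns and r rows of A, then fit U."""
    total = np.sum(A ** 2)
    col_p = np.sum(A ** 2, axis=0) / total   # weight columns by squared norm
    row_p = np.sum(A ** 2, axis=1) / total   # weight rows by squared norm
    cols = rng.choice(A.shape[1], size=c, replace=False, p=col_p)
    rows = rng.choice(A.shape[0], size=r, replace=False, p=row_p)
    C = A[:, cols]    # actual columns of A, so they stay interpretable
    R = A[rows, :]    # actual rows of A
    # Small c x r middle matrix: least-squares fit of A by C U R for this C and R
    U = np.linalg.pinv(C, rcond=rcond) @ A @ np.linalg.pinv(R, rcond=rcond)
    return C, U, R, cols, rows

rng = np.random.default_rng(0)
# Toy low-rank "buyer x item" matrix: 30 buyers, 20 items, rank 3
A = rng.random((30, 3)) @ rng.random((3, 20))
C, U, R, cols, rows = cur_decompose(A, c=6, r=6, rng=rng)
rel_err = np.linalg.norm(A - C @ U @ R) / np.linalg.norm(A)
```

Because the toy matrix has rank 3 and six columns and six rows are sampled, C and R span its column and row spaces and the reconstruction is exact up to rounding. On real data the error depends on how many columns and rows are sampled and on how they are chosen.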
Intuitively, taking the buyer-item matrix as an example: pick a fixed number of buyers from all buyers to act as a panel, pick a fixed number of products from all items that characterize purchasing behavior, and approximate the full set of buyers and items from those.
Sampling can be uniform or weighted. Sampling the same column many times to build C is wasteful, so one method uses each column only once (CMD); and since some columns can be expressed as linear combinations of other examples, another method compresses those away together (Colibri). The last of these appears to be the current state of the art (KDD 2008). (How much performance drops due to the approximation is also discussed in the papers listed under References.)
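The "use each column only once" step can be sketched as follows. This reflects my reading of the CMD idea (keep one copy of each repeatedly drawn column and scale it by the square root of how often it was drawn), not code from the paper; the function names and the toy sizes are assumptions. Colibri's removal of linearly dependent columns is not shown.

```python
import numpy as np

def sample_columns(A, c, rng):
    """Draw c column indices with replacement, weighted by squared column norm."""
    p = np.sum(A ** 2, axis=0) / np.sum(A ** 2)
    idx = rng.choice(A.shape[1], size=c, replace=True, p=p)
    return idx, p

def build_C(A, idx, p):
    """Return C with duplicates and the collapsed C holding each column once."""
    c = len(idx)
    # Plain sampled matrix: one rescaled column per draw, duplicates included
    C_dup = A[:, idx] / np.sqrt(c * p[idx])
    # Collapsed version: each distinct column once, scaled by sqrt(times drawn)
    uniq, counts = np.unique(idx, return_counts=True)
    C_uniq = A[:, uniq] / np.sqrt(c * p[uniq]) * np.sqrt(counts)
    return C_dup, C_uniq

rng = np.random.default_rng(0)
A = rng.random((10, 8))                      # toy matrix with only 8 columns
idx, p = sample_columns(A, c=20, rng=rng)    # 20 draws from 8 columns forces repeats
C_dup, C_uniq = build_C(A, idx, p)
```

With this scaling, `C_uniq @ C_uniq.T` equals `C_dup @ C_dup.T`, so the smaller matrix has the same singular values and left singular vectors as the one with duplicates while storing fewer columns.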
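To sum up: example-based matrix decomposition methods have been proposed recently, and they are useful when handling large matrices approximately. SVD really is a heavy operation, so I suppose the common approach when you need it is an approximation along the lines of splitting the matrix into pieces of a manageable size (on a current PC, tens of thousands by tens of thousands is about the limit), computing them bit by bit, and combining the results. But when I looked into whether matrix computations could avoid SVD as much as possible, I learned that there are all sorts of methods out there. (For matrix approximation, searching for the keyword random projection turns up a lot.)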
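For readers who have not met the random projection keyword, here is a minimal sketch of one common form: multiply the data by a Gaussian random matrix to reduce its dimension, and observe that pairwise distances are roughly preserved. The dimensions, the seed and the helper names are assumptions for this example.

```python
import numpy as np

def random_projection(X, k, rng):
    """Project the rows of X (n_points x d) down to k dimensions."""
    d = X.shape[1]
    # Gaussian random matrix, scaled so squared lengths are preserved on average
    P = rng.standard_normal((d, k)) / np.sqrt(k)
    return X @ P

def pairwise_dist(M):
    """Euclidean distances between all pairs of rows of M."""
    diff = M[:, None, :] - M[None, :, :]
    return np.sqrt(np.sum(diff ** 2, axis=-1))

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 2000))     # 20 points in 2000 dimensions
Y = random_projection(X, 256, rng)      # the same points in 256 dimensions

mask = ~np.eye(20, dtype=bool)          # ignore the zero self-distances
ratio = pairwise_dist(Y)[mask] / pairwise_dist(X)[mask]
```

The entries of `ratio` cluster around 1, and no decomposition of X is needed to get there, which is what makes this family of methods attractive for large matrices.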
The following publication goes on sale in late February:
- Authors: arton, 桑田誠, 角田直行, 和田卓人, 伊藤直也, 西田圭介, 岡野原大輔, 縣俊貴, 大塚知洋, nanto_vi, 徳永拓之, 山本陽平, 田中洋一郎, 下岡秀幸, ミック, 武者晶紀, 高林哲, 小飼弾, はまちや2, WEB+DB PRESS編集部
- Publisher: 技術評論社
- Release date: 2009/02/23
- Format: large-format book (大型本)
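In that publication, I hear that id:kzk, id:tkng and O野原-san are writing about SVD, LSH and pLSI in a feature titled 速習レコメンドエンジン (a crash course on recommendation engines), so anyone interested in this area should take a look. It apparently goes as far as techniques for handling large-scale data and an introduction to the kinds of models used on the research side, so I am looking forward to it a lot myself.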
References
- A CIKM 2008 tutorial; a good starting point for anyone interested in CUR: Large graph mining: patterns, tools and case studies
- The original CUR paper: Petros Drineas, Ravi Kannan, Michael W. Mahoney. Fast Monte Carlo Algorithms for Matrices III: Computing a Compressed Approximate Matrix Decomposition. SIAM Journal on Computing, Vol. 36, No. 1. 2006. pp. 184-206.
- A paper extending CUR: Jimeng Sun, Yinglian Xie, Hui Zhang, Christos Faloutsos. Less is More: Sparse Graph Mining with Compact Matrix Decomposition. Statistical Analysis and Data Mining, Vol. 1, No. 1. 2008. pp. 6-22.
- A paper extending that further: Hanghang Tong, Spiros Papadimitriou, Jimeng Sun, Philip S Yu, Christos Faloutsos. Colibri: Fast Mining of Large Static and Dynamic Graphs. Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2008). pp. 686-694.
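- 楽天も情報爆発しています (Rakuten is experiencing an information explosion too)
- NAIST マニアック講義録: リンク解析と周辺の話題 (NAIST geeky lecture notes: link analysis and related topics)
- スペクトラルクラスタリングは次元圧縮しながらKmeansする手法 (Spectral clustering is a method that runs k-means while reducing dimensionality)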