The kmeans function in R's stats package uses the Hartigan-Wong algorithm by default. Ordinary k-means (Lloyd's algorithm) repeatedly assigns each data point to its nearest cluster, whereas the Hartigan-Wong method works more directly: it assigns each data point to the cluster that minimizes the increase in quantization error.
The Hartigan-Wong paper*1 is available at the following website*2.
http://www.jstor.org/stable/2346830
The two papers below*3*4, among others, also discuss this algorithm. For example, according to the former, the local optima found by the Hartigan-Wong method form a proper subset of the local optima found by Lloyd's method (Theorem 2.2)*5.
http://www.jmlr.org/proceedings/papers/v9/telgarsky10a/telgarsky10a.pdf
http://ijcai.org/papers13/Papers/IJCAI13-249.pdf
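The subset relation can be illustrated with a tiny one-dimensional example. The sketch below is not taken from either paper: the data values and the helper names (`mean`, `error`, `lloyd_stable`) are assumptions chosen for illustration. It shows a partition that Lloyd's rule leaves unchanged, because every point is already nearest to its own centroid, even though moving one point lowers the total quantization error.

```python
def mean(values):
    """Centroid of a 1-D cluster."""
    return sum(values) / len(values)


def error(cluster):
    """Quantization error of one cluster: sum of squared distances to its centroid."""
    c = mean(cluster)
    return sum((v - c) ** 2 for v in cluster)


def lloyd_stable(clusters):
    """True if every point is at least as close to its own centroid as to any other."""
    centroids = [mean(c) for c in clusters]
    for i, cluster in enumerate(clusters):
        for v in cluster:
            own = abs(v - centroids[i])
            if any(abs(v - c) < own for c in centroids):
                return False
    return True


# Hypothetical data: cluster a = {0, 2}, cluster b = {3.2}.
a, b = [0.0, 2.0], [3.2]

# Lloyd's rule sees nothing to change: the point 2.0 is 1.0 away from its
# own centroid (1.0) and 1.2 away from the other centroid (3.2).
stable = lloyd_stable([a, b])

# Moving the point 2.0 from a to b nevertheless lowers the total error.
before = error(a) + error(b)
after = error([0.0]) + error([2.0, 3.2])

print(stable, before, after)
```

Here the error drops from 2.0 to about 0.72, so a method that evaluates the change in quantization error directly would not stop at this partition.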
I found the latter paper easier to follow, so I use it here to introduce the Hartigan-Wong algorithm. Equation (2) of the paper is the essence of the algorithm: it gives the increase in quantization error caused by adding a data point to a cluster. When the distance function is the squared Euclidean distance, it can be rewritten as Equation (7). Figure 1 of the same paper presents the algorithm as pseudocode. Evaluating the error-increase term in Figure 1 concretely with Equation (7) is enough to implement the Hartigan-Wong algorithm.
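As a rough illustration of that idea, here is a minimal sketch in Python. It is not the routine used by R's kmeans and not a transcription of the paper's Figure 1. It assumes the squared Euclidean closed form n/(n+1) * ||x - c||^2 for the increase in error (up to a constant factor), takes each point out of its cluster before re-evaluating it, and leaves a point where it is when it is alone in its cluster. That last rule, the function names, and the data are all assumptions made here.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def delta(x, members):
    """Increase in quantization error when x joins the cluster `members`."""
    n = len(members)
    return n / (n + 1) * sq_dist(x, centroid(members))


def hartigan(points, labels, k, max_sweeps=100):
    """Reassign points one at a time until a full sweep moves nothing.

    Assumes every cluster 0..k-1 is non-empty in the initial `labels`.
    """
    labels = list(labels)
    for _ in range(max_sweeps):
        moved = False
        for i, x in enumerate(points):
            # Cluster memberships with x itself taken out.
            clusters = [
                [p for j, p in enumerate(points) if labels[j] == c and j != i]
                for c in range(k)
            ]
            if not clusters[labels[i]]:
                continue  # assumption: never empty a singleton cluster
            costs = [delta(x, members) for members in clusters]
            best = min(range(k), key=lambda c: costs[c])
            # Move only on strict improvement so the loop terminates.
            if costs[best] < costs[labels[i]]:
                labels[i] = best
                moved = True
        if not moved:
            break
    return labels


# Hypothetical 1-D data with a deliberately poor starting assignment.
points = [(0.0,), (1.0,), (10.0,), (11.0,)]
result = hartigan(points, [0, 1, 0, 1], k=2)
print(result)
```

Recomputing every centroid for every point keeps the sketch short but is wasteful. A practical implementation would maintain cluster sums and sizes and update them incrementally.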
Just comparing the two equations, I could not see how Equation (2) turns into Equation (7) for the squared Euclidean distance. So I work through the algebra here to confirm it.
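Following Equation (2) of the paper, the increase in quantization error when a point $x$ is added to a cluster $C$ can be written as follows. Here the lowercase $c$ is the centroid before $x$ is added, $c'$ is the centroid after $x$ is added, and $n = |C|$ is the number of points in $C$ before the addition. The two terms in parentheses are the quantization error after the addition, and the last term is the quantization error before it.

$$\Delta = \left( \|x - c'\|^2 + \sum_{y \in C} \|y - c'\|^2 \right) - \sum_{y \in C} \|y - c\|^2$$

We show that eliminating $c'$ and $y$ from this expression, so that it is written with $x$ and $c$ only, gives Equation (7) of the paper.

First, $c'$ can be written in terms of $c$ and $x$ as follows.

$$c' = \frac{n c + x}{n + 1}$$

Using this, the first term of $\Delta$ becomes

$$\|x - c'\|^2 = \left\| x - \frac{n c + x}{n + 1} \right\|^2 = \frac{n^2}{(n+1)^2} \|x - c\|^2$$

The remaining two terms are treated together in the following form.

$$\sum_{y \in C} \left( \|y - c'\|^2 - \|y - c\|^2 \right)$$

The expression inside the summation works out to

$$\|y - c'\|^2 - \|y - c\|^2 = -2\, y \cdot (c' - c) + \|c'\|^2 - \|c\|^2$$

Therefore,

$$\sum_{y \in C} \left( \|y - c'\|^2 - \|y - c\|^2 \right) = -2 \left( \sum_{y \in C} y \right) \cdot (c' - c) + n \left( \|c'\|^2 - \|c\|^2 \right)$$

Since $\sum_{y \in C} y = n c$ and $c' - c = \dfrac{x - c}{n + 1}$, this simplifies further to

$$-2 n\, c \cdot (c' - c) + n \left( \|c'\|^2 - \|c\|^2 \right) = n \|c' - c\|^2 = \frac{n}{(n+1)^2} \|x - c\|^2$$

Every term of $\Delta$ is now expressed using only $x$ and $c$ (together with the cluster size $n$).
Combining this with the first term, the whole expression finally becomes

$$\Delta = \frac{n^2}{(n+1)^2} \|x - c\|^2 + \frac{n}{(n+1)^2} \|x - c\|^2 = \frac{n}{n+1} \|x - c\|^2$$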
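Compared with Equation (7) of the paper, this result differs only in whether a constant factor is present*6, and that factor stays constant throughout the k-means procedure, so it can be ignored. The calculation above shows that the assignment minimizing Equation (7) is the assignment that minimizes the increase in quantization error.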
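The identity is easy to check numerically. The following sketch compares the increase in quantization error computed directly from its definition with the closed form n/(n+1) * ||x - c||^2. The function names and the sample points are assumptions chosen for illustration.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def quantization_error(points):
    """Sum of squared distances from each point to the cluster centroid."""
    c = centroid(points)
    return sum(sq_dist(p, c) for p in points)


def increase_direct(x, cluster):
    """Error after adding x minus error before, straight from the definition."""
    return quantization_error(cluster + [x]) - quantization_error(cluster)


def increase_closed_form(x, cluster):
    """n / (n + 1) * ||x - c||^2, with c the centroid before x is added."""
    n = len(cluster)
    return n / (n + 1) * sq_dist(x, centroid(cluster))


# Hypothetical cluster with centroid (1, 1) and a new point at squared distance 9.
cluster = [(0.0, 0.0), (2.0, 0.0), (1.0, 3.0)]
x = (4.0, 1.0)

direct = increase_direct(x, cluster)
closed = increase_closed_form(x, cluster)
print(direct, closed)
```

For this data the error rises from 8 to 14.75, and the closed form gives 3/4 * 9 = 6.75, the same increase.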
*1:J. A. Hartigan and M. A. Wong. Algorithm AS 136: A K-Means Clustering Algorithm. Journal of the Royal Statistical Society. Series C (Applied Statistics). Vol. 28, No. 1 (1979), pp. 100-108.
*2:Downloading the paper costs $29.00, but if you create an account you can read it in a web browser for free.
*3:Matus Telgarsky and Andrea Vattani. Hartigan's Method: k-means Clustering without Voronoi. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics. 2010.
*4:Noam Slonim, Ehud Aharoni, and Koby Crammer. Hartigan's K-means versus Lloyd's K-means: Is It Time for a Change? In Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence (IJCAI '13), Francesca Rossi (Ed.). AAAI Press, 2013, pp. 1677-1684.
*5:Incidentally, Theorem 2.2 itself says "a (possibly strict) subset", whereas the introduction of the same paper says "a strict subset", which seems to me to carry a slightly different nuance.
*6:I think this comes from subtle differences in how the quantization error or the squared Euclidean distance is defined.