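Notes on classifying text

I use gensim, a Python library, to build an LDA model.

berobero-sensei's write-up explains LDA in great detail, so I'll skip the explanation here.

"Training on Wiki data to classify arbitrary text"

After reading this article, I thought it would be really handy to be able to classify text this way, so I decided to try it myself.

Cleansing the classification training data and word segmentation with compound words

That code never ended up being published, so the script below does the cleansing and the word segmentation (wakati-gaki) in one go.

```python
# -*- coding: utf-8 -*-
import MeCab
import re
import unicodedata


class Cleanser:
    def __init__(self):
        # URLs (http/https); raw string so "\w" etc. are not treated as string escapes
        self.patUrl = re.compile(r"https?://[\w/:%#\$&\?\(\)~\.=\+\-]+")
        self.patXml = re.compile("<(\
```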
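The listing is cut off partway through `patXml`, so the rest of the cleansing logic isn't visible here. To give a rough idea of what this kind of cleanser does, below is a stdlib-only sketch. It is not the author's `Cleanser`: the class name `SimpleCleanser`, the tag pattern `<[^>]+>`, and the whitespace collapsing are assumptions of mine; only the URL pattern is taken from the script.

```python
import re
import unicodedata


class SimpleCleanser:
    """Hypothetical stdlib-only cleanser sketch (not the author's Cleanser)."""

    def __init__(self):
        # Same URL pattern as the original script, written as a raw string.
        self.pat_url = re.compile(r"https?://[\w/:%#\$&\?\(\)~\.=\+\-]+")
        # Assumption: strip anything that looks like an XML/HTML tag.
        # The original patXml is truncated, so this pattern is only a stand-in.
        self.pat_tag = re.compile(r"<[^>]+>")
        # Runs of whitespace collapse to a single ASCII space.
        self.pat_space = re.compile(r"\s+")

    def cleanse(self, text: str) -> str:
        # NFKC folds full-width ASCII letters/digits to half-width,
        # half-width katakana to full-width, and U+3000 to an ASCII space.
        text = unicodedata.normalize("NFKC", text)
        text = self.pat_url.sub(" ", text)
        text = self.pat_tag.sub(" ", text)
        return self.pat_space.sub(" ", text).strip()


if __name__ == "__main__":
    c = SimpleCleanser()
    print(c.cleanse("<p>ＰＹＴＨＯＮで　LDA</p> https://example.com/a?b=1 を試す"))
```

The cleaned string would then go to MeCab for segmentation, for example `MeCab.Tagger("-Owakati").parse(text)` with mecab-python3. One caveat with the URL pattern: for `str` patterns in Python 3, `\w` is Unicode-aware, so a URL glued directly to Japanese text with no space in between will swallow that text as well.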