2. Contents
• Introduce LDA (Latent Dirichlet Allocation), a representative topic model used in NLP
• Introduce how to use LDA through the machine learning library mallet
I have been posting short introductions to the natural language processing and machine learning papers I read on twitter. I like to think the format forces me to write concisely, but there is no room to even give a paper's full title, and short as each comment is, it ends up split across separate tweets, which is a weakness. So I tried collecting them with Hatena Diary's twitter notation, but even gathered under a theme they do not become much easier to read... For a paper worth re-editing a proper introduction for, maybe I should just write a separate article. It is a dilemma.

Semi-supervised CRF: "Semi-Supervised Conditional Random Fields for Improved Sequence Segmentation and Labeling" (Jiao+, COLING/ACL 2006) http://www.metabolomics.ca/News/publications/Jiao_et_al
A small topic today. OpenAI's text-embedding-ada-002 (hereafter ada-002) and text-embedding-3-small/large (hereafter 3-small) are the best-known models for converting text into embedding vectors, and 3-small is said to be more accurate than ada-002. But what does it actually mean for an embedding model to be "more accurate"? That is today's question. The sales pitch for embedding models is that computing the cosine similarity between two embedding vectors tells you how similar the texts are in meaning; in practice, though, the vectors also reflect closeness of surface expression to a considerable degree, not just meaning. The most striking example is probably language: it is not unusual for two texts in the same language but with different meanings to have a higher embedding similarity than two texts with the same meaning in different languages. Because of this, when you build a system that uses RAG, for example, over a corpus in which multiple languages are mixed...
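The similarity in question is plain cosine similarity. A minimal sketch of the computation (the function name and the toy 4-dimensional vectors are mine; real ada-002 and 3-small embeddings have 1536 dimensions):

#include <math.h>
#include <stdio.h>

/* cosine similarity: dot(a,b) / (|a| * |b|) */
double cosine_similarity(const double *a, const double *b, int dim)
{
    double dot = 0.0, na = 0.0, nb = 0.0;
    for (int i = 0; i < dim; i++) {
        dot += a[i] * b[i];
        na  += a[i] * a[i];
        nb  += b[i] * b[i];
    }
    return dot / (sqrt(na) * sqrt(nb));
}

int main(void)
{
    /* toy stand-ins for embedding vectors */
    double u[4] = { 0.1, 0.3, -0.2, 0.9 };
    double v[4] = { 0.2, 0.25, -0.1, 0.85 };
    printf("similarity = %f\n", cosine_similarity(u, v, 4));
    return 0;
}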
One application of LDA in machine learning (specifically, topic discovery, a subproblem in natural language processing) is to discover topics in a collection of documents, and then automatically classify any individual document within the collection in terms of how "relevant" it is to each of the discovered topics. A topic is considered to be a set of terms (i.e., individual words or phrases) that, taken together, suggest a shared theme.
probabilistic latent semantic analysis (pLSA)

A generative model for count data over two discrete variables, such as documents and words:

documents: \(d\in\mathcal{D}=\{d_1,\ldots,d_N\}\)
words: \(w\in\mathcal{W}=\{w_1,\ldots,w_M\}\)
latent topic variables: \(z\in\mathcal{Z}=\{z_1,\ldots,z_K\}\)

pLSA is the generative model of documents and words given by
\[\Pr[d,w]=\Pr[d]\sum_{z\in\mathcal{Z}}\Pr[w|z]\Pr[z|d]\]
This can also be written symmetrically in documents and words:
\[\Pr[d,w]=\sum_{z\in\mathcal{Z}}\Pr[z]\Pr[d|z]\Pr[w|z]\]
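The two parameterizations agree, since Bayes' rule rewrites \(\Pr[d]\Pr[z|d]\) as \(\Pr[z]\Pr[d|z]\) inside the sum:
\[\Pr[d]\sum_{z\in\mathcal{Z}}\Pr[w|z]\Pr[z|d]=\sum_{z\in\mathcal{Z}}\Pr[w|z]\Pr[d,z]=\sum_{z\in\mathcal{Z}}\Pr[z]\Pr[d|z]\Pr[w|z]\]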
Gibbs sampler (Gibbs sampler)

An MCMC method that updates one random variable at a time; it is probably the most commonly used of the MCMC methods. Random numbers are drawn according to the conditional distributions determined by the target distribution. The rejection rate is therefore 0, but since the probability that a variable stays at its current value is not 0, it is not necessarily better than the Metropolis-Hastings method. -- shimashima
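In symbols: for a target distribution \(p(x_1,\ldots,x_D)\), one sweep of the sampler redraws each variable from its full conditional, given the current values of all the others:
\[x_i^{(t+1)}\sim p\left(x_i\mid x_1^{(t+1)},\ldots,x_{i-1}^{(t+1)},\,x_{i+1}^{(t)},\ldots,x_D^{(t)}\right),\quad i=1,\ldots,D\]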
This is a memo on Gibbs Sampling. Textbooks more often call it a "Gibbs sampler", but I learned it as "Gibbs Sampling", so that is the term I will use here.

[The Gibbs Sampling procedure]

#include <stdlib.h>
#include <stdio.h>
#include <math.h>
#include "randlib.h"

int main( void )
{
    // population mean
    double trueMean = 5.0;
    // population variance
    double trueVar = 1.0;
    // number of observations
    int dataNum = 1000;
    // array holding the observations
    double y[dataNum];
    // mean of the observations
    double xbar = 0.0;
    // variance of the observations
    double xvar = 0.0;
    // ...
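The excerpt breaks off at the declarations. Purely as a sketch of how such a sampler can continue, assuming the memo goes on to estimate the normal's mean and variance by Gibbs sampling: the model (unknown mean and precision under an improper 1/tau prior), the iteration count, and the seed are my assumptions rather than the original post's code; setall, gennor(mean, sd) and gengam(rate, shape) are randlib's actual generators.

#include <stdlib.h>
#include <stdio.h>
#include <math.h>
#include "randlib.h"

int main( void )
{
    double trueMean = 5.0;   // population mean
    double trueVar = 1.0;    // population variance
    int dataNum = 1000;      // number of observations
    double y[dataNum];       // observations
    double xbar = 0.0;       // mean of the observations
    int i, t;

    setall(12345L, 67890L);  // seed randlib's generator

    // simulate the observations from N(trueMean, trueVar)
    for (i = 0; i < dataNum; i++) {
        y[i] = gennor(trueMean, sqrt(trueVar));
        xbar += y[i];
    }
    xbar /= dataNum;

    // Gibbs sampling for the mean mu and precision tau, alternating the
    // full conditionals (under the assumed prior p(mu, tau) prop. 1/tau):
    //   mu  | tau, y ~ N(xbar, 1/(n*tau))
    //   tau | mu,  y ~ Gamma(shape = n/2, rate = sum((y_i - mu)^2)/2)
    double mu = 0.0, tau = 1.0;
    for (t = 0; t < 5000; t++) {
        mu = gennor(xbar, sqrt(1.0 / (dataNum * tau)));
        double ss = 0.0;
        for (i = 0; i < dataNum; i++)
            ss += (y[i] - mu) * (y[i] - mu);
        tau = gengam(ss / 2.0, dataNum / 2.0);  // gengam(rate, shape)
    }
    printf("posterior draw: mean = %f, variance = %f\n", mu, 1.0 / tau);
    return 0;
}

Alternating the two full conditionals long enough leaves (mu, tau) distributed according to the joint posterior, which is the point of the procedure.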
Continuing with "Pattern Recognition and Machine Learning" (PRML), Chapter 11, which I am studying ahead on. I wanted to try Gibbs sampling for myself. It would have been enough to just retrace what syou6162 tried ( http://d.hatena.ne.jp/syou6162/20090115/1231965900 ), but while I am at it, let me generalize it to multiple dimensions.

r_mul_norm1 <- function(x, mu, Sig) {
  idx <- 1:length(mu);
  for(a in idx) {
    b <- idx[idx!=a];                   # b = [1,D] - a
    s <- Sig[b,a] %*% solve(Sig[b,b]);  # Σ_ab Σ_bb ^ -1
    # (PRML 2.81) μ_a|b = μ_a + Σ_ab Σ_bb ^ -1 (x_b - μ_b)
    mu_a_b <- mu[a] + s %*% (x[b] - mu[b]);
    # (excerpt cut off above; remainder completed following PRML 2.82)
    # (PRML 2.82) Σ_a|b = Σ_aa - Σ_ab Σ_bb ^ -1 Σ_ba
    var_a_b <- Sig[a,a] - s %*% Sig[b,a];
    x[a] <- rnorm(1, mu_a_b, sqrt(var_a_b));  # draw x_a ~ N(μ_a|b, Σ_a|b)
  }
  x;
}
Latent Dirichlet Allocation is a generative probabilistic model for discrete data such as text. The input is documents; the output is something that characterizes the documents (like tf-idf, perhaps). The basic idea is that each document is a mixture of several latent topics, and each topic is characterized by a distribution over words. The paper [1] assumes that, given parameters α and β, a document is generated as follows (the corresponding joint distribution is spelled out after the list):

The document's topic distribution θ is drawn from the Dirichlet distribution Dir(α).
Repeat the following until the document has N words:
  A topic z_n is drawn from the multinomial distribution Mult(θ).
  A word w_n is drawn with probability p(w_n | z_n, β).

Here, if there are k topics z and V word types w, the parameter α is a k-dimensional vector and β is a k × V matrix with β_ij = p(w^j = 1 | z^i = 1).
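Multiplying the pieces of this process together gives the joint distribution from the paper [1] of a topic mixture θ, topic assignments z, and words w:
\[p(\theta,\mathbf{z},\mathbf{w}\mid\alpha,\beta)=p(\theta\mid\alpha)\prod_{n=1}^{N}p(z_n\mid\theta)\,p(w_n\mid z_n,\beta)\]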
LDA stands for "Latent Dirichlet Allocation", a language model that treats the "topics" of the words in a document probabilistically. It is sometimes rendered in Japanese as 潜在的ディリクレ配分法, though that name probably leaves more people going "wait, what was that again?" than enlightened. Each word is assumed to be generated from a hidden "topic" (a subject or category), and those topics can be estimated from a document collection without supervision. The distinguishing feature is that it can distinguish (or is at least expected to distinguish) the fruit apple, the music apple, and the computer-related apple. To that end, each document carries its own distribution over which topics it is likely to generate. Details omitted. As for reading the results: quantitatively, you look at the perplexity (in general, smaller is better); qualitatively, you look at the highest-probability words each topic generates and nod along. Below, the words generated by each topic...
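For reference, the perplexity of topic models, as defined in the original LDA paper, is the exponentiated negative per-word log-likelihood over M held-out documents \(\mathbf{w}_1,\ldots,\mathbf{w}_M\) of lengths \(N_d\), so smaller values mean the model predicts unseen text better:
\[\mathrm{perplexity}(D_{\mathrm{test}})=\exp\left(-\frac{\sum_{d=1}^{M}\log p(\mathbf{w}_d)}{\sum_{d=1}^{M}N_d}\right)\]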