Yesterday, an article on the highlights of 『入門ベイズ統計』 (Introduction to Bayesian Statistics) made it into Hatena's hot entries. Bayesian theory really does have a lasting following.
The underlying formula is fairly simple, so implementing it yourself probably wouldn't take much effort. Still, it is already available as a CPAN module, so I think it's better to use that. Of the ones I know, Algorithm::NaiveBayes was simple and easy to use.
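To show how short the formula really is, here is a minimal multinomial naive Bayes sketch in core Perl, for comparison with the module. It is not Algorithm::NaiveBayes's internal code. The helper names train_nb and predict_nb, the add-one (Laplace) smoothing and the tiny word counts are my own assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Algorithm::NaiveBayes): train a multinomial
# naive Bayes model. $docs is a hashref of label => arrayref of documents,
# where each document is a word => count hashref.
sub train_nb {
    my ($docs) = @_;
    my (%word_count, %total_words, %doc_count, %vocab);
    my $n_docs = 0;
    for my $label (keys %$docs) {
        for my $doc (@{ $docs->{$label} }) {
            $doc_count{$label}++;
            $n_docs++;
            for my $word (keys %$doc) {
                $word_count{$label}{$word} += $doc->{$word};
                $total_words{$label}       += $doc->{$word};
                $vocab{$word} = 1;
            }
        }
    }
    return {
        word_count  => \%word_count,
        total_words => \%total_words,
        doc_count   => \%doc_count,
        n_docs      => $n_docs,
        vocab_size  => scalar(keys %vocab),
    };
}

# Score each label as log P(label) + sum over words of n_w * log P(w | label).
# P(w | label) uses add-one smoothing so an unseen word does not drive a
# label's probability to zero; working in logs avoids floating-point underflow.
sub predict_nb {
    my ($m, $attrs) = @_;
    my %score;
    for my $label (keys %{ $m->{doc_count} }) {
        my $s     = log($m->{doc_count}{$label} / $m->{n_docs});
        my $denom = ($m->{total_words}{$label} // 0) + $m->{vocab_size};
        for my $word (keys %$attrs) {
            my $c = $m->{word_count}{$label}{$word} // 0;
            $s += $attrs->{$word} * log(($c + 1) / $denom);
        }
        $score{$label} = $s;
    }
    return \%score;
}

# Usage with tiny made-up word counts.
my $model = train_nb({
    spam => [ { viagra => 2, free => 1 }, { free => 3, money => 1 } ],
    ham  => [ { meeting => 1, lunch => 1 }, { meeting => 2, free => 1 } ],
});
my $score = predict_nb($model, { free => 2, money => 1 });
my ($best) = sort { $score->{$b} <=> $score->{$a} } keys %$score;
print "$best\n";    # prints "spam"
```

The scores are log probabilities, so only their order matters. The label with the largest value wins.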
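Here is some code I wrote myself. The sample below builds a simple spam filter. spam.txt and ham.txt are word-only lists produced by running morphological analysis on the respective corpora, one word per line, because the script reads each line as a single word. test.txt is the list of words extracted from the text you want to classify. Simply increasing the amount of spam and ham makes it reasonably usable.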
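As an application, you are not limited to the two categories spam and ham. With several categories you could do things like automatic blog classification. For the corpus, it might be interesting to build a mechanism that automatically collects text from Hatena's tags or Yahoo! Blogs' categories. That should make it possible to automatically categorize blog posts and similar text.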
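To try the multi-category idea, the only real change to the spam-filter script in this post is to call add_instance once per category. Here is a minimal sketch assuming three made-up categories (tech, food, travel). Hand-written word lists stand in for corpora collected from tags, and count_words is a hypothetical helper. new, add_instance, train and predict are the same Algorithm::NaiveBayes calls the spam filter uses.

```perl
use strict;
use warnings;
use Algorithm::NaiveBayes;

# Hypothetical sample data: in practice each list would be the morphologically
# analysed words collected for one category (e.g. from a Hatena tag).
my %corpus = (
    tech   => [qw(perl module cpan code perl bug)],
    food   => [qw(ramen lunch curry lunch restaurant)],
    travel => [qw(train hotel kyoto trip hotel)],
);

# Hypothetical helper: turn a word list into a word => count hashref.
sub count_words {
    my %count;
    $count{$_}++ for @_;
    return \%count;
}

my $bayes = Algorithm::NaiveBayes->new;

# One add_instance call per category; the label can be any string.
for my $label (sort keys %corpus) {
    $bayes->add_instance(
        attributes => count_words(@{ $corpus{$label} }),
        label      => $label,
    );
}
$bayes->train;

# predict returns a hashref of label => score; take the highest-scoring label.
my $result = $bayes->predict(attributes => count_words(qw(perl code lunch)));
my ($best) = sort { $result->{$b} <=> $result->{$a} } keys %$result;
print "$best\n";
```

predict returns a score for every label. Picking the maximum therefore works the same way for two categories or twenty.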
#!/usr/bin/perl
use strict;
use warnings;
use Algorithm::NaiveBayes;
use Data::Dumper;

my $bayes = Algorithm::NaiveBayes->new;

# Each corpus file becomes one training instance: word => number of occurrences.
my %list = getHash("spam.txt");
$bayes->add_instance(
    attributes => {%list},
    label      => 'spam',
);

%list = getHash("ham.txt");
$bayes->add_instance(
    attributes => {%list},
    label      => 'ham',
);

# Build the model from the instances added above.
$bayes->train;

# Classify the words in test.txt; the result is a hashref of label => score.
%list = getHash("test.txt");
my $result = $bayes->predict(attributes => {%list});
print Dumper($result);

# Read a file with one word per line and count how often each word occurs.
sub getHash {
    my ($file) = @_;
    my %hash;
    open my $in, '<', $file or die "Cannot open file: $file: $!";
    while (my $word = <$in>) {
        chomp $word;
        next if $word eq '';    # skip blank lines
        $hash{$word}++;
    }
    close $in;
    return %hash;
}
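In this example the classification data is rebuilt from scratch on every run, which is inefficient. For real use, you need to persist the trained model, for example the Algorithm::NaiveBayes::Model::Frequency model, so it can be reused. (Before that, though, the code is just written top to bottom as a flat script. Maybe I should rework it in an OO style?)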
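On persistence: as far as I know, Algorithm::NaiveBayes also provides save_state($path) and restore_state($path). These write the trained object to disk with Storable and read it back, so you can train once and classify many times. Here is a minimal sketch under that assumption. train_and_save, load_and_predict and the model file name are hypothetical, and the word counts are made up. In the spam filter they would come from getHash().

```perl
use strict;
use warnings;
use Algorithm::NaiveBayes;
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical helper: train once and write the model to disk with save_state.
sub train_and_save {
    my ($path, %instances) = @_;    # label => word-count hashref
    my $bayes = Algorithm::NaiveBayes->new;
    for my $label (sort keys %instances) {
        $bayes->add_instance(attributes => $instances{$label}, label => $label);
    }
    $bayes->train;
    $bayes->save_state($path);
}

# Hypothetical helper: reload the trained model instead of retraining,
# e.g. from a separate classification script.
sub load_and_predict {
    my ($path, $attrs) = @_;
    my $bayes  = Algorithm::NaiveBayes->restore_state($path);
    my $result = $bayes->predict(attributes => $attrs);
    my ($best) = sort { $result->{$b} <=> $result->{$a} } keys %$result;
    return $best;
}

# Demo: a temporary directory stands in for wherever you keep the model file.
my $path = File::Spec->catfile(tempdir(CLEANUP => 1), 'bayes.model');
train_and_save($path,
    spam => { free => 3, money => 2, viagra => 1 },
    ham  => { meeting => 2, lunch => 1, report => 1 },
);
my $best = load_and_predict($path, { free => 1, money => 1 });
print "$best\n";
```

Splitting training and prediction into separate subs like this is also a small first step toward the OO restructuring.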