Easy Text Mining with SPSS
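SPSS can work together with Python. SPSS is very feature-rich, but the data you actually handle at work is rarely straightforward and needs various kinds of preprocessing. So let's use Python to shape and clean the data quickly until it is in a form SPSS can take. This time we use Python to prepare text for an SVM. For morphological analysis we use MeCab-野良ビルド (an unofficial build of MeCab). Let's start by counting word frequencies.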
    # coding: utf-8
    import sys
    import MeCab  # load the MeCab bindings

    tagger = MeCab.Tagger("-Owakati")  # ask for wakati (space-separated) output
    read_file = sys.argv[1]  # data file to read, given on the command line
    with open(read_file, encoding="utf-8") as f:
        all_text = f.read()  # read the whole file
    # split the text into words and store the resulting list in word_list
    word_list = tagger.parse(all_text).split()

    dictionary = {}  # start with an empty dict
    for word in word_list:
        # if the word is already registered, add 1 to its frequency;
        # otherwise register it with a frequency of 1
        if word in dictionary:
            dictionary[word] = dictionary[word] + 1
        else:
            dictionary[word] = 1

    # print the registered words in descending order of frequency
    for word, count in sorted(dictionary.items(), key=lambda x: x[1], reverse=True):
        print(word + "\t -> " + str(count))
That gives us the frequency counts. With this data you could, for example, draw a histogram of the words in SPSS (if you have SPSS).
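To get counts like these into SPSS, one option is to write them to a CSV file and import that. The following is a minimal sketch, not part of the original scripts: the function name `write_frequency_csv`, the column names `word` and `count`, and the pre-tokenized sample list (standing in for MeCab's wakati output) are all assumptions made for illustration.

```python
import csv
import io
from collections import Counter


def write_frequency_csv(words, out):
    """Write (word, count) rows, most frequent first, to a file-like object."""
    counts = Counter(words)
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["word", "count"])  # hypothetical column names
    for word, count in counts.most_common():
        writer.writerow([word, count])
    return counts


# Hypothetical pre-tokenized input standing in for MeCab's wakati output.
sample_words = "犬 を 連れ て 散歩 犬 と 散歩 犬".split()

buffer = io.StringIO()  # swap in open("freq.csv", "w", encoding="utf-8", newline="") for a real file
counts = write_frequency_csv(sample_words, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

Writing to a file-like object keeps the sketch easy to try without touching the disk; the file name in the comment is only an example.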
Next, let's put the text through an SVM. To do that, the text has to be converted into IDs. The data an SVM can process has this form: a class, followed by pairs of an ID and that ID's value. Example: "+1 :: ID1:12, ID2:4, ID3:9 ID4:4"
There are many ways to convert text into IDs, so here is one example. Suppose we are given the text 「犬を連れて散歩」 ("walking with the dog") and the IDs are assigned as 犬 (dog) = ID1, 猫 (cat) = ID2, 散歩 (walk) = ID3 (and no ID is assigned to the word 「連れて」, "taking along"). Then 「犬を連れて散歩」 becomes "ID1:1, ID2:0, ID3:1". Let's write Python code that turns text into this data format.
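The 犬/猫/散歩 example can be checked with a few lines of Python. This is only a sketch under stated assumptions: the helper names `encode_text` and `to_svm_line` are hypothetical, the "+1" class label is made up for the demonstration, and occurrences are counted as plain substrings rather than as MeCab tokens.

```python
def encode_text(text, id_words):
    """Return 'ID1:n, ID2:m, ...' where each value is the number of times
    the corresponding dictionary word occurs in text (substring count)."""
    return ", ".join(
        "ID{}:{}".format(i, text.count(word))
        for i, word in enumerate(id_words, start=1)
    )


def to_svm_line(label, text, id_words):
    """Prefix the encoded text with a class label, in the 'class :: IDs' form."""
    return "{} :: {}".format(label, encode_text(text, id_words))


id_words = ["犬", "猫", "散歩"]  # 犬 = ID1, 猫 = ID2, 散歩 = ID3
encoded = encode_text("犬を連れて散歩", id_words)
svm_line = to_svm_line("+1", "犬を連れて散歩", id_words)  # "+1" is a hypothetical label

print(encoded)
print(svm_line)
```

Because 「連れて」 has no ID, it simply contributes nothing to the output.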
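    # coding: utf-8
    import sys

    read_file = sys.argv[1]
    read_dictionary = sys.argv[2]  # word dictionary whose entries have been assigned IDs

    # read one text / one dictionary word per line, ignoring empty lines
    with open(read_file, encoding="utf-8") as f:
        text_list = [line for line in f.read().splitlines() if line]
    with open(read_dictionary, encoding="utf-8") as f:
        dictionary = [line for line in f.read().splitlines() if line]

    # header row: an empty first cell, then the dictionary words
    print(',' + ','.join(dictionary))

    def set_id(text):
        # count how many times each dictionary word occurs in the text
        counts = [str(text.count(word)) for word in dictionary]
        return text + ',' + ','.join(counts)

    for text in text_list:
        print(set_id(text))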
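Now the text data is converted into IDs and ready to be thrown into an SVM. So far we assumed the IDs had been assigned in advance, but in practice you have to build the ID dictionary yourself. That is tedious, so let's automate it too (although building it by hand to suit your purpose gives better accuracy). We feed in sample data and extract only the words whose frequency is at or above a specified lower bound.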
    # coding: utf-8
    import sys
    import MeCab

    tagger = MeCab.Tagger("-Owakati")
    read_file = sys.argv[1]
    with open(read_file, encoding="utf-8") as f:
        all_text = f.read()
    word_list = tagger.parse(all_text).split()

    dictionary = {}
    for word in word_list:
        if word in dictionary:
            dictionary[word] = dictionary[word] + 1
        else:
            dictionary[word] = 1
    # everything up to here is the same as before

    min_count = int(sys.argv[2])  # lower bound on frequency
    for word, count in dictionary.items():
        if count >= min_count:  # print only words that occur at least min_count times
            print(word)
    # capture the output, for example by redirecting it to a file
Pass the file produced this way as the second argument to the previous script. Running it gives something like this.
# ID dictionary
犯罪 (crime)
é
セックス (sex)
援助 (assistance)
交際 (dating)
死 (death)
殺す (kill)
ドラッグ (drugs)
シンナー (paint thinner)
麻薬 (narcotics)
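# ID-encoded output

| 原文 (original text) | 犯罪 | é | セックス | 援助 | 交際 | 死 | 殺す | ドラッグ | シンナー | 麻薬 |
|---|---|---|---|---|---|---|---|---|---|---|
| 援助交際してくれる人募集中。 (Looking for someone for compensated dating.) | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
| 風邪気味なので風邪薬買いに薬局へ行ってくる (I feel a cold coming on, so I'm off to the pharmacy for cold medicine) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 渋谷にドラッグの密売人がいるらしい (I hear there are drug dealers in Shibuya) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| ふざけたこと言ってると殺すぞ。絶対殺す (Keep talking nonsense and I'll kill you. I'll definitely kill you) | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 |
| 麻薬体験ブログ公開中！ (My drug-experience blog is now online!) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |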
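The counts in this output can be reproduced with plain substring counting, which is what `text.count(word)` does. The sketch below is illustrative only: it uses a nine-word subset of the ID dictionary, and the helper name `count_row` is hypothetical.

```python
# Subset of the ID dictionary (an assumption for this sketch).
id_words = ["犯罪", "セックス", "援助", "交際", "死", "殺す", "ドラッグ", "シンナー", "麻薬"]

texts = [
    "援助交際してくれる人募集中。",
    "風邪気味なので風邪薬買いに薬局へ行ってくる",
    "渋谷にドラッグの密売人がいるらしい",
    "ふざけたこと言ってると殺すぞ。絶対殺す",
    "麻薬体験ブログ公開中！",
]


def count_row(text, words):
    """Count substring occurrences of every dictionary word in one text."""
    return [text.count(word) for word in words]


rows = {text: count_row(text, id_words) for text in texts}
for text, row in rows.items():
    print(text + "," + ",".join(str(n) for n in row))
```

Note that substring counting matches 「殺す」 inside 「殺すぞ」 without any tokenization, which is why that row gets a 2.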
So that's how it goes. Once text is converted into IDs like this, it is easy and fun to throw it into an SVM or the like in SPSS (if you have SPSS). Do give it a try. That said, some of you may not have enough text data, so here is code that fetches tweets from twitter.
    # -*- coding: utf-8 -*-
    # What is this?
    # A tool that fetches public tweets from twitter.
    # It records the time of the tweet, the ID and name of the user who tweeted, and the tweet text.
    # Prepare a setting.txt with your own account and password
    # (the third line is the number of inserts per commit).
    # Besides showing the stream as it flows by, it stores the tweets in a DB file (tweet.db).
    # You can inspect tweet.db with a tool such as PupSQLite.
    # https://www.eonet.ne.jp/~pup/software.html
    import base64
    import datetime
    import json
    import os
    import sqlite3
    import urllib.request

    # Read the twitter account settings
    with open("setting.txt") as f:
        userID = f.readline().rstrip('\r\n')
        userPassword = f.readline().rstrip('\r\n')
        commitDoNum = int(f.readline().rstrip('\r\n'))

    # We only want to collect Japanese tweets, so check whether a tweet is Japanese
    # (that is, whether it contains any hiragana or katakana)
    def is_japanese(text):
        def check_chr(x):
            return (0x3040 <= x <= 0x309f) or (0x30a0 <= x <= 0x30ff)
        return [ch for ch in text if check_chr(ord(ch))]

    # Prepare the SQLite3 DB. If the DB file already exists, use it; otherwise create it.
    # SQLite has been bundled with Python since 2.5, so normally nothing needs installing.
    if os.path.exists('tweet.db'):
        connection = sqlite3.connect('tweet.db')
        cursor = connection.cursor()
    else:
        connection = sqlite3.connect('tweet.db')
        cursor = connection.cursor()
        cursor.execute("create table twitter (tweetTime text, create_dt text, user_screen_id text, user_name text, tweet text);")

    # Connect to the Streaming API
    streamingAPIURI = 'https://stream.twitter.com/1/statuses/sample.json'
    credentials = base64.b64encode(('%s:%s' % (userID, userPassword)).encode('utf-8')).decode('ascii')
    req = urllib.request.Request(streamingAPIURI, headers={'Authorization': 'Basic %s' % credentials})
    streamingData = urllib.request.urlopen(req)

    commitCnt = 0  # counter used to commit to the DB after every commitDoNum inserts
    for line in streamingData:
        if not line.strip():
            continue  # skip blank lines, which are not valid JSON
        data = json.loads(line)
        text = data.get('text')
        if text and is_japanese(text):
            tweetTime = datetime.datetime.today()
            create_dt = data.get('created_at')
            user_screen_id = data['user']['screen_name']
            user_name = data['user']['name']
            try:
                tpl = (str(tweetTime), create_dt, user_screen_id, user_name, text)
                cursor.execute("insert into twitter values(?,?,?,?,?)", tpl)
                print(str(tweetTime) + ":" + user_name + "\n" + text + "\n")
                commitCnt += 1
                if commitCnt == commitDoNum:  # commit to the DB after commitDoNum inserts
                    connection.commit()
                    commitCnt = 0
            except Exception:
                print("*** insert miss... ***")
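The kana check and the commit-every-N pattern can be tried without a network connection. The following is a rough sketch on made-up data: the names `contains_kana` and `store_tweets`, the two-column table, and the sample tweets are all assumptions. Unlike the script, it also commits once more at the end so that leftover rows are not lost.

```python
import sqlite3


def contains_kana(text):
    """True if text has at least one hiragana (U+3040-U+309F)
    or katakana (U+30A0-U+30FF) character."""
    return any(0x3040 <= ord(ch) <= 0x30FF for ch in text)


def store_tweets(connection, tweets, commit_every):
    """Insert kana-containing tweets, committing after every commit_every inserts."""
    cursor = connection.cursor()
    pending = 0
    stored = 0
    for user_name, text in tweets:
        if not contains_kana(text):
            continue  # kanji-only or non-Japanese text is skipped
        cursor.execute("insert into twitter values (?, ?)", (user_name, text))
        pending += 1
        stored += 1
        if pending == commit_every:
            connection.commit()
            pending = 0
    connection.commit()  # flush any remainder (an addition in this sketch)
    return stored


connection = sqlite3.connect(":memory:")  # in-memory DB instead of tweet.db
connection.execute("create table twitter (user_name text, tweet text)")

# Made-up sample data for illustration.
sample = [
    ("a", "hello world"),
    ("b", "こんにちは"),
    ("c", "テキストマイニング"),
    ("d", "漢字"),
]
stored = store_tweets(connection, sample, commit_every=2)
row_count = connection.execute("select count(*) from twitter").fetchone()[0]
print(stored, row_count)
```

Because the check looks only at kana, a tweet written entirely in kanji is treated as not Japanese, which is the same trade-off the range check in the script makes.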
So, everyone, go ahead and get started with text mining (if you have SPSS).