Full-text search on GAE/Python, part 2: the cheating version with redis
GAE keeps gaining features, but full-text search still has not arrived. Even just part-of-speech analysis or a segmenter would be a big help, but there has been no such announcement yet.
So I figured I would just build it myself, and implemented a rough imitation of full-text search. As in the previous entry, it uses trigrams.
It is based on a module called misopoteto that I wrote a while ago together with Ian, the good-looking guy from Ebisu.
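The key point this time is that the inverted index is offloaded to a redis server. Inserts on GAE (or rather on databases in general) are extremely slow, so appending an entry ID to every gram of an n-gram index is painful. Building a trigram index for 15 Twitter search results of about 100 characters each takes roughly 1,500 rounds of get, append, and put. Back when there was no TaskQueue, I wrote my own homemade queue and built the index asynchronously. Now that TaskQueue exists, I assumed this would be easy to implement. It turned out to be really slow: mysterious retries kept occurring, and after about 30 minutes more than 2,000 tasks were still piled up in the TaskQueue with no sign of being worked off.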
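To make the write volume concrete, here is a rough sketch in plain Python 3 (not the code from this post) that counts how many posting-list appends such a batch produces. The helper names and the dummy data are assumptions made for illustration only.

```python
from collections import defaultdict


def trigrams(text, n=3):
    """Return every n-character window of text, in order."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def build_index(docs):
    """Map each gram to the list of document IDs that contain it.

    Returns the index and the number of appends performed. On a datastore,
    each append would be one get/append/put round trip.
    """
    index = defaultdict(list)
    appends = 0
    for doc_id, text in docs.items():
        for gram in trigrams(text):
            index[gram].append(doc_id)
            appends += 1
    return index, appends


# Hypothetical batch: 15 search results of 100 characters each.
docs = {doc_id: "x" * 100 for doc_id in range(15)}
index, appends = build_index(docs)
print(appends)  # 15 * (100 - 3 + 1) = 1470
```

A 100-character text yields 98 trigrams, so 15 of them give 1,470 appends, which matches the "roughly 1,500" estimate.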
So I tried building the index quickly with redis, which is really hot at the moment. At this point it is no longer purely GAE/P, but given enough time it could be done on GAE alone, so let's not worry about it. redis runs on a Sakura VPS with Ubuntu installed. The Python redis module requires sockets and cannot be used on GAE, so I put a Flask app on the VPS as a redis proxy and expose redis as a KVS through POST requests.
The source is up on bitbucket.org. I uploaded everything wholesale, so there may be problems; if you spot any, please let me know.
Source: http://bitbucket.org/a2c/a2c-fts/overview
a2c-fts: http://a2c-fts.appspot.com/
(Note: it looks rather sad in anything other than Chrome or Safari.)
Segmenter
The utils module contains an NgramSegmenter class, which chops text into n-grams. The default is bigrams, but those produce a huge amount of noise, so this time I use it with trigrams. Source below:
    # -*- coding: utf-8 -*-
    import logging
    import re
    import string


    class NgramSegmenter:
        # Characters stripped before slicing: Japanese and ASCII punctuation, plus spaces.
        # re.escape keeps ']' and '\' in string.punctuation from breaking the character class.
        _word_delimiter_regex = u"[、。" + re.escape(string.punctuation) + u" ]"

        def __init__(self, text, sp=2, word_delimiter_regex=None):
            if not word_delimiter_regex:
                word_delimiter_regex = self._word_delimiter_regex
            # Remove delimiters, then record every sp-character window with its offset.
            self.text = re.sub(word_delimiter_regex, u'', text)
            self.ngramArr = []
            for pos in range(len(self.text) - sp + 1):
                self.ngramArr.append({
                    'word_pos': pos,
                    'word_text': self.text[pos:pos + sp],
                })

        def getText(self):
            return self.text

        def getNgramArr(self):
            return self.ngramArr

        def getSegmenter(self):
            # Return only the n-gram strings, in order of position.
            return [x['word_text'] for x in self.ngramArr]


    if __name__ == '__main__':
        LOG_FILENAME = 'segmenter.log'
        logging.basicConfig(filename=LOG_FILENAME, level=logging.DEBUG)
        text = u'やってみせ 言って聞かせて させて見せ ほめてやらねば 人は動かじ'
        ana2Str = NgramSegmenter(text, 2)
        print(ana2Str.getSegmenter())
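The post does not show the lookup side. One common way to query an n-gram inverted index, sketched here in plain Python 3 as an assumption rather than as the method a2c-fts actually uses, is to split the query into the same grams and intersect their posting lists. The `ngrams` and `search` helpers and the sample index are hypothetical.

```python
def ngrams(text, n=3):
    """Return every n-character window of text, in order."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def search(index, query, n=3):
    """Return the IDs whose posting lists contain every n-gram of the query."""
    grams = ngrams(query, n)
    if not grams:
        return set()  # query shorter than n: nothing to look up
    result = set(index.get(grams[0], []))
    for gram in grams[1:]:
        result &= set(index.get(gram, []))
    return result


# Hypothetical index: gram -> list of entry IDs
index = {
    "red": [1, 2], "edi": [1, 2], "dis": [1, 2, 3],
    "fla": [3], "las": [3], "ask": [3],
}
print(search(index, "redis"))  # {1, 2}
print(search(index, "flask"))  # {3}
```

The result is only a candidate set: an entry can contain all of the grams without containing them consecutively, so a strict phrase match would still need to check the original text or the stored positions.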
Redis proxy
To use the redis server from GAE, I wrote a simple web app. Appends in redis are atomic, which makes things very easy. Registration accepts either GET or POST; lookup is GET only. For testing, the root URL serves a registration form. I ran ab against it and it handles a few hundred requests per second.
    #!/usr/bin/env python
    import json

    import redis
    from flask import Flask, request

    app = Flask(__name__)
    app.debug = True  # handy while testing; disable on a publicly reachable host

    # One shared client; decode_responses makes lrange() return text that json can dump.
    r = redis.Redis(host='localhost', port=6379, db=0, decode_responses=True)


    def append_id(key, val):
        """Append val to the list stored under 'redis_<key>' and return the list as JSON."""
        gram = 'redis_' + key
        r.rpush(gram, str(val))
        return json.dumps(r.lrange(gram, 0, -1), indent=2)


    @app.route('/')
    def redis_input():
        # Simple form for testing registration by hand.
        html = '''
        <!doctype html>
        <form action="/api/post" method="post">
          Key:<input type=text name="key"><br>
          Val:<input type=text name="val"><br>
          <input type=submit value="save">
        </form>
        '''
        return html


    # GET method ================================================
    @app.route('/api/set/<key>/<val>')
    def set_id(key, val):
        return append_id(key, val)


    @app.route('/api/get/<key>')
    def get_id(key):
        return json.dumps(r.lrange('redis_' + key, 0, -1), indent=2)


    # POST method ================================================
    @app.route('/api/post', methods=['POST'])
    def set_post_id():
        return append_id(request.form['key'], request.form['val'])


    if __name__ == '__main__':
        app.run(host='0.0.0.0', port=8080)
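For trying the proxy from an ordinary machine (outside GAE), a client only needs to send a form-encoded POST with `key` and `val` fields to `/api/post`. The sketch below uses the Python 3 standard library, whereas the code in this post is Python 2; the host name and the `build_post` helper are placeholders, not part of the original project. It builds the request without sending it.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical proxy address; replace with the host running the Flask app.
PROXY_URL = "http://redis.example.com/api/post"


def build_post(gram, entry_id, url=PROXY_URL):
    """Build (but do not send) the form POST that /api/post expects."""
    body = urlencode({"key": gram, "val": entry_id}).encode("utf-8")
    return Request(url, data=body, method="POST")


req = build_post("red", 12345)
print(req.get_method(), req.full_url)
print(req.data)  # b'key=red&val=12345'
```

Passing `req` to `urllib.request.urlopen` would actually send it; the response body is the JSON list stored under that gram.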
Badly stuck on TaskQueue!
I even ended up asking on the GAE-j group: I got stuck at the point of registering tasks with TaskQueue.
A task registered with TaskQueue can only point at a URL inside your own app, so an external URL cannot be registered as a task. I therefore wrote an endpoint that calls the external server and had TaskQueue hit that endpoint. The requests should have been completing normally, yet the tasks were never deleted and kept retrying forever, eating through GAE's CPU and bandwidth quota.
Here is the code that did not work:
    # Excerpt from the GAE app: `app` is the Flask application object.
    import urllib

    from flask import request
    from google.appengine.api import urlfetch


    @app.route('/api/send_redis', methods=['POST'])
    def saveRedisTwitSearchIndex():
        gram = request.form['gram']
        twit_id = request.form['twit_id']
        ext_url = 'http://redis.hoge.com/api/post'
        form_fields = {
            "key": gram,
            "val": twit_id,
        }
        form_data = urllib.urlencode(form_fields)
        result = urlfetch.fetch(url=ext_url,
                                payload=form_data,
                                method=urlfetch.POST)
        if result.status_code == 200:
            return 1
        return 1
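It is mostly copy-pasted from samples and stitched together, so it should work, but it did not work at all and I could not find the cause.
I still do not know the cause, but apparently it fails when the if is there.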
    if result.status_code == 200:
        return 1
Removing this part made it work. Here is the current, working code:
    # Excerpt from the GAE app: `app` is the Flask application object.
    import urllib

    from flask import request
    from google.appengine.api import urlfetch


    @app.route('/api/send_redis', methods=['POST'])
    def saveRedisTwitSearchIndex():
        gram = request.form['gram']
        twit_id = request.form['twit_id']
        ext_url = 'http://redis.atusi.me/api/post'
        form_fields = {
            "key": gram,
            "val": twit_id,
        }
        form_data = urllib.urlencode(form_fields)
        result = urlfetch.fetch(url=ext_url,
                                payload=form_data,
                                method=urlfetch.POST)
        '''
        if result.status_code == 200:
            #return result.status_code
            return json.loads(result.content)
        '''
        return "gram: %s<br>twitid: %s" % (gram, twit_id)
It worked just by commenting that out. I wonder why?
After making the change above, more than 1,000 queued tasks are processed within a few minutes.
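One possible explanation, which the original investigation did not confirm: the failing handler returns the integer 1 on every path, and as far as I know Flask expects a view to return a string, a response object, or a tuple. An unsupported return value makes the request end in a 500 error even though the urlfetch call itself succeeded, and Task Queue retries any task whose handler does not answer with a 2xx status. The working version returns a string, so the task is acknowledged and removed. The toy model below is plain Python 3 with hypothetical names, not the GAE API; it only illustrates the retry rule.

```python
def run_task(handler, max_attempts=5):
    """Call handler until it reports a 2xx status; return the attempts used.

    Returns None if every attempt failed (a real queue would keep retrying).
    """
    for attempt in range(1, max_attempts + 1):
        status = handler()
        if 200 <= status < 300:
            return attempt  # task counts as done and is removed
    return None


def failing_handler():
    # Stand-in for a view whose return value the framework rejects.
    return 500


def working_handler():
    # Stand-in for a view that returns a valid response body.
    return 200


print(run_task(working_handler))  # 1
print(run_task(failing_handler))  # None
```

If this is the cause, the `if` itself is innocent: returning a string (or a response object) from both branches should work just as well.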
Search can return up to GAE's limit of 1,000 results, but a long run of result pages is pointless, so I cap it at 100. I feel like I stumbled over various other things too, but I have forgotten them...
Benchmark results for the Flask app in front of redis
    Document Path:          /api/set/hoge1/1
    Document Length:        17 bytes

    Concurrency Level:      100
    Time taken for tests:   0.439 seconds
    Complete requests:      100
    Failed requests:        98
       (Connect: 0, Receive: 0, Length: 98, Exceptions: 0)
    Write errors:           0
    Total transferred:      59828 bytes
    HTML transferred:       40740 bytes
    Requests per second:    227.68 [#/sec] (mean)
    Time per request:       439.219 [ms] (mean)
    Time per request:       4.392 [ms] (mean, across all concurrent requests)
    Transfer rate:          133.02 [Kbytes/sec] received

    Connection Times (ms)
                  min  mean[+/-sd] median   max
    Connect:        8   15   3.3     16      21
    Processing:    38  154  81.8    144     431
    Waiting:       38  154  81.6    144     428
    Total:         48  169  83.6    160     439
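The 98 "failed" requests are all Length failures. ab counts a response as failed when its body length differs from that of the first response, and /api/set returns the whole list, which grows with every call, so these are probably not real errors. A small Python 3 sketch of the effect, with an ordinary list standing in for the redis list:

```python
import json

ids = []
lengths = []
for _ in range(5):
    ids.append("1")                    # what rpush does on each request
    body = json.dumps(ids, indent=2)   # what the endpoint returns
    lengths.append(len(body))

print(lengths)  # [9, 16, 23, 30, 37]
```

Every response is longer than the one before, so a benchmark against an append endpoint will report Length failures even when every request succeeds. Benchmarking /api/get on a fixed key would avoid the noise.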