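karaage (id:karaage) ran a data analysis on their Kindle library list, and it looked like fun, so I decided to try it myself.
Graphing the data and looking at each field would probably turn up all sorts of things, but what I really wanted was to see at a glance what kind of titles I'm drawn to. So I visualized the titles I've bought as a Word Cloud.
A Word Cloud is a way of drawing words so that the ones appearing most often are drawn largest. I built this one in Python: there is a Word Cloud library for it, and there is plenty of information online about morphological analysis, which is what extracts the individual words from the text.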
- Getting the Kindle library list and cleaning up the CSV
- Installing the libraries
- Code
- A peek at the result
- Library size and purchases per year
- Summary
- Reference links
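Getting the Kindle library list and cleaning up the CSV
Once you have obtained your library list as described in the linked article, delete everything except the title: author, purchase date, and so on.
While you're at it, thin out any series so that only volume 1 remains. Otherwise the titles of long-running manga just end up plastered across the image in huge letters.
At my skill level, hacking away at the CSV in Excel is faster than processing the raw data in Python, so I did this part by hand, the unglamorous way. If you know a smarter method (especially for thinning out series), please let me know.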
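For what it's worth, here is a rough sketch of how the series thinning might be automated. This is not what I actually did (the cleanup above was manual), and it rests on an assumption: the volume number sits at the end of the title, optionally followed by a parenthesized label, as in `ダンジョン飯 1巻 (HARTA COMIX)`. The names `series_key` and `thin_series` and the sample list are made up for illustration.

```python
import re

# A parenthesized label such as "(HARTA COMIX)", ASCII or full-width brackets.
PAREN = re.compile(r"\s*[（(][^）)]*[）)]")
# A trailing volume marker such as " 1", " 2巻" or " 第3巻".
VOLUME = re.compile(r"\s*(?:第\s*)?\d+\s*巻?\s*$")


def series_key(title):
    """Reduce a title to a rough series name (assumed title layout)."""
    base = PAREN.sub("", title).strip()
    return VOLUME.sub("", base).strip()


def thin_series(titles):
    """Keep only the first title seen for each series key."""
    seen = set()
    kept = []
    for title in titles:
        key = series_key(title)
        if key not in seen:
            seen.add(key)
            kept.append(title)
    return kept


# Sample data for illustration only.
sample = [
    "ダンジョン飯 1巻 (HARTA COMIX)",
    "ダンジョン飯 2巻 (HARTA COMIX)",
    "ヒナまつり 1 (HARTA COMIX)",
    "ヒナまつり 2 (HARTA COMIX)",
]
thinned = thin_series(sample)
print(thinned)
```

Note that this keeps whichever volume comes first in the list, not necessarily volume 1, and a standalone book whose title ends in a number would have that number treated as a volume. For a word cloud only the title words matter, so that may be acceptable.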
Installing the libraries
    pip install janome
    pip install wordcloud
Code
    # coding:utf-8
    import csv
    import codecs
    from janome.tokenizer import Tokenizer
    from wordcloud import WordCloud
    from collections import Counter, defaultdict


    # Extract only the nouns and count the words
    def counter(texts):
        t = Tokenizer()
        words_count = defaultdict(int)
        words = []
        for text in texts:
            tokens = t.tokenize(text)
            for token in tokens:
                # Keep only nouns, judged by part of speech
                pos = token.part_of_speech.split(',')[0]
                if pos in ['名詞']:
                    # Drop words that, judging from the actual output, are not what I'm after
                    if token.base_form not in ["コミックス", "文庫", "無料", "新書", "限定", "シリーズ", "ため", "たち", "期間", "試し"]:
                        words_count[token.base_form] += 1
                        words.append(token.base_form)
        return words_count, words


    # Reading as UTF-8 garbles the text, so read as Shift-JIS;
    # anything that still cannot be read is skipped with ignore
    with codecs.open('./Kindle_croud.csv', 'r', "Shift-JIS", "ignore") as f:
        reader = csv.reader(f, delimiter='\t')
        texts = []
        for row in reader:
            if row:
                text = row[0]
                # Check that the row was read correctly
                print(text)
                texts.append(text)

    words_count, words = counter(texts)
    text = ' '.join(words)

    # Mind the font (a Japanese-capable font is needed)
    wordcloud = WordCloud(background_color="white",
                          font_path=r'C:\Windows\Fonts\meiryo.ttc',
                          width=900, height=500).generate(text)
    wordcloud.to_file("./wordcloud_sample.png")
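The code is basically a patchwork of snippets lying around the net, and that got me most of the way. I did get stuck on the character encoding when reading the CSV, though, so watch out for that.
With UTF-8 the text came out garbled, and with Shift-JIS I still got errors. Adding ignore, so that anything unreadable is skipped, finally got me past it.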
with codecs.open('./Kindle_croud.csv', 'r', "Shift-JIS", "ignore") as f:
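To make it clearer what `ignore` is doing, here is a small self-contained sketch. The byte string is sample data I made up: valid Shift-JIS text with one stray byte in the middle. It uses the built-in `open`, which takes the same `encoding` and `errors` settings as the `codecs.open` call in the script.

```python
import os
import tempfile

# Sample bytes: valid Shift-JIS text with one stray byte (0xFF) in the middle.
raw = "蔵書".encode("shift_jis") + b"\xff" + b"Kindle"

# Strict decoding (the default) raises on the first undecodable byte.
try:
    raw.decode("shift_jis")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# errors="ignore" silently drops whatever it cannot decode.
ignored = raw.decode("shift_jis", errors="ignore")

# The same thing when reading from a file.
with tempfile.NamedTemporaryFile(delete=False, suffix=".csv") as tmp:
    tmp.write(raw)
    path = tmp.name
try:
    with open(path, "r", encoding="shift_jis", errors="ignore") as f:
        from_file = f.read()
finally:
    os.remove(path)

print(strict_failed, ignored, from_file)
```

The catch is that `ignore` loses characters without telling you. If the CSV was saved from Excel on Japanese Windows, it may be worth trying `cp932` as the encoding before resorting to `ignore`, since that is Microsoft's superset of Shift-JIS. I have not verified that against my own file, so treat it as a suggestion.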
Also, Kindle sticks things other than the book title onto the title. I wanted to strip out as much of that as possible, so I set a handful of words to be skipped.
if token.base_form not in ["コミックス", "文庫", "無料", "新書", "限定", "シリーズ", "ため", "たち", "期間", "試し"]:
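One way to pick such words is to look at the most frequent tokens before and after filtering. The sketch below uses `collections.Counter` on a made-up token list standing in for the nouns that `counter()` returns. The variable names and sample words are illustrative only.

```python
from collections import Counter

# Hypothetical tokens standing in for the nouns extracted from the titles.
words = ["仕事", "コミックス", "世界", "コミックス", "仕事", "コミックス", "文庫"]

# A subset of the skip list used in the script.
stop_words = {"コミックス", "文庫"}

# Most frequent words before filtering: label words dominate.
before = Counter(words).most_common(2)

# Most frequent words after filtering: the actual title words surface.
after = Counter(w for w in words if w not in stop_words).most_common(2)

print(before)
print(after)
```

Anything at the top of the unfiltered list that is clearly a label rather than part of a title is a candidate for the skip list.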
I wanted to use replace and strip while loading the data to remove publisher names and anything inside parentheses, but I couldn't get that to work, so I settled for a compromise and pushed on through.
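In hindsight, part of the trouble is that `str.strip` only trims characters from the two ends of a string and `str.replace` needs the exact text to remove, so neither fits "whatever is inside the brackets". A regular expression is a more natural fit. The following is only a sketch: the helper name `clean_title` is hypothetical, the second sample title is invented, and nested brackets are not handled.

```python
import re

# A bracketed segment plus any whitespace just before it.
# Covers ASCII (...), full-width （...） and 【...】.
BRACKETED = re.compile(r"\s*(?:\([^)]*\)|（[^）]*）|【[^】]*】)")


def clean_title(title):
    """Remove bracketed segments such as publisher labels from a title."""
    return BRACKETED.sub("", title).strip()


cleaned = clean_title("ダンジョン飯 1巻 (HARTA COMIX)")
print(cleaned)
```

Applied to each `row[0]` before tokenizing, something like this would keep labels such as HARTA COMIX out of the cloud, at the cost of also dropping any parenthesized text that really is part of a title.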
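A peek at the result
The output is the image at the top of this article, but...
仕事 (work)... 世界 (world)... 経済 (economy)... 投資 (investment)... Yikes. This does not look like someone living a fun life.
I couldn't strip the various comics labels out of the titles, so it is painfully obvious which publishers' manga I buy the most. HARTA COMIX is a head above the rest. I had no idea. The titles below are HARTA COMIX.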
ダンジョン飯 1巻 (HARTA COMIX)
- Author: 九井 諒子
- Release date: 2015/01/15
- Format: Kindle edition
ヒナまつり 1 (HARTA COMIX)
- Author: 大武 政夫
- Release date: 2013/08/01
- Format: Kindle edition
ハクメイとミコチ 1巻 (HARTA COMIX)
- Author: 樫木 祐人
- Release date: 2014/02/14
- Format: Kindle edition
Looking at the smaller words, I can see things like 時間 (time) and シンプル (simple), and those are basically work-related too. My wish to get work done efficiently shows right through. Whether any of it has actually helped on the job is unclear. Let's say it has.
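Library size and purchases per year
For how to do this part, see karaage's article (yes, I'm punting).
The library holds 528 titles, and the number of purchases per year is as follows.
It drops sharply from 2013 to 2016, but seems to be recovering little by little.
Around 2013-2014 I was buying a lot of manga, which I think accounts for much of the library size. Around that time I got married, had a child, and changed jobs, so a run of life turning points came one after another, and the result shows up rather drastically.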
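Summary
Using my Kindle library titles, I looked at what kind of titles I tend to be drawn to.
My takeaway is that work has poisoned me more than I thought. A little further back I read quite a lot of manga, and before that I was even dipping into novels. I guess this is what becoming an adult looks like. I did come away realizing that, to make life richer, I should be getting through more books read purely for pleasure. Really, people's heads go rigid without them noticing. Scary!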
Reference links
Kindleの蔵書リストをGoogle Colaboratoryでデータ分析してみた - karaage. [からあげ]
Word Cloudでツイートを可視化してみた(python) - Qiita
【Python】ブログの特徴をワードクラウドで可視化しよう!【WordPress】 | みんな栄養に頼りすぎてる
Word Cloudで文章の単語出現頻度を可視化する。[Python] - Qiita
もう怖くない!!Pandasデータ読み込みにおける文字コードの指定 – GeoSpatial Computing LAB Note