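Web novels have become popular lately, and many are being published as books or adapted into anime.
Among this season's anime, 「ダンジョンに出会いを求めるのは間違っているだろうか」 is airing and I recommend it (volumes 1 and 2 of the Kindle edition have been discounted to coincide with the anime).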
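It is a known problem that male-oriented works are hard to find in the pixiv novel ranking [citation needed].
pixiv does offer gender-specific popularity rankings, 「男子に人気」 (popular with boys) and 「女子に人気」 (popular with girls), but the split is sloppy.
A look at the 「男子に人気」 ranking makes this clear: its top 10 are all 刀剣乱舞 (Touken Ranbu), so most of it is probably 腐向け (aimed at fujoshi, i.e. BL-oriented).
Incidentally, only 7 of the top 50 were anything other than 刀剣乱舞.
I wondered whether it would be possible to judge automatically whether a novel in the ranking is male-oriented, so I played around with a quick machine-learning experiment.
Note that I use the terms "male-oriented" and "female-oriented" quite loosely here; please bear with me.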
Data collection
I treat the 「男子に人気」 and 「女子に人気」 rankings as training data labeled male and female, respectively.
pixiv keeps no past novel rankings, so there are only 100 items per gender (even though the illustration rankings do have past data).
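Features
The information available for a novel includes the body text, title, tags, caption, author, and so on.
This time I took the term frequencies of the body text and of the tags, separately, weighted them with TF-IDF, and trained on the resulting vectors.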
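To make the TF-IDF weighting concrete, here is a minimal pure-Python sketch on three made-up tag lists. It uses the smoothed formula idf = ln((1 + n) / (1 + df)) + 1 followed by L2 normalisation, which is how scikit-learn documents the default behaviour of its `TfidfTransformer`. The function name `tfidf_vectors` and the toy tags are assumptions for illustration, not part of the experiment.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Map each tokenized document to a {term: weight} dict (L2-normalised)."""
    n = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        raw = {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in raw.values()))
        vectors.append({t: w / norm for t, w in raw.items()})
    return vectors


# Toy tag lists (invented for illustration).
docs = [["tagA", "common"], ["tagB", "common"], ["tagC", "common"]]
vecs = tfidf_vectors(docs)
# The tag shared by every document gets a lower weight than the rare one.
print(vecs[0])
```

A tag that appears on every novel carries little information about the class, which is why it is weighted down relative to a rare tag.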
Results
Unfortunately(?), tags performed better than the body text (metadata often helps classification more than the content itself does).
Tags
For each novel I built a vector recording which tags it carries and trained an SVM on it.
Because there was little data, and because I could not be bothered, this is a closed evaluation: the model is scored on the very training data it was fitted to.
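A closed evaluation tends to look better than the model really is. The sketch below, with invented data and a deliberately silly classifier that just memorises its training examples, shows the gap between scoring on the training data and scoring on held-out data. `MemorizingClassifier` and the toy tag strings are hypothetical.

```python
class MemorizingClassifier:
    """Remembers every training example; guesses a default for unseen input."""

    def __init__(self, default):
        self.default = default
        self.seen = {}

    def fit(self, xs, ys):
        self.seen = dict(zip(xs, ys))
        return self

    def predict(self, xs):
        return [self.seen.get(x, self.default) for x in xs]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy data (invented): tag strings with labels 1 = female, -1 = male.
train_x = ["a b", "a c", "d e", "d f"]
train_y = [1, 1, -1, -1]
test_x = ["a g", "d h"]
test_y = [1, -1]

model = MemorizingClassifier(default=1).fit(train_x, train_y)
closed = accuracy(train_y, model.predict(train_x))   # scored on training data
held_out = accuracy(test_y, model.predict(test_x))   # scored on unseen data
print(closed, held_out)
```

The closed score is perfect by construction, so it says nothing about how the model handles novels it has not seen.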
About 90% of the novels in the 「女子に人気」 ranking were classified correctly, but only 50 to 60% of those in the 「男子に人気」 ranking were.
The many misclassifications in the 「男子に人気」 ranking agree with the observation that it, too, contains many female-oriented works.
So this result should probably be read as the classifier separating out female-oriented works reasonably well.
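The per-ranking figures here are per-class recall: of the items that truly belong to a ranking, the share the classifier assigned to it. A small sketch of that computation follows, on invented labels chosen only to mimic the rough proportions (they are not the actual predictions). `recall_per_class` is a hypothetical helper.

```python
from collections import Counter


def recall_per_class(y_true, y_pred):
    """Return {label: fraction of that label's items predicted correctly}."""
    total = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: correct[label] / total[label] for label in total}


# Invented example: 10 items per class, 1 = female ranking, -1 = male ranking.
y_true = [1] * 10 + [-1] * 10
y_pred = [1] * 9 + [-1] * 1 + [-1] * 5 + [1] * 5
recalls = recall_per_class(y_true, y_pred)
print(recalls)
```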
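Looking at the tags with the largest absolute weights, the ones that influenced the classification most, the model appears to have properly learned the tags of distinctive works and characters (or rather, 刀剣乱舞 is just too strong!).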
Male-oriented | Female-oriented |
---|---|
やはり俺の青春ラブコメはまちがっている | 刀剣乱舞 |
ラブライブ | 刀剣乱舞小説100users入り |
比企谷八幡 | 刀剣乱腐小説100users入り |
naruto | 腐向け |
サスサク | 女審神者 |
ナルヒナ | ブラック本丸 |
雪ノ下雪乃 | 刀剣乱夢 |
東條希 | 刀剣乱腐 |
艦これ | とうらぶちゃんねる |
naruto小説50users入り | 混合小説100users入り |
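A table like this can be read off a linear model by sorting its learned weights: with female coded as +1 and male as -1 (as in the source code of this post), the most negative weights are the male-side features and the most positive the female-side ones. The sketch below uses made-up feature names and weights purely for illustration; `top_features` is a hypothetical helper.

```python
def top_features(weights, k):
    """Return (most negative k, most positive k) feature names by weight."""
    ranked = sorted(weights, key=weights.get)  # ascending by weight
    return ranked[:k], ranked[-k:][::-1]


# Made-up weights, not values from the experiment.
weights = {"tag_a": -0.8, "tag_b": -0.3, "tag_c": 0.1, "tag_d": 0.5, "tag_e": 0.9}
male_side, female_side = top_features(weights, 2)
print(male_side)    # strongest male-side features first
print(female_side)  # strongest female-side features first
```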
Incidentally, when I trained on character n-grams of the tags instead, substrings of 「100users」 and the character 「腐」 received large absolute weights, which was amusing.
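Character n-grams are simply every run of n consecutive characters in a string, which is why different popularity tags end up sharing features. A minimal sketch follows; the helper name `char_ngrams` and the choice n = 4 are assumptions, since the post does not say which n was used.

```python
def char_ngrams(text, n):
    """Return all substrings of length n, in order of appearance."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


a = set(char_ngrams("小説100users入り", 4))
b = set(char_ngrams("naruto小説50users入り", 4))
shared = a & b
print(sorted(shared))
```

The two tags differ as whole words but overlap on fragments such as `user`, so a character n-gram model treats them as partly the same feature.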
pixiv novels have a largely female user base, so, as the high-weight tags above show, merely carrying a popularity tag such as 「100users」 raises the predicted probability of female-oriented.
Whether training data is actually split along the criterion you intended is a hard problem.
I wanted to separate male from female, but may have ended up with a detector of how popular a work is.
Body text
I also tried using the body text of the novels.
I simply ran morphological analysis on the text and used a vector of morpheme frequencies.
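Once a morphological analyzer has split the text into morphemes, the frequency vector is just a count per vocabulary entry. The sketch below assumes tokenization has already happened (the token lists are invented, and the post does not name the analyzer used); `count_vectors` is a hypothetical helper.

```python
from collections import Counter


def count_vectors(tokenized_docs):
    """Return (vocabulary, one frequency vector per document)."""
    vocab = sorted({tok for doc in tokenized_docs for tok in doc})
    vectors = []
    for doc in tokenized_docs:
        counts = Counter(doc)
        vectors.append([counts[tok] for tok in vocab])
    return vocab, vectors


# Pre-tokenized toy documents (invented; a real analyzer would produce these).
docs = [["お前", "は", "誰", "だ"], ["誰", "か", "誰", "か"]]
vocab, vectors = count_vectors(docs)
print(vocab)
print(vectors)
```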
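Accuracy was not very good, perhaps because there is too little training data or because the texts are short (or because the method is too naive).
The words with the largest absolute weights were mainly (parts of) character names. Beyond those, the male-oriented side included words that seem to refer to women, such as 「ちゃん」, while the female-oriented side included masculine-sounding speech such as 「お前」.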
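Summary
I was able to classify works as male-oriented or female-oriented reasonably well (though the classifier is really just telling source works apart, so original fiction would likely be hard).
Really, it would be better to engineer the feature vector more carefully: take into account whether a word appears inside dialogue, and add features such as length and per-part-of-speech frequencies.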
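As one way to act on that idea, dialogue in Japanese prose is usually enclosed in 「」, so a regular expression can separate it from narration. The sketch below computes a few illustrative features; the function name, the feature names, and the sample sentence are all assumptions, and nested or unclosed quotes are not handled.

```python
import re

DIALOGUE = re.compile(r"「(.*?)」", re.DOTALL)


def dialogue_features(text):
    """Return simple dialogue statistics for one novel body."""
    lines = DIALOGUE.findall(text)
    spoken = sum(len(s) for s in lines)
    return {
        "n_dialogue": len(lines),
        # Share of all characters that sit inside 「」 (brackets excluded).
        "dialogue_ratio": spoken / len(text) if text else 0.0,
        "mean_dialogue_len": spoken / len(lines) if lines else 0.0,
    }


sample = "彼は言った。「行くぞ」彼女は答えた。「うん」"
features = dialogue_features(sample)
print(features)
```

Numbers like these could be appended to the word-frequency vector, or the dialogue and narration could be vectorized separately so that the same word counts differently depending on where it appears.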
Because so many entries in the 「男子に人気」 ranking were classified as female-oriented, I went through that ranking and classified it by hand.
Setting aside 刀剣乱舞 and the like, only about 40 to 50 of the 100 works seemed male-oriented to me.
In conclusion, finding male-oriented web novels on pixiv is hard, so read original works on 小説家になろう and fan fiction on ハーメルン.
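Bonus - impressions of each novel site
These are my subjective impressions.
- 小説家になろう
  - The largest(?) web novel site, mostly original works. It has all sorts of distinctive genres: fantasy, cheat powers, rags-to-riches, harem, misunderstanding comedies, and so on.
- pixiv 小説
  - The novel section of pixiv, which is best known as an illustration-posting site. Users are mostly women, and the rankings are almost entirely fan fiction aimed at 腐女子 (fujoshi).
- ハーメルン
  - A fan-fiction posting site. Mostly male-oriented(?)
- Arcadia SS投稿掲示板
  - A long-running site with both original works and fan fiction, though it gets few new posts these days.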
Source code
The version that roughly classifies by tags (it uses libraries such as sklearn and pyquery).

    # -*- coding: utf-8 -*-
    import time

    import numpy as np
    from pyquery import PyQuery
    from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
    from sklearn.metrics import classification_report, confusion_matrix
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import LinearSVC


    def read_tag(url):
        """Return one space-separated tag string per novel on a ranking page."""
        q = PyQuery(url=url)
        tags = []
        for elem in q.find('.ranking-item'):
            tags.append(PyQuery(elem).find('.tags a[class!=tag-icon]').text())
        time.sleep(1)  # pause between page fetches
        return tags


    # Fetch the data
    female_tag = []
    female_tag += read_tag('http://www.pixiv.net/novel/ranking.php?mode=female&p=1')
    female_tag += read_tag('http://www.pixiv.net/novel/ranking.php?mode=female&p=2')
    male_tag = []
    male_tag += read_tag('http://www.pixiv.net/novel/ranking.php?mode=male&p=1')
    male_tag += read_tag('http://www.pixiv.net/novel/ranking.php?mode=male&p=2')
    # Labels: 1 = female ranking, -1 = male ranking
    labels = [1] * len(female_tag) + [-1] * len(male_tag)

    # Vectorize the features
    vectorizer = CountVectorizer(min_df=3, ngram_range=(1, 1))
    data = vectorizer.fit_transform(female_tag + male_tag)
    tfidf = TfidfTransformer()
    data = tfidf.fit_transform(data)

    print('Rough cross-validation while varying the parameter (accuracy)')
    for i in range(-10, 6):
        # loss='hinge' is the current name for what older scikit-learn called 'l1'
        model = LinearSVC(C=4**i, loss='hinge')
        scores = cross_val_score(model, data, labels, cv=20)
        print(i, scores.mean())

    model = LinearSVC(C=4**(-5), loss='hinge')
    model.fit(data, labels)

    print('Highest-weight features for male-oriented and female-oriented')
    features = np.array(vectorizer.get_feature_names_out())
    order = np.argsort(model.coef_[0])
    print(', '.join(features[order[:10]]))         # most negative: male side
    print(', '.join(features[order[-10:][::-1]]))  # most positive: female side
    print()

    print('confusion_matrix')
    print(confusion_matrix(labels, model.predict(data)))
    print('classification_report')
    print(classification_report(labels, model.predict(data)))
    print()

    print('Predictions on the training data itself')
    print(model.predict(data))

    print('Predictions for the rookie ranking')
    rookie_tag = read_tag('http://www.pixiv.net/novel/ranking.php?mode=rookie')
    # Apply the same TF-IDF weighting that was used for training
    rookie = tfidf.transform(vectorizer.transform(rookie_tag))
    print(model.decision_function(rookie))