Hello, this is Hokekiyo.
A while back, as an SEO measure, I wrote an article on visualizing your blog's internal links.
That alone is plenty useful, but isn't there still one thing that bugs you?
- How do I patch up articles left floating like isolated islands?
Figuring that out is a human's job, right?
….
"Ugh, but that's so tedious!!"
So, let's automate it.
- Computing article similarity
- Basket analysis
- Applying basket analysis to a blog
- One copy-paste and done: the article-extraction code
- Summary
Computing article similarity
There are all sorts of ways to compute article similarity, but broadly they split into two types:
- computing similarity from the articles' text
- computing similarity from readers' behavior
This time we'll use the second approach, computing similarity from readers' behavior.*1
Basket analysis
Have you heard of the "diapers and beer" rule? It's the one about how a surprising number of people buy beer together with diapers.
情報マネジメント用語辞典:おむつとビール(おむつとびーる) - ITmedia エンタープライズ
The technique used for this is "basket analysis".
It rests on the idea that items frequently bought in the same combination are probably closely related.
For example, suppose customers A, B, C, and D are in a store, and they buy the items shown in the figure below.
Now take the customers on the vertical axis and the products on the horizontal axis; you can then see who bought each product.
The point is to compute mathematically how similar each pair of products is, so that product relatedness is derived from the customers' purchase histories.
What we are looking at here is co-occurrence; see this article for the details!
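As a toy illustration of the co-occurrence idea (my own minimal sketch; the customers, products, and purchases are invented), build a 0/1 customer-by-product matrix and count, for each pair of products, how many customers bought both:

#coding: utf-8
# Minimal sketch of basket analysis on a toy customer x product matrix.
# Customers, products, and purchases below are invented for illustration.
import numpy as np

products = ["diapers", "beer", "milk"]

# M[i][j] = 1 if customer i bought product j (rows: customers A, B, C, D)
M = np.array([
    [1, 1, 0],   # A bought diapers and beer
    [1, 1, 1],   # B bought everything
    [0, 0, 1],   # C bought only milk
    [1, 0, 0],   # D bought only diapers
])

# Co-occurrence matrix: entry (j, k) = number of customers who bought both j and k
co = M.T @ M
for j in range(len(products)):
    for k in range(j + 1, len(products)):
        print(products[j], "&", products[k], ":", co[j][k], "customers")

Running this prints "diapers & beer : 2 customers" as the strongest pair, which is exactly the diapers-and-beer signal basket analysis looks for.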
Applying basket analysis to a blog
Applying this idea to a blog is what we'll try this time.*2
- the products are the articles
- the customers are the readers
- bought-or-not becomes bookmarked-or-not (Hatena bookmarks)
Spam-like bookmarks are a problem: some users may bookmark without reading the content, or for other purposes. To keep that from skewing the results, I use a slightly doctored version of the matrix (see the figure below).
We then use this inner product to extract the relatedness!
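Here is a minimal sketch of that tweak as I read it from the script below (the toy matrix is invented): each user's column is normalized by their total number of bookmarks, so indiscriminate bookmarkers contribute less, and the relatedness of two articles is the inner product of their reader vectors:

#coding: utf-8
# Sketch: down-weight heavy bookmarkers, then take the inner product
# between two articles' reader vectors. Toy data, invented for illustration.
import numpy as np

# M[i][j] = 1 if user j bookmarked article i (3 articles x 4 users)
M = np.array([
    [1, 0, 1, 1],
    [1, 1, 1, 0],
    [0, 1, 0, 0],
], dtype=float)

# Tweak: scale each user by 1 / (their total bookmarks), so a user who
# bookmarks everything adds little signal. (The full script below also
# drops users who bookmarked only one article before normalizing.)
user_totals = M.sum(axis=0)
M_weighted = M / np.where(user_totals > 0, user_totals, 1)

# Relatedness of article 0 and article 1
print(M_weighted[0] @ M_weighted[1])  # 0.5 for this toy matrix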
One copy-paste and done: the article-extraction code
As usual, I wrote the article-extraction code in Python. Copy the code below and save it as related_article.py or similar.*3
#coding: utf-8
from bs4 import BeautifulSoup
import urllib
from urllib import request
import csv
from argparse import ArgumentParser
import json
import numpy as np
from sklearn import manifold
import matplotlib.pyplot as plt


def extract_urls(root_url):
    """Given the blog's top page, collect every article URL on the blog."""
    is_articles = True
    page = 1
    urls = []
    titles = []
    while is_articles:
        try:
            html = request.urlopen("{}/archive?page={}".format(root_url, page))
        except urllib.error.HTTPError as e:
            # HTTP status errors such as 404, 403, 401
            print(e.reason)
            break
        except urllib.error.URLError as e:
            # the URL we tried to access is invalid
            print(e.reason)
            break
        soup = BeautifulSoup(html, "html.parser")
        articles = soup.find_all("a", class_="entry-title-link")
        for article in articles:
            titles.append(article.text)
            urls.append(article.get("href"))
        if len(articles) == 0:
            # no more articles: stop
            is_articles = False
        page += 1
    return titles, urls


def get_bookmarks(url):
    """Fetch Hatena bookmark data for one article."""
    data = request.urlopen("http://b.hatena.ne.jp/entry/json/{}".format(url)).read().decode("utf-8")
    try:
        # the API may wrap the JSON in parentheses (JSONP style)
        info = json.loads(data.strip('(').rstrip(')'))
    except ValueError:
        return 0
    try:
        return info["bookmarks"]
    except (KeyError, TypeError):
        return 0


def make_matrix(urls, save):
    users = []
    for url in urls:
        bookmarks = get_bookmarks(url)
        if bookmarks != 0:
            for bookmark in bookmarks:
                user = bookmark["user"]
                if user not in users:
                    users.append(user)
    # build the article x user bookmark matrix
    M = np.zeros((len(urls), len(users)))
    for i, url in enumerate(urls):
        bookmarks = get_bookmarks(url)
        if bookmarks != 0:
            for bookmark in bookmarks:
                j = users.index(bookmark["user"])
                M[i][j] += 1
    # save it if you want to
    if save:
        with open("data.csv", "w") as f:
            writer = csv.writer(f, lineterminator='\n')
            a = ["USER"]
            a.extend(urls)
            writer.writerow(a)
        MT = M.T
        for i, user in enumerate(users):
            m = [user]
            m.extend(MT[i])
            with open("data.csv", "a") as f:
                writer = csv.writer(f, lineterminator='\n')
                writer.writerow(m)
    # drop users with only one bookmark, and normalize the rest
    MT = M.T
    MT_filter = []
    for e in MT:
        if e.sum() > 1:
            e /= e.sum()
            MT_filter.append(e)
    MT_filter = np.array(MT_filter)
    M_filter = MT_filter.T
    print(M_filter.shape)
    return M_filter


def calc_dist(M_filter, alpha=0.05):
    confidences = np.zeros((len(M_filter), len(M_filter)))
    for i, article0 in enumerate(M_filter):
        for j, article1 in enumerate(M_filter):
            a0a1 = np.zeros(len(article0))
            for l, (u0, u1) in enumerate(zip(article0, article1)):
                a0a1[l] = u0 * u1
            if article0.sum() == 0 or i == j:
                confidences[i][j] = 0
            else:
                confidences[i][j] = a0a1.sum()  # /article0.sum() for a symmetric measure
    # the exponent flattens the distance scale (hardcoded; alpha is unused as written)
    dist = 1 - np.power(confidences, 0.04)
    return dist


if __name__ == '__main__':
    parser = ArgumentParser()
    parser.add_argument("-u", "--url", type=str, required=True, help="input your url")
    parser.add_argument("-r", "--rank", type=int, required=True, help="input num of related articles")
    parser.add_argument("-s", "--save_matrix", action="store_true", default=False, help="save matrix default:False")
    parser.add_argument("-m", "--mds", action="store_true", default=False, help="show MDS scatter default:False")
    args = parser.parse_args()
    save = args.save_matrix
    mds = args.mds
    n = args.rank
    titles, urls = extract_urls(args.url)
    alpha = 0.05
    # build the user list and the filtered matrix
    M_filter = make_matrix(urls, save=save)
    # write the CSV header (URL version)
    with open("related_articles_url.csv", "w") as f:
        writer = csv.writer(f, lineterminator='\n')
        a = ["original"]
        a.extend(range(n))
        writer.writerow(a)
    # write the CSV header (title version)
    with open("related_articles_title.csv", "w") as f:
        writer = csv.writer(f, lineterminator='\n')
        a = ["original"]
        a.extend(range(n))
        writer.writerow(a)
    confidence = np.zeros(len(M_filter))
    for i, article0 in enumerate(M_filter):
        for j, article1 in enumerate(M_filter):
            a0a1 = np.zeros(len(article0))
            for k, (u0, u1) in enumerate(zip(article0, article1)):
                a0a1[k] = u0 * u1
            if article0.sum() == 0:
                confidence[j] = 0
            else:
                confidence[j] = a0a1.sum() / article0.sum()
        index = confidence.argsort()[::-1]
        print(titles[i], ":", urls[i])
        related_article_url = [urls[i]]
        related_article_title = [titles[i]]
        related_num = ["#"]
        # append the top related articles (skipping the article itself)
        for idx in index[1:n]:
            related_article_url.append(urls[idx])
            related_article_title.append(titles[idx])
            related_num.append(confidence[idx])
            print("\t", confidence[idx], titles[idx], ":", urls[idx])
        with open("related_articles_url.csv", "a") as f:
            writer = csv.writer(f, lineterminator='\n')
            writer.writerow(related_article_url)
            writer.writerow(related_num)
        with open("related_articles_title.csv", "a") as f:
            writer = csv.writer(f, lineterminator='\n')
            try:
                writer.writerow(related_article_title)
            except Exception:
                # titles with exotic character codes are skipped (see *4)
                continue
            writer.writerow(related_num)
    # 2D scatter map via MDS
    if mds:
        dist = calc_dist(M_filter, alpha)
        mds = manifold.MDS(n_components=2, dissimilarity="precomputed")
        pos = mds.fit_transform(dist)
        plt.scatter(pos[:, 0], pos[:, 1], marker="x", alpha=0.5)
        plt.show()
Run it with the command below, from the directory where you saved the file.
python related_article.py -u http://www.procrasist.com -r 10 -s -m
- -u : your blog's URL
- -r : how far down the ranking to output related articles
- -s : whether to save the user-article matrix (data.csv)
- -m : whether to map the articles into 2D and display them
For each article, the related titles and URLs are printed in descending order of relatedness. The results are also saved as CSV files:
- data.csv : the user-by-article matrix (like the customer-product one shown above)
- related_articles_url.csv : the lists of related-article URLs
- related_articles_title.csv : the lists of related-article titles*4
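If you want to post-process the results, the alternating URL/score rows that the script writes can be read back with the standard csv module. A minimal sketch, assuming related_articles_url.csv was produced by the script above:

#coding: utf-8
# Sketch: read back related_articles_url.csv (header row, then pairs of
# rows: one with URLs, one with the "#"-prefixed relatedness scores).
import csv

with open("related_articles_url.csv") as f:
    rows = list(csv.reader(f))

for url_row, score_row in zip(rows[1::2], rows[2::2]):
    print(url_row[0])  # the original article
    for url, score in zip(url_row[1:], score_row[1:]):
        print("   ", score, url)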
...
1ヶ月書き続ければやっぱりぶち上がるの？ブログ月報(´ε｀ ) : http://www.procrasist.com/entry/2016/10/31/200000
     0.171028210043 ブログは一年続くの？読者数は？2万件のはてなブログで分析した : http://www.procrasist.com/entry/blog-analyzer
     0.149092859153 帰省・上京の新幹線の中で聴きたい曲9選 : http://www.procrasist.com/entry/2016/12/30/200000
     0.0273836825332 【ブログ運営報告】前月比3倍！得意領域に力を入れた、やってて良かった7ヶ月目 : http://www.procrasist.com/entry/7th-month
     0.0258411085419 【運営報告】バズとGoogle様と５ヶ月目の私 : http://www.procrasist.com/entry/5th-month
     0.0238548574645 【コードで一発】ブログ最適化/SEO対策で面倒なことは全てPythonにやらせよう : http://www.procrasist.com/entry/python-blog-optimization
     0.0238548574645 DeepLearning系ライブラリ、Kerasがあまりにも便利だったので使い方メモ : http://www.procrasist.com/entry/2017/01/07/154441
     0.0175295395488 最近のyoutuberが野球を熱くしそうな件について : http://www.procrasist.com/entry/2016/12/04/200000
     0.0165658732392 【成人式】真のヤンキーとは？ヤンキーのなり方を考えてみた！ : http://www.procrasist.com/entry/2017/01/09/234532
     0.0165658732392 【武井壮最強説】武井壮が本当に「百獣の王」だと思うワケ : http://www.procrasist.com/entry/takeiso
【書評】『金持ち父さん貧乏父さん』新社会人への「お金」の講義 : http://www.procrasist.com/entry/2016/10/29/000000
     0.217741935484 ブログは一年続くの？読者数は？2万件のはてなブログで分析した : http://www.procrasist.com/entry/blog-analyzer
     0.217741935484 ダリ展行ってきた、天才過ぎる作品タイトルランキング : http://www.procrasist.com/entry/dali
     0.217741935484 勇者ヨシヒコが始まったので、メレブの今までの呪文を振り返ってみた！ : http://www.procrasist.com/entry/2016/10/08/170000
     0.00477897252091 ライブが最高！Fall Out Boy (FOB) の紹介とオススメ曲！ : http://www.procrasist.com/entry/2016/11/07/200000
     0.00477897252091 毎日のパソコン生活を快適にする事前設定集 : http://www.procrasist.com/entry/pctips
     0.00477897252091 1ヶ月書き続ければやっぱりぶち上がるの？ブログ月報(´ε｀ ) : http://www.procrasist.com/entry/2016/10/31/200000
     0.00477897252091 Pentatonixという最強のアカペラ集団、おすすめ曲は？ : http://www.procrasist.com/entry/2016/11/01/200000
     0.00477897252091 整体師が教えてくれた4つの肩こり・目のかすみ対策 : http://www.procrasist.com/entry/2016/11/04/200000
     0.00477897252091 【ハイスタ, NOB, ...】青春を彩るメロコア・パンクロックバンド集 : http://www.procrasist.com/entry/2016/11/06/200000
...
Incidentally, you can also use these similarities to map the articles into a 2D space*5*6. It looks like this! (A small code sketch of this mapping step follows the list below.)
In my case, the articles clustered roughly into
- blog-related articles
- diary-like miscellanea
- technology articles
and you could see that each cluster tends to be bookmarked by slightly different people!
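For reference, here is that 2D-mapping step in isolation: multidimensional scaling (MDS) on a precomputed distance matrix, using the same scikit-learn and matplotlib calls as calc_dist and the plotting in the script above. The tiny distance matrix is made up:

#coding: utf-8
# Sketch: project items into 2D from a precomputed distance matrix with MDS.
# The 3x3 distance matrix is a toy example.
import numpy as np
from sklearn import manifold
import matplotlib.pyplot as plt

dist = np.array([
    [0.0, 0.2, 0.9],
    [0.2, 0.0, 0.8],
    [0.9, 0.8, 0.0],
])

mds = manifold.MDS(n_components=2, dissimilarity="precomputed")
pos = mds.fit_transform(dist)

# Items 0 and 1 should land close together, item 2 far away
plt.scatter(pos[:, 0], pos[:, 1], marker="x", alpha=0.5)
plt.show()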
Summary
How was that? If your posts get a decent number of bookmarks, I think this kind of analysis works well!
If you'd like to try it but don't have the environment, or you hit some error, give me a shout; I'll help as much as I can.
*1: I've implemented content-based similarity too, but it turned out to be a pain on Windows (installing mecab on Windows is seriously tedious), so I'm introducing the second method first.
*2: This is called collaborative filtering.
*3: I've also put this feature into the all-in-one version:
【コードで一発】ブログ最適化/SEO対策で面倒なことは全てPythonにやらせよう - プロクラシスト
*4: Titles containing special character codes are not saved. That's by design.
*5: This uses multidimensional scaling (MDS).
*6: For the plotting details, see the implementation on hokekiyoo's GitHub page; I've also covered it in an earlier article:
ipywidgetsとbokehで『jupyter』の更なる高みへ 【インタラクティブなグラフ描画】 - プロクラシスト