Addendum (February 24, 2012)
The method usage shown below is probably no longer correct.
Main text
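Twitter moves fast, and because the official Twitter site shows tweets running from the present back into the past, I never feel like following a discussion. That is a shame, since #shiwake3 and #f_o_s look interesting.
So I decided to use the Twitter API to fetch the tweets for a hashtag going back in time and store them in a form I find easy to read. This entry covers only the first part: using the Twitter API to walk back through the tweets that carry a given hashtag.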
Installing the Ruby twitter gem
Install it with RubyGems. The environment I tested in is as follows.
- Debian GNU/Linux Squeeze
- Ruby 1.8.7 (ruby -v)
- RubyGems 1.3.5 (gem1.8 -v)
- twitter gem 0.7.5
First, update RubyGems:
# update_rubygems
or
# gem1.8 update --system
Next, install the twitter gem:
# gem1.8 install twitter
Tweet-fetching script (from the present into the past)
I wrote the script with reference to the following sites.
- igaiga diary: I tried writing code to tally what songs everyone is listening to on TwitMusic
- Twitter API Wiki
- Simple usage of the Twitter gem (in English)
- Twitter gem reference (in English)
Points to watch out for:
- The Twitter API returns at most 100 tweets per request.
- The Twitter API returns tweets from newest to oldest.
- A single IP address can access the Twitter API at most 100 times per hour (this limit apparently changes).
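The three constraints above suggest a loop that asks for at most 100 tweets at or below a moving upper-bound id, then lowers the bound and asks again. The following is only a sketch of that idea: the API call is replaced by a hypothetical `fetch_page` lambda over in-memory data (no network access), and it assumes the upper bound is inclusive, which is not confirmed here.

```ruby
# Hypothetical in-memory data standing in for the tweets of one hashtag.
FAKE_TWEETS = (1..250).map { |i| { :id => i * 10, :text => "tweet #{i}" } }

# Hypothetical stand-in for one API call: returns up to `per_page` tweets
# with id <= max_id, newest first.
fetch_page = lambda do |max_id, per_page|
  matching = FAKE_TWEETS.select { |t| t[:id] <= max_id }
  matching.sort_by { |t| -t[:id] }.first(per_page)
end

# Walk backwards from `start_id` until a page comes back empty.
def fetch_all_backwards(fetch_page, start_id, per_page = 100, pause = 0)
  collected = []
  max_id = start_id
  requests = 0
  loop do
    page = fetch_page.call(max_id, per_page)
    requests += 1
    break if page.empty?
    collected.concat(page)
    # Assumption: the bound is inclusive, so step just below the oldest id
    # seen in this page to avoid fetching the same tweet twice.
    max_id = page.map { |t| t[:id] }.min - 1
    sleep(pause) if pause > 0 # pause between requests to respect the rate limit
  end
  [collected.reverse, requests] # oldest first
end

tweets, requests = fetch_all_backwards(fetch_page, 2500)
puts "#{tweets.size} tweets in #{requests} requests"
```

With a limit of 100 requests per hour, a pause of 3600 / 100 = 36 seconds between requests would stay within it; the `pause` argument is where that value would go.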
The script I came up with is as follows.
$KCODE = "UTF-8"
require 'rubygems'
require 'twitter'
require 'time'   # for Time.parse

# Hashtag to search for
tag = 'f_o_s'
# Upper bound on the ids of the tweets to fetch.
# Initial value: the id of the latest tweet.
twit_status_id_max = 5916066331
flg = true
# Collect into an array first so the tweets can be put in chronological order later
messageArray = Array.new

while(flg) do
  # Counts the tweets in this pass; zero means nothing more could be fetched.
  counter = 0
  # max(id) limits how recent the returned tweets may be.
  Twitter::Search.new.hashed(tag).max(twit_status_id_max).per_page(100).each{|msg|
    tmpArray = Array.new
    localtime = Time.parse(msg.created_at).localtime
    # Keep moving the bound back from the latest id toward the past.
    if msg.id <= twit_status_id_max
      twit_status_id_max = msg.id
    end
    tmpArray.push(msg.id.to_s,msg.from_user,msg.text.strip,localtime)
    messageArray.push(tmpArray)
    counter = counter + 1
  }
  # Print progress, since staring at a blank screen is boring.
  puts twit_status_id_max.to_s+':'+counter.to_s
  if counter == 0
    flg = false
  end
  sleep(10)
end

# Save the fetched tweets to a CSV file.
# Ruby's standard CSV library does not support appending, so this borrows
# the script from http://d.hatena.ne.jp/unageanu/20080824/1219576777
require "csv"
# Patch CSV.open to support append mode.
class << CSV
  alias_method( :open_org, :open )
  def open( path, mode, fs=nil, rs=nil, &block )
    if mode == "a" || mode == "ab"
      open_writer( path, mode, fs, rs, &block)
    else
      # Pass fs and rs through unchanged
      open_org( path, mode, fs, rs, &block )
    end
  end
end

# Write to the CSV file
output_file = tag+'.csv'
CSV.open(output_file, 'a'){| writer |
  messageArray.reverse.each{| tmpArray |
    writer << tmpArray
  }
}
The script above is not very smart.
- With 10,000 or more tweets it terminates abnormally partway through (because there is no exception handling for the case where tweets cannot be fetched).
- It needs the id of the latest tweet on Twitter.
- The hashtag and the id of the latest tweet are hard-coded (embedded in the script).
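Two of these shortcomings could be addressed roughly as follows. This is a sketch, not a tested fix for the script: `with_retries` and `parse_args` are hypothetical helper names, the flaky fetch is simulated rather than a real API call, and which exceptions the twitter gem actually raises is not confirmed here.

```ruby
# Retry a block a few times before giving up, pausing between attempts.
def with_retries(max_attempts = 3, pause = 0)
  attempts = 0
  begin
    attempts += 1
    yield
  rescue StandardError => e
    raise if attempts >= max_attempts # give up and re-raise the last error
    warn "attempt #{attempts} failed (#{e.message}); retrying"
    sleep(pause) if pause > 0
    retry
  end
end

# Read the hashtag and the starting id from the command line instead of
# hard-coding them. The fallback tag is the one used in the script.
def parse_args(argv)
  tag = argv[0] || 'f_o_s'
  start_id = argv[1] ? Integer(argv[1]) : nil
  [tag, start_id]
end

# Demo: a simulated fetch that fails twice and then succeeds.
calls = 0
result = with_retries(5) do
  calls += 1
  raise IOError, 'temporary failure' if calls < 3
  :ok
end
puts "result=#{result} after #{calls} calls"

tag, start_id = parse_args(['shiwake3', '5916066331'])
puts "tag=#{tag} start_id=#{start_id}"
```

In the real script, the block passed to `with_retries` would wrap the search call, and `parse_args(ARGV)` would replace the hard-coded `tag` and `twit_status_id_max` assignments.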
Tweet-fetching script (recording new tweets incrementally)
To run it from cron or similar and fetch only the latest tweets, take the part of the script above that reads
twit_status_id_max = <id of the latest tweet>
Twitter::Search.new.hashed(tag).max(twit_status_id_max).per_page(100).each{|msg|
  # ... omitted ...
}
and change it to
twit_status_id_max = <id of the newest tweet fetched last time>
Twitter::Search.new.hashed(tag).since(twit_status_id_max).per_page(100).each{|msg|
  # ... omitted ...
}
That should do it, although some fine-tuning is still needed.
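For a cron-driven run, the id of the newest tweet fetched last time has to survive between runs. One way this might be done is to keep it in a small state file; the sketch below assumes that approach. The helper names and the file location are hypothetical, and the tweets are plain hashes rather than real API results.

```ruby
require 'tmpdir'

# Read the id saved by the previous run, or nil on the first run.
def load_last_id(path)
  return nil unless File.exist?(path)
  text = File.read(path).strip
  text.empty? ? nil : Integer(text)
end

# Save the newest id seen so the next run can pass it to since().
def save_last_id(path, id)
  File.open(path, 'w') { |f| f.puts(id) }
end

# Given the tweets from one run, return the id to remember.
def newest_id(tweets, previous)
  ids = tweets.map { |t| t[:id] }
  ids << previous if previous
  ids.max
end

# Demo with a throwaway state file (hypothetical location).
state_file = File.join(Dir.tmpdir, "f_o_s.last_id.#{Process.pid}")

first = load_last_id(state_file) # nil: no earlier run
save_last_id(state_file, newest_id([{ :id => 30 }, { :id => 10 }], first))
second = load_last_id(state_file)
save_last_id(state_file, newest_id([{ :id => 50 }, { :id => 40 }], second))
third = load_last_id(state_file)
File.delete(state_file)

puts [first.inspect, second, third].join(' ')
```

Each run would load the saved id, pass it to `since(...)`, append the new tweets to the CSV file, and save the newest id it saw.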