I wrote a follow-up.

This is the day 22 article of the Ruby Advent Calendar.

Rroonga is a full-text search engine that can be used from Ruby, and I use it all the time. The preparation before you can use it is a bit of a chore, though: you have to declare columns and data types and create an index table for full-text search. (For a large application, being able to define everything precisely is a benefit.)

To make full-text search handy for everyday use, I experimented with making it usable like a Ruby Array.

Installation

To use Rroonga you need to install the gem. Unlike other full-text search engines, nothing else has to be installed, which is a nice point. It also runs on Windows without problems.

    $ gem install rroonga

The code written for this article is collected in the following repository.

    $ git clone https://github.com/ongaeshi/grn_array.git
    $ cd grn_array/

That completes the setup.

Basic usage

Here is sample code that adds data and then searches it. Compared with using Rroonga directly, I think it is fairly easy to use.

    require_relative './grn_array'

    # Create the array
    array = GrnArray.new("db/simple.db")

    # Add data
    if array.empty?
      array << {text:"aaaa", name: "a.txt"}
      array << {text:"BBBB", name: "b.txt"}
      array << {text:"cccc", name: "c.txt"}
    end

    # Search (the default column is text)
    results = array.select("bb OR cc")

    # Show the results
    results.each do |record|
      puts name: record.name, text: record.text
    end

Running it returns the matching records.

    $ ruby simple.rb
    {:name=>"b.txt", :text=>"BBBB"}
    {:name=>"c.txt", :text=>"cccc"}

- GrnArray.new(path) specifies the database to save to. The db/simple.db* files are the database itself. If the database already exists, it is opened instead of being created.
- GrnArray#<< adds data. Pass a hash whose keys are symbols.
- GrnArray#select(query) performs the search. See the Groonga query syntax for the operators you can use.
- The data passed as text: becomes the default column.
- The search result is Enumerable, so you can iterate over it with each. Each record you receive can be accessed through methods named after the symbols.
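
As a small aside on the last point, here is a sketch of what being Enumerable buys you. It does not use Rroonga at all: FakeResults and Record are hypothetical stand-ins invented for this illustration, assuming only that a result set implements each the way GrnArray's results do.

```ruby
# Hypothetical stand-ins for illustration; neither is part of grn_array.rb.
Record = Struct.new(:name, :text)

class FakeResults
  include Enumerable

  def initialize(records)
    @records = records
  end

  # Enumerable only needs #each; map, select, first, etc. are derived from it.
  def each(&block)
    @records.each(&block)
  end
end

results = FakeResults.new([
  Record.new("b.txt", "BBBB"),
  Record.new("c.txt", "cccc"),
])

# Fields are read through methods named after the hash keys.
names = results.map(&:name)
upper = results.select { |r| r.text == r.text.upcase }.map(&:name)

puts names.inspect
puts upper.inspect
```

Because only each has to be defined, the same map, select, and first calls should work on anything that includes Enumerable.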

How fast is it?

Let's compare search speed against Array#grep while growing a set of Twitter-like one-line texts to 100, 1000, 10000, 100000, and 1000000 records.

    require_relative './grn_array'
    require 'benchmark'

    GrnArray.tmpdb do |array|
      native_array = []
      texts = File.read('dummy/dummy1.txt').split

      TEST_TIMING = [100, 1000, 10000, 100000, 1000000]
      DATA_NUM = TEST_TIMING[-1]
      test_index = 0

      DATA_NUM.times.each do |index|
        text = texts[rand(texts.size)]
        array << {text: text}
        native_array << text

        if (array.size == TEST_TIMING[test_index])
          puts "-- #{array.size} --"
          Benchmark.bm(16) do |x|
            x.report("GrnArray#select") { 100.times { array.select("ããã") } }
            x.report("Array#grep") { 100.times { native_array.grep(/ããã/) } }
          end
          test_index += 1
        end
      end
    end

GrnArray.tmpdb creates a temporary database and deletes it when the block exits. It is handy in test programs and other cases where you do not want to manage a database yourself. Let's run it.

    $ ruby benchmark.rb
    -- 100 records, searched 100 times --
                           user     system      total        real
    GrnArray#select    0.020000   0.010000   0.030000 (  0.036112)
    Array#grep         0.000000   0.000000   0.000000 (  0.006804)
    -- 1,000 records, searched 100 times --
                           user     system      total        real
    GrnArray#select    0.020000   0.010000   0.030000 (  0.032740)
    Array#grep         0.070000   0.010000   0.080000 (  0.067054)
    -- 10,000 records, searched 100 times --
                           user     system      total        real
    GrnArray#select    0.040000   0.020000   0.060000 (  0.047148)
    Array#grep         0.660000   0.000000   0.660000 (  0.680305)
    -- 100,000 records, searched 100 times --
                           user     system      total        real
    GrnArray#select    0.180000   0.040000   0.220000 (  0.217387)
    Array#grep         6.600000   0.030000   6.630000 (  6.674841)
    -- 1,000,000 records, searched 100 times --
                           user     system      total        real
    GrnArray#select    1.520000   0.340000   1.860000 (  1.972344)
    Array#grep        68.320000   0.390000  68.710000 ( 71.439266)

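With few records there is not much difference (at 100 records Array#grep is actually the faster of the two), but as the data grows GrnArray#select pulls further and further ahead. At one million records, 100 searches took about 2 seconds with GrnArray#select versus about 71 seconds with Array#grep.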
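
To give a rough feel for why an indexed search scales better than scanning every element, here is a toy bigram inverted index in plain Ruby. It is only a sketch under my own assumptions: BigramIndex is a hypothetical class made up for this example, and Groonga's real index is far more elaborate. The idea it illustrates is that a lookup only touches the records that share the query's bigrams.

```ruby
# Toy bigram inverted index (illustration only, not how Groonga is implemented).
class BigramIndex
  def initialize
    @texts = []
    # bigram => list of record ids containing that bigram
    @postings = Hash.new { |hash, key| hash[key] = [] }
  end

  # Add a text and register each of its distinct bigrams.
  def <<(text)
    id = @texts.size
    @texts << text
    bigrams(text).uniq.each { |gram| @postings[gram] << id }
    self
  end

  # Return the texts that contain the query (case-insensitive).
  def search(query)
    needle = normalize(query)
    grams = bigrams(query)
    # Queries shorter than two characters have no bigram: fall back to a scan.
    return @texts.select { |t| normalize(t).include?(needle) } if grams.empty?

    # Intersect the posting lists so only candidate records remain.
    ids = grams.map { |gram| @postings.fetch(gram, []) }.reduce(:&)
    # Sharing all bigrams does not guarantee a contiguous match, so verify.
    ids.select { |id| normalize(@texts[id]).include?(needle) }
       .map { |id| @texts[id] }
  end

  private

  def normalize(text)
    text.downcase
  end

  def bigrams(text)
    normalize(text).chars.each_cons(2).map(&:join)
  end
end

index = BigramIndex.new
index << "aaaa" << "BBBB" << "cccc" << "abbc"
puts index.search("bb").inspect
puts index.search("bbc").inspect
```

Array#grep has to run the regular expression against every element on every search, while the index pays its cost once, when the data is added.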
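
Snippets

There is one more feature worth knowing about when you use a full-text search engine. It is called a snippet, or KWIC, and it displays the text surrounding a keyword. For details, see "Full-text search with groonga on Rack: displaying the text around keywords".

For short Twitter-like texts such as the ones above, you can simply display the whole matching text. When a single record contains a long document, as in a Google search, it is more convenient to display only the area around each match.

To use snippets with GrnArray, call Results#snippet_text(open_tag, close_tag) on the GrnArray::Results object returned by a search. When you pass the generated snippet object the text whose matching areas you want to display (snippet.execute(record.text)), it returns an array of excerpts in which each match is surrounded by the delimiters you specified. This is slightly complicated, so looking at the code is probably the easiest way to understand it.
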
    require_relative './grn_array'

    GrnArray.tmpdb do |array|
      # Add data
      array << {name: 'dummy1.txt', text: File.read('dummy/dummy1.txt') }
      array << {name: 'dummy2.txt', text: File.read('dummy/dummy2.txt') }

      # Search for 'けれども'
      results = array.select('けれども')

      # Create the snippet object
      snippet = results.snippet_text('<<', '>>')

      results.each do |record|
        puts "--- #{record.name} ---"

        # Show the text with the snippet applied as the search result
        snippet.execute(record.text).each do |segment|
          puts segment.gsub("\n", "")
        end
      end
    end

The result looks like this (the surrounding Japanese dummy text is abbreviated as …).

    $ ruby snippet.rb
    --- dummy1.txt ---
    …<<けれども>>…<<けれども>>…
    …<<けれども>>…
    …<<けれども>>…
    --- dummy2.txt ---
    …<<けれども>>…
    …<<けれども>>…
    …<<けれども>>…

The matching parts are displayed surrounded by '<<' and '>>'. (It is starting to look like a search site.) When you want to generate HTML, there is also Results#snippet_html.
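
For readers who want to see the KWIC idea without a search engine, here is a minimal plain-Ruby sketch. The kwic method and its width parameter are hypothetical names invented for this illustration, and this is not how Rroonga's snippet is implemented. For example, this sketch emits one excerpt per match, whereas the output above shows that Rroonga can put several nearby matches into one excerpt.

```ruby
# Hypothetical helper for illustration; not Rroonga's snippet implementation.
# Returns one excerpt per occurrence of keyword, with up to `width`
# characters of context on each side and the keyword wrapped in tags.
def kwic(text, keyword, width: 10, open_tag: "<<", close_tag: ">>")
  return [] if keyword.empty? # avoid looping forever on an empty keyword

  excerpts = []
  offset = 0
  while (pos = text.index(keyword, offset))
    match_end = pos + keyword.size
    before = text[[pos - width, 0].max...pos]
    after = text[match_end...(match_end + width)]
    excerpts << "#{before}#{open_tag}#{keyword}#{close_tag}#{after}"
    offset = match_end
  end
  excerpts
end

text = "0123456789abcXYZdefghijklmnop"
kwic(text, "XYZ", width: 3).each { |excerpt| puts excerpt }
```

Passing '<strong>' and '</strong>' as the tags would give HTML-style output, although unlike snippet_html in the source below this sketch does no HTML escaping.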

GrnArray source

The GrnArray I made this time (grn_array.rb) is a thin wrapper around Rroonga of about 100 lines. Feel free to use it as simple Rroonga sample code as well.

    # -*- coding: utf-8 -*-
    require 'groonga'
    require 'tmpdir'

    class GrnArray
      include Enumerable

      # Create a temporary database that is deleted when the block exits
      def self.tmpdb
        Dir.mktmpdir do |dir|
          yield self.new(File.join(dir, "tmp.db"))
        end
      end

      def initialize(path)
        # Open the database if it already exists, otherwise create it
        if File.exist?(path)
          Groonga::Database.open(path)
        else
          Groonga::Database.create(path: path)
        end

        unless Groonga["Array"]
          @grn = Groonga::Array.create(name: "Array", persistent: true)
          @terms = Groonga::PatriciaTrie.create(name: "Terms",
                                                key_normalize: true,
                                                default_tokenizer: "TokenBigramSplitSymbolAlphaDigit")
        else
          @grn = Groonga["Array"]
          @terms = Groonga["Terms"]
        end
      end

      def <<(value)
        # Columns and their indexes are defined when the first record is added
        if @grn.empty?
          value.each_key do |key|
            column = key.to_s
            # The data type is hard-coded to "Text"
            # @todo It should be possible to infer it from the type of the value
            @grn.define_column(column, "Text")
            @terms.define_index_column("array_#{column}", @grn, source: "Array.#{column}", with_position: true)
          end
        end

        @grn.add(value)
      end

      def select(query)
        # Use the text column as the default column when searching
        Results.new(@grn.select(query, {default_column: "text"}))
      end

      def size
        @grn.size
      end

      def empty?
        size == 0
      end

      def each
        @grn.each do |record|
          yield record
        end
      end

      class Results
        attr_reader :grn
        include Enumerable

        def initialize(grn)
          @grn = grn
        end

        def each
          @grn.each do |r|
            yield r
          end
        end

        def size
          @grn.size
        end

        def snippet(tags, options = nil)
          @grn.expression.snippet(tags, options)
        end

        def snippet_text(open_tag = '<<', close_tag = ">>")
          @grn.expression.snippet([[open_tag, close_tag]])
        end

        def snippet_html(open_tag = '<strong>', close_tag = "</strong>")
          @grn.expression.snippet([[open_tag, close_tag]], {html_escape: true})
        end
      end

      # To be implemented ...
      def []
      end

      def []=
      end

      def clear
      end
    end

Notes

- For the dummy text I used "Ready-to-use dummy text: Japanese Lorem ipsum".
- For how to use GrnArray#select(query), refer to "8.10.1. Query syntax" in the Groonga documentation.
- The trick to the Rroonga reference manual is to use [Class List] and [Method List] at the top right.
- For people using Rroonga for the first time, "How I used Ruby and Groonga (Rroonga) to speed up file name search (Windows supported)" is easy to follow.
- GrnArray can currently only store text, but Rroonga itself can also hold numbers and location data, so there seems to be room to support them.
- If a value is registered as a number, searches like "the column value is 100 or more" become possible.
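
As a sketch of how the @todo in the source (inferring the column type from the value) might be approached, the following plain-Ruby helper maps a Ruby value to a Groonga type name. groonga_type_for and schema_for are hypothetical names invented for this example and are not part of grn_array.rb. The type names ("Int64", "Float", "Time", "Bool", "Text") are Groonga built-in types, but the choice of which Ruby class maps to which type is my own assumption.

```ruby
# Hypothetical helper: guess a Groonga column type from a Ruby value.
# The mapping is an assumption for illustration, not part of grn_array.rb.
def groonga_type_for(value)
  case value
  when Integer     then "Int64"
  when Float       then "Float"
  when Time        then "Time"
  when true, false then "Bool"
  else                  "Text" # same fallback as the current hard-coded type
  end
end

# Build a column name => type name table from the first record added.
def schema_for(record)
  record.map { |key, value| [key.to_s, groonga_type_for(value)] }.to_h
end

schema = schema_for({text: "hello", score: 100, ratio: 0.5})
schema.each { |column, type| puts "#{column}: #{type}" }
```

GrnArray#<< could then pass the inferred name to define_column instead of the fixed "Text". With a numeric column in place, a query along the lines of score:>=100 should become possible, although I have not tried this with GrnArray.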