A silly idea occurred to me, and to make it happen I wrote some Ruby code that computes the Levenshtein distance together with a sequence of intermediate strings that lets you confirm the distance really is that value.
I referred to the following:
- Wikipedia:レーベンシュタイン距離
- 編集距離 (Levenshtein Distance) - naoyaのはてなダイアリー
- ruby でレーベンシュタイン距離(編集距離)の計算 - Λάδι Βιώσας
- RAA Search (Levenshtein)*1
#!/usr/bin/env ruby
# -*- coding: utf-8 -*-
# levenshtein-distance.rb

class LevenshteinDistance
  def initialize(str1, str2)
    @before, @after = str1, str2
    analyze
  end
  attr_reader :before, :after

  # Prefixes of each string, joined back from the per-character arrays.
  def before_substr(len); @before[0, len].join; end
  def after_substr(len); @after[0, len].join; end

  def analyze
    # Split into one-character elements so Ruby 1.8 and 1.9 behave alike.
    @before = @before.split(//u) if String === @before
    @after = @after.split(//u) if String === @after
    col = @before.size + 1
    row = @after.size + 1
    @dist = row.times.inject([]) {|a, i| a << [0] * col}
    @seq = row.times.inject([]) {|a, i| a << [""] * col}
    @dir = row.times.inject([]) {|a, i| a << [9] * col}
    # Top row and leftmost column: distances to and from the empty string.
    col.times {|i|
      @dist[0][i] = i
      @seq[0][i] = (i == 0) ? [""] : @seq[0][i - 1] + [before_substr(i)]
    }
    row.times {|i|
      @dist[i][0] = i
      @seq[i][0] = (i == 0) ? [""] : [after_substr(i)] + @seq[i - 1][0]
    }
    @before.size.times do |i|
      @after.size.times do |j|
        cost = (@before[i] == @after[j]) ? 0 : 1
        x = i + 1
        y = j + 1
        d1 = @dist[y][x - 1] + 1        # from the left
        d2 = @dist[y - 1][x] + 1        # from above
        d3 = @dist[y - 1][x - 1] + cost # from the upper left
        @dist[y][x] = dmin = [d1, d2, d3].min
        case dmin
        when d1
          @seq[y][x] = @seq[y][x - 1] + [before_substr(i + 1)]
          @dir[y][x] = 1
        when d2
          @seq[y][x] = [after_substr(j + 1)] + @seq[y - 1][x]
          @dir[y][x] = 2
        else
          @seq[y][x] = @seq[y - 1][x - 1].map {|s| s + @before[i]}
          if cost == 1
            @seq[y][x].unshift(after_substr(j + 1))
          end
          @dir[y][x] = 3
        end
      end
    end
  end

  def distance
    @dist[-1][-1]
  end

  # @seq runs from the second string to the first, so reverse it.
  def sequence
    @seq[-1][-1].reverse
  end

  def debug_print
    require "pp"
    pp @dist
    @seq.each_with_index do |f, i|
      f.each_with_index do |g, j|
        puts "@seq[#{i}][#{j}]<#{@dir[i][j]}> = #{g.inspect}"
      end
    end
  end
end

if __FILE__ == $0
  [["abc", "abc"],
   ["kitten", "sitting"],
   ["aaaaa", "bbbbb"],
   ["ããã®ã", "ããã²ãm"]].each do |str1, str2|
    lev = LevenshteinDistance.new(str1, str2)
    puts "distance(#{str1}, #{str2}) = #{lev.distance}"
    puts "sequence(#{str1}, #{str2}) = #{lev.sequence.inspect}"
    if $DEBUG
      lev.debug_print
    end
  end
end
The idea is simple: besides the two-dimensional list @dist that stores the distances, I created a list @seq that holds the string changes.
To be a little more concrete, @dist[y][x] is the Levenshtein distance between "the first x characters of the first string" and "the first y characters of the second string" (when x or y is 0, that side is the empty string).
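To make the meaning of the indices concrete, here is a minimal sketch that builds only the distance table, with rows for the second string and columns for the first. The method name distance_table is my own for illustration and does not appear in the script above, and it uses String#chars, so it assumes a reasonably recent Ruby rather than 1.8.

```ruby
# Build the distance table: table[y][x] is the distance between
# the first x characters of `first` and the first y characters of `second`.
def distance_table(first, second)
  a = first.chars
  b = second.chars
  table = Array.new(b.size + 1) { Array.new(a.size + 1, 0) }
  (0..a.size).each { |x| table[0][x] = x }  # empty prefix of `second`
  (0..b.size).each { |y| table[y][0] = y }  # empty prefix of `first`
  (1..b.size).each do |y|
    (1..a.size).each do |x|
      cost = a[x - 1] == b[y - 1] ? 0 : 1
      table[y][x] = [table[y][x - 1] + 1,
                     table[y - 1][x] + 1,
                     table[y - 1][x - 1] + cost].min
    end
  end
  table
end

table = distance_table("kitten", "sitting")
puts table[0][3]    # "kit" against the empty string
puts table[3][3]    # "kit" against "sit"
puts table[-1][-1]  # the two whole strings
```

The last cell, table[-1][-1], is the same position the script reads in its distance method.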
@seq[y][x] is a list of strings describing the change from "the first y characters of the second string" to "the first x characters of the first string"*2, and its number of elements is @dist[y][x]+1 (if the Levenshtein distance is 1, it holds two strings: the goal and the start).
When x equals the length of the first string and y equals the length of the second string, @dist[y][x] is the Levenshtein distance we are after, and @seq[y][x] reversed is the sequence of string changes we want to know.
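One way to check that such a sequence really demonstrates the distance is to confirm that every neighbouring pair differs by exactly one edit and that the list has distance + 1 elements. The following is only a sketch of that check. The helpers levenshtein and valid_sequence? are names I made up for this example, and the distance function is a compact two-row variant, not the table-based code from the script.

```ruby
# Compact Levenshtein distance keeping only two rows of the table.
def levenshtein(s, t)
  prev = (0..t.size).to_a
  s.each_char.with_index(1) do |sc, i|
    curr = [i]
    t.each_char.with_index(1) do |tc, j|
      cost = sc == tc ? 0 : 1
      curr << [curr[j - 1] + 1, prev[j] + 1, prev[j - 1] + cost].min
    end
    prev = curr
  end
  prev.last
end

# True when each step is a single edit and the number of steps
# equals the distance between the first and last strings.
def valid_sequence?(seq)
  seq.each_cons(2).all? { |a, b| levenshtein(a, b) == 1 } &&
    seq.size == levenshtein(seq.first, seq.last) + 1
end

puts valid_sequence?(["kitten", "sitten", "sittin", "sitting"])
puts valid_sequence?(["kitten", "sitting"])
```

The first call uses the kitten/sitting sequence and should pass; the second skips the intermediate strings and should fail.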
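As before, to support both Ruby 1.8 and 1.9, each string is converted into an array of single characters before use. @dir is there for checking the behavior: if @dist[y][x] was derived from the value to its left in the two-dimensional table, @dir[y][x] is set to 1; from the value above, 2; from the upper left, 3. The initial value (top row and leftmost column) is 9.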
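As an illustration of the @dir codes, here is a small sketch that records only the direction table, preferring left, then above, then upper left on ties, as the case statement in the script does. The method name direction_table is hypothetical, and String#chars assumes a newer Ruby than 1.8.

```ruby
# Record which neighbour each cell was computed from:
# 1 = left, 2 = above, 3 = upper left, 9 = initial row or column.
def direction_table(first, second)
  a = first.chars
  b = second.chars
  dist = Array.new(b.size + 1) { Array.new(a.size + 1, 0) }
  dir  = Array.new(b.size + 1) { Array.new(a.size + 1, 9) }
  (0..a.size).each { |x| dist[0][x] = x }
  (0..b.size).each { |y| dist[y][0] = y }
  (1..b.size).each do |y|
    (1..a.size).each do |x|
      cost = a[x - 1] == b[y - 1] ? 0 : 1
      left = dist[y][x - 1] + 1
      up   = dist[y - 1][x] + 1
      diag = dist[y - 1][x - 1] + cost
      best = [left, up, diag].min
      dist[y][x] = best
      # Ties are resolved in the order left, above, upper left.
      dir[y][x] = if best == left then 1
                  elsif best == up then 2
                  else 3
                  end
    end
  end
  dir
end

direction_table("abc", "abc").each { |row| puts row.join(" ") }
```

For two identical strings the diagonal should be all 3s, since every character matches at no cost.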
The output is as follows:
$ ruby levenshtein-distance.rb
distance(abc, abc) = 0
sequence(abc, abc) = ["abc"]
distance(kitten, sitting) = 3
sequence(kitten, sitting) = ["kitten", "sitten", "sittin", "sitting"]
distance(aaaaa, bbbbb) = 5
sequence(aaaaa, bbbbb) = ["aaaaa", "baaaa", "bbaaa", "bbbaa", "bbbba", "bbbbb"]
distance(ããã®ã, ããã²ãm) = 2
sequence(ããã®ã, ããã²ãm) = ["ããã®ã", "ããã²ã", "ããã²ãm"]
Running "ruby -d levenshtein-distance.rb" lets you see the contents of @dist, @seq, and @dir.
So, what was the "silly idea"? Take any string written in katakana, "タケノコ" or "ã¯ã«ã¤ã" or anything else, compute its Levenshtein distance to each Pokémon name, and find the closest Pokémon name along with the sequence of string changes. That is what I wanted to know. I found a Pokémon list on the web that runs from フシギダネ (Bulbasaur) to アルセウス (Arceus), downloaded it, wrote one more Ruby script, and ran it. For both "タケノコ" and "ã¯ã«ã¤ã", the smallest Levenshtein distance among the Pokémon names was 3.
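The second script is not shown here, so the following is only a guess at its general shape: compute the distance from one word to every name and keep the names with the smallest value. The three names in the array are a stand-in for the downloaded list, the helpers levenshtein and nearest_names are hypothetical, and with such a short list the result will not match the distance of 3 reported above.

```ruby
# Compact Levenshtein distance keeping only two rows of the table.
def levenshtein(s, t)
  prev = (0..t.size).to_a
  s.each_char.with_index(1) do |sc, i|
    curr = [i]
    t.each_char.with_index(1) do |tc, j|
      cost = sc == tc ? 0 : 1
      curr << [curr[j - 1] + 1, prev[j] + 1, prev[j - 1] + cost].min
    end
    prev = curr
  end
  prev.last
end

# Returns [smallest distance, all names at that distance].
# Assumes `names` is not empty.
def nearest_names(word, names)
  scored = names.map { |name| [levenshtein(word, name), name] }
  best = scored.map(&:first).min
  [best, scored.select { |d, _| d == best }.map(&:last)]
end

# Stand-in list; the real script read the full downloaded list.
names = %w[フシギダネ ピカチュウ アルセウス]
best, hits = nearest_names("タケノコ", names)
puts "min distance = #{best}: #{hits.join(', ')}"
```

To get the sequence of string changes as well, each hit could then be passed to LevenshteinDistance.new from the script above.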
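*1: I looked at the two libraries I found, all the way down to their source, but they only compute the distance and do not compute the string changes.
*2: The direction is reversed because I added the code that computes this value on top of the code I used as a reference; it looks like a small change would keep it from coming out in reverse order.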