B-Trees
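The reading group on Introduction to Algorithms that we have been running since last year has already reached Chapter 18. The topic of Chapter 18 is the B-tree. A B-tree is a multiway balanced tree (a balanced tree in which each node can have many children), and it is an important data structure widely used in databases, file systems and similar software. It places a lower and an upper bound on the number of branches hanging off each node, and adds the constraint that the tree is always balanced.
As preparation for the reading group, I implemented a B-tree in Python. The source code is at the end of this post. What follows are some thoughts on B-trees.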
Why B-trees matter
B-trees matter because B-trees (and variants such as B+ trees*1) are designed to be operated on efficiently on secondary storage. If you have ever run software that handles large amounts of data on secondary storage (hard disks), such as a database-backed web application, you know well that doing disk I/O efficiently is the key to keeping performance up. Since the B-tree is a data structure that lets you work efficiently with data on disk, you could call it one of the most important data structures in use today.
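On today's commodity machines, secondary storage means a hard disk. Because of how a disk is built, locating data involves waiting for the platter to rotate and moving the head. Platter rotation and head movement are very slow compared with the electrical part of the operation, that is, reading the magnetic signal, and they take on the order of milliseconds. Access to main memory, the primary storage, takes on the order of nanoseconds, so the gap between memory and disk is five or more decimal digits. In other words, locating a piece of data once on disk is at least 10^5 times slower than locating it in memory.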
To minimize the overhead of disk reads and writes, the OS generally accesses the disk in units of blocks. It reads data in bulk in one go, and places related data in adjacent positions on disk, so that rotational waits are kept as low as possible.
Now, a B-tree is a multiway tree, and by adjusting the number of keys each node holds, the size of a node can be fixed at whatever size you like. If you design that size to be about the same as the block size, so that one node of the tree corresponds to one block, then one step from node to node in the tree corresponds to one disk read. Any data inside the same node can be searched with a single disk read.
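As a rough illustration of sizing a node to a block, the sketch below computes how many keys fit in one block. The block size, key size and pointer size are hypothetical values chosen for the example, not figures from any particular file system or database.

```python
def max_keys_per_node(block_size, key_size, pointer_size):
    """Largest n such that n keys and n + 1 child pointers fit in one block.

    Solves n * key_size + (n + 1) * pointer_size <= block_size for n.
    Per-node headers and any payload are ignored (a simplifying assumption).
    """
    return (block_size - pointer_size) // (key_size + pointer_size)


# Hypothetical sizes: 4 KiB block, 8-byte keys, 8-byte child pointers.
n = max_keys_per_node(4096, 8, 8)
print(n, "keys and", n + 1, "children per node")
```

With these assumed sizes a node holds a few hundred keys, which is the kind of fan-out the height estimate in this post relies on.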
A B-tree is balanced, and its height is always logarithmic in the number of items. For example, with one million items and nodes that can each have m = 200 branches, the height is only 3. The cost of a search is proportional to the height, so out of one million items the target can be found in just a few steps (4 at most). And since in a well-designed B-tree one step between nodes equals one disk read, those few steps are exactly the number of blocks read from disk. This is how a B-tree lets you handle large amounts of data while keeping disk overhead down.
This is why recent RDBMSs such as Oracle and MySQL use B-trees (or variants of them) as the data structure for their indexes.
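The height estimate can be checked with a short calculation. This is only a sketch: `best_case_levels` assumes every node is completely full, and `worst_case_height` uses the height bound from Chapter 18 of Introduction to Algorithms with minimum degree t = 100, which is my own reading of "up to 200 branches".

```python
def best_case_levels(n_keys, max_children):
    """Fewest levels needed to hold n_keys when every node is full.

    A full tree with L levels and fan-out m holds m**L - 1 keys.
    """
    levels = 1
    while max_children ** levels - 1 < n_keys:
        levels += 1
    return levels


def worst_case_height(n_keys, t):
    """Largest height h allowed for n_keys >= 1 keys and minimum degree t.

    A B-tree of height h holds at least 2 * t**h - 1 keys, so h is the
    largest value for which that minimum still fits in n_keys.
    """
    h = 0
    while 2 * t ** (h + 1) - 1 <= n_keys:
        h += 1
    return h


print(best_case_levels(1_000_000, 200))   # levels when nodes are packed full
print(worst_case_height(1_000_000, 100))  # height (edges) in the sparsest tree
```

Under these assumptions the packed tree needs 3 levels, and even the sparsest legal tree has height 2, meaning 3 nodes are visited per lookup. Both fit within the "at most 4" figure quoted above.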
Things change once SSDs become a commodity (with addenda)
By the way, SSDs have lately been getting faster and more commoditized at quite a pace.
Unlike a disk, an SSD involves no waiting for platter rotation and no head movement, so the millisecond-scale overhead of hard disks is gone. As a result, random access on an SSD is very fast. This difference is very important. (Granted, SSDs have overheads of their own.) It is a technical innovation that behaves differently at the level of architecture, and it means far more than "N times faster than a disk."
There is a good real-world example.
The other day PFI held a seminar on their search engine Sedue. (I hear Hatena Bookmark was mentioned several times in the context of Sedue. Thank you.) The talk to watch was Mr. Tanaka's "SSD 向け全文検索エンジン" ("A full-text search engine for SSDs"). By building a suffix array on an SSD and taking advantage of the SSD's random access performance, they were able to search large amounts of data quickly even on secondary storage. Searching a suffix array involves random access across the data structure. On conventional hard disks, the rotational latency of each random access made large suffix arrays on secondary storage too slow to be practical, but with SSDs that barrier is gone.
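To make the "random access" point concrete, here is a minimal in-memory sketch of suffix array search. It is not how Sedue works internally (the talk's implementation is not shown here); the function names and the naive construction are my own, for illustration only.

```python
def build_suffix_array(text):
    """Return the start offsets of all suffixes of text, in sorted order.

    Naive construction that sorts the suffixes directly; fine for a demo,
    far too slow for real corpora.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find(text, sa, pattern):
    """Binary-search the suffix array for pattern.

    Returns one offset where pattern occurs in text, or -1 if absent.
    """
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        # Each probe reads text at sa[mid], an offset unrelated to the
        # previous probe: this is the random access pattern.
        if text[sa[mid]:sa[mid] + len(pattern)] < pattern:
            lo = mid + 1
        else:
            hi = mid
    if lo < len(sa) and text.startswith(pattern, sa[lo]):
        return sa[lo]
    return -1


text = "banana"
sa = build_suffix_array(text)
print(sa, find(text, sa, "nan"))
```

Every step of the binary search jumps to a different place in the text. On a hard disk each jump could cost a seek, whereas on an SSD the jumps are cheap.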
A demonstration site for Sedue on SSD has been public for a little while:
- Worlds' Wikipedia Search by ONE Solid State Drive (a search engine over Wikipedia in all languages)
Until now, search indexes on secondary storage have almost inevitably been built on disk-friendly data structures such as B-trees. With SSDs, the range of data structures that are practical to use widens, and depending on the application, dramatic improvements become possible.
As SSDs spread, I expect we will see more and more software make surprising improvements. When that happens, I think it is important not to conclude simply that "it got faster because it moved to SSD," but to look at which data structure was chosen and how its characteristics matched those of the SSD.
è¿½è¨ (2009å¹´4æ15æ¥)
NAND ãã©ãã·ã¥ã¯ HDD ã¨æ¯ã¹ãã¨ã©ã³ãã ã¢ã¯ã»ã¹æ§è½ã¯ãã£ã¨é«ããã©ããã£ã±ããããã¯ããã¤ã¹ã¨ãã¦ããæ±ããªã (åè: NANDåãã©ãã·ã¥ã¡ã¢ãª - Wikipedia ã¨ã) ã®ã§ãã»ã¨ãã©ã®å ´å B+-tree ã使ãããã£ã¦ã®ã¯å¤ãããªãããããªããã¨ã
id:kazuhooku ããããè¿ä¿¡ãããã ãã¾ããããããã¨ããããã¾ããå°ãç ½ãããã¾ããããåçã
Addendum #2 (April 16, 2009, 14:30)
id:taroleo has put together a summary of recent papers. (Eight papers read in a single morning... impressive.) Thank you.
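Python source for the B-tree implementation
Below is a B-tree implementation written with reference to Chapter 18 of Introduction to Algorithms. It is a practice implementation, so it makes no attempt to place the tree on disk. It is a simple implementation that only reproduces the behavior of a multiway balanced tree.
The book gives pseudocode for search and insert, but the pseudocode for delete is left as an exercise without a solution, so it is not printed. I implemented it while reading the explanation, but it has not been verified thoroughly, so there may be bugs. In the pseudocode, the number of keys in each node and the positions of the children are all managed in global tables, whereas this implementation keeps them inside the class.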
#!/usr/bin/env python3
import random
import sys


class BTree:
    """B-tree of minimum degree t: every node except the root holds
    between t - 1 and 2t - 1 keys."""

    def __init__(self, t=2):
        self.t = t
        self.root = BTree.Node(t)
        self.root.is_leaf = True

    def insert(self, k):
        r = self.root
        if len(r) == 2 * self.t - 1:
            # The root is full: grow the tree by one level before descending.
            s = BTree.Node(self.t)
            s.children.append(r)
            s.split_child(0, r)
            s.insert_nonfull(k)
            self.root = s
        else:
            r.insert_nonfull(k)

    def delete(self, k):
        r = self.root
        if r.search(k) is None:
            return
        r.delete(k)
        # An internal root left without keys has exactly one child: shrink.
        if len(r) == 0 and not r.is_leaf:
            self.root = r.children[0]

    def search(self, k):
        return self.root.search(k)

    def show(self):
        self.root.show(1)

    class Node:
        def __init__(self, t):
            self.t = t
            self.keys = []
            self.children = []
            self.is_leaf = False

        def __len__(self):
            return len(self.keys)

        def search(self, k):
            """Return (node, index) for k, or None if k is absent."""
            i = 0
            while i < len(self) and self.keys[i] < k:
                i += 1
            if i < len(self) and self.keys[i] == k:
                return (self, i)
            if self.is_leaf:
                return None
            return self.children[i].search(k)

        def split_child(self, i, y):
            """Split the full child y = children[i] around its median key."""
            t = self.t
            z = BTree.Node(t)
            z.is_leaf = y.is_leaf
            z.keys = y.keys[t:]
            if not y.is_leaf:
                z.children = y.children[t:]
            self.children.insert(i + 1, z)
            self.keys.insert(i, y.keys[t - 1])
            y.keys = y.keys[:t - 1]
            y.children = y.children[:t]

        def locate_subtree(self, k):
            """Index of the first key greater than k (= child to descend into)."""
            i = 0
            while i < len(self) and k >= self.keys[i]:
                i += 1
            return i

        def insert_nonfull(self, k):
            if self.is_leaf:
                self.keys.insert(self.locate_subtree(k), k)
            else:
                i = self.locate_subtree(k)
                c = self.children[i]
                if len(c) == 2 * self.t - 1:
                    self.split_child(i, c)
                    if k > self.keys[i]:
                        c = self.children[i + 1]
                c.insert_nonfull(k)

        def show(self, pad):
            print("%s%s" % ("-" * pad, self.keys))
            for c in self.children:
                c.show(pad + 1)

        def max_key(self):
            node = self
            while not node.is_leaf:
                node = node.children[-1]
            return node.keys[-1]

        def min_key(self):
            node = self
            while not node.is_leaf:
                node = node.children[0]
            return node.keys[0]

        def merge_children(self, i):
            """Merge children[i], keys[i] and children[i + 1] into children[i]."""
            left, right = self.children[i], self.children.pop(i + 1)
            left.keys += [self.keys.pop(i)] + right.keys
            left.children += right.children

        ## Follows the case analysis in the book; not exhaustively verified.
        def delete(self, k):
            t = self.t
            if k in self.keys:
                # k was found in the node x currently being examined.
                i = self.keys.index(k)
                if self.is_leaf:
                    ## 1. The key is in x and x is a leaf.
                    self.keys.pop(i)
                ## 2. The key is in x and x is an internal node.
                elif len(self.children[i]) >= t:
                    ## 2a. Replace k by its predecessor, then delete that.
                    pred = self.children[i].max_key()
                    self.keys[i] = pred
                    self.children[i].delete(pred)
                elif len(self.children[i + 1]) >= t:
                    ## 2b. Replace k by its successor, then delete that.
                    succ = self.children[i + 1].min_key()
                    self.keys[i] = succ
                    self.children[i + 1].delete(succ)
                else:
                    ## 2c. Both neighbours hold t - 1 keys: merge, then recurse.
                    self.merge_children(i)
                    self.children[i].delete(k)
                return
            if self.is_leaf:
                return
            ## 3. k is not in x => pick the root c of the subtree holding k.
            i = self.locate_subtree(k)
            c = self.children[i]
            if len(c) < t:
                if i > 0 and len(self.children[i - 1]) >= t:
                    ## 3a. c has only t - 1 keys but the left sibling has >= t.
                    left = self.children[i - 1]
                    c.keys.insert(0, self.keys[i - 1])    # separator moves down
                    self.keys[i - 1] = left.keys.pop()    # sibling's last key up
                    if not left.is_leaf:
                        c.children.insert(0, left.children.pop())
                elif i < len(self) and len(self.children[i + 1]) >= t:
                    ## 3a. Same, borrowing from the right sibling.
                    right = self.children[i + 1]
                    c.keys.append(self.keys[i])           # separator moves down
                    self.keys[i] = right.keys.pop(0)      # sibling's first key up
                    if not right.is_leaf:
                        c.children.append(right.children.pop(0))
                elif i > 0:
                    ## 3b. Both siblings hold only t - 1 keys: merge with left.
                    self.merge_children(i - 1)
                    c = self.children[i - 1]
                else:
                    ## 3b. No left sibling: merge with the right one.
                    self.merge_children(i)
            c.delete(k)


if __name__ == "__main__":
    ## Example: a BTree with t = 5 holding 0..99 inserted in random order.
    tree = BTree(5)
    keys = list(range(100))
    random.shuffle(keys)
    for x in keys:
        tree.insert(x)
    tree.show()
    if len(sys.argv) > 1:
        result = tree.search(int(sys.argv[1]))
        if result:
            print("HIT: %d" % result[0].keys[result[1]])
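The output of a run is shown below (the keys are inserted in random order, so the exact shape differs from run to run). Many other books define a tree of order m as one whose nodes have m/2 to m children, but Introduction to Algorithms instead uses an order t for which each node has t to 2t children.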
% python btree.py 10
-[58]
--[5, 14, 21, 30, 36, 42, 48]
---[0, 1, 2, 3, 4]
---[6, 7, 8, 9, 10, 11, 12, 13]
---[15, 16, 17, 18, 19, 20]
---[22, 23, 24, 25, 26, 27, 28, 29]
---[31, 32, 33, 34, 35]
---[37, 38, 39, 40, 41]
---[43, 44, 45, 46, 47]
---[49, 50, 51, 52, 53, 54, 55, 56, 57]
--[65, 73, 78, 86, 92]
---[59, 60, 61, 62, 63, 64]
---[66, 67, 68, 69, 70, 71, 72]
---[74, 75, 76, 77]
---[79, 80, 81, 82, 83, 84, 85]
---[87, 88, 89, 90, 91]
---[93, 94, 95, 96, 97, 98, 99]
HIT: 10
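The two conventions for "order" can be compared side by side. The sketch below is illustrative only: the function names are mine, and the order-m version assumes the common definition in which a non-root node has between ceil(m/2) and m children.

```python
def clrs_bounds(t):
    """(min, max) keys and children of a non-root node for minimum degree t."""
    return {"keys": (t - 1, 2 * t - 1), "children": (t, 2 * t)}


def order_m_bounds(m):
    """(min, max) keys and children of a non-root node under 'order m'."""
    min_children = (m + 1) // 2  # ceil(m / 2)
    return {"keys": (min_children - 1, m - 1), "children": (min_children, m)}


print(clrs_bounds(5))
print(order_m_bounds(10))
```

For even m = 2t the two describe the same node sizes. With t = 5 a non-root node holds 4 to 9 keys, which matches the sample output above, where the non-root nodes have between 4 and 9 keys.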
A bit of trivia
People often ask what the B in B-tree stands for.
Rudolf Bayer and Ed McCreight invented the B-tree while working at Boeing[1], but did not explain what, if anything, the B stands for.
So it seems nobody really knows. As the rest of that passage also mentions, quite a few people apparently suspect that it was named after Bayer, as in "Bayer tree."
References
- Charles E. Leiserson, Ronald L. Rivest, Clifford Stein, Thomas H. Cormen; translated by Tetsuo Asano, Kazuo Iwano, Hiroshi Umeo, Masafumi Yamashita, Koichi Wada, 「アルゴリズムイントロダクション 改訂2版 第2巻 アルゴリズムの設計と解析手法」 (Introduction to Algorithms, revised 2nd edition, Vol. 2), 近代科学社, 2007
- Yoshiyuki Kondo, 「定本 Cプログラマのためのアルゴリズムとデータ構造」 (Algorithms and Data Structures for C Programmers), ソフトバンククリエイティブ, 1998
- Kiyoshi Ishihata, 「アルゴリズムとデータ構造」 (Algorithms and Data Structures), 岩波書店, 1989
*1: An improved B-tree that keeps no data in internal nodes and stores data only in the leaves, which maximizes the density of keys in the internal nodes.