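# Chapter 3: Storage and Retrieval

![](img/ch3.png)

> Keep things in order and you spare yourself the search.
>
> German proverb

-------------------

[TOC]

At the most basic level a database has to do two things: when you hand it data, it should store that data, and when you later ask for the data, it should hand it back.

In [Chapter 2](ch2.md) we discussed data models and query languages, that is, the format in which the application developer gives data to the database and the mechanism for asking for it again. This chapter looks at the same questions from the database's side: how it stores the data we give it, and how it finds that data again when we need it.

Why should an application developer care how the database handles storage and retrieval internally? You are unlikely to write your own storage engine from scratch, but you **do** have to pick a suitable one from the many that are available. To make the chosen engine perform well on your kind of workload, you also need a rough idea of what it is doing under the hood.

In particular, storage engines optimized for **transactional** workloads differ greatly from those optimized for **analytical** workloads. We return to that distinction in "[Transaction Processing or Analytics?](#transaction-processing-or-analytics)", and in "[Column-Oriented Storage](#column-oriented-storage)" we discuss a family of storage engines optimized for analytics.

First, though, we begin with the **storage engines** used by the two kinds of database you probably already know: traditional relational databases and many of the so-called "NoSQL" databases. We examine two families: **log-structured** storage engines and **page-oriented** storage engines such as B-trees.

## Data Structures That Power Your Database

The simplest database in the world can be written as two Bash functions:

```bash
#!/bin/bash

# Append "key,value" as a new line at the end of the database file.
db_set () {
    echo "$1,$2" >> database
}

# Select every line for the key, strip the "key," prefix, keep the last match.
db_get () {
    grep "^$1," database | sed -e "s/^$1,//" | tail -n 1
}
```

These two functions implement a key-value store. Running `db_set key value` stores the **key** and **value** in the database. Key and value can be (almost) anything you like; the value could be a JSON document, for example. Calling `db_get key` then looks up the most recent value associated with that key and returns it.

Small as it is, it has everything it needs:

```bash
$ db_set 123456 '{"name":"London","attractions":["Big Ben","London Eye"]}'

$ db_set 42 '{"name":"San Francisco","attractions":["Golden Gate Bridge"]}'

$ db_get 42
{"name":"San Francisco","attractions":["Golden Gate Bridge"]}
```

The underlying storage format is very simple: a text file in which every line holds one comma-separated key-value pair (roughly a CSV file, if you ignore escaping). Each call to `db_set` appends a record to the end of the file, so updating a key does not overwrite its old value. To find the latest value you therefore have to locate the last occurrence of the key in the file (which is why `db_get` uses `tail -n 1`).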
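```bash
$ db_set 42 '{"name":"San Francisco","attractions":["Exploratorium"]}'

$ db_get 42
{"name":"San Francisco","attractions":["Exploratorium"]}

$ cat database
123456,{"name":"London","attractions":["Big Ben","London Eye"]}
42,{"name":"San Francisco","attractions":["Golden Gate Bridge"]}
42,{"name":"San Francisco","attractions":["Exploratorium"]}
```

For very simple scenarios `db_set` actually performs quite well, because appending to the end of a file is usually very efficient. Many databases do something similar internally: they use a **log**, meaning an **append-only** data file. Real databases have more to deal with (concurrency control, reclaiming disk space so the log does not grow without bound, handling errors and partially written records), but the basic principle is the same. Logs are extremely useful, and we will meet them several more times in this book.

> The word **log** usually refers to application logs, where a program writes out text describing what is happening. This book uses **log** in a more general sense: an append-only sequence of records. It need not be human-readable at all; it may be in a binary format intended only for other programs to read.

On the other hand, `db_get` performs very badly once the database holds a large number of records. Every lookup has to scan the whole database file from start to end, searching for occurrences of the key. In algorithmic terms the cost of a lookup is `O(n)`: if the number of records n doubles, the lookup takes twice as long. That is not good.

To find the value for a particular key efficiently we need another data structure: an **index**. This chapter introduces a range of index structures and compares them. The general idea behind an index is to keep some extra metadata that acts as a signpost and helps you locate the data you want. If you want to search the same data in several different ways, you may need several indexes on different parts of the data.

An index is an **additional** structure derived from the primary data. Many databases let you add and remove indexes; this does not affect the contents of the database, only the performance of queries. Maintaining the extra structures has a cost, particularly on writes. It is hard to beat the write performance of simply appending to a file, since that is the simplest write there is. An index of any kind usually slows writes down, because it has to be updated every time data is written.

This is an important trade-off in storage systems: well-chosen indexes speed up read queries, but every index slows down writes. For that reason databases do not index everything by default. Instead they require you (the application developer or database administrator, DBA) to choose indexes by hand, based on your knowledge of the application's typical query patterns. You can then pick the indexes that bring the application the greatest benefit without adding more overhead than necessary.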
### Hash Indexes

Let's begin with indexes for **key-value data**. It is not the only kind of data you can index, but it is very common, and it is a useful building block for more complex indexes.

Key-value stores closely resemble the **dictionary** type found in most programming languages, which is usually implemented as a **hash map** (or **hash table**). Hash maps are described in many algorithms textbooks [1,2], so we will not go into how they work. Since we can already use a hash map for data structures **in memory**, why not use one to index data **on disk**?

Suppose our data storage is nothing but an append-only file, as in the previous example. The simplest indexing strategy is then to keep an in-memory hash map in which every key maps to a byte offset in the data file, namely the position where the corresponding value can be found, as shown in [Figure 3-1](img/fig3-1.png). Whenever you append a new key-value pair to the file, you also update the hash map to reflect the offset of the data just written (this applies both to inserting new keys and to updating existing ones). To look up a value, use the hash map to find the offset in the data file, **seek** to that position, and read the value.

![](img/fig3-1.png)

**Figure 3-1: A log of key-value pairs stored in a CSV-like format, indexed with an in-memory hash map.**

It sounds simplistic, but it is a workable approach. In fact this is essentially what Bitcask (the default storage engine in Riak) does [3]. Bitcask offers high-performance reads and writes, on the condition that all keys fit in available memory, because the hash map is held entirely in memory. The values can take up more space than there is memory, because the required part can be loaded from disk with a single seek. If that part of the data file is already in the filesystem cache, the read needs no disk I/O at all.

A storage engine such as Bitcask suits situations where the value for each key is updated often. For example, the key might be the URL of a cat video and the value the number of times it has been played (incremented whenever someone hits the play button). In this kind of workload there are many writes but not many distinct keys: each key is written many times, yet keeping all keys in memory is feasible.

So far we only ever append to one file, so how do we avoid eventually running out of disk space? A good solution is to break the log into segments of a certain size: when the log reaches that size, close the current segment file and start writing to a new one. We can then perform **compaction** on these segments, as shown in [Figure 3-2](img/fig3-2.png). Compaction here means discarding duplicate keys in the log and keeping only the most recent update for each key.

![](img/fig3-2.png)

**Figure 3-2: Compaction of a key-value update log (counting how often each cat video was played), keeping only the most recent value for each key.**
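The offset-based hash index and the compaction step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Bitcask's actual implementation: the names `HashIndexedLog` and `compact` are hypothetical, records use the same CSV-like format as the Bash example, keys are assumed to contain no commas or newlines, and values are assumed to contain no newlines.

```python
import os


class HashIndexedLog:
    """Append-only key-value log plus an in-memory map of key -> byte offset."""

    def __init__(self, path):
        self.path = path
        self.index = {}  # key -> offset of that key's most recent record
        open(path, "ab").close()  # create the file if it does not exist yet
        # Assumption: we start from an empty file. A real engine would
        # rebuild the index from an existing file on startup.

    def set(self, key, value):
        record = f"{key},{value}\n".encode("utf-8")
        offset = os.path.getsize(self.path)  # the new record starts at the current end
        with open(self.path, "ab") as f:
            f.write(record)
        self.index[key] = offset  # works for both new and updated keys

    def get(self, key):
        offset = self.index.get(key)
        if offset is None:
            return None
        with open(self.path, "rb") as f:
            f.seek(offset)  # one seek, then read a single record
            line = f.readline().decode("utf-8").rstrip("\n")
        return line.split(",", 1)[1]


def compact(src_path, dst_path):
    """Write a new segment that keeps only the latest value of every key."""
    latest = {}
    with open(src_path, "rb") as f:
        for raw in f:
            key, value = raw.decode("utf-8").rstrip("\n").split(",", 1)
            latest[key] = value  # later records replace earlier ones
    compacted = HashIndexedLog(dst_path)
    for key, value in latest.items():
        compacted.set(key, value)
    return compacted
```

A lookup here costs one dictionary access and one seek instead of a scan of the whole file. For brevity `compact` holds the surviving values in memory, which a real engine working on large segments would avoid.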
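Moreover, since compaction often makes segments much smaller (assuming a key is overwritten several times on average within one segment), we can also merge several segments while compacting them, as shown in [Figure 3-3](img/fig3-3.png). Segments are never modified once written, so the merged segment goes into a new file. Merging and compacting frozen segments can happen in a background thread, and while it runs we keep serving reads from the old segment files. Once the merge has finished, we switch read requests over to the new merged segment, and the old segment files can simply be deleted.

![](img/fig3-3.png)

**Figure 3-3: Performing compaction and segment merging at the same time**

Every segment now has its own in-memory hash table that maps keys to file offsets. To find the value for a key we first check the hash map of the most recent segment; if the key is not there, we check the second most recent segment, and so on. Merging keeps the number of segments small, so a lookup does not have to check many hash maps.

Making this simple idea work in practice involves a lot of detail. Briefly, these are some of the issues a real implementation has to consider:

* File format

  CSV is not the best format for a log. A binary format is faster and simpler: first encode the length of the string in bytes, then write the raw string (no escaping needed).

* Deleting records

  To delete a key and its associated value, you must append a special deletion record to the data file (a logical delete, sometimes called a tombstone). When log segments are merged, the tombstone tells the merge process to discard all earlier values for the deleted key.

* Crash recovery

  If the database is restarted, the in-memory hash maps are lost. In principle you can restore each segment's hash map by reading the whole segment file from start to end and noting where the most recent value of each key is. With large segment files this can take a long time, which makes restarts painful. Bitcask speeds up recovery by storing a snapshot of each segment's hash map on disk, so that the map can be loaded into memory faster.

* Partially written records

  The database may crash at any moment, including halfway through appending a record to the log. Bitcask files contain checksums, which allow such corrupted parts of the log to be detected and ignored.

* Concurrency control

  Because writes are appended to the log in strict sequential order, a common implementation choice is to have a single writer thread. Because data file segments are append-only and otherwise immutable, they can be read by several threads at once.

At first sight an append-only log looks wasteful: why not update the file in place, overwriting the old value with the new one? There are several reasons why the append-only design is a good one:

* Appending and segment merging are sequential writes, which are usually much faster than random writes, especially on magnetic spinning disks. To some extent sequential writes are also preferable on flash-based **solid-state drives (SSDs)** [4]. We discuss this further in "[Comparing B-Trees and LSM-Trees](#comparing-b-trees-and-lsm-trees)".
* Concurrency and crash recovery are much simpler when segment files are append-only or immutable. For example, you do not have to worry about a crash that happens while a value is being updated and leaves a file containing part of the old value and part of the new one.
* Merging old segments prevents data files from becoming fragmented over time.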
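Several of the concerns listed above (a length-prefixed binary format, tombstones, checksums for partially written records, and replaying a segment after a crash) fit into one short sketch. The record layout below is an assumption made for illustration and is not Bitcask's real file format; `encode_record`, `read_records`, and `replay_log` are hypothetical names.

```python
import struct
import zlib

# Hypothetical record layout (an assumption, not Bitcask's actual format):
#   4-byte CRC32 | 4-byte key length | 4-byte signed value length | key | value
# A value length of -1 marks a tombstone (deletion record).
HEADER = struct.Struct(">IIi")


def encode_record(key, value):
    """Encode one record; pass value=None to write a tombstone."""
    value_len = -1 if value is None else len(value)
    body = struct.pack(">Ii", len(key), value_len) + key + (value or b"")
    return struct.pack(">I", zlib.crc32(body)) + body


def read_records(log):
    """Yield (key, value) pairs, stopping at the first truncated or corrupt record."""
    pos = 0
    while pos + HEADER.size <= len(log):
        crc, key_len, value_len = HEADER.unpack_from(log, pos)
        end = pos + HEADER.size + key_len + max(value_len, 0)
        if end > len(log) or zlib.crc32(log[pos + 4:end]) != crc:
            break  # partially written or damaged tail: ignore the rest
        key_start = pos + HEADER.size
        key = log[key_start:key_start + key_len]
        value = None if value_len < 0 else log[key_start + key_len:end]
        yield key, value
        pos = end


def replay_log(log):
    """Recover the latest value per key by replaying the log, honoring tombstones."""
    latest = {}
    for key, value in read_records(log):
        if value is None:
            latest.pop(key, None)  # tombstone: forget every earlier value
        else:
            latest[key] = value
    return latest
```

Because every string is preceded by its length, keys and values may contain any bytes and need no escaping. For brevity `replay_log` recovers values; an index as described in the text would record each record's offset instead.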
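The hash table index has its limitations too:

* The hash table must fit in memory, so if you have a very large number of keys you are out of luck. In principle you could maintain a hash map on disk, but it is hard to make an on-disk hash map perform well: it needs a lot of random-access I/O, it is expensive to grow when it fills up, and handling hash collisions takes fiddly logic [5].
* Range queries are inefficient. For example, you cannot easily scan all keys between kitty00000 and kitty99999; you would have to look up each key individually in the hash maps.

In the next section we look at an index structure that does not have these limitations.

### SSTables and LSM-Trees

In [Figure 3-3](img/fig3-3.png), every log-structured storage segment is a sequence of key-value pairs. The pairs appear in the order in which they were written, and a later value in the log takes precedence over earlier values for the same key. Apart from that, the order of key-value pairs in the file does not matter.

Now we make one simple change to the segment file format: we require the sequence of key-value pairs to be sorted by key. At first sight this seems to break our ability to use sequential writes, but we will come back to that shortly.

We call this format a **Sorted String Table**, or SSTable for short. We also require that each key appears only once within each merged segment file (compaction already guarantees this). SSTables have several big advantages over log segments with hash indexes:

1. Merging segments is simple and efficient even when the files are larger than available memory. The approach resembles the one used in mergesort, as shown in [Figure 3-4](img/fig3-4.png): you read the input files side by side, look at the first key in each file, copy the lowest key (according to the sort order) to the output file, and repeat. The result is a new merged segment file that is also sorted by key.

   ![](img/fig3-4.png)

   **Figure 3-4: Merging several SSTable segments, keeping only the most recent value for each key**

   What if the same key appears in several input segments? Remember that each segment contains all the values written to the database during some period of time. All values in one input segment must therefore be more recent than all values in the other (assuming we always merge adjacent segments). When several segments contain the same key, we keep the value from the most recent segment and discard the values in the older ones.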
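2. To find a particular key in the file you no longer need an in-memory index of all keys. Take [Figure 3-5](img/fig3-5.png) as an example: suppose you are looking for the key `handiwork` but do not know its exact offset in the segment file. You do know the offsets of `handbag` and `handsome`, and because the file is sorted you know that `handiwork` must lie between the two. So you can jump to the offset of `handbag` and scan from there until you find `handiwork` (or establish that the key is not in the file).

   ![](img/fig3-5.png)

   **Figure 3-5: An SSTable with an in-memory index**

   You still need an in-memory index that tells you the offsets of some keys, but it can be sparse: one key for every few kilobytes of segment file is enough, because a few kilobytes can be scanned very quickly[^i].

[^i]: If all keys and values had a fixed length, you could binary-search the segment file and avoid the in-memory index altogether. In practice keys and values are usually variable-length, so without an index it is hard to tell where one record ends and the next begins.

3. Since a read request has to scan several key-value pairs in the requested range anyway, those records can be grouped into a block that is compressed before being written to disk (the shaded area in [Figure 3-5](img/fig3-5.png))[^译注i]. Each entry of the sparse in-memory index then points to the start of a compressed block. Besides saving disk space, compression also reduces the I/O bandwidth used.

[^译注i]: Translator's note: "compression" here means data compression, not the compaction discussed earlier; keep the two apart.

#### Constructing and Maintaining SSTables

So far so good, but how do you get your data sorted by key in the first place? Incoming writes can arrive in any order.

Maintaining a sorted structure on disk is possible (see "[B-Trees](#b-trees)"), but keeping it in memory is much easier. There are plenty of well-known tree data structures to choose from, such as red-black trees or AVL trees [2]. With these you can insert keys in any order and read them back in sorted order.

We can now make our storage engine work as follows:

* When a write comes in, add it to an in-memory balanced tree (a red-black tree, for example). This in-memory tree is sometimes called a **memtable**.
* When the **memtable** grows beyond some threshold, typically a few megabytes, write it out to disk as an SSTable file. This is efficient because the tree already keeps the key-value pairs sorted by key. The new SSTable file becomes the most recent segment of the database. While it is being written to disk, new writes can continue in a fresh memtable instance.
* To serve a read, first look for the key in the memtable, then in the most recent on-disk segment, then in the next older segment, and so on.
* From time to time, run a merge and compaction process in the background to combine segment files and discard overwritten or deleted values.

This scheme works very well. It has only one problem: if the database crashes, the most recent writes (those in the memtable that have not yet been written to disk) are lost. To avoid this we can keep a separate log on disk to which every write is appended immediately, just as in the previous section. That log is not sorted, but this does not matter, because its only purpose is to restore the memtable after a crash. Each time the memtable is written out to an SSTable, the corresponding log can be discarded.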
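The write path, read path, and background merge described above can be sketched as a toy in Python. Everything here is an assumption made for illustration: `TinyLSM` is a hypothetical name, a plain dictionary sorted at flush time stands in for the balanced tree, sorted in-memory lists stand in for SSTable files, and the crash-recovery log is left out.

```python
import bisect
import heapq

TOMBSTONE = object()  # marker value for deleted keys


class TinyLSM:
    def __init__(self, memtable_limit=2):
        self.memtable = {}  # stand-in for a red-black tree
        self.segments = []  # oldest first; each is a key-sorted list of (key, value)
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def delete(self, key):
        self.put(key, TOMBSTONE)

    def flush(self):
        """Write the memtable out as a new sorted segment (the newest one)."""
        if self.memtable:
            self.segments.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            value = self.memtable[key]
            return None if value is TOMBSTONE else value
        for segment in reversed(self.segments):  # newest segment first
            i = bisect.bisect_left(segment, (key,))  # binary search on sorted keys
            if i < len(segment) and segment[i][0] == key:
                value = segment[i][1]
                return None if value is TOMBSTONE else value
        return None

    def compact(self):
        """Merge all segments mergesort-style, keeping the newest value per key."""
        n = len(self.segments)
        # Tag entries with the segment's age so that, for equal keys,
        # the entry from the newest segment is produced first.
        tagged = (
            [(key, n - idx, value) for key, value in segment]
            for idx, segment in enumerate(self.segments)
        )
        merged = []
        for key, _, value in heapq.merge(*tagged, key=lambda e: (e[0], e[1])):
            if merged and merged[-1][0] == key:
                continue  # an older value for a key we already kept
            merged.append((key, value))
        # Every segment took part in the merge, so tombstones can be dropped.
        self.segments = [[(k, v) for k, v in merged if v is not TOMBSTONE]]
```

The sketch searches whole segments with binary search because they sit in memory; an on-disk SSTable would use the sparse index and a short scan instead, as described for Figure 3-5.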
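#### Making an LSM-Tree out of SSTables

The algorithm described here is essentially what LevelDB [6] and RocksDB [7] use; both are key-value storage engine libraries designed to be embedded in other applications. Among other things, LevelDB can be used in Riak as an alternative to Bitcask. Similar storage engines are used in Cassandra and HBase [8], both of which were inspired by Google's Bigtable paper [9] (which introduced the terms SSTable and memtable).

This index structure was originally described by Patrick O'Neil and others under the name Log-Structured Merge-Tree (LSM-Tree) [10], building on earlier work on log-structured filesystems [11]. Storage engines based on this principle of merging and compacting sorted files are usually called LSM storage engines.

Lucene, the full-text indexing engine used by Elasticsearch and Solr, stores its term dictionary in a similar way [12,13]. A full-text index is far more complex than a key-value index, but it rests on a similar idea: given a word in a search query, find all documents (web pages, product descriptions, and so on) that mention the word. This is implemented as a key-value structure in which the key is a word (a **term**) and the value is the list of IDs of all documents containing that word (the postings list). In Lucene this mapping from term to postings list is kept in SSTable-like sorted files, which are merged in the background as needed [14].

#### Performance Optimizations

As always, making a storage engine perform well in practice involves many design details. For example, the LSM-tree algorithm can be slow when looking up keys that do not exist in the database: you have to check the memtable and then every segment from the newest back to the oldest (possibly reading each one from disk) before you can be sure the key is absent. To optimize this kind of access, storage engines often use additional Bloom filters [15]. (A Bloom filter is a memory-efficient data structure for approximating the contents of a set. It can tell you that a key does not exist in the database, which saves many unnecessary disk reads for nonexistent keys.)

There are also different strategies for deciding the order and timing in which SSTables are compacted and merged. The most common choices are size-tiered and leveled compaction. LevelDB and RocksDB use leveled compaction (hence LevelDB's name), HBase uses size-tiered compaction, and Cassandra supports both [16]. In size-tiered compaction, newer and smaller SSTables are successively merged into older and larger ones. In leveled compaction, the key range is split into smaller SSTables and older data is moved into separate levels, which lets compaction proceed more incrementally and use less disk space.

Despite the many subtleties, the basic idea of LSM-trees (keeping a cascade of SSTables that are merged in the background) is simple and effective. It keeps working even when the dataset is much larger than available memory. Because the data is stored in sorted order, range queries (scanning all keys between some minimum and some maximum) are efficient, and because the disk writes are sequential, an LSM-tree can sustain very high write throughput.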
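To make the Bloom filter idea above concrete, here is a minimal sketch. It does not reproduce how LevelDB, RocksDB, or any other engine implements its filters; the class name `BloomFilter`, the default sizes, and the use of salted SHA-256 digests as hash functions are all assumptions chosen for readability rather than speed.

```python
import hashlib


class BloomFilter:
    """Approximate set membership: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive several bit positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # False means the key was definitely never added;
        # True means it probably was (or its bits collide with other keys).
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))
```

A storage engine would keep one such filter per segment and skip reading a segment from disk whenever `might_contain` returns `False`.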
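### B-Trees

The log-structured indexes discussed so far are steadily gaining acceptance, but they are not the most common type of index. The most widely used index structure is quite different: the B-tree, which we look at next.

Introduced in 1970 [17] and described as "ubiquitous" less than ten years later [18], B-trees have stood the test of time well. They remain the standard index implementation in almost all relational databases, and many non-relational databases use them as well.

Like SSTables, B-trees keep key-value pairs sorted by key, which allows efficient key lookups and range queries. That is where the similarity ends: B-trees follow a very different design philosophy.

The log-structured indexes we saw earlier break the database into variable-size segments, typically several megabytes or more, and always write a segment sequentially. B-trees instead break the database into fixed-size blocks or pages, traditionally 4 KB in size (sometimes larger), and read or write one page at a time. This design is closer to the underlying hardware, since disk space is also organized in fixed-size blocks.

Each page can be identified by an address or location, which lets one page refer to another, much like a pointer but on disk rather than in memory. We can use these page references to build a tree of pages, as shown in [Figure 3-6](img/fig3-6.png).

![](img/fig3-6.png)

**Figure 3-6: Looking up a key using a B-tree index**

One page is designated as the root of the B-tree, and every lookup in the index starts there. The page contains several keys and references to child pages. Each child page is responsible for a contiguous range of keys, and the keys between the references indicate where the boundaries of those ranges lie.

In the example in [Figure 3-6](img/fig3-6.png) we are looking for the key 251, so we know we must follow the page reference between the boundaries 200 and 300. This leads to a similar page that splits the range from 200 to 300 into sub-ranges.

Eventually we reach a page containing individual keys (a leaf page). It either holds the value for each key directly or holds references to the pages where the values can be found.

The number of references to child pages in one page of a B-tree is called the branching factor. In [Figure 3-6](img/fig3-6.png), for example, the branching factor is 6. In practice the branching factor depends on the space needed to store page references and range boundaries, but it is typically several hundred.

To update the value of an existing key in a B-tree, you search for the leaf page containing that key, change the value in that page, and write the page back to disk (all references to the page remain valid). To add a new key, you find the page whose range covers the new key and add it there. If the page does not have enough free space for the new key, it is split into two half-full pages and the parent page is updated to reflect the new division of key ranges, as shown in [Figure 3-7](img/fig3-7.png)[^ii].

![](img/fig3-7.png)

**Figure 3-7: Growing a B-tree by splitting a page**

[^ii]: Inserting a new key into a B-tree is fairly intuitive, but deleting a key (while keeping the tree balanced) is considerably more involved [2].

This algorithm ensures that the tree stays balanced: a B-tree with n keys always has a depth of $O(\log n)$. Most databases fit into a B-tree that is three or four levels deep, so you do not need to follow many page references to find the page you are looking for (a four-level tree of 4 KB pages with a branching factor of 500 can store up to 256 TB).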
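The 256 TB figure quoted above can be checked with a little arithmetic. The reading used here is an assumption about how the number is meant: four levels of fan-out 500 are taken to reach 500^4 pages of 4 KB (4,096 bytes) each. The function names are hypothetical.

```python
def btree_capacity_bytes(branching_factor, levels, page_size_bytes):
    """Rough upper bound: `levels` hops of fan-out, each landing on one page of data."""
    return branching_factor ** levels * page_size_bytes


def levels_needed(num_pages, branching_factor):
    """Smallest number of levels whose fan-out can reach `num_pages` pages."""
    levels, reachable = 0, 1
    while reachable < num_pages:
        reachable *= branching_factor
        levels += 1
    return levels


capacity = btree_capacity_bytes(branching_factor=500, levels=4, page_size_bytes=4096)
print(capacity // 10**12, "TB")  # prints: 256 TB
```

`levels_needed` shows the logarithmic growth from the other direction: multiplying the number of pages by 500 adds only one level to the tree.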
#### Making B-Trees Reliable

The basic underlying write operation of a B-tree is to overwrite a page on disk with new data. The overwrite is assumed not to change the location of the page, so all references to the page stay intact when it is overwritten. This is in stark contrast to log-structured indexes such as LSM-trees, which only append to files (and eventually delete obsolete files) but never modify existing file contents.

You can think of overwriting a page on disk as an actual hardware operation. On a magnetic hard drive it means moving the disk head to the right place, waiting for the right position on the spinning platter to come around, and then overwriting the appropriate sector with new data. On an SSD what happens is more complicated, because an SSD has to erase and rewrite fairly large blocks of a storage chip at a time [19].

Furthermore, some operations need to overwrite several different pages. For example, if a page is split because an insertion made it too full, you have to write the two new pages and also overwrite their parent page to update the references to the two children. This is a dangerous operation: if the database crashes after only some of the pages have been written, you end up with a corrupted index (there may be an orphan page that is not a child of any parent, for instance).

To let the database cope with crashes, B-tree implementations usually include an additional on-disk data structure: a **write-ahead log** (WAL, also called a **redo log**). This is an append-only file to which every B-tree modification must be written before it may be applied to the pages of the tree itself. When the database recovers after a crash, this log is used to bring the B-tree back to a consistent state [5,20].

Another complication of updating pages in place is that careful concurrency control is needed when several threads access the B-tree at once; otherwise a thread might see the tree in an inconsistent state. This is usually done by protecting the tree's data structures with **latches** (lightweight locks). Log-structured approaches are simpler in this respect, because they do all the merging in the background without interfering with incoming queries and atomically swap old segments for new ones from time to time.

#### B-Tree Optimizations

B-trees have been around for so long that it is no surprise many optimizations have been developed over the years. To name just a few:

* Instead of overwriting pages and maintaining a WAL for crash recovery, some databases (such as LMDB) use a copy-on-write scheme [21]. A modified page is written to a different location, and new versions of its parent pages are created in the tree, pointing to the new location. This approach is also useful for concurrency control, as we will see in "[Snapshot Isolation and Repeatable Read](ch7.md#快照隔离和可重复读)".
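* We can save space in pages by storing an abbreviated form of a key rather than the whole key. Especially in pages in the interior of the tree, keys only need to carry enough information to act as boundaries between key ranges. Packing more keys into a page gives the tree a higher branching factor and therefore fewer levels[^iii].
* In general, pages can be placed anywhere on disk; nothing requires pages with neighboring key ranges to be neighbors on disk. If a query needs to scan a large part of the key range in sorted order, this page-by-page layout can be inefficient, because every page read may need a disk seek. Many B-tree implementations therefore try to lay out the tree so that leaf pages appear in sequential order on disk. As the tree grows, however, that order is hard to maintain. By contrast, since LSM-trees rewrite large parts of the storage in one go during merging, they find it easier to keep sequential keys close together on disk.
* Additional pointers have been added to the tree. For example, each leaf page may reference its sibling pages to the left and right, which allows scanning keys in order without jumping back to parent pages.
* B-tree variants such as fractal trees [22] borrow some log-structured ideas to reduce disk seeks (and they have nothing to do with fractals).

[^iii]: This variant is sometimes called a B+ tree, but the optimization is so widely used that it is often not distinguished from other B-tree variants.

### Comparing B-Trees and LSM-Trees

Although B-tree implementations are generally more mature than LSM-tree implementations, LSM-trees are interesting because of their performance characteristics. As a rule of thumb, LSM-trees are usually faster for writes, whereas B-trees are faster for reads [23]. Reads on LSM-trees are typically slower because they have to check several different data structures and SSTables at different stages of compaction.

However, benchmark results often depend on details of the workload. To make a valid comparison you need to test the systems with your own workload. In this section we briefly discuss a few things worth considering when measuring the performance of a storage engine.

#### Advantages of LSM-Trees

A B-tree index must write every piece of data at least twice: once to the write-ahead log (WAL) and once to the tree page itself (and again if pages are split). Even if only a few bytes in a page have changed, it has to bear the cost of writing the entire page. Some storage engines even overwrite the same page twice, to avoid ending up with a partially updated page after a power failure [24,25].

Log-structured indexes also rewrite data several times because of the repeated compaction and merging of SSTables. This effect, where one write to the database leads to multiple writes to disk over the database's lifetime, is called **write amplification**. It is a particular concern on SSDs, whose flash memory wears out after a limited number of overwrites.

In write-heavy applications the performance bottleneck may be the rate at which the database can write to disk. In that case write amplification has a direct performance cost: the more a storage engine writes to disk, the fewer writes per second it can handle within the available disk bandwidth.

In addition, LSM-trees can usually sustain higher write throughput than B-trees, partly because they sometimes have lower write amplification (although this depends on the storage engine configuration and the workload), and partly because they write compact SSTable files sequentially instead of having to overwrite several pages in the tree [26]. This difference matters most on magnetic hard drives, where sequential writes are much faster than random writes.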
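LSM-trees can be compressed better and therefore usually produce smaller files on disk than B-trees. B-tree storage engines leave some disk space unused because of fragmentation: when a page is split, or when a row does not fit into an existing page, some space in the page remains unused. Since LSM-trees are not page-oriented and periodically rewrite SSTables to remove fragmentation, they have lower storage overhead, especially when leveled compaction is used [27].

On many SSDs the firmware internally uses a log-structured algorithm to turn random writes into sequential writes on the underlying storage chips, so the impact of the storage engine's write pattern is less pronounced [19]. Lower write amplification and less fragmentation are still advantageous on SSDs, however: representing data more compactly allows more read and write requests to be served within the available I/O bandwidth.

#### Downsides of LSM-Trees

A downside of log-structured storage is that compaction can sometimes interfere with ongoing reads and writes. Storage engines try to perform compaction incrementally and without affecting concurrent access, but disk resources are limited, so it can easily happen that a request has to wait while the disk finishes an expensive compaction operation. The effect on throughput and average response time is usually small, but at higher percentiles (see "[Describing Performance](ch1.md#描述性能)") the response time of log-structured storage engines can sometimes be quite long, whereas B-trees behave more predictably [28].

Another compaction issue arises at high write throughput: the disk's finite write bandwidth has to be shared between the initial write (logging and flushing a memtable to disk) and the compaction threads running in the background. When writing to an empty database the full disk bandwidth is available for the initial write, but the larger the database grows, the more disk bandwidth compaction needs.

If write throughput is high and compaction is not configured carefully, compaction may fail to keep up with the rate of incoming writes. The number of unmerged segments on disk then keeps growing until disk space runs out, and reads slow down as well because they have to check more segment files. Typically, SSTable-based storage engines do not throttle incoming writes even when compaction cannot keep up, so you need explicit monitoring to detect this situation [29,30].

An advantage of B-trees is that each key exists in exactly one place in the index, whereas a log-structured storage engine may hold several copies of the same key in different segments. This makes B-trees attractive in databases that want to offer strong transactional semantics: in many relational databases, transaction isolation is implemented with locks on ranges of keys, and in a B-tree index those locks can be attached directly to the tree [5]. We discuss this in more detail in [Chapter 7](ch7.md).

B-trees are deeply ingrained in database architecture and deliver consistently good performance for many workloads, so they are unlikely to disappear any time soon. In new datastores, log-structured indexes are becoming increasingly popular. There is no quick and easy rule for deciding which type of storage engine is better for your use case, so it is worth running some tests to find out empirically.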
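### Other Indexing Structures

So far we have only discussed key-value indexes, which are like the **primary key** index in the relational model. A primary key uniquely identifies one row in a relational table, one document in a document database, or one vertex in a graph database. Other records in the database can refer to that row, document, or vertex by its primary key (or ID), and the index is used to resolve such references.

Secondary indexes are also very common. In relational databases you can create several secondary indexes on the same table with the `CREATE INDEX` command, and they are often crucial for performing joins efficiently. For example, in [Figure 2-1](img/fig2-1.png) in [Chapter 2](ch2.md) you would most likely have a secondary index on the `user_id` columns, so that you can find all rows belonging to the same user in each of the tables.

A secondary index can easily be built from a key-value index. The main difference is that its keys are not unique: there may be many rows (documents, vertices) with the same key. This can be solved in two ways: either make each value in the index a list of matching row identifiers (like a postings list in a full-text index), or make each key unique by appending a row identifier to it. Either way, both B-trees and log-structured indexes can serve as secondary indexes.

#### Storing Values Within the Index

The key in an index is what queries search for, while the value can be one of two things: the actual row (document, vertex) in question, or a reference to a row stored elsewhere. In the latter case the place where the rows are stored is called a **heap file**, and it stores data in no particular order (it may be append-only, or it may keep track of deleted rows so that they can be overwritten with new data later). The heap file approach is common because it avoids duplicating data when there are several secondary indexes: each index only references a location in the heap file, and the actual data is kept in one place.

When a value is updated without changing the key, the heap file approach can be very efficient: as long as the new value needs no more bytes than the old one, the record can be overwritten in place. If the new value is larger, things get more complicated, because it probably has to be moved to a new location in the heap where there is enough space. In that case either all indexes must be updated to point to the record's new heap location, or a forwarding pointer is left behind at the old heap location [5].

In some situations the extra hop from the index to the heap file costs too much read performance, so it can be desirable to store the indexed row directly within the index. This is called a clustered index. In MySQL's InnoDB storage engine, for example, the primary key of a table is always a clustered index, and secondary indexes refer to the primary key (rather than to a heap file location) [31]. In SQL Server you can specify one clustered index per table [32].

A compromise between a **clustered index** (storing all row data within the index) and a **nonclustered index** (storing only references to the data within the index) is known as a **covering index** or **index with included columns**, which stores some of a table's columns within the index [33]. This allows some queries to be answered from the index alone (in which case the index is said to **cover** the query) [32].

As with any kind of data duplication, clustered and covering indexes can speed up reads, but they need additional storage and add overhead on writes. The database also has to make an extra effort to enforce transactional guarantees, because applications should not see inconsistencies caused by the duplication.

#### Multi-Column Indexes

The indexes discussed so far map a single key to a value. That is clearly not enough if we need to query several columns of a table (or several fields of a document) at the same time.

The most common type of multi-column index is the **concatenated index**, which simply combines several fields into one key by appending one column to another (the index definition specifies the order in which the fields are concatenated). It is like an old-fashioned paper phone book, which provides an index from (lastname, firstname) to phone number. Thanks to the sort order, the index can be used to find all people with a particular last name, or all people with a particular combination of last name and first name. If you want to find all people with a particular first name, however, the index is useless.

A **multi-dimensional index** is a more general way of querying several columns at once, which is particularly important for geospatial data. For example, a restaurant-search website may have a database containing the latitude and longitude of every restaurant. When a user looks at restaurants on a map, the website has to find all restaurants within the rectangular map area currently being viewed. That calls for a two-dimensional range query like the following:

```sql
SELECT * FROM restaurants WHERE latitude  > 51.4946 AND latitude  < 51.5079
                            AND longitude > -0.1162 AND longitude < -0.1004;
```

A standard B-tree or LSM-tree index cannot answer this kind of query efficiently: it can return either all restaurants in a range of latitudes (at any longitude) or all restaurants in a range of longitudes (anywhere between the North and South poles), but it cannot satisfy both conditions at once.

One option is to translate a two-dimensional location into a single number using a space-filling curve and then use a regular B-tree index [34]. More commonly, specialized spatial indexes such as R-trees are used. PostGIS, for example, implements geospatial indexes as R-trees using PostgreSQL's Generalized Search Tree (GiST) facility [35]. We do not have room to describe R-trees here, but there is plenty of literature on them.

Interestingly, multi-dimensional indexes are not only for geographic locations. On an e-commerce website, for example, you could use a three-dimensional index on the dimensions (red, green, blue) to search for products within a certain range of colors. In a database of weather observations you could build a two-dimensional index on (date, temperature) to efficiently find all observations from 2013 where the temperature was between 25 and 30°C. With a one-dimensional index you would have to scan all records from 2013 (regardless of temperature) and then filter them by temperature, or the other way round. A two-dimensional index can narrow down the dataset by timestamp and temperature at the same time. This technique is used by HyperDex [36].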
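As an illustration of the space-filling-curve option mentioned above, the following sketch maps a (latitude, longitude) pair to a single integer by interleaving bits, which is known as a Z-order or Morton code. The choice of this particular curve, the 16-bit resolution, and the function names are assumptions made for the example; the text does not say which curve any given system uses.

```python
def interleave_bits(x, y, bits=16):
    """Z-order (Morton) code: x supplies the even bit positions, y the odd ones."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def quantize(value, lo, hi, bits=16):
    """Map a value in [lo, hi] to an integer in [0, 2**bits - 1]."""
    scale = (1 << bits) - 1
    return round((value - lo) / (hi - lo) * scale)


def z_value(latitude, longitude, bits=16):
    """One-dimensional key for a 2D location, usable in an ordinary sorted index."""
    x = quantize(longitude, -180.0, 180.0, bits)
    y = quantize(latitude, -90.0, 90.0, bits)
    return interleave_bits(x, y, bits)
```

Points that are close together in two dimensions often (though not always) receive nearby keys, so a rectangle query turns into one or more range scans over a regular B-tree, followed by filtering out false hits.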
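#### Full-Text Search and Fuzzy Indexes

All the indexes discussed so far assume that you have exact data and let you query for exact values of a key, or for a range of values of a key with a sort order. What they do not let you do is search for similar keys, such as misspelled words. Such fuzzy querying requires different techniques.

For example, full-text search engines commonly allow a search for one word to be expanded to include synonyms of that word, ignore grammatical variations of words, search for occurrences of words near each other in the same document, and support various other features that depend on linguistic analysis of the text. To cope with typos in documents or queries, Lucene can search text for words within a certain edit distance (an edit distance of 1 means that one letter has been added, removed, or replaced) [37].

As mentioned in "[Making an LSM-Tree out of SSTables](#making-an-lsm-tree-out-of-sstables)", Lucene uses an SSTable-like structure for its term dictionary. This structure needs a small in-memory index that tells queries at which offset in the sorted file to look for a key. In LevelDB this in-memory index is a sparse collection of some of the keys, but in Lucene it is a finite state automaton over the characters of the keys, similar to a trie [38]. This automaton can be transformed into a Levenshtein automaton, which supports efficient search for words within a given edit distance [39].

Other fuzzy search techniques go in the direction of document classification and machine learning. For more detail, see an information retrieval textbook such as [40].

#### Keeping Everything in Memory

The data structures discussed so far in this chapter are all answers to the limitations of disks. Compared with main memory, disks are awkward to work with. On both magnetic disks and SSDs, data has to be laid out carefully if you want good read and write performance. We tolerate this awkwardness because disks have two significant advantages: they are durable (their contents are not lost when the power is turned off), and they cost less per gigabyte than RAM.

As RAM gets cheaper, the cost-per-gigabyte argument is eroded. Many datasets are simply not that large, so keeping them entirely in memory, possibly distributed across several machines, is quite feasible. This has led to the development of in-memory databases.

Some in-memory key-value stores, such as Memcached, are intended for caching only, where it is acceptable to lose the data when a machine is restarted. Other in-memory databases aim for durability, which can be achieved with special hardware (such as battery-powered RAM), by writing a log of changes to disk, by writing periodic snapshots to disk, or by replicating the in-memory state to other machines.

When an in-memory database is restarted, it has to reload its state from disk or over the network from a replica (unless special hardware is used). Despite writing to disk it is still an in-memory database, because the disk is used only as an append-only log for durability, and reads are served entirely from memory. Writing to disk also has operational advantages: files on disk can easily be backed up, inspected, and analyzed by external utilities.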
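Products such as VoltDB, MemSQL, and Oracle TimesTen are in-memory databases with a relational model. Their vendors claim that they can offer large performance improvements by removing all the overhead associated with managing on-disk data structures [41,42]. RAMCloud is an open source in-memory key-value store with durability (it uses a log-structured approach for the data in memory as well as the data on disk) [43]. Redis and Couchbase provide weaker durability by writing to disk asynchronously.

Counterintuitively, the performance advantage of in-memory databases does not come from the fact that they need not read from disk. Given enough memory, even a disk-based storage engine may never have to read from disk, because the operating system caches recently used disk blocks in memory. Rather, in-memory databases are faster because they avoid the overhead of encoding in-memory data structures in a form that can be written to disk [44].

Besides performance, another interesting aspect of in-memory databases is that they offer data models that are hard to implement with disk-based indexes. Redis, for example, provides a database-like interface to various data structures such as priority queues and sets. Because it keeps all data in memory, its implementation is comparatively simple.

Recent research suggests that an in-memory database architecture can be extended to support datasets larger than available memory, without bringing back the overhead of a disk-centric architecture [45]. The so-called **anti-caching** approach works by evicting the least recently used data from memory to disk when memory runs short and loading it back into memory when it is accessed again. This resembles what operating systems do with virtual memory and swap files, but the database can manage memory more efficiently than the operating system, because it can work at the granularity of individual records rather than whole memory pages. Even so, this approach still requires the indexes to fit entirely in memory (like the Bitcask example at the beginning of the chapter).

If **non-volatile memory (NVM)** technologies become more widely adopted, further changes to storage engine design will probably be needed [46]. For now this is a new area of research, but one worth keeping an eye on.

## Transaction Processing or Analytics?

In the early days of business data processing, a write to the database typically corresponded to a *commercial transaction*: making a sale, placing an order with a supplier, paying an employee's salary, and so on. As databases spread into areas where no money changes hands, the term **transaction** nevertheless stuck, and it now refers to a group of reads and writes that form a logical unit.

> A transaction does not necessarily have ACID (atomicity, consistency, isolation, and durability) properties. Transaction processing merely means allowing clients to make low-latency reads and writes, as opposed to batch processing jobs, which only run periodically (once a day, for example). We discuss the ACID properties in [Chapter 7](ch7.md) and batch processing in [Chapter 10](ch10.md).

Even as databases came to be used for many different kinds of data (comments on blog posts, actions in a game, contacts in an address book, and so on), the basic access pattern stayed similar to processing business transactions. An application typically uses an index to look up a small number of records by some key, and records are inserted or updated based on the user's input. Because these applications are interactive, this access pattern became known as **online transaction processing (OLTP)**.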
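However, databases are also used more and more for data analytics, which has very different access patterns. An analytic query usually has to scan a huge number of records, reading only a few columns of each, and computes aggregate statistics (such as count, sum, or average) rather than returning the raw data to the user. For example, if your data is a table of sales transactions, analytic queries might be:

* What was the total revenue of each of our stores in January?
* How many more bananas than usual did we sell during our latest promotion?
* Which brand of baby food is most often bought together with brand X diapers?

Such queries are often written by business analysts and feed into reports that help the company's management make better decisions (business intelligence). To distinguish this way of using databases from transaction processing, it has been called **online analytic processing (OLAP)** [47][^iv]. The difference between OLTP and OLAP is not always clear-cut, but some typical characteristics are listed in Table 3-1.

**Table 3-1: Comparing the characteristics of transaction processing and analytic systems**

| Property | Transaction processing systems (OLTP) | Analytic systems (OLAP) |
| :----------: | :--------------------------: | :----------------------: |
| Main read pattern | Small number of records per query, fetched by key | Aggregation over a large number of records |
| Main write pattern | Random access, low-latency writes | Bulk import (ETL) or event stream |
| Primarily used by | End users, via a web application | Internal analysts, for decision support |
| What the data represents | Latest state of the data (current point in time) | History of events over time |
| Dataset size | GB to TB | TB to PB |

[^iv]: The meaning of the O (online) in OLAP is not clear. It may refer to the fact that queries are not only used to produce predefined reports, or to the fact that analysts use the OLAP system interactively for exploratory queries.

At first, the same databases were used for both transaction processing and analytic queries. SQL proved quite flexible in this respect: it works well for OLTP-type queries as well as OLAP-type queries. Nevertheless, in the late 1980s and early 1990s companies tended to stop using their OLTP systems for analytics and to run the analytics on a separate database instead. This separate database was called a **data warehouse**.

### Data Warehousing

An enterprise may have dozens of different transaction processing systems: the customer-facing website, point-of-sale systems in physical stores, inventory tracking in warehouses, vehicle route planning, supply chain management, employee administration, and so on. Each of these systems is complex and needs a dedicated team to maintain it, so the systems end up operating largely independently of one another.

These OLTP systems are often critical to running the business, so they are usually expected to be **highly available** and to respond with **low latency**. DBAs therefore watch over their OLTP databases closely. They are usually reluctant to let business analysts run ad hoc analytic queries on an OLTP database, because such queries tend to be expensive and scan large parts of the dataset, which can hurt the performance of transactions executing at the same time.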
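By contrast, a data warehouse is a separate database that analysts can query as much as they like without affecting OLTP operations [48]. The data warehouse holds a read-only copy of the data in all the company's various OLTP systems. Data is extracted from the OLTP databases (by a periodic data dump or a continuous stream of updates), transformed into a schema suited to analysis, cleaned up, and loaded into the data warehouse. This process of getting data into the warehouse is called **Extract-Transform-Load (ETL)** and is shown in [Figure 3-8](img/fig3-8.png).

![](img/fig3-8.png)

**Figure 3-8: Simplified outline of ETL into a data warehouse**

Almost all large enterprises have a data warehouse, but in small companies they are almost unheard of. This is probably because most small companies do not have that many different OLTP systems, and most of them have only a small amount of data, which can be queried in a conventional SQL database or even analyzed in a spreadsheet. In a large company, something that is simple in a small company takes a lot of heavy lifting.

A big advantage of using a separate data warehouse, rather than querying OLTP systems directly for analytics, is that the warehouse can be optimized for analytic access patterns. It turns out that the indexing algorithms discussed in the first half of this chapter work well for OLTP but are not very good at answering analytic queries. In the rest of this chapter we look at storage engines optimized for analytics.

#### The Divergence Between OLTP Databases and Data Warehouses

The data model of a data warehouse is most commonly relational, because SQL is generally a good fit for analytic queries. Many graphical data analysis tools generate SQL queries, visualize the results, and let analysts explore the data (through operations such as drill-down and slicing and dicing).

On the surface, a data warehouse and a relational OLTP database look similar, because both have a SQL query interface. Internally, however, the systems can look completely different, because they are optimized for very different query patterns. Many database vendors now concentrate on supporting either transaction processing workloads or analytics workloads, but not both.

Some databases, such as Microsoft SQL Server and SAP HANA, support transaction processing and data warehousing in the same product. Even so, they are increasingly becoming two separate storage and query engines that happen to be accessible through a common SQL interface [49,50,51].

Data warehouse vendors such as Teradata, Vertica, SAP HANA, and ParAccel typically sell their systems under expensive commercial licenses. Amazon RedShift is a hosted version of ParAccel. More recently, a large number of open source SQL-on-Hadoop projects have appeared. They are young, but they are competing with commercial data warehouse systems; examples include Apache Hive, Spark SQL, Cloudera Impala, Facebook Presto, Apache Tajo, and Apache Drill [52,53]. Some of them are based on ideas from Google's Dremel [54].

### Stars and Snowflakes: Schemas for Analytics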
¹æ®åºç¨ç¨åºçéè¦ï¼å¨äºå¡å¤çé¢åä¸ä½¿ç¨äºå¤§éä¸åçæ°æ®æ¨¡åãå¦ä¸æ¹é¢ï¼å¨åæåä¸å¡ä¸ï¼æ°æ®æ¨¡åçå¤æ ·æ§åå°å¾å¤ã许å¤æ°æ®ä»åºé½ä»¥ç¸å½å ¬å¼åçæ¹å¼ä½¿ç¨ï¼è¢«ç§°ä¸ºæå模å¼ï¼ä¹ç§°ä¸ºç»´åº¦å»ºæ¨¡ã55ãï¼ã [å¾3-9](img/fig3-9.md)ä¸ç示ä¾æ¨¡å¼æ¾ç¤ºäºå¯è½å¨é£åé¶å®åå¤æ¾å°çæ°æ®ä»åºãå¨æ¨¡å¼çä¸å¿æ¯ä¸ä¸ªæè°çäºå®è¡¨ï¼å¨è¿ä¸ªä¾åä¸ï¼å®è¢«ç§°ä¸º `fact_sales`ï¼ãäºå®è¡¨çæ¯ä¸è¡ä»£è¡¨å¨ç¹å®æ¶é´åççäºä»¶ï¼è¿éï¼æ¯ä¸è¡ä»£è¡¨å®¢æ·è´ä¹°ç产åï¼ãå¦ææ们åæçæ¯ç½ç«æµéèä¸æ¯é¶å®éï¼åæ¯è¡å¯è½ä»£è¡¨ä¸ä¸ªç¨æ·ç页é¢æµè§æç¹å»ã ![](img/fig3-9.png) **å¾3-9 ç¨äºæ°æ®ä»åºçæå模å¼ç示ä¾** é常æ åµä¸ï¼äºå®è¢«è§ä¸ºåç¬çäºä»¶ï¼å 为è¿æ ·å¯ä»¥å¨ä»¥ååæä¸è·å¾æ大ççµæ´»æ§ãä½æ¯ï¼è¿æå³çäºå®è¡¨å¯ä»¥åå¾é常大ãåè¹æï¼æ²å°çæeBayè¿æ ·ç大ä¼ä¸å¨å ¶æ°æ®ä»åºä¸å¯è½æå åPBç交æåå²ï¼å ¶ä¸å¤§é¨åä¿åå¨äºå®è¡¨ä¸ã56ãã äºå®è¡¨ä¸çä¸äºåæ¯å±æ§ï¼ä¾å¦äº§åéå®çä»·æ ¼åä»ä¾åºåé£éè´ä¹°çææ¬ï¼å¯ä»¥ç¨æ¥è®¡ç®å©æ¶¦ä½é¢ï¼ãäºå®è¡¨ä¸çå ¶ä»åæ¯å¯¹å ¶ä»è¡¨ï¼ç§°ä¸ºç»´åº¦è¡¨ï¼çå¤é®å¼ç¨ãç±äºäºå®è¡¨ä¸çæ¯ä¸è¡é½è¡¨ç¤ºä¸ä¸ªäºä»¶ï¼å æ¤è¿äºç»´åº¦ä»£è¡¨äºä»¶åçç对象ãå 容ãå°ç¹ãæ¶é´ãæ¹å¼ååå ã ä¾å¦ï¼å¨[å¾3-9](img/fig3-9.md)ä¸ï¼å ¶ä¸ä¸ä¸ªç»´åº¦æ¯å·²å®åºç产åã `dim_product` 表ä¸çæ¯ä¸è¡ä»£è¡¨ä¸ç§å¾ å®äº§åï¼å æ¬åºååä½ï¼SKUï¼ã产åæè¿°ãåçå称ãç±»å«ãèèªå«éãå è£ å°ºå¯¸çã`fact_sales` 表ä¸çæ¯ä¸è¡é½ä½¿ç¨å¤é®è¡¨æå¨ç¹å®äº¤æä¸éå®äºä»ä¹äº§åã ï¼ç®åèµ·è§ï¼å¦æ客æ·ä¸æ¬¡è´ä¹°äºå ç§ä¸åç产åï¼åå®ä»¬å¨äºå®è¡¨ä¸è¢«è¡¨ç¤ºä¸ºåç¬çè¡ï¼ã çè³æ¥æåæ¶é´ä¹é常使ç¨ç»´åº¦è¡¨æ¥è¡¨ç¤ºï¼å 为è¿å 许对æ¥æçéå ä¿¡æ¯ï¼è¯¸å¦å ¬å ±åæï¼è¿è¡ç¼ç ï¼ä»èå 许åºååæåéåæçéå®æ¥è¯¢ã âæå模å¼âè¿ä¸ªååæ¥æºäºè¿æ ·ä¸ä¸ªäºå®ï¼å³å½æ们对表ä¹é´çå ³ç³»è¿è¡å¯è§åæ¶ï¼äºå®è¡¨å¨ä¸é´ï¼è¢«ç»´åº¦è¡¨å å´ï¼ä¸è¿äºè¡¨çè¿æ¥å°±åææçå èã è¿ä¸ªæ¨¡æ¿çåä½è¢«ç§°ä¸ºéªè±æ¨¡å¼ï¼å ¶ä¸ç»´åº¦è¢«è¿ä¸æ¥å解为å维度ãä¾å¦ï¼åçå产åç±»å«å¯è½æåç¬çè¡¨æ ¼ï¼å¹¶ä¸ `dim_product` è¡¨æ ¼ä¸çæ¯ä¸è¡é½å¯ä»¥å°åçåç±»å«ä½ä¸ºå¤é®å¼ç¨ï¼èä¸æ¯å°å®ä»¬ä½ä¸ºå符串åå¨å¨ `dim_product` è¡¨æ ¼ä¸ãéªè±æ¨¡å¼æ¯æ形模å¼æ´è§èåï¼ä½æ¯æ形模å¼é常æ¯é¦éï¼å 为åæå¸ä½¿ç¨å®æ´ç®åã55ãã å¨å ¸åçæ°æ®ä»åºä¸ï¼è¡¨æ ¼é常é常宽ï¼äºå®è¡¨é常æ100å以ä¸ï¼ææ¶çè³ææ°ç¾åã51ãã维度表ä¹å¯ä»¥æ¯é常宽çï¼å 为å®ä»¬å æ¬äºææå¯è½ä¸åæç¸å ³çå æ°æ®ââä¾å¦ï¼`dim_store` 表å¯ä»¥å æ¬å¨æ¯ä¸ªååºæä¾åªäºæå¡çç»èï¼å®æ¯å¦å ·æåºå é¢å 
æ¿ï¼åºé¢é¢ç§¯ï¼ååºç¬¬ä¸æ¬¡å¼å¼ çæ¥æï¼æè¿ä¸æ¬¡æ¹é çæ¶é´ï¼ç¦»æè¿çé«éå ¬è·¯çè·ç¦»ççã ## åå¼åå¨ å¦æäºå®è¡¨ä¸æä¸äº¿è¡åæ°PBçæ°æ®ï¼é£ä¹é«æå°åå¨åæ¥è¯¢å®ä»¬å°±æ为ä¸ä¸ªå ·ææææ§çé®é¢ã维度表é常è¦å°å¾å¤ï¼æ°ç¾ä¸è¡ï¼ï¼æ以å¨æ¬èä¸æ们å°ä¸»è¦å ³æ³¨äºå®è¡¨çåå¨ã 尽管äºå®è¡¨éå¸¸è¶ è¿100åï¼ä½å ¸åçæ°æ®ä»åºæ¥è¯¢ä¸æ¬¡åªä¼è®¿é®å ¶ä¸4个æ5个åï¼ â `SELECT *` â æ¥è¯¢å¾å°ç¨äºåæï¼ã51ãã以[ä¾3-1]()ä¸çæ¥è¯¢ä¸ºä¾ï¼å®è®¿é®äºå¤§éçè¡ï¼å¨2013æ¥åå¹´ä¸æ¯æ¬¡é½æ人è´ä¹°æ°´ææç³æï¼ï¼ä½åªé访é®`fact_sales`表çä¸åï¼`date_key, product_sk, quantity`ã该æ¥è¯¢å¿½ç¥äºææå ¶ä»çåã **ä¾3-1 åæ人们æ¯å¦æ´å¾åäºå¨ä¸å¨çæä¸å¤©è´ä¹°æ°é²æ°´ææç³æ** ```sql SELECT dim_date.weekday, dim_product.category, SUM(fact_sales.quantity) AS quantity_sold FROM fact_sales JOIN dim_date ON fact_sales.date_key = dim_date.date_key JOIN dim_product ON fact_sales.product_sk = dim_product.product_sk WHERE dim_date.year = 2013 AND dim_product.category IN ('Fresh fruit', 'Candy') GROUP BY dim_date.weekday, dim_product.category; ``` æ们å¦ä½ææå°æ§è¡è¿ä¸ªæ¥è¯¢ï¼ å¨å¤§å¤æ°OLTPæ°æ®åºä¸ï¼åå¨é½æ¯ä»¥é¢åè¡çæ¹å¼è¿è¡å¸å±çï¼è¡¨æ ¼çä¸è¡ä¸çææå¼é½ç¸é»åå¨ãææ¡£æ°æ®åºä¹æ¯ç¸ä¼¼çï¼æ´ä¸ªææ¡£é常åå¨ä¸ºä¸ä¸ªè¿ç»çåèåºåãä½ å¯ä»¥å¨[å¾3-1](img/fig3-1.png)çCSVä¾åä¸çå°è¿ä¸ªã 为äºå¤çå[ä¾3-1]()è¿æ ·çæ¥è¯¢ï¼ä½ å¯è½å¨ `fact_sales.date_key`ã`fact_sales.product_sk`ä¸æç´¢å¼ï¼å®ä»¬åè¯åå¨å¼æå¨åªéæ¥æ¾ç¹å®æ¥ææç¹å®äº§åçææéå®æ åµãä½æ¯ï¼é¢åè¡çåå¨å¼æä»ç¶éè¦å°ææè¿äºè¡ï¼æ¯ä¸ªå å«è¶ è¿100个å±æ§ï¼ä»ç¡¬çå è½½å°å åä¸ï¼è§£æå®ä»¬ï¼å¹¶è¿æ»¤æé£äºä¸ç¬¦åè¦æ±çå±æ§ãè¿å¯è½éè¦å¾é¿æ¶é´ã åå¼åå¨èåçæ³æ³å¾ç®åï¼ä¸è¦å°æææ¥èªä¸è¡çå¼åå¨å¨ä¸èµ·ï¼èæ¯å°æ¥èªæ¯ä¸åçææå¼åå¨å¨ä¸èµ·ãå¦ææ¯ä¸ªåå¼åå¨å¨ä¸ä¸ªåç¬çæ件ä¸ï¼æ¥è¯¢åªéè¦è¯»åå解ææ¥è¯¢ä¸ä½¿ç¨çé£äºåï¼è¿å¯ä»¥èç大éçå·¥ä½ãè¿ä¸ªåçå¦[å¾3-10](img/fig3-10.png)æ示ã ![](img/fig3-10.png) **å¾3-10 æååå¨å ³ç³»åæ°æ®ï¼èä¸æ¯è¡** > åå¼åå¨å¨å ³ç³»æ°æ®æ¨¡åä¸æ¯æ容æç解çï¼ä½å®åæ ·éç¨äºéå ³ç³»æ°æ®ãä¾å¦ï¼Parquetã57ãæ¯ä¸ç§åå¼åå¨æ ¼å¼ï¼æ¯æåºäºGoogleçDremelçææ¡£æ°æ®æ¨¡åã54ãã åå¼åå¨å¸å±ä¾èµäºæ¯ä¸ªåæ件å å«ç¸å顺åºçè¡ã å æ¤ï¼å¦æä½ éè¦éæ°ç»è£ å®æ´çè¡ï¼ä½ å¯ä»¥ä»æ¯ä¸ªåç¬çåæ件ä¸è·å第23项ï¼å¹¶å°å®ä»¬æ¾å¨ä¸èµ·å½¢æ表ç第23è¡ã ### åå缩 é¤äºä» ä»ç¡¬çå 
è½½æ¥è¯¢æéçå以å¤ï¼æ们è¿å¯ä»¥éè¿å缩æ°æ®æ¥è¿ä¸æ¥éä½å¯¹ç¡¬çååéçéæ±ã幸è¿çæ¯ï¼åå¼åå¨é常å¾éåå缩ã çç[å¾3-10](img/fig3-10.png)ä¸æ¯ä¸åçå¼åºåï¼å®ä»¬é常çèµ·æ¥æ¯ç¸å½éå¤çï¼è¿æ¯å缩ç好å 头ãæ ¹æ®åä¸çæ°æ®ï¼å¯ä»¥ä½¿ç¨ä¸åçå缩ææ¯ãå¨æ°æ®ä»åºä¸ç¹å«ææçä¸ç§ææ¯æ¯ä½å¾ç¼ç ï¼å¦[å¾3-11](img/fig3-11.png)æ示ã ![](img/fig3-11.png) **å¾3-11 å缩çä½å¾ç´¢å¼åå¨å¸å±** é常æ åµä¸ï¼ä¸åä¸ä¸åå¼çæ°éä¸è¡æ°ç¸æ¯è¦å°å¾å¤ï¼ä¾å¦ï¼é¶å®åå¯è½ææ°å亿çéå®äº¤æï¼ä½åªæ100,000个ä¸åç产åï¼ãç°å¨æ们å¯ä»¥æ¿ä¸ä¸ªæ n 个ä¸åå¼çåï¼å¹¶æå®è½¬æ¢æ n 个ç¬ç«çä½å¾ï¼æ¯ä¸ªä¸åå¼å¯¹åºä¸ä¸ªä½å¾ï¼æ¯è¡å¯¹åºä¸ä¸ªæ¯ç¹ä½ãå¦æ该è¡å ·æ该å¼ï¼å该ä½ä¸º1ï¼å¦å为0ã å¦æné常å°ï¼ä¾å¦ï¼å½å®¶/å°åºåå¯è½æ大约200个ä¸åçå¼ï¼ï¼åè¿äºä½å¾å¯ä»¥å°æ¯è¡åå¨æä¸ä¸ªæ¯ç¹ä½ãä½æ¯ï¼å¦ænæ´å¤§ï¼å¤§é¨åä½å¾ä¸å°ä¼æå¾å¤çé¶ï¼æ们说å®ä»¬æ¯ç¨ççï¼ãå¨è¿ç§æ åµä¸ï¼ä½å¾å¯ä»¥å¦å¤åè¿è¡æ¸¸ç¨ç¼ç ï¼å¦[å¾3-11](fig3-11.png)åºé¨æ示ãè¿å¯ä»¥ä½¿åçç¼ç é常紧åã è¿äºä½å¾ç´¢å¼é常éåæ°æ®ä»åºä¸å¸¸è§çåç§æ¥è¯¢ãä¾å¦ï¼ ```sql WHERE product_sk INï¼30ï¼68ï¼69ï¼ ``` å è½½`product_sk = 30`ã`product_sk = 68`å`product_sk = 69`è¿ä¸ä¸ªä½å¾ï¼å¹¶è®¡ç®ä¸ä¸ªä½å¾çæä½æï¼ORï¼ï¼è¿å¯ä»¥é常ææå°å®æã ```sql WHERE product_sk = 31 AND store_sk = 3 ``` å è½½`product_sk = 31`å`store_sk = 3`çä½å¾ï¼å¹¶è®¡ç®æä½ä¸ï¼ANDï¼ãè¿æ¯å 为åæç §ç¸åç顺åºå å«è¡ï¼å æ¤ä¸åçä½å¾ä¸ç第kä½åå¦ä¸åçä½å¾ä¸ç第kä½å¯¹åºç¸åçè¡ã 对äºä¸åç§ç±»çæ°æ®ï¼ä¹æåç§ä¸åçå缩æ¹æ¡ï¼ä½æ们ä¸ä¼è¯¦ç»è®¨è®ºå®ä»¬ï¼è¯·åé ã58ãçæ¦è¿°ã > #### åå¼åå¨ååæ > > CassandraåHBaseæä¸ä¸ªåæï¼column familiesï¼çæ¦å¿µï¼ä»ä»¬ä»Bigtable继æ¿ã9ããç¶èï¼æå®ä»¬ç§°ä¸ºåå¼ï¼column-orientedï¼æ¯éå¸¸å ·æ误导æ§çï¼å¨æ¯ä¸ªåæä¸ï¼å®ä»¬å°ä¸è¡ä¸çææåä¸è¡é®ä¸èµ·åå¨ï¼å¹¶ä¸ä¸ä½¿ç¨åå缩ãå æ¤ï¼Bigtable模åä»ç¶ä¸»è¦æ¯é¢åè¡çã > #### å å带宽åç¢éåå¤ç 对äºéè¦æ«ææ°ç¾ä¸è¡çæ°æ®ä»åºæ¥è¯¢æ¥è¯´ï¼ä¸ä¸ªå·¨å¤§çç¶é¢æ¯ä»ç¡¬çè·åæ°æ®å°å åç带宽ãä½æ¯ï¼è¿ä¸æ¯å¯ä¸çç¶é¢ãåæåæ°æ®åºçå¼å人åè¿éè¦ææå°å©ç¨ä¸»åå¨å¨å°CPUç¼åç带宽ï¼é¿å CPUæ令å¤çæµæ°´çº¿ä¸çåæ¯é¢æµé误åæ°æ³¡ï¼ä»¥åå¨ç°ä»£CPUä¸ä½¿ç¨åæ令å¤æ°æ®ï¼SIMDï¼æ令ã59,60ãã é¤äºåå°éè¦ä»ç¡¬çå è½½çæ°æ®é以å¤ï¼åå¼åå¨å¸å±ä¹å¯ä»¥ææå©ç¨CPUå¨æãä¾å¦ï¼æ¥è¯¢å¼æå¯ä»¥å°å¤§éå缩çåæ°æ®æ¾å¨CPUçL1ç¼åä¸ï¼ç¶åå¨ç´§å¯ç循ç¯ï¼å³æ²¡æå½æ°è°ç¨ï¼ä¸éåãç¸æ¯è¾æ¯ä¸ªè®°å½çå¤çé½éè¦å¤§éå½æ°è°ç¨åæ¡ä»¶å¤æç代ç ï¼CPUæ§è¡è¿æ 
·ä¸ä¸ªå¾ªç¯è¦å¿«å¾å¤ãåå缩å 许åä¸çæ´å¤è¡è¢«æ¾è¿ç¸åæ°éçL1ç¼åãåé¢æè¿°çæä½âä¸âåâæâè¿ç®ç¬¦å¯ä»¥è¢«è®¾è®¡ä¸ºç´æ¥å¨è¿æ ·çå缩åæ°æ®åä¸æä½ãè¿ç§ææ¯è¢«ç§°ä¸ºç¢éåå¤çã58,49ãã ### åå¼åå¨ä¸çæåºé¡ºåº å¨åå¼åå¨ä¸ï¼åå¨è¡ç顺åºå¹¶ä¸ä¸å®å¾éè¦ãææå ¥é¡ºåºåå¨å®ä»¬æ¯æç®åçï¼å 为æå ¥ä¸ä¸ªæ°è¡åªéè¦è¿½å å°æ¯ä¸ªåæ件ãä½æ¯ï¼æ们å¯ä»¥éæ©å¢å ä¸ä¸ªç¹å®ç顺åºï¼å°±åæ们ä¹å对SSTablesæåçé£æ ·ï¼å¹¶å°å ¶ç¨ä½ç´¢å¼æºå¶ã 注æï¼æ¯åç¬èªæåºæ¯æ²¡ææä¹çï¼å 为é£æ ·æ们就没æ³ç¥éä¸ååä¸çåªäºé¡¹å±äºåä¸è¡ãæ们åªè½å¨ç¥éä¸åä¸ç第k项ä¸å¦ä¸åä¸ç第k项å±äºåä¸è¡çæ åµæè½é建åºå®æ´çè¡ã ç¸åï¼å³ä½¿æåå¼åå¨æ°æ®ï¼ä¹éè¦ä¸æ¬¡å¯¹æ´è¡è¿è¡æåºãæ°æ®åºç管çåå¯ä»¥æ ¹æ®ä»ä»¬å¯¹å¸¸ç¨æ¥è¯¢çäºè§£æ¥éæ©è¡¨æ ¼åºè¯¥è¢«æåºçåãä¾å¦ï¼å¦ææ¥è¯¢é常以æ¥æèå´ä¸ºç®æ ï¼ä¾å¦ä¸ä¸ªæï¼åå¯ä»¥å° `date_key` ä½ä¸ºç¬¬ä¸ä¸ªæåºé®ãè¿æ ·æ¥è¯¢ä¼åå¨å°±å¯ä»¥åªæ«æä¸ä¸ªæçè¡äºï¼è¿æ¯æ«æææè¡è¦å¿«å¾å¤ã 对äºç¬¬ä¸æåºåä¸å ·æç¸åå¼çè¡ï¼å¯ä»¥ç¨ç¬¬äºæåºåæ¥è¿ä¸æ¥æåºãä¾å¦ï¼å¦æ `date_key` æ¯[å¾3-10](img/fig3-10.png)ä¸ç第ä¸ä¸ªæåºå ³é®åï¼é£ä¹ `product_sk` å¯è½æ¯ç¬¬äºä¸ªæåºå ³é®åï¼ä»¥ä¾¿åä¸å¤©çåä¸äº§åçææéå®é½å°å¨åå¨ä¸ç»åå¨ä¸èµ·ãè¿å°æå©äºéè¦å¨ç¹å®æ¥æèå´å æ产å对éå®è¿è¡åç»æè¿æ»¤çæ¥è¯¢ã æåºé¡ºåºçå¦ä¸ä¸ªå¥½å¤æ¯å®å¯ä»¥å¸®å©å缩åãå¦æ主è¦æåºå没æ太å¤ä¸ªä¸åçå¼ï¼é£ä¹å¨æåºä¹åï¼å®å°å ·æå¾é¿çåºåï¼å ¶ä¸ç¸åçå¼è¿ç»éå¤å¤æ¬¡ãä¸ä¸ªç®åç游ç¨ç¼ç ï¼å°±åæ们ç¨äº[å¾3-11](img/fig3-11.png)ä¸çä½å¾ä¸æ ·ï¼å¯ä»¥å°è¯¥åå缩å°å ååè ââ å³ä½¿è¡¨ä¸ææ°å亿è¡ã 第ä¸ä¸ªæåºé®çå缩æææ强ã第äºå第ä¸ä¸ªæåºé®ä¼æ´æ··ä¹±ï¼å æ¤ä¸ä¼æè¿ä¹é¿çè¿ç»çéå¤å¼ãæåºä¼å 级æ´ä½çå以åºæ¬ä¸éæºç顺åºåºç°ï¼æ以å®ä»¬å¯è½ä¸ä¼è¢«å缩ãä½åå åæåºå¨æ´ä½ä¸ä»ç¶æ¯æ好å¤çã #### å 个ä¸åçæåºé¡ºåº C-Storeä¸å¼å ¥äºè¿ä¸ªæ³æ³çä¸ä¸ªå·§å¦æ©å±ï¼å¹¶å¨åä¸æ°æ®ä»åºVerticaä¸è¢«éç¨ã61,62ããä¸åçæ¥è¯¢åçäºä¸åçæåºé¡ºåºï¼ä¸ºä»ä¹ä¸ä»¥å ç§ä¸åçæ¹å¼æ¥åå¨ç¸åçæ°æ®å¢ï¼æ 论å¦ä½ï¼æ°æ®éè¦å¤å¶å°å¤å°æºå¨ï¼è¿æ ·ï¼å¦æä¸å°æºå¨åçæ éï¼ä½ ä¸ä¼ä¸¢å¤±æ°æ®ãä½ å¯è½è¿éè¦åå¨ä»¥ä¸åæ¹å¼æåºçåä½æ°æ®ï¼ä»¥ä¾¿å¨å¤çæ¥è¯¢æ¶ï¼å¯ä»¥ä½¿ç¨æéåæ¥è¯¢æ¨¡å¼ççæ¬ã å¨ä¸ä¸ªåå¼åå¨ä¸æå¤ä¸ªæåºé¡ºåºæç¹ç±»ä¼¼äºå¨ä¸ä¸ªé¢åè¡çåå¨ä¸æå¤ä¸ªæ¬¡çº§ç´¢å¼ãä½æ大çåºå«å¨äºé¢åè¡çåå¨å°æ¯ä¸è¡ä¿åå¨ä¸ä¸ªå°æ¹ï¼å¨å æ件æèéç´¢å¼ä¸ï¼ï¼æ¬¡çº§ç´¢å¼åªå å«æåå¹é è¡çæéãå¨åå¼åå¨ä¸ï¼é常å¨å ¶ä»å°æ¹æ²¡æä»»ä½æåæ°æ®çæéï¼åªæå å«å¼çåã ### åå ¥åå¼åå¨ 
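The write path that this section goes on to describe (buffer writes in memory in sorted form, then merge them with the column files on disk in bulk) can be sketched in a few lines of Python. This is only an illustrative sketch under simplifying assumptions, not how Vertica or any real engine is implemented: `TinyColumnStore`, `flush_threshold`, and `total_quantity` are hypothetical names, and plain Python lists stand in for column files on disk.

```python
import heapq
from typing import Dict, List, Tuple

Row = Tuple[int, int, int]  # (date_key, product_sk, quantity)
COLUMNS = ("date_key", "product_sk", "quantity")


class TinyColumnStore:
    """Toy column store: sorted column 'files' plus an in-memory write buffer."""

    def __init__(self, flush_threshold: int = 3) -> None:
        # Each "column file" is modelled as a list; row k of the table is
        # the k-th entry of every list.
        self.column_files: Dict[str, List[int]] = {name: [] for name in COLUMNS}
        self.buffer: List[Row] = []  # recent writes, not yet merged
        self.flush_threshold = flush_threshold

    def insert(self, row: Row) -> None:
        """Accept a write in memory; merge in bulk once enough have accumulated."""
        self.buffer.append(row)
        if len(self.buffer) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        """Merge buffered rows with the existing columns and rewrite every column."""
        on_disk = zip(*(self.column_files[name] for name in COLUMNS))
        merged = list(heapq.merge(on_disk, sorted(self.buffer)))
        # All columns are rewritten together so position k still means one row.
        self.column_files = {
            name: [row[i] for row in merged] for i, name in enumerate(COLUMNS)
        }
        self.buffer = []

    def total_quantity(self, product_sk: int) -> int:
        """A query combines the column files with the not-yet-flushed buffer."""
        on_disk = sum(
            qty
            for sk, qty in zip(
                self.column_files["product_sk"], self.column_files["quantity"]
            )
            if sk == product_sk
        )
        in_memory = sum(qty for _, sk, qty in self.buffer if sk == product_sk)
        return on_disk + in_memory


store = TinyColumnStore(flush_threshold=3)
for new_row in [(140103, 69, 1), (140102, 31, 2), (140102, 69, 4)]:
    store.insert(new_row)          # the third insert triggers a bulk merge
store.insert((140101, 69, 5))      # stays in the in-memory buffer for now
print(store.total_quantity(69))    # 10: 4 + 1 from the columns, 5 from the buffer
```

Rows are sorted here by the whole tuple, so `date_key` acts as the first sort key and `product_sk` as the second. A real engine would write new immutable files and compress them rather than rebuilding lists in memory.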
Column-oriented storage, compression, and sorting make sense in data warehouses because the load consists mostly of large read-only queries run by analysts, and all three help to make those read queries faster. Their downside is that they make writes more difficult.

An update-in-place approach, like the one B-trees use, is not possible with compressed columns. If you wanted to insert a row in the middle of a sorted table, you would most likely have to rewrite all the column files. Because rows are identified by their position within a column, the insertion has to update all columns consistently.

Fortunately, we have already seen a good solution earlier in this chapter: LSM-trees. All writes first go to an in-memory store, where they are added to a sorted structure and prepared for writing to disk. It does not matter whether the in-memory store is row-oriented or column-oriented. When enough writes have accumulated, they are merged with the column files on disk and written to new files in bulk. This is essentially what Vertica does [62].

Queries need to examine both the column data on disk and the recent writes in memory, and combine the two. However, the query optimizer hides this distinction from the user. From an analyst's point of view, data that has been modified with inserts, updates, or deletes is immediately reflected in subsequent queries.

### Aggregation: Data Cubes and Materialized Views

Not every data warehouse is necessarily a column store: traditional row-oriented databases and a few other architectures are also used. However, columnar storage can be significantly faster for ad hoc analytical queries, so it is rapidly gaining popularity [51,63].

Another aspect of data warehouses that is worth mentioning briefly is materialized aggregates. As discussed earlier, data warehouse queries often involve an aggregate function, such as COUNT, SUM, AVG, MIN, or MAX in SQL. If the same aggregates are used by many different queries, it can be wasteful to crunch through the raw data every time. Why not cache some of the counts or sums that queries use most often?

One way of creating such a cache is a materialized view. In a relational data model, it is often defined like a standard (virtual) view: a table-like object whose contents are the results of some query. The difference is that a materialized view is an actual copy of the query results, written to disk, whereas a virtual view is just a shortcut for writing queries. When you read from a virtual view, the SQL engine expands it into the view's underlying query on the fly and then processes the expanded query.

When the underlying data changes, a materialized view needs to be updated, because it is a denormalized copy of the data. The database can do that automatically, but such updates make writes more expensive, which is why materialized views are not often used in OLTP databases. In read-heavy data warehouses they can make more sense (whether or not they actually improve read performance depends on the individual case).

A common special case of a materialized view is known as a data cube or OLAP cube [64]. It is a grid of aggregates grouped by different dimensions. [Figure 3-12](img/fig3-12.png) shows an example.

![](img/fig3-12.png)

**Figure 3-12. Two dimensions of a data cube, aggregating data by summing**

Imagine for now that each fact has foreign keys to only two dimension tables; in [Figure 3-12](img/fig3-12.png), these are date and product. You can now draw a two-dimensional table, with dates along one axis and products along the other. Each cell contains the aggregate (for example, `SUM`) of an attribute (for example, `net_price`) of all facts with that date-product combination. Then you can apply the same aggregate along each row or column and get a summary that has been reduced by one dimension (the sales by product regardless of date, or the sales by date regardless of product).

In general, facts often have more than two dimensions. In Figure 3-9 there are five dimensions: date, product, store, promotion, and customer. It is a lot harder to imagine what a five-dimensional hypercube would look like, but the principle remains the same: each cell contains the sales for a particular date-product-store-promotion-customer combination. These values can then repeatedly be summarized along each of the dimensions.

The advantage of a materialized data cube is that certain queries become very fast because they have effectively been precomputed. For example, if you want to know the total sales per store, you just need to look at the totals along the appropriate dimension, with no need to scan millions of rows of raw data.

The disadvantage is that a data cube does not have the same flexibility as querying the raw data. For example, there is no way of calculating which proportion of sales comes from items that cost more than $100, because the price is not one of the dimensions. Most data warehouses therefore try to keep as much raw data as possible, and use aggregates such as data cubes only as a performance boost for certain queries.

## Summary

In this chapter we tried to get to the bottom of how databases handle storage and retrieval. What happens when you store data in a database, and what does the database do when you query for the data again later?

At a high level, we saw that storage engines fall into two broad categories: those optimized for **transaction processing (OLTP)** and those optimized for **analytics (OLAP)**. There are big differences between the access patterns in these two use cases:

* OLTP systems are typically user-facing, which means that they may see a huge volume of requests. In order to handle the load, applications usually touch only a small number of records in each query. The application requests records using some kind of key, and the storage engine uses an index to find the data for the requested key. Disk seek time is often the bottleneck here.
* Data warehouses and similar analytic systems are less well known, because they are primarily used by business analysts rather than end users. They handle a much lower volume of queries than OLTP systems, but each query is typically very demanding, requiring many millions of records to be scanned in a short time. Disk bandwidth (rather than seek time) is often the bottleneck, and column-oriented storage is an increasingly popular solution for this kind of workload.

On the OLTP side, we saw storage engines from two main schools of thought:

* The log-structured school, which only permits appending to files and deleting obsolete files, but never updates a file that has been written. Bitcask, SSTables, LSM-trees, LevelDB, Cassandra, HBase, Lucene, and others belong to this group.
* The update-in-place school, which treats the disk as a set of fixed-size pages that can be overwritten. B-trees are the biggest example of this philosophy, being used in all major relational databases and also in many nonrelational ones.

Log-structured storage engines are a comparatively recent development. Their key idea is that they systematically turn random-access writes into sequential writes on disk, which enables higher write throughput because of the performance characteristics of hard drives and SSDs.

Finishing off the OLTP side, we also took a brief tour through some more complicated indexing structures, and through databases that are optimized for keeping all data in memory.

We then took a detour from the internals of storage engines to look at the high-level architecture of a typical data warehouse. This illustrated why analytic workloads are so different from OLTP: when your queries require sequentially scanning across a large number of rows, indexes are much less relevant. Instead it becomes important to encode data very compactly, to minimize the amount of data that the query needs to read from disk. We discussed how column-oriented storage helps to achieve this goal.

As an application developer, if you are armed with this knowledge about the internals of storage engines, you are in a much better position to know which tool is best suited to your particular application. If you need to adjust a database's tuning parameters, this understanding lets you imagine what effect a higher or a lower value may have.

Although this chapter cannot make you an expert in tuning any one particular storage engine, it has hopefully equipped you with enough vocabulary and ideas to make sense of the documentation for the database of your choice.

## References

1. Alfred V. Aho, John E. Hopcroft, and Jeffrey D. Ullman: *Data Structures and Algorithms*. Addison-Wesley, 1983. ISBN: 978-0-201-00023-8
1. Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein: *Introduction to Algorithms*, 3rd edition. MIT Press, 2009. ISBN: 978-0-262-53305-8
1. Justin Sheehy and David Smith: "[Bitcask: A Log-Structured Hash Table for Fast Key/Value Data](http://basho.com/wp-content/uploads/2015/05/bitcask-intro.pdf)," Basho Technologies, April 2010.
1. Yinan Li, Bingsheng He, Robin Jun Yang, et al.: "[Tree Indexing on Solid State Drives](http://www.vldb.org/pvldb/vldb2010/papers/R106.pdf)," *Proceedings of the VLDB Endowment*, volume 3, number 1, pages 1195–1206, September 2010.
1. Goetz Graefe: "[Modern B-Tree Techniques](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.219.7269&rep=rep1&type=pdf)," *Foundations and Trends in Databases*, volume 3, number 4, pages 203–402, August 2011. [doi:10.1561/1900000028](http://dx.doi.org/10.1561/1900000028)
1. Jeffrey Dean and Sanjay Ghemawat: "[LevelDB Implementation Notes](https://github.com/google/leveldb/blob/master/doc/impl.html)," *leveldb.googlecode.com*.
1. Dhruba Borthakur: "[The History of RocksDB](http://rocksdb.blogspot.com/)," *rocksdb.blogspot.com*, November 24, 2013.
1. Matteo Bertozzi: "[Apache HBase I/O – HFile](http://blog.cloudera.com/blog/2012/06/hbase-io-hfile-input-output/)," *blog.cloudera.com*, June 29, 2012.
1. Fay Chang, Jeffrey Dean, Sanjay Ghemawat, et al.: "[Bigtable: A Distributed Storage System for Structured Data](http://research.google.com/archive/bigtable.html)," at *7th USENIX Symposium on Operating System Design and Implementation* (OSDI), November 2006.
1. Patrick O'Neil, Edward Cheng, Dieter Gawlick, and Elizabeth O'Neil: "[The Log-Structured Merge-Tree (LSM-Tree)](http://www.cs.umb.edu/~poneil/lsmtree.pdf)," *Acta Informatica*, volume 33, number 4, pages 351–385, June 1996. [doi:10.1007/s002360050048](http://dx.doi.org/10.1007/s002360050048)
1. Mendel Rosenblum and John K. Ousterhout: "[The Design and Implementation of a Log-Structured File System](http://research.cs.wisc.edu/areas/os/Qual/papers/lfs.pdf)," *ACM Transactions on Computer Systems*, volume 10, number 1, pages 26–52, February 1992. [doi:10.1145/146941.146943](http://dx.doi.org/10.1145/146941.146943)
1. Adrien Grand: "[What Is in a Lucene Index?](http://www.slideshare.net/lucenerevolution/what-is-inaluceneagrandfinal)," at *Lucene/Solr Revolution*, November 14, 2013.
1. Deepak Kandepet: "[Hacking Lucene – The Index Format](http://hackerlabs.github.io/blog/2011/10/01/hacking-lucene-the-index-format/index.html)," *hackerlabs.org*, October 1, 2011.
1. Michael McCandless: "[Visualizing Lucene's Segment Merges](http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html)," *blog.mikemccandless.com*, February 11, 2011.
1. Burton H. Bloom: "[Space/Time Trade-offs in Hash Coding with Allowable Errors](http://www.cs.upc.edu/~diaz/p422-bloom.pdf)," *Communications of the ACM*, volume 13, number 7, pages 422–426, July 1970. [doi:10.1145/362686.362692](http://dx.doi.org/10.1145/362686.362692)
1. "[Operating Cassandra: Compaction](https://cassandra.apache.org/doc/latest/operating/compaction.html)," Apache Cassandra Documentation v4.0, 2016.
1. Rudolf Bayer and Edward M. McCreight: "[Organization and Maintenance of Large Ordered Indices](http://www.dtic.mil/cgi-bin/GetTRDoc?AD=AD0712079)," Boeing Scientific Research Laboratories, Mathematical and Information Sciences Laboratory, report no. 20, July 1970.
1. Douglas Comer: "[The Ubiquitous B-Tree](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.96.6637&rep=rep1&type=pdf)," *ACM Computing Surveys*, volume 11, number 2, pages 121–137, June 1979. [doi:10.1145/356770.356776](http://dx.doi.org/10.1145/356770.356776)
1. Emmanuel Goossaert: "[Coding for SSDs](http://codecapsule.com/2014/02/12/coding-for-ssds-part-1-introduction-and-table-of-contents/)," *codecapsule.com*, February 12, 2014.
1. C. Mohan and Frank Levine: "[ARIES/IM: An Efficient and High Concurrency Index Management Method Using Write-Ahead Logging](http://www.ics.uci.edu/~cs223/papers/p371-mohan.pdf)," at *ACM International Conference on Management of Data* (SIGMOD), June 1992. [doi:10.1145/130283.130338](http://dx.doi.org/10.1145/130283.130338)
1. Howard Chu: "[LDAP at Lightning Speed](https://buildstuff14.sched.com/event/08a1a368e272eb599a52e08b4c3c779d)," at *Build Stuff '14*, November 2014.
1. Bradley C. Kuszmaul: "[A Comparison of Fractal Trees to Log-Structured Merge (LSM) Trees](http://insideanalysis.com/wp-content/uploads/2014/08/Tokutek_lsm-vs-fractal.pdf)," *tokutek.com*, April 22, 2014.
1. Manos Athanassoulis, Michael S. Kester, Lukas M. Maas, et al.: "[Designing Access Methods: The RUM Conjecture](http://openproceedings.org/2016/conf/edbt/paper-12.pdf)," at *19th International Conference on Extending Database Technology* (EDBT), March 2016. [doi:10.5441/002/edbt.2016.42](http://dx.doi.org/10.5441/002/edbt.2016.42)
1. Peter Zaitsev: "[Innodb Double Write](https://www.percona.com/blog/2006/08/04/innodb-double-write/)," *percona.com*, August 4, 2006.
1. Tomas Vondra: "[On the Impact of Full-Page Writes](http://blog.2ndquadrant.com/on-the-impact-of-full-page-writes/)," *blog.2ndquadrant.com*, November 23, 2016.
1. Mark Callaghan: "[The Advantages of an LSM vs a B-Tree](http://smalldatum.blogspot.co.uk/2016/01/summary-of-advantages-of-lsm-vs-b-tree.html)," *smalldatum.blogspot.co.uk*, January 19, 2016.
1. Mark Callaghan: "[Choosing Between Efficiency and Performance with RocksDB](http://www.codemesh.io/codemesh/mark-callaghan)," at *Code Mesh*, November 4, 2016.
1. Michi Mutsuzaki: "[MySQL vs. LevelDB](https://github.com/m1ch1/mapkeeper/wiki/MySQL-vs.-LevelDB)," *github.com*, August 2011.
1. Benjamin Coverston, Jonathan Ellis, et al.: "[CASSANDRA-1608: Redesigned Compaction](https://issues.apache.org/jira/browse/CASSANDRA-1608)," *issues.apache.org*, July 2011.
1. Igor Canadi, Siying Dong, and Mark Callaghan: "[RocksDB Tuning Guide](https://github.com/facebook/rocksdb/wiki/RocksDB-Tuning-Guide)," *github.com*, 2016.
1. [*MySQL 5.7 Reference Manual*](http://dev.mysql.com/doc/refman/5.7/en/index.html). Oracle, 2014.
1. [*Books Online for SQL Server 2012*](http://msdn.microsoft.com/en-us/library/ms130214.aspx). Microsoft, 2012.
1. Joe Webb: "[Using Covering Indexes to Improve Query Performance](https://www.simple-talk.com/sql/learn-sql-server/using-covering-indexes-to-improve-query-performance/)," *simple-talk.com*, September 29, 2008.
1. Frank Ramsak, Volker Markl, Robert Fenk, et al.: "[Integrating the UB-Tree into a Database System Kernel](http://www.vldb.org/conf/2000/P263.pdf)," at *26th International Conference on Very Large Data Bases* (VLDB), September 2000.
1. The PostGIS Development Group: "[PostGIS 2.1.2dev Manual](http://postgis.net/docs/manual-2.1/)," *postgis.net*, 2014.
1. Robert Escriva, Bernard Wong, and Emin Gün Sirer: "[HyperDex: A Distributed, Searchable Key-Value Store](http://www.cs.princeton.edu/courses/archive/fall13/cos518/papers/hyperdex.pdf)," at *ACM SIGCOMM Conference*, August 2012. [doi:10.1145/2377677.2377681](http://dx.doi.org/10.1145/2377677.2377681)
1. Michael McCandless: "[Lucene's FuzzyQuery Is 100 Times Faster in 4.0](http://blog.mikemccandless.com/2011/03/lucenes-fuzzyquery-is-100-times-faster.html)," *blog.mikemccandless.com*, March 24, 2011.
1. Steffen Heinz, Justin Zobel, and Hugh E. Williams: "[Burst Tries: A Fast, Efficient Data Structure for String Keys](http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.18.3499)," *ACM Transactions on Information Systems*, volume 20, number 2, pages 192–223, April 2002. [doi:10.1145/506309.506312](http://dx.doi.org/10.1145/506309.506312)
1. Klaus U. Schulz and Stoyan Mihov: "[Fast String Correction with Levenshtein Automata](http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.16.652)," *International Journal on Document Analysis and Recognition*, volume 5, number 1, pages 67–85, November 2002. [doi:10.1007/s10032-002-0082-8](http://dx.doi.org/10.1007/s10032-002-0082-8)
1. Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze: [*Introduction to Information Retrieval*](http://nlp.stanford.edu/IR-book/). Cambridge University Press, 2008. ISBN: 978-0-521-86571-5, available online at *nlp.stanford.edu/IR-book*
1. Michael Stonebraker, Samuel Madden, Daniel J. Abadi, et al.: "[The End of an Architectural Era (It's Time for a Complete Rewrite)](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.137.3697&rep=rep1&type=pdf)," at *33rd International Conference on Very Large Data Bases* (VLDB), September 2007.
1. "[VoltDB Technical Overview White Paper](https://www.voltdb.com/wptechnicaloverview)," VoltDB, 2014.
1. Stephen M. Rumble, Ankita Kejriwal, and John K. Ousterhout: "[Log-Structured Memory for DRAM-Based Storage](https://www.usenix.org/system/files/conference/fast14/fast14-paper_rumble.pdf)," at *12th USENIX Conference on File and Storage Technologies* (FAST), February 2014.
1. Stavros Harizopoulos, Daniel J. Abadi, Samuel Madden, and Michael Stonebraker: "[OLTP Through the Looking Glass, and What We Found There](http://hstore.cs.brown.edu/papers/hstore-lookingglass.pdf)," at *ACM International Conference on Management of Data* (SIGMOD), June 2008. [doi:10.1145/1376616.1376713](http://dx.doi.org/10.1145/1376616.1376713)
1. Justin DeBrabant, Andrew Pavlo, Stephen Tu, et al.: "[Anti-Caching: A New Approach to Database Management System Architecture](http://www.vldb.org/pvldb/vol6/p1942-debrabant.pdf)," *Proceedings of the VLDB Endowment*, volume 6, number 14, pages 1942–1953, September 2013.
1. Joy Arulraj, Andrew Pavlo, and Subramanya R. Dulloor: "[Let's Talk About Storage & Recovery Methods for Non-Volatile Memory Database Systems](http://www.pdl.cmu.edu/PDL-FTP/NVM/storage.pdf)," at *ACM International Conference on Management of Data* (SIGMOD), June 2015. [doi:10.1145/2723372.2749441](http://dx.doi.org/10.1145/2723372.2749441)
1. Edgar F. Codd, S. B. Codd, and C. T. Salley: "[Providing OLAP to User-Analysts: An IT Mandate](http://www.minet.uni-jena.de/dbis/lehre/ss2005/sem_dwh/lit/Cod93.pdf)," E. F. Codd Associates, 1993.
1. Surajit Chaudhuri and Umeshwar Dayal: "[An Overview of Data Warehousing and OLAP Technology](https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/sigrecord.pdf)," *ACM SIGMOD Record*, volume 26, number 1, pages 65–74, March 1997. [doi:10.1145/248603.248616](http://dx.doi.org/10.1145/248603.248616)
1. Per-Åke Larson, Cipri Clinciu, Campbell Fraser, et al.: "[Enhancements to SQL Server Column Stores](http://research.microsoft.com/pubs/193599/Apollo3%20-%20Sigmod%202013%20-%20final.pdf)," at *ACM International Conference on Management of Data* (SIGMOD), June 2013.
1. Franz Färber, Norman May, Wolfgang Lehner, et al.: "[The SAP HANA Database – An Architecture Overview](http://sites.computer.org/debull/A12mar/hana.pdf)," *IEEE Data Engineering Bulletin*, volume 35, number 1, pages 28–33, March 2012.
1. Michael Stonebraker: "[The Traditional RDBMS Wisdom Is (Almost Certainly) All Wrong](http://slideshot.epfl.ch/talks/166)," presentation at *EPFL*, May 2013.
1. Daniel J. Abadi: "[Classifying the SQL-on-Hadoop Solutions](https://web.archive.org/web/20150622074951/http://hadapt.com/blog/2013/10/02/classifying-the-sql-on-hadoop-solutions/)," *hadapt.com*, October 2, 2013.
1. Marcel Kornacker, Alexander Behm, Victor Bittorf, et al.: "[Impala: A Modern, Open-Source SQL Engine for Hadoop](http://pandis.net/resources/cidr15impala.pdf)," at *7th Biennial Conference on Innovative Data Systems Research* (CIDR), January 2015.
1. Sergey Melnik, Andrey Gubarev, Jing Jing Long, et al.: "[Dremel: Interactive Analysis of Web-Scale Datasets](http://research.google.com/pubs/pub36632.html)," at *36th International Conference on Very Large Data Bases* (VLDB), pages 330–339, September 2010.
1. Ralph Kimball and Margy Ross: *The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling*, 3rd edition. John Wiley & Sons, July 2013. ISBN: 978-1-118-53080-1
1. Derrick Harris: "[Why Apple, eBay, and Walmart Have Some of the Biggest Data Warehouses You've Ever Seen](http://gigaom.com/2013/03/27/why-apple-ebay-and-walmart-have-some-of-the-biggest-data-warehouses-youve-ever-seen/)," *gigaom.com*, March 27, 2013.
1. Julien Le Dem: "[Dremel Made Simple with Parquet](https://blog.twitter.com/2013/dremel-made-simple-with-parquet)," *blog.twitter.com*, September 11, 2013.
1. Daniel J. Abadi, Peter Boncz, Stavros Harizopoulos, et al.: "[The Design and Implementation of Modern Column-Oriented Database Systems](http://cs-www.cs.yale.edu/homes/dna/papers/abadi-column-stores.pdf)," *Foundations and Trends in Databases*, volume 5, number 3, pages 197–280, December 2013. [doi:10.1561/1900000024](http://dx.doi.org/10.1561/1900000024)
1. Peter Boncz, Marcin Zukowski, and Niels Nes: "[MonetDB/X100: Hyper-Pipelining Query Execution](http://www.cidrdb.org/cidr2005/papers/P19.pdf)," at *2nd Biennial Conference on Innovative Data Systems Research* (CIDR), January 2005.
1. Jingren Zhou and Kenneth A. Ross: "[Implementing Database Operations Using SIMD Instructions](http://www1.cs.columbia.edu/~kar/pubsk/simd.pdf)," at *ACM International Conference on Management of Data* (SIGMOD), pages 145–156, June 2002. [doi:10.1145/564691.564709](http://dx.doi.org/10.1145/564691.564709)
1. Michael Stonebraker, Daniel J. Abadi, Adam Batkin, et al.: "[C-Store: A Column-oriented DBMS](http://www.vldb2005.org/program/paper/thu/p553-stonebraker.pdf)," at *31st International Conference on Very Large Data Bases* (VLDB), pages 553–564, September 2005.
1. Andrew Lamb, Matt Fuller, Ramakrishna Varadarajan, et al.: "[The Vertica Analytic Database: C-Store 7 Years Later](http://vldb.org/pvldb/vol5/p1790_andrewlamb_vldb2012.pdf)," *Proceedings of the VLDB Endowment*, volume 5, number 12, pages 1790–1801, August 2012.
1. Julien Le Dem and Nong Li: "[Efficient Data Storage for Analytics with Apache Parquet 2.0](http://www.slideshare.net/julienledem/th-210pledem)," at *Hadoop Summit*, San Jose, June 2014.
1. Jim Gray, Surajit Chaudhuri, Adam Bosworth, et al.: "[Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals](http://arxiv.org/pdf/cs/0701155.pdf)," *Data Mining and Knowledge Discovery*, volume 1, number 1, pages 29–53, March 2007. [doi:10.1023/A:1009726021843](http://dx.doi.org/10.1023/A:1009726021843)

------

| Previous chapter | Contents | Next chapter |
| ------------------------------------ | ------------------------------- | ---------------------------- |
| [Chapter 2: Data Models and Query Languages](ch2.md) | [Designing Data-Intensive Applications](README.md) | [Chapter 4: Encoding and Evolution](ch4.md) |