Paper summary: The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
Shuming Ma* Hongyu Wang* Lingxiao Ma Lei Wang Wenhui Wang
Shaohan Huang Li Dong Ruiping Wang Jilong Xue Furu Wei⋄
All rights to the figures and tables on this page belong to the authors of the paper.
- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
- This paper in one line
- Abstract
- The Era of 1-bit LLMs
- BitNet b1.58
- Result
- Energy
- Discussion and Future Work
- Glossary

This paper in one line
BitNet b1.58: cutting memory and energy use while maintaining performance, toward a new era of language models

Abstract
- Objective: Introduce BitNet b1.58 (※1), a new generation of 1-bit large language model (LLM), and define a new scaling law and training recipe for developing LLMs that are both high-performance and cost-effective.
- Method: BitNet b1.58 is a 1-bit LLM in which every parameter (or weight) is represented using only the values {-1, 0, 1}.
- Results: Compared with a full-precision Transformer LLM of the same model size and number of training tokens, BitNet b1.58 matches it in perplexity and end-task performance while being significantly more cost-effective in latency, memory, throughput, and energy consumption.
- Conclusion: The 1.58-bit LLM provides a new scaling law and recipe for training a new generation of high-performance, cost-effective LLMs, and opens a new era of designing specific hardware optimized for 1-bit LLMs.
The Era of 1-bit LLMs
- In recent years, large language models (LLMs) have grown rapidly in size and capability and show remarkable performance across a wide range of natural language processing tasks. Their increasing size, however, creates deployment challenges and raises concerns about the environmental and economic impact of their high energy consumption.
- One approach to these challenges is post-training quantization, which produces low-bit models for inference. It lowers the precision of weights and activations and so greatly reduces the memory and compute requirements of LLMs.
- Recent work on 1-bit model architectures, such as BitNet, points to a promising direction for reducing the cost of LLMs while maintaining performance. BitNet's matrix multiplication involves only integer addition, which saves a large share of the energy cost of LLMs.
- This work introduces BitNet b1.58, a 1-bit LLM variant in which every parameter takes a ternary value {-1, 0, 1}. Compared with FP16 LLM baselines it is far more efficient in memory consumption, throughput (※2), and latency (※3). It also offers additional benefits: including 0 among the weight values enables feature filtering, which can significantly improve the performance of 1-bit LLMs.
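The remark that ternary weights reduce matrix multiplication to additions can be illustrated with a minimal sketch. This is not the paper's kernel; the function name `ternary_matvec` and the toy inputs are assumptions made for illustration only.

```python
def ternary_matvec(W, x):
    """Multiply a ternary matrix W (entries in {-1, 0, 1}) by a vector x.

    Hypothetical helper for illustration: no multiplications are used,
    only additions and subtractions of the activations.
    """
    y = []
    for row in W:
        acc = 0
        for w, xi in zip(row, x):
            if w == 1:
                acc += xi      # weight +1: add the activation
            elif w == -1:
                acc -= xi      # weight -1: subtract the activation
            # weight 0: the feature is skipped (filtered out)
        y.append(acc)
    return y


W = [[1, 0, -1],
     [0, 1, -1]]
x = [3, 5, 2]
print(ternary_matvec(W, x))  # [1, 3]
```

The result equals the ordinary dot-product definition of `W @ x`; the zero weights simply drop the corresponding inputs.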
BitNet b1.58
- BitNet b1.58 is based on the BitNet architecture, a Transformer in which nn.Linear is replaced by BitLinear. It is trained from scratch with 1.58-bit weights and 8-bit activations.
- To constrain the weights to -1, 0, or +1, it adopts an absmean quantization function: the weight matrix is first scaled by its mean absolute value γ, and each value is then rounded to the nearest integer in {-1, 0, +1} (RoundClip).
- The activation quantization function is implemented as in BitNet, with one difference: instead of scaling the activations before non-linear functions to the range [0, Qb], all activations are scaled per token to [−Qb, Qb], which eliminates zero-point quantization.
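The two quantization steps described above can be sketched in plain Python. This is a simplified illustration, not the authors' implementation: the function names, the small constant `eps`, and the choice of clipping activations to `[-Qb, Qb - 1]` with `Qb = 2**(bits - 1)` are assumptions. Note that Python's `round` rounds exact halves to the nearest even integer.

```python
def absmean_quantize(W, eps=1e-5):
    """Quantize a weight matrix (list of rows) to {-1, 0, +1}.

    gamma is the mean absolute value of W; each weight is divided by
    (gamma + eps), rounded, and clipped to [-1, 1] (RoundClip).
    """
    n = sum(len(row) for row in W)
    gamma = sum(abs(w) for row in W for w in row) / n

    def round_clip(v, lo=-1, hi=1):
        return max(lo, min(hi, round(v)))

    Wq = [[round_clip(w / (gamma + eps)) for w in row] for row in W]
    return Wq, gamma


def quantize_activations(x, bits=8, eps=1e-5):
    """Per-token symmetric quantization of one activation vector.

    Scales by Qb / max|x| so values fall in [-Qb, Qb]; there is no
    zero point. Clipping to Qb - 1 keeps values in signed-integer range
    (an assumption of this sketch).
    """
    qb = 2 ** (bits - 1)
    scale = qb / max(max(abs(v) for v in x), eps)
    q = [max(-qb, min(qb - 1, round(v * scale))) for v in x]
    return q, scale


W = [[0.9, -0.05, -1.3],
     [0.2, 0.6, -0.4]]
Wq, gamma = absmean_quantize(W)
print(Wq)   # [[1, 0, -1], [0, 1, -1]]

q, scale = quantize_activations([0.5, -1.0, 0.25])
print(q)    # [64, -128, 32]
```

In this toy example γ is about 0.575, so the small weights (-0.05 and 0.2) collapse to 0 while the larger ones saturate at ±1.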
LLaMA-alike Components.
- The architecture of BitNet b1.58 adopts the components of LLaMA, the de facto standard among open-source LLMs: it uses RMSNorm, SwiGLU, and rotary embeddings, and removes all biases. As a result, BitNet b1.58 can be integrated into popular open-source software such as Huggingface, vLLM, and llama.cpp with minimal effort.
Result
- BitNet b1.58 was compared with a reproduced FP16 LLaMA LLM at various sizes. For a fair comparison, the models were pre-trained on the RedPajama dataset for 100 billion tokens.
- Zero-shot performance was evaluated on a range of language tasks, and validation perplexity was reported on the WikiText2 and C4 datasets.
- At the 3B model size, BitNet b1.58 matches the full-precision LLaMA LLM in perplexity while being 2.71 times faster and using 3.55 times less GPU memory.
- BitNet b1.58 3.9B is 2.4 times faster and consumes 3.32 times less memory than LLaMA LLM 3B, yet performs significantly better, matching or exceeding it in end-task accuracy.

- These results indicate that BitNet b1.58 is a Pareto improvement (an improvement with no metric getting worse) over current state-of-the-art LLMs.
Memory and Latency
- The model size was scaled up to 7B, 13B, and 70B and the cost was evaluated. The speed-up grows as the model size scales; in particular, BitNet b1.58 70B is 4.1 times faster than the LLaMA LLM baseline.
- Memory consumption follows a similar trend: the larger the model, the better the memory efficiency. The embedding layer remains in full precision, but its share of the whole model shrinks as the model grows. Both latency and memory were measured with a 2-bit kernel, so there is still room for optimization to reduce the cost further.
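Why the memory saving grows with model size can be seen from a rough weight-only estimate. The sketch below is a back-of-the-envelope illustration under stated assumptions: the parameter counts are hypothetical round numbers, the function name is made up, and activations and runtime buffers are ignored, so the ratios are not the paper's measured figures.

```python
def weight_memory_ratio(total_params, full_precision_params, low_bits=2):
    """Ratio of FP16 weight memory to low-bit weight memory.

    full_precision_params: parameters kept at 16 bits (e.g. embeddings).
    low_bits: storage bits per remaining weight (2, as with a 2-bit kernel).
    """
    fp16_bytes = total_params * 2
    low_bytes = (full_precision_params * 2
                 + (total_params - full_precision_params) * low_bits / 8)
    return fp16_bytes / low_bytes


# Hypothetical configurations: vocabulary of 32,000 tokens, with input
# and output embeddings both assumed to stay in full precision.
small = weight_memory_ratio(3e9, 2 * 32_000 * 3_200)
large = weight_memory_ratio(70e9, 2 * 32_000 * 8_192)
print(f"3B-scale ratio:  {small:.2f}")
print(f"70B-scale ratio: {large:.2f}")
```

With 2-bit storage the ratio can approach, but never reach, 8; the fixed full-precision embedding cost pulls it down more for the smaller model than for the larger one.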
Energy
- BitNet b1.58 reduces the arithmetic-operation energy consumption of matrix multiplication by a factor of 71.4, and its energy efficiency relative to the FP16 LLaMA LLM baseline improves as the model size scales.
Throughput
- BitNet b1.58 70B can support up to 11 times the batch size of LLaMA LLM, resulting in 8.9 times higher throughput.

- BitNet b1.58 enables a new scaling law relating model performance and inference cost, giving the following equivalences between different model sizes:
- 13B BitNet b1.58 is more efficient than a 3B FP16 LLM in latency, memory usage, and energy consumption.
- 30B BitNet b1.58 is more efficient than a 7B FP16 LLM in latency, memory usage, and energy consumption.
- 70B BitNet b1.58 is more efficient than a 13B FP16 LLM in latency, memory usage, and energy consumption.
Training with 2T Tokens
- BitNet b1.58 was trained on 2T tokens following the data recipe of StableLM-3B and evaluated on a benchmark consisting of Winogrande, PIQA, SciQ, LAMBADA, and ARC-easy.
- BitNet b1.58 achieved superior performance on all end tasks, indicating that 1.58-bit LLMs have strong generalization capabilities.
Discussion and Future Work
1-bit Mixture-of-Experts (MoE) LLMs
- Mixture-of-Experts (MoE) LLMs greatly reduce computation FLOPs, but their high memory consumption and inter-chip communication overhead limit deployment and application. These challenges can be addressed by 1.58-bit LLMs: the smaller memory footprint reduces the number of devices needed to deploy MoE models and greatly reduces the overhead of transferring activations across the network.
Native Support of Long Sequence in LLMs
- Memory consumption by the KV cache (※4) is the main challenge for long-sequence inference. By reducing activations from 16 bits to 8 bits, BitNet b1.58 doubles the context length that fits in the same resources, which is an important step toward native support of long sequences.
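The doubling of context length follows from simple arithmetic on the cache size. The sketch below assumes a hypothetical model configuration (32 layers, 32 key/value heads, head dimension 128) and an 8 GiB cache budget; none of these values come from the paper.

```python
def kv_cache_bytes_per_token(n_layers, n_kv_heads, head_dim, bits):
    # Factor 2: one key vector and one value vector per layer and head.
    return 2 * n_layers * n_kv_heads * head_dim * bits // 8


def max_context_length(budget_bytes, n_layers, n_kv_heads, head_dim, bits):
    """Number of tokens whose keys and values fit in the given budget."""
    per_token = kv_cache_bytes_per_token(n_layers, n_kv_heads, head_dim, bits)
    return budget_bytes // per_token


cfg = dict(n_layers=32, n_kv_heads=32, head_dim=128)  # hypothetical model
budget = 8 * 2**30                                    # 8 GiB for the cache

ctx16 = max_context_length(budget, bits=16, **cfg)
ctx8 = max_context_length(budget, bits=8, **cfg)
print(ctx16, ctx8)  # 16384 32768
```

Halving the bits per cached element halves the bytes per token, so the same budget holds twice as many tokens.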
LLMs on Edge and Mobile
- 1.58-bit LLMs could greatly improve the performance of language models on edge and mobile devices, which are constrained in memory and compute. This would enable applications that were previously impossible and substantially expand what edge and mobile devices can do.
New Hardware for 1-bit LLMs
- Regarding new hardware for 1-bit LLMs, recent work such as Groq has shown promising results and great potential in building specific hardware (e.g., LPUs) for LLMs. The authors call for the design of new hardware and systems specifically optimized for the new computation paradigm that BitNet enables.

Summary of BitNet, the paper this work builds on
reseachpaper-matome.hatenablog.com
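
Glossary
※1 Why 1.58?: When the values {1, 0, -1} each occur with probability 1/3, the average information content (entropy) is log2(3) ≈ 1.58 bits.
※2 Throughput: the amount of data that can be processed or transmitted per unit of time.
※3 Latency: the time delay required for processing or data transmission within a system or network.
※4 KV cache: short for key-value cache. In LLM inference, it stores the attention key and value vectors computed for previous tokens so that they can be reused, rather than recomputed, when generating each new token.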
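The 1.58 figure in ※1 can be checked numerically. The helper `entropy_bits` below is a hypothetical name for the standard Shannon entropy formula.

```python
import math


def entropy_bits(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


ternary = entropy_bits([1 / 3, 1 / 3, 1 / 3])  # values {-1, 0, 1}
binary = entropy_bits([0.5, 0.5])              # values {-1, 1}
print(round(ternary, 2))  # 1.58
print(binary)             # 1.0
```

A uniformly distributed ternary weight therefore carries log2(3) bits, compared with exactly 1 bit for a binary weight.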