About Hidden Markov Models
Hidden Markov Models
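For sequential data, we would like a model that is not restricted to a Markov assumption of any fixed order and yet has a limited number of free parameters. This can be achieved by introducing latent variables. If we assume that it is the latent variables that form a Markov chain, as in the figure, we obtain the graph structure known as a state space model.
The joint distribution of this model is given by
$$ p(x_1, \ldots, x_N, z_1, \ldots, z_N) = p(z_1) \left[ \prod_{n=2}^{N} p(z_n | z_{n-1}) \right] \prod_{n=1}^{N} p(x_n | z_n) $$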
When the latent variables are discrete, this model is called a hidden Markov model (HMM).
The latent variable summarizes past information, and both the transition to the next state and the prediction are made from that summary, so the prediction depends on all past observations. In a weather example, the atmospheric state (high pressure, low pressure, and so on) is the latent variable. From directly observed data (for example, a run of sunny days) we indirectly estimate the transition pattern of the hidden state (high pressure) and use it to predict tomorrow's weather.
Here the latent variable $z$ is represented as a $K$-dimensional binary variable using 1-of-K coding (with two states, high pressure is written [1,0] and low pressure [0,1]). The state of the latent variable $z_n$ at time $n$ depends on the state $z_{n-1}$ at the previous time step. The conditional distribution describing this transition is represented by the transition probability matrix $A$.
The probability of moving from state $j$ at time $n-1$ to state $k$ at time $n$ is defined as $A_{jk}\equiv p(z_{n,k}=1|z_{n-1,j}=1)$. $A$ is a $K \times K$ matrix, but because $\sum_k A_{jk}=1$ for every row, it has $K(K-1)$ free parameters.
Using the transition probability matrix, the conditional distribution can be written as
$$ p(z_n | z_{n-1}, A) = \prod_{k=1}^{K} \prod_{j=1}^{K} A_{jk}^{z_{n-1,j} z_{n,k}} $$
The first latent node $z_1$ has no preceding time step, so its distribution is given by the initial state distribution $\pi$.
$$ p(z_1 | \pi) = \prod_{k=1}^{K} \pi_k^{z_{1k}} $$
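The elements of $\pi$ sum to 1. The state transition diagram for $K=3$ is shown below. To complete the probabilistic model, we define the conditional distribution of the observed variables, $p(x_n|z_n, \phi)$, where $\phi$ is the set of parameters governing the distribution. This conditional distribution is called the emission probability and takes the following form.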
$$ p(x_n | z_n, \phi) = \sum_{k=1}^{K} p(x_n | \phi_k) z_{nk} $$
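Assume that all conditional distributions governing the latent variables share the same transition probability matrix $A$ and that all emission distributions share the same parameters $\phi$. For such a homogeneous model, the joint distribution of the latent and observed variables is
$$ p(X, Z | \theta) = p(z_1 | \pi) \left[ \prod_{n=2}^{N} p(z_n | z_{n-1}, A) \right] \prod_{m=1}^{N} p(x_m | z_m, \phi) $$
The goal of an HMM is to optimize the unknown parameters $\theta=\{\pi, A, \phi\}$ from the observations $X=\{x_1,\ldots,x_N\}$. The likelihood function is obtained by marginalizing the joint distribution over the latent variables.
$$ p\left(X\middle|\theta\right)=\sum_{Z}{p\left(X,Z\middle|\theta\right)} $$
This likelihood is maximized with the EM algorithm, which I plan to cover in a future post.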
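To make the marginalization concrete, here is a minimal sketch for a two-state weather HMM with discrete emissions. Every number in `pi`, `A` and `B` is made up for illustration, and the emission table `B` stands in for $\phi$. The forward recursion is not discussed in this post; it is the standard way to get the same sum without enumerating all $K^N$ latent paths, and it is included only as a cross-check.

```python
from itertools import product

# Hypothetical 2-state weather HMM (all numbers are invented for illustration).
# States: 0 = high pressure, 1 = low pressure. Observations: 0 = sunny, 1 = rainy.
pi = [0.6, 0.4]            # initial state distribution, sums to 1
A = [[0.7, 0.3],           # A[j][k] = p(z_n = k | z_{n-1} = j), each row sums to 1
     [0.4, 0.6]]
B = [[0.9, 0.1],           # B[k][x] = p(x_n = x | z_n = k), plays the role of phi
     [0.2, 0.8]]


def joint(xs, zs):
    """p(X, Z | theta) = p(z_1) * prod p(z_n | z_{n-1}) * prod p(x_n | z_n)."""
    p = pi[zs[0]] * B[zs[0]][xs[0]]
    for n in range(1, len(xs)):
        p *= A[zs[n - 1]][zs[n]] * B[zs[n]][xs[n]]
    return p


def likelihood_brute_force(xs):
    """Marginalize the joint over all K^N latent paths."""
    K = len(pi)
    return sum(joint(xs, zs) for zs in product(range(K), repeat=len(xs)))


def likelihood_forward(xs):
    """Same quantity via the forward recursion, O(N K^2) instead of O(K^N)."""
    K = len(pi)
    alpha = [pi[k] * B[k][xs[0]] for k in range(K)]
    for x in xs[1:]:
        alpha = [B[k][x] * sum(alpha[j] * A[j][k] for j in range(K))
                 for k in range(K)]
    return sum(alpha)


observations = [0, 0, 1, 1, 0]
print(likelihood_brute_force(observations), likelihood_forward(observations))
```

The two printed values should agree up to floating-point error; the brute-force version is only practical for very short sequences.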
Figures are taken from: https://www.microsoft.com/en-us/research/uploads/prod/2006/01/Bishop-Pattern-Recognition-and-Machine-Learning-2006.pdf
About Markov Models
Markov Models
The simplest way to handle sequential data such as a time series $\{x_1, \ldots, x_N\}$ is to ignore the sequential aspect and treat the observations as independent and identically distributed (i.i.d.) (figure below). This approach, however, cannot capture patterns that depend on the order of the data.
Suppose we want to know whether it will rain tomorrow, and we have 1000 days of observations, 100 of which were rainy. If the observations are treated as i.i.d., the predicted probability of rain tomorrow is simply the frequency 100/1000 = 1/10. In practice, though, rain often falls on consecutive days, so knowing whether it rained today helps predict whether it will rain tomorrow.
A Markov model is one way to express this in a probabilistic model. By the product rule, the joint distribution of a sequence of $N$ observations can always be written as
$$ \begin{split} p(x_1, \ldots, x_N) = & p(x_1) p(x_2|x_1)p(x_3|x_1,x_2)\cdots p(x_N | x_1, \ldots, x_{N-1})\\=&p(x_1)\prod_{n=2}^{N} p(x_n | x_1, \ldots, x_{n-1}) \end{split} $$
Here $p(x_n | x_1, \ldots, x_{n-1})$ means that the observation $x_n$ is conditioned on $x_1, \ldots, x_{n-1}$.
If only today's weather matters when predicting tomorrow's, that is, if the prediction is independent of every past observation except the most recent one, the joint distribution of the $N$ observations becomes
$$ \begin{split} p(x_1, \ldots, x_N) =& p(x_1) p(x_2|x_1)p(x_3|x_2)\cdots p(x_N | x_{N-1})\\=& p(x_1)\prod_{n=2}^{N} p(x_n |x_{n-1}) \end{split} $$
In this case each observation $x_n$ is conditioned only on $x_{n-1}$, as shown in the graphical model below.
In most applications of Markov models, the conditional distributions $p(x_n |x_{n-1})$ are constrained to be identical for all $n$. For example, if rain today raises the probability of rain tomorrow by 10%, we assume that this holds all year round. Such a model is called a homogeneous Markov chain. In reality today's weather may influence tomorrow's more strongly during the rainy season, so the conditional probabilities may not be homogeneous, but many models assume a homogeneous Markov chain as a starting point.
As an example of using more of the past, suppose we use two days of information, yesterday and today, to predict tomorrow's weather. The joint distribution of the $N$ observations then becomes
$$ \begin{split} p(x_1, \ldots, x_N) =& p(x_1) p(x_2|x_1)p(x_3|x_2, x_1)\cdots p(x_N | x_{N-1}, x_{N-2})\\=& p(x_1)p(x_2|x_1)\prod_{n=3}^{N} p(x_n |x_{n-1}, x_{n-2}) \end{split} $$
This model is called a second-order Markov chain, and its graphical model is shown below.
In the same way, $M$ days of weather information can be used for prediction, which gives an $M$-th order Markov chain. Taking in more of the past may improve prediction accuracy, but the number of model parameters grows exponentially with $M$, and the model may become too complex.
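The contrast between the i.i.d. estimate and a first-order chain, and the exponential parameter growth, can be sketched in a few lines. The weather record below is invented, and `fit_first_order` and `num_free_parameters` are hypothetical helper names. The count $K^M(K-1)$ assumes a discrete variable with $K$ states and one free conditional table per combination of the $M$ previous states.

```python
from collections import Counter


def fit_first_order(seq, states):
    """Maximum-likelihood estimate of p(x_n | x_{n-1}) from transition counts."""
    counts = Counter(zip(seq, seq[1:]))
    table = {}
    for prev in states:
        total = sum(counts[(prev, nxt)] for nxt in states)
        table[prev] = {nxt: (counts[(prev, nxt)] / total if total else 0.0)
                       for nxt in states}
    return table


def num_free_parameters(K, M):
    """Free parameters in the conditional table of an M-th order chain over K states."""
    return K ** M * (K - 1)


# Hypothetical weather record: "R" = rain, "S" = sun (made-up data).
record = list("SSSRRRSSSSRRSSSSSRRR")
table = fit_first_order(record, "SR")

print("i.i.d. rain frequency:", record.count("R") / len(record))
print("p(rain tomorrow | rain today):", table["R"]["R"])
print("free parameters for M = 1..4:", [num_free_parameters(2, M) for M in (1, 2, 3, 4)])
```

In this toy record the chance of rain after a rainy day comes out noticeably higher than the overall rain frequency, which is exactly the dependence the i.i.d. model throws away.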
Figures are taken from: https://www.microsoft.com/en-us/research/uploads/prod/2006/01/Bishop-Pattern-Recognition-and-Machine-Learning-2006.pdf
Paper summary: BitNet: Scaling 1-bit Transformers for Large Language Models
BitNet: Scaling 1-bit Transformers for Large Language Models
Hongyu Wang†‡ Shuming Ma† Li Dong† Shaohan Huang† Huaijie Wang§ Lingxiao Ma† Fan Yang† Ruiping Wang‡ Yi Wu§ Furu Wei†⋄
† Microsoft Research ‡ University of Chinese Academy of Sciences § Tsinghua University
Abstract
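- Purpose: to introduce BitNet, a scalable and stable 1-bit Transformer architecture, in order to address the deployment challenges of large language models and the environmental impact of their high energy consumption.
- Method: BitLinear is introduced as a drop-in replacement for the nn.Linear layer, and 1-bit weights are trained from scratch, giving BitNet, an architecture designed for large language models.
- Results: experiments on language modeling show that BitNet achieves competitive performance while substantially reducing memory footprint and energy consumption compared with state-of-the-art 8-bit quantization methods and FP16 Transformer baselines. BitNet also follows a scaling law similar to that of full-precision Transformers, which suggests it can scale effectively to even larger language models while keeping its efficiency and performance benefits.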
Introduction
- The rapid growth of large language models (LLMs) has brought significant improvements across many tasks, but hosting these models is expensive because of their high inference cost and energy consumption.
- As models grow, the memory bandwidth needed to access and process the model parameters becomes a major bottleneck and limits overall inference performance.
- When these models are deployed on distributed systems or multi-device platforms, inter-device communication overhead has a large impact on inference latency and energy consumption.
- Model quantization is a promising solution: it can greatly reduce the memory footprint and computational cost of large models while maintaining competitive performance.
- Most existing quantization approaches are post-training and therefore easy to use. However, the model is not optimized for the quantized representation during training, so accuracy tends to drop. Quantization-aware training, by contrast, accounts for quantization from the start of training and typically yields better accuracy.
- This work is the first to investigate quantization-aware training for 1-bit large language models. It proposes BitNet, a 1-bit Transformer architecture that aims to scale efficiently in both memory and computation.
- BitNet uses low-precision binary weights and quantized activations, while keeping the optimizer states and gradients in high precision during training.
- The implementation is simple and requires only replacing the linear projections in the Transformer (such as nn.Linear in PyTorch). It also complements other LLM acceleration methods such as PagedAttention, FlashAttention, and speculative decoding.
- BitNet is evaluated on language modeling benchmarks and compared with state-of-the-art quantization methods and FP16 Transformers. The results show that BitNet achieves competitive performance in both perplexity and downstream task accuracy, and substantially reduces memory footprint and energy consumption compared with the baselines.
BitNet
- BitNet uses the same layout as the Transformer, stacking blocks of self-attention and feed-forward networks. In place of conventional matrix multiplication it adopts BitLinear, which uses 1-bit model weights.
- The other components are left unquantized in high precision (for example, 8-bit) for the following reasons.
- Residual connections and layer normalization contribute negligible computational cost in large language models.
- The cost of the QKV (query, key, value) transformation becomes much smaller than that of the parametric projections as the model grows.
- The input and output embeddings keep their precision because the language model needs high-precision probabilities for sampling.
BitLinear
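- The weights are binarized to +1 or −1 with the sign function. Before binarization the weights are centered to zero mean, which increases capacity within the limited numerical range. After binarization, a scaling factor $\beta$ is used to reduce the $\ell_2$ error between the real-valued and the binarized weights. The binarization of a weight $W \in \mathbb{R}^{n \times m}$ is formulated as follows.
$$ \tilde{W} = \text{Sign}(W - \alpha), \qquad \text{Sign}(W_{ij}) = \begin{cases} +1 & \text{if } W_{ij} > 0 \\ -1 & \text{if } W_{ij} \le 0 \end{cases} $$
where
$$ \alpha = \frac{1}{nm} \sum_{ij} W_{ij}, \qquad \beta = \frac{1}{nm} \lVert W \rVert_1 $$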
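- The activations are further quantized to $b$-bit precision. Absmax quantization is used: the activations are divided by the absolute maximum of the input matrix and scaled into the range $[-Q_b, Q_b]$ ($Q_b = 2^{b-1}$).
$$ \tilde{x} = \text{Quant}(x) = \text{Clip}\left(\frac{x \times Q_b}{\gamma}, -Q_b + \epsilon, Q_b - \epsilon\right), \quad \text{Clip}(x, a, b) = \max(a, \min(b, x)), \quad \gamma = \lVert x \rVert_\infty $$
Here $\epsilon$ is a small floating-point number that prevents overflow when clipping.
- For activations before a non-linear function (for example, ReLU), the minimum of the input is subtracted so that all values are non-negative and scaled into the range $[0, Q_b]$: $\tilde{x} = \text{Clip}\left(\frac{(x - \eta) \times Q_b}{\gamma}, \epsilon, Q_b - \epsilon\right)$ with $\eta = \min_{ij} x_{ij}$.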
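In this work the activations are quantized to 8 bits, and lower precision is left for future work. Quantization is performed per tensor during training and per token during inference, for both stability and efficiency.
With the quantization equations above, the matrix multiplication can be written as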
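A minimal sketch of absmax quantization, following the Clip formula as written in this summary (scale by $Q_b/\gamma$, then clip), on a plain Python list. The function names and the guard against an all-zero input are my own assumptions; like the equation, the sketch does not round to integers, which a real kernel would do.

```python
def absmax_quantize(x, bits=8, eps=1e-5):
    """Scale values into [-Q_b, Q_b] by the absolute maximum, then clip."""
    q_b = 2 ** (bits - 1)                       # Q_b = 2^(b-1)
    gamma = max(max(abs(v) for v in x), eps)    # gamma = ||x||_inf (guarded, assumption)
    lo, hi = -q_b + eps, q_b - eps
    scaled = [min(hi, max(lo, v * q_b / gamma)) for v in x]
    return scaled, gamma


def dequantize(x_q, gamma, bits=8):
    """Undo the scaling: multiply back by gamma / Q_b."""
    q_b = 2 ** (bits - 1)
    return [v * gamma / q_b for v in x_q]


activations = [0.5, -2.0, 1.0]
quantized, gamma = absmax_quantize(activations)
print(quantized, gamma)
print(dequantize(quantized, gamma))
```

Only the element with the largest magnitude is touched by the clip (it is pulled in by $\epsilon$), so dequantizing recovers the input almost exactly in this unrounded version.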
$$ y=\tilde{W}\tilde{x} $$
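(The tilde marks a quantized quantity: $\tilde{W}$ is the binarized weight and $\tilde{x}$ the quantized activation.)
Assume that the elements of $W$ and $x$ are mutually independent and share the same distribution. The variance of the output $y$ is then estimated as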
$$ \text{Var}(y) = n\,\text{Var}(\tilde{w}\tilde{x}) = n\,\mathbb{E}[\tilde{w}^2]\,\mathbb{E}[\tilde{x}^2] = n\beta^2\,\mathbb{E}[\tilde{x}^2] \approx \mathbb{E}[\tilde{x}^2] $$
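- In full-precision computation, standard initialization methods such as Kaiming or Xavier initialization keep the output variance $\text{Var}(y)$ at a scale of 1, which stabilizes training.
- To preserve the output variance after quantization, a LayerNorm function is introduced before the activation quantization. The variance of the output $y$ is then estimated as $\text{Var}(y) \approx \mathbb{E}[\text{LN}(\tilde{x})^2] = 1$, the same magnitude as the full-precision output variance.
- In the context of Transformers, this is implemented exactly as SubLN (sub-layer normalization). Together with the quantization above, it allows the matrix multiplication to be carried out between 1-bit weights and quantized activations.
- With SubLN and the quantization methods above, BitLinear is formulated as follows.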
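$$ y = \tilde{W}\tilde{x} = \tilde{W}\,\text{Quant}(\text{LN}(x)) \times \frac{\beta\gamma}{Q_b} $$
Here $\beta$ is the weight scaling factor, $\gamma$ is the absolute maximum used to scale the normalized activations, and $Q_b = 2^{b-1}$ is the quantization range for $b$ bits. After the SubLN operation, the activations are quantized with the absmax function, and the matrix multiplication is performed between the 1-bit weights and the quantized activations. The output activations are then rescaled with $\{\beta, \gamma\}$ to dequantize them to the original precision.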
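Putting the pieces together, the BitLinear forward pass can be sketched on plain Python lists. This is an illustrative sketch only, not the authors' implementation: it assumes $\alpha$ is the mean of the weights, $\beta$ the mean absolute weight, a LayerNorm without learned gain or bias, and absmax scaling without integer rounding. All function names are hypothetical.

```python
import math


def layer_norm(x, eps=1e-5):
    """LayerNorm without learned gain/bias (simplifying assumption)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def binarize(W):
    """Sign(W - alpha) with alpha = mean(W); beta = mean(|W|)."""
    flat = [w for row in W for w in row]
    alpha = sum(flat) / len(flat)
    beta = sum(abs(w) for w in flat) / len(flat)
    W_bin = [[1 if w - alpha > 0 else -1 for w in row] for row in W]
    return W_bin, beta


def bitlinear(x, W, bits=8, eps=1e-5):
    """y = W_bin @ Quant(LN(x)) * beta * gamma / Q_b for one input vector x."""
    q_b = 2 ** (bits - 1)
    h = layer_norm(x)
    gamma = max(max(abs(v) for v in h), eps)
    h_q = [min(q_b - eps, max(-q_b + eps, v * q_b / gamma)) for v in h]
    W_bin, beta = binarize(W)
    # With +1/-1 weights the inner product needs only additions and subtractions.
    acc = [sum(v if w == 1 else -v for w, v in zip(row, h_q)) for row in W_bin]
    # The only multiplications left are the final rescaling by beta * gamma / Q_b.
    return [a * beta * gamma / q_b for a in acc]


W = [[0.3, -0.2, 0.5, -0.6],      # 2 output features, 4 input features (made-up values)
     [-0.1, 0.4, -0.7, 0.2]]
x = [1.0, 2.0, 3.0, 4.0]
print(bitlinear(x, W))
```

The inner loop shows where the efficiency claim comes from: the accumulation is add/subtract only, and the floating-point multiplications are confined to one rescaling per output element.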
Model parallelism with Group Quantization and Normalization
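Model parallelism, which partitions matrix multiplications across multiple devices, is essential for scaling up large language models. It relies on the tensors being independent along the partition dimension. However, all of the parameters $\alpha$, $\beta$, $\gamma$, and $\eta$ are computed from the whole tensor, which breaks this independence.
One solution is to introduce an all-reduce operation for each parameter, but the amount of communication grows as the model gets deeper, which slows down processing.
To solve this, the paper proposes splitting the weights and activations into groups and estimating each group's parameters independently, so the parameters can be computed locally without additional communication. This approach is called group quantization.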
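Concretely, the weight matrix $W \in \mathbb{R}^{n \times m}$ is split along the partition dimension into $G$ groups, each of size $\frac{n}{G} \times m$. The parameters of each group are then estimated independently:
$$ \alpha_g = \frac{G}{nm} \sum_{ij} W_{ij}^{(g)}, \qquad \beta_g = \frac{G}{nm} \lVert W^{(g)} \rVert_1 $$
where $W^{(g)}$ denotes the $g$-th group of the weight matrix. Likewise, the input matrix $x \in \mathbb{R}^{n \times m}$ is split into $G$ groups and the parameters of each group are computed:
$$ \gamma_g = \lVert x^{(g)} \rVert_\infty, \qquad \eta_g = \min_{ij} x_{ij}^{(g)} $$
- For LN (layer normalization), the group normalization technique is applied so that the mean and variance of each group are computed independently.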
$$ \text{LN}(x^{(g)}) = \frac{x^{(g)} - \mathbb{E}(x^{(g)})}{\sqrt{\text{Var}(x^{(g)}) + \epsilon}} $$
- This gives efficient model parallelism without requiring any additional communication.
Model Training
Straight-through estimator
- To train the 1-bit model, a straight-through estimator (STE) is used to approximate the gradient during backpropagation. It bypasses the non-differentiable functions (for example, Sign and Clip) in the backward pass, which makes it possible to train the quantized model.
Mixed precision training
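- The weights and activations are quantized to low precision, but the gradients and optimizer states are kept in high precision to ensure training stability and accuracy. For the learnable parameters, latent weights are kept in a high-precision format to accumulate the parameter updates. The latent weights are binarized on the fly in the forward pass and are never used in inference.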
Large learning rate
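- One optimization challenge is that a small update to the latent weights often makes no difference to the 1-bit weights. This leads to biased gradients and updates, which is especially problematic early in training. The authors found that raising the learning rate is the simplest and best way to speed up optimization. BitNet benefits from a large learning rate in terms of convergence, whereas an FP16 Transformer diverges at the start of training with the same learning rate.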
Computational Efficiency
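- The computational efficiency of BitNet is evaluated in terms of both arithmetic operation energy and memory footprint.
- According to the energy model in [Hor14, ZZL22], the energy consumption of different arithmetic operations is estimated as follows.
Energy consumption in a vanilla Transformer
For a matrix multiplication with dimensions $m \times n$ and $n \times p$, the energy consumption of the additions and multiplications is calculated as
$$ E_{\text{add}} = m \times (n - 1) \times p \times \hat{E}_{\text{add}}, \qquad E_{\text{mul}} = m \times n \times p \times \hat{E}_{\text{mul}} $$
Energy consumption in BitNet
Because BitNet uses 1-bit weights, the energy consumption of the matrix multiplication is dominated by the additions. Multiplications are applied only to scale the output by the scalars $\beta$ and $\gamma/Q_b$, so their energy can be computed as $E_{\text{mul}} = (m \times p + m \times n) \times \hat{E}_{\text{mul}}$, which is far smaller than in the Transformer.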
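The operation counts in these formulas are easy to check numerically. The sketch below only counts operations; the per-operation energies passed to `energy` are arbitrary placeholder units, not the values from [Hor14, ZZL22]. It also assumes BitNet performs the same number of additions as the vanilla matmul, which is consistent with the text but not stated as a formula here.

```python
def matmul_op_counts(m, n, p):
    """Additions and multiplications for an (m x n) @ (n x p) product."""
    vanilla = {"add": m * (n - 1) * p, "mul": m * n * p}
    # BitNet: additions dominate; multiplications only rescale the output.
    bitnet = {"add": m * (n - 1) * p, "mul": m * p + m * n}
    return vanilla, bitnet


def energy(counts, e_add, e_mul):
    """Total energy given per-operation energies (placeholder units)."""
    return counts["add"] * e_add + counts["mul"] * e_mul


vanilla, bitnet = matmul_op_counts(m=512, n=4096, p=4096)
print("multiplications, vanilla vs BitNet:", vanilla["mul"], bitnet["mul"])
print("ratio:", vanilla["mul"] / bitnet["mul"])
# Placeholder energies: 1 unit per add, 4 units per mul (arbitrary, for illustration only).
print(energy(vanilla, 1.0, 4.0), energy(bitnet, 1.0, 4.0))
```

In practice the addition energy also differs between the two cases (integer versus floating-point adds), so a faithful comparison would pass different `e_add` values for each model.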
Comparison with FP16 Transformers
Setup
- A series of autoregressive language models with BitNet was trained at scales from 125M to 30B parameters. The models were trained on an English corpus consisting of the Pile dataset, Common Crawl snapshots, RealNews, and the CC-Stories dataset. The data was preprocessed with the SentencePiece tokenizer with a vocabulary size of 16K. For a fair comparison, Transformer baselines were trained with the same datasets and settings.
Inference-Optimal Scaling Law
- Neural language models have been shown to scale predictably with the vanilla Transformer architecture: the loss scales as a power law in the amount of compute used for training. This makes it possible to determine the optimal allocation of a compute budget and to predict the performance of large language models from smaller ones.
- To study the scaling law of binarized Transformers, the scaling curves of BitNet and the FP16 Transformer baseline are plotted against parameter count. The loss scaling of BitNet resembles that of the FP16 Transformer and follows a power law ($L(N) = aN^b + c$). In the figure, the power law is fitted to the results of the 125M to 6.7B models and used to predict the loss of the 13B and 30B models. The gap between BitNet and the FP16 Transformer also shrinks as the model size grows.
- An Inference-Optimal Scaling Law is introduced, which predicts the loss against energy consumption. It focuses on the inference energy cost, which scales with how much the model is used, whereas the training cost is paid only once. BitNet achieves a markedly better loss for a fixed computation budget, and its inference cost to reach the same performance as an FP16 model is much smaller.
Results on Downstream Tasks
- Besides the loss, the capabilities that emerge as BitNet scales are also of interest. Capability is evaluated with more interpretable metrics by testing 0-shot and 4-shot results on four downstream tasks: Hellaswag, Winogrande, Winograd, and Storycloze (figure below). The average results of BitNet and the FP16 Transformer are reported, and downstream task performance is shown to scale as the computation budget grows.
- A major challenge in training low-bit Transformers is the stability of optimization. Stability tests were therefore run for BitNet and the FP16 baseline by training a series of models with different peak learning rates. They show that BitNet can converge with a large learning rate while the FP16 Transformer cannot (figure below), which indicates that BitNet trains more stably.
- BitNet also benefits from a higher learning rate, achieving better convergence in terms of PPL (perplexity).
Comparison with Post-training Quantization
- BitNet is compared with state-of-the-art quantization methods, including Absmax, SmoothQuant, GPTQ, and QuIP. These are post-training quantization methods applied to an FP16 Transformer model that follows the same training settings and data as BitNet. Absmax and SmoothQuant quantize both weights and activations, while GPTQ and QuIP reduce only the precision of the weights.
- For weight-only quantization (GPTQ and QuIP), experiments are run with W4A16 and W2A16. For weight-and-activation quantization (Absmax and SmoothQuant), the FP16 Transformer is quantized to W8A8, W4A4, and W1A8. The BitNet implementation uses binary weights and 8-bit activations (W1A8), which is the same number of bits as the baselines or fewer.
- A detailed comparison is presented of BitNet's zero-shot performance against the various baseline approaches on four benchmark datasets: Winogrande, Winograd, Storycloze, and Hellaswag.
- For a fair comparison, all models have 6.7B parameters and are evaluated at several weight bit levels from 16 down to 1. Besides zero-shot accuracy on the downstream tasks, the evaluation metrics include language model perplexity on the validation set, to give a comprehensive picture of each method.
- BitNet reaches performance competitive with the baseline approaches, particularly at low bit levels. Its zero-shot scores are comparable to those of the 8-bit models, while its inference cost is much lower.
- Among the 4-bit models, the weight-only quantization methods outperform the weight-and-activation quantizers, because activations are harder to quantize. BitNet, a 1-bit model, achieves significantly better results than both the weight-and-activation methods and the weight-only methods.
- Among the lower-bit models, BitNet scores consistently better than all baselines, which demonstrates the advantage of quantization-aware training over post-training quantization. Figure 6 summarizes both the zero-shot and few-shot accuracy of BitNet and the baselines as the model size is scaled from 1.3B to 6.7B, and shows that this advantage is consistent across scales.
Ablation Studies
- Below are the ablation studies of BitNet, comparing it with several alternative approaches. They examine the choice of activation quantization approach and the effect of the techniques used to stabilize training.
- BitNet uses absmax to quantize the activations and SubLN for training stability. An alternative quantizer is the elastic function, which adjusts the scale dynamically with learnable parameters. In the experiments, absmax performs better than the elastic function.
- In addition, absmax gives more stable training, which allows a larger learning rate for BitNet. SubLN is compared with the Pre-LN and BMT architectures. Pre-LN is the default architecture for GPT, and BMT has been shown to improve the stability of binarized models. The experiments show that SubLN outperforms both Pre-LN and BMT. Absmax and SubLN are therefore chosen for the BitNet implementation.
Conclusion and Future Work
- The paper presents BitNet, a new 1-bit Transformer architecture for large language models. The approach aims at a scalable and stable design that handles large language models efficiently.
- The experiments show that BitNet achieves competitive performance in both perplexity and downstream task performance, while substantially reducing memory footprint and energy consumption compared with the baselines. BitNet also follows a scaling law similar to that of full-precision Transformers, which indicates that it can scale up effectively to even larger language models with potential benefits in performance and efficiency.
- In the future, the authors aim to scale up BitNet in terms of model size and training steps. They are also interested in applying BitNet to other architectures (for example, RetNet) for training large language models.
A summary of the follow-up to this paper, the 1.58-bit version, is here: reseachpaper-matome.hatenablog.com
Paper summary: GPT Takes the Bar Exam
GPT Takes the Bar Exam
Michael Bommarito II, Daniel Martin Katz 2022
License
CC BY 4.0 Deed | Attribution 4.0 International | Creative Commons
- GPT Takes the Bar Exam
- Abstract
- Introduction
- DATA
- Methods
- Results
- Conclusion and Future Work
- Bonus: GPT-4's explanation of and answer to the sample question (translated from Japanese)
Abstract
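- Purpose
- To experimentally evaluate the performance of OpenAI's text-davinci-003 model (also referred to as GPT-3.5) on the multiple-choice section (MBE) of the United States bar exam.
- Method
- Hyperparameter optimization and prompt engineering are applied to GPT-3.5's zero-shot performance and their effect is evaluated. The measures are the correct rate on complete MBE practice exams and the passing rate in the Evidence and Torts categories.
- Results
- With the best prompt and parameters, GPT-3.5 reaches a headline correct rate of 50.3% on a complete MBE practice exam, far above the 25% baseline guessing rate, and achieves a passing rate for both Evidence and Torts. GPT-3.5's ranking of the choices is also strongly correlated with correctness: its top two and top three choices contain the correct answer 71% and 88% of the time, respectively.
- Conclusion
- GPT-3.5's performance on the MBE section strongly suggests that an LLM is likely to pass the MBE component of the bar exam in the near future. However, interpretation of these results is limited by our still-developing scientific understanding of LLMs and by the proprietary nature of GPT.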
Introduction
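The complexity of the legal system
- The legal system is becoming more complex, and technological assistance is needed to improve the quantity, quality, and accessibility of the legal services that society demands.
- Artificial intelligence and process engineering have supported both non-professional and professional users of the legal system for decades.
- However, the complexity of legal language and the breadth of legal knowledge make it difficult to build systems that understand the nuances of legal problems.
- Law relies heavily on language, and legal documents are produced in enormous volumes. Legal language is complex, and legal professionals undergo nearly ten years of education and professional training to understand and produce it.
- The complexity of legal language comes in particular from highly formalized conventions and rigorously precise phrasing, and it differs greatly from ordinary language.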
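Progress in language models through machine learning
- In recent years, advances in natural language processing and in computing have greatly improved the performance of machine learning techniques.
- The introduction of the Transformer architecture has been revolutionary and successful, especially for the text and image modalities.
- OpenAI's GPT models are particularly well-known and widely accessed large language models (LLMs). GPT-3 is an autoregressive language model with 175 billion parameters.
- For commercial and ethical reasons, access to OpenAI's models is provided only through the OpenAI API, which offers endpoints for text completion, code completion, image generation, and embedding generation.
- GPT-3.5 and ChatGPT show unprecedented performance on zero-shot and few-shot tasks, but they are not domain-specific models, and whether state-of-the-art LLMs can succeed on legal exams such as the Multistate Bar Examination (MBE) has not yet been established.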
DATA
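- MBE questions are designed to test both legal knowledge and reading comprehension, and require advanced semantic and syntactic command of English.
- Rather than asking direct legal questions, MBE questions present the test-taker with a fictional situation described through an embellished statement of facts. Some of the details are important, while others are added only to distract the reader.
- Below is a publicly available sample question. It concerns an accident in which a car was struck by a train, and asks whether the testimony of a resident who has lived near the crossing for 15 years is admissible.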
Question: A man sued a railroad for personal injuries suffered when his
car was struck by a train at an unguarded crossing. A major issue is
whether the train sounded its whistle before arriving at the crossing.
The railroad has offered the testimony of a resident who has lived near
the crossing for 15 years. Although she was not present on the occasion
in question, she will testify that, whenever she is home, the train always
sounds its whistle before arriving at the crossing.
Is the resident's testimony admissible?
(A) No, due to the resident's lack of personal knowledge regarding the
incident in question.
(B) No, because habit evidence is limited to the conduct of persons,
not businesses.
(C) Yes, as evidence of a routine practice.
(D) Yes, as a summary of her present sense impressions.
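- The MBE portion of the bar exam consists of about 200 questions like the sample above. On the real exam, 25 questions are drawn from each of eight categories, seven of which correspond to specific areas of law and one of which is used for test-design experiments.
- Some questions may be excluded from the final score by a state bar or by the NCBE. Individual state bars and the NCBE assess the performance of test-takers within and across states, remove some questions, and adjust raw scores to maintain consistency across jurisdictions.
- As part of exam design and preparation, the NCBE maintains statistical information about exam performance. These statistics make clear how difficult the exam is: the average student answers more than one question in four incorrectly.
- For this study, standard test preparation material for the MBE portion, including practice questions and simulated exams, was purchased from the NCBE. The material cannot be redistributed, but researchers who want to reproduce the paper's results can buy the data from the NCBE online store for about 300 USD.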
Methods
- The experimental evaluation uses zero-shot prompts against the text-davinci-003 text completion API. This section details the design and iteration of the prompts, the relevant API hyperparameters, and the attempts to fine-tune the model.
Prompt Engineering and Responses
- Prompt engineering refers to the "art" of crafting prompts, which matters because LLMs are highly sensitive to the prompt they are given. This study put substantial effort into prompt engineering.
- The following prompt types were tested:
- 1. Single choice only
- 2. Single choice and an explanation
- 3. Top two choices only
- 4. Top two choices and an explanation
- 5. Top two choices and a re-prompt
- 6. Rank order of all choices
- 7. Rank order of the top three choices
- Results were broadly similar across these prompts, except for the last strategy, ranking the top three choices, which substantially improved the model's accuracy.
- Without direct insight into the upper layers of GPT-3.5, the authors cannot comment further on why this change in prompt affects the model's behavior differently from the other prompts.
- They hypothesize that this prompt best combines the ability to rule out the most clearly wrong answer (non-entailment) with probabilistic entailment and recall.
- For every simulated exam, the prompt and the complete JSON response (including the OpenAI API request ID) were logged. The first line of the text completion response was parsed and stored for scoring or qualitative analysis.
- In a very small number of cases (< 1%), the response contained natural-language format variations such as "My first choice is (D)". These variations were handled through exception cases in the parser. Responses were never manually altered or evaluated by a human.
- Technically, all of these prompts relate to the traditional textual entailment task, in which a model must evaluate whether a statement is true or not. In a zero-shot exam simulation, unlike in existing work on entailment problems, there is little control over the hypothesis, the claim, or the body of knowledge the model was trained on.
- There is no insight into whatever explicit or implicit knowledge graph or state model exists inside GPT. Moreover, in some cases more than one choice may be correct from an entailment point of view, and the test-taker has to rank the choices based on knowledge of how the exam is designed. The test therefore has elements that resemble retrieval and relevance scoring more than a simple binary entailment/non-entailment problem.
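A parser of the kind described above might look like the sketch below. The summary does not show the actual response format or the authors' parser, so the assumption here is simply that answer letters appear in parentheses, as in the quoted example "My first choice is (D)". `parse_ranked_choices` is a hypothetical name.

```python
import re

# Assumption: choices are written as a parenthesized letter A-D somewhere in the line.
CHOICE = re.compile(r"\(([A-D])\)")


def parse_ranked_choices(completion, k=3):
    """Return up to k distinct answer letters in the order they appear."""
    ranked = []
    for letter in CHOICE.findall(completion):
        if letter not in ranked:
            ranked.append(letter)
    return ranked[:k]


print(parse_ranked_choices("(C) (A) (D)"))
print(parse_ranked_choices("My first choice is (D)"))
print(parse_ranked_choices("no letter here"))
```

Scanning for the pattern anywhere in the line handles both the terse ranked format and the natural-language variation with the same code path, so no manual correction of responses is needed.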
(Hyper)parameters for GPT-3
- Results in machine learning and computational research are generally very sensitive to model parameters and hyperparameters. In addition to varying the prompts as described above, this study evaluates how hyperparameters such as the model's "temperature" affect its performance.
- The parameters evaluated are: 1. temperature (sampling temperature; 0.0 is deterministic, higher is more "random"); 2. top p (nucleus sampling probability); 3. best of (generate [N] completions server-side and return the "best" one, that is, the one with the highest log probability per token); 4. max tokens (the maximum number of tokens to generate).
- Temperature was tested at {0.0, 0.25, 0.5, 0.75, 1.0}, top p at {0.75, 1.0}, best of at {1, 2, 4}, and max tokens at {16, 32} for prompts without an explanation and {128, 256, 1024} for prompts with an explanation.
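The size of this search space can be enumerated directly. The sketch below builds the full Cartesian grid from the values listed above; whether the authors ran every combination or only a sample is not stated in this summary, so treat the counts as an upper bound. The dictionary keys follow the OpenAI completion parameter names.

```python
from itertools import product

temperature = [0.0, 0.25, 0.5, 0.75, 1.0]
top_p = [0.75, 1.0]
best_of = [1, 2, 4]
max_tokens_plain = [16, 32]              # prompts without an explanation
max_tokens_explained = [128, 256, 1024]  # prompts with an explanation


def grid(max_tokens):
    """All combinations of the four hyperparameters as keyword-argument dicts."""
    keys = ("temperature", "top_p", "best_of", "max_tokens")
    return [dict(zip(keys, combo))
            for combo in product(temperature, top_p, best_of, max_tokens)]


plain = grid(max_tokens_plain)
explained = grid(max_tokens_explained)
print(len(plain), len(explained))
print(plain[0])
```

Each dict could be passed as keyword arguments to a completion request, which keeps the sweep logic separate from the API call itself.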
Fine-tuning
- One reason LLMs such as GPT-3.5 have attracted so much interest is that their zero-shot or few-shot performance is excellent. Even so, in some circumstances performance can be improved by retraining some or all of the layers of the LLM.
- OpenAI offers retraining ("fine-tuning") through its API, with some control over the training process, such as the learning rate and batch size. The authors attempted to fine-tune text-davinci-003 on 200 unseen simulated MBE bar exam questions, but in every case the fine-tuned model performed significantly worse than text-davinci-003 itself.
- Because of the scarcity of high-quality data for training and evaluation, fine-tuning of GPT models was not pursued further. These results may confirm the risks of LLM fine-tuning that others have observed.
Results
- In total, 107 sample exams were run. Ranking the top three choices (prompt style #7) performed best, and 41 sample runs across parameter combinations were collected for this prompt.
- GPT does not yet pass the multiple-choice exam overall, but it far exceeds the 25% baseline rate of random chance and reaches the average passing rate in at least two categories (Evidence and Torts).
- Averaged over all categories, GPT trails human test-takers by about 17%. For Evidence, Torts, and Civil Procedure, however, the gap is negligible or in the single digits, and on Evidence questions GPT is already on par with humans.
- In the remaining categories (Constitutional Law, Real Property, Contracts, and Criminal Law) the gap is more pronounced, rising to 36% for Criminal Law. This gap may be due to knowledge areas that are missing from GPT's training data or that were removed during model compression or fine-tuning.
- If the rank of GPT's answers correlated poorly with correctness, we would conclude that it truly lacks knowledge of that area of law. If, on the other hand, its second or third choice is often correct, we can infer that the design of the questions is responsible for the lower performance. GPT's second and third best answers are in fact highly correlated with correctness: in all categories the top-two answers exceed the 50% baseline rate of random chance, and in five of the seven categories they exceed the NCBE-reported average.
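The top-k rates and their chance baselines discussed above can be computed as in the following sketch. The rankings and answers are invented for illustration (the real exam data cannot be redistributed), and `top_k_accuracy` and `chance_rate` are hypothetical helper names.

```python
def top_k_accuracy(rankings, answers, k):
    """Fraction of questions whose correct answer is among the first k ranked choices."""
    hits = sum(1 for ranked, truth in zip(rankings, answers) if truth in ranked[:k])
    return hits / len(answers)


def chance_rate(k, num_choices=4):
    """Probability that k distinct random guesses out of num_choices contain the answer."""
    return k / num_choices


# Made-up model rankings (best guess first) and answer key for four questions.
rankings = [["C", "A", "D"], ["B", "D", "A"], ["A", "B", "C"], ["D", "C", "B"]]
answers = ["C", "D", "C", "A"]

for k in (1, 2, 3):
    print(k, top_k_accuracy(rankings, answers, k), chance_rate(k))
```

With four choices per question the chance baselines are 25%, 50% and 75% for top-1, top-2 and top-3, which is why the paper compares its top-two results against 50% rather than 25%.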
Conclusion and Future Work
- This study documents an experimental evaluation of GPT-3.5 on the MBE portion of the NCBE's model bar exam. Across all prompts and hyperparameter values, GPT-3.5 far exceeded the baseline rate of random guessing.
- Without any fine-tuning, GPT-3.5 reached a passing rate in two categories of the bar exam and parity with human test-takers in one. Its ranking of the possible choices is correlated with correctness well beyond random chance, confirming a general understanding of the legal domain.
- GPT-3.5 performed far better on this task than the authors expected. Although the ability to interpret how GPT understands the question and chooses among candidate answers is limited, the history of similar problems strongly suggests that an LLM is likely to pass the bar exam soon.
- Based on anecdotal evidence related to GPT-4 and LAION's Bloom family of models, this may well happen within the next 0 to 18 months. The authors plan to continue replicating the experimental design and fine-tuning with GPT-J, GPT-Neo, and the Bloom family of models. In addition, the MBE is only one part of the bar exam, and future work will apply and evaluate GPT-3.5 and other models on the essay (MEE) and situational performance (MPT) sections.
Bonus: GPT-4's explanation of and answer to the sample question (translated from Japanese)
About the question
Question: A man sued a railroad for personal injuries suffered when his
car was struck by a train at an unguarded crossing. A major issue is
whether the train sounded its whistle before arriving at the crossing.
The railroad has offered the testimony of a resident who has lived near
the crossing for 15 years. Although she was not present on the occasion
in question, she will testify that, whenever she is home, the train always
sounds its whistle before arriving at the crossing.
Is the resident's testimony admissible?
(A) No, due to the resident's lack of personal knowledge regarding the
incident in question.
(B) No, because habit evidence is limited to the conduct of persons,
not businesses.
(C) Yes, as evidence of a routine practice.
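(D) Yes, as a summary of her present sense impressions.
(ChatGPT's answer follows)
This question concerns a case in which a man sued a railroad for personal injuries he suffered when his car was struck by a train at an unguarded crossing. The main issue is whether the train sounded its whistle before arriving at the crossing.
In this case the railroad offers the testimony of a resident who has lived near the crossing for 15 years. She was not present when the accident occurred, but she will testify that whenever she is home, the train always sounds its whistle before arriving at the crossing.
The question is whether this resident's testimony is admissible in court.
To decide whether the testimony is admissible, you need to understand the rules of evidence, in particular those on testimony about habit and routine practice. The point here is whether the resident, even without direct knowledge of the specific accident, can testify on the basis of the train's consistent pattern of behavior observed over many years (that is, the routine of sounding the whistle before arriving at the crossing).
Put simply, the question asks whether testimony that infers, from consistent past behavior (here, the train's habit of sounding its whistle), that the same behavior occurred in the specific incident can be used as evidence at trial.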
Answer
The correct answer is (C): the resident's testimony is admissible as evidence of a routine practice. To see why, let us look at some basic principles of the law of evidence.
- The need for personal knowledge: normally a witness must have personal knowledge of the event in order to testify about it. In this case the resident has no personal knowledge of the specific accident, because she was not present when it happened.
- Evidence of habit or routine practice: the law of evidence may admit evidence of the habit of a person, or the routine practice of an organization, to prove conduct in accordance with that habit or routine. The reasoning is that if a habit or routine is followed consistently, similar conduct is likely to have occurred on the specific occasion.
In this case, the resident's testimony that over the past 15 years, whenever she was home, she consistently observed the train sounding its whistle before arriving at the crossing shows that the railroad has a routine practice of sounding the whistle when a train approaches this crossing.
For this reason her testimony can serve as evidence of a "routine practice". Even though she has no personal knowledge of the specific accident, she can testify about the railroad's consistent pattern of sounding the whistle before the train reaches the crossing, and this may be important information in the context of the lawsuit.
On this reasoning, the resident's testimony does not directly prove whether the railroad sounded the whistle on the particular day, but it is admissible as evidence that there is a consistent routine of sounding the whistle when a train approaches the crossing. That is why choice (C) is correct.
Paper summary: The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
Shuming Ma* Hongyu Wang* Lingxiao Ma Lei Wang Wenhui Wang
Shaohan Huang Li Dong Ruiping Wang Jilong Xue Furu Wei⋄
All rights to the figures and tables on this page belong to the authors of the paper.
- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
- This paper in one line
- Abstract
- The Era of 1-bit LLMs
- BitNet b1.58
- Result
- Energy
- Discussion and Future Work
- Glossary
This paper in one line
BitNet b1.58: cutting memory and energy while keeping performance, toward a new era of language models
Abstract
- Purpose: to introduce BitNet b1.58※1, a new generation of 1-bit large language model (LLM), and to define a new scaling law and training recipe for developing LLMs that are both high-performance and cost-effective.
- Method: BitNet b1.58 is a 1-bit LLM variant in which every parameter (or weight) of the LLM is represented using only {-1, 0, 1}.
- Results: compared with a full-precision Transformer LLM with the same model size and training tokens, BitNet b1.58 matches it in perplexity and end-task performance while being significantly more cost-effective in latency, memory, throughput, and energy consumption.
- Conclusion: the 1.58-bit LLM provides a new scaling law and recipe for training a new generation of high-performance, cost-effective LLMs, and opens the door to designing hardware specifically optimized for 1-bit LLMs.
The Era of 1-bit LLMs
- In recent years the size and capabilities of large language models (LLMs) have grown rapidly, with remarkable performance on a wide range of natural language processing tasks. Their growing size, however, creates deployment challenges and raises concerns about the environmental and economic impact of their high energy consumption.
- One approach to these challenges is post-training quantization, which creates low-bit models for inference. It lowers the precision of weights and activations and thereby greatly reduces the memory and compute requirements of LLMs.
- Recent work on 1-bit model architectures, such as BitNet, shows a promising direction for reducing the cost of LLMs while maintaining performance. Matrix multiplication in BitNet involves only integer addition, which saves a large share of the energy cost of LLMs.
- This work introduces BitNet b1.58, a 1-bit LLM variant in which every parameter takes a ternary value {-1, 0, 1}. It is shown to be far more efficient than the FP16 LLM baseline in memory consumption, throughput※2, and latency※3, and it offers additional benefits: including 0 in the weights enables feature filtering, which can significantly improve the performance of 1-bit LLMs.
BitNet b1.58
- BitNet b1.58 is based on the BitNet architecture, a Transformer in which nn.Linear is replaced with BitLinear. It is trained from scratch with 1.58-bit weights and 8-bit activations.
- To constrain the weights to -1, 0, or +1, an absmean quantization function is adopted. It first scales the weight matrix by its average absolute value $\gamma$, and then rounds each value to the nearest integer in {-1, 0, +1} (RoundClip).
- The activation quantization function is implemented as in BitNet, except that activations before a non-linear function are not scaled into the range $[0, Q_b]$. Instead, all activations are scaled to $[-Q_b, Q_b]$ per token, which removes zero-point quantization.
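A minimal sketch of absmean quantization on a nested list, under the assumption that RoundClip means "round to the nearest integer, then clip to [-1, 1]" and that a small `eps` is added to $\gamma$ to avoid division by zero. `absmean_ternarize` is a hypothetical name, and Python's built-in `round` rounds exact halves to the nearest even integer, which may differ from the rounding used in a real implementation.

```python
def absmean_ternarize(W, eps=1e-5):
    """Scale W by its mean absolute value, then round and clip each entry to {-1, 0, 1}."""
    flat = [w for row in W for w in row]
    gamma = sum(abs(w) for w in flat) / len(flat)      # average absolute value
    W_t = [[max(-1, min(1, round(w / (gamma + eps)))) for w in row] for row in W]
    return W_t, gamma


W = [[0.9, -0.05, 0.4],       # made-up weights for illustration
     [-1.2, 0.1, -0.35]]
W_t, gamma = absmean_ternarize(W)
print(gamma)
print(W_t)
```

Entries that are small relative to the average magnitude collapse to 0, which is the "feature filtering" effect that a purely binary {-1, +1} scheme cannot express.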
LLaMA-alike Components.
- The architecture of BitNet b1.58 adopts the components of LLaMA, the de facto standard for open-source LLMs: it uses RMSNorm, SwiGLU, and rotary embeddings, and removes all biases. As a result, BitNet b1.58 can be integrated with minimal effort into popular open-source software such as Huggingface, vLLM, and llama.cpp.
Result
- BitNet b1.58 is compared at various sizes with a reproduced FP16 LLaMA LLM. For a fair comparison, the models are pre-trained on the RedPajama dataset for 100 billion tokens.
- Zero-shot performance is evaluated on a range of language tasks, and validation perplexity is reported on the WikiText2 and C4 datasets.
- At the 3B model size, BitNet b1.58 matches the full-precision LLaMA LLM in perplexity while being 2.71 times faster and using 3.55 times less GPU memory.
- BitNet b1.58 3.9B is 2.4 times faster and uses 3.32 times less memory than LLaMA LLM 3B, while performing significantly better, matching or exceeding it in end-task accuracy.
- These results show that BitNet b1.58 is a Pareto improvement over current state-of-the-art LLMs (an improvement that makes nothing worse).
Memory and Latency
- The model size was scaled up to 7B, 13B, and 70B and the cost evaluated. The speed-up grows as the model size scales; in particular, BitNet b1.58 70B is 4.1 times faster than the LLaMA LLM baseline.
- Memory consumption follows a similar trend, with larger models being more memory-efficient. This is because the embedding layer remains in full precision, and its share of the whole model shrinks as the model grows. Both latency and memory were measured with a 2-bit kernel, so there is still room for optimization to reduce the cost further.
Energy
- BitNet b1.58 reduces the arithmetic operation energy of matrix multiplication by a factor of 71.4, and its energy efficiency relative to the FP16 LLaMA LLM baseline improves as the model size scales.
Throughput
- BitNet b1.58 70B supports batch sizes up to 11 times larger than LLaMA LLM and achieves 8.9 times higher throughput.
- BitNet b1.58 enables a new scaling law with respect to model performance and inference cost, giving the following equivalences between model sizes.
- 13B BitNet b1.58 is more efficient than a 3B FP16 LLM in latency, memory usage, and energy consumption.
- 30B BitNet b1.58 is more efficient than a 7B FP16 LLM in latency, memory usage, and energy consumption.
- 70B BitNet b1.58 is more efficient than a 13B FP16 LLM in latency, memory usage, and energy consumption.
Training with 2T Tokens
- BitNet b1.58 was trained on 2T tokens following the data recipe of StableLM-3B and evaluated on a benchmark consisting of Winogrande, PIQA, SciQ, LAMBADA, and ARC-easy.
- BitNet b1.58 achieved superior performance on all end tasks, indicating that 1.58-bit LLMs have strong generalization capabilities.
Discussion and Future Work
1-bit Mixture-of-Experts (MoE) LLMs
- Mixture-of-Experts (MoE) LLMs greatly reduce computation FLOPs, but their high memory consumption and inter-chip communication overhead limit their deployment and application. 1.58-bit LLMs can address these issues: they reduce the number of devices needed to deploy an MoE model and greatly reduce the overhead of transferring activations across the network.
Native Support of Long Sequence in LLMs
- Memory consumption from the KV cache※4 is a major challenge for long-sequence inference. By reducing activations from 16 bits to 8 bits, BitNet b1.58 doubles the context length that fits in the same resources, which is an important step toward native support for long sequences.
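The "doubling" follows from simple arithmetic on the cache size, sketched below. The formula assumes a standard decoder where every layer stores one key vector and one value vector per token and head; the model dimensions and memory budget are hypothetical and not taken from the paper.

```python
def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim, bits):
    """Bytes needed to cache keys and values for seq_len tokens."""
    # 2 tensors (keys and values) per layer, one head_dim vector per token and head.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bits // 8


def max_context(budget_bytes, n_layers, n_kv_heads, head_dim, bits):
    """Longest sequence whose KV cache fits in the given memory budget."""
    per_token = kv_cache_bytes(1, n_layers, n_kv_heads, head_dim, bits)
    return budget_bytes // per_token


# Hypothetical model shape and a 1 GiB cache budget (illustrative numbers only).
shape = dict(n_layers=32, n_kv_heads=32, head_dim=128)
budget = 2 ** 30

ctx16 = max_context(budget, bits=16, **shape)
ctx8 = max_context(budget, bits=8, **shape)
print(ctx16, ctx8, ctx8 / ctx16)
```

Because the cache size is linear in both sequence length and bits per value, halving the bits exactly doubles the sequence length that fits, regardless of the model shape chosen.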
LLMs on Edge and Mobile
- 1.58-bit LLMs could greatly improve the performance of language models on edge and mobile devices, which are limited in memory and compute. This would enable applications that were previously impossible and greatly expand what edge and mobile devices can do.
New Hardware for 1-bit LLMs
- Recent work such as Groq has shown promising results and great potential in building specialized hardware for LLMs (for example, LPUs). The authors call for new hardware and systems designed and optimized specifically for the new computation paradigm that BitNet enables.
Summary of BitNet, the paper this work builds on
reseachpaper-matome.hatenablog.com
Glossary
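※1 Why 1.58? If the values {1, 0, -1} each occur with probability 1/3, the average information content is $\log_2 3 \approx 1.58$ bits.
※2 Throughput: the amount of data that can be processed or transferred per unit of time.
※3 Latency: the time delay for data to be processed or transferred within a system or network.
※4 KV cache: short for key-value cache. In a Transformer LLM it stores the attention keys and values already computed for earlier tokens so that they are not recomputed at every decoding step. Its memory use grows with the sequence length.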
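The figure in ※1 can be checked with a couple of lines. `entropy_bits` is a hypothetical helper that computes the Shannon entropy of a discrete distribution in bits.

```python
import math


def entropy_bits(probs):
    """Shannon entropy in bits of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


ternary = [1 / 3, 1 / 3, 1 / 3]   # {-1, 0, 1}, each equally likely
print(entropy_bits(ternary), math.log2(3))
print(entropy_bits([0.5, 0.5]))   # a binary {-1, +1} weight carries exactly 1 bit
```

If the three values are not equally likely in a trained model, the entropy is lower than this, so 1.58 bits is the upper bound on the information per ternary weight.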