Background

Hello, I'm an intern at JX News Agency (JX通信社).
In recent years, deep learning models have tended to grow ever larger. The impact of the Scaling Laws that OpenAI presented in 2020 ([2001.08361] Scaling Laws for Neural Language Models) is still fresh in our memory, and, as MLP-Mixer showed, there is even an argument that with a large enough model neither attention structures nor CNNs are necessary ([2105.01601] MLP-Mixer: An all-MLP Architecture for Vision).
When you try to use such large deep learning models, you often run into problems like the following:
- inference is too slow to ship in the product
- GPUs/TPUs are too costly
- the nature of the product rules out batch processing (so GPUs/TPUs cannot be used efficiently)

For example, JX News Agency's strength lies in the speed of its breaking news, which makes batch processing, and therefore efficient GPU/TPU usage, difficult.
However, the accuracy of machine learning models is directly tied to the product's UX, which motivates us to somehow run large models fast on CPUs.
Against this background, this article benchmarks various speed-up techniques on BERT, the archetypal large NLP model. Most of these techniques come with a trade-off between inference speed and accuracy, and we pay particular attention to that trade-off.
By combining the methods introduced below, I actually managed to speed up BERT inference by up to roughly 10x!
Summary

The ratings of the speed-up techniques verified in this article are as follows
(ratings in descending order of effectiveness; ◎ is best, △ is worst).
However, the effectiveness of each technique varies greatly with the task, so when you actually set out to speed up a model, careful verification on a case-by-case basis is essential.
Description and implementation of each technique

Below we briefly explain each speed-up technique together with its implementation code.
- pruning, quantization, distillation, and torchscript are applicable outside NLP as well
- max_length is applicable to any NLP model
- dynamic max_length (do_not_pad) is applicable when inference is run with batch size == 1
quantization

Quantization is the technique of computing and storing tensors at a bit width lower than floating-point precision; converting from float32 to int8 is the most common approach.
We followed the official PyTorch documentation here.
PyTorch provides the following three kinds of quantization; in this article we apply the simplest one, dynamic quantization, to an already trained model.
- dynamic quantization ... only the weights are quantized; activations are read and written in float. It can be applied to a trained model as-is.
- static quantization ... both weights and activations are quantized; calibration after training is required.
- quantization aware training ... both weights and activations are quantized, and quantization is simulated during training.
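As a rough illustration of what happens to the weights, here is a dependency-free sketch of symmetric int8 quantization. This is a deliberate simplification, not PyTorch's implementation (which also supports per-channel scales and zero points):

```python
# Toy sketch of symmetric int8 quantization: each float weight w maps to
# round(w / scale) with scale = max|w| / 127, and is dequantized as q * scale.
def quantize_int8(weights):
    scale = (max(abs(w) for w in weights) / 127) or 1.0  # avoid scale == 0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

weights = [0.52, -1.27, 0.03, 0.89]
q, scale = quantize_int8(weights)   # small ints plus a single float scale
restored = dequantize(q, scale)     # close to the original weights
```

Storing int8 values plus one float scale is what shrinks memory traffic and lets the matrix multiplications run in integer arithmetic.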
The implementation is shown below. This code converts the weights of every nn.Linear in BERT from float32 to int8.

```python
def quantize_transform(model: nn.Module) -> nn.Module:
    model = torch.quantization.quantize_dynamic(
        model, {torch.nn.Linear}, dtype=torch.qint8
    )
    return model
```
distillation

Distillation takes a large model as a teacher and builds a smaller student model from it. For BERT in particular, a distilled model has been published as DistilBERT (https://arxiv.org/pdf/1910.01108.pdf).
BERT-base uses 12 transformer layers, while DistilBERT has a structure with half that, 6 transformer layers.
The loss function consists of the following three terms, which can be read as: "acquire outputs and weights close to the teacher's, to the extent that the masked language task (fill-in-the-blank word prediction) can still be solved."
- closeness to BERT's output
- the loss on the masked language task
- cosine similarity between the student's and the teacher's hidden vectors
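To make the three terms concrete, here is a toy, dependency-free sketch of the combined loss at a single token position. All numbers are made up; the real DistilBERT loss works over the full vocabulary, uses real hidden-state vectors for the cosine term, and weights the terms as hyperparameters:

```python
import math

def softmax(logits, temperature=1.0):
    exps = [l_ / temperature for l_ in logits]
    exps = [math.exp(e) for e in exps]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, true_index, temperature=2.0):
    # 1) closeness to the teacher's output: cross-entropy against the
    #    teacher's temperature-softened distribution
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    soft_ce = -sum(t * math.log(s) for t, s in zip(p_teacher, p_student))
    # 2) the ordinary masked-language-model loss against the true token
    mlm = -math.log(softmax(student_logits)[true_index])
    # 3) cosine term pulling the student's vector toward the teacher's
    #    (logits stand in for hidden states in this toy)
    dot = sum(a * b for a, b in zip(student_logits, teacher_logits))
    norms = math.sqrt(sum(a * a for a in student_logits)) \
        * math.sqrt(sum(b * b for b in teacher_logits))
    cosine = dot / norms
    # equal weights here; in practice the mix is a hyperparameter
    return soft_ce + mlm + (1.0 - cosine)
```

A student that matches the teacher and still predicts the true token gets a low loss; a student that diverges is penalized by all three terms.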
For this experiment we used the Japanese DistilBERT model published by Bandai Namco: https://huggingface.co/bandainamco-mirai/distilbert-base-japanese
With Hugging Face's transformers library it is very easy to use.

```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bandainamco-mirai/distilbert-base-japanese")
model = AutoModel.from_pretrained("bandainamco-mirai/distilbert-base-japanese")
```
pruning

Pruning sets a given fraction of the model's weights to 0, making the model sparse.
Here, too, we follow the official PyTorch tutorial.
There is plenty of research on which weights to prune; here we use the L1-norm criterion introduced in the tutorial above. The idea that weights with small absolute values contribute little and can therefore be zeroed out is very intuitive.
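The criterion itself can be illustrated without PyTorch. This toy sketch zeroes out the given fraction of weights with the smallest absolute values:

```python
# Toy sketch of L1-magnitude (unstructured) pruning: zero out the
# fraction `amount` of weights with the smallest absolute values.
def l1_prune(weights, amount):
    k = int(len(weights) * amount)          # how many weights to zero
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    zeroed = set(order[:k])                 # indices of the k smallest
    return [0.0 if i in zeroed else w for i, w in enumerate(weights)]

pruned = l1_prune([0.5, -0.1, 0.9, 0.05, -0.7], 0.4)
# the two smallest-magnitude weights (-0.1 and 0.05) become 0.0
```

Note that zeroed weights are still stored and multiplied as dense float32 unless a sparse kernel exploits them, which is one reason pruning alone does not necessarily reduce inference time.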
The implementation is shown below.

```python
import torch.nn.utils.prune as prune

PRUNE_RATE = 0.2

def prune_transform(model: nn.Module) -> nn.Module:
    for name, module in model.named_modules():
        if isinstance(module, torch.nn.Linear):
            prune.l1_unstructured(module, name="weight", amount=PRUNE_RATE)
            prune.remove(module, "weight")
    return model
```

This code replaces the 20% of each nn.Linear's weights with the smallest absolute values with 0.
We experimented with several values of PRUNE_RATE to see how inference speed and accuracy change.
torchscript (JIT)

TorchScript is a way to create a serializable, optimizable model from PyTorch code; the resulting model can be executed in runtimes other than Python, such as C++.
PyTorch adopts a define-by-run approach and builds the computation graph dynamically. This form is extremely useful during training, but brings almost no benefit for inference in production.
So the rough idea is: feed data through the model ahead of time and compile it (that is, use a just-in-time compiler).
For a more detailed explanation, the article listed in the references is very clear.
To summarize briefly:
- TorchScript is an intermediate-representation code
- this intermediate representation is optimized internally, and at run time PyTorch's JIT compiler is used
- the PyTorch JIT compiler is independent of the Python runtime and optimizes the intermediate representation using run-time information

The implementation is shown below. TorchScript models can be created in two ways, trace and script; here we use trace, which can easily be applied to an already trained model.

```python
def torchscript_transform(model):
    model = torch.jit.trace(model, (SAMPLE_INPUT,))
    return model
```
max_length

We cap the input's max_length to make the input data lighter. When preprocessing with transformers, the implementation looks like this:

```python
from transformers import BertTokenizer

MAX_LENGTH = 512

tokenizer = BertTokenizer.from_pretrained("hoge_pretrain")
data = tokenizer.encode_plus(
    TEXT,
    add_special_tokens=True,
    max_length=MAX_LENGTH,
    padding="max_length",
    truncation=True,
    return_tensors="pt",
)
```
do_not_pad

This technique is applicable when running inference with batch_size == 1.
Normally the input data must be padded so that a batch can be formed, but with batch_size == 1 inference can be run without any padding.
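The potential saving is easy to see with a back-of-the-envelope sketch (all token counts here are made up): with padding, every input is processed at max_length tokens; with batch_size == 1 and no padding, each input is processed at its own length.

```python
MAX_LENGTH = 128

def fraction_saved(token_lengths, max_length=MAX_LENGTH):
    # fraction of token positions that padding would have wasted
    padded = len(token_lengths) * max_length
    unpadded = sum(token_lengths)
    return 1 - unpadded / padded

short_tweets = [12, 30, 8, 25]          # hypothetical token counts
saving = fraction_saved(short_tweets)   # most positions were pad tokens
```

Since self-attention cost grows faster than linearly in sequence length, the actual time saved can be even larger than this token-count estimate suggests.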
The implementation is below; just set the padding argument to "do_not_pad".

```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("hoge_pretrain")
data = tokenizer.encode_plus(
    TEXT,
    add_special_tokens=True,
    max_length=512,
    padding="do_not_pad",
    truncation=True,
    return_tensors="pt",
)
```
Experimental setup

Since the main aim of this experiment is to measure the trade-off between accuracy and speed, we evaluate accuracy carefully.

Environment

All runs were done on Google Colab.

Dataset

As discussed later, which techniques are effective depends on the dataset, so we prepared several sample tasks with different characteristics:
- a dataset with long texts (livedoor topic classification)
- a dataset with short texts (twitter sentiment classification, a binary positive/negative task)

Models

Fine-tuned with the following settings:
- epochs: 30 (patience == 5)
- optimizer: Adam (lr1 == 0.00005, lr2 == 0.0001)
- max_length: 128 for the twitter task, 512 for livedoor
base model: Tohoku University's BERT (cl-tohoku/bert-base-japanese-whole-word-masking · Hugging Face)
- the CLS-token vectors of the last 4 layers are concatenated and used (reference: Google QUEST Q&A Labeling | Kaggle)
- the last 4 layers and the classification head are fine-tuned
- learning rate lr1 for the last 4 layers
- learning rate lr2 for the classification head

distil model: Bandai Namco's DistilBERT (bandainamco-mirai/distilbert-base-japanese · Hugging Face)
- the CLS-token vectors of the last 3 layers are concatenated and used
- the last 3 layers and the classification head are fine-tuned
- learning rate lr1 for the last 3 layers
- learning rate lr2 for the classification head
精度è©ä¾¡æ¹æ³
- ã¾ã8:2ã§train/testã«åå²
- trainã®ã¿ãå©ç¨ãã5fold stratified cross validationï¼å ¨ã¦ã®å®é¨ã§foldã¯åºå®ï¼ã§ã¢ãã«ãå¦ç¿
- 5ã¤ã®ã¢ãã«ã§ããããtestã«å¯¾ãã¦æ¨è«ãaverageãããã®ãtestã®äºæ¸¬å¤ã¨ããã
- cvã¨testã®acc & f1 macroã§æ¯è¼
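The averaging step in the last two bullets can be sketched as follows (all probabilities made up): each fold model outputs class probabilities for a test example, and the element-wise mean becomes the final prediction.

```python
def average_predictions(fold_probs):
    # element-wise mean over the per-fold probability vectors
    n_folds = len(fold_probs)
    n_classes = len(fold_probs[0])
    return [sum(p[c] for p in fold_probs) / n_folds for c in range(n_classes)]

fold_probs = [
    [0.70, 0.30],
    [0.60, 0.40],
    [0.80, 0.20],
    [0.55, 0.45],
    [0.75, 0.25],
]
avg = average_predictions(fold_probs)              # mean probabilities
label = max(range(len(avg)), key=avg.__getitem__)  # predicted class
```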
Speed evaluation
- sample 500 random examples from the test set (the same 500 across all experiments) and run inference with batch_size == 1
- evaluate by the mean and standard deviation of the per-example inference time
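A minimal sketch of this timing protocol, with a dummy predict function standing in for the model (the names are illustrative, not from the original experiment code):

```python
import statistics
import time

def measure(predict, samples):
    # time each batch_size == 1 call and summarize with mean / stdev
    times = []
    for sample in samples:
        start = time.perf_counter()
        predict(sample)
        times.append(time.perf_counter() - start)
    return statistics.mean(times), statistics.stdev(times)

mean_s, std_s = measure(lambda s: sum(s), [[1, 2, 3]] * 500)
```

In practice you would also run a few warm-up iterations before measuring.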
Results

First, the plots of test score versus speed for each technique are shown below. The graphs read as follows:
- the leftmost entry is the baseline
- the red and yellow bars show accuracy; higher is better
- the blue points show inference time; lower is better
- the error bars show the standard deviation

(Figure: twitter sentiment classification)
(Figure: livedoor topic classification)

Detailed results are given below.

twitter sentiment classification
Technique | cv acc (f1-macro) | test acc (f1-macro) | mean inference time (s) | std dev (s) |
---|---|---|---|---|
BASELINE | 0.8295 (0.8193) | 0.8363 (0.8256) | 0.2150 | 0.0050 |
quantization | 0.8223 (0.8092) | 0.8283 (0.8150) | 0.1700 | 0.0048 |
distillation | 0.8388 (0.8313) | 0.8292 (0.8220) | 0.1547 | 0.0076 |
max_length:64 | 0.8212 (0.8103) | 0.8250 (0.8138) | 0.1156 | 0.0036 |
do_not_pad | 0.8295 (0.8193) | 0.8363 (0.8256) | 0.0987 | 0.0290 |
torchscript | 0.8295 (0.8193) | 0.8363 (0.8256) | 0.1847 | 0.0080 |
pruning: 0.2 | 0.8327 (0.8226) | 0.8283 (0.8173) | 0.2124 | 0.0043 |
pruning: 0.4 | 0.8095 (0.7972) | 0.8229 (0.8100) | 0.1925 | 0.0041 |
pruning: 0.6 | 0.7097 (0.6787) | 0.7597 (0.7198) | 0.1925 | 0.0044 |
pruning: 0.8 | 0.5809 (0.5024) | 0.6220 (0.3834) | 0.1912 | 0.0046 |
livedoor topic classification

Technique | cv acc (f1-macro) | test acc (f1-macro) | mean inference time (s) | std dev (s) |
---|---|---|---|---|
BASELINE | 0.9238 (0.9180) | 0.9348 (0.9285) | 0.7500 | 0.0079 |
quantization | 0.9022 (0.8962) | 0.9246 (0.9199) | 0.6565 | 0.0068 |
distillation | 0.8581 (0.8494) | 0.8723 (0.8646) | 0.5128 | 0.0079 |
max_length:256 | 0.8691 (0.8630) | 0.8676 (0.8605) | 0.4511 | 0.0062 |
do_not_pad | 0.9238 (0.9180) | 0.9348 (0.9285) | 0.7012 | 0.0926 |
torchscript | 0.9238 (0.9180) | 0.9348 (0.9285) | 0.7222 | 0.0083 |
pruning: 0.2 | 0.9204 (0.9144) | 0.9355 (0.9302) | 0.7633 | 0.0083 |
pruning: 0.4 | 0.8674 (0.8624) | 0.8900 (0.8846) | 0.7682 | 0.0084 |
pruning: 0.6 | 0.1973 (0.1176) | 0.2057 (0.1025) | 0.7496 | 0.1045 |
pruning: 0.8 | 0.1360 (0.0950) | 0.1140 (0.0227) | 0.7287 | 0.0075 |
Discussion

For each technique, we discuss its performance normalized so that the baseline's accuracy and speed are both 1, which makes the trade-offs easier to see.

(Figure: twitter sentiment classification)
(Figure: livedoor topic classification)

quantization

On both tasks, inference time can be reduced by about 10-20% with almost no loss of accuracy. Since it is also easy to implement, it is the first technique to try when speeding up a model.

distillation

In terms of accuracy, the results differ greatly by task: almost no degradation on the twitter data, but a certain degree of degradation on the livedoor data.
Inference time is reduced by roughly 30%, so depending on the task this can be a very effective option.

max_length

On both tasks, inference time is reduced by about 40-45%, making this the most consistently effective contributor to speed-up.
It is a very high-impact, sensitive parameter, so in situations where some amount of speed is required it is the first parameter to tune.

do_not_pad

The effect of this technique varied greatly by dataset, but since accuracy is unchanged, it should be used aggressively whenever batch processing is impossible.
Its impact was especially large on the twitter dataset, where the maximum text length is small and the variance of lengths is presumably large: about 50% of the inference time was saved.
torchscript

Inference speed improves, if only slightly, with no loss of accuracy.
TorchScript also has many other advantages (execution in non-Python runtimes, no need to define the network at inference time, and so on), so it is an option alongside ONNX and the like when deploying to production.

Pruning

In this experiment the results were rather underwhelming.
On the twitter dataset, pruning at 0.4 achieved about a 10% reduction in inference time, but compared with the trade-offs of the other techniques its cost-performance is poor. It is something to consider only if you still need more speed after applying all the other techniques.
On the livedoor dataset it even, surprisingly, made inference slower.

Conclusion

In this article we examined various techniques for running NLP models fast on CPUs. Since each technique's performance varies greatly by task, it is important to determine the accuracy and speed you need, and then search for the best combination of speed-up techniques.
If you know of other effective speed-up techniques, I would be happy to hear about them.
References
- [2001.08361] Scaling Laws for Neural Language Models
- [2105.01601] MLP-Mixer: An all-MLP Architecture for Vision
- Quantization - PyTorch 1.9.0 documentation
- [1910.01108] DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter (https://arxiv.org/pdf/1910.01108.pdf)
- bandainamco-mirai/distilbert-base-japanese · Hugging Face
- Pruning Tutorial - PyTorch Tutorials 1.9.0+cu102 documentation
- PyTorch JIT and TorchScript. A path to production for PyTorch models | by Abhishek Sharma | Towards Data Science
- Google QUEST Q&A Labeling | Kaggle