Introduction
Kaggle history
Background
How I met Kaggle
Memorable competitions
How I approach Kaggle
What I gained from Kaggle
What's next
Closing
Introduction
Hello, this is kaerururu.
About six and a half years after starting Kaggle, I finally reached my goal of becoming a Competitions Grandmaster.
To mark the milestone, I decided to write a retrospective.
www.kaggle.com
Kaggle history
Timeline
Date        Event
2018/08/07  First submission (Titanic)
2018/12/19  First medal (bronze)
2019/02/14  Expert
2019/04/10  First gold medal
2019/06/29  Master
2024/12/20  Grandmaster
Competitions where I won gold medals
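Background
Graduated from a faculty of economics
First job out of university was a sales position in finance
Started programming after entering the workforce
After several job changes, currently at CADDi
(Previously, most recent first: DMM.com, Stockmark, a financial-sector systems company)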
How I met Kaggle
After graduating from a faculty of economics, I joined a financial company as a new-graduate sales rep.
My job was winning new customers. While my peers landed their first orders one after another, nothing was working for me. I was thinking about moving from sales to a different industry and role when I came across programming. I chose Python because it looked easy to write.
You write code, and the program you wrote runs. It is a simple thing, but it moved me.
On Twitter at the time, under the hashtag 「駆け出しエンジニアと繋がりたい」 (roughly "I want to connect with novice engineers"), it was popular for people from other industries like me to build web apps as a portfolio for job hunting.
I figured that if I was going to build something anyway, using AI would be more interesting, so I built a LINE BOT that returned some kind of text response depending on the image you posted. Following a few Qiita articles, I used Heroku (infrastructure) + LINE API + Keras, with images I had scraped myself.
I more or less understood how to build a deep learning model, but then I got stuck on how to obtain data to feed the AI. I went looking for free big data, and what I found was Kaggle.
Memorable competitions
In total I have made at least one submission in 60 competitions and won medals in 22 of them.
That is a lot, so I will limit my reminiscing to the gold medals plus the competitions that left an impression.
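PLAsTiCC Astronomical Classification 🥉 (103/1089)
A competition to classify which class an observed astronomical object belongs to, based on telescope observation data. Commonly called the plasticc competition.
I worked from a public notebook. Naturally I had no domain knowledge of astronomy, but I looked up astronomy formulas on the web and kneaded the data at hand into features. Watching CV and LB improve was so much fun that I still remember running experiments through the night.
Back then there was no such thing as a Notebook competition, and computing locally and submitting a CSV would have been fine, but I did not understand the rules well and restricted myself to Notebooks.
What made that painful: the data had time-series information, and features built with the tsfresh library were strong, but running it in a Notebook ran out of memory, so I could not use it. It is a funny story now.
I somehow managed to win a bronze medal in this competition. Stockmark was the company that, despite my lack of experience, valued this result and my motivation and took me on as an ML engineer. (I am hugely grateful.)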
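PetFinder.my Adoption Prediction 🥇 (1/2023)
(I could not find my tweet with the LB screenshot from back then)
A competition to predict how quickly dogs and cats at a pet shelter in Malaysia would be adopted. Commonly called the Pet competition.
It was the first competition where I won a gold medal, and it became the first competition I won.
(I say "became" because we finished 2nd on the Private LB, but the 1st-place team's cheating was exposed, the whole team was banned, and we moved up... lol)
I started solo, kneading features with public Notebooks as a reference. Once I had made it into the bronze zone, just short of a higher medal zone, I reached out to u++ san, ynktk-san, and takuoko-san, whom I had been chatting with on Twitter, asked "shall we team up?", and they merged with me. After we became a team, my teammates were so strong that my ensemble weight kept shrinking, which made me anxious. gege-san, who merged with us near the gold zone at the end, had been using features I had posted in a Notebook, so my weight recovered a little.
Even after the competition ended, this team kept exchanging information on slack. We occasionally went out for meals, I got to take part in a book review, we went to Spain together for the Kaggle Championship final, and they gave me all sorts of experiences. Thank you.
Blog post from that time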
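Freesound Audio Tagging 2019 🥇 (7/880)
A multi-label classification competition where the task was to attach tags such as "engine sound" or "male singing voice" to environmental audio data. Commonly called the freesound competition.
I entered as a team from the start with arai-san, whom I had hit it off with at a study group I joined to learn the theory. Later uratatsu-san, whom I met at the plasticc competition retrospective, joined and the three of us worked together.
The audio was provided as wav files, but a baseline Notebook had been published that converted it to mel spectrograms and solved the task with a 2D CNN, so we started from that. I implemented various networks and augmentations and got used to PyTorch along the way.
At the time the limit on GPU Notebooks was something like 6 at once per person, so we split the folds among us and ran them in parallel.
I was frustrated that I could not contribute much in terms of accuracy, but it was a lively, fun effort and a good experience.
With the same members I also worked on Bengali.AI Handwritten Grapheme Classification (commonly called the Bengali competition), where we experienced a big shake down. We had not prepared for unseen data that appeared only in the Private set, and I think that experience made me much more conscious of differences between Public and Private data.
Around the time after this competition, Kaggle's GPUs stopped being unlimited, and I think more people started buying their own GPU machines. I was stingy and bought a machine with a single 2080 Ti (11GB). (I later bought a Titan RTX to swap in, and then a second one, so I somewhat regret not investing boldly from the start.)
Blog post from that time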
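PetFinder.my - Pawpularity Contest 🥇 (5/3537)
Hosted by an organization that finds homes for rescued animals, this competition asked us to predict a cuteness score (Pawpularity), a continuous value defined by the host, for given dog and cat images. Commonly called the Pet2 competition.
The train data was small and the target was noisy (similar images had different scores), so with images alone there was a strong element of luck. However, some images overlapped with the past competition, and matching them by similarity and adding features derived from the past competition made the score jump.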
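The matching idea described above can be sketched roughly as follows. This is a minimal illustration and not the team's actual pipeline: it assumes every image has already been turned into an embedding vector by some model, and the function name `match_to_past_images`, the 0.95 threshold, and the toy vectors are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_to_past_images(current, past, threshold=0.95):
    """For each current image id, return the most similar past image id,
    or None when the best similarity is below the threshold.

    `current` and `past` map image ids to embedding vectors (assumed inputs).
    """
    matches = {}
    for cur_id, cur_vec in current.items():
        best_id, best_sim = None, -1.0
        for past_id, past_vec in past.items():
            sim = cosine_similarity(cur_vec, past_vec)
            if sim > best_sim:
                best_id, best_sim = past_id, sim
        matches[cur_id] = best_id if best_sim >= threshold else None
    return matches


# Toy embeddings: "new_a" is almost identical to "old_1"; "new_b" has no close match.
past = {"old_1": [1.0, 0.0, 0.0], "old_2": [0.0, 1.0, 0.0]}
current = {"new_a": [0.99, 0.01, 0.0], "new_b": [0.5, 0.5, 0.7]}
matches = match_to_past_images(current, past)
```

Matched ids could then be used to join features from the past competition onto the current rows. The double loop is brute force, so at real scale a vectorized or approximate nearest-neighbour search would likely be needed.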
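I had not noticed the leak and was drifting around the bronze zone when teammates who were sitting in 2nd place invited me to join. (You cannot say no when 2nd place invites you, right?)
It seems they reached out because I had won the same host's previous competition.
In the team I was mainly in charge of probing the number of matches between past-competition images and current images, speeding up the similarity computation used for matching, and ensembling.
I worked with the delightful yuki-san, johann-san, cpptake-san, and shokupan-san, connected over video chat nearly every day, and it was a lively effort.
By this time I had finished the swap to a Titan RTX, so I think I could use image networks without much stress.
Blog post from that time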
Feedback Prize - English Language Learning 🥇 (13/2654)
A competition to score essays written by English language learners on 6 criteria such as grammar and vocabulary. Commonly called the feedback3 competition.
At this point I had 3 gold medals, and having decided to win the 4th one solo, I kept restricting myself to solo participation. It was extremely tough.
It was the great deberta-v3 era. A GPU machine with a single Titan RTX was taking a bit too long, so I took the plunge and expanded it to two Titan RTX cards with 48GB of VRAM in total. I am still actively using this setup, and it can even run qlora tuning of the Qwen2.5 32B LLM. (72B did not quite fit in memory.)
The competition had the following shaky elements, and frankly the Public LB was not functioning.
Small amount of training data
Evaluation metric is the average of RMSE over multiple columns
High-scoring Public Notebooks
LB displayed to only 2 decimal places
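To make the metric and the two-decimal display concrete, here is a small sketch of a column-averaged RMSE. It is an illustration under assumptions, not the official scoring code: the function name `mean_columnwise_rmse` and the toy numbers are hypothetical, and it uses 2 target columns instead of the competition's 6 to keep it short.

```python
import math


def mean_columnwise_rmse(y_true, y_pred):
    """Average of per-column RMSE; rows are samples, columns are targets."""
    n_rows = len(y_true)
    n_cols = len(y_true[0])
    rmses = []
    for c in range(n_cols):
        sq_err = sum((y_true[r][c] - y_pred[r][c]) ** 2 for r in range(n_rows))
        rmses.append(math.sqrt(sq_err / n_rows))
    return sum(rmses) / n_cols


y_true = [[3.0, 2.5], [4.0, 3.5], [2.0, 3.0]]
# Every prediction in pred_a is off by 0.4; in pred_b by 0.402.
pred_a = [[3.4, 2.9], [3.6, 3.1], [2.4, 3.4]]
pred_b = [[3.402, 2.902], [3.598, 3.098], [2.402, 3.402]]

score_a = mean_columnwise_rmse(y_true, pred_a)
score_b = mean_columnwise_rmse(y_true, pred_b)
print(round(score_a, 2), round(score_b, 2))
```

The two scores differ in the third decimal place but round to the same two-decimal value, which is one reason a leaderboard shown at that precision gives little signal for comparing close submissions.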
I went all in on Trust CV and on securing diversity, and I was happy that I somehow stayed in the gold zone on both Public and Private.
I reread takoi-san's CommonLit solution many times.
Blog post from that time
HuBMAP - Hacking the Human Vasculature 🥈 (33/1021) (26 shake down 🔻)
From here on are the frustrating competitions of my three-in-a-row shake down streak.
First, the competition commonly called hubmap2023. The task was instance segmentation of microvascular structures from human kidney tissue slides.
The dataset had the following characteristics, so the possibility of a shake was known in advance.
Tiles were extracted from a total of 5 whole slide images (WSI)
Split into two datasets (DS1, DS2)
One has expert annotations (clean), the other does not (noisy)
Each of the 5 WSIs is split train:public_test:private_test=2:2:1
The test set comes from DS1
DS3 consists of unannotated tiles from 9 additional WSIs
Since I knew a shake was possible, I thought I was being careful enough. However, a mysterious post-processing step that only helped CV and the Public LB had been published, and because we relied on it to keep our place in the gold zone, we shook down.
My teammates were anonamename-san and harshit-san. We managed to combine well what each of us had implemented before the team merge. It was fun.
Blog post from that time
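Child Mind Institute - Detect Sleep States 🥈 (15/1877) (4 shake down 🔻)
A competition to detect sleep onset and wakeup events from observations of an accelerometer worn on the wrist. Commonly called the sleep competition.
I had little experience with sensor data and joined in order to learn.
I had no particular plan, but when I improved on a very strong public pipeline, my score shot up to the upper gold zone, and I was able to merge into a team.
Thanks to my teammates' post-processing and ensembling we finished in the gold zone on Public, but in the end we shook down slightly. Other teams simply had stronger individual models.
The lesson was that strengthening diversity through ensembling is of course important, but you also have to keep pushing the accuracy of individual models to the very end without cutting corners.
The CNN and Transformer implementations by tereka-san, who merged with me at this time, were easy to follow, and I used them to review the fundamentals. Until then I had only been pulling some SomethingNet through timm and throwing deep learning at the problem, but I became able to implement Conv Units and UNet myself, which let me put up a good fight in the later LEAP competition.
Blog post from that time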
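LEAP - Atmospheric Physics using AI (ClimSim) 🥈 (19/693) (7 shake down 🔻)
A competition to build an ML model that emulates a computationally expensive climate simulator (E3SM-MMF). Commonly called the LEAP competition.
It was tabular data with the following characteristics.
An enormous amount of data
train: 10 million rows (360GB), test: 625,000 rows (7GB)
The train data is a subsample; the original data (50 million rows) is on huggingface
There is even higher-resolution data as well...
CSV submission
The data was simply huge. This competition prompted me to add an external SSD (2TB).
Because there was so much data, I put effort into implementations that made computation as fast as possible.
The data was also originally stored as float64, and carelessly casting it to float16 or similar loses information and lowers prediction accuracy, which added to the difficulty.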
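The precision issue mentioned above can be seen with a tiny round-trip test. This is only a sketch using the standard library's `struct` module to emulate storing a value in half and single precision; the helper `roundtrip` and the sample value 1.0001 are made up for illustration and are not from the competition data.

```python
import struct


def roundtrip(value, fmt):
    """Store a Python float (float64) in the given struct format and read it back."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


x = 1.0001                      # a float64 value with a small offset from 1.0
as_fp16 = roundtrip(x, "<e")    # IEEE 754 half precision
as_fp32 = roundtrip(x, "<f")    # IEEE 754 single precision

print(x - as_fp16)   # error after storing as float16
print(x - as_fp32)   # error after storing as float32
```

Near 1.0, float16 can only represent steps of about 0.001, so the small offset disappears completely, while float32 keeps it up to a rounding error far below the offset itself.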
Things I did
Studied and used data storage formats I had never touched before, such as mmap and hdf5
Precompute and save whatever can be computed in advance
  Multiply the target by the weight in advance and compute std and mean over all rows
  Split the huge csv into 15 chunks of about 650,000 rows each and save them (650,000 rows so that the chunk length is close to the test data length) (fp64)
  Scale every chunk with the precomputed std and mean and save it (fp32)
    fp32 because it made little difference compared with fp64
  Finishing the scaling in advance means the dataloader does not need to compute it
Load the features into memory ahead of time at run time
  Loading features in the dataloader each time slows things down by the number of iterations (I/O bound)
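The precompute-then-scale steps in the list above can be sketched as below. This is a simplified stand-in, not the code used in the competition: it works on a small in-memory list instead of CSV chunks on disk, the helper names are hypothetical, and the standard library `array("f")` plays the role of an fp32 buffer.

```python
import math
from array import array


def iter_chunks(values, chunk_size):
    """Yield consecutive slices of at most `chunk_size` items."""
    for start in range(0, len(values), chunk_size):
        yield values[start:start + chunk_size]


def global_mean_std(chunks):
    """One pass over all chunks, accumulating count, sum and sum of squares."""
    n, total, total_sq = 0, 0.0, 0.0
    for chunk in chunks:
        n += len(chunk)
        total += math.fsum(chunk)
        total_sq += math.fsum(v * v for v in chunk)
    mean = total / n
    variance = max(total_sq / n - mean * mean, 0.0)  # population variance
    return mean, math.sqrt(variance)


def scale_to_fp32(chunk, mean, std):
    """Standardize one chunk and store it as 4-byte floats."""
    return array("f", ((v - mean) / std for v in chunk))


# Toy stand-in for one (already weighted) target column held as float64.
column = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
mean, std = global_mean_std(iter_chunks(column, chunk_size=2))
scaled_chunks = [scale_to_fp32(c, mean, std) for c in iter_chunks(column, chunk_size=2)]
```

The statistics are computed once over all chunks in float64, and only the already scaled values are stored in the smaller type, so nothing has to be recomputed inside the data loader. The sum-of-squares formula is fine for a sketch, but a numerically safer running-variance update may be preferable on very large data.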
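Thanks to these efforts I somehow managed to run experiments on my local machine and reached the gold zone. From there, the people at Go Inc. let me merge into their team. After a teammate confirmed that feeding the model the original data on huggingface, rather than the subsample, improved the score, we all used the external data. hdf5 did not get along well with my local machine and data loading was not as fast as promised, so I rented a GPU instance on Google Cloud. Toward the end a single experiment took something like 20h, which was hard going.
There was also a leak uproar that led to a 2-week extension, which made things tough.
(During the extension, teams that realized they could train on the external data and followed through overtook us. Without the extension, maybe I would have become a GM in this competition?)
As for my role in the team, toward the end my single model was a little weak, about "better than not ensembling it...", which was frustrating.
Still, personally I grew from it. Instead of finetuning some pretrained SomethingNet, I had to implement Conv Units and UNet myself to fit the task, which let me apply what I had reviewed from the sleep competition, and I reworked my implementation down to the memory access level to write fast code.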
Eedi - Mining Misconceptions in Mathematics 🥇 (7/1446)
Given multiple-choice math questions consisting of 1 correct answer and 3 incorrect answers, the task was to predict the misconception behind each incorrect answer. Commonly called the Eedi competition.
Chasing the last gold medal I needed for GM, I had about three frustrating competitions where I finished near gold on Public and then shook down.
I had realized that review matters, so I had been running solutions from recent past competitions that caught my interest and making late submissions.
For LLM finetuning, I had gotten the hang of it through late submissions to atmaCup17.
I arrived at a Retriever + Reranker approach, and my hypothesis that this knowledge would help in building the Reranker proved right, which took me into the gold zone. During that period I kept finetuning the Reranker in my local environment, and whenever a Retriever model was published in a Notebook, I swapped it in to raise my score.
With a later team merge in mind, I also prepared the inference code: making inference with vLLM work without losing accuracy, and parallelizing the Retriever runs (running them on cuda:0 and cuda:1 respectively).
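The two-GPU parallelization mentioned above could look something like the following. This is only a sketch of the general pattern of giving each device its own share of the work. The original does not say how the work was split, so the sharding scheme is an assumption, and `encode_on_device` is a hypothetical placeholder that does not load any model.

```python
from concurrent.futures import ThreadPoolExecutor


def encode_on_device(device, texts):
    """Hypothetical placeholder for running a retriever model on `device`.

    A real implementation would move a model to `device` and return embeddings;
    here each text is just tagged with the device that handled it.
    """
    return [(device, text) for text in texts]


def encode_in_parallel(texts, devices=("cuda:0", "cuda:1")):
    """Split `texts` into one contiguous shard per device and process them concurrently."""
    shard_size = -(-len(texts) // len(devices))  # ceiling division
    shards = [texts[i * shard_size:(i + 1) * shard_size] for i in range(len(devices))]
    with ThreadPoolExecutor(max_workers=len(devices)) as pool:
        results = list(pool.map(encode_on_device, devices, shards))
    # pool.map preserves input order, so concatenation restores the original order.
    return [item for shard in results for item in shard]


outputs = encode_in_parallel(["q1", "q2", "q3", "q4", "q5"])
```

Whether threads or separate processes suit a real retriever better depends on the framework and how it releases the interpreter lock. The point of the sketch is only that each device processes its own shard and the results are reassembled in the original order.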
Handling unknown misconceptions, building my own Retriever, finetuning larger LLMs: I felt it would be hard to see all of that through alone, and I wanted to work as a team.
I first merged with tereka-san, whom I had teamed with in the sleep competition, and the two of us worked together for a while. After that we also merged with masaya-san and ahmet-san.
After the merge I mainly worked on implementing the inference code: integrating everyone's pipelines, speeding them up, and ensembling the inference results.
In an LLM competition, where each pipeline tends to take a long time to run, we managed to fit everyone's implementation into the final pipeline.
I think this diversity is what helped us win gold.
On the GPU side, two Titan RTX cards could only handle qlora for LLMs up to the 32B class, so to finetune 72B-class LLMs I launched an H100 instance on lambda labs' cloud and ran it for several days. A 72B-class model took about 70h per experiment at $2.49/h, so if you brace yourself for a few tens of thousands of yen for a last spurt of several days, I think it is an affordable amount. It is cheaper than GCP and the like, and, in my personal opinion, the initial setup was easy too.
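As a quick check of the figures above, the cost of one such run works out as follows. The two constants come from the text; the helper name `experiment_cost_usd` is made up for illustration.

```python
HOURS_PER_EXPERIMENT = 70      # from the text: about 70 h per 72B-class experiment
PRICE_PER_HOUR_USD = 2.49      # from the text: $2.49/h


def experiment_cost_usd(hours, price_per_hour, n_experiments=1):
    """Total on-demand cost for `n_experiments` runs of `hours` each."""
    return hours * price_per_hour * n_experiments


cost_one = experiment_cost_usd(HOURS_PER_EXPERIMENT, PRICE_PER_HOUR_USD)
print(f"${cost_one:.2f} per experiment")
```

At an assumed exchange rate of roughly 150 yen to the dollar (an assumption, not a figure from the text), one run lands in the range of a few tens of thousands of yen.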
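We had lots of discussion, and it was a very fun competition.
How I approach Kaggle
Here is a rough overview of what I do these days when I join a Kaggle competition.
That said, there is nothing particularly special. I think I get good results when I keep steadily piling up work and something clicks into place.
First, understand the task and the evaluation metric
I check what problem is to be solved, read the data description, and understand the evaluation metric.
Get a bird's-eye view of the data
For CSV data, I copy it into a spreadsheet and use filters and the like to get an overall feel.
For image data, running code <dir name> in VSCode lets you check it quickly.
Build CV and try all the methods that are not competition-specific
I build CV and make a baseline model.
For competitions where the Private test contains unseen data, I try to reproduce that in validation.
I run several networks and ensemble them so that they fit within the 9h inference time limit.
I try various things that worked in solutions to similar past competitions.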
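One common way to reproduce "unseen data in the private test" during validation is to split by group, so that each validation fold only contains groups the model never saw in training. The sketch below is a plain-Python illustration of that idea with made-up group labels. It is not a description of any specific competition setup, and in practice a library splitter such as scikit-learn's `GroupKFold` would normally be used.

```python
from collections import defaultdict


def group_kfold(groups, n_splits):
    """Yield (train_idx, valid_idx) pairs where no group appears on both sides.

    Groups are assigned greedily, largest first, to the currently smallest fold.
    """
    members = defaultdict(list)
    for idx, g in enumerate(groups):
        members[g].append(idx)
    folds = [[] for _ in range(n_splits)]
    for g in sorted(members, key=lambda k: len(members[k]), reverse=True):
        smallest = min(range(n_splits), key=lambda f: len(folds[f]))
        folds[smallest].extend(members[g])
    for f in range(n_splits):
        valid = sorted(folds[f])
        train = sorted(i for other in range(n_splits) if other != f for i in folds[other])
        yield train, valid


# Hypothetical example: each sample belongs to a "source" (e.g. a slide or a user).
groups = ["a", "a", "b", "b", "b", "c", "d", "d"]
splits = list(group_kfold(groups, n_splits=2))
```

Each validation fold then measures performance on sources that are entirely new to the model, which is closer to what happens on a private test set drawn from unseen sources.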
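Make a plan
Once I have tried everything and the score saturates, I think about the gap to the top.
If the gap is huge, I may be overlooking something. If it is small, everyone may be doing more or less the same thing.
In the Eedi competition, I guessed that up to a Public score just under 0.4 was reachable with a Retriever alone, and that people scoring around 0.45 had built a Reranker.
Commit for 2 months
I used to do 2-week challenges and 1-month challenges a lot.
I love the atmosphere after a competition ends: the frustration, the sense of achievement, the rush of congratulations for people with good results. Impatient with the thought "I want to join the next competition right away. I can't wait 2 months, so I want one with about 1 month left," I went through a period of repeatedly joining competitions with about 1 month remaining.
That attitude may suit some people, but it did not suit me. Here is where it fell short.
Moving on to the next competition without reviewing the last one properly reduces how much you learn
As a result, when a similar competition came along, I could not start a "new game plus"
Joining for only 1 month means there is not enough time
It is hard to beat people who have worked for 2 or 3 months with only 1 month
I feel I have produced stable results in competitions where I worked for at least 2 months.
I worked on the Eedi competition for a little over 2 months. (I was choked up with nerves for the whole second half.)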
Review thoroughly
Personally I think this is where I grow the most. Other GMs all say the same thing. Just taking part in active competitions makes you feel as if you have a shot. When I was not in an active competition I used to feel I was stagnating, but in the end the slow way round was the fast way.
Reviews that I feel actually paid off
Sleep competition
I tried various from-scratch implementations: a simple CNN made of stacked Conv Units, UNet, Attention (Transformer), and so on.
Rather than just pulling something out of a library, I started this to get a feel for what changes and how much the score is affected among the many options such as GELU vs RELU, or LayerNorm vs BatchNorm vs GroupNorm.
I swapped out only the model part of the pipeline, ran different patterns, and looked at CV
Starting from a 3-layer MLP, then Conv1d Units, Attention only, UNet, and so on
This experience paid off when I built my own networks in the LEAP competition.
atmaCup17
The winning solution, by ktr-san, was to run an NSP task on gemma-2b-it with lora tuning.
I could not take part in this competition, but it made me keenly aware that the world had moved from the deberta era to the great LLM era, and I decided to catch up.
It was a huge help in the Eedi competition.
I suspect this Solution had more than a little to do with how many Japanese accounts were on the LB (lol)
I use a spreadsheet for all sorts of purposes: collecting experiment plans, discussions and links that caught my attention, and pasting in CSV data to get an overview.
For experiment plans and notes I had used issues in the github repository holding my competition code and slack, and for EDA of CSV data I had used JupyterNotebook and others, but a spreadsheet struck just the right balance of effort and benefit and was easy to use. Being able to apply simple filters was good too.
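What I gained from Kaggle
A job in DS/ML
This comes first. Back then the highway of knowledge was not as well paved as it is now, and Kaggle itself was not as well known. It was purely a first-mover advantage: if you tried to change jobs today with no experience and only Kaggle results, you might be asked for better results. Even so, winning a bronze medal by myself led to my first career as an MLE.
Knowledge and implementation experience in data analysis and machine learning
Kaggle has a culture of, and incentives for, publishing code and solutions, which makes it a wonderful place to learn. Even for a recent paper with no author implementation, someone in the world may have published it as a Kaggle Notebook. Because it is a place where people compete over 0.001 in accuracy, new methods get tried as soon as they appear, even in active competitions. Even when reading a paper and implementing it myself was hard, finding someone's implementation and imitating it lowered the hurdle to catching up considerably.
I was also able to learn efficiently the methods that really work, the ones that produce results in competition. Some criticize that "in real work you don't need to chase 0.001," and there is truth in that. On the other hand, having many tools for chasing 0.001 meant that I could quickly pull out whatever implementation real work required.
In my personal experience, Kaggle has been useful for real work.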
Knowledge of Linux and hardware
I built my own GPU server in order to take part in image and NLP competitions on Kaggle.
At first I bought a pre-assembled BTO machine from PC Koubou (パソコン工房). As competitions inflate, you keep wanting a PC with better specs (lol). Swapping GPUs, adding memory and an external SSD, changing parts: I learned all of it from scratch on youtube and did it myself. I had all sorts of trouble, such as mistakenly buying incompatible parts, or buying expensive parts that then did not work. A lot of it is hard, but I am glad for the satisfaction of being able to build a PC myself, and for the feel it gave me for specs ("about this much") when creating cloud instances.
Apart from the brutal upfront cost and the panic when something goes slightly wrong, it is all upside, so to people who say they have no GPU machine I recommend it: "On-prem machines are great!"
A blog post from a little while ago
Next, Linux knowledge. When you own your machine, you have to prepare the environment it runs in yourself, tailored to your setup. OS updates, cuda updates, Docker if you work in containers: it takes various preparation to do Kaggle comfortably. You also have to troubleshoot by yourself. "The external SSD suddenly stopped being recognized?" "The power suddenly cut out..." For things like that, you look at system logs, check with commands whether the device is currently connected, and use commands in all kinds of situations. These days you can get by handing it all to ChatGPT, but there is satisfaction in having done this myself.
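Like-minded friends
These are the people who, as a matter of course, spend their time outside work absorbed in Kaggle, weekdays and weekends alike. Even in competitions I was not part of, seeing them working hard on X or getting good results spurred me on to think "I'm next." Beyond competitions, many people share their knowledge on blogs and elsewhere, which inspires me. I also made friends I get along with in my private life. I had moved out from Kyushu alone when I started working and did not have many friends, but I met all kinds of people at offline events and now go out with some of them individually.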
What's next
Kaggle
Both to give back to the community and as my next goal, I would like to aim for GM in Notebooks and Discussions.
I want to post Baseline Notebooks and share what I notice in Discussions.
I will keep joining competitions, but I also want to put effort into late submissions for competitions that interest me.
Joining other competitions
From the time I had 4 gold medals, including a solo gold, until I became a GM, I stopped taking part in other competition platforms.
I want to join all sorts of competitions such as atmaCup, yukiCup, probspace, nishika, and solafune.
I also want to learn by building things myself, such as load balancing with pubsub, which I do a little at work, and application infrastructure that can serve online inference using quantized, distilled, or lightweight local LLMs.
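Closing
This turned out longer than I expected (lol)
I became a Kaggle Competitions Grandmaster, which had been my goal.
I could not have done it on my own. To everyone who teamed up with me, to those who talked with me at study groups and social gatherings and gave me motivation, to everyone who interacts with me on X, and finally to my wife, who supported me by making late-night meals in the final stretch of competitions: thank you.
Kaggle has become fully a part of my life. I hope to keep my relationship with Kaggle going, slow and steady, for a long time.
Thank you for reading this far. mm
I look forward to your continued support!! 🐸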