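[KNIME] Let's try machine learning the easy way! [random_forest_regression]
It has been a while since I last wrote a post. I am now a parent of two and busy at home, but recently I started playing with machine learning. "Started" only means that I have been trying out this and that to see it run. People who work on it seriously might scold me, but with so many tools available and the field becoming commoditized, I felt it was not good to have never tried it at all.
This time I would like to introduce KNIME nodes that make numeric prediction quick and easy, provided you have training data.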
Random Forest Learner & Predictor (regression)
Deep learning is surely the current trend, but it is not everything. I am currently trying DL with DeepChem myself, but it does not feel like something a wet-lab chemist can casually pick up (people with programming experience may manage).
By contrast, the method in this post,
regression with Random Forest in KNIME, using ECFP4 as the input,
takes almost no thought, so I can say flatly that anyone can use it (how you judge the results is up to you). Let's walk through the KNIME workflow.
The overall workflow is shown in the screenshot below. For this example I will use delaney-processed.csv, which can be downloaded from
http://deepchem.io.s3-website-us-west-1.amazonaws.com/datasets/delaney-processed.csv
CSV Reader
Basically you only need to point Input location at the CSV you want to read. In this case the file has column headers but no row headers, so uncheck Has Row Header. The file also contains SMILES, so delete the # that Comment Char holds by default. # is the SMILES symbol for a triple bond, so if it stays set as the comment character, SMILES containing a triple bond are not read correctly.
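To see why the comment character matters, here is a minimal Python sketch. It only imitates a reader that discards everything after a comment character; it is not how KNIME's CSV Reader is implemented, and the two-row sample and the `read_rows` helper are made up for illustration.

```python
import csv
import io

# Hypothetical two-row sample in the spirit of the dataset; not copied from it.
SAMPLE = "name,smiles\nacetonitrile,CC#N\nethanol,CCO\n"


def read_rows(text, comment_char=None):
    """Parse CSV text, optionally discarding everything after comment_char."""
    lines = []
    for line in text.splitlines():
        if comment_char is not None:
            # Imitate a reader that treats the rest of the line as a comment.
            line = line.split(comment_char, 1)[0]
        if line.strip():
            lines.append(line)
    return list(csv.DictReader(io.StringIO("\n".join(lines))))


with_comment = read_rows(SAMPLE, comment_char="#")
without_comment = read_rows(SAMPLE)

print(with_comment[0]["smiles"])     # CC   (the triple bond and nitrogen are lost)
print(without_comment[0]["smiles"])  # CC#N
```

With the comment character left on, acetonitrile silently turns into ethane, which is why the field is cleared before reading.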
Molecule Type Cast
This node fixes the type of the column that describes the molecular structure. The CSV we just read stores structures in SMILES notation, but KNIME treats that column as a plain string. This step tells KNIME, "this is SMILES (a structural formula)."
The table contents before and after the node are pasted below.
Fingerprints (CDK)
Specify the column that holds the molecular structure and the fingerprint you want to generate. I chose ECFP4 this time.
Partitioning
This splits the data in two. With the settings shown, it randomly splits the rows into 80% (upper ▶ output port) and 20% (lower ▶ output port). The 80% becomes train_dataset, and the 20% is split in half again into valid_dataset and test_dataset.
After that, the String Manipulation nodes (nodes 8, 10, and 11) label each dataset (the column name is split). This step is optional, though.
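For readers who think in code, the 80/10/10 split and the split label can be sketched in plain Python as follows. This is an illustration under assumptions, not what the Partitioning node does internally: the `split_dataset` helper, the seed, and the dummy rows are all hypothetical.

```python
import random


def split_dataset(rows, train_frac=0.8, seed=0):
    """Shuffle rows, then label them train / valid / test (80/10/10 by default)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is repeatable
    n_train = int(len(shuffled) * train_frac)
    n_valid = (len(shuffled) - n_train) // 2  # half of the remainder
    labeled = []
    for i, row in enumerate(shuffled):
        if i < n_train:
            split = "train"
        elif i < n_train + n_valid:
            split = "valid"
        else:
            split = "test"
        # Add the label as a "split" column, as the String Manipulation nodes do.
        labeled.append({**row, "split": split})
    return labeled


rows = [{"id": i} for i in range(100)]  # dummy rows standing in for molecules
labeled = split_dataset(rows)

counts = {}
for row in labeled:
    counts[row["split"]] = counts.get(row["split"], 0) + 1
print(counts)  # {'train': 80, 'valid': 10, 'test': 10}
```

Fixing the seed is what makes it possible to reuse exactly the same split in another tool later.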



Random Forest Learner
Set Target Column to the value you want to predict; here that is solubility. Since the ECFP4 generated earlier is the input, check Use fingerprint attribute and select that column. To use other parameters instead of a fingerprint, check Use column attributes and put the columns to use in the green box below. *1
That is all for the settings, but further down the dialog (cut off in the screenshot, sorry) there is Number of models under Forest Option. Increasing it adds more models (trees) to the forest, which should help when the amount of data grows.
Run it and the prediction model is already built.
Random Forest Predictor
This node applies the data you want predictions for to the trained model. Connect the learner's gray ■ output port to the predictor's gray ■ input port to hand over the model. Feed the data to be predicted into the predictor's lower ▶ input port. In the example, valid_dataset goes into the upper predictor (node 4) and test_dataset into the lower predictor (node 12). There are two predictors, but the same model flows into both ■ ports, so they behave identically.
The only setting is the name of the column that will hold the predicted values. If you do not like the default, check change prediction column name and type a name in the field below.
Here are the results!
I used 2D/3D Scatterplot to plot the original values (x) against the predicted values (y).


To get into the details, the r2_score was 0.6348 for valid and 0.6982 for test. Not bad, is it? (For reference, train gave r2 = 0.6558. I tried increasing the number of models, but the train score did not improve.) I do not know how useful this is as a reference, but it is certainly easy. Why not give it a try without overthinking it?
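The post does not say how the r2_score values were computed, so as an assumption here is the standard coefficient of determination, 1 - SS_res / SS_tot, in plain Python. The numbers in the example are made up and have nothing to do with the workflow's output.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: how far predictions are from the measured values.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: how far measured values are from their own mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up values purely for illustration; not taken from the workflow.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]
print(round(r2_score(measured, predicted), 4))  # 0.9
```

A score of 1.0 means perfect predictions, and 0.0 means no better than always predicting the mean of the measured values.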
Bonus (deepchem, graphconv)
I did the same thing on the same dataset with deepchem's graph convolution.


It is exactly the same dataset, split the same way. For reference, I have included the first few ids of train_dataset.
The screenshot after that shows the result of actually running it. I have hardly done any coding, so I could only paste it as an image...
I am still learning, so I will not go into detail, but the r2_score was about 0.85. Whether the two can be compared directly is debatable, but this result is better (and no parameter optimization was done). Picking up skills like this is motivating.
Summary?
I do not think any prediction method or model is a cure-all, but when you actually want to use one,
- What kind of dataset can you prepare?
- Is the dataset split (train, valid, test) appropriate?
there are things like these to think about before training. I hope to deepen my understanding so that I can use new techniques correctly, and then become able to picture where they can be applied.
By explaining the quick and easy method, I may dampen your interest if you start thinking, "maybe another method would give better results?" Still, being something you can use (access) right away and easily is a very important point *2. KNIME has many machine learning nodes besides RFR. Why not try a few?
*1: To use other columns together with the fingerprint, I think you can expand the fingerprint shown as [010] (type: bit vector) into table columns with the Expand bit vector node. Probably.
*2: For the record, I am using a machine with a GPU. graphconv feels a bit heavy on a CPU-only machine, so that may call for further investment.