[KNIME] Running scikit-learn in KNIME, with t-SNE as the example
Hello everyone. The end of the fiscal year is approaching and the days are busy, so do take care of your health (I ran a fever the other day...).
Now then, this time I will write Python inside KNIME and run scikit-learn.
Specifically, I will redo the visualization introduced in an earlier post, this time using t-SNE.
But first!
A quick word on how to use Python from within KNIME.
If you want to use Python 3, KNIME 3.5 or later seems like the better choice (although 3.4 looks like it would work too).
Go to File >> Preferences and type python into the search box to reach the Python settings page. Specify the Python executable you want to use there and you are done. In my case I set up my Python environments with Anaconda, so I point it to the python under the bin directory of the env I want.
A note for Windows users who, like me, are complete beginners and want to work with compound data!!!
You will probably end up installing RDKit. In an Anaconda environment,
conda install -c rdkit rdkit
will install it. It should then work from a terminal or command prompt and from Jupyter. Inside KNIME, however, it will probably not work. Someone else has reported the same thing, so I am leaving a link here.
I had no particular trouble on Linux (CentOS 7) or Mac (Sierra), but it did not work on Windows 7, so I forcibly appended the missing Anaconda-related entries to the system Path environment variable (this is a brute-force fix, so please do it at your own risk).
To find out which entries are missing, run
import os
print(os.environ["PATH"])
both in an environment where rdkit works and in the one where it does not (KNIME), and compare the output.*1
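Comparing two long PATH strings by eye is error-prone, so the comparison can be sketched as a small helper. This is only an illustration: the function name `missing_path_entries` and the example paths are hypothetical, not values from my machines.

```python
import os


def missing_path_entries(working_path, broken_path, sep=os.pathsep):
    """Return entries that appear in working_path but not in broken_path."""

    def normalize(entry):
        # Ignore surrounding spaces, trailing slashes and (on Windows) case.
        return os.path.normcase(entry.strip().rstrip("\\/"))

    present = {normalize(e) for e in broken_path.split(sep) if e.strip()}
    missing = []
    for entry in working_path.split(sep):
        if entry.strip() and normalize(entry) not in present:
            missing.append(entry.strip())
    return missing


# Hypothetical values, pasted from the two print(os.environ["PATH"]) outputs.
working = r"C:\Anaconda3;C:\Anaconda3\Library\bin;C:\Anaconda3\Scripts;C:\Windows"
broken = r"C:\Anaconda3;C:\Windows"

for entry in missing_path_entries(working, broken, sep=";"):
    print(entry)
```

The entries it prints are the candidates to add to Path; whether all of them are actually needed is something you would still have to check.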
That is enough preamble.
The overall workflow this time looks like this.
As before, I will cluster the compounds described in five papers on NS5B polymerase inhibitors, based on structure. In the earlier post I used KNIME's MDS node to reduce the dimensions (applied to a distance matrix) and visualized the result. This time I skip the distance matrix and apply t-SNE directly to the fingerprints to reduce the dimensions for visualization.
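t-SNE
It is one of the dimensionality reduction methods, and apparently a popular one. The site below made it easy to get a feel for it.
"The tunable parameter called perplexity ... (omitted) ... roughly speaking, expresses the balance between how much weight is given to the local versus the global characteristics of the data."
Following that description, I will run the visualization while varying this parameter.
I will skip the step that reads the SDF (SDF Reader), but the contents of the file look like this.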
There is no column that serves as a compound ID, so I used the ever-familiar String Manipulation node to create one new column.
I concatenated the string "compound_" with a number starting from 1 and put the result in a column named ID. Because join concatenates strings, the numeric part is wrapped in string() to convert it to a string. ROWINDEX is the row number within the table (starting from zero), so ROWINDEX+1 means "row number starting from 1". Running the workflow up to this point gives the following.
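For readers more at home in Python than in KNIME expressions, the same ID construction can be sketched as follows. The five-row table is an assumption made only for illustration; `row_indices` stands in for KNIME's ROWINDEX.

```python
# Zero-based row indices, standing in for ROWINDEX on a hypothetical 5-row table.
row_indices = range(5)

# Equivalent of joining "compound_" with string(ROWINDEX + 1) for every row.
ids = ["compound_" + str(i + 1) for i in row_indices]

print(ids)
```

As in the node, the number has to be converted to a string before concatenation; adding an int to a str directly raises a TypeError in Python.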
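Next, I write out canonical SMILES.
Since the structures are handled with rdkit inside Python, I went with KNIME's RDKit Canon SMILES node as well, for no deeper reason. The setup is simple: set the column containing the structure information as the RDKit Mol column, and enter the name of the column that will receive the SMILES under New column name.
Running it gives the following.
There are now two structure columns. One is SDF and the other is SMILES, so they look the same when rendered, but the underlying content differs. I displayed them as strings to show this.
Right-clicking a column name brings up a display menu; choosing SDF String or String there makes the difference visible.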
Parameter Optimization Loop Start
A slight detour: I will introduce the loop start first. As mentioned above, the visualization is run while varying the perplexity parameter. The value is held in a variable named p, and the settings in the screenshot below tell the node to generate numbers from 10 to 90 in steps of 10 and run the loop over them. In Python terms, it is something like
for i in numpy.arange(10, 100, 10):
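One detail worth spelling out: the loop node is configured with a stop value of 90 that is included, whereas Python ranges exclude their end value, which is why the line above uses 100. A minimal sketch with the standard library (the variable names are mine, chosen for illustration):

```python
start, stop, step = 10, 90, 10

# range() excludes its end value, so go one step past the stop value
# to make 90 itself part of the sequence.
perplexities = list(range(start, stop + step, step))

for p in perplexities:
    print("perplexity =", p)
```

This gives nine iterations, one per perplexity value from 10 to 90.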
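Python Script
There are several variants of this node with different combinations of input ▶ and output ▶ ports; this time I use the 1=>2 variant.
The layout looks like this. The script area is an editor, so you can write the script while checking its behavior a little at a time. Running code inside this editor does not execute the node itself, so no data is passed to the output ▶ ports.
The script goes in the center of the screen. I wrote it by feel, so it may well contain mistakes. This is its first public outing, which is rather embarrassing.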
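The left side of the screen lists the information flowing in from the KNIME workflow, all of which can be used in the script. Think of the upper entry, input_table, as a pandas DataFrame and the lower entry, flow_variables, as a dictionary of the flow variables. Let me try a few things inside this editor.
Select some lines and click Execute selected lines to run them. The right side of the screen then lists the imported modules, the lists you defined, and so on. Double-clicking one of them prints its contents to the standard output area. The example screenshot was taken after double-clicking refs.
Well, for me as much as anyone, the first step is simply to get used to it.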
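For orientation, here is a minimal, self-contained sketch of the core step such a script performs: reducing fingerprint vectors to two dimensions with scikit-learn's TSNE. It is not the script from my workflow. The random bit vectors are stand-in data (in the real workflow the fingerprints would be computed from the SMILES with RDKit), and the helper name `embed` is hypothetical.

```python
import numpy as np
from sklearn.manifold import TSNE

# Stand-in data: 100 random 256-bit vectors in place of real fingerprints.
rng = np.random.default_rng(0)
fingerprints = rng.integers(0, 2, size=(100, 256)).astype(float)


def embed(fps, perplexity):
    """Reduce fingerprint vectors to 2D with t-SNE."""
    # perplexity has to be smaller than the number of samples.
    tsne = TSNE(n_components=2, perplexity=perplexity, random_state=0)
    return tsne.fit_transform(fps)


# Inside KNIME the value would come from the loop instead,
# e.g. flow_variables["p"] (assuming the variable is named p).
coords = embed(fingerprints, perplexity=30)
col1, col2 = coords[:, 0], coords[:, 1]

print(coords.shape)
```

Fixing random_state keeps the layout reproducible between runs, which helps when comparing plots across perplexity values.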
After execution, the results look like this.
The left is the upper ▶ output and the right is the lower ▶ output.
I joined the two with a Joiner node.
Color Manager
I added colors to make the plot easier to read (the earlier post already showed that, for this data set, coloring by paper separates the compounds well).
On the other branch, a String Manipulation node builds the plot title as shown below. The key point is that this is the variable version of String Manipulation (the one whose input and output ports are red circles).
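The next screenshot plots col1 and col2, the coordinates produced by the dimensionality reduction. The example below was taken right after the final loop iteration, so perplexity=90.
The image to table node puts the plot image into a table.
Because all of this runs inside the loop, the result after the loop end finishes looks like this.
Every plot is accumulated as an embedded image in the table. I used the two-port loop end this time: data entering the upper ▶ comes out of the upper ▶ output, and data entering the lower ▶ comes out of the lower ▶ output.
The molecule type cast node makes KNIME recognize the SMILES (strings) as structures, and the column filter drops the columns that are not needed. Let's look at the lower ▶ output of the loop end.
All results are kept as a table in this way, so you could, for example, filter on the iteration with the settings you like and carry on processing from there. Just for your information.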
Finally, grid to table.
It arranges the specified column into a grid. This time I set Grid Column Count to 3. It is quicker to look at it than to explain it, so here is a screenshot of the result.
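In case the screenshot is not at hand, the idea of laying a single column out as a grid can be sketched in a few lines of Python. The labels below are placeholders for the nine plot images, and `to_grid` is a hypothetical helper, not the node's implementation.

```python
def to_grid(items, n_cols):
    """Split a flat list into rows holding n_cols items each."""
    return [items[i:i + n_cols] for i in range(0, len(items), n_cols)]


# Placeholder labels standing in for the nine plot images.
plots = ["perplexity=%d" % p for p in range(10, 100, 10)]
grid = to_grid(plots, 3)

for row in grid:
    print(row)
```

With nine plots and a column count of 3, this yields a 3 x 3 layout ordered row by row.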
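Yes, this is what I meant (I lack the vocabulary).
The plots clearly change with the perplexity value. Smaller values put the focus on local structure, right? Around perplexity=20 to 40, the orange points appear to split into two groups, and when I looked at the actual compounds there did seem to be two series. The paper apparently recommends settings of around 5 to 50. Compared with the MDS result from last time, my impression is that the groups separate much more cleanly.
So t-SNE is quite capable too. I cannot say how it would fare with data on the order of hundreds of thousands of points, but (at least in this example) it gave me a visual overview of the whole set as a preprocessing step before clustering 600 compounds. As I touched on earlier, clustering again on the col1 and col2 obtained with whichever perplexity you prefer might make the groups easier to separate.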
Beyond the confines of KNIME there seem to be plenty more useful tools like this. I will keep working to pick them up little by little.
[Addendum] How to set the Scatter Plot title from a variable (2018.04.16)
As id:rkakamilan pointed out, I had not described the part about using a variable as the title, so I am adding it here.
No elaborate steps are needed. In this case,
open the "Flow Variables" tab at the top of the Scatter Plot node's configuration dialog and assign the desired variable to chartTitle (in this post, the variable named title).
Most KNIME nodes have a Flow Variables tab in their configuration dialog, so the general idea is to assign the variable to whichever entry there looks like the right one.
*1: It took me several hours to get this far. Just preparing an environment in which things run is quite a struggle.