Posts for the one year starting 2015-01-01
(Photo via VisualHunt.com) I found this interesting opinion piece on Data Science Central*1, where I am also a member. Data science and statistical modeling will be further automated, with better black-box products Frontiers between data sc…
(Photo credit: Team Lane via Visual hunt / CC BY-ND) On a personal note: as of today I have left Recruit Communications Co., Ltd. (RCO), where I worked for two and a half years. At the same time I am also leaving the Recruit group*1. Officially, December 31, New Year's Eve, is my resignation…
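(Photo via VisualHunt) Addendum: the latest book list as of March 2017 is here. Recently, all sorts of recommended book lists and resource lists for people aiming to become data scientists have been appearing again, but I personally have some reservations about them, so I have roughly put together my own…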
Here is my memorandum on the paper I was in charge of at today's reading group. Overview: Gradient Boosted Feature Selection (Xu, Huang, Weinberger and Zheng, KDD 2014). As the title indicates, the primary motivation is feature selection, and this gra…
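NIPS2015 wrapped up the other day*1, and it reminded me that the conference site has a list of the titles and authors of all the papers. So I would like to apply what I learned in the previous four installments of the graph and network analysis series to this author list. What I did, rough…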
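Whole-network metrics are not often dealt with in business settings, so this time I will instead take up community detection (in short, clustering within a graph structure). However, "ネットワーク分析", which I have used as a reference up to the previous installment, does not … much about community detection…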
This is probably the topic that matters most from a business standpoint, so this time I would like to take up centrality. The reference is the usual one. ネットワーク分析 (Rで学ぶデータサイエンス 8) Authors: 鈴木努, 金明哲 Publisher: Kyo…
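The other day I took the stage as a presenter and commentator at the awards ceremony of this student data analysis competition. Frankly, the contest was at such a high level that even the people at Data Stadium, who provided the data, exclaimed in admiration, "We never expected results like these," and I…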
A little while ago I gave a talk titled "The Taste of Wine and Data Science". In fact, I had consistently framed it around the theme of "reductionism in the name of univariate models" vs. "data science based on multivariate models". Well, the spotlight fell there…
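Continuing from the previous post, this is the series in which I learn graph theory and network analysis mainly by playing with the functions of {igraph}. This time I will look at various features defined between nodes. The reference is, of course, the same one again. ネットワーク分析 (Rで学ぶデータサイエンス 8) Authors: 鈴…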
This was making the rounds a little while ago: apparently a package called {sgd}, which implements stochastic gradient descent, has been published on CRAN, and there also seems to be a vignette slated for publication in JSS. CRAN - Package sgd Stochastic gradient decent metho…
My interest has been piqued, so I am thinking of putting some effort into graph theory and network analysis for a while. So, just as I did with "Les Misérables", let's grab an open dataset. Network data This time I will use "Neural network", which is (…
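Taste of Wine vs. Data Science from Takashi J OZAKI Here is what I talked about at a certain study meetup the other day. As you can probably tell, the material is something of a sequel to this post I wrote earlier. If tasting scores could be assigned automatically to some extent, the world's…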
岩波データサイエンス Vol.1 Author: 岩波データサイエンス刊行委員会 Publisher: 岩波書店 Release date: 2015/10/08 Format: book (softcover) As it happens, I was invited to join this editorial committee around last autumn(?), and ever since, behind the scenes, vario…
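Last year I wrote a PR post for this same event, the contest to decide Japan's top student data scientist, and undeterred I am writing one again this year. Last year it used a dataset from Japanese professional baseball; this year it uses a J.League dataset. Again this year, prize money totaling 180,000 yen goes to the top three finishers…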
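The other day, people on my team apparently received a sales pitch for a certain data analysis framework*1, and I heard all about it afterwards. The product is said to have been built by heavyweights who placed near the top of renowned international data analysis competitions*2, and according to the sales copy, "the algorithms, the data pre…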
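I had actually planned to write a post on doc2vec this week in response to a request, but the preprocessing turned out to be more trouble than expected and I will not make it in time, so this time I will get by with a different topic (lol). Namely, "the relationship between wine and data analysis". …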
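Apparently a media article like this has come out. Plenty of people seem to be picking it apart after reading it, but the curious thing about this article is that "although the explanation is not exactly completely wrong, somehow anyone (familiar with both fields) who reads it feels that something is badly off", isn't it…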
As the title says: just as with the previous volume*1, I was kindly sent an advance sample copy ahead of the official release date so that I could review it. データサイエンティスト養成読本 機械学習入門編 (Software Design plus) Authors: 比戸将平, 馬場雪乃, 里洋平, 戸嶋龍哉, 得居誠也, 福島真太朗, 加藤…
At my team's reading group today we read "A Safe Screening Rule for Sparse Logistic Regression" (Wang et al., NIPS2014)*1, so I am also uploading the slides from that session here. Incidentally, I was wondering whether an implementation might be up on the authors' GitHub or somewhere…
So I skipped last week's blog update (one every other week) and went: Paris, the city of dreams. It was my first time in Europe, and also my first entirely private overseas trip in quite a while*1, so I enjoyed it to the full. (Notre-Dame …
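So this week's post is taking a break. See you the week after next. By the way, the first photo is the Hall of Mirrors at the Palace of Versailles, and the second is a café I came across in Saint-Germain-des-Prés that is called, of all things, "Mondrian" (lol).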
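Today, August 6, I gave a talk on Granger causality at the public symposium of the Kanto regional chapter of the Ecological Society of Japan, held in Komaba: "Causal inference methods based on non-Gaussianity / nonlinearity / asymmetry: learning where to use them, their principles, and their implementation", also known as the "Causality Fes". Incidentally, beforehand, 林岳彦 (id:takehiko-…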
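The other day I received this question on ask.fm, so I gave it a try. "I read your blog post on classifying imbalanced data. Forgive the blunt question, but when doing binary classification with Randomforest on imbalanced data with few positive examples, if one uses weights, then simply the 'pos… output as Prob…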
みんなのR -データ分析と統計解析の新しい教科書- Authors: Jared P. Lander, Tokyo.R (cooperation), 高柳慎一, 牧山幸史, 簑田高志 Publisher: マイナビ Release date: 2015/06/30 Format: book (softcover) So, the translators'…
データ分析プロセス (シリーズ Useful R 2) Authors: 福島真太朗, 金明哲 Publisher: 共立出版 Release date: 2015/06/25 Format: book The author, Fukushima-san, kindly sent me a copy, so I will review it right away.
Last time in this series I ended up using a dataset that was not from the UCI repository, which went against the whole point of the series (lol), so this time I will use a UCI dataset. Here is the data. Credit Approval Data Set Looking at the data set description, …
"It's Tough Being a Data Scientist": why a much-hyped job title has few openings. Apparently a rather provocative translated article has come out. The original English article is here. Data science jobs not as plentiful as all the hype indicates It is probably more or less this kind of story…
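Xgboost (eXtreme Gradient Boosting) is a classifier that now enjoys enormous popularity in renowned machine learning competitions, Kaggle and KDD cup among them. It seems to have become famous in particular because the winning team of Kaggle's Higgs Boson Machine Learning Challenge made heavy use of it. As for its implementation, it is C++ ba…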
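Over the past few days I tried {xgboost} for the first time, because I wanted to have a go at Kaggle's Otto challenge to kill time, and I got badly stuck in several ways during installation, so I am writing them up here as a note to self. In fact, even in the English-speaking world there are not many well-organized articles on this, or rather, …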