Interest in IoT has grown sharply in recent years, but making effective use of the large volumes of information aggregated from internet-connected devices depends on how that information is processed. "Apache Spark" is a general-purpose engine for processing such big data at high speed, and it is under active development. This article uses the Apache Spark service provided by "Bluemix" to introduce basic Apache Spark usage and how to implement a program that uses its machine learning library.
Introduction
This article is aimed at readers who have a Bluemix account. After initial registration, Bluemix can be used free of charge for 30 days without registering a credit card, so if you do not have an account yet, we recommend registering one first. The Bluemix Apache Spark service provides an interactive data analysis environment based on "Jupyter Notebook", and in this article we write our programs in Python on that notebook. The versions of the software provided on Bluemix are as follows.
- Apache Spark 1.6.0
- Python 2.7.11
- Jupyter Notebook 4.0.6
- IPython 4.0.1
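If your environment differs from the one listed above, it can help to check what the notebook is actually running. The following is a minimal sketch, not part of the original article: the helper name `environment_versions` is hypothetical, and it assumes that a SparkContext, if you have one, is passed in explicitly.

```python
from __future__ import print_function

import sys


def environment_versions(spark_context=None):
    """Collect version strings for the running notebook environment.

    `spark_context` is optional; pass the notebook's SparkContext
    (assumed to exist in your environment) to include the Spark version.
    """
    versions = {"python": ".".join(str(n) for n in sys.version_info[:3])}
    try:
        import IPython
        versions["ipython"] = IPython.__version__
    except ImportError:
        pass  # not running under IPython, so there is nothing to report
    if spark_context is not None:
        # SparkContext exposes the Spark release as the `version` property.
        versions["spark"] = spark_context.version
    return versions


print(environment_versions())
```

In a notebook that already has a SparkContext, calling `environment_versions(sc)` adds a `spark` entry to the result.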
What is Apache Spark?
Apache Spark (hereafter, Spark) was created to solve problems, chiefly performance problems, that surfaced as MapReduce use spread following the big data revolution started by Hadoop MapReduce. Spark began in 2009 as a research project at UC Berkeley's RAD Lab, was open sourced in March 2010, and was moved to the Apache Software Foundation in June 2013. Version 2.0.0, the latest at the time of writing, was released on July 26 of this year. The official site lists the following four points as Spark's strengths.
- Speed: up to 100x faster than Hadoop MapReduce in memory, and up to 10x faster on disk
- Ease of Use: development in several programming languages (Java, Scala, Python, R) through a rich set of APIs
- Generality: combining diverse kinds of processing such as SQL, machine learning libraries, and graph processing
- Runs Everywhere: runs on Hadoop, on Mesos, standalone, or in the cloud
Spark ships with "Spark SQL" for working with structured data, "Spark Streaming" for streaming applications, "GraphX" for graph computation, and "MLlib", the machine learning library used in this article. As shown in fig2, all of these libraries are built on top of Spark Core, the component that provides the foundational API. They can therefore be combined for advanced data analysis, for example classifying data from a streaming source in real time with machine learning.
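To make the idea of combining libraries on top of Spark Core concrete, here is a minimal batch sketch (not the streaming case mentioned above). It passes one dataset through a Spark Core RDD, MLlib's k-means, and a Spark SQL DataFrame. The sample data and the names `SAMPLE_LINES`, `parse_point`, and `cluster_and_count` are hypothetical. The code assumes the RDD-based `pyspark.mllib` API of the Spark 1.6 line and an existing SparkContext passed in as `sc`.

```python
from __future__ import print_function

# Hypothetical sample data: "x,y" readings that form two obvious groups.
SAMPLE_LINES = ["1.0,1.1", "0.9,1.0", "1.2,0.8", "8.0,8.1", "7.9,8.3", "8.2,7.9"]


def parse_point(line):
    """Turn an 'x,y' text line into a list of floats (plain Python, no Spark)."""
    return [float(v) for v in line.split(",")]


def cluster_and_count(sc, lines, k=2):
    """Run one dataset through Spark Core, MLlib, and Spark SQL.

    `sc` is an existing SparkContext (assumed to be supplied by the caller).
    Returns a dict mapping cluster index to the number of points in it.
    """
    # Imported here so this module also loads where PySpark is not installed.
    from pyspark.mllib.clustering import KMeans
    from pyspark.sql import SQLContext

    # Spark Core: distribute the raw lines as an RDD and parse them.
    points = sc.parallelize(lines).map(parse_point).cache()

    # MLlib: fit a k-means model directly on that RDD.
    model = KMeans.train(points, k, maxIterations=10)

    # Spark Core again: label each point with its cluster index.
    labeled = points.map(lambda p: (p[0], p[1], model.predict(p)))

    # Spark SQL: view the labeled points as a DataFrame and aggregate.
    sql_context = SQLContext(sc)
    df = sql_context.createDataFrame(labeled, ["x", "y", "cluster"])
    rows = df.groupBy("cluster").count().collect()
    return {row["cluster"]: row["count"] for row in rows}
```

In a notebook with a SparkContext available, `cluster_and_count(sc, SAMPLE_LINES)` returns the per-cluster counts. Because k-means starts from a random initialization, which index each group receives can vary between runs.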