Streaming Processing for Big Data
Big data is all the rage these days, so I went looking for frameworks that can process large volumes of data as streams, in a semi-realtime fashion.
The requirements are:
- No single node where processing can stall
- As simple as possible
- Bindings for a variety of languages
and so on.
Candidate frameworks include Dempsy, Storm, Esper, Streambase, HStreaming, and Yahoo S4; I'll look into them one by one.
Dempsy
A relatively new one first: Dempsy. http://dempsy.github.com/Dempsy/
What is Dempsy?
In a nutshell, Dempsy is a framework that provides for the easy implementation of Stream-based, Real-time, BigData applications.
Dempsy is Nokia's "Distributed Elastic Message Processing System."
Dempsy is Distributed. That is to say a dempsy application can run on multiple JVMs on multiple physical machines.
Dempsy is Elastic. That is, it is relatively simple to scale an application to more (or fewer) nodes. This does not require code or configuration changes but allows the dynamic insertion and removal of processing nodes.
Dempsy is Message Processing. Dempsy fundamentally works by message passing. It moves messages between Message processors, which act on the messages to perform simple atomic operations such as enrichment, transformation, or other processing. Generally an application is intended to be broken down into more smaller simpler processors rather than fewer large complex processors.
Dempsy is a Framework. It is not an application container like a J2EE container, nor a simple library. Instead, like the Spring Framework it is a collection of patterns, the libraries to enable those patterns, and the interfaces one must implement to use those libraries to implement the patterns.
What is Dempsy? (translation)
In a nutshell, Dempsy is a framework that makes it easy to implement stream-based, real-time BigData applications. Dempsy is Nokia's "Distributed Elastic Message Processing System."
- Dempsy is distributed. That is, a Dempsy application can run on multiple JVMs across multiple physical machines.
- Dempsy is elastic. That is, it is relatively simple to scale an application to more (or fewer) nodes; processing nodes can be added and removed dynamically without code or configuration changes.
- Dempsy is message processing. That is, Dempsy fundamentally works by message passing: it moves messages between message processors, which perform simple atomic operations such as enrichment or transformation. In general, an application should be decomposed into many small, simple processors rather than a few large, complex ones.
- Dempsy is a framework. It is neither an application container like a J2EE container nor a simple library. Rather, like the Spring Framework, it is a collection of patterns, the libraries that enable those patterns, and the interfaces you implement in order to use those libraries to realize the patterns.
What Problem is Dempsy solving?
Dempsy is not designed to be a general purpose framework, but is intended to solve a certain class of problems while encouraging the use of the best software development practices.
Dempsy is meant to solve the problem of processing large amounts of "near real time" stream data with the lowest lag possible; problems where latency is more important than "guaranteed delivery." This class of problems includes use cases such as:
- Real time monitoring of large distributed systems
- Processing complete rich streams of social networking data
- Real time analytics on log information generated from widely distributed systems
- Statistical analytics on real-time vehicle traffic information on a global basis
It is meant to provide developers with a tool that allows them to solve these problems in a simple straightforward manner by allowing them to concentrate on the analytics themselves rather than the infrastructure. Dempsy heavily emphasizes "separation of concerns" through "dependency injection" and out of the box supports both Spring and Guice. It does all of this by supporting what can be (almost) described as a "distributed actor model."
In short, Dempsy is a framework to enable decomposing a large class of message processing applications into flows of messages to relatively simple processing units implemented as POJOs.
What problem is Dempsy solving? (translation)
Dempsy is not designed as a general-purpose framework; rather, it is designed to solve a certain class of problems while encouraging good software development practices.
Dempsy is designed to solve the problem of processing large volumes of "near real time" stream data with the lowest possible lag; in particular, problems where latency matters more than "guaranteed delivery." This class of problems includes use cases such as:
- Real-time monitoring of large distributed systems
- Processing complete, rich streams of social networking data
- Real-time analytics on log information generated by widely distributed systems
- Statistical analysis of real-time vehicle traffic information on a global basis
The point is that Dempsy gives developers a tool for solving these problems in a simple, straightforward manner, so they can concentrate on the analytics themselves rather than on the infrastructure.
Dempsy strongly emphasizes "separation of concerns" (関心の分離 - Wikipedia) through "dependency injection" (依存性の注入 - Wikipedia), and supports both Spring and Guice out of the box. It achieves this by supporting what can (almost) be described as a "distributed actor model." In short, Dempsy is a framework for decomposing a large class of message-processing applications into flows of messages handled by relatively simple processing units implemented as POJOs (Plain Old Java Object - Wikipedia).
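To make the "many small, simple POJO message processors" idea concrete, here is a minimal sketch in plain Java. It deliberately does not use Dempsy's real annotations or interfaces (which I have not dug into yet); the `WordEvent`, `WordCounter`, and `Dispatcher` names, the `key()` method, and the in-process routing are all my own stand-ins for the keyed, cross-JVM routing that Dempsy itself would handle.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: NOT Dempsy's actual API. The sketch shows the shape of the
// programming model the docs describe: small POJO processors addressed by a
// message key, with all routing handled outside the processor.
public class MessageProcessorSketch {

    // A message carries a key; the key decides which processor instance gets it.
    static final class WordEvent {
        final String word;
        WordEvent(String word) { this.word = word; }
        String key() { return word; }
    }

    // A small, simple processor that does one atomic thing (counting).
    static final class WordCounter {
        private long count = 0;
        void handle(WordEvent event) {
            count++;
            System.out.println(event.word + " -> " + count);
        }
    }

    // A toy in-process "container": one processor instance per key. In the real
    // framework this routing would span JVMs and physical machines.
    static final class Dispatcher {
        private final Map<String, WordCounter> processors = new HashMap<>();
        void dispatch(WordEvent event) {
            processors.computeIfAbsent(event.key(), k -> new WordCounter())
                      .handle(event);
        }
    }

    public static void main(String[] args) {
        Dispatcher dispatcher = new Dispatcher();
        for (String w : new String[] {"storm", "dempsy", "storm"}) {
            dispatcher.dispatch(new WordEvent(w));
        }
    }
}
```

The application logic lives entirely in `WordCounter.handle`; everything else is plumbing, which is exactly the part Dempsy promises to take off the developer's hands.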
Storm
And now for what is probably the favorite: Storm. Home · nathanmarz/storm Wiki · GitHub
http://storm-project.net/documentation.html
Rationale
The past decade has seen a revolution in data processing. MapReduce, Hadoop, and related technologies have made it possible to store and process data at scales previously unthinkable. Unfortunately, these data processing technologies are not realtime systems, nor are they meant to be. There's no hack that will turn Hadoop into a realtime system; realtime data processing has a fundamentally different set of requirements than batch processing.
However, realtime data processing at massive scale is becoming more and more of a requirement for businesses. The lack of a "Hadoop of realtime" has become the biggest hole in the data processing ecosystem.
Storm fills that hole.
Before Storm, you would typically have to manually build a network of queues and workers to do realtime processing. Workers would process messages off a queue, update databases, and send new messages to other queues for further processing. Unfortunately, this approach has serious limitations:
- Tedious: You spend most of your development time configuring where to send messages, deploying workers, and deploying intermediate queues. The realtime processing logic that you care about corresponds to a relatively small percentage of your codebase.
- Brittle: There's little fault-tolerance. You're responsible for keeping each worker and queue up.
- Painful to scale: When the message throughput gets too high for a single worker or queue, you need to partition how the data is spread around. You need to reconfigure the other workers to know the new locations to send messages. This introduces moving parts and new pieces that can fail.
Although the queues and workers paradigm breaks down for large numbers of messages, message processing is clearly the fundamental paradigm for realtime computation. The question is: how do you do it in a way that doesn't lose data, scales to huge volumes of messages, and is dead-simple to use and operate?
Storm satisfies these goals.
Rationale (translation)
The past decade has seen a revolution in data processing. MapReduce, Hadoop, and related technologies have made it possible to store and process data at scales that were previously unthinkable. Unfortunately, these are not realtime systems, nor were they designed to be. There is no hack that will turn Hadoop into a realtime system; realtime data processing has fundamentally different requirements than batch processing.
However, realtime data processing at massive scale is increasingly becoming a business requirement. The lack of a "Hadoop of realtime" has become the biggest hole in the data processing ecosystem.
Storm fills that hole.
Before Storm, you would typically have to build a network of queues and workers yourself to do realtime processing. Workers would process messages off a queue, update databases, and send new messages to other queues for further processing. Unfortunately, this approach has serious limitations:
- Tedious: Most of your development time goes into configuring where to send messages, deploying workers, and deploying intermediate queues. The realtime processing logic you actually care about ends up being a relatively small percentage of your codebase.
- Brittle: There is little fault tolerance. You are responsible for keeping every worker and queue up yourself.
- Painful to scale: When the message throughput for a single worker or queue gets too high, you have to partition how the data is spread around and reconfigure the other workers with the new destinations. This introduces moving parts and new pieces that can fail.
Although the queues-and-workers paradigm breaks down at large message volumes, message processing is clearly the fundamental paradigm for realtime computation. The question then becomes: how do you do it in a way that doesn't lose data, scales to huge volumes of messages, and is dead simple to use and operate?
Storm meets these goals.
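For comparison with the hand-built queues-and-workers approach above, here is roughly what a minimal Storm topology looks like. This is a word-count sketch in the spirit of the storm-starter examples from the nathanmarz/storm era (backtype.storm packages, as I remember them); the spout and bolt classes are my own, so treat it as an illustration of the spout/bolt/grouping model rather than verified, copy-paste-ready code.

```java
import java.util.HashMap;
import java.util.Map;

import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.topology.base.BaseRichSpout;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;
import backtype.storm.utils.Utils;

public class WordCountTopology {

    // Spout: the source of the stream (here just the same sentence, forever).
    public static class SentenceSpout extends BaseRichSpout {
        private SpoutOutputCollector collector;

        public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
            this.collector = collector;
        }

        public void nextTuple() {
            Utils.sleep(100);
            collector.emit(new Values("the quick brown fox"));
        }

        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("sentence"));
        }
    }

    // Bolt: splits each sentence into words.
    public static class SplitSentence extends BaseBasicBolt {
        public void execute(Tuple tuple, BasicOutputCollector collector) {
            for (String word : tuple.getString(0).split(" ")) {
                collector.emit(new Values(word));
            }
        }

        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("word"));
        }
    }

    // Bolt: counts words. fieldsGrouping on "word" sends the same word to the
    // same task, so a plain in-memory map is enough for this sketch.
    public static class WordCount extends BaseBasicBolt {
        private final Map<String, Long> counts = new HashMap<String, Long>();

        public void execute(Tuple tuple, BasicOutputCollector collector) {
            String word = tuple.getString(0);
            Long count = counts.get(word);
            count = (count == null) ? 1L : count + 1;
            counts.put(word, count);
            collector.emit(new Values(word, count));
        }

        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("word", "count"));
        }
    }

    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("sentences", new SentenceSpout(), 1);
        builder.setBolt("split", new SplitSentence(), 4).shuffleGrouping("sentences");
        builder.setBolt("count", new WordCount(), 4).fieldsGrouping("split", new Fields("word"));

        Config conf = new Config();
        LocalCluster cluster = new LocalCluster();   // in-process cluster for local testing
        cluster.submitTopology("word-count", conf, builder.createTopology());
        Utils.sleep(10000);
        cluster.shutdown();
    }
}
```

The spout is the stream source, the bolts are the processing steps, and the groupings (shuffleGrouping, fieldsGrouping) declare how tuples are routed between them; Storm deploys and wires the actual workers, which is precisely the tedious part of the hand-built setup.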
Why Storm is important
Storm exposes a set of primitives for doing realtime computation. Like how MapReduce greatly eases the writing of parallel batch processing, Storm's primitives greatly ease the writing of parallel realtime computation.
The key properties of Storm are:
Extremely broad set of use cases: Storm can be used for processing messages and updating databases (stream processing), doing a continuous query on data streams and streaming the results into clients (continuous computation), parallelizing an intense query like a search query on the fly (distributed RPC), and more. Storm's small set of primitives satisfy a stunning number of use cases.
Scalable: Storm scales to massive numbers of messages per second. To scale a topology, all you have to do is add machines and increase the parallelism settings of the topology. As an example of Storm's scale, one of Storm's initial applications processed 1,000,000 messages per second on a 10 node cluster, including hundreds of database calls per second as part of the topology. Storm's usage of Zookeeper for cluster coordination makes it scale to much larger cluster sizes.
Guarantees no data loss: A realtime system must have strong guarantees about data being successfully processed. A system that drops data has a very limited set of use cases. Storm guarantees that every message will be processed, and this is in direct contrast with other systems like S4.
Extremely robust: Unlike systems like Hadoop, which are notorious for being difficult to manage, Storm clusters just work. It is an explicit goal of the Storm project to make the user experience of managing Storm clusters as painless as possible.
Fault-tolerant: If there are faults during execution of your computation, Storm will reassign tasks as necessary. Storm makes sure that a computation can run forever (or until you kill the computation).
Programming language agnostic: Robust and scalable realtime processing shouldn't be limited to a single platform. Storm topologies and processing components can be defined in any language, making Storm accessible to nearly anyone.
Why is Storm important? (translation)
Storm exposes a set of primitives for doing realtime computation. Just as MapReduce makes writing parallel batch processing dramatically easier, Storm's primitives make writing parallel realtime computation dramatically easier.
Storm's key properties are as follows:
- Extremely broad set of use cases: Storm can be used for processing messages and updating databases (stream processing). It can also run continuous queries over data streams and stream the results to clients (continuous computation), and parallelize intensive queries such as search queries on the fly (distributed RPC). Storm's small set of primitives covers a surprising number of use cases.
- Scalable: Storm scales to massive numbers of messages per second. To scale a topology, all you have to do is add machines and increase the topology's parallelism settings (see the sketch after this list). As an example of Storm's scale, one of its initial applications processed 1,000,000 messages per second on a 10-node cluster, including hundreds of database calls per second as part of the topology. Storm's use of Zookeeper for cluster coordination lets it scale to much larger cluster sizes.
- Guarantees no data loss: A realtime system must have strong guarantees that data is successfully processed. A system that drops data has a very limited set of use cases. Storm guarantees that every message will be processed, which stands in direct contrast to systems like S4.
- Extremely robust: Unlike systems such as Hadoop, which are notorious for being difficult to manage, Storm clusters just work. Making the experience of managing a Storm cluster as painless as possible is an explicit goal of the Storm project.
- Fault-tolerant: If faults occur while your computation is running, Storm reassigns tasks as necessary. Storm makes sure a computation can run forever (or until you kill it).
- Programming language agnostic: Robust, scalable realtime processing shouldn't be limited to a single platform. Storm topologies and processing components can be defined in any language, making Storm accessible to nearly anyone.
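As a rough illustration of the "add machines and increase the parallelism settings" claim from the Scalable bullet above, here is how the scaling knobs look in code. It reuses the spout and bolt classes from the earlier word-count sketch and the same assumed backtype.storm API; the numbers are arbitrary and only show that scaling out is a matter of parallelism hints and worker counts, not of rewriting the processing logic.

```java
import backtype.storm.Config;
import backtype.storm.StormSubmitter;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.tuple.Fields;

// Scaling sketch reusing the spout/bolt classes from the earlier word-count
// example. The only scaling knobs are the parallelism hint per component and
// the number of worker JVMs; the processing logic itself is untouched.
public class ScaledWordCountTopology {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("sentences", new WordCountTopology.SentenceSpout(), 2);
        builder.setBolt("split", new WordCountTopology.SplitSentence(), 16)
               .shuffleGrouping("sentences");
        builder.setBolt("count", new WordCountTopology.WordCount(), 16)
               .fieldsGrouping("split", new Fields("word"));

        Config conf = new Config();
        conf.setNumWorkers(8);   // spread the executors over 8 worker JVMs in the cluster
        StormSubmitter.submitTopology("scaled-word-count", conf, builder.createTopology());
    }
}
```

If I remember correctly there is also a `storm rebalance` command for adjusting the parallelism of a topology that is already running, but I have not tried it yet.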
To be continued... someday, maybe.