FPGAs in financial HFT, where "the network is the computer" is literally true

Lately I've come down with a severe case of FPGA fever and spent my whole winter break playing with a DE0 board. I'd been curious about how FPGAs are used in financial HFT (high-frequency trading: the frightening market where big investment banks and others buy and sell stocks and other instruments at ultra-high speed, on microsecond timescales). HFT Review ran an in-depth report on exactly that, based on interviews with vendors across the HFT industry, so I got carried away and did a loose translation of the parts I found interesting.

The original source:

How FPGA adoption began: controlling latency

First of all, why have FPGAs spread so quickly in financial HFT over the past two years or so, and what were they first used for? Here is the background and history.

Traditionally, we started out with FPGA on the NIC card a great place for putting logic such as market data feed parsing.

It started with putting application logic on an FPGA mounted on the NIC, to do things like parsing market data feeds.

The first area is around low-latency connectivity for inbound and outbound data, where all network connections are possible candidates for an FPGA-enabled Network Interface Card (NIC) to give that extra latency boost. Such a card might be using the FPGA to run a TCP Offload Engine for example, which can both free up CPU cycles and reduce PCI traffic.

Other areas where FPGAs are starting to make a significant impact are market data feed handling, pre-trade risk controls and other processes where firms need to be able to take in data then run calculations or simulations on that data in-line at high speed. Applications are now increasing as people get more comfortable with the technology and firms are looking at pure acceleration of tasks that they would previously have done using CPUs or General Purpose GPUs.

FPGA use began with lowering network latency. One example is a TCP offload engine on an FPGA-equipped NIC, which runs the TCP stack on the NIC itself to free up CPU cycles and cut PCI bus traffic.
The other use is market data feed handling, pre-trade risk controls, and similar processes that run calculations or simulations on incoming data in-line, in real time.

“Most of them are doing some variation of ticker plant”, responds Durwood. “Some are just looking at a handful of stocks and absolutely hot-rodding those, others are trying to convert a 150-deep book into automated trading. It’s still at the immature phase where different people are trying different approaches and taking risks.”

Taking care of tasks like parsing, filtering, normalisation and session management is where FPGAs can add a real advantage, so market data delivery and distribution is ripe for this technology.

In most deployments, firms use FPGAs to build a ticker plant (infrastructure that distributes market data to investors and internal systems in real time with low latency) or something similar. Some firms watch only a handful of stocks and "hot-rod" them, tuning the processing for those few symbols to the absolute limit. Others are trying to drive automated trading from a 150-deep book, meaning an order book carrying 150 price levels.
Parsing, filtering, normalization, and session management are exactly what FPGAs are good at. That makes market data distribution the application FPGAs are best suited to right now.
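Here is a minimal software sketch in Python that makes "parsing, filtering, normalisation" a bit more concrete. The message layout is entirely hypothetical and is not any exchange's real protocol: an 8-byte space-padded symbol, a price stored as an integer in 1/10000 dollars, and a quantity. Real feeds, such as FIX/FAST or exchange-specific binary formats, are far more involved. On an FPGA the same steps would be wired as fixed-width field slices and comparators that work on the bytes as they stream in, not as a loop.

```python
import struct
from typing import Iterable, Iterator, NamedTuple, Set

# Hypothetical wire format (not a real exchange's): big-endian,
# 8-byte space-padded ASCII symbol, uint64 price in 1/10000 dollars, uint32 quantity.
MSG = struct.Struct(">8sQI")


class Tick(NamedTuple):
    symbol: str
    price_e4: int  # price * 10_000, kept as an integer (fixed point), as hardware would
    qty: int


def parse(raw: bytes) -> Tick:
    """Parse one fixed-size message (in hardware, all fields are sliced out in parallel)."""
    sym, price_e4, qty = MSG.unpack(raw)
    # Normalize: strip the padding so downstream systems see a plain symbol string.
    return Tick(sym.rstrip(b" ").decode("ascii"), price_e4, qty)


def filter_feed(frames: Iterable[bytes], watch: Set[str]) -> Iterator[Tick]:
    """Yield only the ticks for symbols this client subscribed to."""
    for raw in frames:
        tick = parse(raw)
        if tick.symbol in watch:
            yield tick


frames = [
    MSG.pack(b"MSFT    ", 274_200, 100),
    MSG.pack(b"AAPL    ", 5_000_000, 10),
]
print(list(filter_feed(frames, {"MSFT"})))  # → [Tick(symbol='MSFT', price_e4=274200, qty=100)]
```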

Many times, Market Data Managers implement direct feeds in a 1:1 pairing with the most demanding clients. But the explosion of low latency needs makes the 1:1 pairing of feeds and clients untenable. Most enterprises need to simultaneously feed dozens (even hundreds) of servers including: surveillance risk systems, historical tick databases and back up servers. On top of all of this, they continue to experience ‘bursty markets’, a trend that is expected to continue with the increasing number of automated trading programs.

Recently, Market Data managers threw more CPUs at the problem.  They constructed multi-threaded programs, and tried to off load the CPUs.   But now, there is a type of recapitulation happening in the market.  The market is turning to new architectures, such as pure FPGA architectures to re-invent the market data infrastructure.  The FPGA architectures of tomorrow offer a highly parallelized approach to processing market data. This enables a deterministic latency infrastructure.  

Deterministic latency, (keeping the same speeds regardless of data rates, number of venues or distribution points) is the new goal for Market Data Managers.  And it gives the Market Data Managers the ability to offer guaranteed service levels to their users.   Most importantly, the new FPGA architectures gives the algo trader a dependable ultra-low latency reaction time to changing market signals, under all market conditions.

Until now, market data managers (the group at an investment bank or brokerage that builds the ticker plant?) have provided market data feeds over direct 1:1 connections with their most demanding clients. But now that low latency is a must (thanks to the spread of HFT), pairing feeds and clients 1:1 is no longer sustainable. Most firms have to feed dozens or even hundreds of servers at once, including surveillance and risk systems, historical tick databases, and backup servers. On top of that, the growing number of automated trading programs is making the market ever more bursty.
Until recently, market data managers tried to solve this by throwing more CPUs at it, for instance by writing multithreaded programs to spread the load across CPUs. But now a kind of evolution is under way: a move to rebuild the market data infrastructure on pure FPGA architectures. The high degree of parallelism in FPGAs makes deterministic latency possible in market data distribution.
Deterministic latency means that latency stays the same, or at least within a fixed bound, regardless of data rate, number of venues, or number of distribution points. It lets market data managers guarantee service levels to their users, so it has become their most important goal. FPGAs also give algo traders a dependable, ultra-low-latency way to react to ever-changing market signals under any market conditions.
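A toy model helps show why "deterministic" matters more than "fast on average". The sketch below uses hypothetical numbers and is deliberately simplified. It compares two designs:

- a single CPU worker that takes 100 ns per message and serves messages first in, first out;
- a fully pipelined circuit that accepts a new message every 10 ns cycle and holds each one for 10 stages.

Both take 100 ns for an isolated message. Under a burst, though, the queue behind the CPU worker makes latency grow with every message. The pipeline's latency stays flat as long as messages arrive no faster than one per cycle.

```python
from typing import List


def fifo_latencies(arrivals_ns: List[int], service_ns: int) -> List[int]:
    """One worker, first in first out: a message waits until the previous one is done."""
    free_at = 0
    out = []
    for t in arrivals_ns:
        start = max(t, free_at)
        free_at = start + service_ns
        out.append(free_at - t)
    return out


def pipeline_latencies(arrivals_ns: List[int], stages: int, cycle_ns: int) -> List[int]:
    """Fully pipelined logic: a new message may enter every cycle, and each message
    spends exactly `stages` cycles inside, regardless of what else is in flight."""
    next_slot = 0
    out = []
    for t in arrivals_ns:
        enter = max(t, next_slot)  # only waits if arrivals come faster than one per cycle
        next_slot = enter + cycle_ns
        out.append(enter - t + stages * cycle_ns)
    return out


burst = [0, 10, 20, 30, 40]  # five messages, 10 ns apart (hypothetical burst)
print(fifo_latencies(burst, service_ns=100))              # [100, 190, 280, 370, 460]
print(pipeline_latencies(burst, stages=10, cycle_ns=10))  # [100, 100, 100, 100, 100]
```

A real CPU system has several cores, caches and interrupts, so its numbers would look different. The shape of the problem is the same, though: once work queues up, latency depends on load.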

As this shows, the motivation for adopting FPGAs in HFT is rather unusual. Figures on the order of microseconds, or even a few hundred nanoseconds, come up constantly in this field; it's practically the world of hard real-time control. At that point, latency is hard to predict reliably on off-the-shelf CPUs running general-purpose operating systems. "The market spiked for an instant, the OS load went up, and our order went out a few milliseconds late" is not an acceptable excuse. In fact, the NYSE has been fined 400 million yen over delays of a few milliseconds in distributing market data.

This FPGA know-how was developed in a very special environment, so I don't expect it to spread straight into general-purpose uses (say, the web or big data) any time soon. In the long run, though, FPGA commoditization is steadily lowering the barrier to using FPGAs for general applications. I suspect it will hit critical mass at some point.

The current state of FPGA use in financial HFT

Next, the challenges HFT firms currently face in using FPGAs.

“There are maybe 25-30 end-user firms around the world that are fully capable of developing complete high performance applications in FPGA today”

Worldwide, there are perhaps only 25 to 30 end-user firms that are fully capable of developing complete high-performance (financial HFT) applications on FPGAs today.

Working with order and trade data brings additional complexities in development and testing however, because the order can go through so many state transitions.

Despite such complexities, FPGAs are now being used in FIX engines, rules engines, and even full-blown execution management systems. Ferdinando La Posta, Co-Founder of trading solutions vendor GATELab, explains his firm’s approach.

Processing order and trade data means handling a great many state transitions, which makes development and testing more complex.
Despite those difficulties, FPGAs are starting to be used for FIX engines (FIX is the standard protocol for financial trading), rules engines, and even full-blown execution management systems.

“Firms realise they can buy FPGA-based solutions from vendors who have done the straightforward stuff like TCP offload, FIX/FAST translation, feed handling and so on,” says Keene. “But HFT firms also realise that the only way to gain an advantage now is to come up with better, more clever, more efficient, more productive and more profitable algorithms. That’s going to be their differentiator, but not many HFT firms have had much success putting their own trading algorithms onto FPGAs.

FPGA solutions for the straightforward tasks, such as TCP offload, FIX/FAST protocol translation, and feed handling, can now be bought off the shelf from vendors.
The focus of FPGA use today is whether you can implement better, smarter, more efficient, more productive, and more profitable algorithms on FPGAs. That is where the differentiation lies, but not many firms have managed to put their own trading algorithms onto FPGAs.

Yes, you had ANDs, ORs and other arithmetic operators, but trying to build up logical or algorithmic expressions using individual components can end up being quite weighty and not really optimised. So to do anything clever like standard deviation, you were basically on your own and had to develop it yourself.”

On an FPGA, AND, OR, and other (simple) arithmetic operations are easy to write. But when you build up mathematical or logical expressions by combining individual components, the design quickly becomes bulky. As written, it isn't (latency-)optimized either. So for anything even slightly sophisticated, like a standard deviation, you basically have to design and optimize the circuit yourself.
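Here is one common hardware-friendly way to get a rolling standard deviation, to illustrate what "develop it yourself" involves:

1. Keep a running sum and a running sum of squares over a window.
2. For each new sample, add it in and subtract the sample falling out of the window.
3. Use n²·Var = nΣx² − (Σx)², so everything up to the final square root stays in exact integer arithmetic.

This is a Python model built on my own assumptions: integer inputs, population standard deviation, and a `deque` standing in for a delay line (shift register). It is not taken from any of the vendors quoted. On a real FPGA the square root would be a separate pipelined core.

```python
import math
from collections import deque


class SlidingStdDev:
    """Population standard deviation over the last `window` integer samples.

    Only a running sum and a running sum of squares are kept and updated per sample,
    which maps to a few adders/subtractors and one multiplier in hardware.
    """

    def __init__(self, window: int):
        self.window = window
        self.buf = deque()  # models a delay line of length `window`
        self.s = 0          # running sum of x
        self.ss = 0         # running sum of x*x

    def push(self, x: int) -> float:
        self.buf.append(x)
        self.s += x
        self.ss += x * x
        if len(self.buf) > self.window:
            old = self.buf.popleft()  # sample leaving the window
            self.s -= old
            self.ss -= old * old
        n = len(self.buf)
        # n^2 * variance, exact in integers: n * sum(x^2) - (sum x)^2
        scaled = n * self.ss - self.s * self.s
        return math.sqrt(scaled) / n


sd = SlidingStdDev(window=8)
for v in [2, 4, 4, 4, 5, 5, 7, 9]:
    last = sd.push(v)
print(last)  # 2.0
```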

But all of this means that there is now a growing demand in the financial markets for programmers and engineers who can work directly in VHDL and Verilog.

“Nobody knows how to write parallel code except the HPC guys,” asserts Keene. “A programmer writing an application to do financial transactions is unlikely to know how to write parallel code. FPGAs are still the realm of the electrical engineer as opposed to the computer science engineer. And those guys have now discovered that what they know is really valuable, so they’re charging an arm and a leg for it.”

There is now growing demand in finance for programmers who can write VHDL and Verilog directly. Outside of HPC engineers, hardly anyone knows how to write massively parallel code. Programmers who have been writing financial trading apps don't know how to write code that exploits FPGA parallelism. FPGAs are still the realm of electrical engineering, not computer science. Having realized how valuable that is, engineers who can design circuits are now selling their skills at a premium.

An FPGA is not a CPU, so as soon as you try anything even slightly complex, you hit a wall. For now, HFT firms seem to be competing over how to get past that wall. High-level synthesis from high-level languages (the topic of the next section) hasn't caught on yet. So digital circuit designers who can grind out HDL, a low-level hardware description language, and optimize it by hand appear to be in very hot demand.

High-level synthesis with C, Haskell, and dataflow languages

One interesting topic in the FPGA world outside finance is high-level synthesis. Instead of laboriously writing low-level HDL, you synthesize the sequential and combinational circuits from ordinary C code. It doesn't seem to have caught on much in HFT, though.

“At one extreme, you give someone a low-level language, you give them all the detail they need, the data sheets, the specifications, an oscilloscope and a layout tool, so that they can build a board and develop it from there. At the other end of the spectrum is the claim that you can take a serial process, written in C for example, and somehow magically turn it into something that can run in a massively parallel way, which is what you need to do to put it on the chip.

One end of the spectrum of FPGA development approaches is the traditional one. You use a low-level language such as an HDL, work out every fine detail yourself, and go as far as building the board, with data sheets, specifications, an oscilloscope, and a layout tool.
At the other end, you write code in a procedural high-level language such as C and somehow extract a high degree of parallelism from it so that it can run on the FPGA (high-level synthesis).

“In the ASIC prototyping world, there have been attempts to get from C to gates for many years. And there have been any number of companies founded and failed on the basis of their C-to-gates technology, because it just doesn’t work out. From our point of view, as FPGA guys, we don’t understand how you can even begin to minimise latency in terms of clock cycles without designing as close down to the actual logic as possible. There are companies claiming you can do it in C or even in MATLAB, but we don’t agree,” he says.

In the ASIC prototyping world, C-to-gates synthesis has been tried for many years, and one C-based high-level synthesis vendor after another has appeared, failed, and disappeared. What FPGAs are needed for today is low latency. Given that, it's hard to see how you could minimize latency in clock cycles without designing as close to the actual logic as possible. We don't agree with vendors who claim you can do it from C or even MATLAB.

In HFT, where optimizing latency and parallelism is everything, using FPGAs with HDL generated from C is apparently out of the question. I don't know much about high-level synthesis, but I find it hard to imagine highly parallel HDL coming out of ordinary C code. It would probably end up full of sequential circuits. That said, I suspect high-level synthesis will become mainstream for general applications on commoditized FPGAs. After all, high-level languages are so much easier to debug...

There is also another approach: doing high-level synthesis from dataflow or functional languages rather than procedural languages like C.

“At Maxeler we take a very different approach. You shouldn't be thinking in terms of sequential C++ code, but instead think about the flow of data through your algorithm, which at the end of the day is all that matters. MaxCompiler does all the heavy lifting for you, like making sure the data is in the right place at the right time, and presents the programmer with a high level abstraction of the dataflow that is easy to conceptualise. Because of this you spend your time designing great algorithms rather than getting your hands dirty with all the messy details.” claims Spooner.

Maxeler takes an approach different from both of these. In the end, what matters is defining the flow of data based on the algorithm you want, and there is no need to implement it in a sequential language like C++. MaxCompiler lets programmers express the dataflow at a high, easy-to-grasp level of abstraction. It also takes care, on their behalf, of getting the data to the right place at the right time. You don't have to write out all the messy details (as you would in HDL), so more of your time goes into designing algorithms.

Being a dataflow geek, I can't help reacting to this kind of thing... Meanwhile, a vendor called Parallel Scientific is building a Haskell FPGA compiler.
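Dataflow tools in this style generally let you describe a computation as streams flowing through operators, and they insert the buffering needed to line the data up. I won't try to reproduce MaxCompiler's actual API here. Instead, here is a minimal Python model of the idea: a three-tap moving sum y[i] = x[i] + x[i-1] + x[i-2]. It is written as one input stream fanned out, two delayed copies, and an adder, with no array indexing. `delayed` is a hypothetical helper that stands in for the shift registers such a tool would generate.

```python
from itertools import chain, repeat, tee
from typing import Iterable, Iterator


def delayed(xs: Iterable[int], k: int, fill: int = 0) -> Iterator[int]:
    """The stream shifted back by k elements (models a k-deep shift register)."""
    return chain(repeat(fill, k), xs)


def moving_sum3(xs: Iterable[int]) -> Iterator[int]:
    """y[i] = x[i] + x[i-1] + x[i-2], written as a flow of streams, not an indexed loop."""
    now, d1, d2 = tee(xs, 3)  # fan the input stream out to three consumers
    # zip stops when the undelayed stream ends, so output length equals input length.
    return (a + b + c for a, b, c in zip(now, delayed(d1, 1), delayed(d2, 2)))


print(list(moving_sum3([1, 2, 3, 4])))  # [1, 3, 6, 9]
```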

How GPGPUs differ from FPGAs

By the way, what about GPGPUs, the FPGA's rival? Aren't they being used?

“GPUs have a stronghold in the parallel processing of graphics, but where they fall down is that they don’t help you with latency at all,” explains Spooner. “While they have a massive amount of parallelism, it’s coarse-grain parallelism, it’s not pipeline parallelism.”
 
“If you’ve got one message, you can only process it in one core. So it doesn’t matter how many different cores you have, it’s only in one of them. Which means that GPUs give you a throughput play rather than a latency play. If you want to reduce latency, you need fine-grain parallelism,” says Spooner. “At Maxeler we use Dataflow engines (DFEs), which provide the same fine grain parallelism, but are ready-to-compute rather than just blank chips.”

Another problem with GPUs is that they tend to be more prone to failures and errors in calculations. So although they can work well for certain parallel computing tasks like Monte Carlo simulations, they are unable to deliver the level of determinism offered by FPGAs.

GPUs are strong at parallel graphics processing but poorly suited to achieving low latency. Their parallelism is massive, but it is coarse-grained, not pipeline-level parallelism.
On a GPU, for example, a single incoming message can only be processed on one core. However many cores you have, that message's latency doesn't improve. In other words, GPUs are designed for throughput rather than latency. If you need low latency, you need finer-grained parallelism. Maxeler provides that fine-grained parallelism with ready-to-compute Dataflow Engines (DFEs), rather than designing it from scratch on blank chips.
Another problem with GPUs is that they are more prone to failures and errors during calculation. So although they work well for parallel computations such as Monte Carlo simulations, they cannot deliver the level of (latency) determinism that FPGAs offer.
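Here is why fine-grain parallelism cuts latency, not just throughput. Consider adding up 150 values, say one per level of the 150-deep book mentioned earlier:

- A single core does one add after another, so the dependency chain is 149 adds long.
- In hardware, all the independent adds at each level of a balanced adder tree can sit side by side, so the chain is only ⌈log₂150⌉ = 8 levels deep.

The Python below just counts those chain lengths. It is a simplified model that ignores clock rates, register placement and routing, and it isn't how any particular vendor does it.

```python
from typing import List, Tuple


def serial_sum(values: List[int]) -> Tuple[int, int]:
    """One core, one add after another: returns (sum, number of dependent adds)."""
    total, steps = values[0], 0
    for v in values[1:]:
        total += v
        steps += 1
    return total, steps


def tree_sum(values: List[int]) -> Tuple[int, int]:
    """Balanced adder tree: the adds within one level are independent and run side by side.
    Returns (sum, number of levels), i.e. the length of the dependency chain."""
    level, depth = list(values), 0
    while len(level) > 1:
        # Pair neighbours; an odd element out passes straight through to the next level.
        level = [sum(level[i:i + 2]) for i in range(0, len(level), 2)]
        depth += 1
    return level[0], depth


book = list(range(150))  # e.g. one quantity per level of a 150-deep book
print(serial_sum(book))  # (11175, 149)
print(tree_sum(book))    # (11175, 8)
```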

Intel's position

The report also includes an interview with Intel. It makes clear that, for now, Intel's stance is: use CPUs, not FPGAs or GPGPUs! (Though they're surely researching it internally...)

FPGA-based feed handlers are very common, but when people want to do more, they end up sending the data to a CPU socket. There were visions at one time about doing a lot of the calculation on the FPGA, but programming FPGAs are fairly challenging. Itâ€™s certainly not as easy as performant code with higher level languages.
 
We found that people started looking at using Cuda as an alternative for the math side of things, but this too can be a more challenging development environment. So, people take applications that were targeted at an FPGA for math, and attempt to run those computations on GPGPUs. But then they were losing the benefit of doing everything on the FPGA card and sending out the data as quickly as though that is possible, because once you leave the card, you incur a latency hit going across the bus.
 
Fragmentation of the code base is a bad thing, and this type of GPU + FPGA + CPU solution makes it worse.

FPGA-based feed handlers are in very wide use, but when people want to do more, they end up sending the data to a CPU socket. There was once a vision of doing much of the calculation on the FPGA, but programming FPGAs is not easy. It is considerably harder than writing performant code in higher-level languages.
Some people turned to CUDA (that is, GPUs) as an alternative for the math, but that too is a challenging development environment. So people took math that had been targeted at an FPGA and tried running it on GPGPUs instead. Then they lose the benefit of doing everything on the FPGA card and sending the data out as quickly as possible, because once data leaves the card, it takes a latency hit crossing the bus.
Fragmenting the code base is bad, and a GPU + FPGA + CPU combination only makes it worse.

The arrival of FPGA-equipped application switches

The hottest topic in the financial HFT FPGA scene right now is FPGA-equipped application switches, specifically the new Arista 7124FX. An application switch is essentially a layer 7 switch. Like a load balancer, it makes switching decisions based on application-layer information such as URIs, query parameters, and cookies.

With the announcement of the 7124FX switch from Arista, it’s now possible to put logic on a switch. Of course, all data needs to cross a switch when it enters or leaves your cabinet, or travels inter-server in a co-lo location. Some applications can really benefit if you can perform actions on the data as it’s making that mandatory switch. Anything that could benefit from data transformation (a good example is normalizing market data, or filtering down a feed) or inspection with/without manipulation (for example risk checking) is a great application for the switch.

A simple example: “Send this buy order when the price of MSFT drops below 27.42”. Imagine if you could program this into the switch: you could go from tick-to-trade in nanoseconds, without even fully entering your cabinet! This is exciting stuff and it will be reality very soon!”

With the arrival of the Arista 7124FX (an FPGA-equipped application switch, announced in 2012), it became possible to put application logic on a switch.
In a co-location facility (data center), the switch is where all data passes on its way into or out of a server or rack. Many applications benefit greatly from being able to process data right there. Normalizing or filtering market data, and inspecting or modifying data contents (risk checks, for example), are ideal uses for this switch.
For instance, you could build logic like "buy when MSFT drops below 27.42" into the switch. Trades could then happen on a nanosecond scale, before the market data even reaches your server rack. Amazing stuff like this will be possible very soon.
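The "MSFT below 27.42" rule is small enough to sketch. The Python below is a behavioral model only. The `Tick`/`Order` types, the order quantity and the fixed-point price encoding are hypothetical; prices are in 1/10000 dollars, so 27.42 becomes 274200. The source doesn't describe the actual 7124FX programming model. In a switch FPGA this would boil down to a symbol comparator and an integer less-than on fields sliced out of the packet.

```python
from typing import Callable, NamedTuple, Optional


class Tick(NamedTuple):
    symbol: str
    price_e4: int  # price in 1/10000 dollars (hypothetical fixed-point encoding)


class Order(NamedTuple):
    side: str
    symbol: str
    qty: int


def make_trigger(symbol: str, below_e4: int, qty: int) -> Callable[[Tick], Optional[Order]]:
    """Fire one BUY order the first time `symbol` trades strictly below the limit."""
    fired = False

    def on_tick(tick: Tick) -> Optional[Order]:
        nonlocal fired
        # A fixed-width integer compare: cheap, constant-time logic in hardware.
        if not fired and tick.symbol == symbol and tick.price_e4 < below_e4:
            fired = True  # one-shot, so a falling price doesn't spray duplicate orders
            return Order("BUY", symbol, qty)
        return None

    return on_tick


trigger = make_trigger("MSFT", below_e4=274_200, qty=100)  # "below 27.42"
for t in [Tick("MSFT", 274_300), Tick("AAPL", 10), Tick("MSFT", 274_100), Tick("MSFT", 274_000)]:
    print(t, trigger(t))
```

Making the trigger one-shot is a design choice. Without it, every later tick below the limit would fire another order.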

FPGA-equipped application switches no longer do boring jobs like load balancing. They buy and sell stocks on their own before the prices even reach a server (what, that's terrifying). A while back I was daydreaming about how fun it would be to have an FPGA server that works like an IP switch, and this Arista FPGA switch turned out to be exactly that product. Well, I guess anyone would think of it. On top of that, Arista was co-founded by Sun's Andy Bechtolsheim. I can't help being impressed by the instincts of hardcore Bay Area investors.

http://www.hftreview.com/pg/blog/arista/read/56155/accelerating-transactions-through-fpgaenabled-switching

Accelerating Transactions Through FPGA-Enabled Switching - Arista Networks's blog - HFT Review

With the goal of accelerating transactions as they pass through the network, it was crucial for us to find a truly in-line solution; rather than simply attaching the FPGA as a client of the switching chipset, the processor sits directly in line with the traffic flow, leveraging both the high functionality of the FPGA and the more traditional network forwarding, multicasting and filtering capabilities that are inherent to the ASIC itself.

The key was not to attach the FPGA to the switch chipset as a coprocessor but to place it at the heart of the network that transactions pass through. Because the FPGA handles the traffic flow directly, you can use its rich functionality on top of the packet forwarding, multicast, and filtering that the ASIC has always been good at.

This switch has 24 10GbE ports, so ~~you can apply application logic on the fly to traffic flowing at 240 Gbps~~ (correction: only 8 of the ports are connected to the FPGA, so that's up to 80 Gbps). What on earth is this thing...
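To get a sense of what "on the fly" implies, here is some back-of-envelope arithmetic for a single 10GbE port. It assumes only standard Ethernet wire overhead: 8 bytes of preamble/SFD plus a 12-byte minimum inter-frame gap per frame. The source doesn't give the FPGA's clock rate or pipeline depth. So this only shows the packet budget the logic has to keep up with, not how the 7124FX meets it.

```python
PREAMBLE_AND_IFG_BYTES = 8 + 12  # preamble+SFD and minimum inter-frame gap on the wire
BITS_PER_NS = 10                 # 10GbE: 10 Gbit/s = 10 bits per nanosecond


def time_per_frame_ns(frame_bytes: int) -> float:
    """Wire time of one Ethernet frame (frame size including FCS) on a 10GbE link."""
    return (frame_bytes + PREAMBLE_AND_IFG_BYTES) * 8 / BITS_PER_NS


def frames_per_second(frame_bytes: int) -> float:
    """Maximum frame rate at line rate for back-to-back frames of this size."""
    return 1e9 / time_per_frame_ns(frame_bytes)


print(time_per_frame_ns(64))         # 67.2 (ns per minimum-size frame)
print(round(frames_per_second(64)))  # 14880952 frames/s per port
print(time_per_frame_ns(1518))       # 1230.4 (ns per maximum standard frame)
```

At minimum frame size, each port can deliver a new frame roughly every 67 ns, about 14.88 million frames per second. Together, the 8 FPGA-attached ports carry up to 80 Gbit/s.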

Once you’ve got the ability to do this processing directly within the network, you can provide services that are leveraged across the pool of servers. Feed-handling (normalisation, translation), line arbitration, and symbol based routing are some of the obvious network-centric applications.

Because the switch handles network traffic directly, it can provide all kinds of services to the pool of servers behind it. Besides feed handling (normalization, translation, etc.), these include symbol-based routing and line arbitration. Line arbitration is not arbitrage: it means merging the redundant A and B copies of a feed, taking each message from whichever line delivers it first.
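Line arbitration is simple to state in code. Exchanges typically publish the same sequenced feed on two redundant lines (A and B). The arbiter forwards each sequence number once, in order, from whichever line delivered it first. The sketch below is a minimal Python model under those assumptions. It uses integer sequence numbers starting at 1 and a hypothetical input format of `(line, seq, payload)` tuples. A real arbiter would also need a timeout and a recovery path for gaps that neither line fills; this one simply keeps waiting.

```python
from typing import Dict, Iterable, List, Tuple

Event = Tuple[str, int, str]  # (line "A"/"B", sequence number, payload)


def arbitrate(events: Iterable[Event], first_seq: int = 1) -> List[Tuple[int, str, str]]:
    """Merge redundant A/B lines: emit each sequence number once, in order,
    taken from whichever line delivered it first."""
    expected = first_seq
    pending: Dict[int, Tuple[str, str]] = {}
    out: List[Tuple[int, str, str]] = []
    for line, seq, payload in events:
        if seq < expected or seq in pending:
            continue  # duplicate: the other line already delivered this one
        pending[seq] = (line, payload)
        # Release everything that is now contiguous.
        while expected in pending:
            won_line, won_payload = pending.pop(expected)
            out.append((expected, won_line, won_payload))
            expected += 1
    return out


events = [("A", 1, "p1"), ("B", 1, "p1"), ("B", 2, "p2"),
          ("A", 3, "p3"), ("A", 2, "p2"), ("B", 3, "p3")]
print(arbitrate(events))  # [(1, 'A', 'p1'), (2, 'B', 'p2'), (3, 'A', 'p3')]
```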

Symbol-based routing is fascinating: when stock price data flows in, the switch looks at the ticker symbol and routes the packet accordingly. As @kibayos pointed out, it's as if overlay networks are starting to replace IP networks. Apply this to MapReduce keys, for example, and you could easily build a switch that shuffles insanely fast, though the receiving side would probably become the bottleneck unless you lined it with lots of SSDs. And it needn't stop at MR: imagine a switch that shuffles and joins on any ID that could serve as a table's primary key. How fun would it be to have rows of those in the cloud!

So that's the state of financial HFT, where the famous phrase "the network is the computer" is becoming reality.
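Symbol-based routing and the MapReduce-style shuffle I'm daydreaming about both come down to "pick the output port from a key in the payload". The source doesn't describe how the real switch does this. The minimal Python sketch below assumes a hypothetical table of pinned symbols and uses a CRC32 hash to spread every other key across the ports.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def port_for(key: str, n_ports: int, pinned: Optional[Dict[str, int]] = None) -> int:
    """Pick an output port from the key alone (ticker symbol, MapReduce key, primary key...).

    Pinned keys win; everything else is spread by a stable hash, so the same key always
    lands on the same port (CRC32 is deterministic, unlike Python's per-process hash()).
    """
    if pinned and key in pinned:
        return pinned[key]
    return zlib.crc32(key.encode("utf-8")) % n_ports


def shuffle(records: Iterable[Tuple[str, int]], n_ports: int,
            pinned: Optional[Dict[str, int]] = None) -> Dict[int, List[Tuple[str, int]]]:
    """Group records by output port, preserving arrival order within each port."""
    out: Dict[int, List[Tuple[str, int]]] = defaultdict(list)
    for key, value in records:
        out[port_for(key, n_ports, pinned)].append((key, value))
    return dict(out)


ticks = [("MSFT", 274_100), ("AAPL", 5_000_000), ("MSFT", 274_000), ("IBM", 1_900_000)]
print(shuffle(ticks, n_ports=8, pinned={"MSFT": 0}))
```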

Archive of スティルハウスの書庫 (Stillhouse's Library)

I've moved "スティルハウスの書庫", which I used to write on Hatena Diary, over here.
