How FPGAs Are Used in Financial HFT, Where "the Network Is the Computer"
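Lately I have come down with a serious case of FPGA fever, and I spent my winter break tinkering nonstop with a DE0 board. I had been curious about how FPGAs are used in financial HFT (high frequency trading: the market in which major investment banks and the like buy and sell stocks at ultra-high speed, on the order of microseconds, to make money). HFT Review had a well-organized report on the subject, based on interviews with vendors across the HFT industry, so I got carried away and loosely translated the parts I found interesting.
The source articles are here: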
- FPGA & Hardware Accelerated Trading, Part One - Who, What, Where and Why?
- FPGA & Hardware Accelerated Trading, Part Two - Alternative Approaches
- FPGA & Hardware Accelerated Trading, Part Three - Programming
- FPGA & Hardware Accelerated Trading, Part Four - Challenges and Constraints
- FPGA & Hardware Accelerated Trading, Part Five – The View from Intel
- FPGA & Hardware Accelerated Trading, Part Six – What Does The Future Hold?
The Beginnings of FPGA Adoption: Controlling Latency
To begin with, why have FPGAs spread so rapidly in financial HFT over the past two years or so, and what were they first used for? Here is the background and history.
Traditionally, we started out with FPGA on the NIC card a great place for putting logic such as market data feed parsing.
- It originally started with putting application logic on an FPGA mounted on the NIC, to do things like parsing market data feeds.
The first area is around low-latency connectivity for inbound and outbound data, where all network connections are possible candidates for an FPGA-enabled Network Interface Card (NIC) to give that extra latency boost. Such a card might be using the FPGA to run a TCP Offload Engine for example, which can both free up CPU cycles and reduce PCI traffic. Other areas where FPGAs are starting to make a significant impact are market data feed handling, pre-trade risk controls and other processes where firms need to be able to take in data then run calculations or simulations on that data in-line at high speed. Applications are now increasing as people get more comfortable with the technology and firms are looking at pure acceleration of tasks that they would previously have done using CPUs or General Purpose GPUs.
- FPGAæè¼NICã«ããTCPãªããã¼ãã¨ã³ã¸ã³ï¼TCPã¹ã¿ãã¯å¦çãNICå´ã§å®è¡ãCPUãPCIãã¹ã®è² è·ãä¸ããï¼ã®ãããªãããã¯ã¼ã¯æ¥ç¶ã®ä½é 延åããFPGAå©ç¨ã®ã¯ãã¾ãã
- ããã²ã¨ã¤ã®ç¨éã¯ããã¼ã±ãããã¼ã¿ã®ãã£ã¼ããã³ããªã³ã°ããªã¹ã¯åæã«ãããå種æ¼ç®ãã·ãã¥ã¬ã¼ã·ã§ã³ãå®æéã§å®è¡ããç¨éã
“Most of them are doing some variation of ticker plant”, responds Durwood. “Some are just looking at a handful of stocks and absolutely hot-rodding those, others are trying to convert a 150-deep book into automated trading. It’s still at the immature phase where different people are trying different approaches and taking risks.” Taking care of tasks like parsing, filtering, normalisation and session management is where FPGAs can add a real advantage, so market data delivery and distribution is ripe for this technology.
- In most FPGA use cases, firms are building a ticker plant (the infrastructure that distributes market data to investors and to each system in real time and with low latency) or something similar. Some firms monitor a handful of stock prices and "hot-rod" them (??), and others are trying to implement automated trading based on a "150-deep" (??) trading book.
- Parsing, filtering, normalizing, and session management are what FPGAs do best. Market data distribution is the use that suits FPGAs most right now.
Many times, Market Data Managers implement direct feeds in a 1:1 pairing with the most demanding clients. But the explosion of low latency needs makes the 1:1 pairing of feeds and clients untenable. Most enterprises need to simultaneously feed dozens (even hundreds) of servers including: surveillance risk systems, historical tick databases and back up servers. On top of all of this, they continue to experience “bursty markets”, a trend that is expected to continue with the increasing number of automated trading programs. Recently, Market Data managers threw more CPUs at the problem. They constructed multi-threaded programs, and tried to off load the CPUs. But now, there is a type of recapitulation happening in the market. The market is turning to new architectures, such as pure FPGA architectures to re-invent the market data infrastructure. The FPGA architectures of tomorrow offer a highly parallelized approach to processing market data. This enables a deterministic latency infrastructure. Deterministic latency, (keeping the same speeds regardless of data rates, number of venues or distribution points) is the new goal for Market Data Managers. And it gives the Market Data Managers the ability to offer guaranteed service levels to their users. Most importantly, the new FPGA architectures gives the algo trader a dependable ultra-low latency reaction time to changing market signals, under all market conditions.
- Until now, Market Data Managers (the department at an investment bank or brokerage in charge of building the ticker plant?) have provided market data feeds over 1:1 network connections with their major investors. But now that low latency has become essential (thanks to the spread of HFT), 1:1 connections have become hard to sustain. Many investors run risk analysis systems, historical price databases, backup servers and so on, sometimes hundreds of servers, and market data has to be delivered to all of them at the same time. On top of that, the growth of automated trading programs is making market movements increasingly bursty.
- Until very recently, Market Data Managers tried to cope by putting more processors into their servers, for example by spreading the load across CPUs with multi-threaded programming. But now a further evolution is under way: rebuilding the market data distribution infrastructure on a pure FPGA architecture. The high degree of parallelism in FPGAs makes it possible to secure deterministic latency for market data distribution.
- Deterministic latency means being able to guarantee that data is delayed by no more than a fixed amount, regardless of the data rate or the number of distribution points. Because it is a way to guarantee service levels to customers, it has become the most important metric for Market Data Managers today. In addition, FPGAs give traders a way to react with ultra-low latency to constantly changing market signals, whatever state the market is in.
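To make "deterministic latency" a bit more concrete, here is a minimal Python sketch of how one might check a set of measured delivery latencies against a guaranteed bound. The function names, the microsecond unit, and the choice of median, 99th percentile, and max-minus-min jitter are my own assumptions for illustration; the report does not describe how any vendor measures this.

```python
import statistics


def latency_profile(samples_us):
    """Summarize latency samples (assumed to be in microseconds)."""
    ordered = sorted(samples_us)
    # Nearest-rank 99th percentile; a simple convention, not a standard mandated anywhere.
    p99_index = max(0, round(0.99 * len(ordered)) - 1)
    return {
        "median": statistics.median(ordered),
        "p99": ordered[p99_index],
        # Jitter here is simply worst case minus best case.
        "jitter": ordered[-1] - ordered[0],
    }


def meets_service_level(samples_us, bound_us):
    """A deterministic system keeps even its worst sample under the bound."""
    return max(samples_us) <= bound_us
```

The point of the sketch is that a service-level guarantee is about the worst sample, not the average: a system with a great median but an occasional spike of a few milliseconds fails `meets_service_level`.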
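As this shows, the motivation for adopting FPGAs in HFT is quite unusual. When you read about this field, figures on the order of microseconds or a few hundred nanoseconds come up all the time; this is already the world of hard real-time control. At that point it is hard to predict latency reliably with conventional CPUs and the general-purpose operating systems running on them. "The market moved violently for an instant, OS load went up, and the order went out a few ms late..." is not an acceptable excuse. In fact, NYSE was hit with a fine of 400 million yen over delays of a few ms in market data delivery.
Since FPGA technology has been honed in such a specialized environment, I doubt it will spread as-is to general uses (say, the Web or big data) any time soon. In the long run, though, the commoditization of FPGAs is steadily lowering the barrier to adopting them for general purposes, and I suspect they will end up being applied selectively to critical paths.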
The Current State of FPGA Use in Financial HFT
Next, the challenges HFT firms currently face in using FPGAs.
“There are maybe 25-30 end-user firms around the world that are fully capable of developing complete high performance applications in FPGA today”
- There are about 25 to 30 end-user firms worldwide that can fully develop high-performance (financial HFT) applications on FPGAs.
Working with order and trade data brings additional complexities in development and testing however, because the order can go through so many state transitions. Despite such complexities, FPGAs are now being used in FIX engines, rules engines, and even full-blown execution management systems. Ferdinando La Posta, Co-Founder of trading solutions vendor GATELab, explains his firm’s approach.
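- Processing order and trade data means handling a large number of state transitions, which makes development and testing more complex.
- Despite these difficulties, FPGAs are starting to be used for FIX protocol engines (FIX is the standard protocol for financial trading), rules engines, and even the implementation of entire execution management systems.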
“Firms realise they can buy FPGA-based solutions from vendors who have done the straightforward stuff like TCP offload, FIX/FAST translation, feed handling and so on,” says Keene. “But HFT firms also realise that the only way to gain an advantage now is to come up with better, more clever, more efficient, more productive and more profitable algorithms. That’s going to be their differentiator, but not many HFT firms have had much success putting their own trading algorithms onto FPGAs.
- FPGA solutions for the straightforward tasks, such as TCP offload, FIX/FAST protocol translation, and feed handling, can now be bought off the shelf from vendors.
- The focus of FPGA use today is whether smarter, more efficient, more productive, and more profitable algorithms can be implemented on FPGAs. That is the differentiator, but not many firms have succeeded in putting their own trading algorithms onto FPGAs.
Yes, you had ANDs, ORs and other arithmetic operators, but trying to build up logical or algorithmic expressions using individual components can end up being quite weighty and not really optimised. So to do anything clever like standard deviation, you were basically on your own and had to develop it yourself.”
- On an FPGA, operations such as AND, OR, and other (simple) arithmetic are easy to describe. But once you try to build formulas or logical expressions by combining individual components, the design quickly grows large and is not optimized (for latency) as it stands. So to perform a slightly more advanced operation such as standard deviation, you basically have to build and optimize the whole circuit yourself.
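To get a feel for what "building standard deviation out of basic operators" means, here is a small software sketch that computes a running standard deviation using only add, subtract, multiply, divide, and one final square root. It uses Welford's online algorithm, which is my own choice for illustration; the report does not say which formulation anyone actually put on an FPGA, and the class and method names are hypothetical.

```python
import math


class RunningStdDev:
    """Streaming standard deviation (Welford's algorithm), one value at a time."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        # Each update is a handful of primitive arithmetic operations,
        # the kind of thing that would have to be laid out by hand in hardware.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    def population_stddev(self):
        if self.count == 0:
            return 0.0
        return math.sqrt(self.m2 / self.count)
```

Even this tiny routine needs a divider and a square root, neither of which comes for free in logic, which is roughly the complaint in the quote above.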
But all of this means that there is now a growing demand in the financial markets for programmers and engineers who can work directly in VHDL and Verilog. “Nobody knows how to write parallel code except the HPC guys,” asserts Keene. “A programmer writing an application to do financial transactions is unlikely to know how to write parallel code. FPGAs are still the realm of the electrical engineer as opposed to the computer science engineer. And those guys have now discovered that what they know is really valuable, so they’re charging an arm and a leg for it.”
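- In finance, demand for programmers who can write VHDL or Verilog directly will keep growing. Unless you are an engineer from the HPC field, hardly anyone knows how to write massively parallel code. Programmers who write financial trading applications do not know how to write code that draws out the parallelism of an FPGA. FPGAs are still the world of electronic circuit design, not the world of computer science. And now that they know their skills are valuable, (the engineers who can write circuits) are selling them at a high price.
An FPGA is not a CPU, so as soon as you try to do something a little complicated you hit a wall. Right now HFT firms appear to be competing over how to get past that wall. Also, high-level synthesis from high-level languages (covered next) has not caught on yet, so digital circuit designers who can grind out and optimize HDL, the low-level hardware description languages, seem to be in very high demand.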
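High-Level Synthesis with C, Haskell, and Dataflow Languages
Outside of finance, one of the interesting topics in the FPGA world is high-level synthesis: instead of plodding through low-level HDL, you synthesize the HDL's sequential and combinational circuits from code written normally in C. In HFT, however, it does not seem to have caught on much.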
“At one extreme, you give someone a low-level language, you give them all the detail they need, the data sheets, the specifications, an oscilloscope and a layout tool, so that they can build a board and develop it from there. At the other end of the spectrum is the claim that you can take a serial process, written in C for example, and somehow magically turn it into something that can run in a massively parallel way, which is what you need to do to put it on the chip.
- (One approach to FPGA development is the traditional one:) use a low-level language (such as HDL), work out the fine details, and go all the way to developing the board with data sheets, specifications, an oscilloscope, and layout tools.
- The other approach is to take a procedural high-level language such as C and somehow draw a high degree of parallelism out of it so that it runs on an FPGA (high-level synthesis from a high-level language).
“In the ASIC prototyping world, there have been attempts to get from C to gates for many years. And there have been any number of companies founded and failed on the basis of their C-to-gates technology, because it just doesn’t work out. From our point of view, as FPGA guys, we don’t understand how you can even begin to minimise latency in terms of clock cycles without designing as close down to the actual logic as possible. There are companies claiming you can do it in C or even in MATLAB, but we don’t agree,” he says.
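- In the ASIC prototyping world, ways of synthesizing from C down to the gate level have been studied for many years. Any number of C-based high-level synthesis vendors have appeared, failed to make it work, and disappeared. Given that what FPGAs are wanted for today is low latency, it is hard to imagine optimizing latency without designing at the gate level. We cannot agree with claims that code written in C or MATLAB can be dropped onto an FPGA.
In HFT, where optimizing latency and parallelism is everything, using an FPGA through HDL generated from C is apparently out of the question. I do not know much about high-level synthesis, but it is hard to imagine that writing ordinary C yields highly parallelized HDL. It probably ends up being mostly sequential circuits. That said, for general-purpose uses of commoditized FPGAs, I think high-level synthesis will become the mainstream. After all, high-level languages are easy to debug.
On the other hand, there is also the approach of doing high-level synthesis from a dataflow language or a functional language rather than a procedural language like C.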
“At Maxeler we take a very different approach. You shouldn't be thinking in terms of sequential C++ code, but instead think about the flow of data through your algorithm, which at the end of the day is all that matters. MaxCompiler does all the heavy lifting for you, like making sure the data is in the right place at the right time, and presents the programmer with a high level abstraction of the dataflow that is easy to conceptualise. Because of this you spend your time designing great algorithms rather than getting your hands dirty with all the messy details.” claims Spooner.
- Maxeler takes a different approach. In the end, what matters is defining the flow of data according to the algorithm you want; you do not necessarily have to implement it in a procedural language like C++. MaxCompiler provides an environment in which the programmer can express an easy-to-understand, highly abstract dataflow, and it takes care of getting the data to the right place at the right time on the programmer's behalf. There is no need to write out the fiddly details (as you would in HDL), so more time can go into designing the algorithm.
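As a rough feel for "thinking in terms of the flow of data", here is a small sketch in plain Python generators: each stage consumes a stream and produces a stream, and the algorithm is just how the stages are wired together. This is only an analogy in software. It is not MaxCompiler's API, and the stage names, the `(symbol, price)` tick format, and the moving-average example are all hypothetical.

```python
from collections import deque


def prices(ticks, symbol):
    """Stage 1: filter a stream of (symbol, price) ticks down to one symbol."""
    for tick_symbol, price in ticks:
        if tick_symbol == symbol:
            yield price


def moving_average(stream, window):
    """Stage 2: emit the average of the last `window` values once the window is full."""
    recent = deque(maxlen=window)
    for value in stream:
        recent.append(value)
        if len(recent) == window:
            yield sum(recent) / window


def build_pipeline(ticks, symbol, window):
    """Wire the stages together; data flows through as it arrives."""
    return moving_average(prices(ticks, symbol), window)
```

For example, `list(build_pipeline([("MSFT", 27.0), ("AAPL", 500.0), ("MSFT", 28.0), ("MSFT", 29.0)], "MSFT", 2))` gives `[27.5, 28.5]`. In a real dataflow engine each stage would be its own piece of hardware running concurrently, which generators do not capture.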
Dataflow nerd that I am, I cannot help reacting to this sort of thing. Meanwhile, a vendor called Parallel Scientific is building a Haskell FPGA compiler.
The Difference Between GPGPU and FPGA
By the way, what about GPGPU, the FPGA's rival? Is it not being used?
“GPUs have a stronghold in the parallel processing of graphics, but where they fall down is that they don’t help you with latency at all,” explains Spooner. “While they have a massive amount of parallelism, it’s coarse-grain parallelism, it’s not pipeline parallelism.” “If you’ve got one message, you can only process it in one core. So it doesn’t matter how many different cores you have, it’s only in one of them. Which means that GPUs give you a throughput play rather than a latency play. If you want to reduce latency, you need fine-grain parallelism,” says Spooner. “At Maxeler we use Dataflow engines (DFEs), which provide the same fine grain parallelism, but are ready-to-compute rather than just blank chips.” Another problem with GPUs is that they tend to be more prone to failures and errors in calculations. So although they can work well for certain parallel computing tasks like Monte Carlo simulations, they are unable to deliver the level of determinism offered by FPGAs.
- GPUs are strong at parallel processing of graphics, but they are not suited to achieving low latency. Their degree of parallelism is high, but it is coarse-grained parallelism, not pipeline-level parallelism.
- For example, when a GPU receives a single message, it can only process it on one core. It does not matter how many cores there are. In other words, GPUs are designed for throughput rather than latency. If you need low latency, you need finer-grained parallelism. Maxeler achieves this with Dataflow Engines (DFEs), which are ready to compute, rather than by designing that fine-grained parallelism from scratch.
- Another problem with GPUs is that they are prone to errors and failures during computation. So although they suit parallel computations like Monte Carlo simulation, they cannot deliver the level of (latency) determinism that FPGAs offer.
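The "throughput play versus latency play" distinction can be put into a toy model. The sketch below assumes each message needs a fixed amount of work, that coarse-grained cores each handle one whole message, and that fine-grained hardware can split the work of a single message perfectly across its units. Those are idealized assumptions of mine (real work never divides this cleanly), and the function names are hypothetical.

```python
def coarse_grain(work_per_message, cores):
    """Each core processes one whole message; more cores means more messages in flight."""
    latency = work_per_message            # a single message is no faster
    throughput = cores / work_per_message  # messages completed per unit time
    return latency, throughput


def fine_grain(work_per_message, parallel_units):
    """The work inside one message is split across units, one message at a time."""
    latency = work_per_message / parallel_units
    throughput = 1 / latency
    return latency, throughput
```

With 8 units of work and 4 cores or units, both models finish 0.5 messages per unit time, but the coarse-grained one takes 8 time units per message while the fine-grained one takes 2. Same throughput, very different latency, which is the point Spooner is making.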
Intel's Position
The report also includes an interview with Intel. It makes clear that, for now, Intel's stance is "use CPUs, not FPGAs or GPGPUs!" (though I assume they are researching them internally...).
FPGA-based feed handlers are very common, but when people want to do more, they end up sending the data to a CPU socket. There were visions at one time about doing a lot of the calculation on the FPGA, but programming FPGAs are fairly challenging. It’s certainly not as easy as performant code with higher level languages. We found that people started looking at using Cuda as an alternative for the math side of things, but this too can be a more challenging development environment. So, people take applications that were targeted at an FPGA for math, and attempt to run those computations on GPGPUs. But then they were losing the benefit of doing everything on the FPGA card and sending out the data as quickly as though that is possible, because once you leave the card, you incur a latency hit going across the bus. Fragmentation of the code base is a bad thing, and this type of GPU + FPGA + CPU solution makes it worse.
- FPGA-based feed handlers are very widely used, but when you try to do more than that, you end up sending the data to the CPU. There was once a vision of doing all sorts of calculation on the FPGA, but programming FPGAs is not easy. It is harder than writing productive code in a high-level language.
- There were also attempts to use CUDA as an alternative for the numerical computation, but that development environment has its difficulties too. So some people try to run numerical applications written for FPGAs on GPGPUs instead, but then the benefit of processing everything on the FPGA card is lost, and exchanging data with anything off the card incurs latency.
- A solution that combines GPU + FPGA + CPU is a bad one because it fragments the code base.
The Arrival of FPGA-Equipped Application Switches
The hottest topic in the financial HFT corner of the FPGA world right now is the arrival of FPGA-equipped application switches, specifically the Arista 7124FX. An application switch is a layer 7 switch: a network device that, like a load balancer, looks at application-layer information (URIs, query parameters, cookies and so on) and switches on it.
With the announcement of the 7124FX switch from Arista, it’s now possible to put logic on a switch. Of course, all data needs to cross a switch when it enters or leaves your cabinet, or travels inter-server in a co-lo location. Some applications can really benefit if you can perform actions on the data as it’s making that mandatory switch. Anything that could benefit from data transformation (a good example is normalizing market data, or filtering down a feed) or inspection with/without manipulation (for example risk checking) is a great application for the switch. A simple example: “Send this buy order when the price of MSFT drops below 27.42”. Imagine if you could program this into the switch: you could go from tick-to-trade in nanoseconds, without even fully entering your cabinet! This is exciting stuff and it will be reality very soon!”
- With the arrival (in 2012) of the Arista 7124FX (an FPGA-equipped application switch), it became possible to put application logic on a switch.
- A switch is the place that all data passes through on its way in or out of the servers and racks in a colocation facility (data center). Many uses stand to benefit greatly from being able to run processing right there. Normalizing and filtering market data, and inspecting or manipulating data contents (risk checks, for example), are ideal applications for this switch.
- For example, if you build logic such as "buy when MSFT drops below 27.42" into the switch, you can trade within nanoseconds, before the market data even reaches the server rack. Something this amazing is about to become possible.
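The MSFT rule in the quote is simple enough to write down. Here is a minimal software sketch of that kind of trigger, evaluated on every tick as it passes by. The `Trigger` type, the `on_tick` function, and the order string are hypothetical names for illustration; on the actual switch this logic would live in the FPGA, and nothing here reflects Arista's programming interface.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Trigger:
    symbol: str
    threshold: float  # a real system would likely use fixed-point prices, not floats
    order: str        # the pre-built order to send when the rule fires


def on_tick(trigger: Trigger, symbol: str, price: float) -> Optional[str]:
    """Return the order to send if this tick fires the trigger, else None."""
    # "Drops below" is read as strictly less than the threshold.
    if symbol == trigger.symbol and price < trigger.threshold:
        return trigger.order
    return None
```

With `Trigger("MSFT", 27.42, "BUY MSFT")`, a tick of MSFT at 27.41 returns the order, while MSFT at exactly 27.42 or any other symbol returns `None`. The whole decision is one comparison per tick, which is why it is such a natural fit for in-line hardware.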
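An application switch with an FPGA on board is no longer doing anything as boring as load balancing. It is buying and selling stocks on its own before the prices even reach the server! That is scary. A while back I was daydreaming that an FPGA server that works like an IP switch would be fun, and this Arista FPGA switch is exactly that product. Well, I suppose anyone would think of it. Arista was founded by Sun's Andy Bechtolsheim, and I am impressed all over again by how far ahead the hardcore Bay Area investors can see.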
Accelerating Transactions Through FPGA-Enabled Switching - Arista Networks's blog - HFT Review via kwout
This FPGA switch is explained in a little more detail in another interview article on HFT Review, which is also really interesting.
With the goal of accelerating transactions as they pass through the network, it was crucial for us to find a truly in-line solution; rather than simply attaching the FPGA as a client of the switching chipset, the processor sits directly in line with the traffic flow, leveraging both the high functionality of the FPGA and the more traditional network forwarding, multicasting and filtering capabilities that are inherent to the ASIC itself.
- Rather than bolting the FPGA on as an appendage of the switch chipset, the important thing is to place it at the heart of the network that the transactions pass through. Because the FPGA handles the traffic flow directly, you can exploit the FPGA's rich functionality in addition to the packet forwarding, multicast, and filtering that the ASIC has always been good at.
This switch has 24 ports of 10GbE, so it can apply application logic on the fly to traffic flowing at 240Gbps (correction: only 8 of the ports are connected to the FPGA). Wow. What on earth is this thing?
Once you’ve got the ability to do this processing directly within the network, you can provide services that are leveraged across the pool of servers. Feed-handling (normalisation, translation), line arbitration, and symbol based routing are some of the obvious network-centric applications.
- Being able to handle network traffic directly lets you provide a variety of services to a pool of servers. Besides feed handling (normalization, translation and so on), line arbitration (does this mean arbitrage? more likely it means choosing between redundant feed lines) and symbol-based routing are possible.
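Symbol-based routing, meaning that when stock price data comes in the switch looks at the company's ticker code and routes the packet accordingly, is a fun idea. As @kibayos also pointed out, this looks like overlay networks starting to replace IP networks. If you used the MapReduce key in place of the symbol, for example, you could easily build a switch that does shuffling at ferocious speed (though the receiving side would become a bottleneck unless you lined up a lot of SSDs). And not just for MR: a switch that shuffles and joins on something like a user ID that serves as a table's primary key... it would be fun to see rows of those in the cloud!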
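The routing and shuffle idea boils down to "hash the key, pick a port". Here is a small software sketch of that, using CRC32 as a stand-in hash. The function names, the `(symbol, price)` tick format, and the use of CRC32 are my own assumptions; the article does not say how the switch actually maps symbols to ports.

```python
import zlib


def port_for_symbol(symbol, num_ports):
    """Deterministically map a ticker symbol (or any ASCII key) to an output port."""
    return zlib.crc32(symbol.encode("ascii")) % num_ports


def route(ticks, num_ports):
    """Distribute (symbol, price) ticks into per-port queues, like a shuffle by key."""
    ports = [[] for _ in range(num_ports)]
    for symbol, price in ticks:
        ports[port_for_symbol(symbol, num_ports)].append((symbol, price))
    return ports
```

Because the mapping depends only on the key, every tick for the same symbol lands on the same port, which is the same property a MapReduce shuffle or a join on user ID relies on.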
And so that is the current state of financial HFT, where the famous line "the network is the computer" seems to be turning into reality.