Enfabrica Corp. today emerged from stealth at the MemCon conference to launch a new class of networking processors for cloud computing environments, optimized for moving data between CPUs, graphics processing units (GPUs), accelerators and memory.
Enfabrica CEO Rochan Sankar said the Accelerated Compute Fabric (ACF) enables scalable, streaming, multi-terabit-per-second movement of data in a way that also reduces the total cost of cloud networking. That’s critical as applications infused with artificial intelligence (AI) capabilities generate, and demand access to, ever-larger volumes of data, he added.
At the same time, DevOps workflows will need to evolve as it becomes possible to compose AI fabrics of compute, memory and network resources that scale to tens of thousands of nodes, rather than relying on top-of-rack network switches, network interface cards (NICs), PCIe switches and CPU-controlled DRAM, which create too many bottlenecks when moving large amounts of data.
Enfabrica’s first chip, the Accelerated Compute Fabric Switch (ACF-S), supports standards-based interfaces, including multi-port 800 Gigabit Ethernet networking and high-radix PCIe Gen5 and CXL 2.0+ interfaces. The company claims ACF-S is the first data center silicon product to deliver headless memory scaling to any accelerator: by incorporating CXL memory bridging, it gives a single GPU rack direct, low-latency, uncontended access to local DDR5 DRAM at more than 50 times the capacity of GPU-native high-bandwidth memory (HBM). That approach should enable customers to cut their cost of GPU compute by an estimated 50% for large language model (LLM) inferencing and 75% for deep learning recommendation model (DLRM) inferencing, the company claimed.
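To put that capacity multiple in perspective, here is a minimal back-of-envelope sketch. It assumes a GPU with 80 GB of native HBM, a common figure for current accelerators; the article does not cite a specific GPU model or memory size, so that number is an assumption, not a vendor claim.

```python
# Back-of-envelope sketch of Enfabrica's ">50x" capacity claim.
# Assumption (not from the article): a GPU with 80 GB of native HBM.
HBM_PER_GPU_GB = 80          # assumed native HBM capacity per GPU
CAPACITY_MULTIPLIER = 50     # the ">50x" figure cited by Enfabrica

cxl_attached_dram_gb = HBM_PER_GPU_GB * CAPACITY_MULTIPLIER
print(f"Implied CXL-attached DDR5 per GPU: >{cxl_attached_dram_gb / 1000:.0f} TB "
      f"versus {HBM_PER_GPU_GB} GB of native HBM")
```

Under that assumption, each accelerator could address roughly 4 TB of pooled DDR5 rather than being limited to its on-package HBM, which is the basis for the claimed inferencing cost reductions.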
Simulation tests suggested that ACF-enabled systems could achieve the same target inference performance with only half the number of GPUs and CPU hosts required by the latest GPU servers.
The cost of building AI applications today is prohibitive for many organizations, but as processor advances are made, there may come a time when the cost of building and training AI models drops substantially. Advances in networking processor technologies should, as a result, accelerate the pace at which network operations are integrated into programmable DevOps workflows rather than being left to dedicated networking teams managing top-of-rack switches.
It’s not clear yet whether cloud service providers will embrace ACF, but the chip makes it clear there is a significant opportunity to reduce the amount of networking infrastructure needed to build and deploy AI applications.
Less clear, of course, is to what degree machine learning operations (MLOps) might ultimately be folded into DevOps workflows. While AI models are a unique type of software artifact, they still need to be integrated with application code running in a production environment via application programming interfaces (APIs). One way or another, the total cost of building and deploying applications infused with AI models will need to decline.
In the meantime, the networking bottlenecks that make deploying distributed computing applications difficult today may soon be overcome as latency issues are increasingly addressed.