At the QCon London 2024 conference, Félix GV from LinkedIn discussed the AI/ML platform powering the company’s products. He specifically delved into VeniceDB, the NoSQL data store used for feature persistence. The presenter shared the lessons learned from evolving and operating the platform, including cluster management and library versioning.
LinkedIn offers many AI/ML-driven features, including People You May Know, all powered by the company’s AI/ML platform. The platform supports feature ingestion, generation, and storage, as well as model training, validation, and inference. It focuses on enhancing the productivity of data scientists and engineers by providing opinionated, unified end-to-end capabilities for developing, experimenting with, and operating AI/ML workloads.
Félix GV, a principal staff engineer at LinkedIn, provided an overview of the AI/ML platform’s architecture and the key technologies used by its various subsystems. Frame is a virtual feature store that supports several storage backends: offline (Iceberg table), streaming (Kafka topic), and online (Venice store, Pinot table). LinkedIn open-sourced much of Frame's functionality as the Feathr project, which recently published its 1.0 release.
LinkedIn’s AI/ML Platform Architecture (Source: QCon London Website)
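The "virtual" aspect of Frame means that a single logical feature definition can be resolved against whichever physical backend fits the stage of the workflow: offline tables for training-data generation, streams for near-real-time features, and an online store for serving. The sketch below illustrates that routing idea only; the type and method names are hypothetical and are not Frame's or Feathr's actual API.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;

/** Hypothetical abstraction over the physical stores a virtual feature store can delegate to. */
interface FeatureBackend {
    CompletableFuture<Map<String, Float>> getFeatures(String featureGroup, String entityId);
}

/** A "virtual" feature store routes the same logical feature definition to different backends. */
class VirtualFeatureStore {
    private final FeatureBackend offline;   // e.g. Iceberg tables, used for training-data generation
    private final FeatureBackend streaming; // e.g. Kafka topics, used for near-real-time features
    private final FeatureBackend online;    // e.g. a Venice store or Pinot table, used at inference time

    VirtualFeatureStore(FeatureBackend offline, FeatureBackend streaming, FeatureBackend online) {
        this.offline = offline;
        this.streaming = streaming;
        this.online = online;
    }

    /** Training pipelines resolve features against the offline backend. */
    CompletableFuture<Map<String, Float>> forTraining(String group, String entityId) {
        return offline.getFeatures(group, entityId);
    }

    /** Online inference resolves the same logical features against the low-latency backend. */
    CompletableFuture<Map<String, Float>> forServing(String group, String entityId) {
        return online.getFeatures(group, entityId);
    }
}
```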
The AI/ML platform uses the FedEx subsystem for feature "productionization", including feature preparation/transformation and pushing feature data to VeniceDB for serving. King Kong is used for model training, while Model Cloud handles model serving and provides observability and monitoring, benchmarking and analysis, GPU support, as well as self-service onboarding.
GV discussed the role and evolution of VeniceDB as a derived-data platform created specifically to support online storage for AI/ML use cases. The project was open-sourced in September 2022 and has received 800 new commits since then. Venice supports dataset versioning, which allows large datasets to be pushed from offline sources and enables seamless switching between dataset versions once the ingestion job completes.
Data Ingestion into VeniceDB (Source: QCon London Website)
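Conceptually, each batch push fills a new store version in the background while reads continue against the current version; only when ingestion completes are readers switched over, and old versions can then be retired. The following is a minimal, hypothetical sketch of that version-swap idea, not Venice's actual implementation.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Conceptual illustration of versioned-dataset serving: a full push writes into a new
 * version while reads keep hitting the current one, and the pointer is swapped only
 * after ingestion finishes. Illustrative only; not Venice's actual code.
 */
class VersionedStore {
    private final Map<Integer, Map<String, byte[]>> versions = new ConcurrentHashMap<>();
    private final AtomicInteger currentVersion = new AtomicInteger(0);

    /** Reads always go to the currently-serving version. */
    byte[] get(String key) {
        return versions.getOrDefault(currentVersion.get(), Map.of()).get(key);
    }

    /** A batch push ingests a complete new dataset into a fresh, not-yet-serving version. */
    int startPush() {
        int next = currentVersion.get() + 1;
        versions.put(next, new ConcurrentHashMap<>());
        return next;
    }

    void write(int version, String key, byte[] value) {
        versions.get(version).put(key, value);
    }

    /** Once the ingestion job completes, readers switch atomically to the new version. */
    void swap(int version) {
        currentVersion.set(version);
        versions.keySet().removeIf(v -> v < version); // retire older versions
    }
}
```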
VeniceDB has evolved to reduce read latency from below ten milliseconds (p99) with the default thin client library to below ten microseconds (p99) with the RAM-backed Da Vinci client library, which consumes updates directly from the data-ingestion Kafka topic. The VeniceDB team maintains strong backward compatibility across the three client libraries it supports, allowing users to easily migrate between them when they want to benefit from lower-latency reads.
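For reference, a remote read through the thin client looks roughly like the snippet below, based on the client API shown in Venice's open-source documentation. The router URL and store name are placeholders, and the exact class and configuration names should be verified against the Venice version in use; the Da Vinci client exposes a similar key-value API but serves reads from local RAM instead of a remote call.

```java
import java.util.concurrent.CompletableFuture;
import com.linkedin.venice.client.store.AvroGenericStoreClient;
import com.linkedin.venice.client.store.ClientConfig;
import com.linkedin.venice.client.store.ClientFactory;

public class ThinClientExample {
    public static void main(String[] args) throws Exception {
        // Placeholder router URL and store name; check the Venice docs for the
        // ClientConfig options available in the version you run.
        String routerUrl = "http://venice-router:7777";
        try (AvroGenericStoreClient<String, Object> client =
                 ClientFactory.getAndStartGenericAvroClient(
                     ClientConfig.defaultGenericClientConfig("my-feature-store")
                         .setVeniceURL(routerUrl))) {
            // A single remote lookup; this is the millisecond-level path described above.
            CompletableFuture<Object> future = client.get("member:12345");
            Object featureRecord = future.get();
            System.out.println(featureRecord);
        }
    }
}
```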
GV shared challenges and lessons learned by the team operating VeniceDB at LinkedIn. The data platform team strongly emphasizes keeping control over the infrastructure layout and cluster assignments, as this allows it to manage the clusters without inconveniencing client teams. This is especially important because workloads have different bounds, with either storage or traffic being the limiting factor.
The choice of compression algorithm plays an essential role in storage-bound workloads, and the team observed that using ZSTD offers significantly better results in most cases. Similarly, for storing embeddings, the serialization protocol can massively reduce memory and compute utilization, and the team achieved substantial improvements by using its own optimized Avro utilities.
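As a rough illustration of these two levers, the sketch below serializes a hypothetical embedding record with stock Apache Avro and compresses it with the zstd-jni bindings. LinkedIn's own optimized Avro utilities are not shown, the record layout is made up for the example, and on a toy payload like this the compression gain is negligible compared to what large pushed datasets see.

```java
import java.io.ByteArrayOutputStream;
import java.util.List;
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.EncoderFactory;
import com.github.luben.zstd.Zstd;

public class CompressionSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical embedding record: a member id plus a small float vector.
        Schema schema = SchemaBuilder.record("Embedding").fields()
            .requiredLong("memberId")
            .name("vector").type().array().items().floatType().noDefault()
            .endRecord();

        GenericRecord record = new GenericData.Record(schema);
        record.put("memberId", 12345L);
        record.put("vector", List.of(0.12f, -0.48f, 0.91f, 0.05f));

        // Avro binary encoding keeps the payload compact before compression.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(schema).write(record, encoder);
        encoder.flush();
        byte[] serialized = out.toByteArray();

        // ZSTD compression; level 3 is an illustrative default, real settings depend on the workload.
        byte[] compressed = Zstd.compress(serialized, 3);
        byte[] restored = Zstd.decompress(compressed, serialized.length);

        System.out.printf("serialized=%d bytes, compressed=%d bytes, restored=%d bytes%n",
            serialized.length, compressed.length, restored.length);
    }
}
```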
Lastly, GV remarked on the importance of effective client-library version management. The team has adopted an aggressive policy for deprecating old versions, with automated promotion and application of dependency upgrades.
InfoQ spoke with Félix GV following his presentation:
InfoQ: LinkedIn has long used machine learning, but new AI techniques are now gaining popularity. How do you see the future of the AI/ML platform at LinkedIn and in the industry overall?
Félix GV: Although I did not get a chance to cover it in-depth in the talk, stream-processed AI is a rapidly growing category of workloads. We have several hundred such datasets in Venice and the pace of adoption continues to accelerate. These workloads bring in new challenges, such as requiring tighter thresholds of freshness, and we are working on making the system more deterministic along this dimension, even while bursty batch pushes keep coming in.
InfoQ: What technical challenges do you expect to face as more advanced AI/ML technologies take hold? Is there any work happening at LinkedIn to prepare for GenAI/LLMs/etc.?
Félix GV: There are a variety of new AI-specific challenges. For example, GPUs are expensive, in demand, and in short supply, so we need to make good use of those we have. That is one of the motivations for building the Model Cloud service which I’ve presented during the talk: so that we may increase GPU utilization and thus serve more AI workloads out of the same amount of resources. As far as LLMs are concerned, LinkedIn is already using them in production. For example, a job seeker coach functionality was released to premium members last fall. Stay tuned for more!