Case Study: How Lacework Scaled Data Streaming with Redpanda
Lacework, a cloud security services company, offers a data-driven platform for application security at scale in the cloud. The Lacework platform provides security throughout the application life cycle by collecting and analyzing data from customer environments, and informing customers of potential issues quickly.
“When we say our platform secures applications in the cloud throughout the entire life cycle, we really mean the full life cycle,” says Chip Turner, engineering director at Lacework. “Every aspect of the Lacework platform depends on real-time data flow to protect our customers. From the start, we analyze the code that builds infrastructure, catching violations and providing customers with actionable guidance. During build, we integrate with CI/CD pipelines and registries to scan containers and host images, and block vulnerable, risky images before production. And in production, we continuously monitor our customers’ entire cloud infrastructure for unusual and malicious activity that could indicate an attack or issue that should be remediated quickly.”
To provide this end-to-end protection across the application life cycle, Lacework collects and processes data from a wide variety of sources with diverse formats, schemas, and fidelity. These include logs and monitoring data from hundreds of thousands of Lacework agents running in its customers’ cloud infrastructure.
As Lacework released new capabilities, acquired more customers in a growing number of markets, and expanded to new clouds like Google Cloud, it needed a more scalable, reliable, and efficient solution than the proprietary streaming data solution it had initially used to build its application protection platform. Critically, Lacework needed to deliver high performance for its rapidly growing 1 GBps+ streaming data workloads, without incurring high costs.
This is the story of how Lacework came to rely on Redpanda, the Apache Kafka-compatible streaming data platform for developers, as a hub of data exchange between its different analytical systems and microservices, trusting it to help deliver security insights back to Lacework’s customers.
Finding a New Streaming Data Solution
In late 2021, as the Lacework team considered its options for a new, cloud-agnostic streaming data solution, the most critical factor was finding a platform that could handle high-throughput peak loads with predictable latency while meeting the organization’s stringent service-level objectives (SLOs) and availability requirements. These internal KPIs are necessary to ensure security efficacy for Lacework’s customers.
“The data volumes we collect from customers are highly variable by the hour; in fact, 10x spikes are quite common,” said Turner. “And we must isolate each customer’s data to ensure timely collection of data and insights.”
Along with those challenges, Turner added, “our ingestion pipeline is expected to rapidly deliver data to multiple destinations such as the data warehouse, long-term storage, and downstream pipelines. So our streaming solution has to optimize for both throughput and latency.”
After running four proofs of concept with different streaming solutions, some well-known and others newer to market, the Lacework team found that only Redpanda excelled in all the necessary dimensions of scalability, reliability, and efficiency.
“We conducted a thorough evaluation of streaming data options and Redpanda was the clear winner,” said Turner. “In fact, during our benchmarking, we hit the limitations of our existing benchmarking tool, but Redpanda was barely breaking a sweat.
“Now, Lacework can easily run 14.5 GB per second of data through Redpanda at peak loads. We’re really seeing the benefit of Redpanda’s performance architecture, the fact that it’s written in C++ and does intelligent memory handling. It makes a huge difference.”
The Lacework team was relatively new to the Kafka APIs. Fortunately, Redpanda offered a much simpler deployment: a single binary with no external dependencies such as the JVM or Apache ZooKeeper.
“While our engineering teams were ramping on the Kafka APIs, we knew the operational challenges of building core infrastructure on top of Java-based dependencies and legacy technologies would add unnecessary complexity and risk,” said Turner. “Redpanda was clearly designed both with modern developers and modern platform teams in mind. The single binary has everything you need to get to production, fast.”
Additionally, Lacework is a multicloud organization and needed a solution with flexible deployment options that wouldn’t lock it into a single vendor or cloud provider. Having a cloud-agnostic streaming data platform was critical for Lacework’s future growth.
Building the Streaming ‘Pipe Dream’ with Redpanda
As its first use case for the new streaming data platform, the Lacework team built its production data ingestion stack around Redpanda, an architecture the team members affectionately dubbed “Pipe Dream,” because it “looked too good to be true.”
In the new architecture, multiple Redpanda clusters provide high availability and disaster recovery; if a cluster goes down, Lacework can seamlessly redirect traffic from the faulty cluster to a healthy one.
A pipeline configuration service manages the Redpanda topic life cycle — creating and migrating topics, scaling up and down partition counts, setting appropriate replication factors and balancing topics across Redpanda clusters. This service also describes the overall topology for pipelines, from producers to consumers.
In turn, a pipeline scheduler leases pipelines (topics/partitions) to physical workers, ensuring topics are efficiently consumed into databases, sinks such as Amazon Simple Storage Service (S3) for long-term storage, or Redpanda topics as the start of other pipelines.
The new Lacework system is highly efficient; it estimates the resource requirements to process each partition, schedules tasks to the appropriate worker to maximize utilization (such as packing multiple small topics together onto a single node or reserving larger nodes for demanding high-throughput tasks), and reallocates pipelines to alternative workers if a worker is unhealthy.
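The packing and reallocation behavior described above can be sketched in a few lines of Python. This is a minimal illustration only, not Lacework’s scheduler: the Worker class, the schedule and reallocate functions, the abstract capacity units, and the choice of a best-fit-decreasing heuristic are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Worker:
    """Hypothetical worker node; capacity is in arbitrary load units."""
    name: str
    capacity: float
    healthy: bool = True
    leases: dict = field(default_factory=dict)  # partition id -> estimated load

    @property
    def free(self) -> float:
        return self.capacity - sum(self.leases.values())


def schedule(partitions: dict, workers: list) -> dict:
    """Lease partitions to healthy workers, largest estimated load first.

    `partitions` maps a partition id to its estimated load.
    Returns a mapping of partition id -> worker name.
    """
    assignment = {}
    by_load = sorted(partitions.items(), key=lambda kv: kv[1], reverse=True)
    for pid, load in by_load:
        candidates = [w for w in workers if w.healthy and w.free >= load]
        if not candidates:
            raise RuntimeError(f"no healthy worker has room for {pid}")
        # Best fit: choose the worker that would be left with the least spare
        # capacity. Small partitions end up packed together, and large workers
        # stay available for demanding partitions.
        target = min(candidates, key=lambda w: w.free - load)
        target.leases[pid] = load
        assignment[pid] = target.name
    return assignment


def reallocate(failed: Worker, workers: list) -> dict:
    """Mark a worker unhealthy and lease its partitions to other workers."""
    failed.healthy = False
    orphaned, failed.leases = failed.leases, {}
    return schedule(orphaned, workers)
```

With one small and one large worker, for example, a heavy partition lands on the large worker while two light partitions share the small one; calling reallocate on the small worker then moves its leases to the large worker. A production scheduler would also need lease expiry, health checks, and load estimates that are updated over time, none of which are modeled here.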
“The Lacework solution built on Redpanda has solved a number of issues we faced with Kafka consumer groups and automatic rebalancing,” said Turner. “It improves resource utilization and achieves a high degree of fault isolation, even between partitions of the same topic.”
If a cluster is in an unhealthy state, Lacework’s strict SLO and high-availability requirements mean it must shift traffic to another cluster immediately and with minimal disruption, sometimes for hundreds or thousands of topics.
Manually migrating topics from one cluster to another is error-prone and tedious. With the new architecture, Lacework has developed a virtual topics abstraction: a set of metadata describing physical topics in a physical cluster over time. This abstraction allows the company’s team to automatically migrate topics from one cluster to another, regardless of consumer status or lag, with minimal hassle or production impact at scale.
“Our virtual Redpanda topics provide the illusion of a single topic on a single cluster, but can live in multiple clusters over time,” said Turner. “This allows us to more aggressively scale up topics to meet temporary increases in demand and just as easily scale down.
“With Redpanda and our new architecture, we have new levels of operational agility that are game-changing, so we can meet our stringent SLOs while keeping our developers focused on high-value work versus managing data infrastructure.”
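One way to picture the virtual topic abstraction described above is as an ordered history of physical placements. The sketch below is an assumption-laden illustration, not Lacework’s implementation: the VirtualTopic and Placement classes, their field names, the cluster and topic names used with them, and the use of integer timestamps are all hypothetical.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Placement:
    """Where a virtual topic physically lives from `effective_from` onward."""
    effective_from: int   # hypothetical timestamp, e.g. epoch seconds
    cluster: str
    physical_topic: str


@dataclass
class VirtualTopic:
    name: str
    placements: list = field(default_factory=list)  # ordered by effective_from

    def migrate(self, cluster: str, physical_topic: str, effective_from: int) -> None:
        """Record that the topic moves to a new physical home at a given time."""
        if self.placements and effective_from <= self.placements[-1].effective_from:
            raise ValueError("placements must be appended in time order")
        self.placements.append(Placement(effective_from, cluster, physical_topic))

    def current(self) -> Placement:
        """Placement that producers should write to now."""
        return self.placements[-1]

    def resolve(self, ts: int) -> Placement:
        """Placement that was active at timestamp `ts`."""
        starts = [p.effective_from for p in self.placements]
        i = bisect_right(starts, ts) - 1
        if i < 0:
            raise LookupError(f"{self.name} had no placement at {ts}")
        return self.placements[i]

    def read_plan(self, since: int) -> list:
        """Placements a consumer reads, in order, to see all data from `since` on."""
        first = self.placements.index(self.resolve(since))
        return self.placements[first:]
```

In this model a migration only appends metadata: producers follow current(), while a lagging consumer walks read_plan() and drains the old physical topic before moving to the new one, so the move does not have to wait for consumers to catch up.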
[Learn more about the Lacework architecture in this Redpanda user conference talk]
Looking to the Future: Tiered Storage, K8s and Beyond
Since starting with Redpanda in December 2021, Lacework has grown its CPU core footprint by more than 1,200% and now uses Redpanda to host all of its streaming data. However, this is only the beginning of Lacework’s journey with Redpanda. Now that the organization’s data pipelines are running hyper-efficiently, the team’s next priority is to leverage Redpanda’s tiered storage to help reduce storage consumption on expensive EC2 nodes.
“With Redpanda’s tiered storage capability, we can save up to 30% or more of our storage costs by reducing cores,” says Turner. “These are the innovations that sold us on Redpanda as a partner for streaming data — they’re helping us control costs as we grow usage, which means they’re invested in us for the long run.”
As Lacework embarks on an infrastructure modernization project in 2023 to migrate to a managed Kubernetes environment with Amazon EKS, it will be taking Redpanda along for the ride.
“Today, we are utilizing storage-optimized i3en AWS instances, run as bare metal self-managed EC2,” said Turner. “To improve our infrastructure efficiency and operability, we are planning a move to EKS in 2023. The team loves that Redpanda doesn’t care what infrastructure it’s running on and can be run with tooling and runbooks consistent with the rest of our infrastructure.”