
P99 Conf: 3 Ways to Squash Application Latency

"Latency lurks everywhere," warned Pekka Enberg in his P99Conf talk, offering three ways users can minimize slow performance on their systems.
Nov 22nd, 2024 1:02pm by
Feature image via Pixabay.

We’ve all been frustrated by latency, whether as users of an application or as the developers building one.

At ScyllaDB‘s annual P99 virtual conference for system performance management, Pekka Enberg, founder and CTO of Turso Database, shared his favorite tips for spotting and removing latency from systems.

“Latency lurks everywhere,” said Enberg, who has also authored a book on latency and spent years fine-tuning the Linux kernel. “Every part of your stack will have some component with some variance in latency.”

Each customer-facing app has a latency budget, which can be gobbled up by one bad component.

Latency is a distribution: at the 99th percentile, no more than one operation in 100 is slow. That may sound like a solid goal, but there is a compounding effect, Enberg said. For a request that passes through ten components, each meeting a 99th-percentile target, about nine percent of users still get unsatisfactory performance.
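The compounding effect is easy to check directly. A quick sketch (the function name is ours, not Enberg's), assuming the components miss their latency targets independently:

```python
def slow_request_probability(n_components: int, percentile: float = 0.99) -> float:
    """Chance that a request touching n independent components,
    each meeting its p99 target, still hits at least one slow one."""
    return 1.0 - percentile ** n_components

# One component: 1% of requests are slow.
print(round(slow_request_probability(1), 4))   # 0.01
# Ten components: close to 10% of requests hit a slow path.
print(round(slow_request_probability(10), 4))  # 0.0956
```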

And it can add up. Amazon once estimated that it loses 1% of sales for every 100ms of latency.


Enberg has thought plenty about ways of reducing latency and has boiled down his solutions into three different approaches:

  • Reduce data movement
  • Avoid work
  • Avoid waiting

#1 Reduce Data Movement

Moving data to an application always takes time. In the best-case scenario, data moves at the speed of light, and a real network can’t even deliver that theoretical minimum.

A network round trip from New York to London will be at least 57ms. Within a data center, it can take hundreds of microseconds to move between the network cards of different servers, and even within a single server, a CPU cache access still takes nanoseconds.

To reduce latency, minimize data movement as much as possible, Enberg advised. Put the data as close as possible to where it is being used, either by co-location, replication or caching.

Moving the database to the same server that runs the application eliminates any network roundtrip latency. This is a favorite technique for high-frequency trading shops.

If you can’t move the data closer, you can at least copy the data to a closer location via caching and replication.
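A caching layer can be sketched in a few lines. This is a minimal read-through cache, where `fetch_from_origin` and its 50ms delay are stand-ins for a slow remote read such as a cross-region database query:

```python
import time

def fetch_from_origin(key: str) -> str:
    """Stand-in for a slow remote read (e.g., a cross-region query)."""
    time.sleep(0.05)  # simulate ~50ms of network round trip
    return f"value-for-{key}"

cache: dict[str, str] = {}

def read_through(key: str) -> str:
    """Serve from the local copy when possible; hit the origin only once."""
    if key not in cache:
        cache[key] = fetch_from_origin(key)  # pay the round trip on a miss
    return cache[key]

read_through("user:42")                # first read pays the origin latency
start = time.perf_counter()
read_through("user:42")                # repeated reads are local
elapsed = time.perf_counter() - start
print(f"cached read took {elapsed * 1000:.3f} ms")
```

The real work in a production cache is invalidation and staleness policy, which this sketch deliberately leaves out.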

#2 Avoid Computational Work

Latency can also be lowered by avoiding unnecessary work. Although some processing will always be necessary, there is no reason to get the data entangled in more work than necessary.

This can be done in several ways, Enberg explained:

  • Reducing algorithmic complexity
  • Controlling memory management
  • Optimizing code
  • Avoiding CPU-intensive computation

How many steps does your program or algorithm take once data is ingested? One that is too complex will slow things down.


Simple algorithms work with simple data structures, such as arrays, stacks, queues and hash tables. Linked lists and graphs, on the other hand, do not mesh well with low-latency systems.

Although it spares developers manual bookkeeping, automatic memory management can be a burden as well. “There’s actually a lot of things it has to do,” he said. If you use garbage collection, it should run continuously without pauses rather than in herky-jerky start-and-stop mode.
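What "controlling" the collector looks like depends on the runtime. As one concrete illustration in CPython (not something Enberg prescribed), you can collect at a quiet moment and then keep the automatic collector off the hot path:

```python
import gc

# Take the collector off the hot path: collect explicitly at a quiet
# moment, then disable automatic collection for the sensitive section.
gc.collect()
gc.disable()
try:
    # ... latency-sensitive work here ...
    paused = gc.isenabled()  # False: no surprise GC pauses in this section
finally:
    gc.enable()  # restore normal behavior afterward

print(paused)
```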

And don’t even think about relying on virtual memory paging, which basically runs at disk speed, since it is, in fact, backed by disk, and it consumes plenty of CPU resources besides.

Optimizing code is the most obvious way to reduce latency. Look for ways to reduce CPU cycles and cache misses. Find bottlenecks with a profiler, fix them, and then repeat the process.

Also, keep in mind that not all reductions in work result in better latency. Batching, for instance, reduces the work the CPU does per item, improving throughput, but it actually increases the latency of individual operations.

One long-running process monopolizes the CPU and creates longer wait times for everything scheduled behind it; splitting it into parallel processes reduces that CPU-intensive blocking.
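The split can be sketched with the standard library. This is our own illustrative example, not from the talk: a large sum divided into chunks that run in separate worker processes instead of one long, CPU-monopolizing loop.

```python
from concurrent.futures import ProcessPoolExecutor

def partial_sum(bounds: tuple[int, int]) -> int:
    """Sum one slice of the range; each worker handles its own slice."""
    lo, hi = bounds
    return sum(range(lo, hi))

def parallel_sum(n: int, workers: int = 4) -> int:
    """Split [0, n) into chunks and sum them in separate processes."""
    step = n // workers
    chunks = [(i * step, n if i == workers - 1 else (i + 1) * step)
              for i in range(workers)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(partial_sum, chunks))

if __name__ == "__main__":
    # Same result as one long sequential sum, without one process
    # holding the CPU for the whole duration.
    print(parallel_sum(1_000_000))
```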

#3 Avoid Waiting

Any component waiting around for another incurs latency, Enberg noted. So, avoid designing a system where waiting occurs anywhere.

Synchronization, in general, is always going to be an issue for low-latency systems.

“Synchronization with mutual exclusion means threads have to wait,” he said.

It should be avoided, and if synchronization is unavoidable, use “wait-free” synchronization. Try using partitioning to eliminate the need for synchronization altogether, and use read-only shared data structures wherever possible.

If you have to use synchronization, keep the critical sections that must be synchronized as short as possible.
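Partitioning is the simplest of these ideas to show. In this sketch (our example, with made-up names), each thread increments only its own slot, so the hot path needs no lock at all; the one cross-partition step, summing the slots, happens once at the end:

```python
import threading

N_THREADS = 4
INCREMENTS = 100_000

# Each thread owns its own slot, so no lock is needed on the hot path.
slots = [0] * N_THREADS

def worker(thread_id: int) -> None:
    for _ in range(INCREMENTS):
        slots[thread_id] += 1  # touches only this thread's partition

threads = [threading.Thread(target=worker, args=(i,)) for i in range(N_THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

total = sum(slots)  # the only step that crosses partitions
print(total)
```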

Another tip: Don’t wait for the operating system. Avoid context switching, where the OS switches between multiple applications. “Don’t create too many threads and be easy on system calls,” he said.

Also, to this end, use non-blocking I/O, busy polling (where your system polls for an event rather than waits for it), and kernel-bypass techniques such as XDP and DPDK wherever possible.

Likewise, you don’t want your application to wait for the network.

One trick Enberg pointed out was to disable Nagle’s algorithm, the TCP stack’s small-packet coalescing behavior, by setting the TCP_NODELAY flag.
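Setting the flag is a one-liner in most socket APIs; here it is in Python (the same `setsockopt` call exists in C):

```python
import socket

# With TCP_NODELAY set, small writes go out immediately instead of
# being held back so the stack can coalesce them into larger packets.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

nodelay = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)
sock.close()
print(nodelay != 0)  # True once the option is set
```

The trade is more, smaller packets on the wire, which is usually acceptable for latency-sensitive request/response traffic.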

And while you are at it, avoid head-of-line blocking, so that a large item at the head of a queue does not hold up all of the work behind it.

Enberg went on to discuss ways of hiding latency (by parallelizing workloads, among other techniques) and how to fine-tune a system for maximum performance.


His talk and other talks from the event can be viewed, with free registration, on the P99Conf site.
