This page provides guidance on how you can improve Cloud Storage FUSE performance.
Improve read and write performance
To improve read and write performance, we recommend the following:
- Enable caching: Cloud Storage FUSE offers four optional client-side cache types that store specific types of data and metadata locally to help improve performance:
  - File caching: stores copies of frequently accessed files.
  - Stat caching: stores file metadata.
  - Type caching: stores file type information.
  - List caching: stores directory listings.

  Cloud Storage FUSE caching works with any user-specified directory that's backed by your choice of storage. Cache performance matches that of the underlying storage used by the cache, with minimal overhead.
- Accelerate reads by enabling parallel downloads: speed up reads of files larger than 1 GB by enabling parallel downloads. For more information, see Improve read performance using parallel downloads.
- Run sequential read workloads when possible: Cloud Storage FUSE performs better for sequential read workloads than random read workloads. Cloud Storage FUSE uses a heuristic to detect when a file is being read sequentially, which enables Cloud Storage FUSE to issue fewer, larger read requests to Cloud Storage using the same TCP connection.
- Adjust file sizes based on read type: to optimize sequential read performance, we recommend that you upload and read files between 5 MB and 200 MB in size. To optimize random read performance, we recommend that you upload and read files around 2 MB in size.
- Mount buckets with hierarchical namespace enabled: to increase read and write speeds and ensure atomicity for higher initial queries per second (QPS) operations, we recommend mounting buckets with hierarchical namespace enabled. To learn more about how hierarchical namespace-enabled buckets can improve Cloud Storage FUSE performance, see Mount buckets with hierarchical namespace enabled.
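To check how a dataset compares with the file size guidance above, a small script can bucket each file by size. The following is a minimal sketch, not part of Cloud Storage FUSE: the helper names classify_size and report_sizes are hypothetical, 1 MB is assumed to mean 1024 * 1024 bytes, and GNU find is assumed for the -printf option.

```shell
# Classify a size in bytes against the 5 MB to 200 MB range recommended
# for sequential reads. classify_size is a hypothetical helper name.
# Assumption: 1 MB = 1024 * 1024 bytes.
classify_size() {
  local bytes=$1
  local mb=$((1024 * 1024))
  if [ "$bytes" -lt $((5 * mb)) ]; then
    echo "below-range"
  elif [ "$bytes" -le $((200 * mb)) ]; then
    echo "sequential-range"
  else
    echo "above-range"
  fi
}

# Print "<classification><TAB><path>" for every regular file under a directory.
# Assumption: GNU find (for -printf) and file names without newlines.
report_sizes() {
  local dir=$1
  find "$dir" -type f -printf '%s %p\n' | while read -r size path; do
    printf '%s\t%s\n' "$(classify_size "$size")" "$path"
  done
}
```

For example, running report_sizes on a local staging directory before upload lists which files fall outside the sequential read range. Files reported as below-range might still suit random read workloads, for which files of around 2 MB are recommended.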
Improve first-time read performance
Before running your workload, we recommend that you recursively list the files in your mounted bucket. Listing populates the stat and type caches ahead of time in a faster, batched way, which improves performance on the first run:
ls -R MOUNT_POINT > /dev/null
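If you run the warm-up from a job script, it can help to fail early when the mount point is missing and to record how long the listing took. The following is a minimal sketch that wraps the same ls -R command. The function name warm_metadata_cache is hypothetical, and the timing has one-second resolution.

```shell
# Recursively list a mount point to populate the stat and type caches.
# warm_metadata_cache is a hypothetical helper name, not a gcsfuse command.
warm_metadata_cache() {
  local mount_point=$1
  local start end
  if [ ! -d "$mount_point" ]; then
    echo "error: $mount_point is not a directory" >&2
    return 1
  fi
  start=$(date +%s)
  # The listing output is discarded; only the cache side effect matters.
  ls -R "$mount_point" > /dev/null || return 1
  end=$(date +%s)
  echo "listed $mount_point in $((end - start))s"
}
```

A call such as warm_metadata_cache /path/to/mount/point can then run as the first step of the workload, and a non-zero exit status indicates that the listing did not complete.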
Use file caching to improve throughput
Cloud Storage FUSE has higher latency than a local file system. Throughput is reduced when you read or write small files one at a time, as it results in several separate API calls. Reading or writing multiple large files at a time can help increase throughput. Use the Cloud Storage FUSE file cache feature to improve performance for small and random I/Os. To learn more about file caching and how to enable the feature, see Use Cloud Storage FUSE file caching.
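To see how file size and caching affect throughput on your own mount, you can time a sequential read of a single file. The following is a rough sketch rather than a benchmark: the function name measure_read is hypothetical, and GNU dd and GNU date (for the %N nanosecond format) are assumed.

```shell
# Read a file sequentially and report the observed throughput.
# measure_read is a hypothetical helper name, not a gcsfuse command.
# Assumption: GNU date, which supports %N (nanoseconds).
measure_read() {
  local file=$1
  local start end bytes elapsed_ns
  if [ ! -f "$file" ]; then
    echo "error: $file is not a regular file" >&2
    return 1
  fi
  start=$(date +%s%N)
  # Pipe through dd so the whole file is actually read, then count the bytes.
  bytes=$(dd if="$file" bs=1M 2>/dev/null | wc -c | tr -d ' ')
  end=$(date +%s%N)
  elapsed_ns=$((end - start))
  # Guard against a zero or negative interval before dividing.
  [ "$elapsed_ns" -gt 0 ] || elapsed_ns=1
  awk -v b="$bytes" -v ns="$elapsed_ns" 'BEGIN {
    s = ns / 1e9
    printf "%s bytes in %.3f s (%.2f MB/s)\n", b, s, (b / 1048576) / s
  }'
}
```

Repeated reads of the same file can be served from a local cache, so compare the first read of a file with later reads rather than averaging them together.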
Mount buckets with hierarchical namespace enabled
To ensure atomicity for higher initial queries per second (QPS) operations such as checkpointing and directory renames or changes, we recommend mounting buckets with hierarchical namespace enabled. Hierarchical namespace organizes your data into a hierarchical file system structure, making operations within the bucket more efficient. List object calls (BucketHandle.Objects) are replaced with get folder calls, resulting in quicker response times and fewer overall list calls for every operation.
Increase read-ahead size to improve large read throughput
You can improve large read throughput by increasing the amount of data that's prefetched with each read request using the read_ahead_kb Linux kernel parameter on your local machine. We recommend increasing the read_ahead_kb kernel parameter to 1 MB instead of using the default amount of 128 KB that's set on most Linux distributions. Either sudo or root permissions are required to successfully increase the kernel parameter.
To increase the read_ahead_kb kernel parameter to 1 MB (1024 KB) for a specific Cloud Storage FUSE mounted directory, use the following command, where /path/to/mount/point is your Cloud Storage FUSE mount point. Your bucket must be mounted to Cloud Storage FUSE before you run the command; otherwise, the kernel parameter doesn't increase.
export MOUNT_POINT=/path/to/mount/point
echo 1024 | sudo tee "/sys/class/bdi/0:$(stat -c "%d" "$MOUNT_POINT")/read_ahead_kb"
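After changing the parameter, you might want to confirm the value that the kernel reports. The following is a minimal sketch that builds the same sysfs path as the command above and reads it back. The function names readahead_path and show_readahead are hypothetical, GNU stat is assumed, and the 0: prefix is taken from the command above, so the path is only meaningful for a mounted FUSE directory.

```shell
# Build the sysfs path of the read_ahead_kb entry for a mount point.
# readahead_path is a hypothetical helper name. Assumption: GNU stat.
readahead_path() {
  local mount_point=$1
  local dev
  dev=$(stat -c '%d' "$mount_point") || return 1
  echo "/sys/class/bdi/0:${dev}/read_ahead_kb"
}

# Print the current read-ahead value in KB, or fail if no entry exists.
# show_readahead is a hypothetical helper name.
show_readahead() {
  local path
  path=$(readahead_path "$1") || return 1
  if [ -r "$path" ]; then
    echo "read_ahead_kb=$(cat "$path")"
  else
    echo "no read-ahead entry at $path (is the bucket mounted?)" >&2
    return 1
  fi
}
```

Running show_readahead "$MOUNT_POINT" after the tee command should report the value you wrote. A missing entry usually means the directory isn't a mounted Cloud Storage FUSE mount point.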
Achieve maximum throughput
To achieve maximum throughput, use a machine with enough CPU resources to drive throughput and saturate the network interface card (NIC). Insufficient CPU resources can cause Cloud Storage FUSE throttling.
If you're using Google Kubernetes Engine, increase the CPU allocation to the Cloud Storage FUSE sidecar container if your workloads need higher throughput. You can increase the resources used by the sidecar container or allocate unlimited resources.
Assess IOPS needs in queries-per-second
Filestore is a better option than Cloud Storage FUSE for workloads that require high instantaneous input/output operations per second (IOPS), also known as queries-per-second in Cloud Storage. Filestore is also the better option for very high IOPS on a single file system with lower latency.
Alternatively, you can use the Cloud Storage FUSE file cache feature, which builds on the performance characteristics of the underlying cache media. This approach helps when that media provides high IOPS and low latency.
Perform load tests
For instructions on how to perform load tests on Cloud Storage FUSE, see Performance Benchmarks in the GitHub documentation.
What's next
- Read about Cloud Storage FUSE caching.
- Learn more about Cloud Storage FUSE semantics and troubleshooting in GitHub.