-
Notifications
You must be signed in to change notification settings - Fork 29.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use a buffer pool for fs.ReadStream to boost performance? #14136
Comments
I'm not sure what the question/request is here, but you can pass your own |
The issue isn't the size of the buffer (yes, its size can be changed by setting the |
Unfortunately, while this test may yield promising results in an in vitro setting, actual adoption of pooling is much more difficult. It would require, on the consumer's part, to inform the core when the consumer is done with a specific buffer. This not only detracts from the JS paradigm of relying on GC, but can also reduce the overall security of the platform, as refcounting can be very difficult to get right in more complex projects. Plus, from the perspective of compatibility, it would require an opt-in, which diminishes gains in actual deployment. To demonstrate how easily it is to get refcounting wrong, the post has: Writable.prototype.write = function ( chunk, encoding, cb ) {
...
if ( !(chunk && typeof chunk._unref === 'function') ) {
return this._unref();
}
...
} This simply does not work, as many asynchronous streams (including HTTP streams) assume the chunk that's passed into a writable stream will never be changed. There have in fact been issues filed against Node.js (that I am unable to locate right now) about this behavior, but it is resolved that the status quo provides the best performance and is thus kept. |
I think this issue was prematurely closed and should be reopened. Node would benefit from more discussion on the subject.
The question of GC vs static allocation is not a binary decision, and there are ways to design APIs which avoid the problems @TimothyGu describes. Javascript itself does not force anyone to "rely on GC" and Node already provides for static allocation when it comes to file system If some aspects of Node's APIs currently do not support pulling data into a user-provided buffer (which would facilitate static allocation), then perhaps this is more an indication of a problem in the design of the API in question, then with the principle of static allocation. One can definitely write Javascript to use static allocation for the heavy "data plane" work, and to use GC for the light "control plane" work, and I think Node needs to move in this direction to remain relevant. It doesn't have to be xor. Encouraging users to rely on GC for data plane work makes no sense from a throughput point of view, and is going to lead to more issues relating to memory fragmentation and allocator choices (see #6078, #12805 and #8871 for some recent examples). |
See #6923 for raw numbers to convince. |
@jorangreef My objections were based on OP's approach for buffer pulling. I'd be happy if you could file a new issue about pulling into user-controlled buffers (instead of Node.js-owned ones in this issue) – which I agree can be very helpful for throughout when done right – and if you can offer some implementable proposals it would be even better. It would also offer a path to reconcile with Web Streams API's BYOB (bring your own buffer) readers in the future, if that happens. |
@jorangreef I just reopened #3403. |
Thanks very much @TimothyGu , #3403 is the issue for pulling into user-controlled buffers, appreciate your reopening that.
I agree with you on this. |
REQUEST: Would you consider moving to a pool scheme since it'll boost performance without changing the API?
fs.ReadStream
allocates64KiB
(default, change viahighWaterMark
) sized Buffer objects as it reads through files which are released downstream when no longer needed. With larger files, this can result in a lot of time spent allocatingBuffer
objects only to have them picked up in GC, which in by itself can add up to a lot.Reducing allocations with a buffer pool should boost performance across the board with respect to file size without foreseeable changes to the public API.
See post for more details.
Performance vs File Size
Higher is better
Percentage Boost in Performance
Average 67% improvement across all sizes
The text was updated successfully, but these errors were encountered: