Use a buffer pool for fs.ReadStream to boost performance? #14136

venkatperi · 2017-07-08T17:44:53Z

Version:
Platform:
Subsystem:

REQUEST: Would you consider moving to a pool scheme since it'll boost performance without changing the API?

fs.ReadStream allocates 64KiB (default, change via highWaterMark) sized Buffer objects as it reads through files which are released downstream when no longer needed. With larger files, this can result in a lot of time spent allocating Buffer objects only to have them picked up in GC, which in by itself can add up to a lot.

Reducing allocations with a buffer pool should boost performance across the board with respect to file size without foreseeable changes to the public API.

See post for more details.

Performance vs File Size

Higher is better

Percentage Boost in Performance

Average 67% improvement across all sizes

The text was updated successfully, but these errors were encountered:

mscdex · 2017-07-08T18:32:52Z

I'm not sure what the question/request is here, but you can pass your own highWaterMark to fs.createReadStream(), which controls the size of allocated Buffer pools for fs.ReadStream.

venkatperi · 2017-07-08T19:01:58Z

The issue isn't the size of the buffer (yes, its size can be changed by setting the highWaterMark) , rather the cost of allocation/release of buffers.

TimothyGu · 2017-07-10T08:36:21Z

Unfortunately, while this test may yield promising results in an in vitro setting, actual adoption of pooling is much more difficult. It would require, on the consumer's part, to inform the core when the consumer is done with a specific buffer. This not only detracts from the JS paradigm of relying on GC, but can also reduce the overall security of the platform, as refcounting can be very difficult to get right in more complex projects. Plus, from the perspective of compatibility, it would require an opt-in, which diminishes gains in actual deployment.

To demonstrate how easily it is to get refcounting wrong, the post has:

Writable.prototype.write = function ( chunk, encoding, cb ) {
  ...
  if ( !(chunk && typeof chunk._unref === 'function') ) {
    return this._unref();
  }
  ...
 }

This simply does not work, as many asynchronous streams (including HTTP streams) assume the chunk that's passed into a writable stream will never be changed. There have in fact been issues filed against Node.js (that I am unable to locate right now) about this behavior, but it is resolved that the status quo provides the best performance and is thus kept.

jorangreef · 2017-07-21T13:03:08Z

I think this issue was prematurely closed and should be reopened. Node would benefit from more discussion on the subject.

This [...] detracts from the JS paradigm of relying on GC

The question of GC vs static allocation is not a binary decision, and there are ways to design APIs which avoid the problems @TimothyGu describes. Javascript itself does not force anyone to "rely on GC" and Node already provides for static allocation when it comes to file system read() and write() operations. If anything, the long-term trend in Node's API design has been more and more towards providing APIs which accept not just buffers, but also offsets into buffers to avoid pressuring the GC through needless allocations.

If some aspects of Node's APIs currently do not support pulling data into a user-provided buffer (which would facilitate static allocation), then perhaps this is more an indication of a problem in the design of the API in question, then with the principle of static allocation.

One can definitely write Javascript to use static allocation for the heavy "data plane" work, and to use GC for the light "control plane" work, and I think Node needs to move in this direction to remain relevant. It doesn't have to be xor.

Encouraging users to rely on GC for data plane work makes no sense from a throughput point of view, and is going to lead to more issues relating to memory fragmentation and allocator choices (see #6078, #12805 and #8871 for some recent examples).

jorangreef · 2017-07-21T13:12:04Z

See #6923 for raw numbers to convince.

TimothyGu · 2017-07-21T14:28:46Z

@jorangreef My objections were based on OP's approach for buffer pulling. I'd be happy if you could file a new issue about pulling into user-controlled buffers (instead of Node.js-owned ones in this issue) – which I agree can be very helpful for throughout when done right – and if you can offer some implementable proposals it would be even better. It would also offer a path to reconcile with Web Streams API's BYOB (bring your own buffer) readers in the future, if that happens.

TimothyGu · 2017-07-21T14:34:09Z

@jorangreef I just reopened #3403.

jorangreef · 2017-07-21T14:44:09Z

I'd be happy if you could file a new issue about pulling into user-controlled buffers (instead of Node.js-owned ones in this issue)
...
@jorangreef I just reopened #3403.

Thanks very much @TimothyGu , #3403 is the issue for pulling into user-controlled buffers, appreciate your reopening that.

asynchronous streams (including HTTP streams) assume the chunk that's passed into a writable stream will never be changed

I agree with you on this.

vsemozhetbyt added buffer Issues and PRs related to the buffer subsystem. fs Issues and PRs related to the fs subsystem / file system. labels Jul 8, 2017

TimothyGu closed this as completed Jul 10, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use a buffer pool for fs.ReadStream to boost performance? #14136

Use a buffer pool for fs.ReadStream to boost performance? #14136

venkatperi commented Jul 8, 2017 •

edited

Loading

mscdex commented Jul 8, 2017

venkatperi commented Jul 8, 2017

TimothyGu commented Jul 10, 2017 •

edited

Loading

jorangreef commented Jul 21, 2017 •

edited

Loading

jorangreef commented Jul 21, 2017

TimothyGu commented Jul 21, 2017

TimothyGu commented Jul 21, 2017

jorangreef commented Jul 21, 2017

Use a buffer pool for fs.ReadStream to boost performance? #14136

Use a buffer pool for fs.ReadStream to boost performance? #14136

Comments

venkatperi commented Jul 8, 2017 • edited Loading

Performance vs File Size

Percentage Boost in Performance

mscdex commented Jul 8, 2017

venkatperi commented Jul 8, 2017

TimothyGu commented Jul 10, 2017 • edited Loading

jorangreef commented Jul 21, 2017 • edited Loading

jorangreef commented Jul 21, 2017

TimothyGu commented Jul 21, 2017

TimothyGu commented Jul 21, 2017

jorangreef commented Jul 21, 2017

venkatperi commented Jul 8, 2017 •

edited

Loading

TimothyGu commented Jul 10, 2017 •

edited

Loading

jorangreef commented Jul 21, 2017 •

edited

Loading