Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do substreams have backpressure? #61

Open
hackergrrl opened this issue Aug 21, 2020 · 10 comments
Open

Do substreams have backpressure? #61

hackergrrl opened this issue Aug 21, 2020 · 10 comments

Comments

@hackergrrl
Copy link

I couldn't tell by looking at the source, but I wrote this and it seems like the source gets read entirely regardless of the sink's reading capacity:

const MRPC = require('muxrpc')
const pull = require('pull-stream')
const toPull = require('stream-to-pull-stream')
const net = require('net')

const manifest = {
  stuff: 'source'
}

let data = [1,2,3,4,5,6,7,8,9]
const api = {
  stuff() {
    return function (abort, cb) {
      if (!data.length) {
        console.log('source pulled: end')
        return cb(true)
      }
      const d = data.shift()
      console.log('source pulled data:', d)
      cb(null, d)
    }
  }
}

const client = MRPC(manifest, null) ()
const server = MRPC(null, manifest) (api)

net.createServer(stream => {
  stream = toPull.duplex(stream)
  pull(stream, server.createStream(), stream)
}).listen(8080)

const stream = toPull.duplex(net.connect(8080))

const onClose = () => {
  console.log('rpc closed')
}

pull(stream, client.createStream(onClose), stream)

pull(
  client.stuff(),
  function (read) {
    read(null, function next (end, data) {
      console.log('got', end, data)
      if (end) return
      setTimeout(() => {
        read(null, next)
      }, 500)
    })
  }
)

gives

source pulled data: 1
source pulled data: 2
source pulled data: 3
source pulled data: 4
source pulled data: 5
source pulled data: 6
source pulled data: 7
source pulled data: 8
source pulled data: 9
source pulled: end
source pulled: end
got undefined 1
got null 2
got null 3
got null 4
got null 5
got null 6
got null 7
got null 8
got null 9
got true undefined
@hackergrrl
Copy link
Author

@dominictarr @christianbundy Any ideas on how we might go about implementing this?

One idea is to have an ACK for each message, which would make it easy to signal backpressure, since the sink would just withhold its ACK until it's ready for more. However, it means an extra half round-trip for each stream chunk, and a change to the protocol.

@dominictarr
Copy link
Contributor

I did a rewrite on the v7 branch that was gonna implement this, but I when I deployed it it ramped up too slowly and broke stuff. I rolled back to v6 and never got around to finishing it. I think the v7 version is a big improvement too, but it needs some research into ways to detect when to change the rate of flow.

on v7 it sent "credit" default 64k, and when that drained half way it would send more. but for a high latency connection this slows it down a lot... it's kinda like a train, it has lots of momentum. if you wanna go fast, it takes longer to stop. ideally, you'd have a way of measuring a the bandwidth and latency of a connection, so that you can start going at the most sensible speed. then decrease or increase from there.

@dominictarr
Copy link
Contributor

hmm, I think that could be added to the control stream (on the 7 branch)

@dominictarr
Copy link
Contributor

oh yeah, ACK is very pull-stream style. that would work to create back pressure, but except over very low latency connections it would be VERY SLOW.

for example, I just did a speed test for (on 3g here)
Screenshot_2020-08-22 Speedtest by Ookla - The Global Broadband Speed Test

1Mbps (bits) so 125k/s (bytes) that means band width lets say a packet is 1kish, so 100ish packets per second. but ping is 0.086 seconds, or 11 per second. so if it was one ack per packet you'd only be able to send 11k. that's 10 times slower!

so sending 10 packets per ack would be better... but it's a little bit more tricky than that because you want more data to be flowing while the ack is travelling.

@dominictarr
Copy link
Contributor

of course those values are only a particular connection. we need to detect what they should be on each connection, and they might change over time too

@hackergrrl
Copy link
Author

hackergrrl commented Aug 21, 2020

Ha, so it's basically re-implementing TCP flow control.

Does this mean sbot doesn't have backpressure on its source RPCs? Doesn't this cause problems for e.g. createHistoryStream on a large dataset?

@staltz
Copy link
Member

staltz commented Aug 22, 2020

@noffle I remember @cryptix saying exactly that: createHistoryStream is problematic because you get all messages as a firehose

@clehner
Copy link
Contributor

clehner commented Aug 22, 2020

@dominictarr
Copy link
Contributor

correct it's tcp on top of tcp. legacy replication was particularily bad because it opened potentially thousands of streams, most of which don't send any data, so maintaining back pressure for all of that would add a lot of over head. on the other hand, EBT replication is over a single stream, so far less overhead.

In that thread @cryptix links to yamux... reading their "spec" they barely describe back pressure works... so it sounds similar to the naive method that v7 currently uses...

@dominictarr
Copy link
Contributor

now having some time for this to process in my subconscious I've realized that it just needs to add a ping signal in the control stream... not that hard...

@dominictarr dominictarr mentioned this issue Sep 13, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants