Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

High transaction volumes causing problems for miners #2333

Closed
sammy007 opened this issue May 1, 2017 · 47 comments
Closed

High transaction volumes causing problems for miners #2333

sammy007 opened this issue May 1, 2017 · 47 comments
Labels
A-block-creation Area: Block creation and submission by miners D-economics Design issue: Economics I-dos Problems and improvements with respect to Denial-of-Service. I-performance Problems and improvements with respect to performance M-user-support User support issue tx propagation/mining latency
Milestone

Comments

@sammy007
Copy link

sammy007 commented May 1, 2017

There is tx spam ongoing, I see many blocks with up to 2 MB size with just ~50 tx. RPC getblocktemplate poll leads to timeouts (I have 5s timeout). This is not ok and I believe getblocktemplate must take less than 5s on decent hardware Intel(R) Xeon(R) CPU E3-1245 v5 @ 3.50GHz. Load average is 1.5.

You seems don't care about it, dev channel is silent and only cosmetic changes discussed there so far. It seems I must create a patch to ignore transactions at all and mine empty blocks if you are going to pretend that it's fine.

Consequences:

  • Mining centralisation
  • Empty block mining
  • Tons of orphans (uncle bs proposal can't help)
  • Tons of rejects
  • Fee market
  • Thousands of transactions hanging out in mempool
  • Full nodes shortage
  • Lazy mining leads to forks
@sammy007
Copy link
Author

sammy007 commented May 1, 2017

Mining 1-2 MB blocks in ZCash makes no economic sense at all, bitcoin network pays several thousands dollars in fees for 1 MB blocks.

If it takes more than 5 seconds to build new block template I encourage miners to mine empty blocks instead. So please either optimise it or remove 20% tax if you are not capable to deploy fix or raise fees to some decent level like 1000x from current price via hard fork.

@sammy007
Copy link
Author

sammy007 commented May 1, 2017

New block to mine on <skipped> at height 106477 with 57 tx, 10.00000000 reward, 0.01994709 fees (25.5959ms, 3.260499057s)

The former is a time spent on building merkle tree on a pool side, the latter is how much time spent to call getblocktemplate via loopback interface: 3.26 seconds.

It takes 3.26 seconds to produce block template with just 57 transactions in zcashd on decent hardware. Up to 3% of block time.

@a1binos
Copy link

a1binos commented May 1, 2017

+1 . I notice pool freezing, because of transaction spam.

@sammy007
Copy link
Author

sammy007 commented May 1, 2017

getmempoolinfo

{
  "size": 305,
  "bytes": 11080568,
  "usage": 23548336
}

I am working on a patch to completely ignore mempool for now, will mine empty blocks.

@zapv0lt
Copy link

zapv0lt commented May 1, 2017

I'm seeing similar issues with slow getblocktemplate today. It hasn't been reported above, but I'm also seeing extremely large RSS after running a node for several weeks. After some digging, I think this could be related: bitcoin/bitcoin#6876.

Leaving aside the issue of whether or not this is actually intentional transaction spam, I think it's fair to expect getblocktemplate to handle these worst-case situations without become extremely slow.

@gobitfly
Copy link

gobitfly commented May 1, 2017

I can confirm the issue as well. Those large spam blocks cause huge CPU usage and even crashed some of my nodes on lower-spec machines.

@jackgavigan
Copy link
Contributor

jackgavigan commented May 1, 2017

Looks like it started around block #106077 (0000000030728e6b5b5484473812dd3c8ff4a857693fa1a48b457bfd72661c17) at 19:35:35 GMT yesterday (Sunday 30th Apr).

Looking at a single sample, an amount of ZEC is unshielded (e.g. txn c18cd332d4e09b9e53a4a9e23d32206d997aa7ece9d7bd41872d1b6fd2bbca79 unshielded 82.74963511 ZEC into t1QMCGKi92v85Jmar2PD3SjQF8Eadbd2y4q), then sent in a single transaction with 100s of small outputs (e.g. t1QMCGKi92v85Jmar2PD3SjQF8Eadbd2y4q sent 82.74963511 ZEC to 1869 outputs - tx c00782ba18f2ae70b60b983fb273c2160020a10e4cab27db39e8a5c59320071c), which then seem to get (partially?) re-aggregated into other addresses.

Some of the addresses involved t1VoqSxHh1ZnWBRFyHh6diWM4whimFzhjcJ have been around for a while (and have been part of similar patterns of transactions in the past).

@jackgavigan jackgavigan changed the title Ongoing transaction spam High transaction volumes causing problems for miners May 1, 2017
@daira
Copy link
Contributor

daira commented May 1, 2017

@sammy007 wore:

[...] if you are going to pretend that it's fine.
[...] or raise fees to some decent level like 1000x from current price via hard fork.

We are not "pretending that it's fine". This started at 9:08 UTC which was only 2 hours before you filed the ticket; please give us time to analyse the issue thoroughly. There's not going to be any knee-jerk reaction where we raise fees (at all, let alone by 1000-fold) unless we have strong evidence that it is required. Nor would raising fees require a hard fork, if it were required.

@daira daira added I-dos Problems and improvements with respect to Denial-of-Service. A-pow Area: Proof-of-Work and mining tx propagation/mining latency D-economics Design issue: Economics labels May 1, 2017
@sammy007
Copy link
Author

sammy007 commented May 1, 2017

Spam started more than 12 hours ago. And this is not my duty to watch network state.
Provide an option to ignore txes of particular size or an option to require higher fees for nodes.
GBT uses mempool, we need such txes to never reach it.

@daira daira added the I-performance Problems and improvements with respect to performance label May 1, 2017
@daira
Copy link
Contributor

daira commented May 1, 2017

Indeed the large blocks do seem to have started around block 106077, which was at 2017-04-30 19:35:35 UTC, which is just under 20 hours ago. I stand corrected.

@jackgavigan
Copy link
Contributor

@daira: My bad. It looks like the activity comes in waves.

@miningpoolhub
Copy link

This is quite critical problem. Pool is suffering.

@str4d
Copy link
Contributor

str4d commented May 2, 2017

We've been looking into this for the last ten hours or so. Here's what we've established:

  • The transactions appear to be the result of a miner collecting six month's worth of mining rewards into a single address, most likely via a script started over the weekend.
  • The script appears to use batching: around 300 transactions are created at a time containing ~250 inputs each (for a size of around 37070 bytes each), and the next batch is not created until the previous batch has been completely removed from the mempool.
  • The mining network is clearing these batches from the mempool in about six blocks (as expected given the transaction sizes and block size limit).
  • Per-transaction-input validation time is around 0.2-0.4ms for an up-to-date node, or 0.6-0.9ms for a node syncing the block chain at the same time (both measured on my Core i7-6600U laptop).
  • getblocktemplate latency appears to be affected by the mempool load, but we are still investigating exactly how.

We are still in the middle of investigating several parts of the code, and will post another update later on.

@str4d
Copy link
Contributor

str4d commented May 2, 2017

Here is the log output for getblocktemplate with -debug=bench after applying this patch to master, while there were 255 of these transactions in my local mempool:

 - getblocktemplate called
   - Preamble: 0.01ms
   - Longpoll: 0.14ms
 keypool reserve 2
   - CreateNewBlock called
     - Preamble: 0.02ms
     - Locked preamble: 0.23ms
     - Mempool access: 3310.92ms
       - Prev tx access: 3291.40ms (0.05ms x 61769)
       - Coins access: 6.29ms (0.00ms x 61769)
     - Block filling: 6536.69ms
       - Check inputs: 6480.59ms (104.53ms x 62)
       - Update coins: 38.83ms (0.63ms x 62)
 CreateNewBlock(): total size 1985159
     - Block creation: 0.27ms
       - Connect 63 transactions: 445.62ms (7.073ms/tx, 0.033ms/txin) [1.26s]
     - Verify 13326 txins: 1067.32ms (0.080ms/txin) [5.09s]
     - Block validation: 1385.73ms
     - Total time: 11548.25ms
 keypool return 2
   - Block update: 11601.63ms
   - Ret prep txs: 21.15ms
   - Ret prep aux: 3.60ms
   - Total time: 11626.59ms

The bottlenecks are:

  • Prev tx access: CCoinsViewCache::HaveCoins(txin.prevout.hash)
  • Check inputs: ContextualCheckInputs(tx)
  • Block validation: TestBlockValidity(block)

@str4d
Copy link
Contributor

str4d commented May 2, 2017

As the primary bottleneck is transparent signature validation, I've opened #2335 to switch to libsecp256k1 for validation, which should be "anywhere between 2.5 and 5.5 times faster" per bitcoin/bitcoin#6954.

@sammy007 @a1binos @zapv0lt @gobitfly @miningpoolhub please test this PR in your infrastructure and let me know how much latency this removes compared to the current release (either overall, or using the benchmark patch I linked in my last comment).

@sammy007
Copy link
Author

sammy007 commented May 2, 2017

Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz str4d:2333-validation-speed

1.60983597s
419.145133ms
847.123985ms
22.432544ms
1.482908816s

Intel(R) Xeon(R) CPU E3-1245 v5 @ 3.50GHz v1.0.8-1 -TestBlockValidity

3.808361534s
2.762839296s
4.369284388s
3.075969ms
4.253306917s

@Xandrah
Copy link

Xandrah commented May 2, 2017

This is getting pretty severe now, zcashd becomes unresponsive and takes 100% CPU on 4 cores. My pool is not functioning correctly because zcashd is freezing, you cannot even do zcash-cli getinfo without it hanging for 10 seconds.

the process is now back to 100% cpu usage, even going as high as 400% an error in the console window is [POOL] [ZCASH] (Thread 1) Could not parse rpc data from daemon instance 0
Request Data: {"method":{getblocksubsidy",params":[],"id":1493752719588
Request Data: Work queue depth exceeded

Check top, and zcashd is taking 100% CPU Usage. 100% per thread at 400% CPU

ZCASHD is on ubuntu server 64bit with 48GB ram allocated to the virtual machine.

ZCASHD debug.log spammed, with new blocks, when this is happening cpu usage spikes to 100% per CPU core.

2017-05-02 21:14:54 CreateNewBlock(): total size 1982104
2017-05-02 21:14:56 keypool return 5
2017-05-02 21:14:57 keypool reserve 5
2017-05-02 21:15:00 CreateNewBlock(): total size 1982104
2017-05-02 21:15:01 keypool return 5
2017-05-02 21:15:03 keypool reserve 5
2017-05-02 21:15:06 CreateNewBlock(): total size 1982104
2017-05-02 21:15:08 keypool return 5
2017-05-02 21:15:09 keypool reserve 5
2017-05-02 21:15:12 CreateNewBlock(): total size 1982104
2017-05-02 21:15:13 keypool return 5
2017-05-02 21:15:15 keypool reserve 5
2017-05-02 21:15:18 CreateNewBlock(): total size 1982104
2017-05-02 21:15:19 keypool return 5
2017-05-02 21:15:21 keypool reserve 5
2017-05-02 21:15:24 CreateNewBlock(): total size 1982104
2017-05-02 21:15:25 keypool return 5
2017-05-02 21:15:27 keypool reserve 5
2017-05-02 21:15:29 CreateNewBlock(): total size 1982104
2017-05-02 21:15:31 keypool return 5
2017-05-02 21:15:33 keypool reserve 5
2017-05-02 21:15:36 CreateNewBlock(): total size 1982104
2017-05-02 21:15:37 keypool return 5

@str4d
Copy link
Contributor

str4d commented May 3, 2017

@Xandrah per our chat in #zcash on Rocket.Chat, the reason for the lock-up (and likely for the ballooning memory usage) is because getblocktemplate is being called every few seconds, but each call is taking a few seconds to return. getblocktemplate holds the global lock cs_main, so while it is running, other calls (including subsequent getblocktemplate calls) will back up, eventually exceeding the work queue depth and causing the error you saw. You can increase the work queue depth from its default of 16 with -rpcworkqueue=##, but getblocktemplate needs to be called less frequently if it is taking longer to respond.

@WaveringAna
Copy link

Got about the same results as Sammy with the pr.

@Kukarism
Copy link

Kukarism commented May 3, 2017

If the "activity" seems to be "in waves", i'd suggest that this is f2pool processing dayly payments. Its hashrate grew up noticebly the last few days. Unlike flypool - its the only matching in hash power opponent, which processes just-in-time on-the-fly micropayments when miners reach their "threshold" - f2pool makes all the payments (hundreds of tons of them, eventually) just 1 time per day, nearly the morning of China time.
Suggestion: let f2pool implement "threshold" payments distributed equally (in common) during all the day (like Flypool), or somehow let it rearrange miners on several groups which get their coins on some different predefined time patterns.

@zookozcash
Copy link

Red flag: Nathan mistakenly thought that the Incident Response team had required this ticket to be prioritised for 1.0.9. In fact the opposite was true, the Incident Response team (me as Executive Coordinator) had decided that we would take steps to fix the problem in the short term explicitly under the assumption that this would not imply that this ticket would be prioritised for 1.0.9.

@zookozcash
Copy link

The next step on this issue is diagnosis, experimentation, and benchmarking. Why does this issue not happen in Bitcoin?

@jameshilliard
Copy link

The next step on this issue is diagnosis, experimentation, and benchmarking. Why does this issue not happen in Bitcoin?

For some historical context 0.11.x bitcoind had major performance issues(it couldn't properly handle mining/propagation of 1MB blocks without the relay network and had some pretty severe performance degradation with mempools larger than a few megabytes) these were fairly well known issues at the time, and these issues were largely resolved with the 0.12 bitcoin core release(diagnosis, experimentation, benchmarking and of course writing the code to resolve them was done).

When I was operating a pool with 0.11.x I had to do a lot of mostly manual mempool monitoring and limiting to prevent GBT call times from being excessive. The performance issues were not due to a single cause but multiple factors and required major rewrites of a number of internal bitcoin core components to resolve.

Since zcash is a fork of 0.11.2 I would expect it to have those same issues bitcoin had back then.

You may want to rebase on top of bitcoin 0.14.1 since it has many performance improvements since 0.11.2.

@nathan-at-least
Copy link
Contributor

@jameshilliard Thanks for the context! Looks like we have our work cut out for us.

@nathan-at-least nathan-at-least added this to the 1.0.10 milestone May 31, 2017
zkbot added a commit that referenced this issue Jun 17, 2017
Switch to libsecp256k1-based validation for ECDSA

Cherry-picked from the following upstream PRs:

- bitcoin/bitcoin#6983
- bitcoin/bitcoin#6954

Part of #2333.
zkbot added a commit that referenced this issue Jun 21, 2017
Add ability for node to reject tx from mempool by number of tx inputs

Implement short-term solution described in #2343 so that users can respond promptly to critical short-term problems caused by quadratic validation scaling, such as the getblocktemplate latency, block propagation latency, and mempool size inflation issues described in #2333.
@daira
Copy link
Contributor

daira commented Jun 22, 2017

#2342, adding the -mempooltxinputlimit config option to limit the number of transparent inputs in transactions, is included in 1.0.10. Updating the milestone for remaining issues.

@daira
Copy link
Contributor

daira commented Nov 13, 2018

The original problem was mitigated by adding -mempooltxinputlimit, and then solved by fixing the quadratic hashing problem at Overwinter activation (which also removed -mempooltxinputlimit). If there are other tx validation performance improvements that could be pulled in from Bitcoin, that should be a separate ticket.

@daira daira closed this as completed Nov 13, 2018
@daira
Copy link
Contributor

daira commented Nov 13, 2018

Note, incidentally, that we solved this without raising fees (denominated in ZEC) at all.

@daira daira added this to the v1.1.0 milestone Nov 13, 2018
@str4d str4d added A-block-creation Area: Block creation and submission by miners and removed A-pow Area: Proof-of-Work and mining labels Jan 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-block-creation Area: Block creation and submission by miners D-economics Design issue: Economics I-dos Problems and improvements with respect to Denial-of-Service. I-performance Problems and improvements with respect to performance M-user-support User support issue tx propagation/mining latency
Projects
None yet
Development

No branches or pull requests