Opened on Nov 6, 2024
Description:
We encountered an issue where `cometbft` crashes with a panic caught in `fendermint`. This issue occurs because, in `BeginBlock`, we attempt to resolve the CometBFT validator ID to a public key. However, when `fendermint`’s `data` folder is deleted and `fendermint` is restarted, `cometbft` attempts to start block replay but is not ready for the RPC API connection that `fendermint` requires for this process.
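To confirm that the failure is simply the RPC port not accepting connections yet, a small probe can help while reproducing. The following is a minimal sketch using only the Rust standard library; the function name `rpc_is_reachable` is hypothetical and is not part of `fendermint` or `cometbft`.

```rust
use std::net::{SocketAddr, TcpStream};
use std::time::Duration;

/// Hypothetical helper: returns true if something accepts TCP connections
/// at `addr` within `timeout`. A refused connection or a timeout yields false.
fn rpc_is_reachable(addr: &SocketAddr, timeout: Duration) -> bool {
    TcpStream::connect_timeout(addr, timeout).is_ok()
}
```

Called with the address the node's RPC listens on (assumed here to be the CometBFT default, `127.0.0.1:26657`), it should return `false` during the window in which `fendermint` reports `Connection refused`. It only checks that the socket accepts connections, not that the RPC API can answer queries.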
Steps to Reproduce:
- Run both `cometbft` and `fendermint`.
- Wait until a few blocks have been produced.
- Stop `fendermint` and delete its `data` folder.
- Restart `fendermint`.
Observed Errors:
`cometbft` Logs Before Crash:
I[2024-11-06|15:13:38.920] ABCI Replay Blocks module=consensus appHeight=0 storeHeight=5 stateHeight=5
I[2024-11-06|15:13:51.760] Applying block module=consensus height=1
E[2024-11-06|15:13:51.762] Stopping abci.socketClient for error: read message: EOF module=abci-client connection=consensus
I[2024-11-06|15:13:51.762] service stop module=abci-client connection=consensus msg="Stopping socketClient service" impl=socketClient
E[2024-11-06|15:13:51.762] consensus connection terminated. Did the application crash? Please restart CometBFT module=proxy err="read message: EOF"
`fendermint` Panic:
2024-11-06T14:13:51.762219Z ERROR fendermint/abci/src/application.rs:212: failed to execute ABCI request: Error { msg: "HTTP error", source: "error trying to connect: tcp connect error: Connection refused (os error 61)", }
thread 'tokio-runtime-worker' panicked at /Users/alexei/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tower-abci-0.7.0/src/v037/server.rs:145:70:
called Result::unwrap() on an Err value: HTTP error
Caused by:
    error trying to connect: tcp connect error: Connection refused (os error 61)
Location:
    /Users/alexei/.cargo/registry/src/index.crates.io-6f17d22bba15001f/flex-error-0.4.4/src/tracer_impl/eyre.rs:10:9
Caused by:
    0: HTTP error
    1: error trying to connect: tcp connect error: Connection refused (os error 61)
note: run with RUST_BACKTRACE=1 environment variable to display a backtrace
2024-11-06T14:13:51.995565Z ERROR fendermint/app/src/main.rs:24: panicking stacktrace=" 0: std::backtrace_rs::backtrace::libunwind::trace\n
Cause:
The issue seems to be due to this line in `validators.rs`, where `fendermint` tries to resolve the validator ID to a public key by connecting to the `cometbft` RPC API during `BeginBlock`. If `cometbft` is not fully ready (due to replay or a fresh start with deleted data), this connection fails, causing `fendermint` to panic and terminate.
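One possible mitigation, sketched here rather than proposed as the actual fix, is to retry the lookup a bounded number of times and then return an error instead of panicking. The helper `retry_with_backoff` below is hypothetical and uses a blocking `std::thread::sleep` to stay self-contained; inside `fendermint`'s tokio runtime an async sleep would be needed instead. The attempts are deliberately bounded: if the RPC endpoint only becomes available after replay has finished, waiting for it indefinitely inside `BeginBlock` would never succeed.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical helper: calls `op` up to `max_attempts` times, doubling the
/// delay after each failure. Returns the first success, or the last error
/// once the attempts are exhausted, so the caller can decide how to fail.
fn retry_with_backoff<T, E>(
    max_attempts: u32,
    initial_delay: Duration,
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    assert!(max_attempts > 0, "max_attempts must be at least 1");
    let mut delay = initial_delay;
    let mut attempt = 1;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            // Out of attempts: hand the error back instead of unwrapping it.
            Err(err) if attempt >= max_attempts => return Err(err),
            Err(_) => {
                sleep(delay);
                delay = delay.saturating_mul(2);
                attempt += 1;
            }
        }
    }
}
```

The closure passed as `op` would wrap whatever call resolves the validator ID to a public key. On exhaustion the caller receives the underlying error (here, the `Connection refused` failure) and can report it cleanly rather than terminate through `unwrap()`.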