Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tracing: network connection tracepoints #25832

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

0xB10C
Copy link
Contributor

@0xB10C 0xB10C commented Aug 12, 2022

This adds five new tracepoints with documentation and tests for network connections:

  • established connections with net:inbound_connection and net:outbound_connection
  • closed connections (both closed by us or by the peer) with net:closed_connnection
  • inbound connections that we choose to evict with net:evicted_inbound_connection
  • connections that are misbehaving and punished with net:misbehaving_connection

I've been using these tracepoints for a few months now to monitor connection lifetimes, re-connection frequency by IP and netgroup, misbehavior, peer discouragement, and eviction and more. Together with the two existing P2P message tracepoints they allow for a good overview of local P2P network activity. Also sort-of addresses #22006 (comment).

I've been back and forth on which arguments to include. For example, net:evicted_connection could also include some of the eviction metrics like e.g. last_block_time, min_ping_time, ... but I've left them out for now. If wanted, this can be added here or in a follow-up. I've tried to minimize a potential performance impact by measuring executed instructions with gdb where possible (method described here). I don't think a few hundred extra instructions are too crucial, as connection opens/closes aren't too frequent (compared to e.g. P2P messages). Note: e.g. CreateNodeFromAcceptedSocket() usually executes between 80k and 90k instructions for each new inbound connection.

tracepoint instructions
net:inbound_connection 390 ins
net:outbound_connection between 700 and 1000 ins
net:closed_connnection 473 ins
net:evicted_inbound_connection not measured; likely similar to net:closed_connnection
net:misbehaving_connection not measured

Also added a bpftrace (tested with v0.14.1) log_p2p_connections.bt example script that produces output similar to:

Attaching 6 probes...
Logging opened, closed, misbehaving, and evicted P2P connections
OUTBOUND conn to 127.0.0.1:15287: id=0, type=block-relay-only, network=0, total_out=1
INBOUND conn from 127.0.0.1:45324: id=1, type=inbound, network=0, total_in=1
MISBEHAVING conn id=1, message='getdata message size = 50001'
CLOSED conn to 127.0.0.1:15287: id=0, type=block-relay-only, network=0, established=1231006505
EVICTED conn to 127.0.0.1:45324: id=1, type=inbound, network=0, established=1612312312

@jb55
Copy link
Contributor

jb55 commented Aug 12, 2022 via email

kouloumos added a commit to kouloumos/bitcoin that referenced this pull request Aug 17, 2022
@russeree
Copy link
Contributor

russeree commented Aug 18, 2022

Just reporting in that everything is functional. Console log looks correct.

Tests Passed
image

0xB10C added a commit to 0xB10C/peer-observer that referenced this pull request Sep 12, 2022
0xB10C added a commit to 0xB10C/peer-observer that referenced this pull request Sep 12, 2022
0xB10C added a commit to 0xB10C/peer-observer that referenced this pull request Sep 12, 2022
0xB10C added a commit to 0xB10C/peer-observer that referenced this pull request Sep 12, 2022
@DrahtBot
Copy link
Contributor

DrahtBot commented Sep 13, 2022

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/25832.

Reviews

See the guideline for information on the review process.

Type Reviewers
ACK vasild
Concept ACK dergoegge, i-am-yuvi
Stale ACK jb55, virtu, maflcko

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #30381 ([WIP] net: return result from addnode RPC by willcl-ark)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@0xB10C
Copy link
Contributor Author

0xB10C commented Nov 16, 2022

Note to self: might be worth checking if we can also pass AS information and not only the net_group.

@0xB10C
Copy link
Contributor Author

0xB10C commented Jan 20, 2023

Note to self: might be worth checking if we can also pass AS information and not only the net_group.

This is possible by accessing the CNodeStats, but don't think we should do that in the tracepoint. We have the IP and can lookup the AS in a tracing scirpt, if necessary.


Fixed up the linter complains and rebased. This is ready for further review.

@dergoegge
Copy link
Member

Concept ACK

@0xB10C 0xB10C force-pushed the 2022-05-connection-tracepoints branch from 8528781 to ac438b1 Compare March 21, 2023 09:16
@0xB10C 0xB10C force-pushed the 2022-05-connection-tracepoints branch from ac438b1 to 6de4fc7 Compare March 21, 2023 11:36
@0xB10C
Copy link
Contributor Author

0xB10C commented Mar 21, 2023

The linter started complaining about "E275 missing whitespace after keyword". Fixed this up in the individual commits.

ac438b1 -> 6de4fc7 (2022-05-connection-tracepoints.pr.1 -> 2022-05-connection-tracepoints.pr.2; compare)

src/net.cpp Outdated Show resolved Hide resolved
src/net.cpp Outdated Show resolved Hide resolved
@virtu
Copy link
Contributor

virtu commented Mar 24, 2023

ACK 6de4fc7

Code reviewed tracepoints, tests, and demo script. Tracepoint placement looks good, so do tests. Successfully ran functional tests and the demo script.

Some points:

  • Any reason why only inbound_connection and outbound_connection explicitly cast all arguments in the demo script?
  • I think some of the format specifiers in the demo script are lightly off, leading to outputs with negative netgroup values on my system. I believe it should be %lld for int64 and %llu for uint64.
  • As brought up previously, it would be nice to switch from nKeyedNetGroup to netgroup ids that are consistent over time

@DrahtBot DrahtBot requested a review from jb55 March 24, 2023 13:44
@maflcko
Copy link
Member

maflcko commented Sep 4, 2024

re-ACK 5b59cfc 🌞

Show signature

Signature:

untrusted comment: signature from minisign secret key on empty file; verify via: minisign -Vm "${path_to_any_empty_file}" -P RWTRmVTMeKV5noAMqVlsMugDDCyyTSbA3Re5AkUrhvLVln0tSaFWglOw -x "${path_to_this_whole_four_line_signature_blob}"
RUTRmVTMeKV5npGrKx1nqXCw5zeVHdtdYURB/KlyA/LMFgpNCs+SkW9a8N95d+U4AP1RJMi+krxU1A3Yux4bpwZNLvVBKy0wLgM=
trusted comment: re-ACK 5b59cfc944f9eac73e1e69bcc660280b6809724a 🌞
+fwdnyZ2go0FfCZLGQhHmGEVETFdRHYk006fB55POSqqbbyiX3tXZFbuDFztmXiPE57ThJyQ/JJcQ8s5WOuHCA==

@i-am-yuvi
Copy link

tested ACK 🔥

Successfully ran demo scripts and functional tests!!

I have been using these tracepoints for 3-4 months from now and I find it very useful. They help you keep track of your peers, also I am working on a tool that detects spy peers and for that these tracepoints are very useful.

Great work @0xB10C.

@DrahtBot DrahtBot mentioned this pull request Sep 27, 2024
Copy link
Contributor

@vasild vasild left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approach ACK 5b59cfc

doc/tracing.md Outdated Show resolved Hide resolved
int trace_inbound_connection(struct pt_regs *ctx) {
struct NewConnection inbound = {};
bpf_usdt_readarg(1, ctx, &inbound.conn.id);
bpf_usdt_readarg_p(2, ctx, &inbound.conn.addr, MAX_PEER_ADDR_LENGTH);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't find the doc of bpf_usdt_readarg_p(). What would it do if the argument is longer than the supplied buffer? I presume the last arg is the length of the buffer, so it is not going to overflow, I guess. But would it add a trailing '\0'? That is - could it copy MAX_PEER_ADDR_LENGTH from the input to the output and leave it like that? If yes, would that be a problem - a string without a terminating '\0'?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would it do if the argument is longer than the supplied buffer? I presume the last arg is the length of the buffer, so it is not going to overflow, I guess.

Correct, it would cut it off.

But would it add a trailing '\0'?

No. However there is bpf_probe_read_user_str() which does that. This generally seems to be the better suited methodology here. I'll use that for the changes in this PR. There are other places in our code we can update in a potential follow up.

That is - could it copy MAX_PEER_ADDR_LENGTH from the input to the output and leave it like that? If yes, would that be a problem - a string without a terminating '\0'?

I think, for us, in this test here, this isn't a problem. The Python ctype mapping for Connection defined below has the same size for an address (("addr", ctypes.c_char * MAX_PEER_ADDR_LENGTH)).

doc/tracing.md Outdated Show resolved Hide resolved
doc/tracing.md Show resolved Hide resolved
src/net_processing.cpp Show resolved Hide resolved
test/functional/interface_usdt_net.py Outdated Show resolved Hide resolved
test/functional/interface_usdt_net.py Outdated Show resolved Hide resolved
contrib/tracing/log_p2p_connections.bt Outdated Show resolved Hide resolved
@0xB10C
Copy link
Contributor Author

0xB10C commented Nov 11, 2024

Thanks for the comments @vasild. I plan to address your comments when I rebase on #26593.

@DrahtBot
Copy link
Contributor

🚧 At least one of the CI tasks failed.
Debug: https://github.com/bitcoin/bitcoin/runs/29613964632

Hints

Try to run the tests locally, according to the documentation. However, a CI failure may still
happen due to a number of reasons, for example:

  • Possibly due to a silent merge conflict (the changes in this pull request being
    incompatible with the current code in the target branch). If so, make sure to rebase on the latest
    commit of the target branch.

  • A sanitizer issue, which can only be found by compiling with the sanitizer and running the
    affected test.

  • An intermittent issue.

Leave a comment here, if you need help tracking down a confusing failure.

0xB10C added a commit to 0xB10C/bitcoin that referenced this pull request Nov 13, 2024
A v3 onion address with a `:` and a five digit port has a length of
68 chars. As noted in
bitcoin#25832 (comment)
peers e.g. added via hostname might have a longer CNode::m_addr_name.
These might be cut off in tracing tools.
A v3 onion address with a `:` and a five digit port has a length of
68 chars. As noted in
bitcoin#25832 (comment)
peers e.g. added via hostname might have a longer CNode::m_addr_name.
These might be cut off in tracing tools.
@0xB10C 0xB10C force-pushed the 2022-05-connection-tracepoints branch from 5b59cfc to 15dfd72 Compare November 13, 2024 14:51
@DrahtBot
Copy link
Contributor

🚧 At least one of the CI tasks failed.
Debug: https://github.com/bitcoin/bitcoin/runs/32930453858

Hints

Try to run the tests locally, according to the documentation. However, a CI failure may still
happen due to a number of reasons, for example:

  • Possibly due to a silent merge conflict (the changes in this pull request being
    incompatible with the current code in the target branch). If so, make sure to rebase on the latest
    commit of the target branch.

  • A sanitizer issue, which can only be found by compiling with the sanitizer and running the
    affected test.

  • An intermittent issue.

Leave a comment here, if you need help tracking down a confusing failure.

HiHat added a commit to HiHat/pocketnet.core that referenced this pull request Dec 16, 2024
Copy link
Contributor

@vasild vasild left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 15dfd72 (modulo to lint failure)

Comment on lines +119 to +127
uint64_t conn_type_pointer;
uint64_t address_pointer;
bpf_usdt_readarg(1, ctx, &inbound.conn.id);
bpf_usdt_readarg(2, ctx, &address_pointer);
bpf_usdt_readarg(3, ctx, &conn_type_pointer);
bpf_usdt_readarg(4, ctx, &inbound.conn.network);
bpf_usdt_readarg(5, ctx, &inbound.existing);
bpf_probe_read_user_str(&inbound.conn.addr, sizeof(inbound.conn.addr), (void *) address_pointer);
bpf_probe_read_user_str(&inbound.conn.type, sizeof(inbound.conn.type), (void *) conn_type_pointer);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non blocker comment: this cast from uint64_t to (void*) looks odd, maybe it will produce compiler warning in some cases? Also, on 32 bit systems that should be uint32_t. Can these variables be declared as void* instead? Like:

Suggested change
uint64_t conn_type_pointer;
uint64_t address_pointer;
bpf_usdt_readarg(1, ctx, &inbound.conn.id);
bpf_usdt_readarg(2, ctx, &address_pointer);
bpf_usdt_readarg(3, ctx, &conn_type_pointer);
bpf_usdt_readarg(4, ctx, &inbound.conn.network);
bpf_usdt_readarg(5, ctx, &inbound.existing);
bpf_probe_read_user_str(&inbound.conn.addr, sizeof(inbound.conn.addr), (void *) address_pointer);
bpf_probe_read_user_str(&inbound.conn.type, sizeof(inbound.conn.type), (void *) conn_type_pointer);
void* conn_type_pointer;
void* address_pointer;
bpf_usdt_readarg(1, ctx, &inbound.conn.id);
bpf_usdt_readarg(2, ctx, &address_pointer);
bpf_usdt_readarg(3, ctx, &conn_type_pointer);
bpf_usdt_readarg(4, ctx, &inbound.conn.network);
bpf_usdt_readarg(5, ctx, &inbound.existing);
bpf_probe_read_user_str(&inbound.conn.addr, sizeof(inbound.conn.addr), address_pointer);
bpf_probe_read_user_str(&inbound.conn.type, sizeof(inbound.conn.type), conn_type_pointer);

@DrahtBot DrahtBot requested a review from maflcko January 7, 2025 17:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Development

Successfully merging this pull request may close these issues.