-
Notifications
You must be signed in to change notification settings - Fork 914
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Fix subscription busy wait (melodic-devel) #1608
Conversation
Hmm, this is weird. The results of the tests are quite different from what I get on my machines (I tested on several of them with bionic/melodic). Is there something special about the CI containers? Do the tests run in parallel? |
Let's run them again to see if this was a fluke: @ros-pull-request-builder retest this please
I don't think so. You can run the very same process locally: https://github.com/ros-infrastructure/ros_buildfarm/blob/master/doc/jobs/devel_jobs.rst#run-the-devel-job-locally
No. @peci1 Can you please check #1557 and comment how this patch compares to the other one. |
@peci1 Friendly ping. |
I'm sorry for the "busy wait" on my side :) I think the mechanics of the causes of bug #1545 (solved by this PR) and #1343 (solved by PR #1557) are completely different. In #1343, the problem is mismatch between clock types which sometimes result in zero-time waits in places which are supposed to break a busy-wait loop. This only affects In #1545, the busy wait is caused by returning |
@peci1 Thanks for clarifying. I will keep this PR open until there is a patch which doesn't break API/ABI and (if possible) doesn't affect performance significantly. (Unfortunately I don't have the bandwidth to comment on the approach and suggest alternatives. My time is limited to review / merge patches which are ready as is.) |
Followup in #1684 . |
* Fix a busy-wait loop in subscription queue. * Better fix for slow callbacks CPU throttling. * More versatile callback queue test. * Finished cherry-pick merge to melodic-devel. * libros: moved define * roscpp: implementet #1608 without ABI/API breaks * /test_roscpp: fake_message is in a header now... * test_roscpp: fixed sign-compare warning * stabilized test * CallbackQueue: use SteadyTime instead of WallTime to get independent of system-time changes * style only * Update clients/roscpp/include/ros/callback_queue.h Co-authored-by: Johannes Meyer <[email protected]> Co-authored-by: Martin Pecka <[email protected]> Co-authored-by: CTU base <robot@ctu-base> Co-authored-by: Martin Pecka <[email protected]> Co-authored-by: Christopher Wecht <christopher.wechtstudent.kit.edu> Co-authored-by: Dirk Thomas <[email protected]> Co-authored-by: Johannes Meyer <[email protected]>
* Better fix for slow callbacks CPU throttling. * More versatile callback queue test. * Finished cherry-pick merge to melodic-devel. * libros: moved define * roscpp: implementet ros#1608 without ABI/API breaks * /test_roscpp: fake_message is in a header now... * test_roscpp: fixed sign-compare warning * stabilized test * CallbackQueue: use SteadyTime instead of WallTime to get independent of system-time changes * style only * Update clients/roscpp/include/ros/callback_queue.h Co-authored-by: Johannes Meyer <[email protected]> Co-authored-by: Martin Pecka <[email protected]> Co-authored-by: CTU base <robot@ctu-base> Co-authored-by: Martin Pecka <[email protected]> Co-authored-by: Christopher Wecht <christopher.wechtstudent.kit.edu> Co-authored-by: Dirk Thomas <[email protected]> Co-authored-by: Johannes Meyer <[email protected]>
PR showing a solution for bug #1545 .
This is not yet meant to be merged - treat it like I'm offering a solution open for comments.
This PR consists of two commits.
The first one solves the issue in a backwards-compatible way, however I've noticed that it's performance when calling CallbackQueue::addCallback rapidly drops compared to upstream (to thousands/sec compared to hundreds of thousands/sec).
The second one takes it further, breaking ABI and API. This solution doesn't suffer from the performance issues, or at least not as much (there is some mutex locking added, so it is probably a bit slower, but taking into account that the upstream now fully throttles the CPU...)
What should the tests show: calling callOne() too often (as in upstream) is quite expensive, and since upstream enters a busy-wait-loop around callOne, it throttles 10 cores to maximum. The fixed versions should be much more friendly to your CPU, which is shown by the fact that it does not call excessive numbers of callOne (roughly 2-3 times the number of subscription callbacks pushed to the queue).
Another interesting metric showed to be the number of subscription callbacks that it's possible to push in the queue in the given time-frame. As I stated earlier, the version from first commit has a very strong drop in this metric. It might be interesting to test it with some real pusblisher/subscriber system, but I haven't yet had time to test it... The numbers are in the logs below. In the "callbacks out of xxx" ,
xxx
stands for the number of subscription callbacks pushed to the queue.Upstream version test log:
This PR test log: