-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add per-task metrics #56320
Add per-task metrics #56320
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SGTM. I believe you may need to add the reporting functions to public and docs, and the PR to NEWs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
I'm not too satisfied with the metrics being directly in jl_task_t
; hopefully @vtjnash has a suggestion for what to do there.
We now have to separate out spin time, wait time, compilation time, and GC time (in another PR I guess) and the task CPU time will be actual user CPU time.
It'd be helpful to add an example of using this to the original post. |
src/task.c
Outdated
@@ -1146,6 +1146,11 @@ JL_DLLEXPORT jl_task_t *jl_new_task(jl_function_t *start, jl_value_t *completion | |||
t->ptls = NULL; | |||
t->world_age = ct->world_age; | |||
t->reentrant_timing = 0; | |||
t->is_timing_enabled = jl_atomic_load_relaxed(&jl_task_timing_enabled) != 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, interesting, so once a Task is spawned, enabling/disabling the feature doesn't affect live tasks?
So i guess if we wanted to enable this e.g. in @time
, we would have to explicitly set this field on the Task struct? 🤔 I think that seems okay to me. Either you enable the global field before spawning any Task of interest, or you set the field on the Task when you want to start timing it. 👍 Coolcool.
🤔 And actually, i'm not sure if these numbers would get screwed up if you enable it while a task is running. So maybe you really do need to do this only at Task creation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Coooooooool! 😁 Looks great from my perspective!
I'll let everyone else finish their reviews and your responses and such, but this LGTM from my perspective. 😎
for the running task only. Other tasks report metrics that are non-continuous but updated only on task switches.
Co-authored-by: Valentin Churavy <[email protected]>
…ot." This reverts commit d23a96f.
…round GCC bug" This reverts commit 4a5495022d4f64726ddaa66590859dfc72bc22f7.
To ensure that the task metrics do not cross a cache line boundary. Also rename `cpu_time_ns` to `running_time_ns`.
5b58e04
to
788f46f
Compare
To prevent miscounting in case of exceptions/errors in task switching.
I've addressed the failure on i686 (see #56626) by following @Keno's suggestion to align the outer struct, which is Please take a final look @vtjnash and @topolarity? In particular, changes to |
CI failures seem unrelated to this PR. |
I wouldn't expect this to have a lot of overhead but it would be nice to have a simple benchmark. Of course there might be a bit of overhead when timing is turned on, but it would be interesting to see how much. |
The overhead is primarily in task switching, where the metrics are updated. Good ol' I don't see any measurable difference in performance when task metrics are off when compared to Besides
|
Close JuliaLang#47351 (builds on top of JuliaLang#48416) Adds two per-task metrics: - running time = amount of time the task was actually running (according to our scheduler). Note: currently inclusive of GC time, but would be good to be able to separate that out (in a future PR) - wall time = amount of time between the scheduler becoming aware of this task and the task entering a terminal state (i.e. done or failed). We record running time in `wait()`, where the scheduler stops running the task as well as in `yield(t)`, `yieldto(t)` and `throwto(t)`, which bypass the scheduler. Other places where a task stops running (for `Channel`, `ReentrantLock`, `Event`, `Timer` and `Semaphore` are all implemented in terms of `wait(Condition)`, which in turn calls `wait()`. `LibuvStream` similarly calls `wait()`. This should capture everything (albeit, slightly over-counting task CPU time by including any enqueuing work done before we hit `wait()`). The various metrics counters could be a separate inlined struct if we think that's a useful abstraction, but for now i've just put them directly in `jl_task_t`. They are all atomic, except the `metrics_enabled` flag itself (which we now have to check on task start/switch/done even if metrics are not enabled) which is set on task construction and marked `const` on the julia side. In future PRs we could add more per-task metrics, e.g. compilation time, GC time, allocations, potentially a wait-time breakdown (time waiting on locks, channels, in the scheduler run queue, etc.), potentially the number of yields. Perhaps in future there could be ways to enable this on a per-thread and per-task basis. And potentially in future these same timings could be used by `@time` (e.g. writing this same timing data to a ScopedValue like in JuliaLang#55103 but only for tasks lexically scoped to inside the `@time` block). Timings are off by default but can be turned on globally via starting Julia with `--task-metrics=yes` or calling `Base.Experimental.task_metrics(true)`. Metrics are collected for all tasks created when metrics are enabled. In other words, enabling/disabling timings via `Base.Experimental.task_metrics` does not affect existing `Task`s, only new `Task`s. The other new APIs are `Base.Experimental.task_running_time_ns(::Task)` and `Base.Experimental.task_wall_time_ns(::Task)` for retrieving the new metrics. These are safe to call on any task (including the current task, or a task running on another thread). All these are in `Base.Experimental` to give us room to change up the APIs as we add more metrics in future PRs (without worrying about release timelines). cc @NHDaly @kpamnany @d-netto --------- Co-authored-by: Pete Vilter <[email protected]> Co-authored-by: K Pamnany <[email protected]> Co-authored-by: Nathan Daly <[email protected]> Co-authored-by: Valentin Churavy <[email protected]>
* Add per-task metrics (JuliaLang#56320) Close JuliaLang#47351 (builds on top of JuliaLang#48416) Adds two per-task metrics: - running time = amount of time the task was actually running (according to our scheduler). Note: currently inclusive of GC time, but would be good to be able to separate that out (in a future PR) - wall time = amount of time between the scheduler becoming aware of this task and the task entering a terminal state (i.e. done or failed). We record running time in `wait()`, where the scheduler stops running the task as well as in `yield(t)`, `yieldto(t)` and `throwto(t)`, which bypass the scheduler. Other places where a task stops running (for `Channel`, `ReentrantLock`, `Event`, `Timer` and `Semaphore` are all implemented in terms of `wait(Condition)`, which in turn calls `wait()`. `LibuvStream` similarly calls `wait()`. This should capture everything (albeit, slightly over-counting task CPU time by including any enqueuing work done before we hit `wait()`). The various metrics counters could be a separate inlined struct if we think that's a useful abstraction, but for now i've just put them directly in `jl_task_t`. They are all atomic, except the `metrics_enabled` flag itself (which we now have to check on task start/switch/done even if metrics are not enabled) which is set on task construction and marked `const` on the julia side. In future PRs we could add more per-task metrics, e.g. compilation time, GC time, allocations, potentially a wait-time breakdown (time waiting on locks, channels, in the scheduler run queue, etc.), potentially the number of yields. Perhaps in future there could be ways to enable this on a per-thread and per-task basis. And potentially in future these same timings could be used by `@time` (e.g. writing this same timing data to a ScopedValue like in JuliaLang#55103 but only for tasks lexically scoped to inside the `@time` block). Timings are off by default but can be turned on globally via starting Julia with `--task-metrics=yes` or calling `Base.Experimental.task_metrics(true)`. Metrics are collected for all tasks created when metrics are enabled. In other words, enabling/disabling timings via `Base.Experimental.task_metrics` does not affect existing `Task`s, only new `Task`s. The other new APIs are `Base.Experimental.task_running_time_ns(::Task)` and `Base.Experimental.task_wall_time_ns(::Task)` for retrieving the new metrics. These are safe to call on any task (including the current task, or a task running on another thread). All these are in `Base.Experimental` to give us room to change up the APIs as we add more metrics in future PRs (without worrying about release timelines). cc @NHDaly @kpamnany @d-netto --------- Co-authored-by: Pete Vilter <[email protected]> Co-authored-by: K Pamnany <[email protected]> Co-authored-by: Nathan Daly <[email protected]> Co-authored-by: Valentin Churavy <[email protected]> * Address review comments --------- Co-authored-by: Pete Vilter <[email protected]> Co-authored-by: K Pamnany <[email protected]> Co-authored-by: Nathan Daly <[email protected]> Co-authored-by: Valentin Churavy <[email protected]>
Close #47351 (builds on top of #48416)
Adds two per-task metrics:
We record running time in
wait()
, where the scheduler stops running the task as well as inyield(t)
,yieldto(t)
andthrowto(t)
, which bypass the scheduler. Other places where a task stops running (forChannel
,ReentrantLock
,Event
,Timer
andSemaphore
are all implemented in terms ofwait(Condition)
, which in turn callswait()
.LibuvStream
similarly callswait()
.This should capture everything (albeit, slightly over-counting task CPU time by including any enqueuing work done before we hit
wait()
).The various metrics counters could be a separate inlined struct if we think that's a useful abstraction, but for now i've just put them directly in
jl_task_t
. They are all atomic, except themetrics_enabled
flag itself (which we now have to check on task start/switch/done even if metrics are not enabled) which is set on task construction and markedconst
on the julia side.In future PRs we could add more per-task metrics, e.g. compilation time, GC time, allocations, potentially a wait-time breakdown (time waiting on locks, channels, in the scheduler run queue, etc.), potentially the number of yields.
Perhaps in future there could be ways to enable this on a per-thread and per-task basis. And potentially in future these same timings could be used by
@time
(e.g. writing this same timing data to a ScopedValue like in #55103 but only for tasks lexically scoped to inside the@time
block).Timings are off by default but can be turned on globally via starting Julia with
--task-metrics=yes
or callingBase.Experimental.task_metrics(true)
. Metrics are collected for all tasks created when metrics are enabled. In other words, enabling/disabling timings viaBase.Experimental.task_metrics
does not affect existingTask
s, only newTask
s.The other new APIs are
Base.Experimental.task_running_time_ns(::Task)
andBase.Experimental.task_wall_time_ns(::Task)
for retrieving the new metrics. These are safe to call on any task (including the current task, or a task running on another thread). All these are inBase.Experimental
to give us room to change up the APIs as we add more metrics in future PRs (without worrying about release timelines).cc @NHDaly @kpamnany @d-netto