Profile Taskflow Programs

Taskflow comes with a built-in profiler, TFProf, for you to profile and visualize taskflow programs.

Enable Taskflow Profiler

All taskflow programs come with a lightweight profiling module to observe worker activities in every executor. To enable the profiler, set the environment variable TF_ENABLE_PROFILER to a file name in which the profiling result will be stored.

~$ TF_ENABLE_PROFILER=result.json ./my_taskflow
~$ cat result.json
[
{"executor":"0","data":[{"worker":12,"level":0,"data":[{"span":[72,117],"name":"12_0","type":"static"},{"span":[121,123],"name":"12_1","type":"static"},{"span":[123,125],"name":"12_2","type":"static"},{"span":[125,127],"name":"12_3","type":"static"}]}]}
]

When the program finishes, it generates and saves the profiling data to result.json in JavaScript Object Notation (JSON) format. You can then paste the JSON data into our web-based interface, Taskflow Profiler, to visualize the execution timelines of tasks and workers. The web interface supports the following features:

  • zoom into a selected window
  • double click to zoom back to the previously selected window
  • filter workers
  • mouse over to show the tooltip of the task
  • rank tasks in decreasing order of criticality (i.e., execution time)
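To make the last feature concrete, the following C++ sketch ranks the four tasks from the sample result.json by the length of their span. It is only an illustration under stated assumptions: the TaskRecord struct and the rank_by_duration and sample_tasks helpers are hypothetical names introduced here, not part of Taskflow or TFProf, and the sketch assumes that a task's execution time is the difference between the two values in its span.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical record mirroring one entry of a worker's "data" array
// in result.json (not a Taskflow type).
struct TaskRecord {
  std::string name;
  std::string type;
  std::int64_t begin;  // first value of "span"
  std::int64_t end;    // second value of "span"

  // Assumption: execution time is the length of the span.
  std::int64_t duration() const { return end - begin; }
};

// Returns the records sorted by decreasing duration.
// Records with equal duration keep their input order.
std::vector<TaskRecord> rank_by_duration(std::vector<TaskRecord> tasks) {
  std::stable_sort(tasks.begin(), tasks.end(),
                   [](const TaskRecord& a, const TaskRecord& b) {
                     return a.duration() > b.duration();
                   });
  return tasks;
}

// The four tasks of worker 12 from the sample result.json.
std::vector<TaskRecord> sample_tasks() {
  return {
    {"12_0", "static", 72, 117},
    {"12_1", "static", 121, 123},
    {"12_2", "static", 123, 125},
    {"12_3", "static", 125, 127},
  };
}
```

Calling rank_by_duration(sample_tasks()) places 12_0 first, since its span is the longest of the four, followed by the three shorter tasks in their original order.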

TFProf implements a clustering-based algorithm to efficiently visualize tasks and their execution timelines in a browser. Each clustered task represents a group of adjacent tasks merged by the algorithm, which keeps the view responsive without losing much visual accuracy. You can zoom in to see the individual tasks inside a cluster.

Enable Taskflow Profiler on an HTTP Server

When profiling large taskflow programs, the method in the previous section may not work well because large JSON files are expensive to process. For example, a taskflow program of a million tasks can produce several GBs of profiling data, and the profiler may respond to your requests very slowly. To solve this problem, we have implemented a C++-based HTTP server optimized for our profiling data. To compile the server, enable the CMake option TF_BUILD_PROFILER. You may visit Building and Installing to understand Taskflow's build environment.

# under the build directory
~$ cmake ../ -DTF_BUILD_PROFILER=ON
~$ make

After successfully compiling the server, you can find the executable at tfprof/server/tfprof. Next, generate profiling data by running your taskflow program, this time giving the output file the extension .tfp.

~$ TF_ENABLE_PROFILER=my_taskflow.tfp ./my_taskflow
~$ ls
my_taskflow.tfp # my_taskflow.tfp is of binary format

Launch the server program tfprof/server/tfprof and pass (1) the directory of index.html (default at tfprof/) via the option --mount and (2) the profiling data file my_taskflow.tfp via the option --input.

# under the build/ directory
~$ ./tfprof/server/tfprof --mount ../tfprof/ --input my_taskflow.tfp

Now, open your favorite browser at localhost:8080 to visualize and profile your my_taskflow program.

The compiled profiler is more powerful than the pure JavaScript-based interface and handles large profiling data more efficiently under different queries. We currently support the following two view types:

  • Cluster: visualize the profiling data using a clustering algorithm with a limit
  • Criticality: visualize the top-limit tasks in decreasing order of their execution times

Display Profile Summary

You can display a profile summary by specifying the environment variable TF_ENABLE_PROFILER without any value. Taskflow will print a summary report to standard error for each executor created by the program.
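The three ways of setting TF_ENABLE_PROFILER described on this page (a regular file name, a file name ending in .tfp, and an empty value) can be summarized in a few lines of C++. The sketch below only restates the documented behavior and is not Taskflow's internal implementation: ProfilerMode, classify_profiler_value, and current_profiler_mode are hypothetical names, and treating an unset variable as "profiler disabled" is an assumption.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical enumeration of the behaviors described in this document.
enum class ProfilerMode { Disabled, Summary, BinaryFile, JsonFile };

// Maps a raw TF_ENABLE_PROFILER value to a mode.
// `value` is nullptr when the variable is not set at all (assumed: disabled).
ProfilerMode classify_profiler_value(const char* value) {
  if (value == nullptr) return ProfilerMode::Disabled;

  const std::string v(value);
  if (v.empty()) return ProfilerMode::Summary;  // TF_ENABLE_PROFILER= ./prog

  // A file name ending in ".tfp" selects the binary format for the server.
  const std::string ext = ".tfp";
  if (v.size() >= ext.size() &&
      v.compare(v.size() - ext.size(), ext.size(), ext) == 0) {
    return ProfilerMode::BinaryFile;
  }

  // Any other file name receives the JSON output.
  return ProfilerMode::JsonFile;
}

// Convenience wrapper that reads the current process environment.
ProfilerMode current_profiler_mode() {
  return classify_profiler_value(std::getenv("TF_ENABLE_PROFILER"));
}
```

For example, classify_profiler_value("my_taskflow.tfp") yields ProfilerMode::BinaryFile, while an empty string yields ProfilerMode::Summary, which corresponds to the summary report shown next.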

# enable the profiler without a file path to print summary to stderr
~$ TF_ENABLE_PROFILER= ./my_taskflow_program
# Taskflow profile summary printed to stderr
================================================================================
Observer 0 | Wall: 33.00 us | Workers: 12 | Tasks: 8 | Avg Utilization: 8.33%
================================================================================
[Aggregate Task Statistics]
------------------------------------------------------------------
Type         Count   Total(us)     Avg(us)     Min(us)     Max(us)
------------------------------------------------------------------
async            8          51        6.38           0          33
[Worker Utilization]
----------------------------------------------------------------------------
Worker     Tasks  Busy(us)  Idle(us)   Avg(us)   Min(us)   Max(us)     Util%
----------------------------------------------------------------------------
6              8        33         0      4.12         0        33    100.0%
----------------------------------------------------------------------------
Total          8        33         0                              8.3% (avg)
[Worker Concurrency (Y=active workers)] bin=2.00 us (wall: 0..33.00 us, 24 bins)
12 |
11 |
10 |
9 |
8 |
7 |
6 |
5 |
4 |
3 |
2 |
1 | ██ ██ ██ ██ ██ ██ ██ ██ ██ ██ ██ ██ ██ ██ ██ ██ ██
+------------------------------------------------------------------------
0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46 (us)
[Task Parallelism (Y=concurrent tasks)] bin=2.00 us (wall: 0..33.00 us, 24 bins)
5 | ██ ██
4 | ██ ██
3 | ██ ██ ██ ██
2 | ██ ██ ██ ██ ██ ██ ██
1 | ██ ██ ██ ██ ██ ██ ██ ██ ██ ██ ██ ██ ██ ██ ██ ██ ██
+------------------------------------------------------------------------
0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46 (us)

The report consists of four sections:

  1. The overview line at the top reports the total wall-clock duration (Wall), the number of worker threads (Workers), the total number of tasks executed (Tasks), and the average worker utilization (Avg Utilization). The average utilization is the mean of each worker's busy fraction (busy time divided by wall time) across all workers in the executor, including those that ran no tasks. A value of 100% means every worker was busy for the entire execution; a low value indicates that most threads were idle and the workload did not fully exploit the available parallelism.
  2. The aggregated task statistics section reports execution statistics broken down by task type. For each type observed, it shows the number of executions (Count), total execution time (Total), average per-task time (Avg), and the shortest and longest individual task (Min, Max).
  3. The worker utilization section reports per-worker statistics. For each worker that executed at least one task, it shows the task count, total busy time, idle time (wall time minus busy time), average task duration, min/max task duration, and per-worker utilization percentage. The Total row at the bottom aggregates counts and times across all active workers, and its Util% column shows the same average utilization as the overview line. Workers that executed no tasks are omitted from this table.
  4. The concurrency histograms at the bottom visualize how parallelism evolved over wall-clock time using two complementary bar charts. The Worker Concurrency chart shows how many distinct worker threads were simultaneously active in each time bin. The Task Parallelism chart shows how many tasks were concurrently running in each bin, which can exceed the worker count when subflow nesting produces multiple active tasks on the same worker. The number of bins and the time unit are chosen automatically to fill roughly 80 characters of terminal width.
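The definitions above can be checked by hand against the sample report. The following C++ sketch computes the average utilization as the mean busy fraction over all workers, and counts how many task spans overlap each time bin. It is a sketch under stated assumptions rather than the profiler's actual code: the names average_utilization, Span, and tasks_per_bin are hypothetical, and counting a task in every bin its span overlaps is an assumed binning rule.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Mean of (busy time / wall time) over ALL workers in the executor,
// including workers that ran no tasks and therefore have zero busy time.
double average_utilization(const std::vector<double>& busy_us, double wall_us) {
  if (busy_us.empty() || wall_us <= 0.0) return 0.0;
  double sum = 0.0;
  for (double busy : busy_us) sum += busy / wall_us;
  return sum / static_cast<double>(busy_us.size());
}

// Hypothetical task span, treated as the half-open interval [begin, end).
struct Span {
  double begin;
  double end;
};

// For each bin of width bin_us covering [0, wall_us), counts the spans
// that overlap the bin (assumed rule, for illustration only).
std::vector<int> tasks_per_bin(const std::vector<Span>& spans,
                               double wall_us, double bin_us) {
  if (wall_us <= 0.0 || bin_us <= 0.0) return {};
  const std::size_t num_bins =
      static_cast<std::size_t>(std::ceil(wall_us / bin_us));
  std::vector<int> counts(num_bins, 0);
  for (const Span& s : spans) {
    for (std::size_t i = 0; i < num_bins; ++i) {
      const double lo = static_cast<double>(i) * bin_us;
      const double hi = lo + bin_us;
      if (s.begin < hi && s.end > lo) ++counts[i];
    }
  }
  return counts;
}
```

With the numbers from the sample report, one worker out of 12 is busy for 33 us of a 33 us wall time and the other 11 are idle, so average_utilization returns 1/12, which matches the 8.33% in the overview line.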