Skip to content

Commit 1c86f17

Browse files
updated README.md for subflow
1 parent 4ac3518 commit 1c86f17

15 files changed

+2933
-26
lines changed

README.md

Lines changed: 214 additions & 22 deletions
Original file line numberDiff line numberDiff line change
@@ -12,7 +12,8 @@ A fast C++ header-only library to help you quickly build parallel programs with
1212
# Why Cpp-Taskflow?
1313

1414
Cpp-Taskflow lets you quickly build parallel dependency graphs using modern C++17.
15-
It is by far faster, more expressive, and easier for drop-in integration than existing libraries.
15+
It can create both *static and dynamic* tasks,
16+
and is by far faster, more expressive, and easier for drop-in integration than existing libraries.
1617

1718
| Without Cpp-Taskflow | With Cpp-Taskflow |
1819
| -------------------- | ----------------- |
@@ -144,24 +145,214 @@ A.gather(B, C, D); // A runs after B, C, and D.
144145
```
145146

146147
**Linearize**: Linearizing a task sequence adds a preceding link to each adjacent pair.
148+
147149
```cpp
148150
tf.linearize(A, B, C, D); // A runs before B, B runs before C, and C runs before D.
149151
```
150152

151153
## Step 3: Execute the Tasks
154+
152155
There are three methods to carry out a task dependency graph, `dispatch`, `silent_dispatch`, and `wait_for_all`.
156+
153157
```cpp
154158
auto future = tf.dispatch(); // non-blocking, returns with a future immediately.
155159
tf.silent_dispatch(); // non-blocking, no return
156160
```
161+
157162
Calling `wait_for_all` will block until all tasks complete.
163+
158164
```cpp
159165
tf.wait_for_all();
160166
```
161167

168+
Each of these methods will dispatch the current graph to the work queue
169+
and create a data structure called *topology* to store the execution state.
170+
171+
172+
# Dynamic Tasking
173+
174+
Another powerful feature of Taskflow is *dynamic* tasking.
175+
A dynamic task is created during the execution of a dispatched taskflow graph, i.e.,
176+
topology.
177+
These tasks are spawned by a parent task and are grouped together to a *subflow* graph.
178+
The example below demonstrates how to create a subflow node
179+
that spawns three tasks during its execution.
180+
181+
<img align="right" src="image/subflow_join.png" width="40%">
182+
183+
```cpp
184+
// create three regular tasks
185+
auto A = tf.silent_emplace([](){}).name("A");
186+
auto C = tf.silent_emplace([](){}).name("C");
187+
auto D = tf.silent_emplace([](){}).name("D");
188+
189+
// create a subflow graph (dynamic tasking)
190+
auto B = tf.silent_emplace([] (auto& subflow) {
191+
auto B1 = subflow.silent_emplace([](){}).name("B1");
192+
auto B2 = subflow.silent_emplace([](){}).name("B2");
193+
auto B3 = subflow.silent_emplace([](){}).name("B3");
194+
B1.precede(B3);
195+
B2.precede(B3);
196+
}).name("TaskB");
197+
198+
A.precede(B); // B runs after A
199+
A.precede(C); // C runs after A
200+
B.precede(D); // D runs after B
201+
C.precede(D); // D runs after C
202+
203+
// execute the graph without cleanning up topologies
204+
tf.dispatch().get();
205+
std::cout << tf.dump_topologies();
206+
```
207+
208+
By default, a subflow graph joins to its parent node.
209+
This guarantees a subflow graph to finish before executing the successors of
210+
its parent node.
211+
You can disable this feature by calling `subflow.detach()`.
212+
Detaching the above subflow will result in the following execution flow.
213+
214+
<img align="right" src="image/subflow_detach.png" width="65%">
215+
216+
```cpp
217+
// detach a subflow graph
218+
[] (auto& subflow) {
219+
...
220+
B1.precede(B3);
221+
B2.precede(B3);
222+
223+
// detach this from its parent B
224+
subflow.detach();
225+
}).name("TaskB");
226+
```
227+
228+
## Step 1: Create a Subflow
229+
230+
Cpp-Taskflow has an unified interface for static and dynamic tasking.
231+
To create a subflow for dynamic tasking,
232+
emplace a task callable with one argument of type `tf::SubflowBuilder`.
233+
234+
```cpp
235+
auto A = tf.silent_emplace([] (tf::SubflowBuilder& subflow) {});
236+
```
237+
238+
Similarly, you can get a future object to the execution status of the subflow.
239+
240+
```cpp
241+
auto [A, fu] = tf.emplace([] (tf::SubflowBuilder& subflow) {});
242+
```
243+
244+
A subflow builder is a lightweight object that allows you to create
245+
arbitrary dependency graphs on the fly.
246+
All graph building methods defined in taskflow
247+
can be used in a subflow builder.
248+
249+
```cpp
250+
auto A = tf.silent_emplace([] (tf::SubflowBuilder& subflow) {
251+
std::cout << "Task A is spawning two subtasks A1 and A2" << '\n';
252+
auto [A1, A2] = subflow.silent_emplace(
253+
[] () { std::cout << "subtask A1" << '\n'; },
254+
[] () { std::cout << "subtask A2" << '\n'; }
255+
A1.precede(A2);
256+
);
257+
});
258+
```
259+
260+
A subflow can also be nested or recursive. You can create another subflow from
261+
the execution of a subflow and so on.
262+
263+
```cpp
264+
auto A = tf.silent_emplace([] (tf::SubflowBuilder& subflow) {
265+
std::cout << "Task A is spawning one task A1 and one subflow A2" << '\n';
266+
auto A1 = subflow.silent_emplace([] () {
267+
std::cout << "subtask A1" << '\n';
268+
});
269+
auto A2 = subflow.silent_emplace([] (tf::SubflowBuilder& subflow2) {
270+
std::cout << "subflow A2 is spawning two tasks A2_1 and A2_2" << '\n';
271+
auto A2_1 = subflow2.silent_emplace([] () {
272+
std::cout << "subtask A2_1" << '\n';
273+
});
274+
auto A2_2 = subflow2.silent_emplace([] () {
275+
std::cout << "subtask A2_2" << '\n';
276+
});
277+
A2_1.precede(A2_2);
278+
});
279+
A1.precede(A2);
280+
});
281+
```
282+
283+
## Step 2: Detach or Join a Subflow
284+
285+
A subflow has no methods to dispatch its tasks.
286+
Instead, a subflow will be executed after leaving the context of the callable.
287+
By default, a subflow joins to its parent task.
288+
Depending on applications, you can detach a subflow to enable more parallelism.
289+
290+
```cpp
291+
auto A = tf.silent_emplace([] (tf::SubflowBuilder& subflow) {
292+
subflow.detach(); // detach this subflow from its parent task A
293+
}); // subflow starts to run after the callable scope
294+
```
295+
296+
Detaching or Joining a subflow has different meaning in the ready status of
297+
the future object referred to it.
298+
In a joined subflow,
299+
the completion of its parent node is defined as when all tasks
300+
inside the subflow (possibly nested) finish.
301+
302+
<img align="right" src="image/joined_subflow_future.png" width="40%">
303+
304+
```cpp
305+
int value {0};
306+
307+
// create a joined subflow
308+
auto [A, fuA] = tf.emplace([&] (tf::SubflowBuilder& subflow) {
309+
subflow.silent_emplace([&]() {
310+
value = 10;
311+
});
312+
return 100; // some arbitrary value
313+
});
314+
315+
// create a task B after A
316+
auto B = tf.silent_emplace([&] () {
317+
assert(value == 10);
318+
assert(fuA.wait_for(0s) == std::future_status::ready);
319+
});
320+
321+
// A1 must finish before A and therefore before B
322+
A.precede(B);
323+
```
324+
325+
When a subflow is detached from its parent task, it becomes a parallel
326+
execution line to the current flow graph and will eventually
327+
join to the same topology.
328+
329+
<img align="right" src="image/detached_subflow_future.png" width="40%">
330+
331+
```cpp
332+
int value {0};
333+
334+
// create a detached subflow
335+
auto [A, fuA] = tf.emplace([&] (tf::SubflowBuilder& subflow) {
336+
subflow.silent_emplace([&]() { value = 10; });
337+
subflow.detach();
338+
return 100; // some arbitrary value
339+
});
340+
341+
// create a task B after A
342+
auto B = tf.silent_emplace([&] () {
343+
// no guarantee for value to be 10 nor fuA to be ready
344+
});
345+
A.precede(B);
346+
```
347+
348+
162349
# Debug a Taskflow Graph
163-
Concurrent programs are notoriously difficult to debug.
164-
We suggest (1) naming tasks and dumping the graph, and (2) starting with single thread before going multiple.
350+
351+
Concurrent programs are notoriously difficult to debug.
352+
Cpp-Taskflow leverages the graph properties to relief the debugging pain.
353+
To debug a taskflow graph,
354+
(1) name tasks and dump the graph, and
355+
(2) start with one thread before going multiple.
165356
Currently, Cpp-Taskflow supports [GraphViz][GraphViz] format.
166357

167358
```cpp
@@ -179,6 +370,7 @@ B.broadcast(D, E);
179370

180371
std::cout << tf.dump();
181372
```
373+
182374
Run the program and inspect whether dependencies are expressed in the right way.
183375
There are a number of free [GraphViz tools][AwesomeGraphViz] you could find online
184376
to visualize your Taskflow graph.
@@ -204,7 +396,7 @@ digraph Taskflow {
204396
## Taskflow API
205397

206398
The class `tf::Taskflow` is the main place to create taskflow graphs and carry out task dependencies.
207-
The table below summarizes its commonly used methods.
399+
The table below summarizes a list of commonly used methods.
208400

209401
| Method | Argument | Return | Description |
210402
| -------- | --------- | ------- | ----------- |
@@ -225,6 +417,7 @@ The table below summarizes its commonly used methods.
225417
| num_workers | none | size | return the number of working threads in the pool |
226418
| num_topologies | none | size | return the number of dispatched graphs |
227419
| dump | none | string | dump the current graph to a string of GraphViz format |
420+
| dump_topologies | none | string | dump dispatched topologies to a string of GraphViz format |
228421

229422
### *emplace/silent_emplace/placeholder*
230423

@@ -327,11 +520,11 @@ The method `transform_reduce` is similar to reduce, except it applies a unary op
327520
This is particular useful when you need additional data processing to reduce a range of elements.
328521
329522
```cpp
330-
auto v = { {1, 5}, {6, 4}, {-6, 4} };
523+
std::vector<std::pari<int, int>> v = { {1, 5}, {6, 4}, {-6, 4} };
331524
int min = std::numeric_limits<int>::max();
332525
auto [S, T] = tf.transform_reduce(v.begin(), v.end(), min,
333526
[] (int l, int r) { return std::min(l, r); },
334-
[] (const auto& pair) { return std::min(p.first, p.second); }
527+
[] (const std::pair<int, int>& pair) { return std::min(p.first, p.second); }
335528
);
336529
```
337530

@@ -469,13 +662,13 @@ The folder `example/` contains several examples and is a great place to learn to
469662

470663
| Example | Description |
471664
| ------- | ----------- |
472-
| [simple.cpp](./example/simple.cpp) | use basic task building blocks to create a trivial taskflow graph |
473-
| [debug.cpp](./example/debug.cpp)| inspect a taskflow through the dump method |
474-
| [emplace.cpp](./example/emplace.cpp)| demonstrate the difference between the emplace method and the silent_emplace method |
475-
| [matrix.cpp](./example/matrix.cpp) | create two set of matrices and multiply each individually in parallel |
476-
| [parallel_for.cpp](./example/parallel_for.cpp)| parallelize a for loop with unbalanced workload |
477-
| [reduce.cpp](./example/reduce.cpp)| perform reduce operations over linear containers |
478-
| [subflow.cpp](./example/subflow.cpp)| create a taskflow graph that spawns dynamic tasks |
665+
| [simple.cpp](./example/simple.cpp) | uses basic task building blocks to create a trivial taskflow graph |
666+
| [debug.cpp](./example/debug.cpp)| inspects a taskflow through the dump method |
667+
| [emplace.cpp](./example/emplace.cpp)| demonstrates the difference between the emplace method and the silent_emplace method |
668+
| [matrix.cpp](./example/matrix.cpp) | creates two set of matrices and multiply each individually in parallel |
669+
| [parallel_for.cpp](./example/parallel_for.cpp)| parallelizes a for loop with unbalanced workload |
670+
| [reduce.cpp](./example/reduce.cpp)| performs reduce operations over linear containers |
671+
| [subflow.cpp](./example/subflow.cpp)| demonstrates how to create a subflow graph that spawns three dynamic tasks |
479672

480673

481674
# Get Involved
@@ -496,7 +689,6 @@ Please [let me know][email me] if I forgot someone!
496689
# Who is Using Cpp-Taskflow?
497690

498691
Cpp-Taskflow is being used in both industry and academic projects to scale up existing workloads that incorporate complex task dependencies.
499-
A proprietary research report has shown over 10x improvement by switching to Cpp-Taskflow.
500692

501693
- [OpenTimer][OpenTimer]: A High-performance Timing Analysis Tool for VLSI Systems.
502694
- [DtCraft][DtCraft]: A General-purpose Distributed Programming Systems.
@@ -509,13 +701,13 @@ Please [let me know][email me] if I forgot your project!
509701

510702
Cpp-Taskflow is licensed under the [MIT License](./LICENSE):
511703

512-
Copyright &copy; 2018 [Tsung-Wei Huang][Tsung-Wei Huang], [Chun-Xun Lin][Chun-Xun Lin], [Martin Wong][Martin Wong].
513-
514-
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
515-
516-
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
517-
518-
THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
704+
>Copyright &copy; 2018 [Tsung-Wei Huang][Tsung-Wei Huang], [Chun-Xun Lin][Chun-Xun Lin], [Martin Wong][Martin Wong].
705+
>
706+
>Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
707+
>
708+
>The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
709+
>
710+
>THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
519711
520712
* * *
521713

@@ -532,7 +724,7 @@ THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR I
532724
[AwesomeGraphViz]: https://github.com/CodeFreezr/awesome-graphviz
533725
[OpenMP Tasking]: http://www.nersc.gov/users/software/programming-models/openmp/openmp-tasking/
534726
[TBB FlowGraph]: https://www.threadingbuildingblocks.org/tutorial-intel-tbb-flow-graph
535-
[OpenTimer]: https://web.engr.illinois.edu/~thuang19/software/timer/OpenTimer.html
727+
[OpenTimer]: https://github.com/OpenTimer/OpenTimer
536728
[DtCraft]: http://dtcraft.web.engr.illinois.edu/
537729
[email me]: mailto:[email protected]
538730

example/subflow.cpp

Lines changed: 15 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -36,11 +36,21 @@ int main(int argc, char* argv[]) {
3636
// Create a taskflow graph with three regular tasks and one subflow task.
3737
tf::Taskflow tf(std::thread::hardware_concurrency());
3838

39-
auto [A, B, C, D] = tf.silent_emplace(
39+
int value {0};
40+
auto [A, fuA] = tf.emplace([&] (tf::SubflowBuilder& subflow) {
41+
subflow.silent_emplace([&]() { value = 10; }).name("A1");
42+
subflow.detach();
43+
});
44+
auto B = tf.silent_emplace([&] () { /*assert(value == 10);*/ });
45+
A.precede(B);
46+
A.name("A");
47+
B.name("B");
48+
49+
/*auto [A, B, C, D] = tf.silent_emplace(
4050
// Task A
4151
[] () { std::cout << "TaskA\n"; },
4252
// Task B
43-
[cap=std::vector<int>{1,2,3,4,5,6,7,8}, detached] (auto& subflow) {
53+
[cap=std::vector<int>{1,2,3,4,5,6,7,8}, detached] (auto& subflow) {
4454
4555
std::cout << "TaskB is spawning B1, B2, and B3 ...\n";
4656
@@ -80,6 +90,9 @@ int main(int argc, char* argv[]) {
8090
A.precede(C); // C runs after A
8191
B.precede(D); // D runs after B
8292
C.precede(D); // D runs after C
93+
*/
94+
// examine the graph
95+
std::cout << tf.dump() << '\n';
8396

8497
tf.dispatch().get(); // block until finished
8598

image/.DS_Store

8 KB
Binary file not shown.

image/detached_subflow_future.png

68 KB
Loading

image/joined_subflow_future.png

81.1 KB
Loading

image/logo.pdf

146 KB
Binary file not shown.

image/reduce_old.png

34.4 KB
Loading

0 commit comments

Comments
 (0)