<p>There is a large body of work on programming systems (e.g., StarPU, Intel TBB, OpenMP, PaRSEC, Kokkos, HPX) aimed at reducing the programming complexity of parallel and heterogeneous computing. Each of these systems has its own pros and cons and its own reason to exist. However, they do have limitations, particularly in ease of use, static control flow, and scheduling efficiency. Cpp-Taskflow addresses these limitations through a simple, expressive, and transparent graph programming model.</p>
<h2><a class="anchor" id="GeneralQuestion10"></a>
Q10: Do you try to simplify GPU kernel programming?</h2>
<p>No, we do not develop new programming models to simplify kernel programming. The rationale is simple: writing efficient kernels requires domain-specific knowledge, and developers often need direct access to the native GPU programming interface. High-level kernel programming models or abstractions all come with restricted applicability. Although kernel programming is non-trivial, we believe what makes heterogeneous computing difficult is the tasks surrounding the kernels: a mistake in task scheduling can outweigh all the speed-up gained from a highly optimized kernel. Therefore, Cpp-Taskflow focuses on heterogeneous tasking, which largely determines overall system performance.</p>
128
+
<h2><a class="anchor" id="GeneralQuestion11"></a>
Q11: Do you have any real use cases?</h2>
<p>We have applied Cpp-Taskflow to many realistic workloads and demonstrated promising performance scalability and programming productivity. Please refer to <a class="el" href="usecases.html">Real Use Cases</a> and <a class="el" href="References.html">References</a>.</p><hr/>
<p>Using 8 CPUs and 1 GPU, Cpp-Taskflow is consistently faster than the other systems across all problem sizes (placement iterations). The gap widens as the problem size grows; at 100 iterations, Cpp-Taskflow is 17% faster than TBB and StarPU. We observed similar results with other CPU counts. Performance saturates at about 16 CPUs, primarily due to the inherent irregularity of the placement algorithm.</p>
<p>The memory footprint shows the benefit of our conditional tasking. Our memory usage stays nearly flat as the problem size increases, whereas that of StarPU and TBB grows linearly because of their unrolled task graphs. Increasing the number of CPUs raises the memory usage of all methods, but Cpp-Taskflow's memory grows much more slowly than the others'.</p>
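<p>Why a conditional back edge keeps memory flat can be seen in a toy model. The sketch below is not how Cpp-Taskflow implements conditional tasking; it assumes a hypothetical sequential executor in which a condition node returns the index of the successor to follow. The names <code>ToyNode</code>, <code>run_toy_graph</code>, and <code>iterate_with_condition</code> are illustrative only. Looping through a back edge reuses the same three nodes for any iteration count, whereas an unrolled graph needs a new body node per iteration.</p>

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical toy node (not Cpp-Taskflow's data structure).
// A regular node runs `work`, then follows successor slot 0 (if present).
// A condition node returns the successor slot to follow; a slot that does
// not exist ends the walk.
struct ToyNode {
    std::function<void()> work;           // used by regular nodes
    std::function<int()> condition;       // used by condition nodes
    std::vector<std::size_t> successors;  // indices into the graph
};

// Walks the graph sequentially from `start`. Returns how many node
// executions happened.
std::size_t run_toy_graph(std::vector<ToyNode>& graph, std::size_t start) {
    std::size_t executed = 0;
    std::size_t current = start;
    while (true) {
        ToyNode& node = graph[current];
        ++executed;
        std::size_t next_slot = 0;
        if (node.condition) {
            next_slot = static_cast<std::size_t>(node.condition());
        } else if (node.work) {
            node.work();
        }
        if (next_slot >= node.successors.size()) break;  // no such successor: stop
        current = node.successors[next_slot];
    }
    return executed;
}

// Builds init -> body -> cond, where cond returns 0 (back to body) until
// `iterations` bodies have run, then 1 (a missing slot, so the walk exits).
// The body always runs at least once. Returns the graph size, which does
// not depend on `iterations`.
std::size_t iterate_with_condition(int iterations, int& body_runs) {
    body_runs = 0;
    int counter = 0;
    std::vector<ToyNode> graph(3);
    graph[0].work = [&] { counter = 0; };
    graph[0].successors = {1};
    graph[1].work = [&] { ++counter; ++body_runs; };
    graph[1].successors = {2};
    graph[2].condition = [&] { return counter < iterations ? 0 : 1; };
    graph[2].successors = {1};  // slot 0 loops back to the body
    run_toy_graph(graph, 0);
    return graph.size();
}
```

<p>Calling <code>iterate_with_condition(100, runs)</code> runs the body 100 times while the graph still holds three nodes; an unrolled equivalent would hold one body node per iteration, which is the linear growth the paragraph attributes to StarPU and TBB.</p>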
<p>In terms of energy, our scheduler completes the placement workload power-efficiently regardless of problem size and CPU count. Beyond 16 CPUs, where performance saturates, our system does not suffer from the increasing power consumption seen with StarPU, thanks to our adaptive task scheduling algorithm.</p>
<p>For irregular task graphs like this placement workload, resource utilization is critical to system throughput. We co-run the same program in up to nine processes that compete for 40 CPUs and 1 GPU. Co-running a placement program is common when searching for the best parameters of an algorithm. We plot the throughput using <em>weighted speed-up</em> across nine co-runs at two problem sizes. Both Cpp-Taskflow and TBB achieve higher throughput than StarPU. At the largest problem size, Cpp-Taskflow outperforms TBB and StarPU across all co-runs.</p>
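<p>The text does not define weighted speed-up. A common definition, assumed here and not confirmed by the source, sums over the co-running processes each process's standalone runtime divided by its runtime while co-running; a value close to the number of processes means co-running costs little. The function name <code>weighted_speedup</code> is hypothetical.</p>

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Weighted speed-up under a common definition (an assumption here):
//   WS = sum_i  t_alone[i] / t_corun[i]
// t_alone[i]: runtime of process i when it runs by itself.
// t_corun[i]: runtime of process i while sharing the machine with the others.
double weighted_speedup(const std::vector<double>& t_alone,
                        const std::vector<double>& t_corun) {
    if (t_alone.size() != t_corun.size()) {
        throw std::invalid_argument("timing vectors must have equal length");
    }
    double ws = 0.0;
    for (std::size_t i = 0; i < t_alone.size(); ++i) {
        if (t_corun[i] <= 0.0) {
            throw std::invalid_argument("co-run times must be positive");
        }
        ws += t_alone[i] / t_corun[i];
    }
    return ws;
}
```

<p>With illustrative numbers (not measurements from this study), three processes that each take 10 s alone and 20 s when co-run give <code>weighted_speedup({10.0, 10.0, 10.0}, {20.0, 20.0, 20.0})</code> = 1.5.</p>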
Modern C++ Parallel Task Programming</h1>
<p>Cpp-Taskflow helps you quickly write parallel task programs that achieve both high performance scalability and high programming productivity. It is faster, more expressive, requires fewer lines of code, and is easier to drop into an existing project than many other task programming frameworks.</p>
<p>Cpp-Taskflow is committed to supporting both academic and industrial research projects, making it reliable and cost-efficient to develop large-scale parallel applications. Our users say:</p>
<div class="fragment"><div class="line">~$ git clone https://github.com/cpp-taskflow/cpp-taskflow.git</div>
<div class="line">~$ cd cpp-taskflow/</div>
<div class="line">~$ cp -r taskflow myproject/include/</div></div><!-- fragment --><h1><a class="anchor" id="ASimpleFirstProgram"></a>
A Simple First Program</h1>
<p>Here is a rather simple program to get you started.</p>
<div class="fragment"><div class="line"><span class="preprocessor">#include &lt;taskflow/taskflow.hpp&gt;</span>  <span class="comment">// Cpp-Taskflow is header-only</span></div>
<div class="line"></div>
<div class="line"><span class="keywordtype">int</span> main(){</div>
<div class="line"></div>
<div class="line">  <a class="code" href="classtf_1_1Executor.html">tf::Executor</a> executor;</div>
<div class="line">  <a class="code" href="classtf_1_1Taskflow.html">tf::Taskflow</a> taskflow;</div>
<div class="line"></div>
<div class="line">  <span class="keyword">auto</span> [A, B, C, D] = taskflow.<a class="code" href="classtf_1_1FlowBuilder.html#a796e29175380f70246cf2a5639adc437">emplace</a>(</div>
<div class="line">    [] () { <a class="codeRef" doxygen="/Users/twhuang/PhD/Code/cpp-taskflow/doxygen/cppreference-doxygen-web.tag.xml:http://en.cppreference.com/w/" href="http://en.cppreference.com/w/cpp/io/basic_ostream.html">std::cout</a> &lt;&lt; <span class="stringliteral">"TaskA\n"</span>; },  <span class="comment">// task dependency graph</span></div>
<div class="line">    [] () { <a class="codeRef" doxygen="/Users/twhuang/PhD/Code/cpp-taskflow/doxygen/cppreference-doxygen-web.tag.xml:http://en.cppreference.com/w/" href="http://en.cppreference.com/w/cpp/io/basic_ostream.html">std::cout</a> &lt;&lt; <span class="stringliteral">"TaskB\n"</span>; },  <span class="comment">//</span></div>
<div class="line">    [] () { <a class="codeRef" doxygen="/Users/twhuang/PhD/Code/cpp-taskflow/doxygen/cppreference-doxygen-web.tag.xml:http://en.cppreference.com/w/" href="http://en.cppreference.com/w/cpp/io/basic_ostream.html">std::cout</a> &lt;&lt; <span class="stringliteral">"TaskC\n"</span>; },  <span class="comment">//          +---+</span></div>
<div class="line">    [] () { <a class="codeRef" doxygen="/Users/twhuang/PhD/Code/cpp-taskflow/doxygen/cppreference-doxygen-web.tag.xml:http://en.cppreference.com/w/" href="http://en.cppreference.com/w/cpp/io/basic_ostream.html">std::cout</a> &lt;&lt; <span class="stringliteral">"TaskD\n"</span>; }   <span class="comment">//    +----&gt;| B |-----+</span></div>
<div class="line">  );                                    <span class="comment">//    |     +---+     |</span></div>
<div class="line">                                        <span class="comment">//  +---+           +-v-+</span></div>
<div class="line">  A.<a class="code" href="classtf_1_1Task.html#a8c78c453295a553c1c016e4062da8588">precede</a>(B);  <span class="comment">// A runs before B     //  | A |           | D |</span></div>
<div class="line">  A.<a class="code" href="classtf_1_1Task.html#a8c78c453295a553c1c016e4062da8588">precede</a>(C);  <span class="comment">// A runs before C     //  +---+           +-^-+</span></div>
<div class="line">  B.<a class="code" href="classtf_1_1Task.html#a8c78c453295a553c1c016e4062da8588">precede</a>(D);  <span class="comment">// B runs before D     //    |     +---+     |</span></div>
<div class="line">  C.<a class="code" href="classtf_1_1Task.html#a8c78c453295a553c1c016e4062da8588">precede</a>(D);  <span class="comment">// C runs before D     //    +----&gt;| C |-----+</span></div>
<div class="line">                                        <span class="comment">//          +---+</span></div>
<div class="line">  executor.<a class="code" href="classtf_1_1Executor.html#a81f35d5b0a20ac0646447eb80d97c0aa">run</a>(taskflow).wait();</div>
<div class="line"></div>
<div class="line">  <span class="keywordflow">return</span> 0;</div>
<div class="line">}</div></div><!-- fragment --><p>The program creates four tasks A, B, C, and D. The dependency constraints force A to run before B and C, and D to run after B and C. The maximum concurrency in this example is two: B and C can run at the same time.</p>
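<p>The claim that the maximum concurrency is two can be checked mechanically. Two tasks may run at the same time only if neither is reachable from the other, so with enough workers the maximum concurrency equals the size of the largest set of pairwise unreachable tasks (a maximum antichain). The brute-force sketch below uses only the standard library, not Cpp-Taskflow, and is intended for small graphs such as this one; the name <code>max_concurrency</code> is hypothetical.</p>

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: size of the largest set of pairwise independent tasks
// (a maximum antichain) in a small DAG. edges[u] lists the tasks u precedes.
// It enumerates all subsets, so keep the graph small (about 20 nodes).
std::size_t max_concurrency(const std::vector<std::vector<std::size_t>>& edges) {
    const std::size_t n = edges.size();

    // reach[u][v] is true if v is reachable from u through one or more edges.
    std::vector<std::vector<bool>> reach(n, std::vector<bool>(n, false));
    for (std::size_t u = 0; u < n; ++u) {
        for (std::size_t v : edges[u]) reach[u][v] = true;
    }
    // Transitive closure (Warshall's algorithm).
    for (std::size_t k = 0; k < n; ++k)
        for (std::size_t i = 0; i < n; ++i)
            if (reach[i][k])
                for (std::size_t j = 0; j < n; ++j)
                    if (reach[k][j]) reach[i][j] = true;

    std::size_t best = 0;
    // Each bitmask selects a subset of tasks; keep the largest subset in
    // which no task is reachable from another, so all of them could run at once.
    for (unsigned long long mask = 1; mask < (1ULL << n); ++mask) {
        bool independent = true;
        std::size_t count = 0;
        for (std::size_t i = 0; i < n && independent; ++i) {
            if (!(mask & (1ULL << i))) continue;
            ++count;
            for (std::size_t j = 0; j < n; ++j) {
                if ((mask & (1ULL << j)) && (reach[i][j] || reach[j][i])) {
                    independent = false;
                    break;
                }
            }
        }
        if (independent && count > best) best = count;
    }
    return best;
}
```

<p>Numbering A, B, C, D as 0, 1, 2, 3, the graph above is <code>{{1, 2}, {3}, {3}, {}}</code>, and <code>max_concurrency</code> returns 2 for it: only {B, C} is a pair with no path between its members.</p>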