Commit b5aaea1

updated docs
1 parent a5d7cc9 commit b5aaea1

27 files changed: +89 -78 lines changed

docs/FAQ.html

Lines changed: 4 additions & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -124,7 +124,10 @@ <h2><a class="anchor" id="GeneralQuestion9"></a>
124 124
<p>There is a large amount of work on programming systems (e.g., StarPU, Intel TBB, OpenMP, PaRSEC, Kokkos, HPX) in the interest of simplifying the programming complexity of parallel and heterogeneous computing. Each of these systems has its own pros and cons and deserves a reason to exist. However, they do have some problems, particularly from the standpoint of ease of use, static control flow, and scheduling efficiency. Cpp-Taskflow addresses these limitations through a simple, expressive, and transparent graph programming model.</p>
125 125
<h2><a class="anchor" id="GeneralQuestion10"></a>
126 126
Q10: Do you try to simplify the GPU kernel programming?</h2>
127-
<p>No, we do not develop new programming models to simplify the kernel programming. The rationale is simple: Writing efficient kernels requires domain-specific knowledge and developers often require direct access to the native GPU programming interface. High-level kernel programming models or abstractions all come with restricted applicability. Despite non-trivial kernel programming, we believe what makes heterogeneous computing difficult are surrounding tasks. A mistake made by task scheduling can outweigh all speed-up benefits from a highly optimized kernel. Therefore, Cpp-Taskflow focuses on heterogeneous tasking that affects the overall system performance to a large extent. </p><hr/>
127+
<p>No, we do not develop new programming models to simplify the kernel programming. The rationale is simple: Writing efficient kernels requires domain-specific knowledge and developers often require direct access to the native GPU programming interface. High-level kernel programming models or abstractions all come with restricted applicability. Despite non-trivial kernel programming, we believe what makes heterogeneous computing difficult are surrounding tasks. A mistake made by task scheduling can outweigh all speed-up benefits from a highly optimized kernel. Therefore, Cpp-Taskflow focuses on heterogeneous tasking that affects the overall system performance to a large extent.</p>
128+
<h2><a class="anchor" id="GeneralQuestion11"></a>
129+
Q11: Do you have any real use cases?</h2>
130+
<p>We have applied Cpp-Taskflow to solve many realistic workloads and demonstrated promising performance scalability and programming productivity. Please refer to <a class="el" href="usecases.html">Real Use Cases</a> and <a class="el" href="References.html">References</a>. </p><hr/>
128 131
<h1><a class="anchor" id="ProgrammingQuestions"></a>
129 132
Programming Questions</h1>
130 133
<h2><a class="anchor" id="ProgrammingQuestions1"></a>

docs/UseCases.html

Lines changed: 1 addition & 1 deletion
@@ -69,7 +69,7 @@
69 69
</div>
70 70
<script type="text/javascript">
71 71
/* @license magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&amp;dn=gpl-2.0.txt GPL-v2 */
72-
$(document).ready(function(){initNavTree('UseCases.html','');});
72+
$(document).ready(function(){initNavTree('usecases.html','');});
73 73
/* @license-end */
74 74
</script>
75 75
<div id="doc-content">

docs/UseCases.js

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
1-
var UseCases =
1+
var usecases =
2 2
[
3 3
[ "Static Timing Analysis", "opentimer.html", [
4 4
[ "OpenTimer: A High-performance Timing Analysis Tool", "opentimer.html#UseCasesOpenTimer", null ],

docs/dreamplace.html

Lines changed: 6 additions & 6 deletions
@@ -107,7 +107,7 @@ <h1><a class="anchor" id="UseCasesDreamPlace"></a>
107 107
</ul>
108 108
<p>Each iteration contains overlapped CPU and GPU tasks with nested conditions to decide the convergence.</p>
109 109
<div class="image">
110-
<img src="dreamplace_2.png" alt="dreamplace_2.png" width="80%"/>
110+
<img src="dreamplace_2.png" alt="dreamplace_2.png" width="100%"/>
111 111
</div>
112 112
<h1><a class="anchor" id="UseCasesDreamPlaceProgrammingEffort"></a>
113 113
Programming Effort</h1>
@@ -130,19 +130,19 @@ <h1><a class="anchor" id="UseCasesDreamPlaceProgrammingEffort"></a>
130 130
Performance</h1>
131 131
<p>Using 8 CPUs and 1 GPU, Cpp-Taskflow is consistently faster than others across all problem sizes (placement iterations). The gap becomes clear at large problem size; at 100 iterations, Cpp-Taskflow is 17% faster than TBB and StarPU. We observed similar results using other CPU numbers. Performance saturates at about 16 CPUs, primarily due to the inherent irregularity of the placement algorithm.</p>
132 132
<div class="image">
133-
<img src="dreamplace_4.png" alt="dreamplace_4.png" width="60%"/>
133+
<img src="dreamplace_4.png" alt="dreamplace_4.png" width="70%"/>
134 134
</div>
135 135
<p>The memory footprint shows the benefit of our conditional tasking. We keep nearly no growth of memory when the problem size increases, whereas StarPU and TBB grow linearly due to unrolled task graphs. At a vertical scale, increasing the number of CPUs bumps up the memory usage of all methods, but the growth rate of Cpp-Taskflow is much slower than the others.</p>
136 136
<div class="image">
137-
<img src="dreamplace_5.png" alt="dreamplace_5.png" width="60%"/>
137+
<img src="dreamplace_5.png" alt="dreamplace_5.png" width="70%"/>
138 138
</div>
139 139
<p>In terms of energy, our scheduler is very power efficient in completing the placement workload, regardless of problem sizes and CPU numbers. Beyond 16 CPUs where performance saturates, our system does not suffer from increasing power as StarPU, due to our adaptive task scheduling algorithm.</p>
140 140
<div class="image">
141-
<img src="dreamplace_6.png" alt="dreamplace_6.png" width="60%"/>
141+
<img src="dreamplace_6.png" alt="dreamplace_6.png" width="70%"/>
142 142
</div>
143 143
<p>For irregular task graphs akin to this placement workload, resource utilization is critical to the system throughput. We corun the same program by up to nine processes that compete for 40 CPUs and 1 GPU. Corunning a placement program is very common for searching the best parameters for an algorithm. We plot the throughput using <em>weighted speed-up</em> across nine coruns at two problem sizes. Both Cpp-Taskflow and TBB achieve higher throughput than StarPU. At the largest problem size, Cpp-Taskflow outperforms TBB and StarPU across all coruns.</p>
144 144
<div class="image">
145-
<img src="dreamplace_7.png" alt="dreamplace_7.png" width="60%"/>
145+
<img src="dreamplace_7.png" alt="dreamplace_7.png" width="70%"/>
146 146
</div>
147 147
<h1><a class="anchor" id="UseCasesDreamPlaceConclusion"></a>
148 148
Conclusion</h1>
@@ -158,7 +158,7 @@ <h1><a class="anchor" id="UseCasesDreamPlaceReferences"></a>
158 158
<!-- start footer part -->
159 159
<div id="nav-path" class="navpath"><!-- id is needed for treeview function! -->
160 160
<ul>
161-
<li class="navelem"><a class="el" href="UseCases.html">Real Use Cases</a></li>
161+
<li class="navelem"><a class="el" href="usecases.html">Real Use Cases</a></li>
162 162
<li class="footer">Generated by
163 163
<a href="http://www.doxygen.org/index.html">
164 164
<img class="footer" src="doxygen.png" alt="doxygen"/></a> 1.8.14 </li>

docs/dreamplace_2.png

24.3 KB

docs/flipcoins.html

Lines changed: 2 additions & 2 deletions
Large diffs are not rendered by default.

docs/index.html

Lines changed: 2 additions & 2 deletions
@@ -96,7 +96,7 @@
96 96
Modern C++ Parallel Task Programming</h1>
97 97
<p>Cpp-Taskflow helps you quickly write parallel task programs with high performance scalability and simultaneous high productivity. It is faster, more expressive, fewer lines of code, and easier for drop-in integration than many of existing task programming frameworks.</p>
98 98
<div class="image">
99-
<img src="performance.png" alt="performance.png" width="95%"/>
99+
<img src="performance.png" alt="performance.png" width="100%"/>
100 100
</div>
101 101
<p>Cpp-Taskflow is committed to support both academic and industry research projects, making it reliable and cost-efficient to develop large-scale parallel applications. Our users say:</p>
102 102
<ul>
@@ -146,7 +146,7 @@ <h1><a class="anchor" id="HowToInstallCppTaskflow"></a>
146 146
<div class="fragment"><div class="line">~$ git clone https://github.com/cpp-taskflow/cpp-taskflow.git</div><div class="line">~$ cd cpp-taskflow/</div><div class="line">~$ cp -r taskflow myproject/include/</div></div><!-- fragment --><h1><a class="anchor" id="ASimpleFirstProgram"></a>
147 147
A Simple First Program</h1>
148 148
<p>Here is a rather simple program to get you started.</p>
149-
<div class="fragment"><div class="line"><span class="preprocessor">#include &lt;taskflow/taskflow.hpp&gt;</span> <span class="comment">// Cpp-Taskflow is header-only</span></div><div class="line"></div><div class="line"><span class="keywordtype">int</span> main(){</div><div class="line"> </div><div class="line"> <a class="code" href="classtf_1_1Executor.html">tf::Executor</a> executor;</div><div class="line"> <a class="code" href="classtf_1_1Taskflow.html">tf::Taskflow</a> taskflow;</div><div class="line"></div><div class="line"> <span class="keyword">auto</span> [A, B, C, D] = taskflow.<a class="code" href="classtf_1_1FlowBuilder.html#a796e29175380f70246cf2a5639adc437">emplace</a>(</div><div class="line"> [] () { <a class="codeRef" doxygen="/Users/twhuang/PhD/Code/cpp-taskflow/doxygen/cppreference-doxygen-web.tag.xml:http://en.cppreference.com/w/" href="http://en.cppreference.com/w/cpp/io/basic_ostream.html">std::cout</a> &lt;&lt; <span class="stringliteral">&quot;TaskA\n&quot;</span>; }, <span class="comment">// task dependency graph</span></div><div class="line"> [] () { <a class="codeRef" doxygen="/Users/twhuang/PhD/Code/cpp-taskflow/doxygen/cppreference-doxygen-web.tag.xml:http://en.cppreference.com/w/" href="http://en.cppreference.com/w/cpp/io/basic_ostream.html">std::cout</a> &lt;&lt; <span class="stringliteral">&quot;TaskB\n&quot;</span>; }, <span class="comment">// </span></div><div class="line"> [] () { <a class="codeRef" doxygen="/Users/twhuang/PhD/Code/cpp-taskflow/doxygen/cppreference-doxygen-web.tag.xml:http://en.cppreference.com/w/" href="http://en.cppreference.com/w/cpp/io/basic_ostream.html">std::cout</a> &lt;&lt; <span class="stringliteral">&quot;TaskC\n&quot;</span>; }, <span class="comment">// +---+ </span></div><div class="line"> [] () { <a class="codeRef" doxygen="/Users/twhuang/PhD/Code/cpp-taskflow/doxygen/cppreference-doxygen-web.tag.xml:http://en.cppreference.com/w/" 
href="http://en.cppreference.com/w/cpp/io/basic_ostream.html">std::cout</a> &lt;&lt; <span class="stringliteral">&quot;TaskD\n&quot;</span>; } <span class="comment">// +----&gt;| B |-----+ </span></div><div class="line"> ); <span class="comment">// | +---+ |</span></div><div class="line"> <span class="comment">// +---+ +-v-+ </span></div><div class="line"> A.precede(B); <span class="comment">// A runs before B // | A | | D | </span></div><div class="line"> A.precede(C); <span class="comment">// A runs before C // +---+ +-^-+ </span></div><div class="line"> B.precede(D); <span class="comment">// B runs before D // | +---+ | </span></div><div class="line"> C.precede(D); <span class="comment">// C runs before D // +----&gt;| C |-----+ </span></div><div class="line"> <span class="comment">// +---+ </span></div><div class="line"> executor.<a class="code" href="classtf_1_1Executor.html#a81f35d5b0a20ac0646447eb80d97c0aa">run</a>(taskflow).wait();</div><div class="line"></div><div class="line"> <span class="keywordflow">return</span> 0;</div><div class="line">}</div></div><!-- fragment --><p>The program creates four tasks A, B, C, and D. The dependency constraints force A to run before B and C, and D to run after B and C. The maximum concurrency is this example is two, where B and C can run at the same time.</p>
149+
<div class="fragment"><div class="line"><span class="preprocessor">#include &lt;taskflow/taskflow.hpp&gt;</span> <span class="comment">// Cpp-Taskflow is header-only</span></div><div class="line"></div><div class="line"><span class="keywordtype">int</span> main(){</div><div class="line"> </div><div class="line"> <a class="code" href="classtf_1_1Executor.html">tf::Executor</a> executor;</div><div class="line"> <a class="code" href="classtf_1_1Taskflow.html">tf::Taskflow</a> taskflow;</div><div class="line"></div><div class="line"> <span class="keyword">auto</span> [A, B, C, D] = taskflow.<a class="code" href="classtf_1_1FlowBuilder.html#a796e29175380f70246cf2a5639adc437">emplace</a>(</div><div class="line"> [] () { <a class="codeRef" doxygen="/Users/twhuang/PhD/Code/cpp-taskflow/doxygen/cppreference-doxygen-web.tag.xml:http://en.cppreference.com/w/" href="http://en.cppreference.com/w/cpp/io/basic_ostream.html">std::cout</a> &lt;&lt; <span class="stringliteral">&quot;TaskA\n&quot;</span>; }, <span class="comment">// task dependency graph</span></div><div class="line"> [] () { <a class="codeRef" doxygen="/Users/twhuang/PhD/Code/cpp-taskflow/doxygen/cppreference-doxygen-web.tag.xml:http://en.cppreference.com/w/" href="http://en.cppreference.com/w/cpp/io/basic_ostream.html">std::cout</a> &lt;&lt; <span class="stringliteral">&quot;TaskB\n&quot;</span>; }, <span class="comment">// </span></div><div class="line"> [] () { <a class="codeRef" doxygen="/Users/twhuang/PhD/Code/cpp-taskflow/doxygen/cppreference-doxygen-web.tag.xml:http://en.cppreference.com/w/" href="http://en.cppreference.com/w/cpp/io/basic_ostream.html">std::cout</a> &lt;&lt; <span class="stringliteral">&quot;TaskC\n&quot;</span>; }, <span class="comment">// +---+ </span></div><div class="line"> [] () { <a class="codeRef" doxygen="/Users/twhuang/PhD/Code/cpp-taskflow/doxygen/cppreference-doxygen-web.tag.xml:http://en.cppreference.com/w/" 
href="http://en.cppreference.com/w/cpp/io/basic_ostream.html">std::cout</a> &lt;&lt; <span class="stringliteral">&quot;TaskD\n&quot;</span>; } <span class="comment">// +----&gt;| B |-----+ </span></div><div class="line"> ); <span class="comment">// | +---+ |</span></div><div class="line"> <span class="comment">// +---+ +-v-+ </span></div><div class="line"> A.<a class="code" href="classtf_1_1Task.html#a8c78c453295a553c1c016e4062da8588">precede</a>(B); <span class="comment">// A runs before B // | A | | D | </span></div><div class="line"> A.<a class="code" href="classtf_1_1Task.html#a8c78c453295a553c1c016e4062da8588">precede</a>(C); <span class="comment">// A runs before C // +---+ +-^-+ </span></div><div class="line"> B.<a class="code" href="classtf_1_1Task.html#a8c78c453295a553c1c016e4062da8588">precede</a>(D); <span class="comment">// B runs before D // | +---+ | </span></div><div class="line"> C.<a class="code" href="classtf_1_1Task.html#a8c78c453295a553c1c016e4062da8588">precede</a>(D); <span class="comment">// C runs before D // +----&gt;| C |-----+ </span></div><div class="line"> <span class="comment">// +---+ </span></div><div class="line"> executor.<a class="code" href="classtf_1_1Executor.html#a81f35d5b0a20ac0646447eb80d97c0aa">run</a>(taskflow).wait();</div><div class="line"></div><div class="line"> <span class="keywordflow">return</span> 0;</div><div class="line">}</div></div><!-- fragment --><p>The program creates four tasks A, B, C, and D. The dependency constraints force A to run before B and C, and D to run after B and C. The maximum concurrency is this example is two, where B and C can run at the same time.</p>
150 150
<div class="image">
151 151
<object type="image/svg+xml" data="simple.svg" width="35%">simple.svg</object>
152 152
</div>
