<!-- HTML header for doxygen 1.8.13-->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/xhtml;charset=UTF-8"/>
<meta http-equiv="X-UA-Compatible" content="IE=9"/>
<meta name="generator" content="Doxygen 1.8.14"/>
<meta name="viewport" content="width=device-width, initial-scale=1"/>
<title>Cpp-Taskflow</title>
<link href="tabs.css" rel="stylesheet" type="text/css"/>
<link rel="icon" type="image/x-icon" href="favicon.ico" />
<script type="text/javascript" src="jquery.js"></script>
<script type="text/javascript" src="dynsections.js"></script>
<link href="navtree.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="resize.js"></script>
<script type="text/javascript" src="navtreedata.js"></script>
<script type="text/javascript" src="navtree.js"></script>
<script type="text/javascript">
/* @license magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt GPL-v2 */
$(document).ready(initResizable);
/* @license-end */</script>
<link href="search/search.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="search/searchdata.js"></script>
<script type="text/javascript" src="search/search.js"></script>
<script type="text/javascript">
/* @license magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt GPL-v2 */
$(document).ready(function() { init_search(); });
/* @license-end */
</script>
<link href="doxygen.css" rel="stylesheet" type="text/css" />
</head>
<body>
<div id="top"><!-- do not remove this div, it is closed by doxygen! -->
<div id="titlearea">
<table cellspacing="0" cellpadding="0">
<tbody>
<tr style="height: 56px;">
<td id="projectalign" style="padding-left: 0.5em;">
<div id="projectname"><a href="https://github.com/cpp-taskflow/cpp-taskflow">Cpp-Taskflow</a>
 <span id="projectnumber">2.3.0</span>
</div>
</td>
<td> <div id="MSearchBox" class="MSearchBoxInactive">
<span class="left">
<img id="MSearchSelect" src="search/mag_sel.png"
onmouseover="return searchBox.OnSearchSelectShow()"
onmouseout="return searchBox.OnSearchSelectHide()"
alt=""/>
<input type="text" id="MSearchField" value="Search" accesskey="S"
onfocus="searchBox.OnSearchFieldFocus(true)"
onblur="searchBox.OnSearchFieldFocus(false)"
onkeyup="searchBox.OnSearchFieldChange(event)"/>
</span><span class="right">
<a id="MSearchClose" href="javascript:searchBox.CloseResultsWindow()"><img id="MSearchCloseImg" border="0" src="search/close.png" alt=""/></a>
</span>
</div>
</td>
</tr>
</tbody>
</table>
</div>
<!-- end header part -->
<!-- Generated by Doxygen 1.8.14 -->
<script type="text/javascript">
/* @license magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt GPL-v2 */
var searchBox = new SearchBox("searchBox", "search",false,'Search');
/* @license-end */
</script>
</div><!-- top -->
<div id="side-nav" class="ui-resizable side-nav-resizable">
<div id="nav-tree">
<div id="nav-tree-contents">
<div id="nav-sync" class="sync"></div>
</div>
</div>
<div id="splitbar" style="-moz-user-select:none;"
class="ui-resizable-handle">
</div>
</div>
<script type="text/javascript">
/* @license magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt GPL-v2 */
$(document).ready(function(){initNavTree('chapter6.html','');});
/* @license-end */
</script>
<div id="doc-content">
<!-- window showing the filter options -->
<div id="MSearchSelectWindow"
onmouseover="return searchBox.OnSearchSelectShow()"
onmouseout="return searchBox.OnSearchSelectHide()"
onkeydown="return searchBox.OnSearchSelectKey(event)">
</div>
<!-- iframe showing the search results (closed by default) -->
<div id="MSearchResultsWindow">
<iframe src="javascript:void(0)" frameborder="0"
name="MSearchResults" id="MSearchResults">
</iframe>
</div>
<div class="header">
<div class="headertitle">
<div class="title">C6: CPU-GPU Tasking </div> </div>
</div><!--header-->
<div class="contents">
<div class="textblock"><p>Modern scientific computing typically leverages GPU-powered parallel processing cores to speed up large-scale applications. This chapter discusses how to implement heterogeneous decomposition algorithms using CPU-GPU collaborative tasking.</p>
<h1><a class="anchor" id="C6_Create_a_cudaFlow"></a>
Create a cudaFlow</h1>
<p>Cpp-Taskflow enables concurrent CPU-GPU tasking by leveraging <a href="https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__GRAPH.html">NVIDIA CUDA Graph</a>. The tasking interface is referred to as <em>cudaFlow</em>. A <a class="el" href="classtf_1_1cudaFlow.html" title="Building methods of a cuda task dependency graph. ">tf::cudaFlow</a> is a graph object created at runtime, similar to dynamic tasking. It manages a task node in a taskflow and associates it with a CUDA Graph. To create a cudaFlow, emplace a callable that takes an argument of type <a class="el" href="classtf_1_1cudaFlow.html" title="Building methods of a cuda task dependency graph. ">tf::cudaFlow</a>. The following example implements the canonical saxpy (single-precision A·X Plus Y) task graph.</p>
<div class="fragment"><div class="line"> 1: #include <taskflow/taskflow.hpp></div><div class="line"> 2: </div><div class="line"> 3: <span class="comment">// saxpy (single-precision A·X Plus Y) kernel</span></div><div class="line"> 4: __global__ <span class="keywordtype">void</span> saxpy(<span class="keywordtype">int</span> n, <span class="keywordtype">float</span> a, <span class="keywordtype">float</span> *x, <span class="keywordtype">float</span> *y) {</div><div class="line"> 5: <span class="keywordtype">int</span> i = blockIdx.x*blockDim.x + threadIdx.x;</div><div class="line"> 6: <span class="keywordflow">if</span> (i < n) {</div><div class="line"> 7: y[i] = a*x[i] + y[i];</div><div class="line"> 8: }</div><div class="line"> 9: }</div><div class="line">10:</div><div class="line">11: <span class="comment">// main function begins</span></div><div class="line">12: <span class="keywordtype">int</span> main() {</div><div class="line">13:</div><div class="line">14: <a class="code" href="classtf_1_1Taskflow.html">tf::Taskflow</a> taskflow;</div><div class="line">15: <a class="code" href="classtf_1_1Executor.html">tf::Executor</a> executor;</div><div class="line">16: </div><div class="line">17: <span class="keyword">const</span> <span class="keywordtype">unsigned</span> N = 1<<20; <span class="comment">// size of the vector</span></div><div class="line">18:</div><div class="line">19: <a class="codeRef" doxygen="/Users/twhuang/PhD/Code/cpp-taskflow/doxygen/cppreference-doxygen-web.tag.xml:http://en.cppreference.com/w/" href="http://en.cppreference.com/w/cpp/container/vector.html">std::vector<float></a> hx(N, 1.0f); <span class="comment">// x vector at host</span></div><div class="line">20: <a class="codeRef" doxygen="/Users/twhuang/PhD/Code/cpp-taskflow/doxygen/cppreference-doxygen-web.tag.xml:http://en.cppreference.com/w/" href="http://en.cppreference.com/w/cpp/container/vector.html">std::vector<float></a> hy(N, 2.0f); <span class="comment">// y vector at 
host</span></div><div class="line">21:</div><div class="line">22: <span class="keywordtype">float</span> *dx{<span class="keyword">nullptr</span>}; <span class="comment">// x vector at device</span></div><div class="line">23: <span class="keywordtype">float</span> *dy{<span class="keyword">nullptr</span>}; <span class="comment">// y vector at device</span></div><div class="line">24: </div><div class="line">25: <a class="code" href="classtf_1_1Task.html">tf::Task</a> allocate_x = taskflow.<a class="code" href="classtf_1_1FlowBuilder.html#a796e29175380f70246cf2a5639adc437">emplace</a>(</div><div class="line">26: [&](){ cudaMalloc(&dx, N*<span class="keyword">sizeof</span>(<span class="keywordtype">float</span>));}</div><div class="line">27: );</div><div class="line">28:</div><div class="line">29: <a class="code" href="classtf_1_1Task.html">tf::Task</a> allocate_y = taskflow.<a class="code" href="classtf_1_1FlowBuilder.html#a796e29175380f70246cf2a5639adc437">emplace</a>(</div><div class="line">30: [&](){ cudaMalloc(&dy, N*<span class="keyword">sizeof</span>(<span class="keywordtype">float</span>));}</div><div class="line">31: );</div><div class="line">32:</div><div class="line">33: <a class="code" href="classtf_1_1Task.html">tf::Task</a> cudaflow = taskflow.<a class="code" href="classtf_1_1FlowBuilder.html#a796e29175380f70246cf2a5639adc437">emplace</a>([&](<a class="code" href="classtf_1_1cudaFlow.html">tf::cudaFlow</a>& cf) {</div><div class="line">34: <span class="comment">// create data transfer tasks</span></div><div class="line">35: <a class="code" href="classtf_1_1cudaTask.html">tf::cudaTask</a> h2d_x = cf.<a class="code" href="classtf_1_1cudaFlow.html#a0f36d1b7eb0bc77deb83966b394bd1d7">copy</a>(dx, hx.data(), N); <span class="comment">// host-to-device x data transfer</span></div><div class="line">36: <a class="code" href="classtf_1_1cudaTask.html">tf::cudaTask</a> h2d_y = cf.<a class="code" 
href="classtf_1_1cudaFlow.html#a0f36d1b7eb0bc77deb83966b394bd1d7">copy</a>(dy, hy.data(), N); <span class="comment">// host-to-device y data transfer</span></div><div class="line">37: <a class="code" href="classtf_1_1cudaTask.html">tf::cudaTask</a> d2h_x = cf.<a class="code" href="classtf_1_1cudaFlow.html#a0f36d1b7eb0bc77deb83966b394bd1d7">copy</a>(hx.data(), dx, N); <span class="comment">// device-to-host x data transfer</span></div><div class="line">38: <a class="code" href="classtf_1_1cudaTask.html">tf::cudaTask</a> d2h_y = cf.<a class="code" href="classtf_1_1cudaFlow.html#a0f36d1b7eb0bc77deb83966b394bd1d7">copy</a>(hy.data(), dy, N); <span class="comment">// device-to-host y data transfer</span></div><div class="line">39:</div><div class="line">40: <span class="comment">// launch saxpy<<<(N+255)/256, 256, 0>>>(N, 2.0f, dx, dy)</span></div><div class="line">41: <a class="code" href="classtf_1_1cudaTask.html">tf::cudaTask</a> kernel = cf.<a class="code" href="classtf_1_1cudaFlow.html#adb731be71bdd436dfb5e36e6213a9a17">kernel</a>((N+255)/256, 256, 0, saxpy, N, 2.0f, dx, dy);</div><div class="line">42:</div><div class="line">43: kernel.<a class="code" href="classtf_1_1cudaTask.html#a4a9ca1a34bac47e4c9b04eb4fb2f7775">succeed</a>(h2d_x, h2d_y)</div><div class="line">44: .<a class="code" href="classtf_1_1cudaTask.html#abdd68287ec4dff4216af34d1db44d1b4">precede</a>(d2h_x, d2h_y);</div><div class="line">45: });</div><div class="line">46: cudaflow.<a class="code" href="classtf_1_1Task.html#a331b1b726555072e7c7d10941257f664">succeed</a>(allocate_x, allocate_y); <span class="comment">// overlap data allocations</span></div><div class="line">47: </div><div class="line">48: executor.<a class="code" href="classtf_1_1Executor.html#a81f35d5b0a20ac0646447eb80d97c0aa">run</a>(taskflow).wait();</div><div class="line">49:</div><div class="line">50: taskflow.<a class="code" href="classtf_1_1Taskflow.html#ac433018262e44b12c4cc9f0c4748d758">dump</a>(<a class="codeRef" 
doxygen="/Users/twhuang/PhD/Code/cpp-taskflow/doxygen/cppreference-doxygen-web.tag.xml:http://en.cppreference.com/w/" href="http://en.cppreference.com/w/cpp/io/basic_ostream.html">std::cout</a>); <span class="comment">// dump the taskflow</span></div><div class="line">51: }</div></div><!-- fragment --><div class="image">
<object type="image/svg+xml" data="saxpy.svg" width="50%">saxpy.svg</object>
</div>
<p>Debrief:</p>
<ul>
<li>Lines 3-9 define a saxpy kernel using CUDA </li>
<li>Lines 19-20 declare two host vectors, <code>hx</code> and <code>hy</code> </li>
<li>Lines 22-23 declare two device vector pointers, <code>dx</code> and <code>dy</code> </li>
<li>Lines 25-31 declare two tasks to allocate memory for <code>dx</code> and <code>dy</code> on device, each of <code>N*sizeof(float)</code> bytes </li>
<li>Lines 33-45 create a <a class="el" href="classtf_1_1cudaFlow.html" title="Building methods of a cuda task dependency graph. ">tf::cudaFlow</a> to capture kernel work in a graph (two host-to-device data transfer tasks, one saxpy kernel task, and two device-to-host data transfer tasks) </li>
<li>Lines 46-48 define the task dependency between the two host allocation tasks and the cudaFlow task, then execute the taskflow </li>
<li>Line 50 dumps the taskflow to <code>std::cout</code></li>
</ul>
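<p>The dependencies described in the debrief can be sketched in plain C++ to see which tasks are allowed to overlap. This is an illustrative host-only sketch, not part of the Cpp-Taskflow API: the names <code>saxpy_graph_edges</code> and <code>dependency_levels</code> are hypothetical, and the cudaFlow is flattened under the assumption that both allocation tasks must finish before either host-to-device copy starts.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <queue>
#include <utility>
#include <vector>

// Node ids (hypothetical, for illustration only):
// 0 allocate_x, 1 allocate_y, 2 h2d_x, 3 h2d_y, 4 kernel, 5 d2h_x, 6 d2h_y
std::vector<std::pair<int, int>> saxpy_graph_edges() {
  return {
    {0, 2}, {0, 3}, {1, 2}, {1, 3},  // cudaflow.succeed(allocate_x, allocate_y), flattened
    {2, 4}, {3, 4},                  // kernel.succeed(h2d_x, h2d_y)
    {4, 5}, {4, 6}                   // kernel.precede(d2h_x, d2h_y)
  };
}

// Returns, for each node, the length of the longest chain of predecessors.
// Nodes that share a level have no path between them in this kind of graph
// traversal order and are therefore candidates for concurrent execution.
std::vector<int> dependency_levels(
    std::size_t n, const std::vector<std::pair<int, int>>& edges) {
  std::vector<std::vector<int>> successors(n);
  std::vector<int> pending(n, 0);  // number of unfinished predecessors
  for (const auto& e : edges) {
    assert(e.first >= 0 && static_cast<std::size_t>(e.first) < n);
    assert(e.second >= 0 && static_cast<std::size_t>(e.second) < n);
    successors[e.first].push_back(e.second);
    ++pending[e.second];
  }

  std::vector<int> level(n, 0);
  std::queue<int> ready;  // nodes whose predecessors have all been visited
  for (std::size_t v = 0; v < n; ++v) {
    if (pending[v] == 0) ready.push(static_cast<int>(v));
  }
  while (!ready.empty()) {
    const int u = ready.front();
    ready.pop();
    for (int v : successors[u]) {
      level[v] = std::max(level[v], level[u] + 1);
      if (--pending[v] == 0) ready.push(v);
    }
  }
  return level;
}
```

<p>Calling <code>dependency_levels(7, saxpy_graph_edges())</code> gives levels 0, 0, 1, 1, 2, 3, 3 in the node order listed above. Under this flattening, the two allocations, the two host-to-device copies, and the two device-to-host copies are the pairs that the graph permits to run concurrently; whether they actually overlap at runtime depends on the executor and the device.</p>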
<p>Cpp-Taskflow does not expend unnecessary effort on kernel programming but focuses on tasking CUDA operations together with CPU work. We give users full control to craft CUDA kernels that are commensurate with their domain knowledge. Users focus on developing high-performance kernels using the native CUDA toolkit, while leaving the difficult task parallelism to Cpp-Taskflow. </p>
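<p>Because kernel development stays with the user, it can help to check a kernel's launch arithmetic and results on the host. The following is a minimal sketch under stated assumptions: <code>num_blocks</code> and <code>saxpy_emulated</code> are hypothetical helper names, not Cpp-Taskflow or CUDA functions, and the emulation simply replays the kernel's index computation and bounds guard in sequential loops.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Ceiling division: the smallest block count with blocks * block_size >= n.
// This is the same arithmetic as (N+255)/256 in the listing above.
// Assumes n + block_size - 1 does not overflow unsigned.
inline unsigned num_blocks(unsigned n, unsigned block_size) {
  assert(block_size > 0);
  return (n + block_size - 1) / block_size;
}

// Sequential emulation of the saxpy kernel: every (block, thread) pair
// computes a global index i and updates y[i] only if i is in range.
// The guard matters when n is not a multiple of the block size, because
// the last block then contains threads whose index falls past the end.
inline void saxpy_emulated(unsigned grid, unsigned block, unsigned n,
                           float a, const float* x, float* y) {
  for (unsigned b = 0; b < grid; ++b) {      // plays the role of blockIdx.x
    for (unsigned t = 0; t < block; ++t) {   // plays the role of threadIdx.x
      const unsigned i = b * block + t;
      if (i < n) {
        y[i] = a * x[i] + y[i];
      }
    }
  }
}
```

<p>With the values used in this chapter (<code>a = 2.0f</code>, every element of <code>x</code> equal to <code>1.0f</code> and of <code>y</code> equal to <code>2.0f</code>), each element of <code>y</code> becomes <code>4.0f</code>, which gives a simple value to compare against after the device-to-host transfer. For <code>N = 1&lt;&lt;20</code> and a block size of 256, <code>num_blocks</code> returns 4096.</p>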
</div></div><!-- contents -->
</div><!-- doc-content -->
<!-- start footer part -->
<div id="nav-path" class="navpath"><!-- id is needed for treeview function! -->
<ul>
<li class="footer">Generated by
<a href="http://www.doxygen.org/index.html">
<img class="footer" src="doxygen.png" alt="doxygen"/></a> 1.8.14 </li>
</ul>
</div>
</body>
</html>