Merge remote-tracking branch 'refs/remotes/origin/dev' into dev

tsung-wei-huang · tsung-wei-huang · commit bedb18a72fa9 · 2018-10-18T16:54:05.000-05:00
diff --git a/doc/app/wavefront.md b/doc/app/wavefront.md
@@ -0,0 +1,141 @@
+# Wavefront
+
+On this page, we compare three implementations of the wavefront computing pattern using OpenMP, Intel-TBB, and Cpp-Taskflow.
+The wavefront computing pattern is taken from a blog post in the [Intel Developer Zone].
+
+![](wavefront.png)
+
+As shown in the figure, we partition a 2D matrix into a set of identical square submatrices (blocks).
+Each block is mapped to a task that scans linearly through its elements and
+applies an arithmetic calculation to each. The wavefront propagates the block dependency diagonally
+from the top-left block to the bottom-right block. Each block precedes up to two blocks, one to the
+right and one below; blocks on the right or bottom edge have fewer successors. Blocks with the same color can run concurrently.
+
+
++ [OpenMP](#openmp)
++ [Intel-TBB](#intel-tbb)
++ [Cpp-Taskflow](#cpp-taskflow)
+
+
+# OpenMP 
+
+```cpp
+// MB, NB: number of blocks in the two dimensions. B: dimension of a block
+// matrix: the given 2D matrix
+// D: dependency matrix; its entries serve only as dependency tokens
+// Requires <omp.h> and <thread>; block_computation is defined elsewhere.
+void wavefront(size_t MB, size_t NB, size_t B, double** matrix, int** D){
+  omp_set_num_threads(std::thread::hardware_concurrency());
+  #pragma omp parallel
+  {
+    // A single thread creates all tasks; the whole team executes them.
+    #pragma omp single
+    {
+      for(size_t i=0; i<MB; i++){
+        for(size_t j=0; j<NB; j++) {
+          // Interior block: depends on the blocks above and to the left.
+          // This case also covers the bottom-right corner.
+          if(i > 0 && j > 0){
+            #pragma omp task depend(in:D[i-1][j], D[i][j-1]) depend(out:D[i][j]) firstprivate(i, j)
+            block_computation(matrix, B, i, j);
+          }
+          // Top-left corner: no dependency
+          else if(i == 0 && j == 0){
+            #pragma omp task depend(out:D[i][j]) firstprivate(i, j)
+            block_computation(matrix, B, i, j);
+          }
+          // Top edge: depends on the block to the left
+          else if(i == 0){
+            #pragma omp task depend(in:D[i][j-1]) depend(out:D[i][j]) firstprivate(i, j)
+            block_computation(matrix, B, i, j);
+          }
+          // Left edge: depends on the block above
+          else{
+            #pragma omp task depend(in:D[i-1][j]) depend(out:D[i][j]) firstprivate(i, j)
+            block_computation(matrix, B, i, j);
+          }
+        } // End of inner loop
+      }  // End of outer loop
+    } // End of omp single
+    // All tasks are complete once the team passes the barrier ending the region.
+  } // End of omp parallel
+}
+```
+
+This function shows the wavefront computation implemented using OpenMP. Each
+block is delegated to an OpenMP task. For each task, we must explicitly specify both the
+input and output dependencies, and an additional dependency matrix `D` is
+created for this purpose.
+
+
+# Intel-TBB
+
+```cpp 
+#include <tbb/flow_graph.h>
+using namespace tbb;
+using namespace tbb::flow;
+// MB, NB: number of blocks in the two dimensions. B: dimension of a block
+// matrix: the given 2D matrix
+// nodes: the nodes in flow graph (allocated here; the caller must delete them)
+// G: Intel-TBB flow graph
+void wavefront(size_t MB, size_t NB, size_t B, double** matrix, continue_node<continue_msg> ***nodes, graph& G){
+  for(int i=MB; --i>=0;) {
+    for(int j=NB; --j>=0;) {
+      nodes[i][j] = new continue_node<continue_msg>(G,
+        [=](const continue_msg&) {
+          block_computation(matrix, B, i, j);
+      });
+      if(i+1 < MB) {
+        make_edge(*nodes[i][j], *nodes[i+1][j]);
+      }
+      if(j+1 < NB) {
+        make_edge(*nodes[i][j], *nodes[i][j+1]);
+      }
+    } // End of inner loop
+  } // End of outer loop
+
+  nodes[0][0]->try_put(continue_msg());  // Trigger the top-left block
+  G.wait_for_all();
+}
+```
+
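+This function shows the wavefront computation implemented using the Intel-TBB flow graph. We
+build a dependency graph using the `continue_node` type in the TBB flow graph and delegate
+each block to a node. The loops run from the bottom-right block backward so that both successors of a node exist before `make_edge` connects them.
+The `make_edge` function specifies the dependency between two nodes, `try_put` on the top-left node starts the computation, and `wait_for_all` blocks until all computations complete.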
+
+# Cpp-Taskflow
+
+```cpp 
+// MB, NB: number of blocks in the two dimensions. B: dimension of a block
+// matrix: the given 2D matrix
+// tasks: the placeholders for tasks in Taskflow
+// tf: Taskflow object (from taskflow/taskflow.hpp)
+void wavefront(size_t MB, size_t NB, size_t B, double** matrix, std::vector<std::vector<tf::Task>>& tasks, tf::Taskflow& tf){
+  for(int i=MB; --i>=0;) {
+    for(int j=NB; --j>=0;) {
+      tasks[i][j].work([=]() {
+        block_computation(matrix, B, i, j);
+      });
+      if(j+1 < NB) {
+        tasks[i][j].precede(tasks[i][j+1]);
+      }
+      if(i+1 < MB) {
+        tasks[i][j].precede(tasks[i+1][j]);
+      }
+    } // End of inner loop
+  } // End of outer loop
+
+  tf.wait_for_all();
+}
+```
+This function shows the wavefront computation implemented using Cpp-Taskflow. We
+delegate each block to a `tf::Task` and use the `precede` function to specify
+the dependency between tasks. The call to `tf.wait_for_all()` blocks until all tasks
+have executed.
+
+
+* * *
+
+[GraphvizOnline]:        https://dreampuf.github.io/GraphvizOnline/
+[Intel Developer Zone]:  https://software.intel.com/en-us/blogs/2011/09/09/implementing-a-wave-front-computation-using-the-intel-threading-building-blocks-flow-graph
diff --git a/doc/app/wavefront.png b/doc/app/wavefront.png
diff --git a/doc/cookbook/dynamic_tasking.md b/doc/cookbook/dynamic_tasking.md
@@ -6,9 +6,9 @@ In Cpp-Taskflow, we call this *dynamic tasking*.
 In this tutorial, we are going to demonstrate how to enable dynamic tasking
 in Cpp-Taskflow.
 
-+ [Subflow Dependency Graph](#Subflow-Dependency-Graph)
-+ [Detach a Subflow Dependency Graph](#Detach-a-Subflow-Dependency-Graph)
-+ [Nested Subflow](#Nested-Subflow)
++ [Subflow Dependency Graph](#subflow-dependency-graph)
++ [Detach a Subflow Dependency Graph](#detach-a-subflow-dependency-graph)
++ [Nested Subflow](#nested-subflow)
 
 # Subflow Dependency Graph
 
diff --git a/doc/cookbook/hello_world.md b/doc/cookbook/hello_world.md
@@ -3,9 +3,9 @@
 In this tutorial, we are going to demonstrate how to write a Cpp-Taskflow's
 "hello world" program.
 
-+ [Set up Cpp-Taskflow](#Set-up-Cpp-Taskflow)
-+ [Create a Simple Taskflow Graph](#Create-a-Simple-Taskflow-Graph)
-+ [Compile and Run](#Compile-and-Run)
++ [Set up Cpp-Taskflow](#set-up-cpp-taskflow)
++ [Create a Simple Taskflow Graph](#create-a-simple-taskflow-graph)
++ [Compile and Run](#compile-and-run)
 
 # Set up Cpp-Taskflow
 
diff --git a/doc/cookbook/parallel_for.md b/doc/cookbook/parallel_for.md
@@ -3,10 +3,10 @@
 In this tutorial, we are going to demonstrate how to use Cpp-Taskflow
 to run a for loop in parallel.
 
-+ [Range-based For Loop](#Range-based-For-Loop)
-+ [Index-based For Loop](#Index-based-For-Loop)
-+ [Example 1: Parallel Map](#Example-1-Parallel-Map)
-+ [Example 2: Pipeline a Parallel For](#Example-2-Pipeline-a-Parallel-For)
++ [Range-based For Loop](#range-based-for-loop)
++ [Index-based For Loop](#index-based-for-loop)
++ [Example 1: Parallel Map](#example-1-parallel-map)
++ [Example 2: Pipeline a Parallel For](#example-2-pipeline-a-parallel-for)
 
 # Range-based For Loop
 
diff --git a/doc/cookbook/reduce.md b/doc/cookbook/reduce.md
@@ -5,11 +5,11 @@ through particular operations, for instance, sum.
 In this example, we are going to demonstrate how to use Cpp-Taskflow
 to parallelize a reduction workload.
 
-+ [Reduce](#Reduce-through-an-Operator)
-+ [Transform and Reduce](#Transform-and-Reduce)
-+ [Example 1: Find the Min/Max Element](#Example-1-Find-the-Min-Max-Element)
-+ [Example 2: Pipeline a Reduction Graph](#Example-2-Pipeline-a-Reduction-Graph)
-+ [Example 3: Find the Minimum L1-norm](#Example-3-Find-the-Minimum-L1-norm)
++ [Reduce](#reduce-through-an-operator)
++ [Transform and Reduce](#transform-and-reduce)
++ [Example 1: Find the Min/Max Element](#example-1-find-the-min-max-element)
++ [Example 2: Pipeline a Reducer Graph](#example-2-pipeline-a-reducer-graph)
++ [Example 3: Find the Minimum L1-norm](#example-3-find-the-minimum-l1-norm)
 
 # Reduce
 
diff --git a/doc/cookbook/task.md b/doc/cookbook/task.md
@@ -3,11 +3,11 @@
 In this tutorial, we are going to demonstrate the basic construct of 
 a task dependency graph - *Task*.
 
-+ [Create a Task](#Create-a-Task)
-+ [Access the Result of a Task](#Access-the-Result-of-a-Task)
-+ [Create Multiple Tasks at One Time](#Create-Multiple-Tasks-at-One-Time)
-+ [Example 1: Create Multiple Dependency Graphs](#Example-1-Create-Multiple-Dependency-Graphs)
-+ [Example 2: Modify Task Attributes](#Example-2-Modify-Task-Attributes)
++ [Create a Task](#create-a-task)
++ [Access the Result of a Task](#access-the-result-of-a-task)
++ [Create Multiple Tasks at One Time](#create-multiple-tasks-at-one-time)
++ [Example 1: Create Multiple Dependency Graphs](#example-1-create-multiple-dependency-graphs)
++ [Example 2: Modify Task Attributes](#example-2-modify-task-attributes)
 
 # What is a Task?
 
diff --git a/doc/home.md b/doc/home.md
@@ -19,11 +19,8 @@ software architecture, C++ API, and library usages.
 + [Reduce a Container of Items in Parallel](cookbook/reduce.md)
 + [Spawn a Task Dependency Graph at Runtime](cookbook/dynamic_tasking.md)
 
-<!--# Application Programming Interface (API)
-
-+ Task
-+ Taskflow
--->
+# Application 
++ [Wavefront pattern](app/wavefront.md)
 
 # Get More Help
 
diff --git a/example/simple.cpp b/example/simple.cpp
@@ -20,7 +20,7 @@ int main(){
   A.precede(C);  // C runs after A         //    |     +---+     |            
   B.precede(D);  // D runs after B         //    +---->| C |-----+            
   C.precede(D);  // D runs after C         //          +---+                  
-                                                     
+
   tf.wait_for_all();  // block until finished
 
   return 0;
diff --git a/example/taskflow.cpp b/example/taskflow.cpp
diff --git a/taskflow/taskflow.hpp b/taskflow/taskflow.hpp