updated core

Tsung-Wei Huang · Tsung-Wei Huang · commit 3a7e4ab476b2 · 2018-06-30T12:44:27.000-05:00
diff --git a/README.md b/README.md
@@ -213,7 +213,8 @@ The table below summarizes its commonly used methods.
 | placeholder     | none        | task         | insert a node without any work; work can be assigned later |
 | linearize       | task list   | none         | create a linear dependency in the given task list |
 | parallel_for    | beg, end, callable, group | task pair | apply the callable in parallel and group-by-group to the result of dereferencing every iterator in the range | 
-| reduce | beg, end, res, op, group | task pair | apply a binary operator group-by-group to reduce a range of elements to a single result | 
+| reduce | beg, end, res, bop, group | task pair | apply a binary operator group-by-group to reduce a range of elements to a single result | 
+| transform_reduce | beg, end, res, bop, uop, group | task pair | apply a unary operator to each element in the range and reduce the returns to a single result group-by-group through a binary operator | 
 | dispatch        | none        | future | dispatch the current graph and return a shared future to block on completeness |
 | silent_dispatch | none        | none | dispatch the current graph | 
 | wait_for_all    | none        | none | dispatch the current graph and block until all graphs including previously dispatched ones finish |
@@ -279,13 +280,13 @@ auto [S, T] = tf.parallel_for(
   v.end(),      // end of range
   [] (int i) { 
     std::cout << "parallel in " << i << '\n';
-  }
+  },
+  1  // execute one task at a time
 );
-
 // add dependencies via S and T.
 ```
 
-By default, the group size is 1. Changing the group size can force intra-group tasks to run sequentially
+Changing the group size can force intra-group tasks to run sequentially
 and inter-group tasks to run in parallel.
 Depending on applications, different group sizes can result in significant performance hit.
 
@@ -304,13 +305,13 @@ auto [S, T] = tf.parallel_for(
 );
 ```
 
-### *reduce*
+### *reduce/transform_reduce*
 
 The method `reduce` creates a subgraph that applies a binary operator to a range of items in a container.
 The result will be stored in the referenced `res` object passed to the method. 
 It is your responsibility to assign it a correct initial value to reduce.
 
-<img align="right" width="50%" src="image/reduce.png">
+<img align="right" width="45%" src="image/reduce.png">
 
 ```cpp
 auto v = {1, 2, 3, 4};
@@ -319,13 +320,13 @@ auto [S, T] = tf.reduce(
   v.begin(),    // beg of range
   v.end(),      // end of range
   sum,          // pass by reference
-  std::plus<int>()
+  std::plus<int>(),
+  1             // execute one task at a time
 );
-
 // add dependencies via S and T.
 ```
 
-By default, the group size is 1. Changing the group size can force intra-group tasks to run sequentially
+Changing the group size can force intra-group tasks to run sequentially
 and inter-group tasks to run in parallel.
 Depending on applications, different group sizes can result in significant performance hit.
 
@@ -339,6 +340,21 @@ auto [S, T] = tf.reduce(
 );  
 ```
 
+The method `transform_reduce` is similar to reduce, except it applies a unary operator before reduction.
+This is particular useful when you need additional data processing to reduce a range of elements.
+
+```cpp
+auto v = { {1, 5}, {6, 4}, {-6, 4} };
+int min = std::numeric_limits<int>::max();
+auto [S, T] = tf.transform_reduce(v.begin(), v.end(), min, 
+  [] (int l, int r)  { return std::min(l, r); },
+  [] (const Data& d) { return a*a + 2*a*b + b*b; }
+);
+```
+
+All reduce methods have overloads of no group size, 
+in which the workload is evenly partitioned across threads.
+
 ### *dispatch/silent_dispatch/wait_for_all*
 Dispatching a taskflow graph will schedule threads to execute the current graph and return immediately.
 The method `dispatch` gives you a future object to probe the execution progress while
@@ -445,6 +461,7 @@ To use Cpp-Taskflow, you only need a C++17 compiler:
 # Compile Unit Tests and Examples
 Cpp-Taskflow uses [CMake](https://cmake.org/) to build examples and unit tests.
 We recommend using out-of-source build.
+
 ```bash
 ~$ cmake --version  # must be at least 3.9 or higher
 ~$ mkdir build
diff --git a/example/parallel_for.cpp b/example/parallel_for.cpp
@@ -32,7 +32,7 @@ void taskflow(int N) {
   tf::Taskflow tf;
   tf.parallel_for(range, [&] (const int i) { 
     printf("fib[%d]=%d\n", i, fib(i));
-  }, 1);
+  });
   tf.wait_for_all();
 
   auto tend = std::chrono::steady_clock::now();
diff --git a/image/reduce.png b/image/reduce.png
diff --git a/taskflow.hpp b/taskflow.hpp
@@ -93,7 +93,7 @@ template <typename T>
 inline constexpr bool is_iterable_v = is_iterable<T>::value;
 
 //-------------------------------------------------------------------------------------------------
-// Threadpool definition
+// Utility
 //-------------------------------------------------------------------------------------------------
 
 // Struct: MoveOnCopy
@@ -111,7 +111,9 @@ struct MoveOnCopy {
 template <typename T>
 MoveOnCopy(T&&) -> MoveOnCopy<T>;
 
-// ------------------------------------------------------------------------------------------------
+//-------------------------------------------------------------------------------------------------
+// Threadpool definition
+//-------------------------------------------------------------------------------------------------
 
 // Class: Threadpool
 class Threadpool {
diff --git a/unittest/taskflow.cpp b/unittest/taskflow.cpp
@@ -358,7 +358,6 @@ TEST_CASE("Taskflow.ReduceMax") {
       REQUIRE(test == gold);
     }
   }
-
 }
 
 /*// --------------------------------------------------------

Original file line number	Diff line number	Diff line change
`@@ -358,7 +358,6 @@ TEST_CASE("Taskflow.ReduceMax") {`
`358`	`358`	`REQUIRE(test == gold);`
`359`	`359`	`}`
`360`	`360`	`}`
`361`		`-`
`362`	`361`	`}`
`363`	`362`
`364`	`363`	`/*// --------------------------------------------------------`