precomputed-lu-solve.tex

The CRAM method using Equation~\ref{eq:part-frac-matrix-re} involves solving
\begin{equation}
\label{eq:basic-matrix-solve}
 (A - \theta I)\backslash(\alpha b)
\end{equation}
$k/2$ times for fixed $A$ and $b$ and varying $\theta$ and $\alpha$. This
solve is typically done using the LU decomposition algorithm. For the
transmutation problem the $A$ matrix is a sparse matrix with a fixed sparsity
pattern: every entry in $A$ corresponds to a transmutation of one nuclide to
another, and only certain transmutations are ever physically possible.

Furthermore, in transmutation, $A$ consists only of real entries, whereas for
$A - \theta I$, $\theta$ is a nonreal complex number. Thus, the diagonal
entries of the matrices being solved are always nonzero. As a result, pivoting
is not necessary in the LU solve algorithm. This allows precomputing the exact
LU decomposition ahead of time without knowledge of the values of $A$, beyond
which are nonzero.

Pseudocode for the LU solve algorithm for solving $Mx=b$ without pivoting is
shown in Algorithm~\ref{alg:lu-pseudocode}.~\cite{ationneeded}
\begin{algorithm}[h]
  \caption{LU solve of $Mx=b$ without pivoting.}\label{alg:lu-pseudocode}
  \begin{algorithmic}[1]
  \REQUIRE $M_{n\times n}$, $b_{n\times 1}$
  \STATE \COMMENT{First, decompose $M=L\cdot U$.}
  \STATE \COMMENT{The lower triangular part of $LU$ will be $L - I$ and the upper triangular part will be $U$.}
  \STATE
  \STATE $LU \leftarrow \mathrm{copy}(M)$
  \STATE
  \FOR{$i=1$ \TO $n$}
      \FOR{$j=i+1$ \TO $n$}
          \STATE $LU_{j, i} \leftarrow LU_{j, i}/LU_{i, i}$\label{alg:lu-pseudocode-decompose-division}
          \FOR{$k=i+1$ \TO $n$}
              \STATE $LU_{j, k} \leftarrow LU_{j, k} - LU_{j, i}\cdot LU_{i, k}$\label{alg:lu-pseudocode-reduction-line}
          \ENDFOR
      \ENDFOR
  \ENDFOR
  \STATE
  \STATE \COMMENT{Now perform the solve.}
  \STATE $x \leftarrow \mathrm{copy}(b)$
  \STATE
  \STATE \COMMENT{Forward substitution}
  \FOR{$i=1$ \TO $n$}
      \FOR{$j=1$ \TO $i$}
          \STATE $x_i \leftarrow x_i - LU_{i, j}\cdot x_j$\label{alg:lu-pseudocode-forward-substitution-line}
      \ENDFOR
  \ENDFOR
  \STATE
  \STATE \COMMENT{Backward substitution}
  \FOR{$i=n$ \TO $1$}
      \FOR{$j=i+1$ \TO $n$}
          \STATE $x_i \leftarrow x_i -LU_{i, j}\cdot x_j$\label{alg:lu-pseudocode-backward-substitution-line}
      \ENDFOR
      \STATE $x_i \leftarrow x_i/LU_{i, i}$\label{alg:lu-pseudocode-solve-division}
  \ENDFOR
  \STATE
  \ENSURE $x_{n\times 1}$

\end{algorithmic}
\end{algorithm}
Note that many steps of Algorithm~\ref{alg:lu-pseudocode} can be removed if an
element of $LU$ is known to be 0. In particular,
$LU_{j, k} \leftarrow LU_{j, k} - LU_{j, i}\cdot LU_{i, k}$
(line~\ref{alg:lu-pseudocode-reduction-line}) becomes
$LU_{j, k} \leftarrow LU_{j, k}$, a no-op, if either $LU_{j, i}$ or
$LU_{i, k}$ are 0. Similarly, $x_i \leftarrow x_i -LU_{i, j}\cdot x_j$
(lines~\ref{alg:lu-pseudocode-forward-substitution-line}
and~\ref{alg:lu-pseudocode-backward-substitution-line}) becomes
$x_i \leftarrow x_i$ if $LU_{i,j}$ is 0.

Given a sparsity pattern for $M$, the elements of $LU$ start the same
as the elements of $M$, but more elements may become nonzero through the
application of line~\ref{alg:lu-pseudocode-reduction-line}. Starting with a
set of nonzero indices of $M$, $IJ=\{(i, j) | M_{i, j} \neq 0\}$, one can
recursively compute a set of nonzero indices for $LU$. The pseudocode for this
algorithm is shown in Algorithm~\ref{alg:make-ijk}. The idea is to mirror the
decomposition loop from Algorithm~\ref{alg:lu-pseudocode}, adding new index
pairs to the set of nonzero indices $IJK$ whenever both $(j, i)$ and $(i, k)$
are also in the set.

\begin{algorithm}[h]
  \caption{Generate the set of nonzero entries of $LU$ given a set of nonzero
    entries of $M_{n,n}$.}\label{alg:make-ijk}
  \begin{algorithmic}[1]
  \REQUIRE $IJ=\{(i, j) | M_{i, j} \neq 0\}$
  \STATE $IJK \leftarrow \mathrm{copy}(IJ)$
  \STATE
  \FOR{$i=1$ \TO $n$}
      \FOR{$j=i+1$ \TO $n$}
          \IF{$(j, i) \notin IJK$}
              \CONTINUE
          \ENDIF
          \STATE
          \FOR{$k=i+1$ \TO $n$}
              \IF{$(i, k) \in IJK$ \AND $(j, k) \notin IJK$}
                   \STATE Add $(j, k)$ to $IJK$
               \ENDIF
          \ENDFOR
      \ENDFOR
  \ENDFOR
  \ENSURE $IJK=\{(i,j)|LU_{i,j}\mathrm{\ is\ potentially\ nonzero\ in\ Algorithm~\ref{alg:lu-pseudocode}}\}$
\end{algorithmic}
\end{algorithm}

Critically, for the transmutation matrix, it \textit{is} known ahead of time
which entries of $A$ are nonzero. Thus, the steps above can be reduced or
eliminated for the entries that are known to be zero. As noted above, the
diagonal entries for solves used for the real transmutations matrices are
always nonzero. Thus, the divisions by $LU_{i,i}$ in
line~\ref{alg:lu-pseudocode-decompose-division} are always by a nonzero
number.

Additionally, the roots of the denominators of the CRAM approximations
($\theta_i$) do not correspond to the eigenvalues of the transmutation matrix,
so the solve of Equation~\ref{eq:basic-matrix-solve} is never degenerate, that
is, the divisions by $LU_{i,i}$ in the solve stage
(line~\ref{alg:lu-pseudocode-solve-division}) is never by zero.

Because no pivoting is done in the precomputed LU solve, the order of the
entries in the matrix is important for efficiency. The most obvious way to
order matrix entries is to order the nuclides by the charge of the nucleus
(atomic number) followed by atomic mass number and state number
(ZAS)~\cite{ationneeded}. However, a much more efficient order is to order by
atomic mass number followed by the charge of the nucleus and state number
(CINDER). The sparsity pattern generated from the PyNE data has 35256 nonzero
entries, including the diagonals. Using the ZAS ordering, an additional 20194
zero entries must be considered for the LU solve, whereas for the CINDER
ordering, only 14082 additional entries must be considered. This is primarily
because the CINDER ordering results in entries that are closer to the diagonal
(see Figure~\ref{fig:lu-solve-ordering}). The CINDER ordering directly
correlates to a $1.2\times$ speedup over ZAS in the precomputed LU solve.

\begin{figure}[!ht]
\centering
\includegraphics[width=\textwidth]{lu-solve-ordering.pdf}
\caption{A subset of the transmutation matrix sparsity pattern from PyNE data
  with ZAS ordering (left) and Cinder ordering (Right).}
\label{fig:lu-solve-ordering}
\end{figure}