Alex Graves, Greg Wayne, Ivo Danihelka, arXiv, 2014
Neural Turing Machine (NTM) consists of a neural network controller interacting with a working memory bank in a learnable manner. The analogy to a conventional computer: the controller plays the role of the CPU (with its hidden activations as registers) and the memory matrix plays the role of RAM. Key ideas:
- Controller (a modified RNN) interacts with the external world via input and output vectors, and with the memory via read and write "heads"
- "Read" vector is a convex combination of the row-vectors of M_t (memory matrix at time t) — r_t = \sum_i w_t(i) M_t(i), where w_t is a vector of weightings over the N memory locations (see the sketch below)
- "Writing" is decomposed into 1) erasing and 2) adding (a sketch follows the sub-points below)
- The write head produces the erase vector e_t and the add vector a_t along with the vector of weightings over memory locations w_t
- M_t(i) = M_{t-1}(i) \odot [1 - w_t(i) e_t] + w_t(i) a_t, where \odot denotes element-wise multiplication
- Erase and add vectors control which components of memory are updated, while weightings w_t control which locations are updated
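A minimal NumPy sketch of the erase-then-add write (function name is mine):

```python
import numpy as np

def write(M: np.ndarray, w: np.ndarray, e: np.ndarray, a: np.ndarray) -> np.ndarray:
    """Apply M_t(i) = M_{t-1}(i) * [1 - w_t(i) e_t] + w_t(i) a_t for every row i.

    M: (N, W) memory, w: (N,) weightings, e: (W,) erase vector with entries
    in [0, 1], a: (W,) add vector.
    """
    M = M * (1.0 - np.outer(w, e))  # erase: gate each component per location
    return M + np.outer(w, a)       # add: write new content into the same locations

# A location with weighting 1 and an all-ones erase vector is fully overwritten by a.
```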
- Weight vectors are produced by an addressing mechanism
- Content-based addressing
- Each head produces a length-M key k_t that is compared to each row-vector M_t(i) by cosine similarity, scaled by a key strength \beta_t (an inverse temperature); the weightings are then normalized with a softmax: w^c_t(i) \propto \exp(\beta_t K[k_t, M_t(i)]), where K is cosine similarity (sketched below)
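A NumPy sketch of the content weighting (names are mine; the 1e-8 guard against zero norms is an implementation detail, not from the paper):

```python
import numpy as np

def content_weighting(M: np.ndarray, k: np.ndarray, beta: float) -> np.ndarray:
    """w^c_t(i) = softmax_i(beta * cosine_similarity(k_t, M_t(i)))."""
    sim = (M @ k) / (np.linalg.norm(M, axis=1) * np.linalg.norm(k) + 1e-8)
    z = beta * sim
    z = z - z.max()      # subtract the max for a numerically stable softmax
    w = np.exp(z)
    return w / w.sum()
```

Larger \beta_t concentrates the weighting on the best-matching location, while \beta_t near 0 approaches a uniform weighting.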
- Location-based addressing
- Interpolation: Each head produces an interpolation gate g_t that blends the weighting from the previous timestep with the content weighting of the current timestep: w^g_t = g_t w^c_t + (1 - g_t) w_{t-1}
- Shift: Circular convolution (modulo N) with a shift weighting distribution s_t, for example a softmax over a few integer shift positions (say 3 locations): \tilde{w}_t(i) = \sum_j w^g_t(j) s_t(i - j)
- Sharpening: Each head emits \gamma_t \geq 1 to sharpen the final weighting: w_t(i) = \tilde{w}_t(i)^{\gamma_t} / \sum_j \tilde{w}_t(j)^{\gamma_t} (the full pipeline is sketched below)
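A NumPy sketch of the whole location-based pipeline (function name and the index layout of the shift distribution are my assumptions):

```python
import numpy as np

def location_addressing(w_c, w_prev, g, s, gamma):
    """Interpolate, shift via circular convolution, then sharpen.

    w_c, w_prev: (N,) content / previous weightings; g in [0, 1] is the
    interpolation gate; s: (N,) shift distribution, e.g. mass only at
    shifts -1, 0, +1 (stored at indices N-1, 0, 1); gamma >= 1.
    """
    w_g = g * w_c + (1.0 - g) * w_prev          # interpolation
    N = len(w_g)
    w_shift = np.array([                        # circular convolution (mod N):
        sum(w_g[j] * s[(i - j) % N] for j in range(N))  # w~(i) = sum_j w_g(j) s(i-j)
        for i in range(N)
    ])
    w_sharp = w_shift ** gamma                  # sharpening counteracts shift blur
    return w_sharp / w_sharp.sum()              # renormalize to a distribution
```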
- Experiments on copy, repeat copy, associative recall, dynamic N-grams, and priority sort tasks