The purpose of this final project is to replicate the Git Re-Basin paper on a 2L MLP trained on modular addition. The project builds on Nanda et al.'s work interpreting networks that learn modular addition via "grokking", a phenomenon first described by Power et al. With Git Re-Basin, we wish to explore how basin phenomena interact with particular architectures and tasks, in this case modular addition, and whether linear mode connectivity exists between models that have "grokked" a task. Another goal is to produce a successful replication, since previous attempts by others have failed and no single codebase implements all three algorithms. We use Stanislav Fort's replication, the Git Re-Basin codebase, and Neel Nanda's codebase as starting points for this project.
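For context, below is a minimal sketch of what the modular addition setup might look like; the modulus, layer sizes, and module names are illustrative assumptions rather than this project's actual configuration.

```python
import torch
import torch.nn as nn

P = 113  # assumed modulus; the project's actual value may differ

class TwoLayerMLP(nn.Module):
    """Illustrative 2L MLP for modular addition: embed both operands,
    pass the concatenation through a hidden layer, unembed to P logits."""
    def __init__(self, p: int = P, d_embed: int = 128, d_hidden: int = 512):
        super().__init__()
        self.embed = nn.Embedding(p, d_embed)
        self.hidden = nn.Linear(2 * d_embed, d_hidden)
        self.unembed = nn.Linear(d_hidden, p)

    def forward(self, x):           # x: (batch, 2) integer operands
        e = self.embed(x)           # (batch, 2, d_embed)
        h = torch.relu(self.hidden(e.flatten(1)))
        return self.unembed(h)      # (batch, P) logits for (a + b) mod P

# The full dataset is every pair (a, b) labelled with (a + b) mod P.
pairs = torch.cartesian_prod(torch.arange(P), torch.arange(P))
labels = (pairs[:, 0] + pairs[:, 1]) % P
```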
If you wish to run this project out of interest or to contribute, you can set up your machine using Miniconda (or something similar) to create a virtual environment. On macOS or Linux you can use the following:
ENV_PATH=~/cs4644_final/.env/
cd ~/cs4644_final
conda create -p $ENV_PATH python=3.10 -y
conda install -p $ENV_PATH pytorch=2.0.0 torchtext torchdata torchvision -c pytorch -y
conda run -p $ENV_PATH pip install -r requirements.txt
If you are on Windows, you can run this:
$env:ENV_PATH='c:\users\<user_name>\cs4644_final\.env'
cd cs4644_final
conda create -p $env:ENV_PATH python=3.10 -y
conda install -p $env:ENV_PATH pytorch=1.12.0 torchtext torchdata torchvision -c pytorch -y
conda run -p $env:ENV_PATH pip install -r requirements.txt
Below are plots from two experiments: an MLP trained on MNIST, used to verify that all three algorithms work, and an MLP trained on modular addition. The modular addition models are notable in that each "grokked" the task, reaching 100% test accuracy after initially overfitting the training data; more on the phenomenon can be found here.
(Note: Activation Matching did not work due to the nature of the Embedding layer.)
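For reference, the weight matching algorithm at the core of Git Re-Basin reduces to a linear assignment problem per hidden layer; the single-layer sketch below uses illustrative names and shapes and is not the repository's actual API.

```python
import torch
from scipy.optimize import linear_sum_assignment

def match_hidden_layer(W_in_a, W_in_b, W_out_a, W_out_b):
    """Find the permutation of model B's hidden units that best aligns
    them with model A's, for one hidden layer.

    W_in_*:  (d_hidden, d_in)  weights feeding the hidden layer
    W_out_*: (d_out, d_hidden) weights reading from the hidden layer
    """
    # cost[i, j] = similarity of A's unit i to B's unit j, summed over
    # incoming and outgoing weights (the weight matching objective).
    cost = W_in_a @ W_in_b.T + W_out_a.T @ W_out_b
    _, cols = linear_sum_assignment(cost.detach().numpy(), maximize=True)
    perm = torch.as_tensor(cols)

    # Apply the permutation to model B (biases would be permuted the same way).
    return perm, W_in_b[perm], W_out_b[:, perm]
```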
The performance of the rebasin algorithms on modular addition is terrible, to the point that any permutation destroys model performance. However, if we choose not to permute the embedding weights, we see curves similar to the other rebasin curves, and we find that even when the remaining weights are permuted, the original grokking performance still holds, as the plots below show:
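Concretely, "not permuting the embedding weights" means applying the hidden-unit permutation to every layer that touches the hidden dimension while leaving the embedding untouched. A sketch, using the assumed key names from the illustrative model above:

```python
def apply_perm_skip_embedding(state_dict, perm):
    """Permute a model's hidden units everywhere except the embedding.
    Key names ('hidden.weight', 'unembed.weight', ...) are illustrative."""
    out = dict(state_dict)
    out["hidden.weight"] = state_dict["hidden.weight"][perm]       # output rows
    out["hidden.bias"] = state_dict["hidden.bias"][perm]
    out["unembed.weight"] = state_dict["unembed.weight"][:, perm]  # input cols
    # 'embed.weight' is deliberately left untouched.
    return out
```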
As shown, however, naive interpolation still outperforms the rebasin techniques, leading me to believe either that every basin a grokked model can land in is permutation-invariant, or that any permutation of the non-embedding weights still lies within the original basin, which would explain why rebasin provides no benefit for these models.
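For completeness, the naive interpolation baseline referred to above is just an elementwise linear blend of the two checkpoints evaluated at several mixing ratios; the sketch below assumes an `evaluate(model)` helper (not shown) that returns test accuracy.

```python
import torch

def lerp_state_dicts(sd_a, sd_b, alpha):
    """Elementwise (1 - alpha) * A + alpha * B over two state dicts
    with identical keys and shapes."""
    return {k: (1 - alpha) * sd_a[k] + alpha * sd_b[k] for k in sd_a}

def interpolation_curve(model, sd_a, sd_b, evaluate, steps=11):
    """Sweep alpha from 0 to 1 and record test accuracy at each point."""
    accs = []
    for alpha in torch.linspace(0, 1, steps):
        model.load_state_dict(lerp_state_dicts(sd_a, sd_b, float(alpha)))
        accs.append(evaluate(model))
    return accs
```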
As an addendum, these are interpolation plots where neither the embedding nor the unembedding was permuted: