The purpose of this final project is to replicate the Git Re-Basin paper on a 2L MLP trained on modular addition. The project builds on Nanda et al.'s work interpreting networks that learn modular addition via "grokking", a phenomenon first described by Power et al. With Git Re-Basin, we wish to explore how basin phenomena interact with particular architectures and tasks, in this case modular addition, and whether linear mode connectivity exists between models that have "grokked" a task. Another goal is to produce a successful replication, since previous attempts by others have failed and no single codebase implements all three algorithms. We use Stanislav Fort's replication, the Git Re-Basin codebase, and Neel Nanda's codebase as starting points for this project.
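For context, below is a minimal sketch of what the modular addition setup might look like; the modulus, layer sizes, and module names are illustrative assumptions rather than this project's actual configuration.

```python
import torch
import torch.nn as nn

P = 113  # assumed modulus; the project's actual value may differ

class TwoLayerMLP(nn.Module):
    """Illustrative 2L MLP for modular addition: embed both operands,
    pass the concatenation through a hidden layer, unembed to P logits."""
    def __init__(self, p: int = P, d_embed: int = 128, d_hidden: int = 512):
        super().__init__()
        self.embed = nn.Embedding(p, d_embed)
        self.hidden = nn.Linear(2 * d_embed, d_hidden)
        self.unembed = nn.Linear(d_hidden, p)

    def forward(self, x):           # x: (batch, 2) integer operands
        e = self.embed(x)           # (batch, 2, d_embed)
        h = torch.relu(self.hidden(e.flatten(1)))
        return self.unembed(h)      # (batch, P) logits for (a + b) mod P

# The full dataset is every pair (a, b) labelled with (a + b) mod P.
pairs = torch.cartesian_prod(torch.arange(P), torch.arange(P))
labels = (pairs[:, 0] + pairs[:, 1]) % P
```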
If you wish to run this project out of interest or to contribute, you can set up your machine using Miniconda (or something similar) to create a virtual environment. On macOS or Linux you can use the following:
ENV_PATH=~/cs4644_final/.env/
cd ~/cs4644_final
conda create -p $ENV_PATH python=3.10 -y
conda install -p $ENV_PATH pytorch=2.0.0 torchtext torchdata torchvision -c pytorch -y
conda run -p $ENV_PATH pip install -r requirements.txt
If you are on Windows, you can run this:
$env:ENV_PATH='c:\users\<user_name>\cs4644_final\.env'
cd cs4644_final
conda create -p $env:ENV_PATH python=3.10 -y
conda install -p $env:ENV_PATH pytorch=1.12.0 torchtext torchdata torchvision -c pytorch -y
conda run -p $env:ENV_PATH pip install -r requirements.txt
Below are plots from two experiments: an MLP trained on MNIST, used to verify that all three algorithms work, and an MLP trained on modular addition. The modular addition models are notable in that each "grokked" the task, reaching 100% test accuracy after initially overfitting the training data; more on the phenomenon can be found here.
(Note: Activation Matching did not work due to the nature of the Embedding layer.)
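For reference, the weight matching algorithm at the core of Git Re-Basin reduces to a linear assignment problem per hidden layer; the single-layer sketch below uses illustrative names and shapes and is not the repository's actual API.

```python
import torch
from scipy.optimize import linear_sum_assignment

def match_hidden_layer(W_in_a, W_in_b, W_out_a, W_out_b):
    """Find the permutation of model B's hidden units that best aligns
    them with model A's, for one hidden layer.

    W_in_*:  (d_hidden, d_in)  weights feeding the hidden layer
    W_out_*: (d_out, d_hidden) weights reading from the hidden layer
    """
    # cost[i, j] = similarity of A's unit i to B's unit j, summed over
    # incoming and outgoing weights (the weight matching objective).
    cost = W_in_a @ W_in_b.T + W_out_a.T @ W_out_b
    _, cols = linear_sum_assignment(cost.detach().numpy(), maximize=True)
    perm = torch.as_tensor(cols)

    # Apply the permutation to model B (biases would be permuted the same way).
    return perm, W_in_b[perm], W_out_b[:, perm]
```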
The performance of the rebasin algorithms on modular addition is terrible, to the point that any permutation destroys model performance. However, if we choose not to permute the embedding weights, we see curves similar to the other rebasin curves, and we find that even when the remaining weights are permuted, the original grokking performance still holds, as the plots below show:
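Concretely, "not permuting the embedding weights" means applying the hidden-unit permutation to every layer that touches the hidden dimension while leaving the embedding untouched. A sketch, using the assumed key names from the illustrative model above:

```python
def apply_perm_skip_embedding(state_dict, perm):
    """Permute a model's hidden units everywhere except the embedding.
    Key names ('hidden.weight', 'unembed.weight', ...) are illustrative."""
    out = dict(state_dict)
    out["hidden.weight"] = state_dict["hidden.weight"][perm]       # output rows
    out["hidden.bias"] = state_dict["hidden.bias"][perm]
    out["unembed.weight"] = state_dict["unembed.weight"][:, perm]  # input cols
    # 'embed.weight' is deliberately left untouched.
    return out
```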
As shown, however, naive interpolation still outperforms the rebasin techniques, leading me to believe either that every basin a grokked model can land in is permutation-invariant, or that any permutation of the non-embedding weights still lies within the original basin, which would explain why rebasin provides no benefit for these models.
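For completeness, the naive interpolation baseline referred to above is just an elementwise linear blend of the two checkpoints evaluated at several mixing ratios; the sketch below assumes an `evaluate(model)` helper (not shown) that returns test accuracy.

```python
import torch

def lerp_state_dicts(sd_a, sd_b, alpha):
    """Elementwise (1 - alpha) * A + alpha * B over two state dicts
    with identical keys and shapes."""
    return {k: (1 - alpha) * sd_a[k] + alpha * sd_b[k] for k in sd_a}

def interpolation_curve(model, sd_a, sd_b, evaluate, steps=11):
    """Sweep alpha from 0 to 1 and record test accuracy at each point."""
    accs = []
    for alpha in torch.linspace(0, 1, steps):
        model.load_state_dict(lerp_state_dicts(sd_a, sd_b, float(alpha)))
        accs.append(evaluate(model))
    return accs
```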
As an addendum, these are interpolation plots where neither the embedding nor the unembedding was permuted: