CHANGELOG

v3.18.2 (2024-08-25)

Fix

fix: gemma scope saes yml. 16k for Gemma 2 9b was missing entries. (#266)
add missing saes, 16k was missing for 9b att and mlp
remove file name not needed (86c04ac)

v3.18.1 (2024-08-23)

Chore

chore: adding more metatadata to pyproject.toml for PyPI (#263) (5c2d391)

Fix

fix: modify duplicate neuronpedia ids in config.yml, add test. (#265)
fix duplicate ids
fix test that had mistake (0555178)

v3.18.0 (2024-08-22)

Feature

feat: updated pretrained yml gemmascope and neuronpedia ids (#264) (a3cb00d)

v3.17.1 (2024-08-18)

Fix

fix: fix memory crash when batching huge samples (#262) (f0bec81)

v3.17.0 (2024-08-16)

Feature

feat: add end-2-end saes from Braun et al to yaml (#261) (1d4eac1)

v3.16.0 (2024-08-15)

Feature

feat: make canonical saes for attn (#259) (ed2437b)

v3.15.0 (2024-08-14)

Chore

chore: updating slack link in docs (#255) (5c7595a)

Feature

feat: support uploading and loading arbitrary huggingface SAEs (#258) (5994827)

Unknown

Remove duplicate link (#256) (c40f1c5)
Update index.md (#257)

removes comment asking for table creation and links to it (1e185b3)

Merge pull request #244 from jbloomAus/add_pythia_70m_saes

Added pythia-70m SAEs to yaml (022f1de)

Merge branch 'main' into add_pythia_70m_saes (32901f2)

v3.14.0 (2024-08-05)

Feature

feat: GemmaScope SAEs + fix gemma-scope in docs (#254) (3da4cee)

Unknown

More complete set of Gemma Scope SAEs (#252)
commit for posterity
ignore pt files in home
add canonical saes
improve gemma 2 loader
better error msg on wrong id
handle config better
handle hook z weirdness better
add joseph / curt script
add gemma scope saes
format
make linter happy (68de42c)
Updated dashes (7c7a271)
Changed gemma repo to google (fa483f0)
Fixed pretrained_saes.yaml Gemma 2 paths (920b77e)
Gemma2 2b saes (#251)
Added support for Gemma 2
minor fixes
format
remove faulty error raise

Co-authored-by: jbloomAus <jbloomaus@gmail.com> (df273c4)

v3.13.1 (2024-07-31)

Fix

fix: update automated-interpretability dep to use newly released version (#247)
fix: update automated-interpretability dep to use newly released version
fixing / ignore optim typing errors (93b2ebe)

Unknown

Tutorial 2.0 (#250)
tutorial 2.0 draft
minor changes
Various additions to tutorial
Added ablation
better intro text
improve content further
fix steering
fix ablation to be true ablation
current tutorial

Co-authored-by: curt-tigges <ct@curttigges.com> (fe27b7c)

Fix typo in readme (#249) (fe987f1)
Merge pull request #242 from jbloomAus/add_openai_gpt2_small_saes

Added OpenAI TopK SAEs to pretrained yaml (2c1cbc4)

Added pythia-70m SAEs to yaml (25fb167)
Neuronpedia API key is now in header, not in body (#243) (caacef1)
Merge pull request #237 from jbloomAus/use_error_term_param

Use error term param (ac86d10)

Update pyproject.toml (4b032ab)
Added OpenAI TopK SAEs to pretrained yaml (7463e9f)

v3.13.0 (2024-07-18)

Feature

feat: validate that pretokenized dataset tokenizer matches model tokenizer (#215)

Co-authored-by: Joseph Bloom <69127271+jbloomAus@users.noreply.github.com> (c73b811)

Unknown

add more bootleg gemma saes (#240)
add more bootleg gemma saes
removed unused import (22a0841)

v3.12.5 (2024-07-18)

Fix

fix: fixing bug with cfg loading for fine-tuning (#241) (5a88d2c)

Unknown

Update deploy_docs.yml

Removed the Debug Info step that was causing issues. (71fd509)

More tests for the negative case (a0b0f54)
Upped version (845d5d7)
Added tests (e5ff793)

v3.12.4 (2024-07-17)

Fix

fix: Trainer eval config will now respect trainer config params (#238)
Trainer eval config will now respect trainer config params
Corrected toml version (5375505)

Unknown

Neuronpedia Autointerp/Explanation Improvements (#239)
Neuronpedia autointerp API improvements: new API, new flags for save to disk and test key, fix bug with scoring disabled
Ignore C901 (ba7d218)
Fixed toml file (8211cac)
Ensured that even detatched SAEs are returned to former state (90ac661)
Added use_error_term functionality to run_with_x functions (1531c1f)
Added use_error_term to hooked sae transformer (d172e79)
Trainer will now fold and log estimated norm scaling factor (#229)
Trainer will now fold and log estimated norm scaling factor after doing fit
Updated tutorials to use SAEDashboard
fix: sae hook location (#235)
3.12.2

Automatically generated by python-semantic-release

fix: sae to method (#236)
3.12.3

Automatically generated by python-semantic-release

Trainer will now fold and log estimated norm scaling factor after doing fit
Added functionality to load and fold in precomputed scaling factors from the YAML directory
Fixed toml

Co-authored-by: Joseph Bloom <69127271+jbloomAus@users.noreply.github.com> Co-authored-by: github-actions <github-actions@github.com> (8d38d96)

Update README.md (172ad6a)

v3.12.3 (2024-07-15)

Fix

fix: sae to method (#236) (4df78ea)

v3.12.2 (2024-07-15)

Fix

fix: sae hook location (#235) (94ba11c)

Unknown

Updated tutorials to use SAEDashboard (db89dbc)
Merge branch 'JoshEngels-Evals' (fe66285)
Removed redundant lines (29e3aa2)
Merge branch 'Evals' of https://github.com/JoshEngels/SAELens into JoshEngels-Evals (7b84053)

v3.12.1 (2024-07-11)

Fix

fix: force release of dtype_fix (bfe7feb)

Unknown

Merge pull request #225 from jbloomAus/dtype_fix

fix: load_from_pretrained should not require a dtype nor default to float32 (71d9da8)

TrainingSAE should: 1) respect device override and 2) not default to float32 dtype, and instead default to the SAE's dtype (a4a1c46)
load_from_pretrained should not require a dtype nor default to float32 (a485dc0)
Fix SAE failing to upload to wandb due to artifact name. (#224)
Fix SAE artifact name.
format

Co-authored-by: Joseph Bloom <jbloomaus@gmail.com> (6ae4849)

v3.12.0 (2024-07-09)

Feature

feat: use TransformerLens 2 (#214)
Updated pyproject.toml to use TL ^2.0, and to use fork of sae-vis that also uses TL ^2.0
Removed reliance on sae-vis
Removed neuronpedia tutorial
Added error handling for view operation
Corrected formatting (526e736)

Unknown

Fix/allow device override (#221)
Forced load_from_pretrained to respect device and dtype params
Removed test file (697dd5f)
Fixed hooks for single head SAEs (#219)
included zero-ablation-hook for single-head SAEs
fixed a typo in single_head_replacement_hook (3bb4f73)

v3.11.2 (2024-07-08)

Fix

fix: rename encode_fn to encode and encode to encode_standard (#218) (8c09ec1)

v3.11.1 (2024-07-08)

Fix

fix: avoid bfloat16 errors in training gated saes (#217) (1e48f86)

Unknown

Update README.md (9adba61)
Update deploy_docs.yml

Modified this file to install dependencies (using caching for efficiency). (e90d5c1)

Adding type hint (5da6a13)
Actually doing merge (c362e81)
Merge remote-tracking branch 'upstream/main' into Evals (52780c0)
Making changes in response to comments (cf4ebcd)

v3.11.0 (2024-07-04)

Feature

feat: make pretrained sae directory docs page (#213)
make pretrained sae directory docs page
type issue weirdness
type issue weirdness (b8a99ab)

v3.10.0 (2024-07-04)

Feature

feat: make activations_store re start the dataset when it runs out (#207)
make activations_store re start the dataset when it runs out
remove misleading comments
allow StopIteration to bubble up where appropriate
add test to ensure that stopiteration is raised
formatting
more formatting
format tweak so we can re-try ci
add deps back (91f4850)
feat: allow models to be passed in as overrides (#210) (dd95996)

Fix

fix: Activation store factor unscaling fold fix (#212)
add unscaling to evals
fix act norm unscaling missing
improved variance explained, still off for that prompt
format
why suddenly a typingerror and only in CI? (1db84b5)

v3.9.2 (2024-07-03)

Fix

fix: Gated SAE Note Loading (#211)
fix: add tests, make pass
not in (b083feb)

Unknown

SAETrainingRunner takes optional HFDataset (#206)
SAETrainingRunner takes optional HFDataset
more explicit errors when the buffer is too large for the dataset
format
add warnings when a new dataset is added
replace default dataset with empty string
remove valueerror (2c8fb6a)

v3.9.1 (2024-07-01)

Fix

fix: pin typing-extensions version (#205) (3f0e4fe)

v3.9.0 (2024-07-01)

Feature

feat: OpenAI TopK SAEs for residual stream of GPT2 Small (#201)
streamlit app
feat: basic top-k support + oai gpt2small saes
fix merge mistake (06c4302)

Unknown

prevent context size mismatch error (#200) (76389ac)
point gpt2 dataset path to apollo-research/monology-pile (#199) (d3eb427)

v3.8.0 (2024-06-29)

Feature

feat: harmize activation store and pretokenize runner (#181)
eat: harmize activation store and pretokenize runner
reverting SAE cfg back to prepend_bos
adding a benchmark test
adding another test
adding list of tokenized datasets to docs
adding a warning message about lack of pre-tokenization, and linking to SAELens docs
fixing tests after apollo deleted sae- dataset versions
Update training_saes.md (2e6a3c3)

Unknown

Updating example commands (265687c)
Fixing test (389a159)
Adding script to evals.py (f9aa2dd)
Moving file (4be5011)
First round of evals (2476afb)

v3.7.0 (2024-06-25)

Feature

feat: new saes for gemma-2b-it and feature splitting on gpt2-small-layer-8 (#195) (5cfe382)

v3.6.0 (2024-06-25)

Feature

feat: Support Gated-SAEs (#188)
Initial draft of encoder
Second draft of Gated SAE implementation
Added SFN loss implementation
Latest modification of SFN loss training setup
fix missing config use
dont have special sfn loss
add hooks and reshape
sae error term not working, WIP
make tests pass
add benchmark for gated

Co-authored-by: Joseph Bloom <jbloomaus@gmail.com> (232c39c)

Unknown

fix hook z loader (#194) (cb30996)

v3.5.0 (2024-06-20)

Feature

feat: trigger release (1a4663b)

Unknown

Performance improvements + using multiple GPUs. (#189)
fix: no grads when filling cache
trainer should put activations on sae device
hack to allow sae device to be specific gpu when model is on multiple devices
add some tests (not in CI, which check multiple GPU performance
make formatter typer happy
make sure SAE calls move data between devices as needed (400474e)

v3.4.1 (2024-06-17)

Fix

fix: allow settings trust_remote_code for new huggingface version (#187)
fix: allow settings trust_remote_code for new huggingface version
default to True, not none

Co-authored-by: jbloomAus <jbloomaus@gmail.com> (33a612d)

v3.4.0 (2024-06-14)

Feature

feat: Adding Mistral SAEs (#178)

Note: normalize_activations is now a string and should be either 'none', 'expected_average_only_in' (Anthropic April Update, not yet folded), 'constant_norm_rescale' (Anthropic Feb update).

Adding code to load mistral saes
Black formatting
Removing library changes that allowed forward pass normalization
feat: support feb update style norm scaling for mistral saes
Adding code to load mistral saes
Black formatting
Removing library changes that allowed forward pass normalization
Adding code to load mistral saes
Black formatting
Removing library changes that allowed forward pass normalization
feat: support feb update style norm scaling for mistral saes
remove accidental inclusion

Co-authored-by: jbloomAus <jbloomaus@gmail.com> (227d208)

Unknown

Update README.md Slack Link Expired (this one shouldn't expire) (209696a)
add expected perf for pretrained (#179)

Co-authored-by: jbloom-md <joseph@massdynamics.com> (10bd9c5)

fix progress bar updates (#171) (4d92975)

v3.3.0 (2024-06-10)

Feature

feat: updating docs and standardizing PretokenizeRunner export (#176) (03f071b)

Unknown

add tutorial (#175) (8c67c23)

v3.2.3 (2024-06-05)

Fix

fix: allow tutorial packages for colab install to use latest version (#173)

fix: allow tutorial packages for colab install to use latest version (#173) (f73cb73)

Unknown

fix pip install in HookedSAETransformer Demo (#172) (5d0faed)

v3.2.2 (2024-06-02)

Fix

fix: removing truncation in activations store data loading (#62) (43c93e2)

v3.2.1 (2024-06-02)

Fix

fix: moving non-essential deps to dev (#121) (1a2cde0)

v3.2.0 (2024-05-30)

Feature

feat: activation norm scaling factor folding (#170)
feat: add convenience function for folding scaling factor
keep playing around with benchmark (773e308)

v3.1.1 (2024-05-29)

Fix

fix: share config defaulting between hf and local loading (#169) (7df479c)

v3.1.0 (2024-05-29)

Feature

feat: add w_dec_norm folding (#167)
feat: add w_dec_norm folding
format (f1908a3)

Unknown

Fixed typo in Hooked_SAE_Transformer_Demo.ipynb preventing Open in Colab badge from working (#166)

Minor typo in file name was preventing Hooked_SAE_Transformer_Demo.ipynb "Open in Colab" badge from working. (4850b16)

Fix hook z training reshape bug (#165)
remove file duplicate
fix: hook-z evals working, and reshaping mode more explicit (0550ae3)

v3.0.0 (2024-05-28)

Breaking

feat: refactor SAE code

BREAKING CHANGE: renamed and re-implemented paths (3c67666)

Unknown

major: trigger release

BREAKING CHANGE: https://python-semantic-release.readthedocs.io/en/latest/commit-parsing.html#commit-parser-angular

BREAKING CHANGE: (fac8533)

major: trigger release

BREAKING CHANGE: trigger release (apparently we need a newline) (90ed2c2)

BREAKING CHANGE: Quality of Life Refactor of SAE Lens adding SAE Analysis with HookedSAETransformer and some other breaking changes. (#162)
move HookedSAETransformer from TL
add tests
move runners one level up
fix docs name
trainer clean up
create training sae, not fully seperate yet
remove accidentally commited notebook
commit working code in the middle of refactor, more work to do
don't use act layers plural
make tutorial not use the activation store
moved this file
move import of toy model runner
saes need to store at least enough information to run them
further refactor and add tests
finish act store device rebase
fix config type not caught by test
partial progress, not yet handling error term for hooked sae transformer
bring tests in line with trainer doing more work
revert some of the simplification to preserve various features, ghost grads, noising
hooked sae transformer is working
homogenize configs
re-enable sae compilation
remove old file that doesn't belong
include normalize activations in base sae config
make sure tutorial works
don't forget to update pbar
rename sparse autoencoder to sae for brevity
move non-training specific modules out of training
rename to remove _point
first steps towards better docs
final cleanup
have ci use same test coverage total as make check-ci
clean up docs a bit

Co-authored-by: ckkissane <67170576+ckkissane@users.noreply.github.com> (e4eaccc)

Move activation store to cpu (#159)
add act store device to config
fix serialisation issue with device
fix accidental hardcoding of a device
test activations get moved correctly
fix issue with test cacher that shared state
add split store & model test + fix failure
clarify comment
formatting fixes (eb9489a)
Refactor training (#158)
turn training runner into a class
make a trainer class
further refactor
update runner call
update docs (72179c8)
Enable autocast for LM activation creation (#157)
add LM autocasting
add script to test autocast performance
format fix
update autocast demo script (cf94845)
gemma 2b sae resid post 12. fix ghost grad print (2a676b2)
don't hardcode hook (a10283d)
add mlp out SAEs to from pretrained (ee9291e)
remove resuming ability, keep resume config but complain if true (#156) (64e4dcd)
Add notebook to transfer W&B models to HF (#154)

hard to check this works quickly but assuming it does. (91239c1)

Remove sae parallel training, simplify code (#155)
remove sae parallel training, simplify code
remove unused import
remove accidental inclusion of file

(not tagging this as breaking since we're do a new major release this week and I don't want to keep bumping the major version) (f445fdf)

Update pretrained_saes.yaml (37fb150)
Ansible: update incorrect EC2 quota request link (432c7e1)
Merge pull request #153 from jbloomAus/ansible_dev

Ansible: dev only mode (51d2175)

Ansible: dev only mode (027460f)
feature: add gemma-2b bootleg saes (#152) (b9b7e32)

v2.1.3 (2024-05-15)

Fix

fix: Fix normalisation (#150)
fix GPT2 sweep settings to use correct dataset
add gpt2 small block sweep to check norm
larger buffer + more evals
fix activation rescaling so normalisation works
formatting fixes (9ce0fe4)

Unknown

Fix checkpointing of training state that includes a compiled SAE (#143)
Adds state_dict to L1Scheduler
investigating test failure
fix: Fix issues with resumption testing (#144)
fix always-true comparison in train context testing
set default warmup steps to zero
remove unused type attribute from L1Scheduler
update training tests to use real context builder
add docstring for build_train_ctx
2.1.2

Automatically generated by python-semantic-release

Adds state_dict to L1Scheduler
investigating test failure

Co-authored-by: github-actions <github-actions@github.com> (2f8c4e1)

fix GPT2 sweep settings to use correct dataset (#147)
fix GPT2 sweep settings to use correct dataset
add gpt2 small block sweep to check norm
larger buffer + more evals

Co-authored-by: Joseph Bloom <69127271+jbloomAus@users.noreply.github.com> (448d911)

Pretokenize runner (#148)
feat: adding a pretokenize runner
rewriting pretokenization based on feedback (f864178)
Fix config files for Ansible (ec70cea)
Pin Ansible config example to a specific version, update docs (#142)
Pin Ansible config example to a specific version, update docs
Allow running cache acts or train sae separately. Update README
Update readme (41785ae)

v2.1.2 (2024-05-13)

Fix

fix: Fix issues with resumption testing (#144)
fix always-true comparison in train context testing
set default warmup steps to zero
remove unused type attribute from L1Scheduler
update training tests to use real context builder
add docstring for build_train_ctx (085d04f)

v2.1.1 (2024-05-13)

Fix

fix: hardcoded mps device in ckrk attn saes (#141) (eba3f4e)

Unknown

feature: run saelens on AWS with one command (#138)
Ansible playbook for automating caching activations and training saes
Add automation
Fix example config
Fix bugs with ansible mounting s3
Reorg, more automation, Ubuntu instead of Amazon Linux
More automation
Train SAE automation
Train SAEs and readme
fix gitignore
Fix automation config bugs, clean up paths
Fix shutdown time, logs (13de52a)
Gpt 2 sweep (#140)
sweep settings for gpt2-small
get model string right
fix some comments that don't apply now
formatting fix (4cb270b)
Remove cuda cache emptying in evals.py (#139) (bdef2cf)

v2.1.0 (2024-05-11)

Chore

chore: remove use_deterministic_algorithms=True since it causes cuda errors (#137) (1a3bedb)

Feature

feat: Hooked toy model (#134)
adds initial re-implementation of toy models
removes instance dimension from toy models
fixing up minor nits and adding more tests

Co-authored-by: David Chanin <chanindav@gmail.com> (03aa25c)

v2.0.0 (2024-05-10)

Breaking

feat: rename batch sizes to give informative units (#133)

BREAKING CHANGE: renamed batch sizing config params

renaming batch sizes to give units
changes in notebooks
missed one!

Co-authored-by: David Chanin <chanindav@gmail.com> (cc78e27)

Chore

chore: tools to make tests more deterministic (#132) (2071d09)
chore: Make tutorial notebooks work in Google Colab (#120)

Co-authored-by: David Chanin <chanindav@gmail.com> (007141e)

v1.8.0 (2024-05-09)

Chore

chore: closing " in docs (#130) (5154d29)

Feature

feat: Add model_from_pretrained_kwargs as config parameter (#122)
add model_from_pretrained_kwargs config parameter to allow full control over model used to extract activations from. Update tests to cover new cases
tweaking test style

Co-authored-by: David Chanin <chanindav@gmail.com> (094b1e8)

v1.7.0 (2024-05-08)

Feature

feat: Add torch compile (#129)
Surface # of eval batches and # of eval sequences
fix formatting
config changes
add compilation to lm_runner.py
remove accidental print statement
formatting fix (5c41336)
feat: Change eval batch size (#128)
Surface # of eval batches and # of eval sequences
fix formatting
fix print statement accidentally left in (758a50b)

v1.6.1 (2024-05-07)

Fix

fix: Revert "feat: Add kl eval (#124)" (#127)

This reverts commit c1d9cbe8627f27f4d5384ed4c9438c3ad350d412. (1a0619c)

v1.6.0 (2024-05-07)

Feature

feat: Add bf16 autocast (#126)
add bf16 autocast and gradient scaling
simplify autocast setup
remove completed TODO
add autocast dtype selection (generally keep bf16)
formatting fix
remove autocast dtype (8e28bfb)

v1.5.0 (2024-05-07)

Feature

feat: Add kl eval (#124)
add kl divergence to evals.py
fix linter (c1d9cbe)

Unknown

major: How we train saes replication (#123)
l1 scheduler, clip grad norm
add provisional ability to normalize activations
notebook
change heuristic norm init to constant, report b_e and W_dec norms (fix tests later)
fix mse calculation
add benchmark test
update heuristic init to 0.1
make tests pass device issue
continue rebase
use better args in benchmark
remove stack in get activations
broken! improve CA runner
get cache activation runner working and add some tests
add training steps to path
avoid ghost grad tensor casting
enable download of full dataset if desired
add benchmark for cache activation runner
add updated tutorial
format

Co-authored-by: Johnny Lin <hijohnnylin@gmail.com> (5f46329)

v1.4.0 (2024-05-05)

Feature

feat: Store state to allow resuming a run (#106)
first pass of saving
added runner resume code
added auto detect most recent checkpoint code
make linter happy (and one small bug)
blak code formatting
isort
help pyright
black reformatting:
activations store flake
pyright typing
black code formatting
added test for saving and loading
bigger training set
black code
move to pickle
use pickle because safetensors doesn't support all the stuff needed for optimizer and scheduler state
added resume test
added wandb_id for resuming
use wandb id for checkpoint
moved loaded to device and minor fixes to resuming

Co-authored-by: David Chanin <chanindav@gmail.com> (4d12e7a)

Unknown

Fix: sparsity norm calculated at incorrect dimension. (#119)
Fix: sparsity norm calculated at incorrect dimension.

For L1 this does not effect anything as essentially it's calculating the abs() and average everything. For L2 this is problematic as L2 involves sum and sqrt. Unexpected behaviors occur when x is of shape (batch, sen_length, hidden_dim).

Added tests.
Changed sparsity calculation to handle 3d inputs. (ce95fb2)

v1.3.0 (2024-05-03)

Feature

feat: add activation bins for neuronpedia outputs, and allow customizing quantiles (#113) (05d650d)
feat: Update for Neuropedia auto-interp (#112)
cleanup Neuronpedia autointerp code
Fix logic bug with OpenAI key

Co-authored-by: Joseph Bloom <69127271+jbloomAus@users.noreply.github.com> (033283d)

feat: SparseAutoencoder.from_pretrained() similar to transformer lens (#111)
add partial work so David can continue
feat: adding a SparseAutoencoder.from_pretrained() function

Co-authored-by: jbloomaus <jbloomaus@gmail.com> (617d416)

Fix

fix: replace list_files_info with list_repo_tree (#117) (676062c)
fix: Improved activation initialization, fix using argument to pass in API key (#116) (7047bcc)

v1.2.0 (2024-04-29)

Feature

feat: breaks up SAE.forward() into encode() and decode() (#107)
breaks up SAE.forward() into encode() and decode()
cleans up return typing of encode by splitting into a hidden and public function (7b4311b)

v1.1.0 (2024-04-29)

Feature

feat: API for generating autointerp + scoring for neuronpedia (#108)
API for generating autointerp for neuronpedia
Undo pytest vscode setting change
Fix autointerp import
Use pypi import for automated-interpretability (7c43c4c)

v1.0.0 (2024-04-27)

Breaking

chore: empty commit to bump release

BREAKING CHANGE: v1 release (2615a3e)

Chore

chore: fix outdated lr_scheduler_name in docs (#109)
chore: fix outdated lr_scheduler_name in docs
add tutorial hparams (7cba332)

Unknown

BREAKING CHANGE: 1.0.0 release

BREAKING CHANGE: 1.0.0 release (c23098f)

Neuronpedia: allow resuming upload (#102) (0184671)

v0.7.0 (2024-04-24)

Feature

feat: make a neuronpedia list with features via api call (#101) (23e680d)

Unknown

Merge pull request #100 from jbloomAus/np_improvements

Improvements to Neuronpedia Runner (5118f7f)

neuronpedia: save run settings to json file to avoid errors when resuming later. automatically skip batch files that already exist (4b5412b)
skip batch file if it already exists (7d0e396)
neuronpedia: include log sparsity threshold in skipped_indexes.json (5c967e7)

v0.6.0 (2024-04-21)

Chore

chore: enabling pythong 3.12 checks for CI (25526ea)
chore: setting up precommit to be consistent with CI (18e706d)

Feature

feat: Added tanh-relu activation fn and input noise options (#77)
Still need to pip-install from GitHub hufy implementation.
Added support for tanh_sae.
Added notebook for loading the tanh_sae
tweaking config options to be more declarating / composable
testing adding noise to SAE forward pass
updating notebook

Co-authored-by: David Chanin <chanindav@gmail.com> (551e94d)

Unknown

Update proposal.md (6d45b33)
Merge pull request #96 from jbloomAus/github-templates

add templates for PR's / issues (241a201)

add templates for PR's / issues (74ff597)
Merge pull request #95 from jbloomAus/load-state-dict-not-strict

Make load_state_dict use strict=False (4a9e274)

fix accidental bug (c22fbbd)
fix load pretrained legacy with state dict change (b5e97f8)
Make load_state_dict use strict=False (fdf7fe9)
Merge pull request #94 from jbloomAus/update-pre-commit

chore: setting up precommit to be consistent with CI (6a056b7)

Merge pull request #87 from evanhanders/old_to_new

Adds function that converts old .pt pretrained SAEs to new folder format (1cb1725)

Merge pull request #93 from jbloomAus/py-312-ci

chore: enabling python 3.12 checks for CI (87be422)

v0.5.1 (2024-04-19)

Chore

chore: re-enabling isort in CI (#86) (9c44731)

Fix

fix: pin pyzmq==26.0.1 temporarily (0094021)
fix: typing issue, temporary (25cebf1)

Unknown

v0.5.1 (0ac218b)
fixes string vs path typing errors (94f1fc1)
removes unused import (06406b0)
updates formatting for alignment with repo standards. (5e1f342)
consolidates with SAE class load_legacy function & adds test (0f85ded)
adds old->new file conversion function (fda2b57)
Merge pull request #91 from jbloomAus/decoder-fine-tuning

Decoder fine tuning (1fc652c)

par update (2bb5975)
Merge pull request #89 from jbloomAus/fix_np

Enhance + Fix Neuronpedia generation / upload (38d507c)

minor changes (bc766e4)
reformat run.ipynb (822882c)
get decoder fine tuning working (11a71e1)
format (040676d)
Merge pull request #88 from jbloomAus/get_feature_from_neuronpedia

FEAT: Add API for getting Neuronpedia feature (1666a68)

Fix resuming from batch (145a407)
Use original repo for sae_vis (1a7d636)
Use correct model name for np runner (138d5d4)
Merge main, remove eindex (6578436)
Add API for getting Neuronpedia feature (e78207d)

v0.5.0 (2024-04-17)

Feature

feat: Mamba support vs mamba-lens (#79)
mamba support
added init
added optional model kwargs
Support transformers and mamba
forgot one model kwargs
failed opts
tokens input
hack to fix tokens, will look into fixing mambalens
fixed checkpoint
added sae group
removed some comments and fixed merge error
removed unneeded params since that issue is fixed in mambalens now
Unneded input param
removed debug checkpoing and eval
added refs to hookedrootmodule
feed linter
added example and fixed loading
made layer for eval change
fix linter issues
adding mamba-lens as optional dep, and fixing typing/linting
adding a test for loading mamba model
adding mamba-lens to dev for CI
updating min mamba-lens version
updating mamba-lens version

Co-authored-by: David Chanin <chanindav@gmail.com> (eea7db4)

Unknown

update readme (440df7b)
update readme (3694fd2)
Fix upload skipped/dead features (932f380)
Use python typer instead of shell script for neuronpedia jobs (b611e72)
Merge branch 'main' into fix_np (cc6cb6a)
convert sparsity to log sparsity if needed (8d7d404)

v0.4.0 (2024-04-16)

Feature

feat: support orthogonal decoder init and no pre-decoder bias (ac606a3)

Fix

fix: sae dict bug (484163e)
fix: session loader wasn't working (a928d7e)

Unknown

enable setting adam pars in config (1e53ede)
fix sae dict loader and format (c558849)
default orthogonal init false (a8b0113)
Formatting (1e3d53e)
Eindex required by sae_vis (f769e7a)
Upload dead feature stubs (9067380)
Make feature sparsity an argument (8230570)
Fix buffer" (dde2481)
Merge branch 'main' into fix_np (6658392)
notebook update (feca408)
Merge branch 'main' into fix_np (f8fb3ef)
Final fixes (e87788d)
Don't use buffer, fix anomalies (2c9ca64)

v0.3.0 (2024-04-15)

Feature

feat: add basic tutorial for training saes (1847280)

v0.2.2 (2024-04-15)

Fix

fix: dense batch dim mse norm optional (8018bc9)

Unknown

format (c359c27)
make dense_batch_mse_normalization optional (c41774e)
Runner is fixed, faster, cleaned up, and now gives whole sequences instead of buffer. (3837884)
Merge branch 'main' into fix_np (3ed30cf)
add warning in run script (9a772ca)
update sae loading code (356a8ef)
add device override to session loader (96b1e12)
update readme (5cd5652)

v0.2.1 (2024-04-13)

Fix

fix: neuronpedia quicklist (6769466)

v0.2.0 (2024-04-13)

Chore

chore: improving CI speed (9e3863c)
chore: updating README.md with pip install instructions and PyPI badge (682db80)

Feature

feat: overhaul saving and loading (004e8f6)

Unknown

Use legacy loader, add back histograms, logits. Fix anomaly characters. (ebbb622)
Merge branch 'main' into fix_np (586e088)
Merge pull request #80 from wllgrnt/will-update-tutorial

bugfix - minimum viable updates to tutorial notebook (e51016b)

minimum viable fixes to evaluation notebook (b907567)
Merge pull request #76 from jbloomAus/faster-ci

perf: improving CI speed (8b00000)

try partial cache restore (392f982)
Merge branch 'main' into faster-ci (89e1568)
Merge pull request #78 from jbloomAus/fix-artifact-saving-loading

Fix artifact saving loading (8784c74)

remove duplicate code (6ed6af5)
set device in load from pretrained (b4e12cd)
fix typing issue which required ignore (a5df8b0)
remove print statement (295e0e4)
remove load with session option (74926e1)
fix broken test (16935ef)
avoid tqdm repeating during training (1d70af8)
avoid division by 0 (2c7c6d8)
remove old notebook (e1ad1aa)
use-sae-dict-not-group (27f8003)
formatting (827abd0)
improve artifact loading storage, tutorial forthcoming (604f102)
add safetensors to project (0da48b0)
Don't precompute background colors and tick values (271dbf0)
Merge pull request #71 from weissercn/main

Addressing notebook issues (8417505)

Merge pull request #70 from jbloomAus/update-readme-install

chore: updating README.md with pip install instructions and PyPI badge (4d7d1e7)

FIX: Add back correlated neurons, frac_nonzero (d532b82)
linting (1db0b5a)
fixed graph name (ace4813)
changed key for df_enrichment_scores, so it can be run (f0a9d0b)
fixed space in notebook 2 (2278419)
fixed space in notebook 2 (24a6696)
fixed space in notebook (d2f8c8e)
fixed pickle backwards compatibility in tutorial (3a97a04)

v0.1.0 (2024-04-06)

Feature

feat: release (c70b148)

Fix

fix: removing paths-ignore from action to avoid blocking releases (28ff797)
fix: updating saevis version to use pypi (dbd96a2)

Unknown

Merge pull request #69 from chanind/remove-ci-ignore

fix: removing paths-ignore from action to avoid blocking releases (179cea1)

Update README.md (1720ce8)
Merge pull request #68 from chanind/updating-sae-vis

fix: hotfix updating saevis version to use pypi (a13cee3)

v0.0.0 (2024-04-06)

Chore

chore: adding more tests to ActivationsStore + light refactoring (cc9899c)
chore: running isort to fix imports (53853b9)
chore: setting up pyright type checking and fixing typing errors (351995c)
chore: enable full flake8 default rules list (19886e2)
chore: using poetry for dependency management (465e003)
chore: removing .DS_Store files (32f09b6)

Unknown

Merge pull request #66 from chanind/pypi

feat: setting up sae_lens package and auto-deploy with semantic-release (34633e8)

Merge branch 'main' into pypi (3ce7f99)
Merge pull request #60 from chanind/improve-config-typing

fixing config typing (b8fba4f)

setting up sae_lens package and auto-deploy with semantic-release (ba41f32)
fixing config typing

switch to using explicit params for ActivationsStore config instead of RunnerConfig base class (9be3445)

Merge pull request #65 from chanind/fix-forgotten-scheduler-opts

passing accidentally overlooked scheduler opts (773bc02)

passing accidentally overlooked scheduler opts (ad089b7)
Merge pull request #64 from chanind/lr-decay

adding lr_decay_steps and refactoring get_scheduler (c960d99)

adding lr_decay_steps and refactoring get_scheduler (fd5448c)
Merge pull request #53 from hijohnnylin/neuronpedia_runner

Generate and upload Neuronpedia artifacts (0b94f84)

format (792c7cb)
ignore type incorrectness in imported package (5fe83a9)
Merge pull request #63 from chanind/remove-eindex

removing unused eindex depencency (1ce44d7)

removing unused eindex depencency (7cf991b)
Safe to_str_tokens, fix memory issues (901b888)
Allow starting neuronpedia generation at a specific batch numbe (85d8f57)
FIX: Linting 'do not use except' (ce3d40c)
Fix vocab: Ċ should be line break. Also set left and right buffers (205b1c1)
Merge (b159010)
Update Neuronpedia Runner (885de27)
Merge pull request #58 from canrager/main

Make prepend BOS optional: Default True (48a07f9)

make tests pass with use_bos flag (618d4bb)
Merge pull request #59 from chanind/fix-docs-deploy

attempting to fix docs deploy (cfafbe7)

force docs push (3aa179d)
ignore type eror (e87198b)
format (67dfb46)
attempting to fix docs deploy (cda8ece)
Merge branch 'main' of https://github.com/jbloomAus/mats_sae_training into main (8aadcd3)
add prepend bos flag (c0b29cc)
fix attn out on run evals (02fa90b)
Merge pull request #57 from chanind/optim-tests

Adding tests to get_scheduler (13c8085)

Merge pull request #56 from chanind/sae-tests

minor refactoring to SAE and adding tests (2c425ca)

minor refactoring to SAE and adding tests (92a98dd)
adding tests to get_scheduler (3b7e173)
Generate and upload Neuronpedia artifacts (b52e0e2)
Merge pull request #54 from jbloomAus/hook_z_suppourt

notional support, needs more thorough testing (277f35b)

Merge pull request #55 from chanind/contributing-docs

adding a contribution guide to docs (8ac8f05)

adding a contribution guide to docs (693c5b3)
notional support, needs more thorough testing (9585022)
Generate and upload Neuronpedia artifacts (4540268)
Merge pull request #52 from hijohnnylin/fix_db_runner_assert

FIX: Don't check wandb assert if not using wandb (5c48811)

FIX: Don't check wandb assert if not using wandb (1adefda)
add docs badge (f623ed1)
try to get correct deployment (777dd6c)
Merge pull request #51 from jbloomAus/mkdocs

Add Docs to the project. (d2ebbd7)

mkdocs, test (9f14250)
code cov (2ae6224)
Merge pull request #48 from chanind/fix-sae-vis-version

Pin sae_vis to previous working version (3f8a30b)

fix suffix issue (209ba13)
pin sae_vis to previous working version (ae0002a)
don't ignore changes to .github (35fdeec)
add cov report (971d497)
Merge pull request #40 from chanind/refactor-train-sae

Refactor train SAE and adding unit tests (5aa0b11)

Merge branch 'main' into refactor-train-sae (0acdcb3)
Merge pull request #41 from jbloomAus/move_to_sae_vis

Move to sae vis (bcb9a52)

flake8 can ignore imports, we're using isort anyway (6b7ae72)
format (af680e2)
fix mps bug (e7b238f)
more tests (01978e6)
wip (4c03b3d)
more tests (7c1cb6b)
testing that sparsity counts get updated correctly (5b5d653)
adding some unit tests to _train_step() (dbf3f01)
Merge branch 'main' into refactor-train-sae (2d5ec98)
Update README.md (d148b6a)
Merge pull request #20 from chanind/activations_store_tests

chore: adding more tests to ActivationsStore + light refactoring (69dcf8e)

Merge branch 'main' into activations_store_tests (4896d0a)
refactoring train_sae_on_language_model.py into smaller functions (e75a15d)
suppourt apollo pretokenized datasets (e814054)
handle saes saved before groups (5acd89b)
typechecker (fa6cc49)
fix geom median bug (8d4a080)
remove references to old code (861151f)
remove old geom median code (05e0aca)
Merge pull request #22 from themachinefan/faster_geometric_median

Faster geometric median. (341c49a)

makefile check type and types of geometric media (736bf83)
Merge pull request #21 from schmatz/fix-dashboard-image

Fix broken dashboard image on README (eb90cc9)

Merge pull request #24 from neelnanda-io/add-post-link

Added link to AF post (39f8d3d)

Added link to AF post (f0da9ea)
formatting (0168612)
use device, don't use cuda if not there (20334cb)
format (ce49658)
fix tsea typing (449d90f)
faster geometric median. Run geometric_median,py to test. (92cad26)
Fix dashboard image (6358862)
fix incorrect code used to avoid typing issue (ed0b0ea)
add nltk (bc7e276)
ignore various typing issues (6972c00)
add babe package (481069e)
make formatter happy (612c7c7)
share scatter so can link (9f88dc3)
add_analysis_files_for_post (e75323c)
don't block on isort linting (3949a46)
formatting (951a320)
Update README.md (b2478c1)
Merge pull request #18 from chanind/type-checking

chore: setting up pyright type checking and fixing typing errors (bd5fc43)

Merge branch 'main' into type-checking (57c4582)
Merge pull request #17 from Benw8888/sae_group_pr

SAE Group for sweeps PR (3e78bce)

Merge pull request #1 from chanind/sae_group_pr_isort_fix

chore: running isort to fix imports (dd24413)

black format (0ffcf21)
fixed expansion factor sweep (749b8cf)
remove tqdm from data loader, too noisy (de3b1a1)
fix tests (b3054b1)
don't calculate geom median unless you need to (d31bc31)
add to method (b3f6dc6)
flake8 and black (ed8345a)
flake8 linter changes (8e41e59)
Merge branch 'main' into sae_group_pr (082c813)
Delete evaluating.ipynb (d3cafa3)
Delete activation_storing.py (fa82992)
Delete lp_sae_training.py (0d1e1c9)
implemented SAE groups (66facfe)
Merge pull request #16 from chanind/flake-default-rules

chore: enable full flake8 default rules list (ad84706)

implemented sweeping via config list (80f61fa)
Merge pull request #13 from chanind/poetry

chore: using poetry for dependency management (496f7b4)

progress on implementing multi-sae support (2ba2131)
Merge pull request #11 from lucyfarnik/fix-caching-shuffle-edge-case

Fixed edge case in activation cache shuffling (3727b5d)

Merge pull request #12 from lucyfarnik/add-run-name-to-config

Added run name to config (c2e05c4)

Added run name to config (ab2aabd)
Fixed edge case in activation cache shuffling (18fd4a1)
Merge pull request #9 from chanind/rm-ds-store

chore: removing .DS_Store files (37771ce)

improve readmen (f3fe937)
fix_evals_bad_rebase (22e415d)
evals changes, incomplete (736c40e)
make tutorial independent of artefact and delete old artefact (6754e65)
fix MSE in ghost grad (44f7988)
Merge pull request #5 from jbloomAus/clean_up_repo

Add CI/CD, black formatting, pre-commit with flake8 linting. Fix some bugs. (01ccb92)

clean up run examples (9d46bdd)
move where we save the final artifact (f445fac)
fix activations store innefficiency (07d38a0)
black format and linting (479765b)
dummy file change (912a748)
try adding this branch listed specifically (7fd0e0c)
yml not yaml (9f3f1c8)
add ci (91aca91)
get units tests working (ade2976)
make unit tests pass, add make file (08b2c92)
add pytest-cov to requirements.txt (ce526df)
seperate research from main repo (32b668c)
remove comma and set default store batch size lower (9761b9a)
notebook for Johny (39a18f2)
best practices ghost grads fix (f554b16)
Update README.md

improved the hyperpars (2d4caf6)

dashboard runner (a511223)
readme update (c303c55)
still hadn't fixed the issue, now fixed (a36ee21)
fix mean of loss which broke in last commit (b4546db)
generate dashboards (35fa631)
Merge pull request #3 from jbloomAus/ghost_grads_dev

Ghost grads dev (4d150c2)

save final log sparsity (98e4f1b)
start saving log sparsity (4d6df6f)
get ghost grads working (e863ed7)
add notebook/changes for ghost-grad (not working yet) (73053c1)
idk, probs good (0407ad9)
bunch of shit (1ec8f97)
Merge branch 'main' of github.com:jbloomAus/mats_sae_training (a22d856)
Reverse engineering the "not only... but" feature (74d4fb8)
Merge pull request #2 from slavachalnev/no_reinit

Allow sampling method to be None (4c5fed8)

Allow sampling method to be None (166799d)
research/week_15th_jan/gpt2_small_resid_pre_3.ipynb (52a1da7)
add arg for dead neuron calc (ffb75fb)
notebooks for lucy (0319d89)
add args for b_dec_init (82da877)
add geom median as submodule instead (4c0d001)
add geom median to req (4c8ac9d)
add-geometric-mean-b_dec-init (d5853f8)
reset feature sparsity calculation (4c7f6f2)
anthropic sampling (048d267)
get anthropic resampling working (ca74543)
add ability to finetune existing autoencoder (c1208eb)
run notebook (879ad27)
switch to batch size independent loss metrics (0623d39)
track mean sparsity (75f1547)
don't stop early (44078a6)
name runs better (5041748)
improve-eval-metrics-for-attn (00d9b65)
add hook q (b061ee3)
add copy suppression notebook (1dc893a)
fix check in neuron sampling (809becd)
Merge pull request #1 from jbloomAus/activations_on_disk

Activations on disk (e5f198e)

merge into main (94ed3e6)
notebook (b5344a3)
various research notebooks (be63fce)
Added activations caching to run.ipynb (054cf6d)
Added activations dir to gitignore (c4a31ae)
Saving and loading activations from disk (309e2de)
Fixed typo that threw out half of activations (5f73918)
minor speed improvement (f7ea316)
add notebook with different example runs (c0eac0a)
add ability to train on attn heads (18cfaad)
add gzip for pt artefacts (9614a23)
add_example_feature_dashboard (e90e54d)
get_shit_done (ce73042)
commit_various_things_in_progress (3843c39)
add sae visualizer and tutorial (6f4030c)
make it possible to load sae trained on cuda onto mps (3298b75)
reduce hist freq, don't cap re-init (debcf0f)
add loader import to readme (b63f14e)
Update README.md (88f086b)
improve-resampling (a3072c2)
add readme (e9b8e56)
fixl0_plus_other_stuff (2f162f0)
add checkpoints (4cacbfc)
improve_model_saving_loading (f6697c6)
stuff (19d278a)
Added support for non-tokenized datasets (afcc239)
notebook_for_keith (d06e09b)
fix resampling bug (2b43980)
test pars (f601362)
further-lm-improvments (63048eb)
get_lm_working_well (eba5f79)
basic-lm-training-currently-broken (7396b8b)
set_up_lm_runner (d1095af)
fix old test, may remove (b407aab)
happy with hyperpars on benchmark (836298a)
improve metrics (f52c7bb)
make toy model runner (4851dd1)
various-changes-toy-model-test (a61b75f)
Added activation store and activation gathering (a85f24d)
First successful run on toy models (4927145)
halfway-to-toy-models (feeb411)
Initial commit (7a94b0e)

Files

CHANGELOG.md

Latest commit

History

CHANGELOG.md

File metadata and controls

CHANGELOG

v3.18.2 (2024-08-25)

Fix

v3.18.1 (2024-08-23)

Chore

Fix

v3.18.0 (2024-08-22)

Feature

v3.17.1 (2024-08-18)

Fix

v3.17.0 (2024-08-16)

Feature

v3.16.0 (2024-08-15)

Feature

v3.15.0 (2024-08-14)

Chore

Feature

Unknown

v3.14.0 (2024-08-05)

Feature

Unknown

v3.13.1 (2024-07-31)

Fix

Unknown

v3.13.0 (2024-07-18)

Feature

Unknown

v3.12.5 (2024-07-18)

Fix

Unknown

v3.12.4 (2024-07-17)

Fix

Unknown

v3.12.3 (2024-07-15)

Fix

v3.12.2 (2024-07-15)

Fix

Unknown

v3.12.1 (2024-07-11)

Fix

Unknown

v3.12.0 (2024-07-09)

Feature

Unknown

v3.11.2 (2024-07-08)

Fix

v3.11.1 (2024-07-08)

Fix

Unknown

v3.11.0 (2024-07-04)

Feature

v3.10.0 (2024-07-04)

Feature

Fix

v3.9.2 (2024-07-03)

Fix

Unknown

v3.9.1 (2024-07-01)

Fix

v3.9.0 (2024-07-01)

Feature

Unknown

v3.8.0 (2024-06-29)

Feature

Unknown

v3.7.0 (2024-06-25)

Feature

v3.6.0 (2024-06-25)

Feature

Unknown

v3.5.0 (2024-06-20)

Feature

Unknown

v3.4.1 (2024-06-17)