import os
import sys
with open(sys.argv[0]) as f:
    code = f.read() # read the code of this file ASAP, for logging
import uuid
import time
import glob
import subprocess
import contextlib
from dataclasses import dataclass
# NOTE: Resume originally formatted to fit a standard terminal screen (80 px)
Fern Balsam (she/they)
community-funded open-source ml researcher / part-time outdoor adventurer
9 years of experience developing and aggressively optimizing neural networks
[ primary ] goal: societal impact via applied research and surrounding work
[ secondary ] goal: pure research extending theory of deep learning
-------------------------------------------------------------------------------
| |
| Achievements |
@tysam-code
tysam-code / hlb-cifar10-ternary-train-initial-working-prototype.py
Last active December 29, 2023 07:47
Trains a network to ~>91.5% on CIFAR10 in less than 10 seconds on an A100 with ternary weights, should fit uncompressed w/ correct storage dtypes in just over half of a floppy drive. <3 :'))))
# Note: The one change we need to make if we're in Colab is to uncomment this below block.
# If we are in an ipython session or a notebook, clear the state to avoid bugs
"""
try:
    _ = get_ipython().__class__.__name__
    ## we set -f below to avoid prompting the user before clearing the notebook state
    %reset -f
except NameError:
    pass ## we're still good
"""
@tysam-code
tysam-code / discrete-action-backprop-bypass-hlb-cifar10-demo.py
Last active March 14, 2024 09:24
This is a network sketch. Network sketches aren't complete, released versions, but rather a (hopefully convincing) proof-of-concept of a particular idea. The intent of this network sketch is to show that traditional backprop is not required in order to learn and coordinate a network's dataflow structure over a simulated discrete, non-backproppab…
# Sketch-specific note: a ~25-run battery for this code estimated ~93.11% accuracy in the same number of steps as the baseline network, with ~1.7x runtime overhead (much of which goes to the torch.randn allocations and extra layer calculations).
# Note: The one change we need to make if we're in Colab is to uncomment this below block.
# If we are in an ipython session or a notebook, clear the state to avoid bugs
#"""
try:
    _ = get_ipython().__class__.__name__
    ## we set -f below to avoid prompting the user before clearing the notebook state
    %reset -f
except NameError:
@tysam-code
tysam-code / in_dev_slicer_prefetcher.py
Last active December 15, 2023 06:44
in dev, TODO fill out description later.
import os
import time
import queue
import threading
import random
import torch
import torchvision
from torchvision.transforms import v2
@tysam-code
tysam-code / condensed-ml-tidbits.txt
Last active December 11, 2023 04:22
TODOTODOTODOTODO # workinprogress <3 :'))))
# [IN-DEV currently]
# Maintained/Initially created by Fern. Say hi to me and feel free to ask any questions as needed! <3 :'))))
# If anything here is self-cited/has no citation, that means it's a conclusion I arrived at over time or by
# deriving something from the basics. However, there may be work elaborating on it in further detail (feel free to comment if there's an especially relevant link).
# Misc
- LayerNorm/RMSNorm might be acting as lateral inhibition, a paradigm attempted in many ML papers from the 2000s and surrounding years (Fern, {relevant sources needed})
- 'Soft' (pre-determined or pre-compiled) architectures in the weights of your network can greatly improve convergence time and/or generalization.
- Downcasting dtypes to a lower bit depth in your dot products can be a 'free' efficiency improvement in some circumstances.
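
# Sketch: what downcasting the inputs of a dot product costs numerically
A minimal, standard-library-only sketch of the downcasting note above. It only simulates the rounding: both inputs are rounded to IEEE 754 half precision (fp16) before the multiply, and the sum stays in Python's native fp64. It does not measure speed, since any actual efficiency gain depends on hardware and library support for the lower-precision dtype. The helper names (`to_fp16`, `dot_downcast_inputs`) and the vector length of 1024 are assumptions made for illustration and do not come from the original notes.

```python
import random
import struct


def to_fp16(x: float) -> float:
    """Round a Python float (fp64) to the nearest IEEE 754 half-precision value."""
    # Hypothetical helper: round-trips through the struct module's "e" (binary16) format.
    return struct.unpack("<e", struct.pack("<e", x))[0]


def dot(a, b):
    """Reference dot product in full fp64 precision."""
    return sum(x * y for x, y in zip(a, b))


def dot_downcast_inputs(a, b):
    """Dot product with both inputs rounded to fp16 first; the sum stays in fp64."""
    return sum(to_fp16(x) * to_fp16(y) for x, y in zip(a, b))


rng = random.Random(0)  # fixed seed so the comparison is repeatable
a = [rng.uniform(-1.0, 1.0) for _ in range(1024)]
b = [rng.uniform(-1.0, 1.0) for _ in range(1024)]

exact = dot(a, b)
approx = dot_downcast_inputs(a, b)
abs_err = abs(exact - approx)

bytes_fp16 = struct.calcsize("<e")  # storage per element at half precision
bytes_fp64 = struct.calcsize("<d")  # storage per element at double precision

print(f"fp64 dot product:        {exact:.6f}")
print(f"fp16-input dot product:  {approx:.6f}")
print(f"absolute error:          {abs_err:.2e}")
print(f"bytes per element:       {bytes_fp16} (fp16) vs {bytes_fp64} (fp64)")
```

Whether the resulting error is acceptable depends on the circumstances, which is the hedge in the note above: inputs with a large dynamic range can overflow or lose detail in fp16 in ways this bounded example does not show.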