[ICLR 2025] General-purpose activation steering library
Updated Sep 18, 2025 · Python
KV Cache Steering for Inducing Reasoning in Small Language Models
Official repo for the paper: "Selective Steering: Norm-Preserving Control Through Discriminative Layer Selection"
Official code for "Activation Steering for Accent Adaptation in Speech Foundation Models" (Interspeech 2026). Parameter-free accent adaptation via mean-shift steering vectors — no weight updates, consistent WER reductions across 8 accents.
[ICLR 2026] ASGuard: Activation-Scaling Guard to Mitigate Targeted Jailbreaking Attack
Phase-aware LLM activation steering and linear probing. A memory-efficient, practical implementation of Representation Engineering (RepE) for safety research.
Accepted at the 19th Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2026
Probing whether reasoning can be structurally crystallised into a small LLM via cyclic domain training + gap phases. Testing emergence at 300M parameters from scratch. SAE-based interpretability. Pruning crossover as depth measurement. Open collaboration.
Activation steering toolkit for Llama 3.2 3B — inject sensory-constructed vectors into model activations to alter processing dispositions. Web UI + API. Runs locally on consumer hardware.
An experimental comparison of prompt-based behavioral steering and activation steering in LLMs.
Qwen3-0.6B activation steering: style vectors, lens contamination eval, CPRR methodology
Bio-inspired multi-scale competency architecture for LLMs. VCG auction routing (Mixture of Bidders) and activation steering for cognitive homeostasis, grounded in Michael Levin's TAME framework
Mechanistic interpretability experiments detecting "evaluation awareness" in LLMs: identifying whether models internally represent being monitored
Official implementation of "Beyond Multiple Choice: Evaluating Steering Vectors for Adaptive Free-Form Summarization" (ICML 2025 R2FM Workshop).
Multi-Agent Evolutionary Simulation exploring adversarial economics and AI steering
🌐 Model adversarial economics and AI alignment using the Panopticon Lattice, a multi-agent simulation exploring hidden collusion and system dynamics.
Reshape how AI thinks, one slider at a time. Activation steering for local LLMs with visual sliders, presets, and a web UI.