The OpenAI team has released the openai/circuit-sparsity model on Hugging Face and the openai/circuit_sparsity toolkit on GitHub, packaging the models and circuits from the research paper 'Weight-sparse transformers have interpretable circuits'.

What is a weight-sparse transformer?
The models are GPT-2 style decoder-only transformers trained exclusively on Python code. Sparsity is not added after training; it is enforced during optimization. After each AdamW step, the training loop keeps only the largest-magnitude entries in each weight matrix and bias, including the token embeddings, and zeros out the rest. All matrices maintain the same fraction of non-zero elements.
The sparsest models have roughly 1 in 1,000 weights non-zero. The OpenAI team also applies lightweight activation sparsity: about 1 in 4 node activations are non-zero, covering residual reads, residual writes, attention channels, and MLP neurons.
Sparsity is annealed during training. Models start dense, and the enforced sparsity gradually increases toward the target value. This design lets the research team scale model width while keeping the number of non-zero parameters constant, and then study capability-interpretability tradeoffs as they vary sparsity and model size. They show that, at a fixed pretraining loss, the circuits recovered from the sparsest models are roughly 16 times smaller than those recovered from denser models.
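To make the scheme concrete, here is a minimal sketch of magnitude top-k projection with an annealed density schedule. This is an illustration, not OpenAI's training code; `project_topk_` and `density_at` are hypothetical helpers.
```python
import torch

def project_topk_(param: torch.Tensor, density: float) -> None:
    # Keep only the largest-magnitude entries of `param`; zero the rest, in place.
    k = max(1, int(density * param.numel()))
    flat = param.detach().abs().flatten()
    threshold = flat.kthvalue(flat.numel() - k + 1).values  # k-th largest magnitude
    param.data.mul_((param.detach().abs() >= threshold).to(param.dtype))

def density_at(step: int, anneal_steps: int, target: float = 1e-3) -> float:
    # Models start dense (density 1.0); the non-zero budget anneals toward the target.
    frac = min(1.0, step / anneal_steps)
    return 1.0 + frac * (target - 1.0)

# Sketch of the loop: project every parameter after each AdamW step.
# optimizer.step()
# with torch.no_grad():
#     for p in model.parameters():
#         project_topk_(p, density_at(step, anneal_steps))
```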

So, what is a sparse circuit?
The central object of this research work is the sparse circuit. The research team defines nodes precisely: each node is a single neuron, attention channel, residual read channel, or residual write channel. An edge is a single non-zero entry in a weight matrix connecting two nodes. Circuit size is measured as the geometric mean of the number of edges across tasks.
To test the models, the research team constructed 20 simple Python next-token binary tasks. Each task forces the model to choose between two completions that differ in a single token; a hypothetical encoding of one task is sketched after this list. Examples include:
- single_double_quote: predict whether a string should be closed with a single or a double quote
- bracket_counting: decide between ] and ]] for a list based on nesting depth
- set_or_string: track whether a variable was initialized as a set or a string
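As a concrete illustration, one such task might be encoded like this (a hypothetical format; the released repo's actual task definitions may differ):
```python
# Hypothetical encoding of a binary next-token task: a prompt plus two
# single-token completions, exactly one of which is correct.
TASK = {
    "name": "single_double_quote",
    "prompt": "x = 'hello",
    "completions": ["'", '"'],  # the model must pick the matching quote
    "answer": 0,                # index of the correct completion
}
```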
For each task, they prune the model to find the smallest circuit that still achieves a target loss of 0.15 on that task distribution. Pruning operates at the node level. Deleted nodes are mean-ablated: their activation is held constant at its mean over the pretraining distribution. A learned binary mask per node is optimized with a straight-through style surrogate to trade off task loss against circuit size.
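A sketch of that node-level masking, assuming a per-node sigmoid gate trained with a straight-through estimator (the paper's exact surrogate may differ):
```python
import torch
import torch.nn as nn

class NodeMask(nn.Module):
    # Gate over a layer's nodes: kept nodes pass through, pruned nodes are
    # mean-ablated, i.e. frozen at their mean pretraining activation.
    def __init__(self, mean_acts: torch.Tensor):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros_like(mean_acts))
        self.register_buffer("mean_acts", mean_acts)

    def forward(self, acts: torch.Tensor) -> torch.Tensor:
        probs = torch.sigmoid(self.logits)
        hard = (probs > 0.5).float()
        mask = hard + probs - probs.detach()  # straight-through: hard forward, soft backward
        return mask * acts + (1 - mask) * self.mean_acts

# Objective (sketch): task loss plus a penalty on the expected number of kept nodes.
# loss = task_loss + lam * torch.sigmoid(node_mask.logits).sum()
```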

Example circuits: quote closing and bracket counting
The simplest example is the circuit for single_double_quote. Here the model must emit the closing quote type matching the opening quote; the pruned circuit has 12 nodes and 9 edges.
The mechanism has two stages. In 0.mlp, two neurons specialize:
- a quote detector neuron that activates on both " and '
- a quote type classifier neuron that is positive on " and negative on '
A later attention layer, 10.attn, then reads these features: the quote detector channel acts as the key and the quote type classifier channel as the value. The last token carries a constant positive query, so the attention output copies the correct quote type into the final position and the model closes the string correctly.
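The copy step can be seen in a toy calculation (illustrative numbers, not the released model's weights): treat the quote detector as the key channel and the quote type classifier as the value channel.
```python
import torch
import torch.nn.functional as F

detector   = torch.tensor([0.0, 1.0, 0.0, 0.0])   # key: fires at the opening quote
quote_type = torch.tensor([0.0, -1.0, 0.0, 0.0])  # value: +1 for ", -1 for ' (here ')
query      = torch.tensor([0.0, 0.0, 0.0, 1.0])   # constant positive query on last token

scores = query[-1] * detector * 8.0  # last-token query dotted with detector keys
attn = F.softmax(scores, dim=0)      # attention mass lands on the opening quote
copied = (attn * quote_type).sum()   # ~ -1, so the model emits a closing '
print(copied.item())
```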

bracket_counting yields a slightly larger circuit with an even clearer algorithm. The embedding of [ writes into several residual channels that act as bracket detectors. A value channel in a layer-2 attention head averages this detector activation over the context, effectively computing nesting depth and storing it in a residual channel. A later attention head thresholds that depth and activates a nested-list-close channel only when the list is nested, which leads the model to output ]].
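A toy sketch of that algorithm with made-up values (not the model's actual weights): uniform attention over the context averages the bracket detector, and a threshold converts the averaged depth signal into the close decision.
```python
import torch

tokens = ["=", "[", "[", "1", ","]
detector = torch.tensor([1.0 if t == "[" else 0.0 for t in tokens])

depth_signal = detector.mean()             # averaging the detector ~ nesting depth
nested = depth_signal > 1.5 / len(tokens)  # fires only when more than one [ is open
print("]]" if nested else "]")
```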
A third circuit, for set_or_string_fixedvarname, shows how the model tracks the type of a variable. One attention head copies the embedding of the variable name current onto the set() or "" token; a later head uses that embedding as query and key to copy the type information back when the model must choose between .add and +=.


Bridges connecting sparse models to dense models
The research team also introduces bridges, which connect a sparse model to a conventionally trained dense model. Each bridge is an encoder-decoder pair that maps dense activations to sparse activations and back, once per sublayer. The encoder is a linear map followed by an AbsTopK activation; the decoder is linear.
A loss term added during training encourages the hybrid sparse-dense forward pass to match the original dense model. This lets the research team perturb interpretable sparse features, such as the quote type classifier channel, and map that perturbation back into the dense model, changing its behavior in a controlled way.
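A minimal sketch of such a bridge, under assumed dimensions and module layout (the released implementation may differ):
```python
import torch
import torch.nn as nn

class AbsTopK(nn.Module):
    # Keep the k largest-magnitude entries along the last dim; zero the rest.
    def __init__(self, k: int):
        super().__init__()
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        idx = x.abs().topk(self.k, dim=-1).indices
        return torch.zeros_like(x).scatter(-1, idx, x.gather(-1, idx))

class Bridge(nn.Module):
    # Per-sublayer bridge: linear encoder + AbsTopK into the sparse basis,
    # linear decoder back into the dense basis.
    def __init__(self, d_dense: int, d_sparse: int, k: int):
        super().__init__()
        self.enc = nn.Linear(d_dense, d_sparse)
        self.act = AbsTopK(k)
        self.dec = nn.Linear(d_sparse, d_dense)

    def forward(self, dense_acts: torch.Tensor) -> torch.Tensor:
        sparse_acts = self.act(self.enc(dense_acts))
        # Sparse features (e.g. the quote type classifier channel) can be
        # perturbed here before decoding back to the dense model.
        return self.dec(sparse_acts)
```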

What exactly has the OpenAI team released?
The OpenAI team released the openai/circuit-sparsity model on Hugging Face. It is a 0.4B-parameter model tagged custom_code, corresponding to csp_yolo2 in the research paper. This is the model used for the qualitative results on bracket counting and variable binding. It is licensed under Apache 2.0.
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

if __name__ == "__main__":
    PROMPT = "def square_sum(xs):\n return sum(x * x for x in xs)\n\nsquare_sum([1, 2, 3])\n"

    tok = AutoTokenizer.from_pretrained("openai/circuit-sparsity", trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        "openai/circuit-sparsity",
        trust_remote_code=True,
        torch_dtype="auto",
    )
    model.to("cuda" if torch.cuda.is_available() else "cpu")

    inputs = tok(PROMPT, return_tensors="pt", add_special_tokens=False)["input_ids"].to(
        model.device
    )
    with torch.no_grad():
        out = model.generate(
            inputs,
            max_new_tokens=64,
            do_sample=True,
            temperature=0.8,
            top_p=0.95,
            return_dict_in_generate=False,
        )
    print(tok.decode(out[0], skip_special_tokens=True))
```
Key takeaways
- Weight-sparse training, not post hoc pruning: circuit-sparsity trains GPT-2 style decoder models with extreme weight sparsity applied during optimization; most weights are zero, so each neuron has only a few connections.
- Small, task-specific circuits with clear nodes and edges: the research team defines circuits at the level of individual neurons, attention channels, and residual channels, and recovers circuits that often have only tens of nodes and edges for 20 binary Python next-token tasks.
- Quote closing and type tracking as fully instantiated circuits: for tasks like single_double_quote, bracket_counting, and set_or_string_fixedvarname, the research team isolates circuits that implement concrete algorithms for quote detection, bracket depth, and variable type tracking, including a string-closing circuit with 12 nodes and 9 edges.
- Models on Hugging Face and tooling on GitHub: OpenAI released the 0.4B-parameter openai/circuit-sparsity model on Hugging Face and the openai/circuit_sparsity codebase on GitHub under Apache 2.0, including model checkpoints, task definitions, and a circuit visualization UI.
- Bridge mechanism connecting sparse and dense models: the work introduces encoder-decoder bridges that map between sparse and dense activations, allowing researchers to transfer sparse feature interventions to standard dense activations and study how interpretable circuits relate to production-scale models.
Check out the paper, model weights, and GitHub page for tutorials, code, and notebooks.
Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of Marktechpost, an Artificial Intelligence media platform known for its in-depth coverage of machine learning and deep learning news that is technically sound and easily understood by a wide audience. The platform boasts over 2 million monthly views, reflecting its popularity among readers.