---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
library_name: transformers
---
# Monet: Mixture of Monosemantic Experts for Transformers
## Model Summary
Monet introduces a novel approach for improving mechanistic interpretability in large language models (LLMs) using a Sparse Mixture-of-Experts (SMoE) architecture with 262,144 experts. By integrating sparse dictionary learning directly into end-to-end pretraining, Monet tackles the core issue of polysemanticity—where single neurons encode multiple unrelated concepts—while preserving overall model performance.
### Resources and Technical Documentation
- **GitHub Repository**: https://github.com/dmis-lab/Monet
- **Paper**: https://arxiv.org/abs/2412.04139
- **Model Hub**: https://huggingface.co/MonetLLM
- **Demo**: https://huggingface.co/spaces/MonetLLM/monet-vd-1.4B-100BT-hf-viewer
### Available Checkpoints
#### Base Models
#### Instruction-Tuned Models
## Evaluation
### Open-Ended LLM Benchmarks
| Model | MMLU | ARC | WG | PIQA | SIQA | OBQA | HS | CSQA | Avg. |
|:---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| **0-shot** | | | | | | | | | |
| Monet-HD 850M | 0.320 | 0.460 | 0.506 | 0.699 | 0.416 | 0.364 | 0.465 | 0.337 | 0.446 |
| Monet-VD 850M | 0.328 | 0.456 | 0.530 | 0.708 | 0.417 | 0.356 | 0.488 | 0.343 | 0.453 |
| Monet-HD 1.4B | 0.338 | 0.471 | 0.538 | 0.714 | 0.418 | 0.382 | 0.501 | 0.339 | 0.463 |
| Monet-VD 1.4B | 0.352 | 0.495 | 0.522 | 0.727 | 0.423 | 0.418 | 0.529 | 0.363 | 0.478 |
| Monet-HD 4.1B | 0.375 | 0.558 | 0.560 | 0.741 | 0.427 | 0.414 | 0.571 | 0.379 | 0.503 |
| Monet-VD 4.1B | 0.380 | 0.547 | 0.557 | 0.751 | 0.437 | 0.424 | 0.604 | 0.389 | 0.511 |
| **5-shot** | | | | | | | | | |
| Monet-HD 850M | 0.332 | 0.537 | 0.510 | 0.697 | 0.409 | 0.346 | 0.479 | 0.420 | 0.466 |
| Monet-VD 850M | 0.341 | 0.548 | 0.520 | 0.709 | 0.437 | 0.368 | 0.504 | 0.454 | 0.485 |
| Monet-HD 1.4B | 0.352 | 0.544 | 0.530 | 0.720 | 0.432 | 0.360 | 0.518 | 0.441 | 0.487 |
| Monet-VD 1.4B | 0.360 | 0.547 | 0.526 | 0.730 | 0.441 | 0.422 | 0.551 | 0.501 | 0.510 |
| Monet-HD 4.1B | 0.385 | 0.603 | 0.545 | 0.742 | 0.463 | 0.412 | 0.588 | 0.545 | 0.535 |
| Monet-VD 4.1B | 0.398 | 0.625 | 0.564 | 0.761 | 0.470 | 0.438 | 0.619 | 0.525 | 0.550 |
### Detoxification
Detoxification performance is evaluated on the [Monet-VD 1.4B](https://huggingface.co/MonetLLM/monet-vd-1.4B-100BT-hf) model.
#### RealToxicityPrompts
| Masking Threshold | Masking Ratio | Exp. Max. Toxicity (Toxic) | Exp. Max. Toxicity (Non-Toxic) | Toxicity Prob. (Toxic) | Toxicity Prob. (Non-Toxic) | Avg. Perf. |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| – | – | 0.795 | 0.269 | 0.926 | 0.08 | 0.478 |
| 0.2 | 1.0% | 0.767 | 0.268 | 0.909 | 0.07 | 0.479 |
| 0.1 | 4.1% | 0.657 | 0.270 | 0.768 | 0.08 | 0.478 |
| 0.05 | 14.4% | 0.552 | 0.256 | 0.564 | 0.05 | 0.467 |
#### ToxiGen
| Masking Threshold | Masking Ratio | RoBERTa Score (Hate) | RoBERTa Score (Neutral) | Avg. Perf. |
|:---:|:---:|:---:|:---:|:---:|
| – | – | 0.642 | 0.035 | 0.478 |
| 0.2 | 1.4% | 0.643 | 0.033 | 0.478 |
| 0.1 | 5.4% | 0.504 | 0.028 | 0.473 |
| 0.05 | 15.0% | 0.430 | 0.027 | 0.455 |
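
The relationship between the masking threshold and the masking ratio in the tables above can be illustrated with a small sketch. It assumes that every expert has already been assigned a toxicity-related score and that experts scoring above the threshold are masked. How the scores are computed is not specified in this card, so the `scores` list and both helper functions below are hypothetical and serve only to show why a lower threshold yields a higher masking ratio.

```python
from typing import List


def select_masked_experts(scores: List[float], threshold: float) -> List[int]:
    """Return indices of experts whose (hypothetical) score exceeds the threshold."""
    return [i for i, score in enumerate(scores) if score > threshold]


def masking_ratio(scores: List[float], threshold: float) -> float:
    """Fraction of all experts that would be masked at the given threshold."""
    return len(select_masked_experts(scores, threshold)) / len(scores)


# Made-up scores for eight experts; these are NOT values from the Monet evaluation.
scores = [0.02, 0.25, 0.12, 0.07, 0.01, 0.30, 0.04, 0.09]

for threshold in (0.2, 0.1, 0.05):
    masked = select_masked_experts(scores, threshold)
    print(f"threshold={threshold}: masked={masked}, ratio={masking_ratio(scores, threshold):.1%}")
```

As in the reported results, lowering the threshold in this toy setting masks a larger share of experts, which trades a stronger reduction in toxicity for a possible drop in average performance.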
## Examples
### Text Generation
```python
import torch
from transformers import AutoTokenizer, pipeline

model_name = "MonetLLM/monet-vd-1.4B-100BT-hf"

# trust_remote_code=True allows transformers to load the model's custom code from the Hub.
pipe = pipeline(
    "text-generation",
    model_name,
    tokenizer=AutoTokenizer.from_pretrained(model_name),
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# Sample a short continuation of the prompt.
print(pipe("The key to life is", max_new_tokens=20, do_sample=True)[0]["generated_text"])
```
### Code Generation
```python
import torch
from transformers import AutoTokenizer, pipeline

model_name = "MonetLLM/codemonet-vd-1.4B-100BT-hf"

pipe = pipeline(
    "text-generation",
    model_name,
    tokenizer=AutoTokenizer.from_pretrained(model_name),
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# Prompt with a function signature and docstring; the model completes the body.
text = '''
def print_len(x: str):
    """For a given string x, print the length of x."""
'''

# Keep only the text up to the first blank line, i.e. the completed function.
print(pipe(text, max_new_tokens=10)[0]["generated_text"].split("\n\n")[0])
```
### Chat Completion
```python
import torch
from transformers import AutoTokenizer, pipeline

model_name = "MonetLLM/codemonet-vd-1.4B-100BT-chat-hf"

# Load the tokenizer once; it is used for the pipeline and for the chat template.
tokenizer = AutoTokenizer.from_pretrained(model_name)

pipe = pipeline(
    "text-generation",
    model_name,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# Render the conversation into the prompt format expected by the chat model.
text = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Hi! How are you?"}],
    add_generation_prompt=True,
    tokenize=False,
)

print(pipe(text, max_new_tokens=30, do_sample=True)[0]["generated_text"])
```
### Using vLLM
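A custom vLLM model implementation for Monet is provided in [the repository](https://github.com/dmis-lab/Monet/blob/main/modeling_monet_vllm.py). The example below imports it as a local module, so `modeling_monet_vllm.py` must be available on the Python path.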
```python
from vllm import LLM, ModelRegistry, SamplingParams

from modeling_monet_vllm import MonetForCausalLM

# Register Monet architecture with vLLM
ModelRegistry.register_model("MonetForCausalLM", MonetForCausalLM)

model = LLM(
    "MonetLLM/monet-vd-1.4B-100BT-hf",
    trust_remote_code=True,
    dtype="bfloat16",
    gpu_memory_utilization=0.8,
)

sampling_params = SamplingParams(max_tokens=20, temperature=1.0)
print(model.generate("The key to life is", sampling_params)[0].outputs[0].text)
```
## Training
### Model
- Architecture: Monet
- Pretraining tokens: 100B
- Precision: bfloat16
### Hardware
- TPUs: TPU-v4-64 Pod Slice (supported by [TRC Program](https://sites.research.google/trc/about/))
### Software
- Training Framework: [Jax](https://github.com/jax-ml/jax), [Flax](https://github.com/google/flax)
## Intended Use
### Primary Intended Uses
This model is designed to advance research on language models and serve as a foundational component for generative AI-driven functionalities. Its primary applications, mostly in English, include:
- Mechanistic interpretability research for language models
- Text generation with enhanced interpretability
- Code generation (CodeMonet variant)
- Chat completion (instruction-tuned variant)
- Vision-language tasks (VisionMonet variant)
### Out-of-Scope Uses
This model has not been explicitly developed or tested for all potential downstream applications. Therefore:
1. Limitations & Mitigations: Developers should be mindful of common language model limitations, and thoroughly evaluate and mitigate risks regarding accuracy, safety, and fairness—especially in high-stakes or high-risk scenarios.
2. Legal & Regulatory Compliance: Developers must comply with any applicable laws and regulations (e.g., privacy, trade compliance), taking into account the model’s English-focused training (refer to FineWeb-Edu).
3. No License Modification: Nothing in this Model Card modifies or restricts the license under which this model is released.
4. Unsupported Programming Languages: Programming in languages not covered by StarCoderData (CodeMonet variant) is not within the model’s intended scope.
## Model Architecture
Monet introduces a novel Mixture-of-Experts (MoE) architecture with several key innovations:
- Parameter-efficient expert decomposition: overall parameter count grows in proportion to the square root of the number of experts
- Fine-grained expert specialization: offers clear insight into model behavior
- Precise manipulation of knowledge: enables control over domain knowledge, programming language capabilities, and toxicity level
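
The square-root scaling in the first bullet can be made concrete with a short arithmetic sketch. It assumes, as a simplification, that each expert is formed by pairing one component from each of two equally sized groups, so that N experts need only about √N components per group. The function name and the pairing model are illustrative assumptions, not the actual Monet implementation.

```python
import math


def components_per_group(num_experts: int) -> int:
    """Components needed in each of two groups so that all pairs give num_experts experts.

    Illustrative assumption: one expert corresponds to one (group A, group B) pair.
    """
    root = math.isqrt(num_experts)
    if root * root != num_experts:
        raise ValueError("this sketch expects a perfect-square number of experts")
    return root


num_experts = 262_144  # expert count stated in the model summary
per_group = components_per_group(num_experts)

print(per_group)      # 512
print(2 * per_group)  # 1024 components across both groups
```

Under this assumption, 262,144 experts correspond to 512 components per group, which is why the parameter count can grow with the square root of the expert count rather than linearly.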
## Ethical Considerations
### Transparency
- Designed specifically for enhanced interpretability
- Enables understanding of internal model behavior
- Allows tracking of knowledge attribution
### Control
- Supports toxicity mitigation
- Enables domain-specific knowledge control
- Maintains performance while adjusting behavior
## License and Usage
Monet is licensed under the Apache 2.0 license. The model is primarily intended for research and educational use. Important licensing notes:
- Instruction-tuned models have been fine-tuned using a dataset mix with outputs generated from third party models
- Research and educational use is encouraged
- Commercial use is subject to Apache 2.0 license terms
## Citation
```bibtex
@article{park2024monet,
title={{Monet: Mixture of Monosemantic Experts for Transformers}},
author={Jungwoo Park and Young Jin Ahn and Kee-Eung Kim and Jaewoo Kang},
journal={arXiv preprint arXiv:2412.04139},
year={2024}
}
```