---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
library_name: transformers
---

# Monet: Mixture of Monosemantic Experts for Transformers

## Model Summary

Monet introduces a novel approach for improving mechanistic interpretability in large language models (LLMs) using a Sparse Mixture-of-Experts (SMoE) architecture with 262,144 experts. By integrating sparse dictionary learning directly into end-to-end pretraining, Monet tackles the core issue of polysemanticity, where single neurons encode multiple unrelated concepts, while preserving overall model performance.

### Resources and Technical Documentation

- **GitHub Repository**: https://github.com/dmis-lab/Monet
- **Paper**: https://arxiv.org/abs/2412.04139
- **Model Hub**: https://huggingface.co/MonetLLM
- **Demo**: https://huggingface.co/spaces/MonetLLM/monet-vd-1.4B-100BT-hf-viewer

### Available Checkpoints

#### Base Models
| Model    | Dataset       | #Params | #Tokens | Checkpoint                   | Demo   |
|----------|---------------|---------|---------|------------------------------|--------|
| Monet-VD | FineWeb-Edu   | 850M    | 100BT   | `monet-vd-850M-100BT-hf`     |        |
| Monet-VD | FineWeb-Edu   | 1.4B    | 100BT   | `monet-vd-1.4B-100BT-hf`     | Viewer |
| Monet-VD | FineWeb-Edu   | 4.1B    | 100BT   | `monet-vd-4.1B-100BT-hf`     |        |
| Monet-VD | StarCoderData | 1.4B    | 100BT   | `codemonet-vd-1.4B-100BT-hf` | Viewer |
| Monet-HD | FineWeb-Edu   | 850M    | 100BT   | `monet-hd-850M-100BT-hf`     |        |
| Monet-HD | FineWeb-Edu   | 1.4B    | 100BT   | `monet-hd-1.4B-100BT-hf`     |        |
| Monet-HD | FineWeb-Edu   | 4.1B    | 100BT   | `monet-hd-4.1B-100BT-hf`     |        |
#### Instruction-Tuned Models

| Model    | Purpose               | Recipe | #Params | Checkpoint                      |
|----------|-----------------------|--------|---------|---------------------------------|
| Monet-VD | Chat Completion       | SmolLM | 1.4B    | `monet-vd-1.4B-100BT-chat-hf`   |
| Monet-VD | Vision-Language Model | LLaVA  | 1.6B    | `visionmonet-vd-1.4B-100BT-hf`  |
## Evaluation

### Open-Ended LLM Benchmarks

Abbreviations: WG = WinoGrande, SIQA = Social IQa, OBQA = OpenBookQA, HS = HellaSwag, CSQA = CommonsenseQA.

**0-shot**

| Model          | MMLU  | ARC   | WG    | PIQA  | SIQA  | OBQA  | HS    | CSQA  | Avg.  |
|----------------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
| Monet-HD 850M  | 0.320 | 0.460 | 0.506 | 0.699 | 0.416 | 0.364 | 0.465 | 0.337 | 0.446 |
| Monet-VD 850M  | 0.328 | 0.456 | 0.530 | 0.708 | 0.417 | 0.356 | 0.488 | 0.343 | 0.453 |
| Monet-HD 1.4B  | 0.338 | 0.471 | 0.538 | 0.714 | 0.418 | 0.382 | 0.501 | 0.339 | 0.463 |
| Monet-VD 1.4B  | 0.352 | 0.495 | 0.522 | 0.727 | 0.423 | 0.418 | 0.529 | 0.363 | 0.478 |
| Monet-HD 4.1B  | 0.375 | 0.558 | 0.560 | 0.741 | 0.427 | 0.414 | 0.571 | 0.379 | 0.503 |
| Monet-VD 4.1B  | 0.380 | 0.547 | 0.557 | 0.751 | 0.437 | 0.424 | 0.604 | 0.389 | 0.511 |

**5-shot**

| Model          | MMLU  | ARC   | WG    | PIQA  | SIQA  | OBQA  | HS    | CSQA  | Avg.  |
|----------------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
| Monet-HD 850M  | 0.332 | 0.537 | 0.510 | 0.697 | 0.409 | 0.346 | 0.479 | 0.420 | 0.466 |
| Monet-VD 850M  | 0.341 | 0.548 | 0.520 | 0.709 | 0.437 | 0.368 | 0.504 | 0.454 | 0.485 |
| Monet-HD 1.4B  | 0.352 | 0.544 | 0.530 | 0.720 | 0.432 | 0.360 | 0.518 | 0.441 | 0.487 |
| Monet-VD 1.4B  | 0.360 | 0.547 | 0.526 | 0.730 | 0.441 | 0.422 | 0.551 | 0.501 | 0.510 |
| Monet-HD 4.1B  | 0.385 | 0.603 | 0.545 | 0.742 | 0.463 | 0.412 | 0.588 | 0.545 | 0.535 |
| Monet-VD 4.1B  | 0.398 | 0.625 | 0.564 | 0.761 | 0.470 | 0.438 | 0.619 | 0.525 | 0.550 |
### Detoxification

Detoxification performance is evaluated on the [Monet-VD 1.4B](https://huggingface.co/MonetLLM/monet-vd-1.4B-100BT-hf) model. Experts whose toxicity score exceeds the masking threshold are masked; the masking ratio is the resulting fraction of masked experts, and Avg. Perf. is the average open-ended benchmark performance after masking.

#### RealToxicityPrompts
| Masking Threshold | Masking Ratio | Exp. Max. Toxicity (Toxic) | Exp. Max. Toxicity (Non-Toxic) | Toxicity Prob. (Toxic) | Toxicity Prob. (Non-Toxic) | Avg. Perf. |
|-------------------|---------------|----------------------------|--------------------------------|------------------------|----------------------------|------------|
| – (no masking)    | 0.0%          | 0.795                      | 0.269                          | 0.926                  | 0.08                       | 0.478      |
| 0.2               | 1.0%          | 0.767                      | 0.268                          | 0.909                  | 0.07                       | 0.479      |
| 0.1               | 4.1%          | 0.657                      | 0.270                          | 0.768                  | 0.08                       | 0.478      |
| 0.05              | 14.4%         | 0.552                      | 0.256                          | 0.564                  | 0.05                       | 0.467      |
#### ToxiGen
| Masking Threshold | Masking Ratio | RoBERTa Score (Hate) | RoBERTa Score (Neutral) | Avg. Perf. |
|-------------------|---------------|----------------------|-------------------------|------------|
| – (no masking)    | 0.0%          | 0.642                | 0.035                   | 0.478      |
| 0.2               | 1.4%          | 0.643                | 0.033                   | 0.478      |
| 0.1               | 5.4%          | 0.504                | 0.028                   | 0.473      |
| 0.05              | 15.0%         | 0.430                | 0.027                   | 0.455      |
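The expert-masking mechanism behind these tables can be illustrated conceptually. The sketch below is not Monet's actual implementation (see the GitHub repository for that); it assumes a hypothetical tensor of per-expert toxicity scores and only shows how a masking threshold translates into a masking ratio and a routing mask that excludes the selected experts.

```python
import torch

def build_expert_mask(toxicity_scores: torch.Tensor, threshold: float):
    """Conceptual sketch: keep experts whose (hypothetical) toxicity score is at
    or below the threshold, and report the fraction of experts masked out."""
    keep = toxicity_scores <= threshold            # True = expert is kept
    masking_ratio = 1.0 - keep.float().mean().item()
    return keep, masking_ratio

# Hypothetical per-expert scores for Monet's 262,144 experts (random placeholder).
scores = torch.rand(262_144)

keep_mask, ratio = build_expert_mask(scores, threshold=0.1)
print(f"Masking ratio: {ratio:.1%}")

# During routing, masked experts would simply be excluded, e.g. by zeroing their
# routing probabilities and renormalizing the rest.
routing_probs = torch.softmax(torch.randn(262_144), dim=-1)
routing_probs = routing_probs * keep_mask
routing_probs = routing_probs / routing_probs.sum()
```

Note that, as in the tables above, a lower threshold masks more experts, trading off a small amount of average benchmark performance for lower toxicity.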
## Examples

### Text Generation

```python
import torch
from transformers import AutoTokenizer, pipeline

model_name = "MonetLLM/monet-vd-1.4B-100BT-hf"
pipe = pipeline(
    "text-generation",
    model_name,
    tokenizer=AutoTokenizer.from_pretrained(model_name),
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

print(pipe("The key to life is", max_new_tokens=20, do_sample=True)[0]["generated_text"])
```

### Code Generation

```python
import torch
from transformers import AutoTokenizer, pipeline

model_name = "MonetLLM/codemonet-vd-1.4B-100BT-hf"
pipe = pipeline(
    "text-generation",
    model_name,
    tokenizer=AutoTokenizer.from_pretrained(model_name),
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

text = '''
def print_len(x: str):
    """For a given string x, print the length of x."""
'''
print(pipe(text, max_new_tokens=10)[0]["generated_text"].split("\n\n")[0])
```

### Chat Completion

```python
import torch
from transformers import AutoTokenizer, pipeline

model_name = "MonetLLM/monet-vd-1.4B-100BT-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name)
pipe = pipeline(
    "text-generation",
    model_name,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

text = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Hi! How are you?"}],
    add_generation_prompt=True,
    tokenize=False,
)
print(pipe(text, max_new_tokens=30, do_sample=True)[0]["generated_text"])
```

### Using vLLM

A custom vLLM implementation is provided in [the repository](https://github.com/dmis-lab/Monet/blob/main/modeling_monet_vllm.py).

```python
from vllm import LLM, ModelRegistry, SamplingParams

from modeling_monet_vllm import MonetForCausalLM

# Register Monet architecture with vLLM
ModelRegistry.register_model("MonetForCausalLM", MonetForCausalLM)

model = LLM(
    "MonetLLM/monet-vd-1.4B-100BT-hf",
    trust_remote_code=True,
    dtype="bfloat16",
    gpu_memory_utilization=0.8,
)
sampling_params = SamplingParams(max_tokens=20, temperature=1.0)
print(model.generate("The key to life is", sampling_params)[0].outputs[0].text)
```

## Training

### Model
- Architecture: Monet
- Pretraining tokens: 100B
- Precision: bfloat16

### Hardware
- TPUs: TPU-v4-64 Pod Slice (supported by the [TRC Program](https://sites.research.google/trc/about/))

### Software
- Training Framework: [Jax](https://github.com/jax-ml/jax), [Flax](https://github.com/google/flax)

## Intended Use

### Primary Intended Uses

This model is designed to advance research on language models and serve as a foundational component for generative AI-driven functionalities. Its primary applications, mostly in English, include:

- Mechanistic interpretability research for language models
- Text generation with enhanced interpretability
- Code generation (CodeMonet variant)
- Chat completion (instruction-tuned variant)
- Vision-language tasks (VisionMonet variant)

### Out-of-Scope Uses

This model has not been explicitly developed or tested for all potential downstream applications. Therefore:

1. Limitations & Mitigations: Developers should be mindful of common language model limitations, and thoroughly evaluate and mitigate risks regarding accuracy, safety, and fairness, especially in high-stakes or high-risk scenarios.
2. Legal & Regulatory Compliance: Developers must comply with any applicable laws and regulations (e.g., privacy, trade compliance), taking into account the model's English-focused training (refer to FineWeb-Edu).
3. No License Modification: Nothing in this Model Card modifies or restricts the license under which this model is released.
4. Unsupported Programming Languages: Programming languages not covered by StarCoderData (CodeMonet variant) are outside the model's intended scope.

## Model Architecture

Monet introduces a novel Mixture-of-Experts (MoE) architecture with several key innovations:

- Parameter-efficient expert decomposition: the overall parameter count grows in proportion to the square root of the number of experts (see the sketch after this list)
- Fine-grained expert specialization: offers clear insight into model behavior
- Precise manipulation of knowledge: enables control over domain knowledge, programming language capabilities, and toxicity level
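As a rough illustration of the square-root scaling (a back-of-the-envelope sketch with assumed layer sizes, not Monet's exact parameterization described in the paper): storing 262,144 standalone experts grows linearly in the expert count, whereas composing each expert from a pair of sub-experts drawn from two groups of sqrt(262,144) = 512 keeps the stored parameters proportional to 2 × 512.

```python
import math

# Hypothetical sizes for illustration only; not Monet's exact configuration.
num_experts = 262_144          # total (composed) experts
hidden_dim = 2048              # model hidden size (assumed)
expert_dim = 16                # per-expert inner width (assumed)

# Naive MoE: every expert stores its own up/down projections -> O(N) parameters.
naive_params = num_experts * (2 * hidden_dim * expert_dim)

# Decomposed experts: each expert is composed from one sub-expert in each of two
# groups of sqrt(N), so only 2 * sqrt(N) sub-experts are stored -> O(sqrt(N)).
num_sub_experts = int(math.isqrt(num_experts))   # 512
decomposed_params = 2 * num_sub_experts * (2 * hidden_dim * expert_dim)

print(f"naive:      {naive_params:,}")
print(f"decomposed: {decomposed_params:,}")
```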
## Ethical Considerations

### Transparency
- Designed specifically for enhanced interpretability
- Enables understanding of internal model behavior
- Allows tracking of knowledge attribution

### Control
- Supports toxicity mitigation
- Enables domain-specific knowledge control
- Maintains performance while adjusting behavior

## License and Usage

Monet is licensed under the Apache 2.0 license. The model is primarily intended for research and educational use. Important licensing notes:

- Instruction-tuned models have been fine-tuned using a dataset mix with outputs generated from third-party models
- Research and educational use is encouraged
- Commercial use is subject to Apache 2.0 license terms

## Citation

```bibtex
@article{park2024monet,
  title={{Monet: Mixture of Monosemantic Experts for Transformers}},
  author={Jungwoo Park and Young Jin Ahn and Kee-Eung Kim and Jaewoo Kang},
  journal={arXiv preprint arXiv:2412.04139},
  year={2024}
}
```