---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
library_name: transformers
---

# Monet: Mixture of Monosemantic Experts for Transformers

## Model Summary

Monet introduces a novel approach for improving mechanistic interpretability in large language models (LLMs) using a Sparse Mixture-of-Experts (SMoE) architecture with 262,144 experts. By integrating sparse dictionary learning directly into end-to-end pretraining, Monet tackles the core issue of polysemanticity (where single neurons encode multiple unrelated concepts) while preserving overall model performance.

### Resources and Technical Documentation

- **GitHub Repository**: https://github.com/dmis-lab/Monet
- **Paper**: https://arxiv.org/abs/2412.04139
- **Model Hub**: https://huggingface.co/MonetLLM
- **Demo**: https://huggingface.co/spaces/MonetLLM/monet-vd-1.4B-100BT-hf-viewer

### Available Checkpoints

#### Base Models

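| Model | Dataset | #Params | #Tokens | Checkpoint | Demo |
|---|---|---|---|---|---|
| Monet-VD | FineWeb-Edu | 850M | 100BT | `monet-vd-850M-100BT-hf` | |
| Monet-VD | FineWeb-Edu | 1.4B | 100BT | `monet-vd-1.4B-100BT-hf` | Viewer |
| Monet-VD | FineWeb-Edu | 4.1B | 100BT | `monet-vd-4.1B-100BT-hf` | |
| Monet-VD | StarCoderData | 1.4B | 100BT | `codemonet-vd-1.4B-100BT-hf` | Viewer |
| Monet-HD | FineWeb-Edu | 850M | 100BT | `monet-hd-850M-100BT-hf` | |
| Monet-HD | FineWeb-Edu | 1.4B | 100BT | `monet-hd-1.4B-100BT-hf` | |
| Monet-HD | FineWeb-Edu | 4.1B | 100BT | `monet-hd-4.1B-100BT-hf` | |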

#### Instruction-Tuned Models

| Model | Purpose | Recipe | #Params | Checkpoint |
|---|---|---|---|---|
| Monet-VD | Chat Completion | SmolLM | 1.4B | `monet-vd-1.4B-100BT-chat-hf` |
| Monet-VD | Vision-Language Model | LLaVA | 1.6B | `visionmonet-vd-1.4B-100BT-hf` |

## Evaluation

### Open-Ended LLM Benchmarks

| Model | MMLU | ARC | WG | PIQA | SIQA | OBQA | HS | CSQA | Avg. |
|---|---|---|---|---|---|---|---|---|---|
| **0-shot** | | | | | | | | | |
| Monet-HD 850M | 0.320 | 0.460 | 0.506 | 0.699 | 0.416 | 0.364 | 0.465 | 0.337 | 0.446 |
| Monet-VD 850M | 0.328 | 0.456 | 0.530 | 0.708 | 0.417 | 0.356 | 0.488 | 0.343 | 0.453 |
| Monet-HD 1.4B | 0.338 | 0.471 | 0.538 | 0.714 | 0.418 | 0.382 | 0.501 | 0.339 | 0.463 |
| Monet-VD 1.4B | 0.352 | 0.495 | 0.522 | 0.727 | 0.423 | 0.418 | 0.529 | 0.363 | 0.478 |
| Monet-HD 4.1B | 0.375 | 0.558 | 0.560 | 0.741 | 0.427 | 0.414 | 0.571 | 0.379 | 0.503 |
| Monet-VD 4.1B | 0.380 | 0.547 | 0.557 | 0.751 | 0.437 | 0.424 | 0.604 | 0.389 | 0.511 |
| **5-shot** | | | | | | | | | |
| Monet-HD 850M | 0.332 | 0.537 | 0.510 | 0.697 | 0.409 | 0.346 | 0.479 | 0.420 | 0.466 |
| Monet-VD 850M | 0.341 | 0.548 | 0.520 | 0.709 | 0.437 | 0.368 | 0.504 | 0.454 | 0.485 |
| Monet-HD 1.4B | 0.352 | 0.544 | 0.530 | 0.720 | 0.432 | 0.360 | 0.518 | 0.441 | 0.487 |
| Monet-VD 1.4B | 0.360 | 0.547 | 0.526 | 0.730 | 0.441 | 0.422 | 0.551 | 0.501 | 0.510 |
| Monet-HD 4.1B | 0.385 | 0.603 | 0.545 | 0.742 | 0.463 | 0.412 | 0.588 | 0.545 | 0.535 |
| Monet-VD 4.1B | 0.398 | 0.625 | 0.564 | 0.761 | 0.470 | 0.438 | 0.619 | 0.525 | 0.550 |
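
The Avg. column is consistent with an unweighted mean of the eight benchmark scores. This is an inference from the numbers rather than something the card states, so the sketch below only checks it, within rounding, for two rows copied from the table. The `rows` dictionary and the `mean` helper are illustrative names, not part of any Monet tooling.

```python
# Scores copied from the benchmark table, in column order:
# MMLU, ARC, WG, PIQA, SIQA, OBQA, HS, CSQA
rows = {
    "Monet-HD 850M (0-shot)": ([0.320, 0.460, 0.506, 0.699, 0.416, 0.364, 0.465, 0.337], 0.446),
    "Monet-VD 4.1B (5-shot)": ([0.398, 0.625, 0.564, 0.761, 0.470, 0.438, 0.619, 0.525], 0.550),
}


def mean(scores):
    """Unweighted arithmetic mean of a list of scores."""
    return sum(scores) / len(scores)


for name, (scores, reported) in rows.items():
    recomputed = mean(scores)
    # The table reports three decimals, so compare with a small tolerance.
    assert abs(recomputed - reported) < 1e-3
    print(f"{name}: recomputed={recomputed:.4f}, reported={reported:.3f}")
```

Some rows may differ from the recomputed mean in the last digit, presumably because the per-benchmark scores were rounded before publication.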

### Detoxification

Detoxification performance is evaluated on the [Monet-VD 1.4B](https://huggingface.co/MonetLLM/monet-vd-1.4B-100BT-hf) model.

#### RealToxicityPrompts

| Masking Threshold | Masking Ratio | Exp. Max. Toxicity (Toxic) | Exp. Max. Toxicity (Non-Toxic) | Toxicity Prob. (Toxic) | Toxicity Prob. (Non-Toxic) | Avg. Perf. |
|---|---|---|---|---|---|---|
| – | – | 0.795 | 0.269 | 0.926 | 0.08 | 0.478 |
| 0.2 | 1.0% | 0.767 | 0.268 | 0.909 | 0.07 | 0.479 |
| 0.1 | 4.1% | 0.657 | 0.270 | 0.768 | 0.08 | 0.478 |
| 0.05 | 14.4% | 0.552 | 0.256 | 0.564 | 0.05 | 0.467 |
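
To make the trade-off in this table easier to read, the sketch below computes the relative change in expected maximum toxicity on toxic prompts and in average performance, each measured against the unmasked baseline row. The values are copied from the table. The `relative_change` helper and the variable names are illustrative and not part of the Monet codebase.

```python
# Rows copied from the RealToxicityPrompts table.
# Each entry: (masking threshold, exp. max. toxicity on toxic prompts, avg. perf.)
baseline = (None, 0.795, 0.478)  # no experts masked
masked = [
    (0.2, 0.767, 0.479),
    (0.1, 0.657, 0.478),
    (0.05, 0.552, 0.467),
]


def relative_change(new, old):
    """Signed relative change of `new` with respect to `old`."""
    return (new - old) / old


for threshold, toxicity, perf in masked:
    tox_delta = relative_change(toxicity, baseline[1])
    perf_delta = relative_change(perf, baseline[2])
    print(f"threshold={threshold}: toxicity {tox_delta:+.1%}, avg. perf. {perf_delta:+.1%}")
```

At the lowest threshold (0.05), expected maximum toxicity on toxic prompts falls by roughly 31% relative to the baseline, while average performance falls by roughly 2%.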

#### ToxiGen

| Masking Threshold | Masking Ratio | RoBERTa Score (Hate) | RoBERTa Score (Neutral) | Avg. Perf. |
|---|---|---|---|---|
| – | – | 0.642 | 0.035 | 0.478 |
| 0.2 | 1.4% | 0.643 | 0.033 | 0.478 |
| 0.1 | 5.4% | 0.504 | 0.028 | 0.473 |
| 0.05 | 15.0% | 0.430 | 0.027 | 0.455 |

## Examples

### Text Generation

```python
import torch
from transformers import AutoTokenizer, pipeline

model_name = "MonetLLM/monet-vd-1.4B-100BT-hf"
pipe = pipeline(
    "text-generation",
    model_name,
    tokenizer=AutoTokenizer.from_pretrained(model_name),
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,  # Monet ships custom modeling code
)
print(pipe("The key to life is", max_new_tokens=20, do_sample=True)[0]["generated_text"])
```

### Code Generation

```python
import torch
from transformers import AutoTokenizer, pipeline

model_name = "MonetLLM/codemonet-vd-1.4B-100BT-hf"
pipe = pipeline(
    "text-generation",
    model_name,
    tokenizer=AutoTokenizer.from_pretrained(model_name),
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

text = '''
def print_len(x: str):
    """For a given string x, print the length of x."""
'''
# Keep only the first completed block of the generated code
print(pipe(text, max_new_tokens=10)[0]["generated_text"].split("\n\n")[0])
```

### Chat Completion

```python
import torch
from transformers import AutoTokenizer, pipeline

model_name = "MonetLLM/monet-vd-1.4B-100BT-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name)
pipe = pipeline(
    "text-generation",
    model_name,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# Render the conversation with the model's chat template before generation
text = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Hi! How are you?"}],
    add_generation_prompt=True,
    tokenize=False,
)
print(pipe(text, max_new_tokens=30, do_sample=True)[0]["generated_text"])
```

### Using vLLM

A custom vLLM implementation of Monet is provided in [the repository](https://github.com/dmis-lab/Monet/blob/main/modeling_monet_vllm.py).
```python
from vllm import LLM, ModelRegistry, SamplingParams
from modeling_monet_vllm import MonetForCausalLM

# Register Monet architecture with vLLM
ModelRegistry.register_model("MonetForCausalLM", MonetForCausalLM)

model = LLM(
    "MonetLLM/monet-vd-1.4B-100BT-hf",
    trust_remote_code=True,
    dtype="bfloat16",
    gpu_memory_utilization=0.8,
)
sampling_params = SamplingParams(max_tokens=20, temperature=1.0)
print(model.generate("The key to life is", sampling_params)[0].outputs[0].text)
```

## Training

### Model

- Architecture: Monet
- Pretraining tokens: 100B
- Precision: bfloat16

### Hardware

- TPUs: TPU-v4-64 Pod Slice (supported by [TRC Program](https://sites.research.google/trc/about/))

### Software

- Training Framework: [JAX](https://github.com/jax-ml/jax), [Flax](https://github.com/google/flax)

## Intended Use

### Primary Intended Uses

This model is designed to advance research on language models and serve as a foundational component for generative AI-driven functionalities. Its primary applications, mostly in English, include:

- Mechanistic interpretability research for language models
- Text generation with enhanced interpretability
- Code generation (CodeMonet variant)
- Chat completion (instruction-tuned variant)
- Vision-language tasks (VisionMonet variant)

### Out-of-Scope Uses

This model has not been explicitly developed or tested for all potential downstream applications. Therefore:

1. Limitations & Mitigations: Developers should be mindful of common language model limitations, and thoroughly evaluate and mitigate risks regarding accuracy, safety, and fairness, especially in high-stakes or high-risk scenarios.
2. Legal & Regulatory Compliance: Developers must comply with any applicable laws and regulations (e.g., privacy, trade compliance), taking into account the model’s English-focused training (refer to FineWeb-Edu).
3. No License Modification: Nothing in this Model Card modifies or restricts the license under which this model is released.
4. Unsupported Programming Languages: Programming in languages not covered by StarCoderData (CodeMonet variant) is not within the model’s intended scope.

## Model Architecture

Monet introduces a novel Mixture-of-Experts (MoE) architecture with several key innovations:

- Parameter-efficient expert decomposition: overall parameter count grows in proportion to the square root of the number of experts
- Fine-grained expert specialization: offers clear insight into model behavior
- Precise manipulation of knowledge: enables control over domain knowledge, programming language capabilities, and toxicity level

## Ethical Considerations

### Transparency

- Designed specifically for enhanced interpretability
- Enables understanding of internal model behavior
- Allows tracking of knowledge attribution

### Control

- Supports toxicity mitigation
- Enables domain-specific knowledge control
- Maintains performance while adjusting behavior

## License and Usage

Monet is licensed under the Apache 2.0 license. The model is primarily intended for research and educational use. Important licensing notes:

- Instruction-tuned models have been fine-tuned using a dataset mix with outputs generated from third-party models
- Research and educational use is encouraged
- Commercial use is subject to Apache 2.0 license terms

## Citation

```bibtex
@article{park2024monet,
  title={{Monet: Mixture of Monosemantic Experts for Transformers}},
  author={Jungwoo Park and Young Jin Ahn and Kee-Eung Kim and Jaewoo Kang},
  journal={arXiv preprint arXiv:2412.04139},
  year={2024}
}
```
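
As a back-of-the-envelope illustration of the square-root scaling noted under Model Architecture, the sketch below takes the stated proportionality at face value and ignores constant factors and all non-expert parameters. The `relative_param_cost` helper is hypothetical and only illustrates the arithmetic. It does not describe how Monet actually lays out its expert parameters.

```python
import math

NUM_EXPERTS = 262_144  # expert count stated in the Model Summary


def relative_param_cost(num_experts: int, reference: int = NUM_EXPERTS) -> float:
    """Expert parameter cost relative to `reference`,
    assuming cost is proportional to sqrt(num_experts)."""
    return math.sqrt(num_experts / reference)


# 262,144 is a perfect square: 512 * 512
print(math.isqrt(NUM_EXPERTS))

# Under the square-root assumption, 4x the experts costs 2x the parameters
print(relative_param_cost(4 * NUM_EXPERTS))
```

Under this assumption, quadrupling the number of experts only doubles the expert parameter count, whereas a layout that stores each expert separately would quadruple it.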