Introduction

We're excited to introduce our latest model, the Lite Oute 2 Mamba2Attn 250M.
This is our third-generation model, featuring the new Mamba2 architecture with attention layers.
For more technical detail on the training process, architecture, and performance: Read the full blog post here
This is a base pre-trained model, not an instruction-tuned model for direct interaction. It is designed as a starting point for further fine-tuning on specific tasks or downstream datasets.
Developers and researchers can use it as a foundation and adapt it to their own applications through additional training on task-specific data.

Model Variants

Training Details

The model was pre-trained on 30 billion tokens using a balanced mixture of datasets:

  • 50% dclm-baseline-1.0
  • 50% fineweb-edu
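
As an illustration of what a balanced 50/50 mixture means in practice, here is a minimal sketch that interleaves two document streams with equal sampling probability. It is not the data pipeline used for this model; the function name `mix_streams`, the seed, and the placeholder streams are hypothetical and chosen only for the example.

```python
import itertools
import random
from typing import Iterable, Iterator, Sequence


def mix_streams(
    streams: Sequence[Iterable[str]],
    weights: Sequence[float],
    seed: int = 0,
) -> Iterator[str]:
    """Yield items by picking a source stream at random according to `weights`.

    Hypothetical helper: stops as soon as the chosen stream is exhausted.
    """
    rng = random.Random(seed)
    iterators = [iter(s) for s in streams]
    while True:
        idx = rng.choices(range(len(iterators)), weights=weights, k=1)[0]
        try:
            yield next(iterators[idx])
        except StopIteration:
            return


# Placeholder streams standing in for the two datasets named above.
dclm = itertools.repeat("dclm-baseline-1.0")
fineweb = itertools.repeat("fineweb-edu")

sample = list(itertools.islice(mix_streams([dclm, fineweb], [0.5, 0.5]), 10_000))
dclm_fraction = sample.count("dclm-baseline-1.0") / len(sample)
print(f"dclm share in sample: {dclm_fraction:.3f}")
```

With equal weights, each source contributes roughly half of the sampled documents; over the full 30 billion tokens that would correspond to about 15 billion tokens per dataset.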

Base model training was conducted on single GPUs (NVIDIA 4090 and NVIDIA H100), with the following key parameters:

  • Max learning rate: 4e-4
  • Min learning rate: 1e-4
  • Block size: 4096
  • Tokens per batch: ~100k
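
To put these numbers in perspective, the sketch below works out the approximate number of optimizer steps and sequences per batch implied by the figures above. It treats "~100k" as exactly 100,000 tokens, so the results are rough. The cosine decay function is purely an assumption for illustration: the text states only the maximum and minimum learning rates, not the shape of the schedule.

```python
import math

TOTAL_TOKENS = 30_000_000_000  # 30 billion training tokens (from the text)
TOKENS_PER_BATCH = 100_000     # "~100k tokens", treated as exact here
BLOCK_SIZE = 4096              # tokens per sequence
MAX_LR = 4e-4
MIN_LR = 1e-4

# Approximate number of optimizer steps and sequences per batch.
total_steps = TOTAL_TOKENS // TOKENS_PER_BATCH
sequences_per_batch = TOKENS_PER_BATCH / BLOCK_SIZE


def cosine_lr(step: int, total: int = total_steps) -> float:
    """Hypothetical cosine decay from MAX_LR to MIN_LR (schedule shape assumed)."""
    progress = min(max(step / total, 0.0), 1.0)
    return MIN_LR + 0.5 * (MAX_LR - MIN_LR) * (1.0 + math.cos(math.pi * progress))


print(f"steps: {total_steps}, sequences per batch: {sequences_per_batch:.1f}")
print(f"lr at start: {cosine_lr(0):.1e}, lr at end: {cosine_lr(total_steps):.1e}")
```

Under these assumptions, training takes on the order of 300,000 steps with about 24 sequences of 4096 tokens per batch.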

Benchmark Results

Results for Lite-Oute-2-Mamba2Attn-250M-Base:

Benchmark     0-shot    5-shot
ARC-C         26.88     27.22
ARC-E         53.54     55.51
HellaSwag     38.00     38.17
MMLU          24.87     25.59
OpenBookQA    30.20     30.40
PIQA          66.27     66.59
WinoGrande    52.01     52.49

Let's look at a real-world example

Input:

<s>Scientists have made a breakthrough in renewable energy by developing a new type of

Output:

solar cell that can convert sunlight directly into electricity.
The researchers found that the material, called a carbon nanotube (CNT) solar cell, is able to capture and store light energy from the sun's rays and converts it into electricity using a process known as photoelectrochemistry.
This process involves passing electrons through a layer of semiconducting materials like silicon or germanium, which are then connected together via cables. As the electrons pass through these layers, they cause them to become excited and move towards anode, where they are oxidized and released their stored energy.
In this process, the CNT solar cells generate a tiny amount of power, but when the current flows through the circuit, it produces more than enough electricity to run electronic devices such as computers, lights, and other electronic devices.

Usage with Hugging Face Transformers

Model weights were converted from the original Mamba2 implementation to be Hugging Face compatible.
Because Hugging Face Transformers does not yet officially support Mamba2 with attention layers, custom modeling files are included.
The implementation of Mamba2 with attention in the modeling files comes from Pull Request #32027 in the Hugging Face Transformers repository: https://github.com/huggingface/transformers/pull/32027

To speed up inference, we recommend installing mamba-ssm and flash attention 2.

mamba-ssm:

pip install "causal-conv1d>=1.4.0"
pip install mamba-ssm

flash attention 2:

pip install flash-attn --no-build-isolation

Example usage:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Use the GPU when one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model_id = "OuteAI/Lite-Oute-2-Mamba2Attn-250M-Base"

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    # To allow custom modeling files
    trust_remote_code=True,

    # If you have installed flash attention 2
    # attn_implementation="flash_attention_2",
    # torch_dtype=torch.bfloat16,
)
model.to(device)
tokenizer = AutoTokenizer.from_pretrained(model_id)

def generate_response(message: str, temperature: float = 0.2, repetition_penalty: float = 1.12) -> str:
    # Tokenize the prompt into a tensor of input IDs
    input_ids = tokenizer.encode(
        message, return_tensors="pt"
    ).to(device)
    # Generate a continuation; max_length counts prompt tokens plus new tokens
    output = model.generate(
        input_ids,
        max_length=256,
        temperature=temperature,
        repetition_penalty=repetition_penalty,
        do_sample=True,
    )
    # Decode the generated token IDs back to text
    generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
    return generated_text

message = "Scientists have made a breakthrough in renewable energy by developing a new type of"
response = generate_response(message)
print(response)
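
The commented-out options in the loading code above apply only when flash attention 2 is installed. As a minimal sketch, the optional arguments could be selected automatically by checking which packages are importable. The helper names `has_module` and `build_load_kwargs` are hypothetical; the module names `flash_attn`, `mamba_ssm`, and `causal_conv1d` are the import names of the packages listed earlier.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(name) is not None


def build_load_kwargs() -> dict:
    """Hypothetical helper: build keyword arguments for from_pretrained."""
    kwargs = {"trust_remote_code": True}  # required for the custom modeling files
    if has_module("flash_attn") and has_module("torch"):
        import torch

        # Flash attention 2 needs a CUDA device and half-precision weights.
        if torch.cuda.is_available():
            kwargs["attn_implementation"] = "flash_attention_2"
            kwargs["torch_dtype"] = torch.bfloat16
    return kwargs


# Report which optional speed-up packages are present.
for package in ("mamba_ssm", "causal_conv1d", "flash_attn"):
    print(f"{package}: {'found' if has_module(package) else 'not installed'}")

load_kwargs = build_load_kwargs()
print(load_kwargs)
```

The resulting dictionary could then be passed as `AutoModelForCausalLM.from_pretrained(model_name, **load_kwargs)`, so the same script runs with or without the optional packages.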

Disclaimer

By using this model, you acknowledge that you understand and assume the risks associated with its use. You are solely responsible for ensuring compliance with all applicable laws and regulations. We disclaim any liability for problems arising from the use of this open-source model, including but not limited to direct, indirect, incidental, consequential, or punitive damages. We make no warranties, express or implied, regarding the model's performance, accuracy, or fitness for a particular purpose. Your use of this model is at your own risk, and you agree to hold harmless and indemnify us, our affiliates, and our contributors from any claims, damages, or expenses arising from your use of the model.
