Model Card for C4AI Command R7B 4bit

Model Summary

C4AI Command R7B is an open weights research release of a 7B billion parameter model with advanced capabilities optimized for a variety of use cases including reasoning, summarization, question answering, and code. The model is trained to perform sophisticated tasks including Retrieval Augmented Generation (RAG) and tool use. The model also has powerful agentic capabilities with the ability to use and combine multiple tools over multiple steps to accomplish more difficult tasks. It obtains top performance on enterprise relevant code use cases. C4AI Command R7B is a multilingual model trained on 23 languages.

Developed by: Cohere and Cohere For AI

Cohere2ForCausalLM(
  (model): Cohere2Model(
    (embed_tokens): Embedding(256000, 4096, padding_idx=0)
    (layers): ModuleList(
      (0-31): 32 x Cohere2DecoderLayer(
        (self_attn): Cohere2Attention(
          (q_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
          (k_proj): Linear4bit(in_features=4096, out_features=1024, bias=False)
          (v_proj): Linear4bit(in_features=4096, out_features=1024, bias=False)
          (o_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
        )
        (mlp): Cohere2MLP(
          (gate_proj): Linear4bit(in_features=4096, out_features=14336, bias=False)
          (up_proj): Linear4bit(in_features=4096, out_features=14336, bias=False)
          (down_proj): Linear4bit(in_features=14336, out_features=4096, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): Cohere2LayerNorm()
      )
    )
    (norm): Cohere2LayerNorm()
    (rotary_emb): Cohere2RotaryEmbedding()
  )
  (lm_head): Linear(in_features=4096, out_features=256000, bias=False)
  (_cache): HybridCache()
)

Try C4AI Command R7B

You can try out C4AI Command R7B before downloading the weights in our hosted Hugging Face Space.

Usage

Please install transformers from the source repository that includes the necessary changes for this model.

# !pip install -U "git+https://github.com/huggingface/transformers.git" bitsandbytes accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

# Configuration de la quantification
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype="float16",
    bnb_4bit_use_double_quant=True
)

# ID du modèle
model_id = "CohereForAI/c4ai-command-r7b-12-2024"

# Chargement du tokenizer et du modèle
tokenizer = AutoTokenizer.from_pretrained(model_id, token="")
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config, token="")

# Format message with the c4ai-command-r7b-12-2024 chat template
messages = [{"role": "user", "content": "Hello, how are you?"}]
input_ids = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt")

gen_tokens = model.generate(
    input_ids,
    max_new_tokens=2048,
    do_sample=True,
    temperature=0.9,
)

gen_text = tokenizer.decode(gen_tokens[0], skip_special_tokens=True)
print(gen_text)

Model Details

Input: Models input text only.

Output: Models generate text only.

Model Architecture: This is an auto-regressive language model that uses an optimized transformer architecture. After pretraining, this model uses supervised fine-tuning (SFT) and preference training to align model behavior to human preferences for helpfulness and safety. The model features three layers with sliding window attention (window size 4096) and ROPE for efficient local context modeling and relative positional encoding. A fourth layer uses global attention without positional embeddings, enabling unrestricted token interactions across the entire sequence.

Languages covered: The model has been trained on 23 languages: English, French, Spanish, Italian, German, Portuguese, Japanese, Korean, Arabic, Chinese, Russian, Polish, Turkish, Vietnamese, Dutch, Czech, Indonesian, Ukrainian, Romanian, Greek, Hindi, Hebrew, and Persian.

Context length: Command R7B supports a context length of 128K.

A well-rounded model

Command R7B excels on standardized and externally verifiable benchmarks such as the HuggingFace Open LLM Leaderboard. Compared to other similarly sized open-weights models, Command R7B ranks first with strong performance across all tasks.

Command R7B Gemma 2 IT 9B Ministral 8B Llama 3.1 8B
Average 31.4 28.9 22 28.2
IFEval 77.9 74.4 58.96 78.6
BBH 36.1 42.1 25.82 29.9
MATH hard 26.4 0.2 6.5 19.3
GPQA 7.7 14.8 4.5 2.4
MUSR 11.6 9.74 10.7 8.41
MMLU-Pro 28.5 32 25.5 30.7

HuggingFace Leaderboard evaluation results. Competitor numbers are taken from the official leaderboard. Command R7B results are calculated by us using the official HuggingFace prompts and evaluation code.

Chat Capabilities:

Command R7B can be configured as both a conversational model and an instruct model. The conversational mode conditions the model on interactive behaviour, meaning it is expected to reply in a conversational fashion, provides introductory statements and follow-up questions, and uses Markdown as well as LaTeX where appropriate. It is optimized for interactive experiences, such as chatbots, where the model engages in dialogue.

The instruct mode, in contrast, conditions the model to provide concise yet comprehensive responses, and does not use Markdown / LaTeX by default. It is designed for non-interactive, task-focused use cases like extracting information, summarizing text, translation, and categorization.

Note: by default, Command R7B is delivered without a system preamble. We recommend to add the conversational or instruct preambles as described in our docs.

RAG Capabilities:

Command R7B has been trained specifically for tasks like the final step of Retrieval Augmented Generation (RAG).

RAG with Command R7B is supported through chat templates in Transformers. The model takes a conversation as input (with an optional user-supplied system preamble), along with a list of document snippets.

RAG Example [CLICK TO EXPAND]
# Define conversation input
conversation = [{"role": "user", "content": "What has Man always dreamed of?"}]

# Define documents for retrieval-based generation
documents = [
  {"heading": "The Moon: Our Age-Old Foe", "body": "Man has always dreamed of destroying the moon. In this essay, I shall..."},
  {"heading": "Love is all you need", "body": "Man's dream has always been to find love. This profound lesson..."}
]

# Get the RAG prompt
input_prompt = tokenizer.apply_chat_template(conversation=conversation, documents=documents, tokenize=False, add_generation_prompt=True, return_tensors="pt")
# Tokenize the prompt
input_ids = tokenizer.encode_plus(input_prompt, return_tensors="pt")

You can then generate text from this input as normal.

Document snippets should be short chunks, rather than long documents, typically around 100-400 words per chunk, formatted as key-value pairs. The keys should be short descriptive strings, the values can be text or semi-structured.

You may find that simply including relevant documents directly in a user message works just as well, or better than using the documents parameter to render the special RAG template. The RAG template is generally a strong default. We encourage users to play with both, and to evaluate which mode works best for their specific use case.

Note that this was a very brief introduction to RAG - for more information, see the Command R7B prompt format docs and the Transformers RAG documentation.

Tool Use Capabilities:

Command R7B has been specifically trained with conversational tool use capabilities. This allows the model to interact with external tools like APIs, databases, or search engines. Instructions on how to leverage these capabilities in Hugging Face are coming soon.

Code Capabilities:

Command R7B has meaningfully improved on code capabilities. In addition to academic code benchmarks, we have evaluated it on enterprise-relevant scenarios, including SQL and code translation, where it outperforms other models of similar size. Try these out by requesting code snippets, code explanations, or code rewrites. For better performance, we also recommend using a low temperature (and even greedy decoding) for code-generation related instructions.

Model Card Contact

For errors or additional questions about details in this model card, contact [email protected].

Terms of Use:

We hope that the release of this model will make community-based research efforts more accessible, by releasing the weights of a highly performant 7 billion parameter model to researchers all over the world. This model is governed by a CC-BY-NC License with an acceptable use addendum, and also requires adhering to C4AI's Acceptable Use Policy.

Try Chat:

You can try Command R7B chat in the playground here. You can also use it in our dedicated Hugging Face Space here.

Downloads last month
315
Safetensors
Model size
4.65B params
Tensor type
F32
·
FP16
·
U8
·
Inference Examples
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.

Model tree for Svngoku/c4ai-command-r7b-12-2024-4bit

Quantized
(7)
this model

Space using Svngoku/c4ai-command-r7b-12-2024-4bit 1