NanoTranslator-XS


Introduction

This is the extra-small (XS) model of the NanoTranslator family. It currently supports English-to-Chinese translation only.

The ONNX version of the model is also available in the repository.

All models are collected in the NanoTranslator Collection.

Size  P.   Arch.  Act.    V.   H.    I.    L.  A.H.  K.H.  Tie
XXL2  102  LLaMA  SwiGLU  16K  1120  3072  6   16    8     True
XXL   100  LLaMA  SwiGLU  16K  768   4096  8   24    8     True
XL    78   LLaMA  GeGLU   16K  768   4096  6   24    8     True
L     49   LLaMA  GeGLU   16K  512   2816  8   16    8     True
M2    22   Qwen2  GeGLU   4K   432   2304  6   24    8     True
M     22   LLaMA  SwiGLU  8K   256   1408  16  16    4     True
S     9    LLaMA  SwiGLU  4K   168   896   16  12    4     True
XS    2    LLaMA  SwiGLU  2K   96    512   12  12    4     True

  • P. - parameters (in millions)
  • Arch. - architecture
  • Act. - activation function
  • V. - vocab size
  • H. - hidden size
  • I. - intermediate size
  • L. - num layers
  • A.H. - num attention heads
  • K.H. - num kv heads
  • Tie - tie word embeddings
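
As a rough sanity check on the table, the parameter count can be estimated from the listed hyperparameters. The sketch below is not taken from the repository: it assumes a standard LLaMA-style decoder layout (bias-free projections, a gated MLP with three weight matrices, two norm weights per layer plus one final norm) and reads "2K" as 2,000 vocabulary entries. The function name `estimate_llama_params` is hypothetical.

```python
def estimate_llama_params(vocab, hidden, intermediate, layers,
                          heads, kv_heads, tie_embeddings=True):
    """Approximate parameter count of a LLaMA-style decoder (assumed layout)."""
    head_dim = hidden // heads
    kv_dim = kv_heads * head_dim                            # grouped-query K/V width
    attention = 2 * hidden * hidden + 2 * hidden * kv_dim   # q and o, plus k and v
    mlp = 3 * hidden * intermediate                         # gate, up and down projections
    norms = 2 * hidden                                      # two norm weights per layer
    per_layer = attention + mlp + norms
    embeddings = vocab * hidden
    lm_head = 0 if tie_embeddings else vocab * hidden       # tied head adds no weights
    return embeddings + layers * per_layer + hidden + lm_head  # "+ hidden" is the final norm

# XS row of the table, with "2K" assumed to mean 2,000 tokens.
xs_params = estimate_llama_params(vocab=2000, hidden=96, intermediate=512,
                                  layers=12, heads=12, kv_heads=4)
print(f"{xs_params / 1e6:.2f}M")  # prints "2.26M"
```

Under these assumptions the estimate lands at about 2.26 million, which rounds to the 2 shown in the XS row.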

How to use

The prompt format is as follows:

<|im_start|> {English Text} <|endoftext|>
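
A minimal sketch of building such a prompt in Python. The helper name `build_prompt` is hypothetical. It follows the concatenation used in the transformers example, so it inserts no spaces between the special tokens and the text.

```python
def build_prompt(english_text: str) -> str:
    """Wrap English source text in the special tokens the model expects."""
    return "<|im_start|>" + english_text + "<|endoftext|>"

prompt = build_prompt("I love to watch my favorite TV series.")
print(prompt)
```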

Directly using transformers

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_path = 'Mxode/NanoTranslator-XS'

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)

def translate(text: str, model, **kwargs):
    # Generation defaults; keyword arguments passed by the caller override them.
    generation_args = dict(
        max_new_tokens = kwargs.pop("max_new_tokens", 512),
        do_sample = kwargs.pop("do_sample", True),
        temperature = kwargs.pop("temperature", 0.55),
        top_p = kwargs.pop("top_p", 0.8),
        top_k = kwargs.pop("top_k", 40),
        **kwargs
    )

    # Wrap the source text in the prompt format the model expects.
    prompt = "<|im_start|>" + text + "<|endoftext|>"
    # Note: this uses the module-level `tokenizer` loaded above.
    model_inputs = tokenizer([prompt], return_tensors="pt").to(model.device)

    generated_ids = model.generate(model_inputs.input_ids, **generation_args)
    # Drop the prompt tokens so that only the newly generated tokens are decoded.
    generated_ids = [
        output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
    ]

    response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
    return response

text = "I love to watch my favorite TV series."

response = translate(text, model, max_new_tokens=64, do_sample=False)
print(response)

ONNX

Measurements show that inference with ONNX models is 2-10 times faster than inference directly with transformers models.

You need to switch to the onnx branch manually and download the model files to a local folder.
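
One possible way to fetch that branch from a script is `snapshot_download` from the `huggingface_hub` package, sketched below. It assumes the branch is literally named `onnx` and that `huggingface_hub` is installed. The wrapper name `download_onnx_branch` is hypothetical, and the target folder reuses the placeholder path from the examples in this section.

```python
def download_onnx_branch(local_dir: str = "your/folder/to/onnx_model") -> str:
    """Download the `onnx` branch of the model repo and return the local path."""
    # Imported lazily so the function can be defined without the dependency installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id="Mxode/NanoTranslator-XS",
        revision="onnx",       # assumed branch name, taken from the text above
        local_dir=local_dir,   # placeholder folder; change it to your own path
    )

# Usage (requires network access):
# model_path = download_onnx_branch()
```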

reference docs:

Using ORTModelForCausalLM

from optimum.onnxruntime import ORTModelForCausalLM
from transformers import AutoTokenizer

model_path = "your/folder/to/onnx_model"

ort_model = ORTModelForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

text = "I love to watch my favorite TV series."

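# translate() is the helper defined in the transformers example above; it uses the tokenizer loaded here.
response = translate(text, ort_model, max_new_tokens=64, do_sample=False)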
print(response)
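
To check the speedup on your own hardware, a small timing helper along these lines may be enough. It is only a sketch: `mean_latency` is a hypothetical name, and the commented usage assumes that `translate`, `model`, `ort_model` and `text` from the examples above are all loaded in the same session.

```python
import time

def mean_latency(fn, runs: int = 5, warmup: int = 1) -> float:
    """Return the mean wall-clock seconds per call of fn()."""
    for _ in range(warmup):          # warm-up calls are not timed
        fn()
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) / runs

# Usage, assuming the objects from the examples above are available:
# torch_s = mean_latency(lambda: translate(text, model, max_new_tokens=64, do_sample=False))
# onnx_s = mean_latency(lambda: translate(text, ort_model, max_new_tokens=64, do_sample=False))
# print(f"speedup: {torch_s / onnx_s:.1f}x")
```

Greedy decoding (`do_sample=False`) keeps the generated length stable across runs, which makes the two timings easier to compare.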

Using pipeline

from optimum.pipelines import pipeline

model_path = "your/folder/to/onnx_model"
pipe = pipeline("text-generation", model=model_path, accelerator="ort")

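text = "I love to watch my favorite TV series."
# Apply the prompt format described above before calling the pipeline.
prompt = "<|im_start|>" + text + "<|endoftext|>"

response = pipe(prompt, max_new_tokens=64, do_sample=False)
print(response)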

Model size: 2.26M params (Safetensors, tensor type F32)