Qwen2.5 32B for Japanese to English Light Novel translation

This model was fine-tuned on light novels and web novels for Japanese to English translation.

It can translate entire chapters (up to 32K tokens total for input and output).
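Because the limit covers input and output together, the room left for the translation shrinks as the chapter grows. A minimal sketch of that arithmetic follows. It assumes "32K" means 32,768 tokens (the card does not give the exact figure), and `output_budget` is a hypothetical helper name, not part of the model's tooling.

```python
# Assumption: "32K tokens" is read as 32,768; adjust if the real limit differs.
CONTEXT_LIMIT = 32_768


def output_budget(input_tokens: int, context_limit: int = CONTEXT_LIMIT) -> int:
    """Return how many tokens remain for the translation after the prompt."""
    if input_tokens < 0:
        raise ValueError("input_tokens must be non-negative")
    # Input and output share one window, so the output gets what is left over.
    return max(context_limit - input_tokens, 0)


print(output_budget(12_000))  # 20768
```

The result could be passed as the generation length cap. The input count should come from tokenizing the full prompt, system message included.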

This repository contains the adapter trained on top of Qwen2.5-32B-Instruct. For running the model, the GGUF version is recommended.

Usage

from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

model_name = "thefrigidliquidation/lightnovel-translate-Qwen2.5-32B"
# Loads the base model named in the adapter config and applies the adapter to it.
model = AutoPeftModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

Prompt format

<|im_start|>system
Translate this text from Japanese to English.<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant

Example:

<|im_start|>system
Translate this text from Japanese to English.<|im_end|>
<|im_start|>user
<GLOSSARY>
マイン : Myne
</GLOSSARY>
マイン、ルッツが迎えに来たよ<|im_end|>
<|im_start|>assistant
Myne, Lutz is here to take you home.

The glossary is optional. Remove it if not needed.
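The prompt format above can be assembled and sent to the model roughly as sketched here. `build_prompt` and `translate` are hypothetical helper names; `model` and `tokenizer` are assumed to be the objects loaded in the Usage section, and greedy decoding with a 2048-token cap is an arbitrary choice for illustration, not a setting the card prescribes.

```python
SYSTEM_PROMPT = "Translate this text from Japanese to English."


def build_prompt(user_text: str) -> str:
    """Wrap the (optionally glossary-prefixed) text in the documented format."""
    return (
        "<|im_start|>system\n"
        f"{SYSTEM_PROMPT}<|im_end|>\n"
        "<|im_start|>user\n"
        f"{user_text}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


def translate(model, tokenizer, user_text: str, max_new_tokens: int = 2048) -> str:
    """Generate a translation; decoding settings here are assumptions."""
    prompt = build_prompt(user_text)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(
        **inputs, max_new_tokens=max_new_tokens, do_sample=False
    )
    # Drop the prompt tokens so only the newly generated text is decoded.
    prompt_len = inputs["input_ids"].shape[1]
    new_tokens = output_ids[0][prompt_len:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
```

Building the string by hand keeps the prompt identical to the format shown above, rather than relying on whatever chat template ships with the tokenizer.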

Text preprocessing

The Japanese text must be preprocessed with the following clean_string function, which replaces some Unicode characters with ASCII equivalents. Skipping this step may cause issues.

import ftfy

FTFY_ADDITIONAL_MAP = {
    "—": "--",
    "–": "-",
    "⸻": "----",
    "«": "\"",
    "»": "\"",
    "〝": "\"",
    "〟": "\"",
    "✧": "*",
    "✽": "*",
    "⬤": "*",
    "⭘": "*",
    "∴": "*",
    "∵": "*",
    "✩": "*",
    "【": "[",
    "】": "]",
    "「": "[",
    "」": "]",
    "〖": "[",
    "〗": "]",
    "〈": "<",
    "〉": ">",
    "《": "<<",
    "》": ">>",
}

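def clean_string(text: str, strip: bool = True) -> str:
    # Repair text with ftfy and normalize it to NFC.
    config = ftfy.TextFixerConfig(normalization="NFC")
    s = ftfy.fix_text(text, config=config)
    # Trim each line: both ends if strip is set, otherwise trailing whitespace only.
    s = "\n".join((x.strip() if strip else x.rstrip()) for x in s.splitlines())
    # Apply the additional replacements from FTFY_ADDITIONAL_MAP.
    for b, g in FTFY_ADDITIONAL_MAP.items():
        s = s.replace(b, g)
    return s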
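To show what the last two steps of clean_string do to a typical line, here is a standard-library-only sketch. It deliberately skips the ftfy repair step, so it is not a substitute for clean_string; `SAMPLE_MAP` is a small subset of the map above and `strip_and_replace` is a hypothetical name used only for this illustration.

```python
# Subset of FTFY_ADDITIONAL_MAP, copied here so the example stands alone.
SAMPLE_MAP = {
    "—": "--",
    "【": "[",
    "】": "]",
    "「": "[",
    "」": "]",
    "《": "<<",
    "》": ">>",
}


def strip_and_replace(text: str, mapping: dict[str, str]) -> str:
    """Mimic the last two steps of clean_string (no ftfy repair is done)."""
    # str.strip() also removes the full-width space (U+3000) that often
    # indents Japanese paragraphs.
    s = "\n".join(line.strip() for line in text.splitlines())
    for src, dst in mapping.items():
        s = s.replace(src, dst)
    return s


sample = "\u3000【告知】\n\u3000「おはよう」——《本》  "
print(strip_and_replace(sample, SAMPLE_MAP))
# [告知]
# [おはよう]----<<本>>
```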

Glossary

You can provide up to 30 custom translations for nouns and character names at runtime. Prefix the chapter with the glossary terms, one per line in the form Japanese term : English term, inside <GLOSSARY></GLOSSARY> tags.

For example, to have マイン translated as Myne, construct the input prompt as follows:

glossary = [
    {"ja": "マイン", "en": "Myne"},
]
chapter_text = "マイン、ルッツが迎えに来たよ"

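def make_glossary_str(glossary: list[dict[str, str]]) -> str:
    # No glossary: return an empty prefix so the prompt is just the chapter text.
    if glossary is None or len(glossary) == 0:
        return ""
    # Drop duplicate (ja, en) pairs while keeping the order they were given in.
    unique_glossary = dict.fromkeys((term['ja'], term['en']) for term in glossary)
    terms = "\n".join([f"{ja} : {en}" for ja, en in unique_glossary])
    return f"<GLOSSARY>\n{terms}\n</GLOSSARY>\n"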

user_prompt = f"{make_glossary_str(glossary)}{clean_string(chapter_text)}"

The resulting user prompt is:

<GLOSSARY>
マイン : Myne
</GLOSSARY>
マイン、ルッツが迎えに来たよ
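The 30-term limit can also be checked before the prompt is built. The sketch below is one possible policy, not the author's: it raises an error when more than 30 unique terms are given, since the card does not say how the model behaves beyond that. `build_user_prompt` is a hypothetical helper, and `text` is assumed to have been passed through clean_string already.

```python
MAX_GLOSSARY_TERMS = 30  # limit stated in the model card


def build_user_prompt(glossary: list[dict[str, str]], text: str) -> str:
    """Prefix text with a <GLOSSARY> block, enforcing the 30-term limit."""
    # Deduplicate (ja, en) pairs while keeping the order they were given in.
    pairs = list(dict.fromkeys((t["ja"], t["en"]) for t in glossary))
    if len(pairs) > MAX_GLOSSARY_TERMS:
        raise ValueError(
            f"{len(pairs)} glossary terms given, limit is {MAX_GLOSSARY_TERMS}"
        )
    if not pairs:
        # The glossary is optional: with no terms, send the text alone.
        return text
    terms = "\n".join(f"{ja} : {en}" for ja, en in pairs)
    return f"<GLOSSARY>\n{terms}\n</GLOSSARY>\n{text}"
```

Truncating to the first 30 terms instead of raising would be an equally reasonable choice, depending on whether silent omissions are acceptable.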
