Qwen2.5 32B for Japanese to English Light Novel translation

This model was fine-tuned on light novels and web novels for Japanese to English translation.

It can translate entire chapters (up to 32K tokens total for input and output).

This repository contains the adapter trained on Qwen2.5-32B-Instruct. The GGUF version is recommended for running the model.

Usage

from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

model_name = "thefrigidliquidation/lightnovel-translate-Qwen2.5-32B"
model = AutoPeftModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
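
A minimal generation sketch, assuming the tokenizer ships the ChatML chat template shown under Prompt format below; max_new_tokens is a placeholder to adjust for chapter length:

messages = [
    {"role": "system", "content": "Translate this text from Japanese to English."},
    {"role": "user", "content": "γƒžγ‚€γƒ³γ€γƒ«γƒƒγƒ„γŒθΏŽγˆγ«ζ₯γŸγ‚ˆ"},
]
# Render the chat template and tokenize the prompt.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
# Generate the translation and decode only the newly generated tokens.
output_ids = model.generate(input_ids, max_new_tokens=1024)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))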

Prompt format

<|im_start|>system
Translate this text from Japanese to English.<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant

Example:

<|im_start|>system
Translate this text from Japanese to English.<|im_end|>
<|im_start|>user
<GLOSSARY>
γƒžγ‚€γƒ³ : Myne
</GLOSSARY>
γƒžγ‚€γƒ³γ€γƒ«γƒƒγƒ„γŒθΏŽγˆγ«ζ₯γŸγ‚ˆ<|im_end|>
<|im_start|>assistant
Myne, Lutz is here to take you home.

The glossary is optional. Remove it if not needed.

Text preprocessing

The Japanese text must be preprocessed with the following clean_string function, which normalizes the text and replaces some Unicode punctuation with ASCII equivalents. Skipping this step may degrade translation quality.

import ftfy

FTFY_ADDITIONAL_MAP = {
    "β€”": "--",
    "–": "-",
    "βΈ»": "----",
    "Β«": "\"",
    "Β»": "\"",
    "〝": "\"",
    "γ€Ÿ": "\"",
    "✧": "*",
    "✽": "*",
    "⬀": "*",
    "⭘": "*",
    "∴": "*",
    "∡": "*",
    "✩": "*",
    "【": "[",
    "】": "]",
    "γ€Œ": "[",
    "」": "]",
    "γ€–": "[",
    "γ€—": "]",
    "γ€ˆ": "<",
    "〉": ">",
    "γ€Š": "<<",
    "》": ">>",
}

def clean_string(text: str, strip: bool = True) -> str:
    # Fix mojibake and normalize to NFC with ftfy.
    config = ftfy.TextFixerConfig(normalization="NFC")
    s = ftfy.fix_text(text, config=config)
    # Trim whitespace from every line (leading whitespace only when strip=True).
    s = "\n".join((x.strip() if strip else x.rstrip()) for x in s.splitlines())
    # Apply the additional replacements defined above.
    for b, g in FTFY_ADDITIONAL_MAP.items():
        s = s.replace(b, g)
    return s
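
For example, on a hypothetical input line where only the brackets and em dashes are affected, the output below assumes ftfy's default fixes plus the map above:

raw = "γ€Œγƒžγ‚€γƒ³γ€β€”β€”γƒ«γƒƒγƒ„γŒθΏŽγˆγ«ζ₯γŸγ‚ˆ"
print(clean_string(raw))
# Expected: [マむン]----ルッツがθΏŽγˆγ«ζ₯γŸγ‚ˆ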

Glossary

You can provide up to 30 custom translations for nouns and character names at runtime. Prefix your chapter with glossary terms, one per line in the format Japanese term : English term, inside <GLOSSARY></GLOSSARY> tags.

For example, if you wish to have γƒžγ‚€γƒ³ translated as Myne, you can construct the input prompt with:

glossary = [
    {"ja": "γƒžγ‚€γƒ³", "en": "Myne"},
]
chapter_text = "γƒžγ‚€γƒ³γ€γƒ«γƒƒγƒ„γŒθΏŽγˆγ«ζ₯γŸγ‚ˆ"

def make_glossary_str(glossary: list[dict[str, str]] | None) -> str:
    if glossary is None or len(glossary) == 0:
        return ""
    # Deduplicate (Japanese, English) pairs before formatting.
    unique_glossary = {(term['ja'], term['en']) for term in glossary}
    terms = "\n".join([f"{ja} : {en}" for ja, en in unique_glossary])
    return f"<GLOSSARY>\n{terms}\n</GLOSSARY>\n"

user_prompt = f"{make_glossary_str(glossary)}{clean_string(chapter_text)}"
This produces the following user prompt:

<GLOSSARY>
γƒžγ‚€γƒ³ : Myne
</GLOSSARY>
γƒžγ‚€γƒ³γ€γƒ«γƒƒγƒ„γŒθΏŽγˆγ«ζ₯γŸγ‚ˆ