Qwen2.5 32B for Japanese-to-English Light Novel Translation
This model was fine-tuned on light novels and web novels for Japanese-to-English translation.
It can translate entire chapters (up to 32K tokens total for input and output).
This repository contains the trained PEFT adapter for Qwen2.5-32B-Instruct. The GGUF version is recommended for running the model.
Usage
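from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer
model_name = "thefrigidliquidation/lightnovel-translate-Qwen2.5-32B"
# Loads the base model and applies the adapter on top of it.
# torch_dtype="auto" uses the dtype recorded in the model config instead of float32;
# device_map="auto" (requires accelerate) spreads the layers over the available GPUs/CPU.
model = AutoPeftModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)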
Prompt format
<|im_start|>system
Translate this text from Japanese to English.<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
Example:
<|im_start|>system
Translate this text from Japanese to English.<|im_end|>
<|im_start|>user
<GLOSSARY>
マイン : Myne
</GLOSSARY>
マイン、ルッツが迎えに来たよ<|im_end|>
<|im_start|>assistant
Myne, Lutz is here to take you home.
The glossary is optional. Remove it if not needed.
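The prompt format above can be assembled in plain Python. The sketch below uses a hypothetical helper, build_prompt, which is not part of the model repository. It wraps an already-cleaned user prompt (optional glossary plus chapter text) in the ChatML tokens shown, and it ends with an open assistant turn so that the model continues with the translation. With transformers, tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True) should produce an equivalent string, assuming the repository ships Qwen2.5's standard chat template.

```python
SYSTEM_PROMPT = "Translate this text from Japanese to English."


def build_prompt(user_prompt: str) -> str:
    """Hypothetical helper: wrap a pre-cleaned user prompt in the ChatML format.

    The trailing "<|im_start|>assistant\n" leaves the assistant turn open,
    so generation continues with the English translation.
    """
    return (
        f"<|im_start|>system\n{SYSTEM_PROMPT}<|im_end|>\n"
        f"<|im_start|>user\n{user_prompt}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


# Usage: the glossary example from this section.
prompt = build_prompt("<GLOSSARY>\nマイン : Myne\n</GLOSSARY>\nマイン、ルッツが迎えに来たよ")
```

The Example above corresponds to this prompt. The model's reply is whatever it generates after the final "<|im_start|>assistant" line.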
Text preprocessing
The Japanese text must be preprocessed with the following clean_string
function. It repairs the text with ftfy (applying NFC normalization), strips whitespace from every line,
and replaces some Unicode characters with ASCII equivalents. Skipping this step may cause issues.
import ftfy
FTFY_ADDITIONAL_MAP = {
"β": "--",
"β": "-",
"⸻": "----",
"«": "\"",
"»": "\"",
"γ": "\"",
"γ": "\"",
"β§": "*",
"β½": "*",
"⬀": "*",
"β": "*",
"β΄": "*",
"β΅": "*",
"β©": "*",
"γ": "[",
"γ": "]",
"γ": "[",
"γ": "]",
"γ": "[",
"γ": "]",
"γ": "<",
"γ": ">",
"γ": "<<",
"γ": ">>",
}
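def clean_string(text: str, strip: bool = True) -> str:
    # Fix broken Unicode with ftfy and apply NFC normalization.
    config = ftfy.TextFixerConfig(normalization="NFC")
    s = ftfy.fix_text(text, config=config)
    # Strip each line (or only trailing whitespace when strip=False).
    s = "\n".join((x.strip() if strip else x.rstrip()) for x in s.splitlines())
    # Replace the remaining characters listed in FTFY_ADDITIONAL_MAP.
    for b, g in FTFY_ADDITIONAL_MAP.items():
        s = s.replace(b, g)
    return s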
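If ftfy is unavailable, the non-ftfy steps of clean_string can be approximated with the standard library. The sketch below is an assumption and not a drop-in replacement. It substitutes unicodedata.normalize("NFC", ...) for ftfy.fix_text, so it performs the same NFC normalization but none of ftfy's other repairs, such as mojibake repair, quote uncurling, or character-width fixes. The helper name clean_string_stdlib and the two-entry EXAMPLE_MAP are hypothetical. Pass FTFY_ADDITIONAL_MAP as replacements in real use.

```python
import unicodedata
from typing import Optional

# Illustrative two-entry subset; in practice pass FTFY_ADDITIONAL_MAP instead.
EXAMPLE_MAP = {
    "«": "\"",
    "»": "\"",
}


def clean_string_stdlib(
    text: str,
    strip: bool = True,
    replacements: Optional[dict[str, str]] = None,
) -> str:
    """Hypothetical stdlib-only approximation of clean_string.

    Only NFC-normalizes; unlike ftfy.fix_text it does not repair mojibake,
    uncurl quotes, or fix character widths.
    """
    # NFC composes e.g. "か" + U+3099 (combining voiced mark) into "が".
    s = unicodedata.normalize("NFC", text)
    # Strip each line, or only trailing whitespace when strip=False.
    s = "\n".join((x.strip() if strip else x.rstrip()) for x in s.splitlines())
    table = EXAMPLE_MAP if replacements is None else replacements
    for bad, good in table.items():
        s = s.replace(bad, good)
    return s
```

Normalization matters for Japanese input because a kana can be encoded either as one code point or as a base character followed by a combining voiced mark. NFC maps both encodings to the same string.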
Glossary
You can provide up to 30 custom translations for nouns and character names at runtime.
Prefix your chapter with the glossary terms, one per line in the form "Japanese term : English term",
inside <GLOSSARY></GLOSSARY>
tags.
For example, to have マイン
translated as Myne,
you can construct the input prompt with:
glossary = [
    {"ja": "マイン", "en": "Myne"},
]
chapter_text = "マイン、ルッツが迎えに来たよ"
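def make_glossary_str(glossary: list[dict[str, str]]) -> str:
    # No glossary (None or an empty list): add nothing to the prompt.
    if not glossary:
        return ""
    # Drop duplicate (ja, en) pairs but keep the original order, so the prompt
    # is deterministic (iterating a set would emit terms in arbitrary order).
    unique_glossary = dict.fromkeys((term["ja"], term["en"]) for term in glossary)
    terms = "\n".join(f"{ja} : {en}" for ja, en in unique_glossary)
    return f"<GLOSSARY>\n{terms}\n</GLOSSARY>\n"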
user_prompt = f"{make_glossary_str(glossary)}{clean_string(chapter_text)}"
The resulting user_prompt is:
<GLOSSARY>
マイン : Myne
</GLOSSARY>
マイン、ルッツが迎えに来たよ
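The model accepts at most 30 glossary terms, and each term occupies one "Japanese term : English term" line. For that reason, it can help to check the glossary before building the prompt. The sketch below uses a hypothetical helper, validate_glossary, which is not part of the model repository. Its rejection of empty or multi-line terms is an assumption inferred from the one-term-per-line format, not a documented requirement of the model.

```python
MAX_GLOSSARY_TERMS = 30  # limit stated in this model card


def validate_glossary(
    glossary: list[dict[str, str]],
    max_terms: int = MAX_GLOSSARY_TERMS,
) -> list[dict[str, str]]:
    """Hypothetical helper: deduplicate and sanity-check glossary entries.

    Surrounding whitespace is stripped and duplicates are dropped (first-seen
    order is kept). Raises ValueError for empty or multi-line terms, or when
    more than max_terms unique entries remain.
    """
    unique: dict[tuple[str, str], None] = {}
    for term in glossary:
        ja, en = term["ja"].strip(), term["en"].strip()
        if not ja or not en:
            raise ValueError(f"empty glossary term: {term!r}")
        # Each entry must stay on a single "ja : en" line of the glossary block.
        if any(ch in ja + en for ch in "\r\n"):
            raise ValueError(f"glossary term must fit on one line: {term!r}")
        unique[(ja, en)] = None
    if len(unique) > max_terms:
        raise ValueError(
            f"{len(unique)} glossary terms given; at most {max_terms} are supported"
        )
    return [{"ja": ja, "en": en} for ja, en in unique]
```

The returned list can be passed directly to make_glossary_str.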
Model tree for thefrigidliquidation/lightnovel-translate-Qwen2.5-32B
Base model
Qwen/Qwen2.5-32B