---
license: mit
language:
- ja
- zh
pipeline_tag: translation
datasets:
- ayymen/Pontoon-Translations
widget:
- text: <-ja2zh-> フェルディナント・ラッサール \n は、プロイセンの政治学者、哲学者、法学者、社会主義者、労働運動指導者。ドイツ社会民主党の母体となる全ドイツ労働者同盟の創設者である。社会主義共和政の統一ドイツを目指しつつも、……
inference:
  parameters:
    repetition_penalty: 1.4
---
# new model: iryneko571/mt5-small-translation-ja_zh
Better on the numbers in most respects; it is closer to a base model, trained on purer data.
A Colab notebook is included and already configured, so you can try translation directly without installing anything locally; a rough sketch of such a cell follows.
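The notebook itself is not reproduced here; as a minimal sketch, a single Colab cell along these lines would be enough to try the model (the pipeline call and the `<-ja2zh->` tag follow the usage guide further down; everything else is an assumption, not the actual notebook contents):

```python
# Hypothetical one-cell Colab setup: install dependencies, build the pipeline, translate one line.
!pip install transformers sentencepiece

from transformers import pipeline

pipe = pipeline("translation",
                model="iryneko571/mt5-translation-ja_zh-game-small",
                repetition_penalty=1.4,
                max_length=256)

# Inputs need the <-ja2zh-> prefix, as in the widget example above.
print(pipe("<-ja2zh-> フェルディナント・ラッサールは、プロイセンの政治学者。")[0]["translation_text"])
```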
# Release Notes
* This model is finetuned from mt5-small.
* It uses about 1.5 GB of VRAM; loading in fp16 takes less than 1 GB (a larger batch size pushes it back above 1 GB), and CPU inference speed is acceptable (see the fp16 loading sketch after this list).
* Trained on a trimmed slice of the Pontoon dataset, using its ja-to-zh translation pairs.
* Also mixed in a scrambled batch of translations from mt5-translation-ja_zh-game-v0.1, which amounts to a large pile of junk training data.
* Reasons for making this model:
  * Testing the idea of using the Pontoon dataset.
  * Building a flexible translation evaluation standard, which needs a poorly performing model as a comparison baseline.
  * Trying out finetuning an existing model; small models train remarkably fast.
* Limitations:
  * This model exists purely for testing; although its VRAM usage is very low, its translation quality is very poor.
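A minimal sketch of the fp16 loading mentioned in the list above, assuming a CUDA GPU is available; the repo id is taken from this card's title and the generation settings from the usage example below, while the rest is illustrative rather than the card's own code:

```python
# Hedged sketch: load the checkpoint in half precision to keep VRAM usage under ~1 GB.
# Repo id and generation settings come from this card; adjust them for your setup.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "iryneko571/mt5-small-translation-ja_zh"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name, torch_dtype=torch.float16).to("cuda")

# Inputs use the same <-ja2zh-> prefix as the widget example above.
inputs = tokenizer("<-ja2zh-> フェルディナント・ラッサールは、プロイセンの政治学者。", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_length=256, repetition_penalty=1.4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```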
# A simple backend application
Not yet thoroughly tested; use with caution.
* https://github.com/IryNeko/RabbitCafe

# Usage guide: a more precise example

```python
from transformers import pipeline

model_name = "iryneko571/mt5-translation-ja_zh-game-small"

#pipe = pipeline("translation",model=model_name,tokenizer=model_name,repetition_penalty=1.4,batch_size=1,max_length=256)
pipe = pipeline("translation",
    model=model_name,
    repetition_penalty=1.4,
    batch_size=1,
    max_length=256
)

def translate_batch(batch, language='<-ja2zh->'):  # batch is a list of strings
    # quickly format the list: prepend the language tag to every line
    i = 0
    while i < len(batch):
        batch[i] = f'{language} {batch[i]}'
        i += 1
    # run the batch through the pipeline and return the translated strings
    results = pipe(batch)
    return [r['translation_text'] for r in results]

# example call using the sample sentence from the widget above
print(translate_batch(['フェルディナント・ラッサールは、プロイセンの政治学者。']))
```

# Discord -> https://discord.gg/JmjPmJjA
If you need any help, want to try the latest version on a test server, or just want to chat, feel free to join the channel.