Word based BERT model
原模型及说明见:https://github.com/ZhuiyiTechnology/WoBERT
pytorch 模型见: https://github.com/JunnYu/WoBERT_pytorch
安装 WoBertTokenizer
pip install git+https://github.com/JunnYu/WoBERT_pytorch.git
TF Example
from transformers import TFBertForMaskedLM as WoBertForMaskedLM
from wobert import WoBertTokenizer
import tensorflow as tf
pretrained_model_or_path = 'qinluo/wobert-chinese-plus'
tokenizer = WoBertTokenizer.from_pretrained(pretrained_model_or_path)
model = WoBertForMaskedLM.from_pretrained(pretrained_model_or_path)
text = '今天[MASK]很好,我[MASK]去公园玩。'
inputs = tokenizer(text, return_tensors='tf')
outputs = model(**inputs).logits[0]
outputs_sentence = ''
for i, id in enumerate(tokenizer.encode(text)):
if id == tokenizer.mask_token_id:
tokens = tokenizer.convert_ids_to_tokens(tf.math.top_k(outputs[i], k=5)[1])
outputs_sentence += '[' + '|'.join(tokens) + ']'
else:
outputs_sentence += ''.join(tokenizer.convert_ids_to_tokens([id], skip_special_tokens=True))
print(outputs_sentence)
# 今天[天气|阳光|天|心情|空气]很好,我[想|要|打算|准备|就]去公园玩。
Pytorch Example
from transformers import BertForMaskedLM as WoBertForMaskedLM
from wobert import WoBertTokenizer
pretrained_model_or_path = 'qinluo/wobert-chinese-plus'
tokenizer = WoBertTokenizer.from_pretrained(pretrained_model_or_path)
model = WoBertForMaskedLM.from_pretrained(pretrained_model_or_path)
text = '今天[MASK]很好,我[MASK]去公园玩。'
inputs = tokenizer(text, return_tensors='pt')
outputs = model(**inputs).logits[0]
outputs_sentence = ''
for i, id in enumerate(tokenizer.encode(text)):
if id == tokenizer.mask_token_id:
tokens = tokenizer.convert_ids_to_tokens(outputs[i].topk(k=5)[1])
outputs_sentence += '[' + '|'.join(tokens) + ']'
else:
outputs_sentence += ''.join(tokenizer.convert_ids_to_tokens([id], skip_special_tokens=True))
print(outputs_sentence)
# 今天[天气|阳光|天|心情|空气]很好,我[想|要|打算|准备|就]去公园玩。
引用
Bibtex:
@techreport{zhuiyiwobert,
title={WoBERT: Word-based Chinese BERT model - ZhuiyiAI},
author={Jianlin Su},
year={2020},
url="https://github.com/ZhuiyiTechnology/WoBERT",
}
- Downloads last month
- 4
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social
visibility and check back later, or deploy to Inference Endpoints (dedicated)
instead.