---
language:
- "zh"
thumbnail: "https://raw.githubusercontent.com/SIKU-BERT/SikuBERT/main/appendix/sikubert.png"
tags:
- "chinese"
- "classical chinese"
- "literary chinese"
- "ancient chinese"
- "bert"
- "roberta"
- "pytorch"
license: "apache-2.0"
---

# SikuBERT

## Model description

![SikuBERT](https://raw.githubusercontent.com/SIKU-BERT/SikuBERT/main/appendix/sikubert.png)

Digital humanities research requires the support of large-scale corpora and high-performance natural language processing tools for ancient Chinese. Pre-trained language models have greatly improved the accuracy of text mining in English and modern Chinese texts, but a pre-trained model designed specifically for the automatic processing of ancient Chinese texts is still urgently needed. Using the verified, high-quality full-text corpus of the “Siku Quanshu” as the training set, and building on the BERT deep language model architecture, we constructed the SikuBERT and SikuRoBERTa pre-trained language models for intelligent processing tasks on ancient Chinese.
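
As a BERT-style masked language model, SikuBERT can also be queried directly for masked-token predictions. The snippet below is a minimal sketch using the `transformers` fill-mask pipeline; the example sentence is illustrative only, and it assumes the released checkpoint includes the masked-language-modelling head.

```python
from transformers import pipeline

# Minimal fill-mask sketch (assumes the checkpoint ships an MLM head).
fill_mask = pipeline("fill-mask", model="SIKU-BERT/sikubert")

# Illustrative classical Chinese sentence with one token masked out.
for prediction in fill_mask("[MASK]行健，君子以自強不息。"):
    print(prediction["token_str"], prediction["score"])
```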

## How to use

```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("SIKU-BERT/sikubert")
model = AutoModel.from_pretrained("SIKU-BERT/sikubert")
```
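
With the tokenizer and model loaded as above, SikuBERT behaves like any other `transformers` encoder. The following is a minimal sketch of extracting contextual token embeddings for a classical Chinese sentence; the sentence and variable names are illustrative, not part of the official interface.

```python
import torch

# Illustrative input sentence in classical Chinese.
text = "四庫全書，經史子集。"

# Tokenize and run a forward pass without tracking gradients.
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# One contextual vector per token: (batch_size, sequence_length, hidden_size).
token_embeddings = outputs.last_hidden_state
print(token_embeddings.shape)
```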

## About Us

We are from Nanjing Agricultural University.

> Created by SIKU-BERT [![Github icon](https://cdn0.iconfinder.com/data/icons/octicons/1024/mark-github-32.png)](https://github.com/SIKU-BERT)