Commit 14a3793 · "Update README.md" · committed by conan1024hao · Parent(s): c4e642c

README.md CHANGED
@@ -22,7 +22,7 @@ from transformers import AutoTokenizer, AutoModelForMaskedLM
 tokenizer = AutoTokenizer.from_pretrained("conan1024hao/cjkbert-small")
 model = AutoModelForMaskedLM.from_pretrained("conan1024hao/cjkbert-small")
 ```
-You don't need any text segmentation when you fine-tune downstream tasks. (
+You don't need any text segmentation when you fine-tune downstream tasks. (Though you may obtain better results if you apply morphological analysis to the data before fine-tuning.)
 
 ### Tokenization
 We use character-based tokenization with whole-word-masking strategy.