---
language: ko
license: apache-2.0
tags:
- korean
---

# KrELECTRA-base-mecab

A pre-trained Korean ELECTRA language model that uses the Mecab morphological analyzer.

## Usage

### Load model and tokenizer

```python
>>> from transformers import AutoTokenizer, AutoModelForPreTraining
>>> model = AutoModelForPreTraining.from_pretrained("Jinhwan/krelectra-base-mecab")
>>> tokenizer = AutoTokenizer.from_pretrained("Jinhwan/krelectra-base-mecab")
```
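
In the `transformers` library, `AutoModelForPreTraining` resolves an ELECTRA configuration to `ElectraForPreTraining`, the discriminator that emits one logit per input token; a positive logit means the token is predicted to have been replaced. The sketch below assumes this checkpoint holds discriminator weights, which this model card does not state. `flag_replaced` and `score_sentence` are hypothetical helper names used for illustration, not part of the model or the library.

```python
from typing import List, Tuple


def flag_replaced(tokens: List[str], logits: List[float]) -> List[Tuple[str, bool]]:
    """Pair each token with True when its discriminator logit is positive."""
    if len(tokens) != len(logits):
        raise ValueError("tokens and logits must have the same length")
    return [(token, logit > 0.0) for token, logit in zip(tokens, logits)]


def score_sentence(
    text: str, model_name: str = "Jinhwan/krelectra-base-mecab"
) -> List[Tuple[str, bool]]:
    """Run the discriminator on one sentence (assumes discriminator weights)."""
    # Imported lazily so flag_replaced stays usable without torch/transformers.
    import torch
    from transformers import AutoModelForPreTraining, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForPreTraining.from_pretrained(model_name)
    model.eval()

    # The tokenizer adds [CLS] and [SEP] itself, so pass the raw sentence.
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0]  # one logit per token

    ids = inputs["input_ids"][0].tolist()
    tokens = tokenizer.convert_ids_to_tokens(ids)
    return flag_replaced(tokens, logits.tolist())
```

Calling `score_sentence("한국어 ELECTRA를 공유합니다.")` downloads the checkpoint on first use and returns a list of `(token, is_replaced)` pairs, including the `[CLS]` and `[SEP]` tokens that the tokenizer adds.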

### Tokenizer example

```python
>>> from transformers import AutoTokenizer
>>> tokenizer = AutoTokenizer.from_pretrained("Jinhwan/krelectra-base-mecab")
>>> tokenizer.tokenize("[CLS] 한국어 ELECTRA를 공유합니다. [SEP]")
['[CLS]', '한국어', 'EL', '##ECT', '##RA', '##를', '공유', '##합', '##니다', '.', '[SEP]']
>>> tokenizer.convert_tokens_to_ids(['[CLS]', '한국어', 'EL', '##ECT', '##RA', '##를', '공유', '##합', '##니다', '.', '[SEP]'])
[2, 7214, 24023, 24663, 26580, 3195, 7086, 3746, 5500, 17, 3]