Commit 1b512c5 by ljvmiranda921 (parent: 1a299a2)
Update README.md

File changed: README.md
@@ -16,7 +16,7 @@ library_name: spacy
 
 # calamanCy: Tagalog NLP pipelines in spaCy
 
-This is the latest **medium-sized pipeline** for calamanCy.
+This is the latest **medium-sized pipeline** for [calamanCy](https://arxiv.org/abs/2311.07171).
 Compared to the 0.1.0 version, this pipeline is trained on a larger treebank ([UD-NewsCrawl](https://huggingface.co/datasets/UD-Filipino/UD_Tagalog-NewsCrawl)), with large improvements in dependency parsing, morphological annotation, and POS tagging.
 This pipeline also implements a neural edit-tree lemmatizer, allowing better lemmatization than the previous model.
 The training code can be found [in GitHub](https://github.com/ljvmiranda921/calamanCy/tree/master/models/v0.1.0).
@@ -32,7 +32,7 @@ The training code can be found [in GitHub](https://github.com/ljvmiranda921/cala
 | **Vectors** | -1 keys, 200000 unique vectors (200 dimensions) |
 | **Sources** | [TLUnified NER Dataset](https://aclanthology.org/2023.sealp-1.2/) (Lester James V. Miranda)<br>[UD NewsCrawl](https://huggingface.co/datasets/UD-Filipino/UD_Tagalog-NewsCrawl) (Angelina Aquino and Lester James V. Miranda and Elsie Or)<br>[TLUnified dataset](https://aclanthology.org/2022.lrec-1.703/) (Jan Christian Blaise Cruz and Charibeth Cheng)<br>[UD_Tagalog-TRG](https://universaldependencies.org/treebanks/tl_trg/index.html) (Stephanie Samson, Daniel Zeman, and Mary Ann C. Tan)<br>[UD_Tagalog-Ugnayan](https://universaldependencies.org/treebanks/tl_ugnayan/index.html) (Angelina Aquino) |
 | **License** | `MIT` |
-| **Author** | [Lester James V. Miranda](https://github.
+| **Author** | [Lester James V. Miranda](https://ljvmiranda921.github.io) |
 
 ### Label Scheme
 