ljvmiranda921/tl_calamancy_md

Token Classification

ljvmiranda921 committed 2 days ago

Commit 1b512c5 • 1 Parent(s): 1a299a2

Update README.md

Files changed (1)

README.md +2 -2

@@ -16,7 +16,7 @@ library_name: spacy
 # calamanCy: Tagalog NLP pipelines in spaCy
-This is the latest **medium-sized pipeline** for calamanCy.
+This is the latest **medium-sized pipeline** for [calamanCy](https://arxiv.org/abs/2311.07171).
 Compared to the 0.1.0 version, this pipeline is trained on a larger treebank ([UD-NewsCrawl](https://huggingface.co/datasets/UD-Filipino/UD_Tagalog-NewsCrawl)), with large improvements in dependency parsing, morphological annotation, and POS tagging.
 This pipeline also implements a neural edit-tree lemmatizer, allowing better lemmatization than the previous model.
 The training code can be found [in GitHub](https://github.com/ljvmiranda921/calamanCy/tree/master/models/v0.1.0).
@@ -32,7 +32,7 @@ The training code can be found [in GitHub](https://github.com/ljvmiranda921/cala
 | **Vectors** | -1 keys, 200000 unique vectors (200 dimensions) |
 | **Sources** | [TLUnified NER Dataset](https://aclanthology.org/2023.sealp-1.2/) (Lester James V. Miranda)<br>[UD NewsCrawl](https://huggingface.co/datasets/UD-Filipino/UD_Tagalog-NewsCrawl) (Angelina Aquino and Lester James V. Miranda and Elsie Or)<br>[TLUnified dataset](https://aclanthology.org/2022.lrec-1.703/) (Jan Christian Blaise Cruz and Charibeth Cheng)<br>[UD_Tagalog-TRG](https://universaldependencies.org/treebanks/tl_trg/index.html) (Stephanie Samson, Daniel Zeman, and Mary Ann C. Tan)<br>[UD_Tagalog-Ugnayan](https://universaldependencies.org/treebanks/tl_ugnayan/index.html) (Angelina Aquino) |
 | **License** | `MIT` |
-| **Author** | [Lester James V. Miranda](https://github.com/ljvmiranda921/calamanCy) |
+| **Author** | [Lester James V. Miranda](https://ljvmiranda921.github.io) |
 ### Label Scheme