Token Classification
spaCy
Tagalog
ljvmiranda921 committed
Commit 1b512c5 (1 parent: 1a299a2)

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -16,7 +16,7 @@ library_name: spacy
 
  # calamanCy: Tagalog NLP pipelines in spaCy
 
- This is the latest **medium-sized pipeline** for calamanCy.
+ This is the latest **medium-sized pipeline** for [calamanCy](https://arxiv.org/abs/2311.07171).
  Compared to the 0.1.0 version, this pipeline is trained on a larger treebank ([UD-NewsCrawl](https://huggingface.co/datasets/UD-Filipino/UD_Tagalog-NewsCrawl)), with large improvements in dependency parsing, morphological annotation, and POS tagging.
  This pipeline also implements a neural edit-tree lemmatizer, allowing better lemmatization than the previous model.
  The training code can be found [in GitHub](https://github.com/ljvmiranda921/calamanCy/tree/master/models/v0.1.0).
@@ -32,7 +32,7 @@ The training code can be found [in GitHub](https://github.com/ljvmiranda921/cala
  | **Vectors** | -1 keys, 200000 unique vectors (200 dimensions) |
  | **Sources** | [TLUnified NER Dataset](https://aclanthology.org/2023.sealp-1.2/) (Lester James V. Miranda)<br>[UD NewsCrawl](https://huggingface.co/datasets/UD-Filipino/UD_Tagalog-NewsCrawl) (Angelina Aquino and Lester James V. Miranda and Elsie Or)<br>[TLUnified dataset](https://aclanthology.org/2022.lrec-1.703/) (Jan Christian Blaise Cruz and Charibeth Cheng)<br>[UD_Tagalog-TRG](https://universaldependencies.org/treebanks/tl_trg/index.html) (Stephanie Samson, Daniel Zeman, and Mary Ann C. Tan)<br>[UD_Tagalog-Ugnayan](https://universaldependencies.org/treebanks/tl_ugnayan/index.html) (Angelina Aquino) |
  | **License** | `MIT` |
- | **Author** | [Lester James V. Miranda](https://github.com/ljvmiranda921/calamanCy) |
+ | **Author** | [Lester James V. Miranda](https://ljvmiranda921.github.io) |
 
  ### Label Scheme
 