jcblaise committed on
Commit 6650639 · 1 Parent(s): 707731f

Update README.md

Files changed (1)
  1. README.md +1 -26
README.md CHANGED
@@ -11,35 +11,10 @@ inference: false
  # BERT Tagalog Base Cased (Whole Word Masking)
  Tagalog version of BERT trained on a large preprocessed text corpus scraped and sourced from the internet. This model is part of a larger research project. We open-source the model to allow greater usage within the Filipino NLP community. This particular version uses whole word masking.
 
- ## Usage
- The model can be loaded and used in both PyTorch and TensorFlow through the HuggingFace Transformers package.
-
- ```python
- from transformers import TFAutoModel, AutoModel, AutoTokenizer
-
- # TensorFlow
- model = TFAutoModel.from_pretrained('jcblaise/bert-tagalog-base-cased-WWM', from_pt=True)
- tokenizer = AutoTokenizer.from_pretrained('jcblaise/bert-tagalog-base-cased-WWM', do_lower_case=False)
-
- # PyTorch
- model = AutoModel.from_pretrained('jcblaise/bert-tagalog-base-cased-WWM')
- tokenizer = AutoTokenizer.from_pretrained('jcblaise/bert-tagalog-base-cased-WWM', do_lower_case=False)
- ```
- Finetuning scripts and other utilities we use for our projects can be found in our centralized repository at https://github.com/jcblaisecruz02/Filipino-Text-Benchmarks
-
  ## Citations
  All model details and training setups can be found in our papers. If you use our model or find it useful in your projects, please cite our work:
 
  ```
- @inproceedings{localization2020cruz,
- title={{Localization of Fake News Detection via Multitask Transfer Learning}},
- author={Cruz, Jan Christian Blaise and Tan, Julianne Agatha and Cheng, Charibeth},
- booktitle={Proceedings of The 12th Language Resources and Evaluation Conference},
- pages={2589--2597},
- year={2020},
- url={https://www.aclweb.org/anthology/2020.lrec-1.315}
- }
-
  @article{cruz2020establishing,
  title={Establishing Baselines for Text Classification in Low-Resource Languages},
  author={Cruz, Jan Christian Blaise and Cheng, Charibeth},
@@ -59,4 +34,4 @@ All model details and training setups can be found in our papers. If you use our
  Data used to train this model as well as other benchmark datasets in Filipino can be found in my website at https://blaisecruz.com
 
  ## Contact
- If you have questions, concerns, or if you just want to chat about NLP and low-resource languages in general, you may reach me through my work email at jan_christian_cruz@dlsu.edu.ph
+ If you have questions, concerns, or if you just want to chat about NLP and low-resource languages in general, you may reach me through my work email at me@blaisecruz.com