update #2
README.md
CHANGED
@@ -5,11 +5,15 @@ license: mit
---
# ✨ bert-restore-punctuation
[![forthebadge](https://forthebadge.com/images/badges/gluten-free.svg)]()
This is a bert-base-uncased model fine-tuned for punctuation restoration on [Yelp Reviews](https://www.tensorflow.org/datasets/catalog/yelp_polarity_reviews).
The model predicts the punctuation and upper-casing of plain, lower-cased text. Example use cases include ASR output and other text that has lost its punctuation.
This model is intended for direct use as a punctuation restoration model for general English. Alternatively, you can fine-tune it further on domain-specific texts for punctuation restoration tasks.

The model restores the following punctuation marks -- [` ! ? . , - : ; '`]
The model also restores upper-casing of words.

-----------------------------------------------
@@ -34,7 +38,9 @@ rpunct.punctuate("""in 2018 cornell researchers built a high-powered detector th
-----------------------------------------------
## 📡 Training data
Here is the number of product reviews we used for fine-tuning the model:
| Language | Number of reviews |
| -------- | ----------------- |
| English | 560,000 |
|
@@ -51,7 +57,6 @@ The fine-tuned model obtained the following accuracy on 45,990 held-out text sam
Below is a breakdown of the performance of the model by each label:

-
| label | precision | recall | f1-score | support |
| --------- | ------------- | -------- | ---------- | -------- |
| **!** | 0.45 | 0.17 | 0.24 | 424 |
|
@@ -69,6 +74,7 @@ Below is a breakdown of the performance of the model by each label:
| **?+Upper** | 0.40 | 0.50 | 0.44 | 4 |
| **none** | 0.96 | 0.96 | 0.96 | 35352 |
| **Upper** | 0.84 | 0.82 | 0.83 | 5442 |
-----------------------------------------------

## ☕ Contact
@@ -5,11 +5,15 @@ license: mit
---
# ✨ bert-restore-punctuation
[![forthebadge](https://forthebadge.com/images/badges/gluten-free.svg)]()
+
This is a bert-base-uncased model fine-tuned for punctuation restoration on [Yelp Reviews](https://www.tensorflow.org/datasets/catalog/yelp_polarity_reviews).
+
The model predicts the punctuation and upper-casing of plain, lower-cased text. Example use cases include ASR output and other text that has lost its punctuation.
+
This model is intended for direct use as a punctuation restoration model for general English. Alternatively, you can fine-tune it further on domain-specific texts for punctuation restoration tasks.

The model restores the following punctuation marks -- [` ! ? . , - : ; '`]
+
The model also restores upper-casing of words.

-----------------------------------------------
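To make the label scheme concrete, here is a minimal sketch of how per-word labels such as `none`, `Upper`, `.`, or `?+Upper` (the names used in the README's per-label results table) could be turned back into text. The helpers `apply_label` and `restore` are hypothetical and are not part of the model or of any library; the sketch also assumes every mark is simply appended to its word, which may not match how the model handles hyphens or apostrophes.

```python
# Hypothetical helpers for illustration only; not the model's own post-processing.
PUNCTUATION = {"!", "?", ".", ",", "-", ":", ";", "'"}


def apply_label(word: str, label: str) -> str:
    """Apply one label ('none', 'Upper', a mark, or 'mark+Upper') to a word."""
    if label == "none":
        return word
    if label == "Upper":
        return word[:1].upper() + word[1:]
    mark, _, suffix = label.partition("+")
    if mark not in PUNCTUATION or suffix not in ("", "Upper"):
        raise ValueError(f"unknown label: {label!r}")
    if suffix == "Upper":
        word = word[:1].upper() + word[1:]
    # Assumption: the mark always follows the word it is attached to.
    return word + mark


def restore(words, labels):
    """Rebuild a sentence from lower-cased words and one label per word."""
    if len(words) != len(labels):
        raise ValueError("words and labels must have the same length")
    return " ".join(apply_label(w, l) for w, l in zip(words, labels))


words = "hello world how are you".split()
labels = ["Upper", ".", "Upper", "none", "?"]
print(restore(words, labels))  # Hello world. How are you?
```

Combined labels keep the scheme to a single prediction per word, at the cost of very rare classes such as `?+Upper`.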
@@ -34,7 +38,9 @@ rpunct.punctuate("""in 2018 cornell researchers built a high-powered detector th

-----------------------------------------------
## 📡 Training data
+
Here is the number of product reviews we used for fine-tuning the model:
+
| Language | Number of reviews |
| -------- | ----------------- |
| English | 560,000 |
@@ -51,7 +57,6 @@ The fine-tuned model obtained the following accuracy on 45,990 held-out text sam

Below is a breakdown of the performance of the model by each label:

| label | precision | recall | f1-score | support |
| --------- | ------------- | -------- | ---------- | -------- |
| **!** | 0.45 | 0.17 | 0.24 | 424 |
@@ -69,6 +74,7 @@ Below is a breakdown of the performance of the model by each label:
| **?+Upper** | 0.40 | 0.50 | 0.44 | 4 |
| **none** | 0.96 | 0.96 | 0.96 | 35352 |
| **Upper** | 0.84 | 0.82 | 0.83 | 5442 |
+
-----------------------------------------------
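The f1-score column in the table is the harmonic mean of precision and recall. As a quick sanity check, the sketch below recomputes it for the rows visible in this diff; the function name `f1_score` is our own and the inputs are the rounded values from the table, not the original evaluation data.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the table rows shown above.
rows = {
    "?+Upper": (0.40, 0.50),
    "none": (0.96, 0.96),
    "Upper": (0.84, 0.82),
}

for label, (p, r) in rows.items():
    print(f"{label}: {f1_score(p, r):.2f}")
# ?+Upper: 0.44
# none: 0.96
# Upper: 0.83
```

Because the table inputs are already rounded, a recomputed value can differ in the last digit: for the `!` row, 0.45 and 0.17 give about 0.25, while the table reports 0.24.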

## ☕ Contact