felflare/bert-restore-punctuation

@@ -5,11 +5,15 @@ license: mit
 ---
 # ✨ bert-restore-punctuation
 [![forthebadge](https://forthebadge.com/images/badges/gluten-free.svg)]()
 This is a bert-base-uncased model fine-tuned for punctuation restoration on [Yelp Reviews](https://www.tensorflow.org/datasets/catalog/yelp_polarity_reviews).
 The model predicts the punctuation and upper-casing of plain, lower-cased text. A typical use case is ASR output, or any other text that has lost its punctuation.
 This model is intended for direct use as a punctuation restoration model for general English. Alternatively, you can use it as a starting point for further fine-tuning on domain-specific text for punctuation restoration tasks.
 The model restores the following punctuation marks: [` ! ? . , - : ; '`]
 The model also restores the upper-casing of words.
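
To make the prediction format concrete, here is a minimal sketch of how per-word labels could be turned back into text. It assumes one label per word, written in the style of the performance table in this card (`none`, `Upper`, a punctuation mark, or a mark followed by `+Upper`). The helper names `apply_label` and `restore` are hypothetical and are not part of the model or of any library, and the sketch does not handle the spacing around `-` or `'`.

```python
# Punctuation marks the model card lists as restorable.
PUNCTUATION = set("!?.,-:;'")


def apply_label(word: str, label: str) -> str:
    """Apply one label (e.g. 'none', 'Upper', '.', '?+Upper') to a lower-cased word."""
    if label == "none":
        return word
    mark, _, casing = label.partition("+")
    if mark == "Upper":  # casing-only label, no punctuation mark
        mark, casing = "", "Upper"
    if mark and mark not in PUNCTUATION:
        raise ValueError(f"unknown label: {label!r}")
    if casing == "Upper":
        word = word.capitalize()
    return word + mark


def restore(words: list[str], labels: list[str]) -> str:
    """Join labelled words into a sentence (assumes one label per word)."""
    if len(words) != len(labels):
        raise ValueError("words and labels must have the same length")
    return " ".join(apply_label(w, l) for w, l in zip(words, labels))


# Illustrative input; the labels are hand-written, not model output.
words = ["in", "2018", "cornell", "researchers", "built", "a", "detector"]
labels = ["Upper", "none", "Upper", "none", "none", "none", "."]
print(restore(words, labels))  # In 2018 Cornell researchers built a detector.
```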
 -----------------------------------------------
@@ -34,7 +38,9 @@ rpunct.punctuate("""in 2018 cornell researchers built a high-powered detector th
 -----------------------------------------------
 ## 📡 Training data
 Here is the number of product reviews we used for fine-tuning the model:
 | Language | Number of reviews |
 | -------- | ----------------- |
 | English  | 560,000           |
@@ -51,7 +57,6 @@ The fine-tuned model obtained the following accuracy on 45,990 held-out text sam
 Below is a breakdown of the performance of the model by each label:
 | label       | precision | recall | f1-score | support |
 | ----------- | --------- | ------ | -------- | ------- |
 | **!**       | 0.45      | 0.17   | 0.24     | 424     |
@@ -69,6 +74,7 @@ Below is a breakdown of the performance of the model by each label:
 | **?+Upper** | 0.40      | 0.50   | 0.44     | 4       |
 | **none**    | 0.96      | 0.96   | 0.96     | 35352   |
 | **Upper**   | 0.84      | 0.82   | 0.83     | 5442    |
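
For readers who want to sanity-check the f1-score column, the sketch below recomputes it as the harmonic mean of precision and recall, using only the rows visible in this diff. The function name `f1_score` is our own. Because the table's precision and recall are already rounded to two decimals, the recomputed value can differ from the reported one in the last digit (for `!` it comes out near 0.247 against a reported 0.24).

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the rows shown above.
rows = {
    "!": (0.45, 0.17),
    "?+Upper": (0.40, 0.50),
    "none": (0.96, 0.96),
    "Upper": (0.84, 0.82),
}

for label, (p, r) in rows.items():
    print(f"{label:>8}: f1 = {f1_score(p, r):.3f}")
```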
 -----------------------------------------------
 ## ☕ Contact
