felflare committed on
Commit
dede2ce
•
1 Parent(s): 1be7657

Update README.md

last README update

Files changed (1)
  1. README.md +16 -10
README.md CHANGED
@@ -12,13 +12,13 @@ The model predicts the punctuation and upper-casing of plain, lower-cased text.
12
 
13
  This model is intended for direct use as a punctuation restoration model for the general English language. Alternatively, you can use this for further fine-tuning on domain-specific texts for punctuation restoration tasks.
14
 
15
- Model restores the following punctuations -- [` ! ? . , - : ; '`]
16
 
17
- Model also restores upper-casing of words.
18
 
19
  -----------------------------------------------
20
  ## 🚋 Usage
21
- Below is a quick way to get up and running with the model.
22
  1. First, install the package.
23
  ```bash
24
  pip install rpunct
@@ -28,24 +28,31 @@ pip install rpunct
28
  from rpunct import RestorePuncts
29
  # The default language is 'english'
30
  rpunct = RestorePuncts()
31
- rpunct.punctuate("""in 2018 cornell researchers built a high-powered detector that in combination with an algorithm-driven process called ptychography set a world record by tripling the resolution of a state-of-the-art electron microscope as successful as it was that approach had a weakness it only worked with ultrathin samples that were a few atoms thick anything thicker would cause the electrons to scatter in ways that could not be disentangled now a team again led by david muller the samuel b eckert professor of engineering has bested its own record by a factor of two with an electron microscope pixel array detector empad that incorporates even more sophisticated 3d reconstruction algorithms the resolution is so fine-tuned the only blurring that remains is the thermal jiggling of the atoms themselves""")
32
-
 
 
 
33
  # Outputs the following:
34
- # In 2018, Cornell researchers built a high-powered detector that, in combination with an algorithm-driven process called Ptychography, set a world record by tripling the resolution of a state-of-the-art electron microscope. As successful as it was, that approach had a weakness. It only worked with ultrathin samples that were a few atoms thick. Anything thicker would cause the electrons to scatter in ways that could not be disentangled. Now, a team again led by David Muller, the Samuel B. Eckert Professor of Engineering, has bested its own record by a factor of two with an Electron microscope pixel array detector empad that incorporates even more sophisticated 3d reconstruction algorithms. The resolution is so fine-tuned the only blurring that remains is the thermal jiggling of the atoms themselves.
 
 
 
 
35
  ```
36
 
37
- `This model works on arbitrarily large text in English language and uses GPU if available.`
38
 
39
  -----------------------------------------------
40
  ## 📡 Training data
41
 
42
  Here is the number of product reviews we used for finetuning the model:
43
 
44
- | Language | Number of reviews |
45
  | -------- | ----------------- |
46
  | English | 560,000 |
47
 
48
- We found the best convergence around `3 epochs`, which is what presented here and available via a download.
49
 
50
  -----------------------------------------------
51
  ## 🎯 Accuracy
@@ -76,7 +83,6 @@ Below is a breakdown of the performance of the model by each label:
76
  | **Upper** | 0.84 | 0.82 | 0.83 | 5442
77
 
78
  -----------------------------------------------
79
-
80
  ## ☕ Contact
81
  Contact [Daulet Nurmanbetov]([email protected]) for questions, feedback and/or requests for similar models.
82
 
 
12
 
13
  This model is intended for direct use as a punctuation restoration model for the general English language. Alternatively, you can use this for further fine-tuning on domain-specific texts for punctuation restoration tasks.
14
 
15
+ Model restores the following punctuations -- **[! ? . , - : ; ' ]**
16
 
17
+ The model also restores the upper-casing of words.
18
 
19
  -----------------------------------------------
20
  ## 🚋 Usage
21
+ **Below is a quick way to get up and running with the model.**
22
  1. First, install the package.
23
  ```bash
24
  pip install rpunct
 
28
  from rpunct import RestorePuncts
29
  # The default language is 'english'
30
  rpunct = RestorePuncts()
31
+ rpunct.punctuate("""in 2018 cornell researchers built a high-powered detector that in combination with an algorithm-driven process called ptychography set a world record
32
+ by tripling the resolution of a state-of-the-art electron microscope as successful as it was that approach had a weakness it only worked with ultrathin samples that were
33
+ a few atoms thick anything thicker would cause the electrons to scatter in ways that could not be disentangled now a team again led by david muller the samuel b eckert
34
+ professor of engineering has bested its own record by a factor of two with an electron microscope pixel array detector empad that incorporates even more sophisticated
35
+ 3d reconstruction algorithms the resolution is so fine-tuned the only blurring that remains is the thermal jiggling of the atoms themselves""")
36
  # Outputs the following:
37
+ # In 2018, Cornell researchers built a high-powered detector that, in combination with an algorithm-driven process called Ptychography, set a world record by tripling the
38
+ # resolution of a state-of-the-art electron microscope. As successful as it was, that approach had a weakness. It only worked with ultrathin samples that were a few atoms
39
+ # thick. Anything thicker would cause the electrons to scatter in ways that could not be disentangled. Now, a team again led by David Muller, the Samuel B.
40
+ # Eckert Professor of Engineering, has bested its own record by a factor of two with an Electron microscope pixel array detector empad that incorporates even more
41
+ # sophisticated 3d reconstruction algorithms. The resolution is so fine-tuned the only blurring that remains is the thermal jiggling of the atoms themselves.
42
  ```
43
 
44
+ **This model works on arbitrarily large text in English language and uses GPU if available.**
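
The model expects plain, lower-cased text, and the sample call above wraps its input across several lines. A minimal sketch of normalizing arbitrary text into that shape before calling `punctuate` might look like the following. The helper names `to_plain_lowercase` and `restore` are hypothetical and not part of `rpunct`. Keeping hyphens and apostrophes is an assumption based on the sample input, which retains them inside words. The README does not say whether `rpunct` needs this normalization at all.

```python
import re

# Sentence-level marks to strip. Hyphens and apostrophes are kept (an
# assumption) because the README's sample input keeps them inside words,
# e.g. "high-powered" and "state-of-the-art".
_SENTENCE_MARKS = re.compile(r"[!?.,:;]")


def to_plain_lowercase(text: str) -> str:
    """Lower-case text, drop sentence punctuation, and collapse whitespace.

    Marks are replaced by a space so that "thick.Anything" does not fuse
    into one word; note this also splits decimals such as "3.14".
    """
    stripped = _SENTENCE_MARKS.sub(" ", text)
    # str.split() with no arguments splits on any whitespace run,
    # including the newlines of a wrapped triple-quoted string.
    return " ".join(stripped.lower().split())


def restore(text: str):
    """Hypothetical wrapper: normalize, then call rpunct as shown in the README.

    Requires `pip install rpunct`; imported lazily so the normalizer can be
    used and tested without the package installed.
    """
    from rpunct import RestorePuncts

    return RestorePuncts().punctuate(to_plain_lowercase(text))
```

Calling `to_plain_lowercase` on already punctuated text gives input in the same shape as the sample passage, which makes it possible to compare the model's restored output against the original.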
45
 
46
  -----------------------------------------------
47
  ## 📡 Training data
48
 
49
  Here is the number of product reviews we used for finetuning the model:
50
 
51
+ | Language | Number of text samples|
52
  | -------- | ----------------- |
53
  | English | 560,000 |
54
 
55
+ We found the best convergence around _**3 epochs**_, which is what presented here and available via a download.
56
 
57
  -----------------------------------------------
58
  ## 🎯 Accuracy
 
83
  | **Upper** | 0.84 | 0.82 | 0.83 | 5442
84
 
85
  -----------------------------------------------
 
86
  ## ☕ Contact
87
  Contact [Daulet Nurmanbetov]([email protected]) for questions, feedback and/or requests for similar models.
88