DeDeckerThomas committed 96251f2 (parent: 83b6e21)

Update README.md

README.md (changed)
@@ -43,19 +43,60 @@ Sahrawat, Dhruva, Debanjan Mahata, Haimin Zhang, Mayank Kulkarni, Agniv Sharma,
43 |
* This keyphrase generation model is very domain-specific and will perform very well on abstracts of scientific papers. It's not recommended to use this model for other domains, but you are free to test it out.
|
44 |
* Only works for English documents.
|
45 |
* For a custom model, please consult the training notebook for more information (link incoming).
|
46 |
+
* Sometimes the output can make no sense.
|
47 |
|
48 |
### ❓ How to use
```python
# Define a custom pipeline for keyphrase generation
from transformers import (
    Text2TextGenerationPipeline,
    BartForConditionalGeneration,
    AutoTokenizer,
)
import numpy as np


class KeyphraseGenerationPipeline(Text2TextGenerationPipeline):
    def __init__(self, model, keyphrase_sep_token=";", **kwargs):
        # Load the fine-tuned KeyBART model and its tokenizer from a model name or path
        super().__init__(
            model=BartForConditionalGeneration.from_pretrained(model),
            tokenizer=AutoTokenizer.from_pretrained(model),
            **kwargs,
        )
        self.keyphrase_sep_token = keyphrase_sep_token

    def postprocess(self, model_outputs, **kwargs):
        results = super().postprocess(model_outputs=model_outputs, **kwargs)
        # The model generates all keyphrases as one string joined by the separator token:
        # split it, strip surrounding whitespace and drop duplicates
        # (np.unique also sorts the keyphrases alphabetically)
        return np.unique(
            [
                result.strip()
                for result in results[0].get("generated_text").split(self.keyphrase_sep_token)
            ]
        )
```
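The last line of `postprocess` is plain string handling, so its effect can be checked without downloading the model. A minimal sketch, assuming the raw generated text is a single string with keyphrases joined by `;` (the default `keyphrase_sep_token`); `split_keyphrases` is a hypothetical helper and the sample string is made up, not real model output:

```python
import numpy as np


def split_keyphrases(generated_text, sep=";"):
    """Mirror the postprocess step of KeyphraseGenerationPipeline on a raw string."""
    # Split on the separator and strip whitespace around each keyphrase
    keyphrases = [kp.strip() for kp in generated_text.split(sep)]
    # np.unique removes duplicates and returns the rest sorted, as a NumPy array
    return np.unique(keyphrases)


raw = "text analysis; keyphrase extraction;statistics; text analysis"
print(split_keyphrases(raw))
# → ['keyphrase extraction' 'statistics' 'text analysis']
```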

```python
# Load the pipeline
model_name = "DeDeckerThomas/keyphrase-generation-keybart-inspec"
generator = KeyphraseGenerationPipeline(model=model_name)
```

```python
# Sample abstract; newlines are replaced by spaces so the sentences don't run together
text = """
Keyphrase extraction is a technique in text analysis where you extract the important keyphrases from a text.
Since this is a time-consuming process, Artificial Intelligence is used to automate it.
Currently, classical machine learning methods, that use statistics and linguistics, are widely used for the extraction process.
The fact that these methods have been widely used in the community has the advantage that there are many easy-to-use libraries.
Now with the recent innovations in deep learning methods (such as recurrent neural networks and transformers, GANS, …), keyphrase extraction can be improved.
These new methods also focus on the semantics and context of a document, which is quite an improvement.
""".replace(
    "\n", " "
).strip()

keyphrases = generator(text)

print(keyphrases)
```

```
# Output
['artificial intelligence' 'classical machine learning methods'
 'keyphrase extraction' 'lingu' 'statistics' 'text analysis']
```
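
Because `postprocess` uses `np.unique`, the keyphrases come back as a NumPy array in alphabetical order rather than in the order the model generated them. If generation order matters, one option (a swapped-in technique, not what the pipeline above does) is order-preserving deduplication with `dict.fromkeys`. A minimal sketch; `dedupe_in_order` is a hypothetical helper and the input string is made up:

```python
def dedupe_in_order(generated_text, sep=";"):
    """Split a separator-joined keyphrase string, keeping first-seen order."""
    # Strip whitespace and skip empty pieces (e.g. left by a trailing separator)
    keyphrases = [kp.strip() for kp in generated_text.split(sep) if kp.strip()]
    # dict keys keep insertion order (Python 3.7+), so repeats are dropped in place
    return list(dict.fromkeys(keyphrases))


print(dedupe_in_order("text analysis;statistics; text analysis;"))
# → ['text analysis', 'statistics']
```

This also returns a plain Python list instead of a NumPy array, which is easier to serialize (for example to JSON).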
## 📚 Training Dataset