piotr-rybak committed
Commit 3a32c99 · 1 Parent(s): 878f739
Update README.md
README.md
CHANGED
@@ -12,13 +12,19 @@ datasets:
|
|
12 |
- ipipan/maupqa
|
13 |
---
|
14 |
|
15 |
-
#
|
16 |
|
17 |
-
|
18 |
|
19 |
-
It was initialized from the [HerBERT-base](https://huggingface.co/allegro/herbert-base-cased) model and fine-tuned on the [PolQA](https://huggingface.co/ipipan/polqa) and [MAUPQA](https://huggingface.co/ipipan/maupqa) datasets for
|
20 |
|
21 |
-
The model was trained on question-passage pairs and works best on similar tasks. The training passages consisted of `title` and `text` concatenated with the special token `</s>`. Even if your passages don't have a `title`, it is still beneficial to prefix a passage `text` with the `</s>` token.
|
22 |
## Usage (Sentence-Transformers)
|
23 |
|
24 |
Using this model becomes easy when you have [sentence-transformers](https://www.SBERT.net) installed:
|
@@ -32,11 +38,11 @@ Then you can use the model like this:
|
|
32 |
```python
|
33 |
from sentence_transformers import SentenceTransformer
|
34 |
sentences = [
|
35 |
-
"W jakim mieście urodził się Zbigniew Herbert?",
|
36 |
"Zbigniew Herbert</s>Zbigniew Bolesław Ryszard Herbert (ur. 29 października 1924 we Lwowie, zm. 28 lipca 1998 w Warszawie) – polski poeta, eseista i dramaturg.",
|
37 |
]
|
38 |
|
39 |
-
model = SentenceTransformer('ipipan/
|
40 |
embeddings = model.encode(sentences)
|
41 |
print(embeddings)
|
42 |
```
|
@@ -55,12 +61,12 @@ def cls_pooling(model_output, attention_mask):
|
|
55 |
|
56 |
# Sentences we want sentence embeddings for
|
57 |
sentences = [
|
58 |
-
"W jakim mieście urodził się Zbigniew Herbert?",
|
59 |
"Zbigniew Herbert</s>Zbigniew Bolesław Ryszard Herbert (ur. 29 października 1924 we Lwowie, zm. 28 lipca 1998 w Warszawie) – polski poeta, eseista i dramaturg.",
|
60 |
]
|
61 |
# Load model from HuggingFace Hub
|
62 |
-
tokenizer = AutoTokenizer.from_pretrained('ipipan/
|
63 |
-
model = AutoModel.from_pretrained('ipipan/
|
64 |
|
65 |
# Tokenize sentences
|
66 |
encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
|
|
|
12 |
- ipipan/maupqa
|
13 |
---
|
14 |
|
15 |
+
# Silver Retriever Base (v1)
|
16 |
|
17 |
+
The Silver Retriever model encodes Polish sentences or paragraphs into a 768-dimensional dense vector space and can be used for tasks such as document retrieval or semantic search.
|
18 |
|
19 |
+
It was initialized from the [HerBERT-base](https://huggingface.co/allegro/herbert-base-cased) model and fine-tuned on the [PolQA](https://huggingface.co/ipipan/polqa) and [MAUPQA](https://huggingface.co/ipipan/maupqa) datasets for 15,000 steps with a batch size of 1,024.
|
20 |
+
|
21 |
+
## Preparing inputs
|
22 |
+
|
23 |
+
The model was trained on question-passage pairs and works best when inputs follow the same format that was used during training:
|
24 |
+
- We added the phrase `Pytanie:` (Polish for "Question:") to the beginning of each question.
|
25 |
+
- The training passages consisted of `title` and `text` concatenated with the special token `</s>`. Even if your passages don't have a `title`, it is still beneficial to prefix a passage with the `</s>` token.
|
26 |
+
- Although we used the dot product during training, the model usually works better with the cosine distance.
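
The formatting rules in this list can be sketched as two small helpers. This is only an illustration: the names `format_question` and `format_passage` are hypothetical and not part of the model or any library. Only the `Pytanie:` prefix and the `</s>` separator come from the README text.

```python
def format_question(question: str) -> str:
    """Prefix a question with the phrase used during training."""
    return f"Pytanie: {question.strip()}"


def format_passage(text: str, title: str = "") -> str:
    """Join title and text with the `</s>` token.

    When there is no title, the passage still starts with `</s>`,
    as the README recommends.
    """
    return f"{title.strip()}</s>{text.strip()}"


# Example inputs (hypothetical helper usage, real README example data)
question = format_question("W jakim mieście urodził się Zbigniew Herbert?")
passage_with_title = format_passage("polski poeta, eseista i dramaturg.", title="Zbigniew Herbert")
passage_without_title = format_passage("polski poeta, eseista i dramaturg.")
```

The resulting strings can be passed to the encoder in place of hand-written inputs, in the same way as the `sentences` list in the usage examples.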
|
27 |
|
28 |
## Usage (Sentence-Transformers)
|
29 |
|
30 |
Using this model becomes easy when you have [sentence-transformers](https://www.SBERT.net) installed:
|
|
|
38 |
```python
|
39 |
from sentence_transformers import SentenceTransformer
|
40 |
sentences = [
|
41 |
+
"Pytanie: W jakim mieście urodził się Zbigniew Herbert?",
|
42 |
"Zbigniew Herbert</s>Zbigniew Bolesław Ryszard Herbert (ur. 29 października 1924 we Lwowie, zm. 28 lipca 1998 w Warszawie) – polski poeta, eseista i dramaturg.",
|
43 |
]
|
44 |
|
45 |
+
model = SentenceTransformer('ipipan/silver-retriever-base-v1')
|
46 |
embeddings = model.encode(sentences)
|
47 |
print(embeddings)
|
48 |
```
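
The README notes that the model usually works better with cosine distance than with the dot product used in training. The sketch below shows the difference on toy vectors: cosine similarity (one minus cosine distance) ignores vector length, while the dot product does not. The three-dimensional vectors are made-up stand-ins for the 768-dimensional output of `model.encode`, and the function names are hypothetical.

```python
import math
from typing import Sequence


def dot_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of the two vectors after scaling each to unit length."""
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot_product(a, b) / (norm_a * norm_b)


# Toy stand-ins for embeddings (assumption: real ones come from model.encode)
question_vec = [1.0, 2.0, 2.0]
passage_vec = [2.0, 4.0, 4.0]     # same direction as the question, twice as long
unrelated_vec = [2.0, -1.0, 0.0]  # orthogonal to the question

scores = {
    "passage": cosine_similarity(question_vec, passage_vec),
    "unrelated": cosine_similarity(question_vec, unrelated_vec),
}
print(scores)
```

With real embeddings, the same functions can be applied to the rows returned by `model.encode(sentences)` to rank passages for a question.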
|
|
|
61 |
|
62 |
# Sentences we want sentence embeddings for
|
63 |
sentences = [
|
64 |
+
"Pytanie: W jakim mieście urodził się Zbigniew Herbert?",
|
65 |
"Zbigniew Herbert</s>Zbigniew Bolesław Ryszard Herbert (ur. 29 października 1924 we Lwowie, zm. 28 lipca 1998 w Warszawie) – polski poeta, eseista i dramaturg.",
|
66 |
]
|
67 |
# Load model from HuggingFace Hub
|
68 |
+
tokenizer = AutoTokenizer.from_pretrained('ipipan/silver-retriever-base-v1')
|
69 |
+
model = AutoModel.from_pretrained('ipipan/silver-retriever-base-v1')
|
70 |
|
71 |
# Tokenize sentences
|
72 |
encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
|