Update README.md
README.md
@@ -100,26 +100,33 @@ Eprint = {arXiv:2405.20204},
}
```

## FAQ

### I get the following error. What should I do?

```
ValueError: The model class you are passing has a `config_class` attribute that is not consistent with the config class you passed (model has <class 'transformers_modules.jinaai.jina-clip-implementation.7f069e2d54d609ef1ad2eb578c7bf07b5a51de41.configuration_clip.JinaCLIPConfig'> and you passed <class 'transformers_modules.jinaai.jina-clip-implementation.7f069e2d54d609ef1ad2eb578c7bf07b5a51de41.configuration_cli.JinaCLIPConfig'>. Fix one of those so they match!
```

This is caused by a bug in the Transformers library in versions 4.40.x through 4.41.1. To fix it, install `transformers>4.41.2` or `transformers<=4.40.0`.
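To check whether an installed version matches that advice, a small standard-library sketch like the following may help. `parse_version` and `follows_readme_advice` are hypothetical helpers, not part of Transformers; they ignore pre-release suffixes and mirror the bounds above literally, so 4.41.2 itself is not covered by `>4.41.2`.

```python
import re
from importlib import metadata


def parse_version(version_string: str) -> tuple:
    """Return the leading (major, minor, patch) integers, e.g. '4.42.0.dev0' -> (4, 42, 0)."""
    parts = [int(p) for p in re.findall(r"\d+", version_string)[:3]]
    parts += [0] * (3 - len(parts))  # pad short versions such as '4.40' -> (4, 40, 0)
    return tuple(parts)


def follows_readme_advice(version_string: str) -> bool:
    """True if the version satisfies the advice above: >4.41.2 or <=4.40.0."""
    v = parse_version(version_string)
    return v > (4, 41, 2) or v <= (4, 40, 0)


try:
    # Reads installed package metadata without importing transformers itself.
    installed = metadata.version("transformers")
    print(installed, follows_readme_advice(installed))
except metadata.PackageNotFoundError:
    print("transformers is not installed")
```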

### Given a query, how can I merge its text-text and text-image cosine similarities?

Our empirical study shows that text-text cosine similarity is usually larger than text-image cosine similarity, so the two scores are not directly comparable.
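The two merging strategies listed below can be sketched end to end as follows. This is a minimal sketch, assuming the query, document-text, and image embeddings are plain NumPy vectors in the same embedding space; the helper names (`cosine_sim`, `weighted_merge`, `zscore_merge`) and the random embeddings are hypothetical stand-ins, not part of the jina-clip API, and the z-score variant assumes equal weights after normalization.

```python
import numpy as np


def cosine_sim(query: np.ndarray, candidates: np.ndarray) -> np.ndarray:
    """Cosine similarity between one query vector (d,) and each row of candidates (n, d)."""
    query = query / np.linalg.norm(query)
    candidates = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    return candidates @ query


def weighted_merge(text_scores: np.ndarray, image_scores: np.ndarray, lambda_: float = 2.0) -> np.ndarray:
    """Strategy 1: scale the (typically smaller) text-image scores by lambda_, then add."""
    return text_scores + lambda_ * image_scores


def zscore_merge(text_scores: np.ndarray, image_scores: np.ndarray) -> np.ndarray:
    """Strategy 2: z-score normalize each score array, then add them with equal weights.

    Assumes each array has a nonzero standard deviation.
    """
    text_z = (text_scores - text_scores.mean()) / text_scores.std()
    image_z = (image_scores - image_scores.mean()) / image_scores.std()
    return text_z + image_z


# Hypothetical embeddings; in practice these come from the model's text and image encoders.
rng = np.random.default_rng(0)
query_emb = rng.normal(size=8)
doc_text_embs = rng.normal(size=(5, 8))
doc_image_embs = rng.normal(size=(5, 8))

text_scores = cosine_sim(query_emb, doc_text_embs)
image_scores = cosine_sim(query_emb, doc_image_embs)
print(weighted_merge(text_scores, image_scores))
print(zscore_merge(text_scores, image_scores))
```

Z-score normalization puts both score arrays on a common scale (zero mean, unit variance per query), which removes the systematic gap between text-text and text-image scores without tuning a weight.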

If you want to merge the two scores, we recommend two ways:

1. A weighted sum of the text-text and text-image similarities:

```python
lambda_ = 2  # `lambda` is a reserved word in Python; the optimal value depends on your dataset, but 2 is generally a good choice
combined_scores = sim(text, text) + lambda_ * sim(text, image)
```

2. Apply z-score normalization before merging the scores:

```python
# pseudo code: cos_sim_text_texts and cos_sim_text_images are assumed to be NumPy arrays of
# the query's cosine similarities to the candidate texts and images, respectively
import numpy as np
query_document_mean = np.mean(cos_sim_text_texts)
query_document_std = np.std(cos_sim_text_texts)
text_image_mean = np.mean(cos_sim_text_images)
text_image_std = np.std(cos_sim_text_images)