sberbank-ai committed · Commit 82bc2d2 · 1 Parent(s): 21b968e
Update README.md

README.md CHANGED
@@ -16,10 +16,11 @@ RUDOLPH: One Hyper-Modal Transformer can be creative as DALL-E and smart as CLIP
 
 
 Model was trained by [Sber AI](https://github.com/sberbank-ai) and [SberDevices](https://sberdevices.ru/) teams.
-
-*
-*
-*
+
+* Tasks: ` text2image generation, self reranking, text ranking, image ranking, image2text generation, zero-shot image classification, text2text generation`
+* Language: ` Russian`
+* Type: ` decoder`
+* Num Parameters: ` 350M`
 * Training Data Volume: `156 million text-image pairs`
 
 # Model Description
@@ -32,11 +33,11 @@ Model was trained by [Sber AI](https://github.com/sberbank-ai) and [SberDevices]
 
 ### Parameters
 
-<img src=https://raw.githubusercontent.com/ai-forever/ru-dolph/master/pics/
+<img src=https://raw.githubusercontent.com/ai-forever/ru-dolph/master/pics/attention_masks.png height="20" border="2"/>
 
-The maximum sequence length that this model may be used with depends on the modality and stands for
+The maximum sequence length that this model may be used with depends on the modality and stands for 64 - 256 - 64 for the left text tokens, image tokens, and right text tokens, respectively.
 
-RUDOLPH
+RUDOLPH 350M is a Transformer-based decoder model with the following parameters:
 
 * num\_layers (24) — Number of hidden layers in the Transformer decoder.
 * hidden\_size (1024) — Dimensionality of the hidden layers.
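Taken together, the values this commit adds pin down the model's shape: a 24-layer decoder with hidden size 1024 and a per-modality sequence budget of 64 left text, 256 image, and 64 right text tokens. Below is a minimal Python sketch of that arithmetic; the `RudolphSpecSketch` class and its field names are illustrative assumptions, not the `rudolph` package's actual API.

```python
from dataclasses import dataclass


@dataclass
class RudolphSpecSketch:
    """Hypothetical container for the values quoted in the README diff."""

    num_layers: int = 24          # hidden layers in the Transformer decoder
    hidden_size: int = 1024       # dimensionality of the hidden layers
    l_text_seq_length: int = 64   # left text tokens
    image_seq_length: int = 256   # image tokens
    r_text_seq_length: int = 64   # right text tokens

    @property
    def total_seq_length(self) -> int:
        # One hyper-modal sequence: left text + image + right text
        return (
            self.l_text_seq_length
            + self.image_seq_length
            + self.r_text_seq_length
        )


spec = RudolphSpecSketch()
assert spec.total_seq_length == 384  # 64 + 256 + 64
```

Keeping the three budgets as separate fields mirrors the README's point that the usable maximum length depends on the modality rather than on a single flat limit.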