Update README.md
README.md
CHANGED
@@ -234,9 +234,9 @@ Galician, Asturian, Aragonese and Aranese. It amounts to 3,157,965,012 parallel
 
 This highly multilingual corpus is predominantly composed of data sourced from OPUS, with additional data taken from the NTEU project and Project Aina’s existing corpora.
 Where little parallel Catalan <-> xx data could be found, synthetic Catalan data was generated from the Spanish side of the collected Spanish <-> xx corpora using
-Projecte Aina’s Spanish-Catalan model](https://huggingface.co/projecte-aina/aina-translator-es-ca). The final distribution of languages was as below:
+[Projecte Aina’s Spanish-Catalan model](https://huggingface.co/projecte-aina/aina-translator-es-ca). The final distribution of languages was as below:
 
-![](./
+![](./treemap.png)
 
 Click the expand button below to see the full list of corpora included in the training data.
 