Update README.md
README.md
CHANGED
@@ -233,7 +233,7 @@ The training corpus consists of 70 billion tokens of Catalan- and Spanish-centri
 
 This highly multilingual corpus is predominantly composed of data sourced from OPUS, with additional data taken from the NTEU project and Project Aina’s existing corpora. Where little parallel Catalan <-> data could be found, synthetic Catalan data was generated from the Spanish side of the collected Spanish <-> xx corpora using Project Aina’s es-> ca model. (link and correct name). The final distribution of languages was as below:
 
-
+And they you add them as a link without any text. For example: ![](./images/treemap.png)
 
 Click the expand button below to see the full list of corpora included in the training data.
 
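The pivot step described in the README paragraph (taking Spanish <-> xx pairs, translating the Spanish side into Catalan, and keeping the xx side) can be sketched as below. This is a minimal illustration under stated assumptions, not Project Aina's actual pipeline: `synthesize_catalan_pairs` and `translate_es_to_ca` are hypothetical names, the lookup-table "translator" stands in for a real es->ca model, and the Spanish/German sentences are made-up toy data.

```python
from typing import Callable, Iterable, List, Tuple


def synthesize_catalan_pairs(
    es_xx_pairs: Iterable[Tuple[str, str]],
    translate_es_to_ca: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Pivot through Spanish: translate the Spanish side of each (es, xx) pair
    into Catalan and keep the original xx side, yielding synthetic (ca, xx) pairs."""
    synthetic = []
    for es_sentence, xx_sentence in es_xx_pairs:
        ca_sentence = translate_es_to_ca(es_sentence)
        synthetic.append((ca_sentence, xx_sentence))
    return synthetic


# Hypothetical stand-in for an es->ca MT model: a tiny lookup table.
# A real setup would call the translation model here instead.
TOY_ES_CA = {"Hola, mundo.": "Hola, món.", "Buenos días.": "Bon dia."}


def toy_translate(es: str) -> str:
    # Fall back to the input unchanged if the toy table has no entry.
    return TOY_ES_CA.get(es, es)


es_de = [("Hola, mundo.", "Hallo, Welt."), ("Buenos días.", "Guten Morgen.")]
print(synthesize_catalan_pairs(es_de, toy_translate))
# [('Hola, món.', 'Hallo, Welt.'), ('Bon dia.', 'Guten Morgen.')]
```

Passing the translator in as a callable keeps the pairing logic independent of whichever es->ca model is actually used.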