BSC-LT/salamandraTA-2B

AudreyVM committed on Nov 4, 2024

Commit 0710178 · verified · 1 Parent(s): d0041a9

Update README.md

Files changed (1)

README.md +1 -1

@@ -233,7 +233,7 @@ The training corpus consists of 70 billion tokens of Catalan- and Spanish-centri
 This highly multilingual corpus is predominantly composed of data sourced from OPUS, with additional data taken from the NTEU project and Project Aina’s existing corpora. Where little parallel Catalan <-> data could be found, synthetic Catalan data was generated from the Spanish side of the collected Spanish <-> xx corpora using Project Aina’s es-> ca model. (link and correct name). The final distribution of languages was as below:
+And they you add them as a link without any text. For example: ![](./images/treemap.png)
 Click the expand button below to see the full list of corpora included in the training data.