AudreyVM committed on
Commit
0710178
·
verified ·
1 Parent(s): d0041a9

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -233,7 +233,7 @@ The training corpus consists of 70 billion tokens of Catalan- and Spanish-centri
 
 This highly multilingual corpus is predominantly composed of data sourced from OPUS, with additional data taken from the NTEU project and Project Aina’s existing corpora. Where little parallel Catalan <-> data could be found, synthetic Catalan data was generated from the Spanish side of the collected Spanish <-> xx corpora using Project Aina’s es-> ca model. (link and correct name). The final distribution of languages was as below:
 
-
+And they you add them as a link without any text. For example: ![](./images/treemap.png)
 
 Click the expand button below to see the full list of corpora included in the training data.
 