Update README.md
README.md
CHANGED
@@ -247,10 +247,10 @@ Feel free to click the expand button below to see the full list of sources.
 | proof-pile | en | [Link](https://huggingface.co/datasets/hoskinson-center/proof-pile) |
 | RedPajama-Data T1 (StackExchange subset) | en | Computer, 2023 |
 | The Pile (PhilPapers subset) | en | Gao et al., 2021 |
-| Biomedical | es | Internally generated scientific dataset:
+| Biomedical | es | Internally generated scientific dataset: Wikipedia LS, Pubmed, MeSpEn, patents, clinical cases, medical crawler |
 | HPLTDatasets v1 - Spanish | es | de Gibert et al., 2024 |
 | Legal | es | Internally generated legal dataset: BOE, BORME, Senado, Congreso, Spanish court orders, DOGC |
-| Scientific | es | Internally generated scientific dataset:
+| Scientific | es | Internally generated scientific dataset: Dialnet, Scielo, CSIC, TDX, BSC, UCM |
 | Spanish Legal Domain Corpora | es | Gutiérrez-Fandiño et al., 2021 |
 | Estonian National Corpus 2021 | et | Koppel & Kallas, 2022 |
 | Estonian Reference Corpus | et | [Link](https://www.cl.ut.ee/korpused/segakorpus/) |