Update README.md
README.md
CHANGED
@@ -477,7 +477,7 @@ especially if the content originates from less-regulated sources or user-generat
 
 This dataset is constituted by combining several sources, whose acquisition methods can be classified into three groups:
 - Web-sourced datasets with some preprocessing available under permissive license (p.e. Common Crawl).
-- Domain-specific or language-specific raw crawls (p.e. Spanish Crawling).
+- Domain-specific or language-specific raw crawls, always respecting robots.txt (p.e. Spanish Crawling).
 - Manually curated data obtained through collaborators, data providers (by means of legal assignment agreements) or open source projects
 (p.e. CATalog).
 
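The only change in this hunk is the added statement that raw crawls respect robots.txt. The README does not say how that check is implemented, so the following is only a minimal sketch of what such a check could look like, using Python's standard `urllib.robotparser`. The helper names (`build_parser`, `allowed_urls`), the user agent string `example-crawler`, and the sample rules and URLs are all hypothetical and not taken from the project.

```python
from urllib.robotparser import RobotFileParser


def build_parser(robots_txt: str) -> RobotFileParser:
    """Build a parser from the text of a robots.txt file (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed_urls(parser: RobotFileParser, user_agent: str, urls: list[str]) -> list[str]:
    """Keep only the URLs that the robots.txt rules allow for this user agent."""
    return [url for url in urls if parser.can_fetch(user_agent, url)]


# Hypothetical rules: every crawler is barred from /private/.
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

parser = build_parser(EXAMPLE_ROBOTS)
urls = [
    "https://example.org/index.html",
    "https://example.org/private/data.html",
]
print(allowed_urls(parser, "example-crawler", urls))
```

In a real crawl the rules would be fetched per host (for instance with `RobotFileParser.set_url` followed by `read`) before any page on that host is requested; the sketch parses an inline string so it runs offline.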