jpalomar committed on
Commit 369f4ed · verified · 1 Parent(s): 77ddccb

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -477,7 +477,7 @@ especially if the content originates from less-regulated sources or user-generat
 
 This dataset is constituted by combining several sources, whose acquisition methods can be classified into three groups:
 - Web-sourced datasets with some preprocessing available under permissive license (p.e. Common Crawl).
-- Domain-specific or language-specific raw crawls (p.e. Spanish Crawling).
+- Domain-specific or language-specific raw crawls, always respecting robots.txt (p.e. Spanish Crawling).
 - Manually curated data obtained through collaborators, data providers (by means of legal assignment agreements) or open source projects
 (p.e. CATalog).
 
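
The clause added in this commit, "always respecting robots.txt", can be illustrated with a small sketch. The README does not describe how the crawls enforce this, so the following is only one possible approach, using Python's standard `urllib.robotparser`. The helper names (`build_parser`, `allowed_urls`), the user agent string `example-crawler`, the sample rules, and the `example.org` URLs are all hypothetical and chosen for illustration.

```python
from urllib.robotparser import RobotFileParser


def build_parser(robots_txt: str) -> RobotFileParser:
    """Build a parser from the text of a robots.txt file (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed_urls(parser: RobotFileParser, user_agent: str, urls: list[str]) -> list[str]:
    """Keep only the URLs that the robots.txt rules allow this user agent to fetch."""
    return [url for url in urls if parser.can_fetch(user_agent, url)]


# Hypothetical robots.txt content; a real crawler would download it from the target site.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

parser = build_parser(SAMPLE_ROBOTS)
candidates = [
    "https://example.org/articles/1",
    "https://example.org/private/notes",
]
kept = allowed_urls(parser, "example-crawler", candidates)
print(kept)
```

In a real crawl, `RobotFileParser.set_url()` followed by `read()` would fetch the live robots.txt for each host before any page is requested; the in-memory `parse()` call is used here only to keep the sketch self-contained.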