Data exploration and filtering with Nomic Atlas By visheratin • Mar 22, 2024 • 4
Introducing Idefics2: A Powerful 8B Vision-Language Model for the community Apr 15, 2024 • 170
Docmatix - a huge dataset for Document Visual Question Answering Jul 18, 2024 • 72
Cosmopedia: how to create large-scale synthetic data for pre-training Large Language Models Mar 20, 2024 • 71
Ethics and Society Newsletter #6: Building Better AI: The Importance of Data Quality Jun 24, 2024 • 33
Experimenting with Automatic PII Detection on the Hub using Presidio Jul 10, 2024 • 24
How to directly access 150k+ Hugging Face Datasets with DuckDB and query using GPT-4o By chilijung • May 31, 2024 • 11
Synthetic dataset generation techniques: generating custom sentence similarity data By davanstrien • May 23, 2024 • 16
Synthetic data: save money, time and carbon with open source Feb 16, 2024 • 54