Surveying the Effects of Quality, Diversity, and Complexity in Synthetic Data From Large Language Models Paper • 2412.02980 • Published Dec 4, 2024
A Comparative Study on Generative Models for High Resolution Solar Observation Imaging Paper • 2304.07169 • Published Apr 14, 2023
DataComp: In search of the next generation of multimodal datasets Paper • 2304.14108 • Published Apr 27, 2023
LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs Paper • 2111.02114 • Published Nov 3, 2021
LAION-5B: An open large-scale dataset for training next generation image-text models Paper • 2210.08402 • Published Oct 16, 2022
Reproducible scaling laws for contrastive language-image learning Paper • 2212.07143 • Published Dec 14, 2022
Alice in Wonderland: Simple Tasks Showing Complete Reasoning Breakdown in State-Of-the-Art Large Language Models Paper • 2406.02061 • Published Jun 4, 2024
DataComp-LM: In search of the next generation of training sets for language models Paper • 2406.11794 • Published Jun 17, 2024