kaizuberbuehler's Collections: Datasets
Getting it Right: Improving Spatial Consistency in Text-to-Image Models
Paper • arXiv 2404.01197 • 30 upvotes
CosmicMan: A Text-to-Image Foundation Model for Humans
Paper • arXiv 2404.01294 • 15 upvotes
mOSCAR: A Large-scale Multilingual and Multimodal Document-level Corpus
Paper • arXiv 2406.08707 • 15 upvotes
DataComp-LM: In search of the next generation of training sets for language models
Paper • arXiv 2406.11794 • 50 upvotes
XLand-100B: A Large-Scale Multi-Task Dataset for In-Context Reinforcement Learning
Paper • arXiv 2406.08973 • 87 upvotes
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
Paper • arXiv 2406.08418 • 29 upvotes
GUI Odyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices
Paper • arXiv 2406.08451 • 24 upvotes
argilla/magpie-ultra-v0.1
Dataset • 50k rows • 283 downloads • 218 likes
Dataset • 48.6B rows • 186k downloads • 1.81k likes
Dataset • 61.6M rows • 50.6k downloads • 671 likes
Dataset • 31.1M rows • 23.6k downloads • 570 likes
Dataset • 546M rows • 5.52k downloads • 757 likes
Dataset • 1M rows • 2.02k downloads • 693 likes
Dataset • 2.14M rows • 13.7k downloads • 591 likes
Dataset • 55.1k rows • 63 downloads • 92 likes
HuggingFaceFW/fineweb-edu
Dataset • 3.24B rows • 211k downloads • 585 likes
Dataset • 1.75M rows • 322 downloads • 81 likes
Dataset • 100k rows • 11.1k downloads • 140 likes
InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning
Paper • arXiv 2409.12568 • 48 upvotes
RedPajama: an Open Dataset for Training Large Language Models
Paper • arXiv 2411.12372 • 48 upvotes
BLIP3-KALE: Knowledge Augmented Large-Scale Dense Captions
Paper • arXiv 2411.07461 • 22 upvotes
OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models
Paper • arXiv 2411.04905 • 113 upvotes