igormolybog's Collections
Datasets
Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models
Paper • 2311.06783
To See is to Believe: Prompting GPT-4V for Better Visual Instruction Tuning
Paper • 2311.07574
Let's Go Shopping (LGS) -- Web-Scale Image-Text Dataset for Visual Concept Understanding
Paper • 2401.04575
Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research
Paper • 2402.00159
Aya Dataset: An Open-Access Collection for Multilingual Instruction Tuning
Paper • 2402.06619
AutoMathText: Autonomous Data Selection with Language Models for Mathematical Texts
Paper • 2402.07625
OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset
Paper • 2402.10176
StarCoder 2 and The Stack v2: The Next Generation
Paper • 2402.19173
WildChat: 1M ChatGPT Interaction Logs in the Wild
Paper • 2405.01470
NeMo-Aligner: Scalable Toolkit for Efficient Model Alignment
Paper • 2405.01481