Skier8402's Collections
Datasets
Updated 5 days ago
Interesting datasets to help train LLMs and beyond.
Open-Orca/OpenOrca • Updated Oct 21, 2023 • 2.91M rows • 8.21k downloads • 1.36k likes
NeelNanda/pile-10k • Updated Oct 14, 2022 • 10k rows • 3.47k downloads • 17 likes
legacy-datasets/mc4 • Updated Mar 5, 2024 • 8.27k downloads • 151 likes
oscar-corpus/oscar • Updated Mar 21, 2024 • 16.1k downloads • 181 likes
deepset/prompt-injections • Updated Jul 30, 2024 • 662 rows • 724 downloads • 49 likes
epfl-llm/guidelines • Updated Mar 7, 2024 • 38k rows • 666 downloads • 115 likes
wanng/midjourney-v5-202304-clean • Updated May 24, 2024 • 1.7M rows • 140 downloads • 86 likes
CohereForAI/aya_dataset • Updated Jun 28, 2024 • 206k rows • 1.6k downloads • 286 likes
google/fleurs • Updated Aug 25, 2024 • 15.8k downloads • 265 likes
HuggingFaceTB/cosmopedia • Updated Aug 12, 2024 • 31.1M rows • 22.8k downloads • 573 likes
microsoft/orca-math-word-problems-200k • Updated Mar 4, 2024 • 200k rows • 934 downloads • 427 likes
HuggingFaceFW/fineweb • Updated 16 days ago • 48.6B rows • 277k downloads • 1.83k likes
proj-persona/PersonaHub • Updated Oct 5, 2024 • 375k rows • 2.35k downloads • 490 likes
nyu-visionx/Cambrian-10M • Updated Jul 8, 2024 • 6.39k downloads • 106 likes
BAAI/Infinity-Instruct • Updated 3 days ago • 20.4M rows • 5.24k downloads • 583 likes
NousResearch/hermes-function-calling-v1 • Updated Aug 30, 2024 • 11.6k rows • 773 downloads • 233 likes
meta-llama/Llama-3.1-405B-Instruct (model, Text Generation) • Updated Sep 25, 2024 • 20.1k downloads • 562 likes
OpenAssistant/oasst2 • Updated Jan 11, 2024 • 135k rows • 1.24k downloads • 220 likes
OpenAssistant/oasst1 • Updated May 2, 2023 • 88.8k rows • 2.75k downloads • 1.29k likes
HuggingFaceTB/smoltalk • Updated Nov 26, 2024 • 2.2M rows • 6.31k downloads • 284 likes
NovaSky-AI/Sky-T1_data_17k • Updated 5 days ago • 16.4k rows • 1.74k downloads • 124 likes