Directly usable outputs from the pipeline
dataproc5
classroom
Organization Card
What is this?
A data processing pipeline that uses Hugging Face datasets as its intermediate data store.
Metadata is designed to be updated like a DAG, with some pieces depending on others.
Workflows are being built gradually, and we may one day see hundreds of data repos.
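To make the DAG idea concrete, here is a minimal sketch of how dependent data repos could be ordered for updates. It is not the pipeline's actual code: the repo names, the dependency edges, and the helpers `update_order` and `stale_after` are all hypothetical, chosen only to illustrate the pattern with Python's standard `graphlib` module.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each key is a data repo, each value is the set
# of repos it is derived from. Names and edges are illustrative assumptions,
# not the real layout of this organization's repos.
DEPENDS_ON = {
    "raw-metadata": set(),
    "tag-counts": {"raw-metadata"},
    "row-priorities": {"raw-metadata", "tag-counts"},
    "balanced-sample": {"row-priorities"},
}


def update_order(depends_on):
    """Return repos in an order where every repo comes after its inputs."""
    return list(TopologicalSorter(depends_on).static_order())


def stale_after(changed, depends_on):
    """Return every repo that is derived, directly or indirectly, from `changed`."""
    stale = set()
    frontier = [changed]
    while frontier:
        current = frontier.pop()
        for repo, inputs in depends_on.items():
            if current in inputs and repo not in stale:
                stale.add(repo)
                frontier.append(repo)
    return stale


if __name__ == "__main__":
    print(update_order(DEPENDS_ON))
    print(sorted(stale_after("tag-counts", DEPENDS_ON)))
```

Under these assumptions, updating one repo only requires rebuilding the repos returned by `stale_after`, in the order given by `update_order`.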
How do I use it?
A tool for loading files from local storage, Hugging Face, and S3 is in development.
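Since that tool is not published yet, the following is only a sketch of one way a loader might tell the three storage backends apart. The function name `resolve_location`, its return shape, and the example file paths are assumptions; the `hf://datasets/<org>/<repo>/<path>` form follows the path convention used by `huggingface_hub`'s file system interface.

```python
def resolve_location(uri):
    """Split a location string into (backend, container, path).

    Hypothetical helper: the real tool's interface is not documented here.
    """
    if "://" not in uri:
        # No scheme: treat the string as a path on the local file system.
        return ("local", "", uri)

    scheme, rest = uri.split("://", 1)

    if scheme == "s3":
        # s3://<bucket>/<key>
        bucket, _, key = rest.partition("/")
        return ("s3", bucket, key)

    if scheme == "hf":
        # hf://datasets/<org>/<repo>/<path inside the repo>
        parts = rest.split("/", 3)
        if len(parts) < 3 or parts[0] != "datasets":
            raise ValueError(f"expected hf://datasets/<org>/<repo>/..., got {uri!r}")
        repo_id = parts[1] + "/" + parts[2]
        path = parts[3] if len(parts) > 3 else ""
        return ("huggingface", repo_id, path)

    raise ValueError(f"unsupported scheme {scheme!r} in {uri!r}")


if __name__ == "__main__":
    # The file paths below are made up for illustration.
    for location in (
        "data/part-0.parquet",
        "s3://my-bucket/danbooru/part-0.parquet",
        "hf://datasets/dataproc5/test-danbooru2025-tag-balanced-2k/data/train.parquet",
    ):
        print(resolve_location(location))
```

A real loader would then hand each tuple to the matching client. For the Hugging Face repos alone, the `datasets` library's `load_dataset` function accepts a repo id such as `dataproc5/test-danbooru2025-tag-balanced-2k` directly.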
datasets (9)
dataproc5/metrics-danbooru2025-alltime-tag-counts • 687k • 10
dataproc5/test-danbooru2025-tag-balanced-210k • 214k • 1
dataproc5/test-danbooru2025-tag-balanced-60k • 63.1k • 1
dataproc5/test-danbooru2025-tag-balanced-10k • 10k • 2
dataproc5/test-danbooru2025-tag-balanced-2k • 2k • 14
dataproc5/tmp-danbooru2025-balancing-tags • 8.62M • 1
dataproc5/tmp-danbooru2025-row-priorities • 8.62M • 1
dataproc5/metrics-danbooru2025-monthly-tag-counts • 5.9M • 15
dataproc5/test-danbooru-meta-mod • 200k • 2