Josh Bauer
josh-sematic
AI & ML interests: None yet
Recent Activity
New activity 1 day ago in airtrain-ai/fineweb-edu-fortified: Difference with fineweb-edu-dedup (smol-lm corpus)
Liked a Space about 1 month ago: m-ric/package-download-history
josh-sematic's activity
Difference with fineweb-edu-dedup (smol-lm corpus) (2): #106 opened 2 days ago by ferjorosa
No "\n\n" in the dataset?! (1): #104 opened 4 months ago by ymh233
Deduped version of fineweb on HuggingFace yields "This dataset has 218 files that have been marked as unsafe." (1): #103 opened 5 months ago by egor-pakhomov
CC-MAIN-2024-10: #102 opened 5 months ago by josh-sematic
CC-MAIN-2023-40: #100 opened 5 months ago by josh-sematic
CC-MAIN-2023-50: #101 opened 5 months ago by josh-sematic
CC-MAIN-2023-23: #99 opened 5 months ago by josh-sematic
CC-MAIN-2021-10: #80 opened 5 months ago by josh-sematic
CC-MAIN-2020-45: #77 opened 5 months ago by josh-sematic
CC-MAIN-2020-40: #76 opened 5 months ago by josh-sematic
CC-MAIN-2019-35: #75 opened 5 months ago by josh-sematic
CC-MAIN-2020-34: #74 opened 5 months ago by josh-sematic
CC-MAIN-2019-30: #73 opened 5 months ago by josh-sematic
CC-MAIN-2020-29: #72 opened 5 months ago by josh-sematic
CC-MAIN-2023-14: #98 opened 5 months ago by josh-sematic
CC-MAIN-2019-26: #71 opened 5 months ago by josh-sematic
CC-MAIN-2020-24: #70 opened 5 months ago by josh-sematic
CC-MAIN-2019-22: #69 opened 5 months ago by josh-sematic
CC-MAIN-2020-16: #68 opened 5 months ago by josh-sematic