Am I missing something, or is there still no way to filter by model size when searching for models? The feature has been requested since 2022, but I haven't seen any updates since! With so many different models coming out, I think a size filter would be a great extension of the search functionality, especially when looking for smaller models, which are a lot less prevalent.
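In the meantime, a client-side workaround is possible. Below is a minimal sketch, not an official feature: `filter_by_size` and `fetch_sizes` are hypothetical helper names, and the sketch assumes that a model's parameter count can be read from the safetensors metadata that `huggingface_hub` returns from `model_info` (models without that metadata are skipped, and attribute names may differ across library versions).

```python
from typing import Iterable, Iterator, Optional, Tuple


def filter_by_size(
    models: Iterable[Tuple[str, Optional[int]]],
    max_params: int,
) -> Iterator[str]:
    """Yield model IDs whose known parameter count is at most max_params."""
    for model_id, n_params in models:
        # Skip models with an unknown size rather than guessing.
        if n_params is not None and n_params <= max_params:
            yield model_id


def fetch_sizes(search: str, limit: int = 20):
    """Yield (model_id, parameter_count or None) pairs from the Hub (needs network)."""
    from huggingface_hub import HfApi  # imported lazily; optional dependency

    api = HfApi()
    for model in api.list_models(search=search, limit=limit):
        info = api.model_info(model.id)
        # Assumption: the size is only known when safetensors metadata exists.
        st = getattr(info, "safetensors", None)
        yield model.id, (st.total if st is not None else None)
```

For example, `list(filter_by_size(fetch_sizes("llama"), 3_000_000_000))` would keep only the search hits that report at most 3B parameters. It makes one request per model, so it is slow for large result sets, which is exactly why a built-in filter would be nicer.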
Continuing my streak by releasing the Wikireading dataset: a large collection of scraped non-fiction books, predominantly in Russian. its5Q/wikireading
Here are the highlights:
- ~7B tokens, or ~28B characters, making it a great candidate for use in pretraining
- Contains non-fiction works from many knowledge domains
- Includes both the original HTML and extracted text of book chapters
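As a quick sanity check on those figures, and a way to look at a record without downloading everything, here is a small sketch. The `peek` helper is a hypothetical name, and the `"train"` split is an assumption that is not confirmed by the post; the function needs the `datasets` library and network access.

```python
# Figures quoted in the post (both approximate).
TOKENS = 7_000_000_000
CHARACTERS = 28_000_000_000

chars_per_token = CHARACTERS / TOKENS
print(f"~{chars_per_token:.1f} characters per token")


def peek(repo_id: str = "its5Q/wikireading", n: int = 1):
    """Return the first n records via streaming (needs network)."""
    from itertools import islice

    from datasets import load_dataset  # imported lazily; optional dependency

    # Assumption: the dataset exposes a "train" split.
    stream = load_dataset(repo_id, split="train", streaming=True)
    return list(islice(stream, n))
```

The ratio works out to roughly 4 characters per token for the quoted numbers; the actual value depends on which tokenizer was used for the count.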
Just crossed 200,000 free public AI datasets shared by the community on Hugging Face! Text, image, video, audio, time-series & many more... Thanks everyone!
I've made public a dataset of scraped Teletype articles.
Here's the overview:
- 3.3 million articles, predominantly in Russian and English
- Includes original HTML, extracted text and metadata
- All articles were run through language identification
- Includes all public articles up until April 2024