mesolitica's Collections
Malaysian pretraining dataset
Updated 17 days ago
Datasets for pretraining or continued pretraining, intended to induce Malaysian locality in a model; up to 200B tokens gathered.
mesolitica/fineweb-filter-malaysian-context • Updated Aug 13, 2024 • 98.7M • 1.13k
mesolitica/smollm-corpus-filter-malaysian-context • Updated Aug 11, 2024 • 29