Article 🐺🐦‍⬛ LLM Comparison/Test: DeepSeek-V3, QVQ-72B-Preview, Falcon3 10B, Llama 3.3 70B, Nemotron 70B in my updated MMLU-Pro CS benchmark By wolfram • 1 day ago • 24
Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey Paper • 2412.18619 • Published 19 days ago • 44
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference Paper • 2412.13663 • Published 17 days ago • 116
No More Adam: Learning Rate Scaling at Initialization is All You Need Paper • 2412.11768 • Published 19 days ago • 41
Article 🐺🐦‍⬛ LLM Comparison/Test: 25 SOTA LLMs (including QwQ) through 59 MMLU-Pro CS benchmark runs By wolfram • about 1 month ago • 75
Article Releasing the largest multilingual open pretraining dataset By Pclanglais • Nov 13, 2024 • 98
Article Let's grow some Domain Specific Datasets together By burtenshaw • Apr 29, 2024 • 29
Article RAG Empowerment: Cohere C4AI Command-R and Transformers Unveiled By Andyrasika • Apr 7, 2024 • 10