Article 🐺🐦‍⬛ LLM Comparison/Test: DeepSeek-V3, QVQ-72B-Preview, Falcon3 10B, Llama 3.3 70B, Nemotron 70B in my updated MMLU-Pro CS benchmark By wolfram • 1 day ago • 24
Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment Paper • 2412.19326 • Published 8 days ago • 17
PowerInfer/SmallThinker-3B-Preview Text Generation • Updated about 13 hours ago • 1.44k • 196
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search Paper • 2412.18319 • Published 11 days ago • 33
ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing Paper • 2412.14711 • Published 16 days ago • 14
MixLLM: LLM Quantization with Global Mixed-precision between Output-features and Highly-efficient System Design Paper • 2412.14590 • Published 16 days ago • 13
Proposer-Agent-Evaluator (PAE): Autonomous Skill Discovery For Foundation Model Internet Agents Paper • 2412.13194 • Published 17 days ago • 12
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters Paper • 2408.03314 • Published Aug 6, 2024 • 53
Smaller Language Models Are Better Instruction Evolvers Paper • 2412.11231 • Published 20 days ago • 26