Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM's Reasoning Capability Paper • 2411.19943 • Published Nov 29, 2024 • 57
OCR Hinders RAG: Evaluating the Cascading Impact of OCR on Retrieval-Augmented Generation Paper • 2412.02592 • Published Dec 3, 2024 • 21
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking Paper • 2501.04519 • Published 11 days ago • 232
LLM4SR: A Survey on Large Language Models for Scientific Research Paper • 2501.04306 • Published 11 days ago • 33
Agent Laboratory: Using LLM Agents as Research Assistants Paper • 2501.04227 • Published 12 days ago • 77
LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token Paper • 2501.03895 • Published 12 days ago • 48
Personalized Graph-Based Retrieval for Large Language Models Paper • 2501.02157 • Published 16 days ago • 28
OneKE: A Dockerized Schema-Guided LLM Agent-based Knowledge Extraction System Paper • 2412.20005 • Published 22 days ago • 17
SmolLM Collection A series of smol LLMs: 135M, 360M and 1.7B. We release base and Instruct models as well as the training corpus and some WebGPU demos • 12 items • Updated 28 days ago • 209
Article ScreenSpot-Pro: GUI Grounding for Professional High-Resolution Computer Use By Ziyang • 16 days ago • 12
CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings Paper • 2501.01257 • Published 17 days ago • 47
Article Introducing Observers: AI Observability with Hugging Face datasets through a lightweight SDK By davidberenstein1957 • Nov 21, 2024 • 34
Article LLM Comparison/Test: DeepSeek-V3, QVQ-72B-Preview, Falcon3 10B, Llama 3.3 70B, Nemotron 70B in my updated MMLU-Pro CS benchmark By wolfram • 17 days ago • 38
Executable Code Actions Elicit Better LLM Agents Paper • 2402.01030 • Published Feb 1, 2024 • 44
Open LLM Leaderboard best models Collection A daily uploaded list of models with best evaluations on the LLM leaderboard: • 65 items • Updated 35 minutes ago • 508
GTE models Collection General Text Embedding Models Released by Tongyi Lab of Alibaba Group • 19 items • Updated 29 days ago • 19
OmniEval: An Omnidirectional and Automatic RAG Evaluation Benchmark in Financial Domain Paper • 2412.13018 • Published Dec 17, 2024 • 41