-
LLM Pruning and Distillation in Practice: The Minitron Approach
Paper • 2408.11796 • Published • 58 -
TableBench: A Comprehensive and Complex Benchmark for Table Question Answering
Paper • 2408.09174 • Published • 52 -
To Code, or Not To Code? Exploring Impact of Code in Pre-training
Paper • 2408.10914 • Published • 42 -
Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications
Paper • 2408.11878 • Published • 54
Collections
Discover the best community collections!
Collections including paper arxiv:2411.04905
-
The Impact of Hyperparameters on Large Language Model Inference Performance: An Evaluation of vLLM and HuggingFace Pipelines
Paper • 2408.01050 • Published • 8 -
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
Paper • 2408.03314 • Published • 54 -
Towards a Unified View of Preference Learning for Large Language Models: A Survey
Paper • 2409.02795 • Published • 72 -
Paper Copilot: A Self-Evolving and Efficient LLM System for Personalized Academic Assistance
Paper • 2409.04593 • Published • 24
-
Getting it Right: Improving Spatial Consistency in Text-to-Image Models
Paper • 2404.01197 • Published • 31 -
CosmicMan: A Text-to-Image Foundation Model for Humans
Paper • 2404.01294 • Published • 16 -
mOSCAR: A Large-scale Multilingual and Multimodal Document-level Corpus
Paper • 2406.08707 • Published • 15 -
DataComp-LM: In search of the next generation of training sets for language models
Paper • 2406.11794 • Published • 51
-
CodeEditorBench: Evaluating Code Editing Capability of Large Language Models
Paper • 2404.03543 • Published • 16 -
DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence
Paper • 2406.11931 • Published • 59 -
AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents
Paper • 2407.18901 • Published • 33 -
Diversity Empowers Intelligence: Integrating Expertise of Software Engineering Agents
Paper • 2408.07060 • Published • 42
-
Mixture-of-Depths: Dynamically allocating compute in transformer-based language models
Paper • 2404.02258 • Published • 104 -
Textbooks Are All You Need
Paper • 2306.11644 • Published • 143 -
Jamba: A Hybrid Transformer-Mamba Language Model
Paper • 2403.19887 • Published • 106 -
Large Language Models Struggle to Learn Long-Tail Knowledge
Paper • 2211.08411 • Published • 3
-
CodeBERT: A Pre-Trained Model for Programming and Natural Languages
Paper • 2002.08155 • Published • 2 -
OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement
Paper • 2402.14658 • Published • 82 -
CodeFusion: A Pre-trained Diffusion Model for Code Generation
Paper • 2310.17680 • Published • 70 -
CodePlan: Repository-level Coding using LLMs and Planning
Paper • 2309.12499 • Published • 74
-
LoRA+: Efficient Low Rank Adaptation of Large Models
Paper • 2402.12354 • Published • 6 -
The FinBen: An Holistic Financial Benchmark for Large Language Models
Paper • 2402.12659 • Published • 21 -
TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue Summarization
Paper • 2402.13249 • Published • 12 -
TrustLLM: Trustworthiness in Large Language Models
Paper • 2401.05561 • Published • 66
-
Self-Rewarding Language Models
Paper • 2401.10020 • Published • 146 -
Orion-14B: Open-source Multilingual Large Language Models
Paper • 2401.12246 • Published • 12 -
MambaByte: Token-free Selective State Space Model
Paper • 2401.13660 • Published • 53 -
MM-LLMs: Recent Advances in MultiModal Large Language Models
Paper • 2401.13601 • Published • 45
-
DocGraphLM: Documental Graph Language Model for Information Extraction
Paper • 2401.02823 • Published • 35 -
Understanding LLMs: A Comprehensive Overview from Training to Inference
Paper • 2401.02038 • Published • 62 -
DocLLM: A layout-aware generative language model for multimodal document understanding
Paper • 2401.00908 • Published • 181 -
Attention Where It Matters: Rethinking Visual Document Understanding with Selective Region Concentration
Paper • 2309.01131 • Published • 1