- Chain-of-Thought Reasoning Without Prompting
  Paper • 2402.10200 • Published • 104
- Large Language Models Cannot Self-Correct Reasoning Yet
  Paper • 2310.01798 • Published • 33
- Premise Order Matters in Reasoning with Large Language Models
  Paper • 2402.08939 • Published • 27
- Chain of Thought Empowers Transformers to Solve Inherently Serial Problems
  Paper • 2402.12875 • Published • 13
Collections including paper arxiv:2402.12875

- Let's Verify Step by Step
  Paper • 2305.20050 • Published • 10
- LLM Critics Help Catch LLM Bugs
  Paper • 2407.00215 • Published
- Large Language Monkeys: Scaling Inference Compute with Repeated Sampling
  Paper • 2407.21787 • Published • 12
- Generative Verifiers: Reward Modeling as Next-Token Prediction
  Paper • 2408.15240 • Published • 13

- Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers
  Paper • 2408.06195 • Published • 63
- Training Language Models to Self-Correct via Reinforcement Learning
  Paper • 2409.12917 • Published • 136
- Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
  Paper • 2408.03314 • Published • 53
- Self-Reflection in LLM Agents: Effects on Problem-Solving Performance
  Paper • 2405.06682 • Published • 3

- LLMs + Persona-Plug = Personalized LLMs
  Paper • 2409.11901 • Published • 31
- To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning
  Paper • 2409.12183 • Published • 36
- Chain of Thought Empowers Transformers to Solve Inherently Serial Problems
  Paper • 2402.12875 • Published • 13
- TPI-LLM: Serving 70B-scale LLMs Efficiently on Low-resource Edge Devices
  Paper • 2410.00531 • Published • 30

- Contrastive Decoding Improves Reasoning in Large Language Models
  Paper • 2309.09117 • Published • 37
- Chain-of-Thought Reasoning Without Prompting
  Paper • 2402.10200 • Published • 104
- MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?
  Paper • 2403.14624 • Published • 51
- Chain of Thought Empowers Transformers to Solve Inherently Serial Problems
  Paper • 2402.12875 • Published • 13

- Measuring the Effects of Data Parallelism on Neural Network Training
  Paper • 1811.03600 • Published • 2
- Adafactor: Adaptive Learning Rates with Sublinear Memory Cost
  Paper • 1804.04235 • Published • 2
- EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
  Paper • 1905.11946 • Published • 3
- Yi: Open Foundation Models by 01.AI
  Paper • 2403.04652 • Published • 62

- Lossless Acceleration for Seq2seq Generation with Aggressive Decoding
  Paper • 2205.10350 • Published • 2
- Blockwise Parallel Decoding for Deep Autoregressive Models
  Paper • 1811.03115 • Published • 2
- Fast Transformer Decoding: One Write-Head is All You Need
  Paper • 1911.02150 • Published • 6
- Sequence-Level Knowledge Distillation
  Paper • 1606.07947 • Published • 2