Giuliano's Collections
LLM Reasoning
STaR: Bootstrapping Reasoning With Reasoning
Paper • 2203.14465 • Published • 8
Let's Verify Step by Step
Paper • 2305.20050 • Published • 10
Training Large Language Models to Reason in a Continuous Latent Space
Paper • 2412.06769 • Published • 66
Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions
Paper • 2411.14405 • Published • 58
Alphazero-like Tree-Search can Guide Large Language Model Decoding and Training
Paper • 2309.17179 • Published • 2
Paper • 2412.15115 • Published • 335
A Comparative Study on Reasoning Patterns of OpenAI's o1 Model
Paper • 2410.13639 • Published • 16
O1 Replication Journey -- Part 2: Surpassing O1-preview through Simple Distillation, Big Progress or Bitter Lesson?
Paper • 2411.16489 • Published • 41
LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning
Paper • 2410.02884 • Published • 53
Tree of Problems: Improving structured problem solving with compositionality
Paper • 2410.06634 • Published • 8
Are Your LLMs Capable of Stable Reasoning?
Paper • 2412.13147 • Published • 91
Large Language Monkeys: Scaling Inference Compute with Repeated Sampling
Paper • 2407.21787 • Published • 12
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
Paper • 2408.03314 • Published • 54
🔍 QwQ-32B-Preview
Offline Reinforcement Learning for LLM Multi-Step Reasoning
Paper • 2412.16145 • Published • 36
The Surprising Effectiveness of Test-Time Training for Abstract Reasoning
Paper • 2411.07279 • Published • 3
Skywork-Reward: Bag of Tricks for Reward Modeling in LLMs
Paper • 2410.18451 • Published • 16
Skywork/Skywork-Reward-Gemma-2-27B-v0.2
Text Classification • Updated • 5.4k • 25
Generative Verifiers: Reward Modeling as Next-Token Prediction
Paper • 2408.15240 • Published • 13
Understanding Hidden Computations in Chain-of-Thought Reasoning
Paper • 2412.04537 • Published
Paper • 2410.12832 • Published • 6
B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
Paper • 2412.17256 • Published • 44
RLEF: Grounding Code LLMs in Execution Feedback with Reinforcement Learning
Paper • 2410.02089 • Published • 12
V-STaR: Training Verifiers for Self-Taught Reasoners
Paper • 2402.06457 • Published • 9
RAG-Star: Enhancing Deliberative Reasoning with Retrieval Augmented Verification and Refinement
Paper • 2412.12881 • Published • 1
Reinforcement Learning Enhanced LLMs: A Survey
Paper • 2412.10400 • Published
Scaling of Search and Learning: A Roadmap to Reproduce o1 from Reinforcement Learning Perspective
Paper • 2412.14135 • Published
SPaR: Self-Play with Tree-Search Refinement to Improve Instruction-Following in Large Language Models
Paper • 2412.11605 • Published • 16
Virgo: A Preliminary Exploration on Reproducing o1-like MLLM
Paper • 2501.01904 • Published • 4
Technical Report: Enhancing LLM Reasoning with Reward-guided Tree Search
Paper • 2411.11694 • Published