LLM - a L-Hongbin Collection

Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering

Paper • 2411.11504 • Published Nov 18, 2024 • 20

Top-nσ: Not All Logits Are You Need

Paper • 2411.07641 • Published Nov 12, 2024 • 20

Adaptive Decoding via Latent Preference Optimization

Paper • 2411.09661 • Published Nov 14, 2024 • 10

When Precision Meets Position: BFloat16 Breaks Down RoPE in Long-Context Training

Paper • 2411.13476 • Published Nov 20, 2024 • 15

O1 Replication Journey -- Part 2: Surpassing O1-preview through Simple Distillation, Big Progress or Bitter Lesson?

Paper • 2411.16489 • Published Nov 25, 2024 • 42

MH-MoE:Multi-Head Mixture-of-Experts

Paper • 2411.16205 • Published Nov 25, 2024 • 24

nGPT: Normalized Transformer with Representation Learning on the Hypersphere

Paper • 2410.01131 • Published Oct 1, 2024 • 9

O1-OPEN/OpenO1-SFT

Viewer • Updated Dec 17, 2024 • 77.7k • 2.19k • 331

AI-MO/NuminaMath-CoT

Viewer • Updated Nov 25, 2024 • 860k • 3.98k • 327

GAIR/o1-journey

Viewer • Updated Oct 16, 2024 • 327 • 429 • 131

allenai/tulu-3-sft-mixture

Viewer • Updated Dec 2, 2024 • 939k • 4.12k • 96

CASIA-LM/ChineseWebText2.0

Viewer • Updated Dec 2, 2024 • 2k • 2.57k • 19

Yi-Lightning Technical Report

Paper • 2412.01253 • Published Dec 2, 2024 • 25

Training Large Language Models to Reason in a Continuous Latent Space

Paper • 2412.06769 • Published Dec 9, 2024 • 74

Weighted-Reward Preference Optimization for Implicit Model Fusion

Paper • 2412.03187 • Published Dec 4, 2024 • 9

Phi-4 Technical Report

Paper • 2412.08905 • Published Dec 12, 2024 • 103

SPaR: Self-Play with Tree-Search Refinement to Improve Instruction-Following in Large Language Models

Paper • 2412.11605 • Published Dec 16, 2024 • 17

Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN

Paper • 2412.13795 • Published Dec 18, 2024 • 19

Qwen2.5 Technical Report

Paper • 2412.15115 • Published about 1 month ago • 340

A Post-Training Enhanced Optimization Approach for Small Language Models

Paper • 2411.02939 • Published Nov 5, 2024

amphora/QwQ-LongCoT-130K

Viewer • Updated 28 days ago • 133k • 1.28k • 135

How to Synthesize Text Data without Model Collapse?

Paper • 2412.14689 • Published Dec 19, 2024 • 48

O1-OPEN/OpenO1-SFT-Ultra

Viewer • Updated Dec 17, 2024 • 28M • 1.15k • 51

RobustFT: Robust Supervised Fine-tuning for Large Language Models under Noisy Response

Paper • 2412.14922 • Published about 1 month ago • 85

DRT-o1: Optimized Deep Reasoning Translation via Long Chain-of-Thought

Paper • 2412.17498 • Published 27 days ago • 21

B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners

Paper • 2412.17256 • Published 27 days ago • 45

OpenRFT: Adapting Reasoning Foundation Model for Domain-specific Tasks with Reinforcement Fine-Tuning

Paper • 2412.16849 • Published 28 days ago • 9

rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking

Paper • 2501.04519 • Published 11 days ago • 232

MiniMax-01: Scaling Foundation Models with Lightning Attention

Paper • 2501.08313 • Published 5 days ago • 259

OpenCSG Chinese Corpus: A Series of High-quality Chinese Datasets for LLM Training

Paper • 2501.08197 • Published 5 days ago • 7