Jiaxin Huang

teapot123

AI & ML interests

None yet

teapot123's activity

upvoted a paper 4 days ago

Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models

Paper • 2501.09686 • Published 6 days ago • 35

upvoted a paper 3 months ago

Taming Overconfidence in LLMs: Reward Calibration in RLHF

Paper • 2410.09724 • Published Oct 13, 2024 • 2

commented on a paper 3 months ago

Taming Overconfidence in LLMs: Reward Calibration in RLHF

Paper • 2410.09724 • Published Oct 13, 2024 • 2

authored 5 papers 3 months ago

Large Language Models Can Self-Improve

Paper • 2210.11610 • Published Oct 20, 2022

Tuning Language Models as Training Data Generators for Augmentation-Enhanced Few-Shot Learning

Paper • 2211.03044 • Published Nov 6, 2022 • 1

Optimizing Language Model's Reasoning Abilities with Weak Supervision

Paper • 2405.04086 • Published May 7, 2024 • 1

Taming Overconfidence in LLMs: Reward Calibration in RLHF

Paper • 2410.09724 • Published Oct 13, 2024 • 2

Divide, Reweight, and Conquer: A Logit Arithmetic Approach for In-Context Learning

Paper • 2410.10074 • Published Oct 14, 2024

updated 6 models 3 months ago

HINT-lab/mistral-7b-hermes-crm-skywork

Updated Oct 17, 2024 • 3

HINT-lab/llama3-8b-crm-final-v0.1

Updated Oct 17, 2024 • 5

HINT-lab/llama3-8b-final-ppo-c-v0.3

Text Generation • Updated Oct 17, 2024 • 7

HINT-lab/mistral-7b-ppo-c-hermes

Text Generation • Updated Oct 17, 2024 • 10

HINT-lab/llama3-8b-final-ppo-m-v0.3

Text Generation • Updated Oct 17, 2024 • 9

HINT-lab/mistral-7b-ppo-m-hermes

Text Generation • Updated Oct 17, 2024 • 7

updated a collection 3 months ago

VLM

10 items • Updated 14 days ago