CoffeeBliss
1 follower · 1 following
AI & ML interests
None yet
Recent Activity
replied to lewtun's post 5 days ago
This paper (https://huggingface.co/papers/2412.18925) has a really interesting recipe for inducing o1-like behaviour in Llama models:
* Iteratively sample CoTs from the model, using a mix of different search strategies. This gives you something like Stream of Search via prompting.
* Verify correctness of each CoT using GPT-4o (needed because exact match doesn't work well in medicine where there are lots of aliases)
* Use GPT-4o to reformat the concatenated CoTs into a single stream that includes smooth transitions like "hmm, wait" etc that one sees in o1
* Use the resulting data for SFT & RL
* Use sparse rewards from GPT-4o to guide RL training.

They find RL gives an average ~3 point boost across medical benchmarks and SFT on this data already gives a strong improvement. Applying this strategy to other domains could be quite promising, provided the training data can be formulated with verifiable problems!
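The recipe in the post can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: `search_cot`, `build_sft_example`, `sparse_reward`, the strategy labels, the 1.0/0.0 reward values, and the `toy_*` stand-ins (which replace the Llama sampler and the GPT-4o verifier and reformatter) are all hypothetical names introduced here.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Trajectory:
    """All chain-of-thought attempts sampled for one question."""
    question: str
    attempts: List[str] = field(default_factory=list)
    solved: bool = False


def search_cot(
    question: str,
    generate: Callable[[str, List[str], str], str],
    verify: Callable[[str, str], bool],
    strategies: List[str],
    max_iters: int = 3,
    seed: int = 0,
) -> Trajectory:
    """Iteratively sample CoTs, picking a search strategy each round,
    and stop as soon as the verifier accepts one."""
    rng = random.Random(seed)
    traj = Trajectory(question)
    for _ in range(max_iters):
        strategy = rng.choice(strategies)
        cot = generate(question, traj.attempts, strategy)
        traj.attempts.append(cot)
        if verify(question, cot):
            traj.solved = True
            break
    return traj


def build_sft_example(
    traj: Trajectory, reformat: Callable[[List[str]], str]
) -> Optional[Dict[str, str]]:
    """Merge the concatenated attempts into one stream; skip unsolved questions."""
    if not traj.solved:
        return None
    return {"prompt": traj.question, "completion": reformat(traj.attempts)}


def sparse_reward(is_correct: bool) -> float:
    """Outcome-only reward for RL. The 1.0/0.0 values are an assumption."""
    return 1.0 if is_correct else 0.0


# Toy stand-ins so the sketch runs without any model or API access.
def toy_generate(question: str, previous: List[str], strategy: str) -> str:
    return "I guess 5" if len(previous) < 2 else "So the answer is 4"


def toy_verify(question: str, cot: str) -> bool:
    return cot.endswith("4")


def toy_reformat(attempts: List[str]) -> str:
    return " Hmm, wait. ".join(attempts)


if __name__ == "__main__":
    traj = search_cot("What is 2 + 2?", toy_generate, toy_verify,
                      ["strategy_a", "strategy_b"])
    print(build_sft_example(traj, toy_reformat))
    print(sparse_reward(traj.solved))
```

In a real pipeline the three callables would wrap model and verifier calls; keeping them as parameters makes the search loop testable in isolation.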
reacted to lewtun's post with 🔥 5 days ago
liked a model 7 days ago
bartowski/HuatuoGPT-o1-8B-GGUF
Organizations
None yet
CoffeeBliss's activity
liked a model 7 days ago
bartowski/HuatuoGPT-o1-8B-GGUF • Text Generation • Updated 7 days ago • 1.32k • 5
liked a model 8 days ago
FreedomIntelligence/HuatuoGPT-o1-8B • Text Generation • Updated 9 days ago • 664 • 24
liked a dataset 12 days ago
yulan-team/YuLan-Mini-Datasets • Updated 9 days ago • 302 • 8
liked a model 12 days ago
yulan-team/YuLan-Mini • Text Generation • Updated 4 days ago • 670 • 27
liked 2 models 3 months ago
meta-llama/Llama-3.2-11B-Vision-Instruct • Image-Text-to-Text • Updated Dec 4, 2024 • 2.65M • 1.2k
meta-llama/Llama-3.2-1B-Instruct • Text Generation • Updated Oct 24, 2024 • 1.02M • 681
liked 2 models 5 months ago
openbmb/MiniCPM-V-2_6 • Image-Text-to-Text • Updated Nov 15, 2024 • 36.9k • 884
openbmb/MiniCPM-V-2_6-gguf • Updated Aug 13, 2024 • 3.31k • 148