CoffeeBliss

AI & ML interests

None yet

Recent Activity

replied to lewtun's post 5 days ago

This paper (https://huggingface.co/papers/2412.18925) has a really interesting recipe for inducing o1-like behaviour in Llama models:

* Iteratively sample CoTs from the model, using a mix of different search strategies. This gives you something like Stream of Search via prompting.
* Verify correctness of each CoT using GPT-4o (needed because exact match doesn't work well in medicine, where there are lots of aliases).
* Use GPT-4o to reformat the concatenated CoTs into a single stream that includes smooth transitions like "hmm, wait" etc. that one sees in o1.
* Use the resulting data for SFT & RL.
* Use sparse rewards from GPT-4o to guide RL training.

They find RL gives an average ~3 point boost across medical benchmarks, and SFT on this data already gives a strong improvement. Applying this strategy to other domains could be quite promising, provided the training data can be formulated with verifiable problems!
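The data-construction part of the recipe in the post can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the paper's implementation: `sample_cot`, `is_correct`, and `reformat` are hypothetical callables standing in for the Llama sampler, the GPT-4o verifier, and the GPT-4o reformatter, and the strategy names in `STRATEGIES` are placeholders, since the post only mentions "a mix of different search strategies".

```python
import random
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Placeholder strategy names (assumption): the post does not list them.
STRATEGIES = ["backtrack", "explore_new_path", "verify", "correct"]


@dataclass
class Trajectory:
    question: str
    answer: str
    attempts: List[str] = field(default_factory=list)  # CoTs in sampling order
    solved: bool = False


def build_trajectory(
    question: str,
    reference_answer: str,
    sample_cot: Callable[[str, List[str], Optional[str]], str],
    is_correct: Callable[[str, str], bool],
    max_attempts: int = 4,
    rng: Optional[random.Random] = None,
) -> Trajectory:
    """Sample CoTs until the verifier accepts one or the budget runs out."""
    rng = rng or random.Random(0)
    traj = Trajectory(question, reference_answer)
    strategy: Optional[str] = None  # first attempt is a plain CoT
    for _ in range(max_attempts):
        cot = sample_cot(question, traj.attempts, strategy)
        traj.attempts.append(cot)
        if is_correct(cot, reference_answer):
            traj.solved = True
            break
        # Failed attempt: pick a search strategy for the next sample.
        strategy = rng.choice(STRATEGIES)
    return traj


def to_sft_example(
    traj: Trajectory, reformat: Callable[[List[str]], str]
) -> Optional[Dict[str, str]]:
    """Merge all attempts into one stream; unsolved trajectories are dropped."""
    if not traj.solved:
        return None
    return {"prompt": traj.question, "completion": reformat(traj.attempts)}
```

In a real setup, `is_correct` would presumably wrap an LLM judge rather than string matching, for the alias reason the post gives, and `reformat` would ask a model to stitch the attempts together with natural transitions. Dropping unsolved trajectories is a choice made for this sketch, not something the post specifies.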

reacted to lewtun's post with 🔥 5 days ago

Organizations

None yet

CoffeeBliss's activity

liked a model 7 days ago

bartowski/HuatuoGPT-o1-8B-GGUF

Text Generation • Updated 7 days ago • 1.32k • 5

liked a model 8 days ago

FreedomIntelligence/HuatuoGPT-o1-8B

Text Generation • Updated 9 days ago • 664 • 24

liked a dataset 12 days ago

yulan-team/YuLan-Mini-Datasets

Updated 9 days ago • 302 • 8

liked a model 12 days ago

yulan-team/YuLan-Mini

Text Generation • Updated 4 days ago • 670 • 27

liked 2 models 3 months ago

meta-llama/Llama-3.2-11B-Vision-Instruct

Image-Text-to-Text • Updated Dec 4, 2024 • 2.65M • 1.2k

meta-llama/Llama-3.2-1B-Instruct

Text Generation • Updated Oct 24, 2024 • 1.02M • 681

liked 2 models 5 months ago

openbmb/MiniCPM-V-2_6

Image-Text-to-Text • Updated Nov 15, 2024 • 36.9k • 884

openbmb/MiniCPM-V-2_6-gguf

Updated Aug 13, 2024 • 3.31k • 148