
Leandro von Werra

lvwerra

AI & ML interests

NLP and RL

Organizations

Hugging Face, Natural Language Processing with Transformers, BigScience Workshop, Spaces-explorers, Hugging Face Course, BigScience Catalogue Data, PubMed Central, BigScience Data, trl internal testing, evaluate, Data Days Zurich, Evaluate Comparison, Evaluate Metric, HuggingFaceM4, Evaluate Measurement, scikit-learn, TRL, CodeParrot, BigCode, CompVis, Hugging Face H4, Hugging Face OSS Metrics, BigBang, transfer-test-target, CompVis Community, Sphere Fall 2022, BigCode Data, Stack Overflow, Reading Group, Hugging Face Extreme-Scale, Need4Speed, Code Llama, Personal Coding Assistant, Hugging Face TB Research, Hugging Face Smol Cluster, Open LLM Leaderboard, gg-hf, Nanotron Research, Hugging Face SMOL, HuggingFaceFW, bigcode nvidia, hsramall, mlo-data-cleaning, HuggingFaceFW-Dev, StarCoder2 Data, Data Agents, CinePile collaboration, Hugging Face FineVideo, smol-explorers, swissai-hf-data, abcd4321, Hugging Face Science, eggs, LeMaterial

lvwerra's activity

reacted to lewtun's post with 🔥 2 days ago
This paper (HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs, 2412.18925) has a really interesting recipe for inducing o1-like behaviour in Llama models:

* Iteratively sample CoTs from the model, using a mix of different search strategies. This gives you something like Stream of Search via prompting.
* Verify the correctness of each CoT using GPT-4o (needed because exact match works poorly in medicine, where many answers have aliases)
* Use GPT-4o to reformat the concatenated CoTs into a single stream with smooth transitions, such as the "hmm, wait" that one sees in o1
* Use the resulting data for SFT & RL
* Use sparse rewards from GPT-4o to guide RL training. They find that SFT on this data already gives a strong improvement, and RL gives an average boost of ~3 points across medical benchmarks.
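The recipe above can be sketched in Python as a data-construction loop plus a binary reward. This is a minimal illustration, not the paper's code: the policy model and the GPT-4o verifier/reformatter are assumed to be passed in as plain functions, and the strategy names, the `build_sft_example` and `sparse_reward` helpers, the alias table, and the toy sampler are all hypothetical stand-ins.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical strategy names for illustration only; the post does not list them.
STRATEGIES = ["backtrack", "explore_new_path", "verify", "correct"]


@dataclass
class Problem:
    question: str
    reference_answer: str


@dataclass
class Attempt:
    strategy: str
    cot: str
    answer: str


SampleFn = Callable[[Problem, List[Attempt], str], Attempt]
JudgeFn = Callable[[str, str], bool]
ReformatFn = Callable[[List[Attempt]], str]


def build_sft_example(problem: Problem, sample_fn: SampleFn, judge_fn: JudgeFn,
                      reformat_fn: ReformatFn, max_iters: int = 3,
                      rng: Optional[random.Random] = None) -> Optional[dict]:
    """Iteratively sample CoTs until the judge accepts one, then merge them."""
    rng = rng or random.Random(0)
    attempts: List[Attempt] = []
    for _ in range(max_iters):
        # First attempt is a plain CoT; later ones apply a randomly chosen strategy.
        strategy = rng.choice(STRATEGIES) if attempts else "initial"
        attempt = sample_fn(problem, attempts, strategy)
        attempts.append(attempt)
        if judge_fn(attempt.answer, problem.reference_answer):
            # Turn the whole search trajectory into one reasoning stream.
            return {"prompt": problem.question, "completion": reformat_fn(attempts)}
    return None  # never verified: drop the problem


def sparse_reward(answer: str, reference: str, judge_fn: JudgeFn) -> float:
    """Binary outcome reward for RL: 1.0 if the judge accepts the answer, else 0.0."""
    return 1.0 if judge_fn(answer, reference) else 0.0


# ---- Toy stand-ins (hypothetical; in the recipe these are the policy model and GPT-4o) ----
ALIASES = {"acetaminophen": {"acetaminophen", "paracetamol", "tylenol"}}


def toy_judge(answer: str, reference: str) -> bool:
    # Alias-aware match, mimicking why exact string match is too strict in medicine.
    a, r = answer.strip().lower(), reference.strip().lower()
    return a in ALIASES.get(r, {r})


def toy_sampler(problem: Problem, history: List[Attempt], strategy: str) -> Attempt:
    # Wrong on the first try, right after one round of search.
    answer = "ibuprofen" if not history else "Paracetamol"
    return Attempt(strategy, f"[{strategy}] I think the answer is {answer}.", answer)


def toy_reformat(attempts: List[Attempt]) -> str:
    # GPT-4o would rewrite the trajectory; here we just splice in o1-style transitions.
    return " Hmm, wait. ".join(a.cot for a in attempts)


if __name__ == "__main__":
    p = Problem("Toy question: which analgesic is abbreviated APAP?", "acetaminophen")
    print(build_sft_example(p, toy_sampler, toy_judge, toy_reformat))
    print(sparse_reward("Tylenol", p.reference_answer, toy_judge))  # 1.0
```

The alias-aware judge is the reason a model (GPT-4o in the paper) is used as verifier: "Paracetamol" and "Tylenol" both count as correct for "acetaminophen", which exact match would reject. Problems that are never verified within the iteration budget are simply dropped from the SFT set.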

Applying this strategy to other domains could be quite promising, provided the training data can be framed as verifiable problems!
liked a Space 10 days ago