Leandro von Werra

lvwerra

https://github.com/lvwerra

AI & ML interests

NLP and RL

Recent Activity

reacted to lewtun's post with 🔥 2 days ago

This paper (https://huggingface.co/papers/2412.18925) has a really interesting recipe for inducing o1-like behaviour in Llama models:

* Iteratively sample CoTs from the model, using a mix of different search strategies. This gives you something like Stream of Search via prompting.
* Verify correctness of each CoT using GPT-4o (needed because exact match doesn't work well in medicine, where there are lots of aliases).
* Use GPT-4o to reformat the concatenated CoTs into a single stream that includes smooth transitions like "hmm, wait" etc. that one sees in o1.
* Use the resulting data for SFT & RL.
* Use sparse rewards from GPT-4o to guide RL training.

They find RL gives an average ~3 point boost across medical benchmarks, and SFT on this data already gives a strong improvement. Applying this strategy to other domains could be quite promising, provided the training data can be formulated with verifiable problems!
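The data-construction loop described in the post can be sketched roughly as follows. This is only an illustration of the control flow, not the paper's code: `sample_cot`, `is_correct`, and `merge` are hypothetical callables standing in for the policy model, the GPT-4o verifier, and the GPT-4o rewriter, and the round-robin choice of strategy, the attempt cap, the decision to drop unsolved questions, and the 0/1 reward values are all assumptions.

```python
from typing import Callable, List, Optional

# Hypothetical interfaces (assumptions, not APIs from the paper):
Sampler = Callable[[str, str, List[str]], str]  # (question, strategy, earlier attempts) -> new CoT
Verifier = Callable[[str, str], bool]           # (CoT, reference answer) -> is it correct?
Rewriter = Callable[[List[str]], str]           # attempts -> one fluent reasoning stream


def build_sft_example(
    question: str,
    reference: str,
    sample_cot: Sampler,
    is_correct: Verifier,
    merge: Rewriter,
    strategies: List[str],
    max_attempts: int = 4,
) -> Optional[dict]:
    """Sample CoTs until one is verified, then merge all attempts into one stream."""
    attempts: List[str] = []
    for i in range(max_attempts):
        # Assumption: cycle through the search strategies in order.
        strategy = strategies[i % len(strategies)]
        cot = sample_cot(question, strategy, attempts)
        attempts.append(cot)
        if is_correct(cot, reference):
            # The rewriter turns failed and successful attempts into a single
            # stream with transitions such as "hmm, wait".
            return {"prompt": question, "completion": merge(attempts)}
    # Assumption: questions with no verified CoT are dropped.
    return None


def sparse_reward(cot: str, reference: str, is_correct: Verifier) -> float:
    """Outcome-only reward for RL: 1.0 if the verifier accepts the CoT, else 0.0."""
    return 1.0 if is_correct(cot, reference) else 0.0
```

Passing the model, verifier, and rewriter in as plain callables keeps the sketch independent of any particular inference or API client, so each piece can be swapped for a stub when testing the loop.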

liked a Space 10 days ago

data-agents/jupyter-agent

updated a Space 13 days ago

data-agents/jupyter-agent

Articles

LeMaterial: an open source initiative to accelerate materials discovery and research

CinePile 2.0 - making stronger datasets with adversarial refinement

FineVideo: behind the scenes

Fine-tuning LLMs to 1.58bit: extreme quantization made easy

A failed experiment: Infini-Attention, and why we should keep trying?

Llama 3.1 - 405B, 70B & 8B with multilinguality and long context

BigCodeBench: Benchmarking Large Language Models on Solving Practical and Challenging Programming Tasks

StarCoder2-Instruct: Fully Transparent and Permissive Self-Alignment for Code Generation

Welcome Llama 3 - Meta's new open LLM

StarCoder2 and The Stack v2

Constitutional AI with Open LLMs

Preference Tuning LLMs with Direct Preference Optimization Methods

Welcome Mixtral - a SOTA Mixture of Experts on Hugging Face

The N Implementation Details of RLHF with PPO

Finetune Stable Diffusion Models with DDPO via TRL

Spread Your Wings: Falcon 180B is here

Code Llama: Llama 2 learns to code

Fine-tune Llama 2 with DPO

The Falcon has landed in the Hugging Face ecosystem

Creating a Coding Assistant with StarCoder

StarCoder: A State-of-the-Art LLM for Code

StackLLaMA: A hands-on guide to train LLaMA with RLHF

Fine-tuning 20B LLMs with RLHF on a 24GB consumer GPU

Illustrating Reinforcement Learning from Human Feedback (RLHF)

Evaluating Language Model Bias with 🤗 Evaluate

Announcing Evaluation on the Hub

lvwerra's activity

upvoted a collection 27 days ago

🤖 Agents

21 items • Updated about 21 hours ago • 61

upvoted a paper 2 months ago

SelfCodeAlign: Self-Alignment for Code Generation

Paper • 2410.24198 • Published Oct 31, 2024 • 23

upvoted an article 3 months ago

Article

FineVideo: behind the scenes

Sep 23, 2024

• 27

upvoted a paper 3 months ago

Qwen2.5-Coder Technical Report

Paper • 2409.12186 • Published Sep 18, 2024 • 138

upvoted a paper 4 months ago

Building and better understanding vision-language models: insights and future directions

Paper • 2408.12637 • Published Aug 22, 2024 • 124

upvoted an article 4 months ago

Article

Tool Use, Unified

Aug 12, 2024

• 69

upvoted 3 articles 5 months ago

Article

A failed experiment: Infini-Attention, and why we should keep trying?

Aug 14, 2024

• 54

Article

XetHub is joining Hugging Face!

Aug 8, 2024

• 81

Article

Llama 3.1 - 405B, 70B & 8B with multilinguality and long context

Jul 23, 2024

• 225

upvoted 3 articles 6 months ago

Article

Docmatix - a huge dataset for Document Visual Question Answering

Jul 18, 2024

• 72

Article

SmolLM - blazingly fast and remarkably powerful

Jul 16, 2024

• 294

Article

Our Transformers Code Agent beats the GAIA benchmark!

Jul 1, 2024

• 49

upvoted 3 papers 6 months ago

Agentless: Demystifying LLM-based Software Engineering Agents

Paper • 2407.01489 • Published Jul 1, 2024 • 42

The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale

Paper • 2406.17557 • Published Jun 25, 2024 • 87

BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions

Paper • 2406.15877 • Published Jun 22, 2024 • 45

upvoted an article 6 months ago

Article

Fine-tuning Florence-2 - Microsoft's Cutting-edge Vision Language Models

Jun 24, 2024

• 180

upvoted an article 7 months ago

Article

Putting RL back in RLHF

Jun 12, 2024

• 66

upvoted a paper 7 months ago

Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations

Paper • 2405.18392 • Published May 28, 2024 • 12

upvoted a collection 7 months ago

Leaderboards and benchmarks ✨

Cool leaderboard spaces collection for models across modalities! Text, vision, audio, ... • 83 items • Updated 15 days ago • 93

upvoted an article 8 months ago

Article

2024-04-22 - Hub Incident Post Mortem

May 17, 2024

• 17