trl internal testing

company

AI & ML interests

Internal testing artifact management for the TRL library

Recent Activity

qgallouedec updated a dataset 18 days ago

trl-internal-testing/example-images

qgallouedec updated a collection about 1 month ago

Tiny models

trl-internal-testing's activity

lewtun

posted an update 5 days ago

This paper (HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs (2412.18925)) has a really interesting recipe for inducing o1-like behaviour in Llama models:

* Iteratively sample CoTs from the model, using a mix of different search strategies. This gives you something like Stream of Search via prompting.
* Verify the correctness of each CoT using GPT-4o (needed because exact match doesn't work well in medicine, where there are lots of aliases).
* Use GPT-4o to reformat the concatenated CoTs into a single stream that includes smooth transitions like "hmm, wait", etc. that one sees in o1.
* Use the resulting data for SFT & RL
* Use sparse rewards from GPT-4o to guide RL training. They find that RL gives an average boost of ~3 points across medical benchmarks, and that SFT on this data alone already gives a strong improvement.
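The data-construction steps above can be sketched roughly as follows. This is a minimal illustration, not the paper's code: `build_sft_example`, `sparse_reward`, the strategy names, and the toy stand-ins are all hypothetical, and in the recipe described above the `verify` and `reformat` callables would be backed by GPT-4o rather than string operations. The binary reward is likewise an assumed simplification.

```python
from typing import Callable, List, Optional


def build_sft_example(
    question: str,
    reference: str,
    sample_cot: Callable[[str, List[str], str], str],
    verify: Callable[[str, str], bool],
    reformat: Callable[[str, List[str]], str],
    strategies: List[str],
    max_attempts: int = 4,
) -> Optional[str]:
    """Sample CoTs until one verifies, then merge all attempts into one trace.

    Returns None when no attempt passes verification.
    """
    attempts: List[str] = []
    for i in range(max_attempts):
        # Cycle through the search strategies (names are illustrative only).
        strategy = strategies[i % len(strategies)]
        cot = sample_cot(question, attempts, strategy)
        attempts.append(cot)
        if verify(cot, reference):
            # Failed attempts are kept so the merged trace shows self-correction.
            return reformat(question, attempts)
    return None


def sparse_reward(cot: str, reference: str,
                  verify: Callable[[str, str], bool]) -> float:
    """Assumed binary reward for RL: 1.0 if the verifier accepts, else 0.0."""
    return 1.0 if verify(cot, reference) else 0.0


# --- Toy stand-ins so the sketch runs without any model or API access ---
def toy_sample(question: str, history: List[str], strategy: str) -> str:
    # Wrong on the first attempt, right once there is something to revise.
    answer = "flu" if not history else "influenza"
    return f"[{strategy}] answer: {answer}"


def toy_verify(cot: str, reference: str) -> bool:
    return cot.endswith(reference)


def toy_reformat(question: str, attempts: List[str]) -> str:
    return " Hmm, wait. ".join(attempts)


trace = build_sft_example(
    "What is the diagnosis?", "influenza",
    toy_sample, toy_verify, toy_reformat,
    strategies=["backtrack", "correct"],
)
print(trace)
```

Passing the verifier and reformatter in as callables keeps the loop independent of any particular model provider, which is a design choice of this sketch rather than something the post specifies.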

Applying this strategy to other domains could be quite promising, provided the training data can be formulated with verifiable problems!

qgallouedec

updated a dataset 18 days ago

trl-internal-testing/example-images

lewtun

posted an update 19 days ago

We outperform Llama 70B with Llama 3B on hard math by scaling test-time compute 🔥

How? By combining step-wise reward models with tree search algorithms :)
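As a rough illustration of pairing a step-wise reward model with tree search, here is a toy beam search in which every partial solution is scored after each step and only the best few are expanded further. It is a sketch under stated assumptions, not the search-and-learn API: `reward_guided_beam_search` and its callables are hypothetical names, and the arithmetic "reward" merely stands in for a learned step-wise reward model.

```python
from typing import Callable, List, Tuple

Path = Tuple[str, ...]  # a partial solution, one string per reasoning step


def reward_guided_beam_search(
    propose_steps: Callable[[Path], List[str]],
    score_path: Callable[[Path], float],
    is_finished: Callable[[Path], bool],
    beam_width: int = 2,
    max_depth: int = 4,
) -> Path:
    """Expand partial solutions step by step, keeping the top-scoring ones."""
    beams: List[Path] = [()]
    for _ in range(max_depth):
        candidates: List[Path] = []
        for path in beams:
            if is_finished(path):
                candidates.append(path)  # carry finished paths forward
            else:
                candidates.extend(path + (s,) for s in propose_steps(path))
        if not candidates:
            break
        # The scorer plays the role of the step-wise reward model.
        beams = sorted(candidates, key=score_path, reverse=True)[:beam_width]
        if all(is_finished(p) for p in beams):
            break
    return max(beams, key=score_path)


# --- Toy problem: reach a total of 7 using steps of +1, +2 or +5 ---
TARGET = 7


def total(path: Path) -> int:
    return sum(int(step) for step in path)


def toy_propose(path: Path) -> List[str]:
    return ["+1", "+2", "+5"]


def toy_score(path: Path) -> float:
    return -abs(TARGET - total(path))  # closer to the target scores higher


def toy_finished(path: Path) -> bool:
    return total(path) == TARGET


best = reward_guided_beam_search(toy_propose, toy_score, toy_finished)
print(best, total(best))
```

In a real setting `propose_steps` would sample candidate next steps from the language model, and raising `beam_width` or `max_depth` is one way to spend more test-time compute.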

We show that smol models can match or exceed the performance of their much larger siblings when given enough "time to think"

We're open sourcing the full recipe and sharing a detailed blog post.

In our blog post we cover:

📈 Compute-optimal scaling: How we implemented DeepMind's recipe to boost the mathematical capabilities of open models at test-time.

🎄 Diverse Verifier Tree Search (DVTS): An unpublished extension we developed to the verifier-guided tree search technique. This simple yet effective method improves diversity and delivers better performance, particularly at large test-time compute budgets.

🧭 Search and Learn: A lightweight toolkit for implementing search strategies with LLMs and built for speed with vLLM

Here are the links:

- Blog post: HuggingFaceH4/blogpost-scaling-test-time-compute

- Code: https://github.com/huggingface/search-and-learn

Enjoy!

qgallouedec

updated a collection about 1 month ago

Tiny models

Collection

23 items • Updated Nov 30, 2024 • 1

Team members 6