Agustín Piqueres Lajarín

plaguss

Recent Activity

reacted to lewtun's post with 🔥 5 days ago

This paper (https://huggingface.co/papers/2412.18925) has a really interesting recipe for inducing o1-like behaviour in Llama models:

* Iteratively sample CoTs from the model, using a mix of different search strategies. This gives you something like Stream of Search via prompting.
* Verify correctness of each CoT using GPT-4o (needed because exact match doesn't work well in medicine where there are lots of aliases)
* Use GPT-4o to reformat the concatenated CoTs into a single stream that includes smooth transitions like "hmm, wait" etc. that one sees in o1
* Use the resulting data for SFT & RL
* Use sparse rewards from GPT-4o to guide RL training.

They find RL gives an average ~3 point boost across medical benchmarks and SFT on this data already gives a strong improvement. Applying this strategy to other domains could be quite promising, provided the training data can be formulated with verifiable problems!

liked a Space 7 days ago

data-agents/jupyter-agent

liked a Space 18 days ago

HuggingFaceH4/blogpost-scaling-test-time-compute

Articles

How we leveraged distilabel to create an Argilla 2.0 Chatbot

plaguss's activity

New activity in argilla/FinePersonas-v0.1 24 days ago

Removing embeddings information to reduce the size of this dataset

#6 opened 3 months ago by

New activity in argilla/FinePersonas-v0.1 3 months ago

noob questions

#4 opened 3 months ago by

How to run the persona-to-persona code?

#5 opened 3 months ago by

New activity in argilla/FinePersonas-v0.1 4 months ago

Multimodal Personas

#2 opened 4 months ago by

New activity in argilla/magpie-ultra-v0.1 5 months ago

Update README.md

#7 opened 5 months ago by

New activity in plaguss/distilabel-sample-evol-instruct 11 months ago

add distilabel and synthethic tag

#2 opened 11 months ago by davidberenstein1957

New activity in argilla/distilabeled-Marcoro14-7B-slerp 12 months ago

update license

#2 opened 12 months ago by

add base_model

#1 opened 12 months ago by

New activity in argilla/distilabeled-OpenHermes-2.5-Mistral-7B 12 months ago

Can some one link me to gguf?

#1 opened 12 months ago by