Haitham Bou Ammar

hba123

AI & ML interests

LLMs, VLMs, Robotics, Reinforcement Learning, Bayesian Optimisation

Recent Activity

authored a paper 1 day ago

Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level

authored a paper 1 day ago

SparsePO: Controlling Preference Alignment of LLMs via Sparse Token Masks

reacted to their post with 🚀 5 days ago

Blindly applying algorithms without understanding the math behind them is not a good idea. So, I am on a quest to fix this! I wrote my first Hugging Face article on how to derive closed-form solutions for KL-regularised reinforcement learning problems, the formulation that underlies DPO. Check it out: https://huggingface.co/blog/hba123/derivingdpo

Articles

Deriving DPO's Loss

8 days ago

Organizations

None yet

hba123's activity

authored 2 papers 1 day ago

Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level

Paper • 2411.03562 • Published Nov 5, 2024 • 63

SparsePO: Controlling Preference Alignment of LLMs via Sparse Token Masks

Paper • 2410.05102 • Published Oct 7, 2024

reacted to their post with 🚀 5 days ago

posted an update 8 days ago