Haitham Bou Ammar

hba123

AI & ML interests

LLMs, VLMs, Robotics, Reinforcement Learning, Bayesian Optimisation

Recent Activity

reacted to their post with 🚀 2 days ago

posted an update 5 days ago

commented on a paper 5 days ago

ROS-LLM: A ROS framework for embodied AI with task feedback and structured reasoning

Articles

Deriving DPO's Loss

5 days ago

Organizations

None yet

Posts 1

Post

Blindly applying algorithms without understanding the math behind them is not a good idea, from my point of view. So I am on a quest to fix this!

I wrote my first Hugging Face article, on how to derive closed-form solutions for KL-regularised reinforcement learning problems, the formulation that DPO (Direct Preference Optimization) builds on.

Check it out: https://huggingface.co/blog/hba123/derivingdpo

Papers 5

Models

None public yet

Datasets

None public yet