Fatih C. Akyon

fcakyon

AI & ML interests

multi-modal learning, video understanding

Organizations

Deprem Yapay Zeka · Yapay Zekâ Araştırma İnisiyatifi · Radiology-ai · OBSS · fixit · Gradio-Blocks-Party · ultralytics+ · Video Transformers · Viddexa AI · Ultralytics

fcakyon's activity

New activity in thwri/CogFlorence-2 1 day ago

🚩 Report: Not working

#1 opened 5 days ago by fcakyon
New activity in microsoft/Florence-2-large 5 days ago

add_confidence_score

#56 opened 5 months ago by haipingwu
New activity in BAAI/bge-m3 5 days ago
reacted to merve's post with 🤗👀🚀 10 days ago
Apollo is a new family of open-source video language models by Meta, where the 3B model outperforms most 7B models and the 7B outperforms most 30B models 🧶

✨ the models come in 1.5B https://huggingface.co/Apollo-LMMs/Apollo-1_5B-t32, 3B https://huggingface.co/Apollo-LMMs/Apollo-3B-t32 and 7B https://huggingface.co/Apollo-LMMs/Apollo-7B-t32 with an Apache 2.0 license, based on Qwen1.5 & Qwen2
✨ the authors also release a benchmark dataset https://huggingface.co/spaces/Apollo-LMMs/ApolloBench

The paper has a lot of experiments (they trained 84 models!) on what makes video LMs work ⏯️ They evaluate sampling strategies, scaling laws for models and datasets, video representations, and more!

Try the demo for the best setup here: https://huggingface.co/spaces/Apollo-LMMs/Apollo-3B
> The authors find that design decisions validated on small models also hold when the model and dataset are scaled up 📈 though scaling the dataset has diminishing returns for smaller models
> They evaluate frame sampling strategies and find that FPS sampling is better than uniform sampling, with 8-32 tokens per frame being optimal (see the sketch after this list)
> They also compare image encoders, trying a range of models from shape-optimized SigLIP to DINOv2, and find google/siglip-so400m-patch14-384 to be the most powerful 🔥 (a loading sketch follows the post)
> They also compare freezing different parts of the models; training all stages with some parts frozen gives the best results
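The FPS-vs-uniform point above is easy to make concrete. Below is a minimal Python sketch (hypothetical helper names, not from the Apollo codebase): FPS sampling keeps a fixed number of frames per second of footage, while uniform sampling spreads a fixed frame count over the whole clip, under-sampling long videos.

import numpy as np

def fps_sample_indices(num_frames: int, video_fps: float,
                       sample_fps: float = 2.0, max_frames: int = 64) -> np.ndarray:
    """FPS sampling: constant temporal density regardless of clip length."""
    step = video_fps / sample_fps  # source frames between consecutive samples
    indices = np.arange(0, num_frames, step).round().astype(int)
    if len(indices) > max_frames:  # cap the visual token budget
        keep = np.linspace(0, len(indices) - 1, max_frames).round().astype(int)
        indices = indices[keep]
    return indices

def uniform_sample_indices(num_frames: int, count: int = 32) -> np.ndarray:
    """Uniform sampling: a fixed frame count stretched over the whole clip."""
    return np.linspace(0, num_frames - 1, count).round().astype(int)

# A 60 s clip at 30 fps: FPS sampling yields 2 frames per second (120 frames,
# capped at 64 here), while uniform sampling always returns exactly 32 frames.
print(fps_sample_indices(num_frames=1800, video_fps=30.0))
print(uniform_sample_indices(num_frames=1800))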

They eventually release three models, where Apollo-3B outperforms most 7B models and Apollo-7B outperforms most 30B models 🔥
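Since the post names a specific encoder checkpoint, here is a minimal sketch of pulling per-frame features from it with the transformers library (the blank placeholder image stands in for decoded video frames; Apollo's own preprocessing is not reproduced here):

import torch
from PIL import Image
from transformers import AutoProcessor, SiglipVisionModel

# Load only the vision tower of the SigLIP checkpoint the post singles out.
ckpt = "google/siglip-so400m-patch14-384"
model = SiglipVisionModel.from_pretrained(ckpt)
processor = AutoProcessor.from_pretrained(ckpt)

frame = Image.new("RGB", (640, 480))  # placeholder; use real video frames
inputs = processor(images=frame, return_tensors="pt")
with torch.no_grad():
    # One embedding per image patch: shape (1, 729, 1152) for this checkpoint.
    patch_features = model(**inputs).last_hidden_state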
replied to merve's post 10 days ago

Great share @merve 💯 The Apollo links in your post are not working, giving a 404 for me 🤔

New activity in microsoft/Florence-2-base 2 months ago

Inherit from GenerationMixin

#22 opened 2 months ago by fcakyon
New activity in microsoft/Florence-2-large 2 months ago

Inherit from GenerationMixin

#80 opened 2 months ago by Link161
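Both GenerationMixin threads above concern the same upstream change: from transformers v4.50 onwards, PreTrainedModel no longer inherits from GenerationMixin, so remote-code models such as Florence-2 must name the mixin explicitly to keep model.generate() working. A heavily simplified sketch of that kind of fix (class names are illustrative, not the actual Florence-2 modeling code):

from transformers import GenerationMixin, PretrainedConfig, PreTrainedModel

class Florence2Config(PretrainedConfig):
    model_type = "florence2"

# Listing GenerationMixin as an explicit base class is the whole fix:
# without it, transformers v4.50+ no longer wires `.generate()` into
# custom (trust_remote_code) model classes.
class Florence2ForConditionalGeneration(PreTrainedModel, GenerationMixin):
    config_class = Florence2Config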
updated a Space 2 months ago