Hugging Face Forums
ivelin
computer vision, vision-language models, multimodal transformers