Jack of All Tasks, Master of Many: Designing General-purpose Coarse-to-Fine Vision-Language Model Paper • 2312.12423 • Published Dec 19, 2023 • 13
UniVTG: Towards Unified Video-Language Temporal Grounding Paper • 2307.16715 • Published Jul 31, 2023 • 11
EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone Paper • 2307.05463 • Published Jul 11, 2023 • 10