---
license: llama2
datasets:
- wangyueqian/HawkEye-IT
- wangyueqian/InternVid-G
- OpenGVLab/VideoChat2-IT
language:
- en
pipeline_tag: visual-question-answering
---
# HawkEye: Training Video-Text LLMs for Grounding Text in Videos
This repo provides the checkpoint of HawkEye, along with our re-implementation of VideoChat2.

`videochat2-stage3-our_impl.pth` is the checkpoint of our reproduction of VideoChat2. You can use it as a drop-in substitute for `hawkeye.pth`.
- Compared to HawkEye, it is not trained with data from [InternVid-G](https://github.com/yellow-binary-tree/HawkEye/blob/main/internvid_g/README.md).
- Compared to the original implementation of VideoChat2, its visual encoder is frozen, and it is not trained with the image data from [VideoChat2-IT](https://github.com/OpenGVLab/Ask-Anything/blob/main/video_chat2/DATA.md).
For more details, please refer to our [paper](https://arxiv.org/abs/2403.10228) and [GitHub repo](https://github.com/yellow-binary-tree/HawkEye).
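
As a quick sanity check, the sketch below downloads a checkpoint from this repo and inspects it. The repo ID `wangyueqian/HawkEye` is an assumption inferred from the dataset namespaces above; how the weights plug into the HawkEye/VideoChat2 code is documented in the GitHub repo linked above.

```python
# Minimal sketch: fetch a checkpoint and inspect its contents.
# Assumes this model repo's ID is "wangyueqian/HawkEye".
import torch
from huggingface_hub import hf_hub_download

# Swap in "hawkeye.pth" to use the full HawkEye checkpoint instead.
ckpt_path = hf_hub_download(
    repo_id="wangyueqian/HawkEye",
    filename="videochat2-stage3-our_impl.pth",
)

# Load on CPU first; the key structure depends on the training framework.
state = torch.load(ckpt_path, map_location="cpu")
if isinstance(state, dict):
    print("top-level keys:", list(state.keys())[:5])
```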