|
--- |
|
license: mit |
|
library_name: peft |
|
base_model: meta-llama/Meta-Llama-3-8B-Instruct |
|
datasets: |
|
- chenjoya/videollm-online-chat-ego4d-134k |
|
language: |
|
- en |
|
tags: |
|
- llama |
|
- llama-3 |
|
- multimodal |
|
- llm |
|
- video stream |
|
- online video understanding |
|
- video understanding |
|
pipeline_tag: video-text-to-text |
|
--- |
|
|
|
# Model Card for VideoLLM-online-8B-v1+
|
|
|
Project page: https://showlab.github.io/videollm-online/
|
|
|
## Model Details |
|
|
|
* LLM: meta-llama/Meta-Llama-3-8B-Instruct |
|
* Vision Strategy:

  * Frame Encoder: google/siglip-large-patch16-384

  * Frame Tokens: CLS Token + Avg Pooled 3x3 Tokens (see the sketch after this list)

  * Frame FPS: 2 for training, 2~10 for inference

  * Frame Resolution: max resolution 384, with zero-padding to keep aspect ratio
|
* Video Length: 10 minutes |
|
* Training Data: Ego4D Narration Stream 113K + Ego4D GoalStep Stream 21K |
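
The following sketch illustrates the frame-token strategy listed above: each frame is encoded with SigLIP and reduced to one pooled token plus a 3x3 grid of average-pooled patch tokens (10 tokens per frame). It is an illustrative approximation, not the repository's code; in particular, SigLIP has no literal CLS token, so its attention-pooled output stands in for it here.

```python
# Illustrative sketch only (not the repository's exact code): reduce one frame
# to 1 + 3x3 = 10 tokens with SigLIP, as described in the list above.
import torch
import torch.nn.functional as F
from PIL import Image
from transformers import SiglipImageProcessor, SiglipVisionModel

ckpt = "google/siglip-large-patch16-384"
processor = SiglipImageProcessor.from_pretrained(ckpt)
encoder = SiglipVisionModel.from_pretrained(ckpt)

frame = Image.new("RGB", (384, 384))  # placeholder; use a real video frame here
inputs = processor(images=frame, return_tensors="pt")

with torch.no_grad():
    out = encoder(**inputs)

# SigLIP has no literal CLS token; its attention-pooled output plays that role here.
cls_like = out.pooler_output                        # (1, hidden)
patches = out.last_hidden_state                     # (1, 24*24, hidden) at 384px, patch 16
side = int(patches.shape[1] ** 0.5)
grid = patches.transpose(1, 2).reshape(1, -1, side, side)
pooled = F.adaptive_avg_pool2d(grid, 3)             # average-pool the 24x24 patch grid to 3x3
frame_tokens = torch.cat(
    [cls_like.unsqueeze(1), pooled.flatten(2).transpose(1, 2)], dim=1
)
print(frame_tokens.shape)                           # (1, 10, hidden): 1 + 9 tokens per frame
```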
|
|
|
### Model Sources |
|
|
|
<!-- Provide the basic links for the model. --> |
|
|
|
- **Repository:** https://github.com/showlab/videollm-online |
|
- **Paper:** https://arxiv.org/abs/2406.11816 |
|
|
|
## Uses |
|
|
|
- First, clone the GitHub repository and follow the installation instructions:
|
|
|
```sh |
|
git clone https://github.com/showlab/videollm-online |
|
``` |
|
|
|
Ensure you have Miniconda and Python version >= 3.10 installed, then run: |
|
```sh |
|
conda install -y pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia |
|
pip install transformers accelerate deepspeed peft editdistance Levenshtein tensorboard gradio moviepy submitit |
|
pip install flash-attn --no-build-isolation |
|
``` |
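
After installation, an optional sanity check (a minimal sketch, assuming a CUDA machine) confirms that PyTorch sees the GPU and that flash-attn imports cleanly:

```python
# Optional sanity check: verify CUDA-enabled PyTorch and flash-attn are usable.
import torch
import flash_attn

print("torch", torch.__version__, "built for CUDA", torch.version.cuda)
print("CUDA available:", torch.cuda.is_available())
print("flash-attn", flash_attn.__version__)
```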
|
|
|
The PyTorch install pulls in an ffmpeg build, but it is an old version that usually yields very low-quality preprocessing. Please install the latest static ffmpeg build as follows:
|
```sh |
|
wget https://johnvansickle.com/ffmpeg/releases/ffmpeg-release-amd64-static.tar.xz |
|
tar xvf ffmpeg-release-amd64-static.tar.xz |
|
rm ffmpeg-release-amd64-static.tar.xz |
|
mv ffmpeg-7.0.1-amd64-static ffmpeg |
|
``` |
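
To confirm which build you ended up with, you can print the version of the static binary unpacked above (path assumed from the commands in this section):

```python
# Print the version line of the static ffmpeg binary unpacked above.
import subprocess

out = subprocess.run(["./ffmpeg/ffmpeg", "-version"], capture_output=True, text=True)
print(out.stdout.splitlines()[0])
```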
|
|
|
If you want to try our model with audio in real-time streaming, please also install the dependencies below and clone ChatTTS:
|
|
|
```sh |
|
pip install omegaconf vocos vector_quantize_pytorch cython |
|
git clone https://github.com/2noise/ChatTTS
|
mv ChatTTS demo/rendering/ |
|
``` |
|
|
|
- Launch the Gradio demo locally with:
|
```sh |
|
python -m demo.app --resume_from_checkpoint chenjoya/videollm-online-8b-v1plus |
|
``` |
|
|
|
- Or launch the CLI locally with: |
|
```sh |
|
python -m demo.cli --resume_from_checkpoint chenjoya/videollm-online-8b-v1plus |
|
``` |
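
The demo and CLI above are the intended entry points; they rebuild the full streaming model (vision encoder, projector, and LoRA weights) from this checkpoint. If you only want to attach the published LoRA adapter to the base LLM, e.g. to inspect its weights, a minimal peft sketch might look like the following, assuming the checkpoint exposes a standard PEFT adapter config; it does not include the vision encoder or any streaming logic:

```python
# Minimal sketch: attach the LoRA adapter to the base LLM with peft.
# Assumption: this checkpoint exposes a standard PEFT adapter config at its root.
# This does NOT reproduce the streaming video pipeline from the repository.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, "chenjoya/videollm-online-8b-v1plus")
model.eval()
```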
|
|
|
## Citation |
|
|
|
``` |
|
@inproceedings{videollm-online, |
|
author = {Joya Chen and Zhaoyang Lv and Shiwei Wu and Kevin Qinghong Lin and Chenan Song and Difei Gao and Jia-Wei Liu and Ziteng Gao and Dongxing Mao and Mike Zheng Shou}, |
|
title = {VideoLLM-online: Online Video Large Language Model for Streaming Video}, |
|
booktitle = {CVPR}, |
|
year = {2024}, |
|
} |
|
``` |