---
pipeline_tag: image-text-to-text
---
This repository contains the VisVM model described in [Scaling Inference-Time Search with Vision Value Model for Improved Visual Comprehension](https://huggingface.co/papers/2412.03704).

Code: https://github.com/si0wang/VisVM