---
pipeline_tag: image-text-to-text
---

This repository contains the VisVM model described in [Scaling Inference-Time Search with Vision Value Model for Improved Visual Comprehension](https://huggingface.co/papers/2412.03704).

Code: https://github.com/si0wang/VisVM