metadata
pipeline_tag: image-text-to-text
This repository contains the VisVM model described in the paper "Scaling Inference-Time Search with Vision Value Model for Improved Visual Comprehension".