Do generative video models learn physical principles from watching videos?
Abstract
AI video generation is undergoing a revolution, with quality and realism advancing rapidly. These advances have led to a passionate scientific debate: Do video models learn "world models" that discover laws of physics, or are they merely sophisticated pixel predictors that achieve visual realism without understanding the physical principles of reality? We address this question by developing Physics-IQ, a comprehensive benchmark dataset that can only be solved by acquiring a deep understanding of various physical principles, such as fluid dynamics, optics, solid mechanics, magnetism, and thermodynamics. We find that across a range of current models (Sora, Runway, Pika, Lumiere, Stable Video Diffusion, and VideoPoet), physical understanding is severely limited and unrelated to visual realism. At the same time, some test cases can already be solved successfully. This indicates that acquiring certain physical principles from observation alone may be possible, but significant challenges remain. While we expect rapid advances ahead, our work demonstrates that visual realism does not imply physical understanding. Our project page is at https://physics-iq.github.io; code at https://github.com/google-deepmind/physics-IQ-benchmark.
Community
Do AI video models truly understand the world, or do they merely create visually appealing illusions? This is the core question behind Physics-IQ, a new benchmark designed to rigorously test the physical reasoning abilities of AI video generation models. We provide a comprehensive dataset of 396 real-world videos covering diverse physical scenarios, from fluid dynamics to solid mechanics and more.
Our benchmark challenges models to predict the future of a scene, pushing them beyond simple pattern recognition. We use metrics such as Spatial IoU, Spatiotemporal IoU, Weighted Spatial IoU, and MSE to evaluate different aspects of physical understanding. The Physics-IQ score aggregates these measures and is normalized by the physical variance observed between real-world recordings of the same scene. Our findings reveal that while current models produce visually realistic videos, they exhibit a significant lack of true physical understanding.
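To make the metric definitions concrete, here is a minimal Python/NumPy sketch of how Spatial IoU, Spatiotemporal IoU, and MSE could be computed from binary motion masks, along with one illustrative way to normalize a score by real-world physical variance. The function names, the mask representation, and the normalization are assumptions for illustration, not the benchmark's released implementation (see the GitHub repository for the official code).

```python
# Illustrative sketch of Physics-IQ-style metrics. Assumes videos are
# numpy arrays of shape (frames, height, width): binary "motion" masks
# for the IoU metrics, raw grayscale frames for MSE. These names and
# details are hypothetical, not the benchmark's official implementation.
import numpy as np

def spatial_iou(pred: np.ndarray, real: np.ndarray) -> float:
    """IoU of WHERE motion occurred, collapsed over time."""
    p = pred.any(axis=0)  # (H, W): pixels that ever moved in the prediction
    r = real.any(axis=0)  # (H, W): pixels that ever moved in the real video
    union = np.logical_or(p, r).sum()
    return float(np.logical_and(p, r).sum() / union) if union else 1.0

def spatiotemporal_iou(pred: np.ndarray, real: np.ndarray) -> float:
    """Mean per-frame IoU: WHERE and WHEN motion occurred."""
    ious = []
    for p, r in zip(pred, real):  # iterate over frames
        union = np.logical_or(p, r).sum()
        ious.append(np.logical_and(p, r).sum() / union if union else 1.0)
    return float(np.mean(ious))

def mse(pred_frames: np.ndarray, real_frames: np.ndarray) -> float:
    """Pixel-wise mean squared error on the raw frames (HOW MUCH differs)."""
    diff = pred_frames.astype(np.float64) - real_frames.astype(np.float64)
    return float(np.mean(diff ** 2))

def normalize_by_variance(model_score: float, variance_score: float) -> float:
    """One possible normalization: a model matching the agreement between
    two real recordings of the same scene would score 100 (capped)."""
    if variance_score == 0:
        return 0.0
    return min(100.0, 100.0 * model_score / variance_score)

# Tiny usage example with random masks as stand-ins for real data:
rng = np.random.default_rng(0)
pred = rng.random((16, 64, 64)) > 0.7
real = rng.random((16, 64, 64)) > 0.7
print(spatial_iou(pred, real), spatiotemporal_iou(pred, real))
```

The key design idea this sketch tries to capture is that each metric isolates a different failure mode: Spatial IoU only asks whether action happens in the right place, Spatiotemporal IoU additionally asks whether it happens at the right time, and MSE penalizes differences in appearance.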
Explore our open-source dataset, evaluation code, and detailed results. Join us in pushing the boundaries of AI and physical reasoning! The project aims to quantify physical understanding and make it possible to track future progress in the field.