I disagree. Sora has not learned physics, and is not a reliable guide to how the world works. It is at best a semi-reliable guide to how the world looks.
These two, how the world works and how the world looks, are fundamentally different.
A physics engine takes a list of objects, their locations, and so on, computes temporal updates, and allows you to render views.
At the core is a bit of basic physics: the notion of object permanence, the fact that, other things being equal, objects tend to persist over time (even if, e.g., they are occluded from some perspective).
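To make the description of a physics engine concrete, here is a minimal Python sketch. It is an illustration only, not how any real engine (or Sora) is implemented. The names `Body`, `step`, and `render`, the floor at `y = 0`, and the occluder interval are all assumptions made up for this example. The point it tries to show is that the world state is a list of objects, updates act on those objects, and a rendered view is derived from the state, so an occluded object disappears from the view without disappearing from the world.

```python
from dataclasses import dataclass

GRAVITY = -9.81  # m/s^2 along y (illustrative constant)


@dataclass
class Body:
    """A hypothetical point object: the engine tracks objects, not pixels."""
    name: str
    x: float
    y: float
    vx: float = 0.0
    vy: float = 0.0


def step(bodies, dt):
    """Temporal update: advance every body by dt (semi-implicit Euler)."""
    for b in bodies:
        b.vy += GRAVITY * dt
        b.x += b.vx * dt
        b.y += b.vy * dt
        if b.y < 0.0:  # assumed floor at y = 0: bodies cannot fall through it
            b.y = 0.0
            b.vy = 0.0


def render(bodies, occluder=(2.0, 4.0)):
    """Render a view: names of bodies the camera can see.

    A body whose x lies inside the occluder interval is hidden from the
    view, but it is still present in the world state.
    """
    lo, hi = occluder
    return [b.name for b in bodies if not (lo <= b.x <= hi)]


ball = Body("ball", x=0.0, y=0.0, vx=1.0)
world = [ball]
frames = []
for _ in range(20):
    step(world, dt=0.25)
    frames.append(render(world))
```

In this sketch the ball rolls behind the occluder, vanishes from several rendered frames, and reappears on the other side, while `world` contains exactly one ball throughout. Object permanence comes for free because the view is computed from the objects rather than the other way around.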
Other videos feature chairs levitating, basketballs passing through rims and exploding, glasses leaping spontaneously into the air, people spontaneously changing size, etc., in what I earlier called Sora’s Surreal Physics. No actual physics is ever modeled, and the entities depicted are not constrained by the laws of physics. They do as pixels do, not as collections of atoms must.
Whenever there is a conflict between physics and image sequence prediction, image sequence prediction wins. The laws of physics are about objects, not pixels; Sora knows only about pixels and image spaces, not objects.