It's not possible to understand the world from video alone. This is a basic information-theoretic argument that anyone should be able to derive for themselves: pixels record only how a scene looks and moves, so physically different scenes can produce identical footage. There is no meaningful mapping from the pixels in a video to principles about physical objects. There is no way to derive conservation of mass and energy, no matter how many thousands of years of pixels are analyzed. An obvious concrete example (pun intended) is deriving the weight of a brick from video observation alone. It can't be done, because pixel measurements are not sufficient to determine the mass of an object.
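To make the brick example concrete, here is a minimal sketch of the simplest case: two bricks of very different mass dropped from the same height, ignoring air drag. The function name `fall_positions`, the two masses, and the frame interval are all made up for illustration. Because the gravitational force is proportional to mass, the mass cancels out of the acceleration, so the per-frame positions a camera would record come out the same for both bricks.

```python
def fall_positions(mass_kg, height_m=2.0, dt=0.1, steps=6, g=9.81):
    """Height of a dropped object at each video frame (no air drag assumed)."""
    # Force is m * g, so acceleration is (m * g) / m = g: the mass cancels.
    accel = (mass_kg * g) / mass_kg
    # Round to absorb floating-point noise from the cancellation above.
    return [round(height_m - 0.5 * accel * (i * dt) ** 2, 6) for i in range(steps)]


# Hypothetical bricks: same shape and size on camera, very different mass.
clay_brick = fall_positions(mass_kg=2.5)
foam_brick = fall_positions(mass_kg=0.05)

# Same trajectory frame by frame, so the footage cannot tell them apart.
print(clay_brick == foam_brick)
```

This only covers a drag-free drop, which is the assumption baked into the sketch; it is meant as one instance of the argument, not a proof of the general claim.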
measurablefunc•54m ago