This demo combines the flexible task programming and reasoning of Gemini ER (what is the scene, and what should I do?) with classical camera calibration, kinematics, and motion controllers. Each layer is independently swappable, and the AI model doesn't need to know anything about the robot's embodiment. This recreates the modularity of a Sense-Plan-Act architecture while retaining the semantic reasoning of a foundation AI model. A writeup explaining the tradeoffs is linked from the page:
https://www.avikde.me/building-a-reasoning-hierarchical.
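
To make the "independently swappable layers" idea concrete, here is a minimal Python sketch of how such a stack could be wired together. It is not the demo's actual code: every name in it (`Reasoner`, `PlanarCalibration`, `GantryKinematics`, `RecordingController`, `Pipeline`) is hypothetical, the reasoner is a stub that returns a fixed pixel rather than calling Gemini ER, and the calibration assumes a flat table with a uniform pixel-to-metre scale.

```python
from dataclasses import dataclass, field
from typing import List, Protocol, Tuple

Pixel = Tuple[float, float]          # (u, v) image coordinates
Point = Tuple[float, float, float]   # (x, y, z) in the robot base frame, metres
Joints = Tuple[float, ...]           # joint positions; meaning depends on the robot


class Reasoner(Protocol):
    """Semantic layer: turns an instruction into an image-space target."""
    def locate(self, instruction: str) -> Pixel: ...


class Calibration(Protocol):
    """Camera layer: maps image coordinates into the robot base frame."""
    def pixel_to_world(self, pixel: Pixel) -> Point: ...


class Kinematics(Protocol):
    """Embodiment layer: maps a Cartesian target to joint positions."""
    def inverse(self, target: Point) -> Joints: ...


class Controller(Protocol):
    """Motion layer: drives the joints to the commanded positions."""
    def move_to(self, joints: Joints) -> None: ...


@dataclass
class StubReasoner:
    # Hypothetical stand-in for a vision-language model call; returns a fixed pixel.
    pixel: Pixel

    def locate(self, instruction: str) -> Pixel:
        return self.pixel


@dataclass
class PlanarCalibration:
    # Assumes a flat table at z = 0 and a uniform scale (an illustrative simplification).
    metres_per_pixel: float

    def pixel_to_world(self, pixel: Pixel) -> Point:
        u, v = pixel
        return (u * self.metres_per_pixel, v * self.metres_per_pixel, 0.0)


class GantryKinematics:
    # For a Cartesian gantry, joint positions equal the target coordinates.
    def inverse(self, target: Point) -> Joints:
        return tuple(target)


@dataclass
class RecordingController:
    # Records commands instead of moving hardware, so the sketch runs anywhere.
    commands: List[Joints] = field(default_factory=list)

    def move_to(self, joints: Joints) -> None:
        self.commands.append(joints)


@dataclass
class Pipeline:
    reasoner: Reasoner
    calibration: Calibration
    kinematics: Kinematics
    controller: Controller

    def run(self, instruction: str) -> Joints:
        pixel = self.reasoner.locate(instruction)         # reasoning: where in the image?
        target = self.calibration.pixel_to_world(pixel)   # calibration: where in the world?
        joints = self.kinematics.inverse(target)          # kinematics: which joint positions?
        self.controller.move_to(joints)                   # control: go there
        return joints


if __name__ == "__main__":
    pipeline = Pipeline(
        StubReasoner((320.0, 240.0)),
        PlanarCalibration(0.001),
        GantryKinematics(),
        RecordingController(),
    )
    print(pipeline.run("pick up the red block"))
```

In this sketch the reasoner only ever produces image coordinates, so replacing `GantryKinematics` with an arm-specific solver, or `StubReasoner` with a real model client, would leave the other layers untouched.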