The proposed System M (Meta-control) is a nice theoretical fix, but the implementation is where the wheels usually come off. Integrating observation (A) and action (B) sounds great until the agent starts hallucinating its own feedback loops. Unless we can move away from this 'outsourced learning' where humans have to fix every domain mismatch, we're just building increasingly expensive parrots. I’m skeptical if 'bilevel optimization' is enough to bridge that gap or if we’re just adding another layer of complexity to a fundamentally limited transformer architecture.
aanet•3h ago
"he proposed framework integrates learning from observation (System A) and learning from active behavior (System B) while flexibly switching between these learning modes as a function of internally generated meta-control signals (System M). We discuss how this could be built by taking inspiration on how organisms adapt to real-world, dynamic environments across evolutionary and developmental timescales. "
dasil003•2h ago
marsten•1h ago
ehnto•23m ago
iFire•16m ago