Tag Archives: Internal Model

A quantitative demonstration, based on MDPs, that the need for a world model (learnt or given) grows as the complexity of the task and the performance of the agent increase

Jonathan Richens, David Abel, Alexis Bellot, Tom Everitt, General agents contain world models, arXiv cs.AI, Sep. 2025, arXiv:2506.01622.

Are world models a necessary ingredient for flexible, goal-directed behaviour, or is model-free learning sufficient? We provide a formal answer to this question, showing that any agent capable of generalizing to multi-step goal-directed tasks must have learned a predictive model of its environment. We show that this model can be extracted from the agent’s policy, and that increasing the agent’s performance or the complexity of the goals it can achieve requires learning increasingly accurate world models. This has a number of consequences: from developing safe and general agents, to bounding agent capabilities in complex environments, and providing new algorithms for eliciting world models from agents.

Interesting related work on internal models for action prediction and on the exploration/exploitation trade-off

Simón C. Smith, J. Michael Herrmann, Evaluation of Internal Models in Autonomous Learning, IEEE Transactions on Cognitive and Developmental Systems, vol. 11, no. 4, Dec. 2019, DOI: 10.1109/TCDS.2018.2865999.

Internal models (IMs) can represent relations between sensors and actuators in natural and artificial agents. In autonomous robots, the adaptation of IMs and the adaptation of the behavior are interdependent processes which have been studied under paradigms for self-organization of behavior such as homeokinesis. We compare the effect of various types of IMs on the generation of behavior in order to evaluate model quality across different behaviors. The considered IMs differ in the degree of flexibility and expressivity related to, respectively, learning speed and structural complexity of the model. We show that the different IMs generate different error characteristics which in turn lead to variations of the self-generated behavior of the robot. Due to the tradeoff between error minimization and complexity of the explored environment, we compare the models in the sense of Pareto optimality. Among the linear and nonlinear models that we analyze, echo-state networks achieve a particularly high performance which we explain as a result of the combination of fast learning and complex internal dynamics. More generally, we provide evidence that Pareto optimization is preferable in autonomous learning as it allows that a special solution can be negotiated in any particular environment.
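
As a rough illustration of what comparing models "in the sense of Pareto optimality" can mean, the sketch below keeps only the models that no other model beats on both criteria at once. It is not the authors' procedure: the two criteria (prediction error to minimise, a score for the complexity of the explored environment to maximise), the names `ModelScore`, `dominates` and `pareto_front`, and all numbers are hypothetical and chosen only for illustration.

```python
from typing import List, NamedTuple


class ModelScore(NamedTuple):
    """Hypothetical two-criterion score for one internal model."""
    name: str
    error: float        # prediction error, lower is better
    exploration: float  # complexity of explored environment, higher is better


def dominates(a: ModelScore, b: ModelScore) -> bool:
    """True if `a` is at least as good as `b` on both criteria and strictly better on one."""
    no_worse = a.error <= b.error and a.exploration >= b.exploration
    strictly_better = a.error < b.error or a.exploration > b.exploration
    return no_worse and strictly_better


def pareto_front(scores: List[ModelScore]) -> List[ModelScore]:
    """Return the models that are not dominated by any other model."""
    return [s for s in scores if not any(dominates(o, s) for o in scores)]


# Made-up scores, not results from the paper.
scores = [
    ModelScore("model_a", error=0.30, exploration=0.40),
    ModelScore("model_b", error=0.25, exploration=0.35),
    ModelScore("model_c", error=0.20, exploration=0.60),
    ModelScore("model_d", error=0.10, exploration=0.20),
]

front = pareto_front(scores)
front_names = [s.name for s in front]
print(front_names)
```

With these made-up numbers, `model_c` and `model_d` remain on the front: neither is better than the other on both criteria, so choosing between them depends on how the two criteria are weighed in a particular environment.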