Author Archives: Juan-Antonio Fernández-Madrigal

Learning how to reset the episode in RL

S. -H. Lee and S. -W. Seo, Self-Supervised Curriculum Generation for Autonomous Reinforcement Learning Without Task-Specific Knowledge, IEEE Robotics and Automation Letters, vol. 9, no. 5, pp. 4043-4050, May 2024 DOI: 10.1109/LRA.2024.3375714.

A significant bottleneck in applying current reinforcement learning algorithms to real-world scenarios is the need to reset the environment between every episode. This reset process demands substantial human intervention, making it difficult for the agent to learn continuously and autonomously. Several recent works have introduced autonomous reinforcement learning (ARL) algorithms that generate curricula for jointly training reset and forward policies. While their curricula can reduce the number of required manual resets by taking into account the agent’s learning progress, they rely on task-specific knowledge, such as predefined initial states or reset reward functions. In this paper, we propose a novel ARL algorithm that can generate a curriculum adaptive to the agent’s learning progress without task-specific knowledge. Our curriculum empowers the agent to autonomously reset to diverse and informative initial states. To achieve this, we introduce a success discriminator that estimates the success probability from each initial state when the agent follows the forward policy. The success discriminator is trained with relabeled transitions in a self-supervised manner. Our experimental results demonstrate that our ARL algorithm can generate an adaptive curriculum and enable the agent to efficiently bootstrap to solve sparse-reward maze navigation and manipulation tasks, outperforming baselines with significantly fewer manual resets.
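The paper's core mechanism is the success discriminator. As a rough illustration of how such a component could look, here is a minimal PyTorch sketch (my own illustration, not the authors' code): the network, a relabeling-based training step, and a curriculum heuristic that prefers initial states of intermediate difficulty. All names, shapes, and the 0.3–0.7 difficulty band are assumptions.

```python
# Illustrative sketch (not the authors' code) of a success discriminator that
# scores candidate initial states by the probability that the forward policy
# succeeds from them. Names and shapes are assumptions for the example.
import torch
import torch.nn as nn

class SuccessDiscriminator(nn.Module):
    def __init__(self, state_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state):
        # Estimated probability that the forward policy succeeds from `state`.
        return torch.sigmoid(self.net(state))

def train_step(disc, optimizer, states, success_labels):
    """One self-supervised update: `success_labels` (floats) come from
    relabeling visited states as 1.0 if the subsequent forward rollout
    succeeded, else 0.0."""
    probs = disc(states).squeeze(-1)
    loss = nn.functional.binary_cross_entropy(probs, success_labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

def select_reset_target(disc, candidate_states, low=0.3, high=0.7):
    """Curriculum heuristic (an assumption here): prefer initial states of
    intermediate difficulty, where success is neither certain nor hopeless."""
    with torch.no_grad():
        p = disc(candidate_states).squeeze(-1)
    mask = (p > low) & (p < high)
    pool = candidate_states[mask] if mask.any() else candidate_states
    return pool[torch.randint(len(pool), (1,))]
```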

Networked differential-drive telerobot remotely controlled despite disturbances and delays

Luca Nanu, Luigi Colangelo, Carlo Novara, Carlos Perez Montenegro, Embedded model control of networked control systems: An experimental robotic application, Mechatronics, Volume 99, 2024 DOI: 10.1016/j.mechatronics.2024.103160.

In a Networked Control System (NCS), the absence of physical communication links in the loop leads to relevant issues, such as measurement delays and asynchronous execution of the control commands. In general, these issues may significantly compromise the performance of the NCS, possibly causing unstable behaviours. This paper presents an original approach to the design of a complete digital control unit for a system characterized by a varying sampling time and asynchronous command execution. The approach is based on the Embedded Model Control (EMC) methodology, whose key feature is the estimation of the disturbances, errors and nonlinearities affecting the plant to be controlled and their online cancellation. In this way, measurement delays and execution asynchronicity are treated as errors and rejected up to a given frequency by the EMC unit. The effectiveness of the proposed approach is demonstrated in a real-world case study, where the NCS consists of a differential-drive mobile robot (the plant) and a control unit, and the two subsystems communicate through the web without physical connection links. After a preliminary verification using a high-fidelity numerical simulator, the designed controller is validated in several experimental tests, carried out on a real-time embedded system incorporated in the robotic platform.
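The estimate-then-cancel idea at the heart of EMC can be sketched in a few lines. The following is a simplified, hedged reading of the methodology applied to a first-order scalar plant (not the paper's controller); the gains and the extended-state-observer structure are assumptions made for the example.

```python
# Minimal sketch of the disturbance-rejection idea behind Embedded Model
# Control (a simplified reading, not the paper's controller): an embedded
# model predicts the nominal output, the mismatch is attributed to a lumped
# disturbance, and the command cancels the low-frequency part of it.
class EmbeddedModelController:
    def __init__(self, a, b, k_fb, beta=0.2):
        self.a, self.b = a, b   # nominal plant: x+ = a*x + b*(u + d)
        self.k_fb = k_fb        # state-feedback gain (assumed tuned offline)
        self.beta = beta        # disturbance-estimator bandwidth in (0, 1)
        self.x_hat = 0.0        # embedded-model state
        self.d_hat = 0.0        # lumped disturbance estimate

    def step(self, y_meas, x_ref):
        # Innovation: what the embedded model failed to predict. Measurement
        # delays and asynchronous execution show up here as "errors".
        e = y_meas - self.x_hat
        # Low-pass estimate of the lumped disturbance.
        self.d_hat += self.beta * e / self.b
        # Control: track the reference and cancel the estimated disturbance.
        u = self.k_fb * (x_ref - y_meas) - self.d_hat
        # Propagate the embedded model with the applied command.
        self.x_hat = self.a * self.x_hat + self.b * u + self.beta * e
        return u
```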

Improving EKF and UKF when diverse precision sensors are used for localization through adaptive covariances

Giseo Park, Optimal vehicle position estimation using adaptive unscented Kalman filter based on sensor fusion, Mechatronics, Volume 99, 2024 DOI: 10.1016/j.mechatronics.2024.103144.

Precise position recognition systems are actively used in various automotive technology fields such as autonomous vehicles, intelligent transportation systems, and vehicle driving safety systems. In line with this demand, this paper proposes a new vehicle position estimation algorithm based on sensor fusion between low-cost standalone global positioning system (GPS) and inertial measurement unit (IMU) sensors. In order to estimate accurate vehicle position information using two complementary sensor types, an adaptive unscented Kalman filter (AUKF), an optimal state estimation algorithm, is applied to the vehicle kinematic model. Since this AUKF includes an adaptive covariance matrix whose value changes under GPS outage conditions, it has high estimation robustness even if the accuracy of the GPS measurement signal is low. A comparison of estimation errors against the widely used extended Kalman filter (EKF) and UKF in real-vehicle experiments confirms the improved estimation performance of the proposed AUKF algorithm. The given test course includes roads of various shapes as well as GPS outage sections, so it is suitable for evaluating vehicle position estimation performance.
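To make the adaptive-covariance mechanism concrete, here is a hedged sketch (not the paper's exact AUKF): the GPS measurement covariance R is inflated on outage, or when innovations grow larger than the filter predicts, so the filter temporarily trusts the IMU-driven prediction more. The inflation factor and the innovation-based heuristic are assumptions; the measurement update itself is the standard one shared by EKF/UKF variants.

```python
# Hedged sketch of the adaptive-covariance idea (not the paper's exact AUKF).
import numpy as np

def adaptive_R(R_nominal, gps_valid, innovation, S, inflation=1e6):
    """Return the measurement covariance to use at this step.
    - On outage, make R huge so the GPS update is effectively ignored.
    - Otherwise, scale R up when innovations are larger than the filter
      predicts (a standard innovation-based adaptation heuristic; S may be
      computed with the nominal R)."""
    if not gps_valid:
        return R_nominal * inflation
    ratio = float(innovation @ np.linalg.inv(S) @ innovation)
    return R_nominal * max(1.0, ratio / len(innovation))

def kf_update(x, P, z, H, R):
    """Standard Kalman measurement update, shared by EKF/UKF variants."""
    nu = z - H @ x                     # innovation
    S = H @ P @ H.T + R                # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)     # Kalman gain
    x = x + K @ nu
    P = (np.eye(len(x)) - K @ H) @ P
    return x, P, nu, S
```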

POMDPs focused on obtaining policies that can be understood just by observing the robot's actions

Miguel Faria, Francisco S. Melo, Ana Paiva, “Guess what I’m doing”: Extending legibility to sequential decision tasks, Artificial Intelligence, Volume 330, 2024 DOI: 10.1016/j.artint.2024.104107.

In this paper we investigate the notion of legibility in sequential decision tasks under uncertainty. Previous works that extend legibility to scenarios beyond robot motion either focus on deterministic settings or are computationally too expensive. Our proposed approach, dubbed PoLMDP, is able to handle uncertainty while remaining computationally tractable. We establish the advantages of our approach against state-of-the-art approaches in several scenarios of varying complexity. We also showcase the use of our legible policies as demonstrations in machine teaching scenarios, establishing their superiority in teaching new behaviours against the commonly used demonstrations based on the optimal policy. Finally, we assess the legibility of our computed policies through a user study, where people are asked to infer the goal of a mobile robot following a legible policy by observing its actions.
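The notion of legibility can be illustrated with a small sketch (a simplified reading, not the PoLMDP algorithm itself): a Bayesian observer infers the agent's goal from its actions, and the agent scores each action by a mix of its value for the true goal and how identifiable that goal becomes after the action is seen. The softmax observer model and the trade-off weight beta are assumptions.

```python
# Illustrative sketch of legibility in sequential decision-making
# (not the PoLMDP algorithm). q_tables: one (num_states, num_actions)
# numpy Q-table per candidate goal.
import numpy as np

def observer_goal_posterior(q_tables, state, action, prior):
    """P(goal | s, a), assuming the observer models the agent as
    softmax-rational for each candidate goal's Q-function."""
    likelihoods = np.array([
        np.exp(q[state, action]) / np.exp(q[state]).sum()
        for q in q_tables
    ])
    post = likelihoods * prior
    return post / post.sum()

def legible_action(q_tables, true_goal, state, prior, beta=0.5):
    """Trade off value for the true goal against how identifiable the
    true goal is after the observer sees the action (beta is assumed)."""
    q_true = q_tables[true_goal]
    scores = []
    for a in range(q_true.shape[1]):
        ident = observer_goal_posterior(q_tables, state, a, prior)[true_goal]
        scores.append(q_true[state, a] + beta * np.log(ident + 1e-12))
    return int(np.argmax(scores))
```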

On the influence of the representations learned through deep RL on the learning process

Han Wang, Erfan Miahi, Martha White, Marlos C. Machado, Zaheer Abbas, Raksha Kumaraswamy, Vincent Liu, Adam White, Investigating the properties of neural network representations in reinforcement learning, Artificial Intelligence, Volume 330, 2024 DOI: 10.1016/j.artint.2024.104100.

In this paper we investigate the properties of representations learned by deep reinforcement learning systems. Much of the early work on representations for reinforcement learning focused on designing fixed-basis architectures to achieve properties thought to be desirable, such as orthogonality and sparsity. In contrast, the idea behind deep reinforcement learning methods is that the agent designer should not encode representational properties, but rather that the data stream should determine the properties of the representation—good representations emerge under appropriate training schemes. In this paper we bring these two perspectives together, empirically investigating the properties of representations that support transfer in reinforcement learning. We introduce and measure six representational properties over more than 25,000 agent-task settings. We consider Deep Q-learning agents with different auxiliary losses in a pixel-based navigation environment, with source and transfer tasks corresponding to different goal locations. We develop a method to better understand why some representations work better for transfer, through a systematic approach varying task similarity and measuring and correlating representation properties with transfer performance. We demonstrate the generality of the methodology by investigating representations learned by a Rainbow agent that successfully transfers across Atari 2600 game modes.
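As a concrete hint of what "measuring representational properties" means, here are textbook versions of two such metrics, sparsity and orthogonality. The paper defines its own six properties; these two are illustrative stand-ins, not the paper's definitions.

```python
# Sketch of two common representational metrics (illustrative versions,
# not the paper's exact six). phi: (num_states, num_features) matrix of
# learned representations.
import numpy as np

def sparsity(phi, eps=1e-8):
    """Fraction of (near-)inactive features, averaged over states."""
    return float((np.abs(phi) < eps).mean())

def orthogonality(phi):
    """1 minus the mean absolute cosine similarity between the
    representations of distinct states (1.0 = fully orthogonal)."""
    unit = phi / (np.linalg.norm(phi, axis=1, keepdims=True) + 1e-12)
    sim = np.abs(unit @ unit.T)
    n = len(phi)
    off_diag = (sim.sum() - np.trace(sim)) / (n * (n - 1))
    return float(1.0 - off_diag)
```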

Graph NNs in RL for improving sample efficiency

Feng Zhang, Chengbin Xuan, Hak-Keung Lam, An obstacle avoidance-specific reinforcement learning method based on fuzzy attention mechanism and heterogeneous graph neural networks, Engineering Applications of Artificial Intelligence, Volume 130, 2024 DOI: 10.1016/j.engappai.2023.107764.

Deep reinforcement learning (RL) is an advancing learning tool for handling robotics control problems. However, it typically suffers from poor sample efficiency and effectiveness. The emergence of Graph Neural Networks (GNNs) enables the integration of RL and graph representation learning techniques, realising outstanding training performance and transfer capability by casting control scenarios into the corresponding graph domain. Nevertheless, existing approaches strongly depend on artificial graph-formation processes with intensive bias and cannot propagate messages discriminatively on explicit physical dependence, which leads to restricted flexibility, limited size-transfer capability and suboptimal performance. This paper proposes a fuzzy attention mechanism-based heterogeneous graph neural network (FAM-HGNN) framework for resolving the control problem under the RL context. The FAM emphasises significant connections and weakens trivial connections in a fully connected graph, which mitigates the potential negative influence caused by the artificial graph-formation process. The HGNN obtains a higher level of relational inductive bias by conducting graph propagation on a masked graph. Experimental results show that FAM-HGNN outperforms multi-layer perceptron-based and existing GNN-based RL approaches regarding training performance and size-transfer capability. We also conducted an ablation study and sensitivity analysis to further validate the efficacy of the proposed method.
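A hedged sketch of the fuzzy-attention idea as read from the abstract (not the paper's FAM-HGNN layer): attention logits over a fully connected graph are gated by a fuzzy membership function, so weak, "trivial" edges are damped before message passing. The Gaussian membership function and all shapes are assumptions.

```python
# Hedged sketch of fuzzy-gated attention over a fully connected graph
# (an illustration, not the paper's FAM-HGNN layer).
import torch
import torch.nn as nn

class FuzzyAttentionLayer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        # Learnable centre/width of a Gaussian fuzzy membership over logits
        # (the membership shape is an assumption made for this sketch).
        self.centre = nn.Parameter(torch.tensor(1.0))
        self.width = nn.Parameter(torch.tensor(1.0))

    def forward(self, h):
        # h: (num_nodes, dim) node features of a fully connected graph.
        logits = self.q(h) @ self.k(h).T / h.shape[-1] ** 0.5
        # Fuzzy gate in [0, 1]: high membership = "significant" connection.
        gate = torch.exp(-((logits - self.centre) / self.width) ** 2)
        attn = torch.softmax(logits, dim=-1) * gate
        attn = attn / (attn.sum(-1, keepdim=True) + 1e-9)  # renormalise
        return attn @ self.v(h)  # messages aggregated per node
```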

Using RL as a framework to study political issues

Lion Schulz, Rahul Bhui, Political reinforcement learners, Trends in Cognitive Sciences, Volume 28, Issue 3, 2024, Pages 210-222 DOI: 10.1016/j.tics.2023.12.001.

Politics can seem home to the most calculating and yet least rational elements of humanity. How might we systematically characterize this spectrum of political cognition? Here, we propose reinforcement learning (RL) as a unified framework to dissect the political mind. RL describes how agents algorithmically navigate complex and uncertain domains like politics. Through this computational lens, we outline three routes to political differences, stemming from variability in agents’ conceptions of a problem, the cognitive operations applied to solve the problem, or the backdrop of information available from the environment. A computational vantage on maladies of the political mind offers enhanced precision in assessing their causes, consequences, and cures.

An object-oriented paradigm to improve transfer learning in RL, i.e., a sort of symbolic abstraction mechanism

Ofir Marom, Benjamin Rosman, Transferable dynamics models for efficient object-oriented reinforcement learning, Artificial Intelligence, Volume 329, 2024 DOI: 10.1016/j.artint.2024.104079.

The Reinforcement Learning (RL) framework offers a general paradigm for constructing autonomous agents that can make effective decisions when solving tasks. An important area of study within the field of RL is transfer learning, where an agent utilizes knowledge gained from solving previous tasks to solve a new task more efficiently. While the notion of transfer learning is conceptually appealing, in practice, not all RL representations are amenable to transfer learning. Moreover, much of the research on transfer learning in RL is purely empirical. Previous research has shown that object-oriented representations are suitable for the purposes of transfer learning with theoretical efficiency guarantees. Such representations leverage the notion of object classes to learn lifted rules that apply to grounded object instantiations. In this paper, we extend previous research on object-oriented representations and introduce two formalisms: the first is based on deictic predicates, and is used to learn a transferable transition dynamics model; the second is based on propositions, and is used to learn a transferable reward dynamics model. In addition, we extend previously introduced efficient learning algorithms for object-oriented representations to our proposed formalisms. Our frameworks are then combined into a single efficient algorithm that learns transferable transition and reward dynamics models across a domain of related tasks. We illustrate our proposed algorithm empirically on an extended version of the Taxi domain, as well as the more difficult Sokoban domain, showing the benefits of our approach with regards to efficient learning and transfer.
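The lifted-rule idea that makes object-oriented representations transferable can be shown with a toy example (illustrative only; the paper's deictic-predicate and propositional formalisms are considerably richer, and its rules are learned rather than hand-written as they are here).

```python
# Toy sketch of lifted rules in object-oriented RL: a rule is defined once
# per object *class* and applies to every grounded *instance*, which is what
# makes the learned dynamics transferable across tasks.
from dataclasses import dataclass

@dataclass
class Obj:
    cls: str     # object class, e.g. "taxi" or "passenger"
    attrs: dict  # grounded attributes, e.g. {"x": 2, "y": 3}

# Lifted rule: (class, action) -> (precondition, effect). Learned from data
# in the real algorithm; hand-written here for the example.
RULES = {
    ("taxi", "move_north"): (
        lambda o, world: o.attrs["y"] < world["height"] - 1,  # precondition
        lambda o: o.attrs.update(y=o.attrs["y"] + 1),         # effect
    ),
}

def apply_action(objects, action, world):
    """Apply a lifted rule to every instance whose class it covers."""
    for o in objects:
        rule = RULES.get((o.cls, action))
        if rule is not None:
            pre, eff = rule
            if pre(o, world):
                eff(o)
    return objects

# Usage: the same "move_north" rule transfers to any taxi in any grid size.
taxis = [Obj("taxi", {"x": 0, "y": 0}), Obj("taxi", {"x": 4, "y": 4})]
apply_action(taxis, "move_north", {"height": 5})
```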

A review of state-of-the-art path planning methods applied to autonomous driving

Mohamed Reda, Ahmed Onsy, Amira Y. Haikal, Ali Ghanbari, Path planning algorithms in the autonomous driving system: A comprehensive review, Robotics and Autonomous Systems, Volume 174, 2024 DOI: 10.1016/j.robot.2024.104630.

This comprehensive review focuses on the Autonomous Driving System (ADS), which aims to reduce the human errors that cause about 95% of car accidents. The ADS consists of six stages: sensors, perception, localization, assessment, path planning, and control. We explain the main state-of-the-art techniques used in each stage, analyzing 275 papers, with 162 specifically on path planning due to its complexity, NP-hard optimization nature, and pivotal role in ADS. This paper categorizes path planning techniques into three primary groups: traditional (graph-based, sampling-based, gradient-based, optimization-based, interpolation curve algorithms), machine and deep learning, and meta-heuristic optimization, detailing their advantages and drawbacks. Findings show that meta-heuristic optimization methods, representing 23% of our study, are preferred for being general problem solvers capable of handling complex problems, with faster convergence and reduced risk of local minima. Machine and deep learning techniques, accounting for 25%, are favored for their learning capabilities and fast responses to known scenarios. The trend towards hybrid algorithms (27%) combines various methods, merging the benefits of each algorithm while offsetting its drawbacks. Moreover, adaptive parameter tuning is crucial for enhancing efficiency and applicability and for balancing search capability. This review sheds light on the future of path planning in autonomous driving systems, helping to tackle current challenges and unlock the full capabilities of autonomous vehicles.
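To make the "traditional, graph-based" family concrete, here is a textbook A* planner on a 4-connected occupancy grid (a standard example, not taken from the review).

```python
# Minimal A* on a 4-connected grid, as a concrete instance of the
# graph-based path planning family discussed in the review.
import heapq

def astar(grid, start, goal):
    """grid: 2D list, 0 = free, 1 = obstacle; start/goal: (row, col)."""
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])  # Manhattan
    frontier = [(h(start), 0, start, None)]  # (f, g, node, parent)
    parents, g_cost = {}, {start: 0}
    while frontier:
        _, g, node, parent = heapq.heappop(frontier)
        if node in parents:           # already expanded with a better cost
            continue
        parents[node] = parent
        if node == goal:              # reconstruct the path back to start
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        r, c = node
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < len(grid) and 0 <= nc < len(grid[0]) \
                    and grid[nr][nc] == 0 \
                    and g + 1 < g_cost.get(nxt, float("inf")):
                g_cost[nxt] = g + 1
                heapq.heappush(frontier, (g + 1 + h(nxt), g + 1, nxt, node))
    return None  # no path exists
```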

Integrating symbolic (common sense) reasoning and probabilistic planning (POMDPs) in robots

Shiqi Zhang, Piyush Khandelwal, Peter Stone, iCORPP: Interleaved commonsense reasoning and probabilistic planning on robots, Robotics and Autonomous Systems, Volume 174, 2024 DOI: 10.1016/j.robot.2023.104613.

Robot sequential decision-making in the real world is a challenge because it requires robots to simultaneously reason about the current world state and dynamics while planning actions to accomplish complex tasks. On the one hand, declarative languages and reasoning algorithms support representing and reasoning with commonsense knowledge, but these algorithms are not good at planning actions toward maximizing cumulative reward over a long, unspecified horizon. On the other hand, probabilistic planning frameworks, such as Markov decision processes (MDPs) and partially observable MDPs (POMDPs), support planning to achieve long-term goals under uncertainty, but they are ill-equipped to represent or reason about knowledge that is not directly related to actions. In this article, we present an algorithm, called iCORPP, to simultaneously estimate the current world state, reason about world dynamics, and construct task-oriented controllers. In this process, robot decision-making problems are decomposed into two interdependent (smaller) subproblems that focus on reasoning to “understand the world” and planning to “achieve the goal”, respectively. The developed algorithm has been implemented and evaluated both in simulation and on real robots using everyday service tasks, such as indoor navigation and dialog management. Results show significant improvements in scalability, efficiency, and adaptiveness, compared to competitive baselines including handcrafted action policies.
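The decomposition can be sketched schematically (an illustration, not the authors' implementation): a commonsense reasoner first estimates the state and shrinks the model to what is task-relevant, and only then does a probabilistic planner solve the reduced problem. The knowledge-base interface and the action set below are hypothetical.

```python
# Schematic sketch of the reason-then-plan decomposition described above
# (not the iCORPP implementation; the knowledge-base interface and the
# action set are hypothetical).
def reason_about_world(knowledge_base, observations):
    """Commonsense reasoning step: estimate the current world state and the
    task-relevant dynamics (e.g., 'the kitchen door is usually closed at
    night'), returning a compact model instead of the full world."""
    relevant_states = [s for s in knowledge_base["states"]
                       if knowledge_base["relevant"](s, observations)]
    transition_probs = knowledge_base["dynamics"](observations)
    return relevant_states, transition_probs

def plan(relevant_states, transition_probs, reward, gamma=0.95, iters=100):
    """Probabilistic planning step: value iteration over the *reduced* model.
    Assumes transition_probs(s, a) maps next states within the reduced model
    to probabilities; planning over the small model is what makes the
    decomposition scale."""
    V = {s: 0.0 for s in relevant_states}
    for _ in range(iters):
        for s in relevant_states:
            V[s] = max(
                sum(p * (reward(s, a, s2) + gamma * V[s2])
                    for s2, p in transition_probs(s, a).items())
                for a in ("move", "ask", "wait")  # hypothetical action set
            )
    return V
```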