Category Archives: Cognitive Sciences

Unexpected consequences of training smart-home systems with reinforcement learning: effects on human behaviours

S. Suman, A. Etemad and F. Rivest, Potential Impacts of Smart Homes on Human Behavior: A Reinforcement Learning Approach, IEEE Transactions on Artificial Intelligence, vol. 3, no. 4, pp. 567-580, Aug. 2022, DOI: 10.1109/TAI.2021.3127483.

Smart homes are becoming increasingly popular as a result of advances in machine learning and cloud computing. Devices such as smart thermostats and speakers are now capable of learning from user feedback and adaptively adjusting their settings to human preferences. Nonetheless, these devices might in turn impact human behavior. To investigate the potential impacts of smart homes on human behavior, we simulate a series of hierarchical-reinforcement-learning-based human models capable of performing various activities, namely setting temperature and humidity for thermal comfort, inside a Q-learning-based smart home model. We then investigate the possibility of the human models' behaviors being altered as a result of the smart home and the human model adapting to one another. Because the human model's activities are based on hierarchical reinforcement learning, the human can learn how long it must continue a given activity and decide when to leave it. We then integrate our human model into the environment along with the smart home model and perform rigorous experiments considering various scenarios, involving a model of a single human and models of two different humans with the smart home. Our experiments show that with the smart home, the human model can exhibit unexpected behaviors such as frequent changing of activities and an increase in the time required to modify the thermal preferences. With two human models, we interestingly observe that certain combinations of models result in normal behaviors, while other combinations exhibit the same unexpected behaviors as those observed in the single-human experiment.
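
To make the coupled adaptation loop concrete, here is a minimal Python sketch of a Q-learning "smart home" adjusting a temperature setpoint against a toy human feedback model. All states, actions, and parameters are illustrative assumptions, not the authors' implementation:

```python
import random
from collections import defaultdict

# Illustrative coupled loop: a Q-learning smart home adjusts a temperature
# setpoint while a simple human model issues feedback rewards.

ACTIONS = [-1, 0, 1]            # lower, keep, raise setpoint (degrees)
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1

Q = defaultdict(float)          # Q[(setpoint, action)]

def human_feedback(setpoint, preferred=22):
    """Toy human model: reward peaks at the preferred temperature."""
    return -abs(setpoint - preferred)

def step(setpoint):
    # epsilon-greedy action selection by the smart home
    if random.random() < EPS:
        a = random.choice(ACTIONS)
    else:
        a = max(ACTIONS, key=lambda b: Q[(setpoint, b)])
    nxt = min(30, max(15, setpoint + a))
    r = human_feedback(nxt)
    # one-step Q-learning update
    best_next = max(Q[(nxt, b)] for b in ACTIONS)
    Q[(setpoint, a)] += ALPHA * (r + GAMMA * best_next - Q[(setpoint, a)])
    return nxt

setpoint = 18
for _ in range(5000):
    setpoint = step(setpoint)
print("learned setpoint:", setpoint)
```

Letting the human model's preference itself adapt to the setpoint, instead of keeping it fixed as above, is what opens the door to the mutual-adaptation effects the paper reports.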

Improving the quality of memory replay in RL through a biologically inspired evolutionary algorithm

M. Ramicic and A. Bonarini, Augmented Memory Replay in Reinforcement Learning With Continuous Control, IEEE Transactions on Cognitive and Developmental Systems, vol. 14, no. 2, pp. 485-496, June 2022, DOI: 10.1109/TCDS.2021.3050723.

Online reinforcement learning agents are currently able to process an increasing amount of data by converting it into higher-order value functions. This expansion of the information collected from the environment increases the agent's state space, enabling it to scale up to more complex problems, but also increases the risk of forgetting by learning on redundant or conflicting data. To improve the approximation of a large amount of data, a random mini-batch of the past experiences stored in the replay memory buffer is often replayed at each learning step. The proposed work takes inspiration from a biological mechanism that acts as a protective layer of higher cognitive functions found in the mammalian brain: active memory consolidation mitigates the effect of forgetting previous memories by dynamically processing the new ones. Similar dynamics are implemented by the proposed augmented memory replay, or AMR, algorithm. The architecture of AMR, based on a simple artificial neural network, is able to provide an augmentation policy which modifies each of the agent's experiences by augmenting their relevance prior to storing them in the replay memory. The function approximator of AMR is evolved using a genetic algorithm in order to obtain the specific augmentation policy function that yields the best performance of a learning agent in a specific environment, as given by its received cumulative reward. Experimental results show that an evolved AMR augmentation function, capable of increasing the significance of specific memories, is able to further increase the stability and convergence speed of learning algorithms dealing with the complexity of continuous action domains.
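
A minimal sketch of the augmented-replay idea, assuming a tiny one-hidden-layer augmentation network and a plain mutation-only genetic loop; the feature choice, shapes, and hyperparameters are all illustrative, not the AMR authors' code:

```python
import random
import numpy as np

# Sketch of an AMR-style augmentation policy: a small network maps experience
# features to a relevance multiplier applied to the reward before the
# transition is stored in the replay buffer.

N_FEAT, N_HID = 5, 8                      # assumes 4-dim states plus reward

def unpack(genome):
    w1 = genome[:N_FEAT * N_HID].reshape(N_FEAT, N_HID)
    w2 = genome[N_FEAT * N_HID:]          # N_HID output weights
    return w1, w2

def augment(genome, features):
    w1, w2 = unpack(genome)
    h = np.tanh(features @ w1)
    return 2.0 / (1.0 + np.exp(-(h @ w2)))   # multiplier in (0, 2)

def store(buffer, genome, transition):
    s, a, r, s_next = transition
    relevance = augment(genome, np.concatenate([s, [r]]))
    buffer.append((s, a, r * relevance, s_next))  # augmented before storage

# Mutation-only genetic loop: a genome is a flat weight vector, its fitness
# the cumulative reward of an agent trained with that augmentation policy.
def evolve(population, fitness_fn, n_gen=50, elite=0.2):
    for _ in range(n_gen):
        ranked = sorted(population, key=fitness_fn, reverse=True)
        parents = ranked[:max(1, int(elite * len(population)))]
        population = [p + np.random.normal(0.0, 0.05, p.shape)
                      for p in random.choices(parents, k=len(population))]
    return max(population, key=fitness_fn)
```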

Modelling the perception of time in the human brain through RL with eligibility traces

I. Lourenço, R. Mattila, R. Ventura and B. Wahlberg, A Biologically Inspired Computational Model of Time Perception, IEEE Transactions on Cognitive and Developmental Systems, vol. 14, no. 2, pp. 258-268, June 2022, DOI: 10.1109/TCDS.2021.3120301.

Time perception (how humans and animals perceive the passage of time) forms the basis for important cognitive skills, such as decision making, planning, and communication. In this work, we propose a framework for examining the mechanisms responsible for time perception. We first model neural time perception as a combination of two known timing sources: internal neuronal mechanisms and external (environmental) stimuli, and design a decision-making framework to replicate them. We then implement this framework in a simulated robot. We measure the robot's success on a temporal discrimination task originally performed by mice to evaluate their capacity to exploit temporal knowledge. We conclude that the robot is able to perceive time similarly to animals when it comes to their intrinsic mechanisms of interpreting time and performing time-aware actions. Next, by analyzing the behavior of agents equipped with the framework, we propose an estimator to infer characteristics of the timing mechanisms intrinsic to the agents. In particular, we show that from their empirical action probability distribution, we are able to estimate parameters used for perceiving time. Overall, our work shows promising results when it comes to drawing conclusions regarding some of the characteristics present in biological timing mechanisms.

NOTE: See also H. Basgol, I. Ayhan and E. Ugur, “Time Perception: A Review on Psychological, Computational, and Robotic Models,” in IEEE Transactions on Cognitive and Developmental Systems, vol. 14, no. 2, pp. 301-315, June 2022, doi: 10.1109/TCDS.2021.3059045.
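
As a pointer to what "RL with eligibility traces" buys in interval-timing models, here is a minimal TD(lambda) sketch over elapsed-time states. The bin count, learning rates, and reward placement are illustrative assumptions, not the paper's model:

```python
import numpy as np

# TD(lambda) over elapsed-time "states": eligibility traces let the reward
# at the trained interval propagate credit back across recently visited bins.

N_BINS, ALPHA, GAMMA, LAM = 20, 0.1, 0.95, 0.8
V = np.zeros(N_BINS)                # value of each elapsed-time bin

for episode in range(500):
    e = np.zeros(N_BINS)            # eligibility trace per state
    for t in range(N_BINS - 1):
        r = 1.0 if t == N_BINS - 2 else 0.0   # reward at the trained interval
        delta = r + GAMMA * V[t + 1] - V[t]   # TD error
        e[t] += 1.0                           # accumulating trace
        V += ALPHA * delta * e                # credit all recent bins
        e *= GAMMA * LAM                      # traces decay with time

print(np.round(V, 2))   # values ramp up toward the expected reward time
```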

Dealing with exploration, with a nice introduction to the problem

Jiayi Lu, Shuai Han, Shuai Lü, Meng Kang, Junwei Zhang, Sampling diversity driven exploration with state difference guidance, Expert Systems with Applications, Volume 203, 2022, DOI: 10.1016/j.eswa.2022.117418.

Exploration is one of the key issues of deep reinforcement learning, especially in environments with sparse or deceptive rewards. Exploration based on intrinsic rewards can handle these environments. However, these methods cannot take both global interaction dynamics and local environment changes into account simultaneously. In this paper, we propose a novel intrinsic reward for off-policy learning, which not only encourages the agent to take actions not fully learned from a global perspective, but also instructs the agent to trigger remarkable changes in the environment from a local perspective. Meanwhile, we propose the double-actors–double-critics framework to combine intrinsic rewards with extrinsic rewards, avoiding the inappropriate combination of intrinsic and extrinsic rewards in previous methods. This framework can be applied to off-policy learning algorithms based on the actor–critic method. We provide a comprehensive evaluation of our approach on the MuJoCo benchmark environments. The results demonstrate that our method can perform effective exploration in environments with dense, deceptive and sparse rewards. Besides, we conduct sufficient ablation and quantitative analyses of intrinsic rewards. Furthermore, we also verify the superiority and rationality of our double-actors–double-critics framework through comparative experiments.
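
One common way to avoid naively summing the two reward streams is to keep separate critics for the extrinsic and intrinsic returns and blend their value estimates only in the actor objective. The PyTorch fragment below illustrates that generic idea; the dimensions and blending weight are assumptions, and this is not the paper's exact double-actors–double-critics design:

```python
import torch
import torch.nn as nn

# Separate critics per reward stream, so extrinsic and intrinsic targets
# are never mixed into a single learned value function.

def make_critic(obs_dim, act_dim):
    return nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.ReLU(),
                         nn.Linear(64, 1))

obs_dim, act_dim = 8, 2
critic_ext = make_critic(obs_dim, act_dim)   # trained on extrinsic reward
critic_int = make_critic(obs_dim, act_dim)   # trained on intrinsic reward

def actor_objective(obs, action, beta=0.5):
    """Actor maximises a weighted blend of the two value estimates."""
    x = torch.cat([obs, action], dim=-1)
    return (critic_ext(x) + beta * critic_int(x)).mean()

# Usage: the actor's loss is the negated blended objective.
obs, act = torch.randn(32, obs_dim), torch.randn(32, act_dim)
loss = -actor_objective(obs, act)
```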

Increasing exploration when the agent performs worse and decreasing it when performing better, in the context of DQN for distributing computation among cloud and edge servers; also dealing with the hybridization of RL with fuzzy logic

Do Bao Son, Ta Huu Binh, Hiep Khac Vo, Binh Minh Nguyen, Huynh Thi Thanh Binh, Shui Yu, Value-based reinforcement learning approaches for task offloading in Delay Constrained Vehicular Edge Computing, Engineering Applications of Artificial Intelligence, Volume 113, 2022, DOI: 10.1016/j.engappai.2022.104898.

In the age of booming information technology, human beings have witnessed the need for new paradigms with both high computational capability and low latency. A potential solution is Vehicular Edge Computing (VEC). Previous work proposed a Fuzzy Deep Q-Network in Offloading scheme (FDQO) that combines fuzzy rules and a Deep Q-Network (DQN) to improve DQN's early performance by using a Fuzzy Controller (FC). However, we notice that frequent usage of the FC can hinder the future performance growth of the model. One way to overcome this issue is to remove the Fuzzy Controller entirely. We introduce an algorithm called baseline DQN (b-DQN), represented by its two variants Static baseline DQN (Sb-DQN) and Dynamic baseline DQN (Db-DQN), to modify the exploration rate based on the average rewards of the closest observations. Our findings confirm that these baseline DQN algorithms surpass traditional DQN models in terms of average Quality of Experience (QoE) over 100 time slots by about 6%, but still suffer from poor early performance (such as in the first 5 time slots). Here, we introduce baseline FDQO (b-FDQO). This algorithm has a strategy to modify the fuzzy logic usage instead of removing it entirely, while still observing the rewards to modify the exploration rate. It brings a higher average QoE in the first 5 time slots compared to other non-fuzzy-logic algorithms by at least 55.12%, prevents the model from producing excessively poor results over all time slots, and has late performance as good as that of b-DQN.
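
The exploration schedule in the headline reduces to a few lines: raise epsilon when recent rewards fall below a longer-run baseline and lower it otherwise. A sketch, with the window sizes, step, and bounds as illustrative assumptions rather than the paper's settings:

```python
from collections import deque

# Reward-driven exploration rate: explore more when the agent is doing
# worse than its own recent history, exploit more when doing better.

class AdaptiveEpsilon:
    def __init__(self, window=50, step=0.01, eps=0.5, lo=0.05, hi=1.0):
        self.recent = deque(maxlen=window)        # short-term rewards
        self.baseline = deque(maxlen=10 * window) # long-run baseline
        self.step, self.eps, self.lo, self.hi = step, eps, lo, hi

    def update(self, reward):
        self.recent.append(reward)
        self.baseline.append(reward)
        recent_avg = sum(self.recent) / len(self.recent)
        base_avg = sum(self.baseline) / len(self.baseline)
        if recent_avg < base_avg:      # doing worse: explore more
            self.eps = min(self.hi, self.eps + self.step)
        else:                          # doing better: exploit more
            self.eps = max(self.lo, self.eps - self.step)
        return self.eps
```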

Interesting summary of photovoltaic modelling

Serhat Duman, Hamdi Tolga Kahraman, Yusuf Sonmez, Ugur Guvenc, Mehmet Kati, Sefa Aras, A powerful meta-heuristic search algorithm for solving global optimization and real-world solar photovoltaic parameter estimation problems, Engineering Applications of Artificial Intelligence, Volume 111, 2022, DOI: 10.1016/j.engappai.2022.104763.

The teaching-learning-based artificial bee colony (TLABC) is a new hybrid swarm-based metaheuristic search algorithm. It combines the exploitation of teaching-learning-based optimization (TLBO) with the exploration of the artificial bee colony (ABC). With the hybridization of these two nature-inspired swarm intelligence algorithms, a robust method has been proposed to solve global optimization problems. However, as with other swarm-based algorithms, it is a great challenge for the TLABC method to effectively simulate the selection process. Fitness-distance balance (FDB) is a powerful, recently developed method to effectively imitate the selection process in nature. In this study, the three search phases of the TLABC algorithm were redesigned using the FDB method. In this way, the FDB-TLABC algorithm, which imitates nature more effectively and has a robust search performance, was developed. To investigate the exploitation, exploration, and balanced search capabilities of the proposed algorithm, it was tested on standard and complex benchmark suites (Classic, IEEE CEC 2014, IEEE CEC 2017, and IEEE CEC 2020). In order to verify the performance of the proposed FDB-TLABC on global optimization problems and on the photovoltaic parameter estimation problem (a constrained real-world engineering problem), a very comprehensive experimental study was carried out according to IEEE CEC standards. Statistical analysis results confirmed that the proposed FDB-TLABC provided the best optimum solution and yielded superior performance compared to other optimization methods.
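
For readers unfamiliar with fitness-distance balance, the scoring itself is simple: blend each candidate's normalised fitness with its normalised distance to the current best solution, then select on the blended score. A sketch with equal weights; the weighting and normalisation details are assumptions, not necessarily FDB-TLABC's exact formulation:

```python
import numpy as np

# FDB scoring sketch: selection favours candidates that are both fit and
# far from the incumbent best, balancing exploitation against diversity.

def fdb_scores(population, fitness, minimise=True, w=0.5):
    f = np.asarray(fitness, dtype=float)
    if minimise:
        f = f.max() - f                          # flip so higher is better
    norm_f = (f - f.min()) / (f.max() - f.min() + 1e-12)
    best = population[np.argmax(norm_f)]
    d = np.linalg.norm(population - best, axis=1)
    norm_d = (d - d.min()) / (d.max() - d.min() + 1e-12)
    return w * norm_f + (1 - w) * norm_d         # select on the blended score

population = np.random.rand(30, 5)               # 30 candidates, 5 parameters
scores = fdb_scores(population, fitness=population.sum(axis=1))
```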

Action selection strategy for model-free RL based on neurophysiology

D. Wang, S. Chen, Y. Hu, L. Liu and H. Wang, Behavior Decision of Mobile Robot With a Neurophysiologically Motivated Reinforcement Learning Model, IEEE Transactions on Cognitive and Developmental Systems, vol. 14, no. 1, pp. 219-233, March 2022, DOI: 10.1109/TCDS.2020.3035778.

Online model-free reinforcement learning (RL) approaches play a crucial role in coping with real-world applications, such as behavioral decision making in robotics. How to balance the exploration and exploitation processes is a central problem in RL. A balanced ratio of exploration/exploitation has a great influence on the total learning time and the quality of the learned strategy. Therefore, various action selection policies have been presented to obtain a balance between the exploration and exploitation procedures. However, these approaches are rarely regulated automatically and dynamically in response to environment variations. One of the most amazing self-adaptation mechanisms in animals is their capacity to dynamically switch between exploration and exploitation strategies. This article proposes a novel neurophysiologically motivated model which simulates the role of the medial prefrontal cortex (MPFC) and lateral prefrontal cortex (LPFC) in behavior decision. The sensory input is transmitted to the MPFC; the ventral tegmental area (VTA) receives a reward and calculates a dopaminergic reinforcement signal; and the feedback categorization neurons in the anterior cingulate cortex (ACC) calculate the vigilance according to the dopaminergic reinforcement signal. The vigilance is then transmitted to the LPFC to regulate the exploration rate, and finally the exploration rate is transmitted to the thalamus to calculate the corresponding action probability. This action selection mechanism is introduced to the actor–critic model of the basal ganglia and combined with a cerebellum model based on the developmental network to construct a new hybrid neuromodulatory model that selects the action of the agent. Both the simulation comparison with four other traditional action selection policies and the physical experiment results demonstrate the potential of the proposed neuromodulatory model in action selection.
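
Stripped of the anatomy, the loop amounts to a vigilance signal, driven by the dopaminergic prediction error, setting the softmax temperature of action selection. A toy sketch; the mapping from prediction error to temperature is an illustrative assumption, not the paper's model:

```python
import numpy as np

# Toy vigilance-modulated action selection: large unsigned reward prediction
# errors raise vigilance, vigilance raises the softmax temperature, and a
# flatter action distribution means more exploration.

def update_vigilance(vigilance, rpe, gain=0.5, decay=0.9):
    """Leaky integration of the unsigned reward prediction error."""
    return decay * vigilance + gain * abs(rpe)

def select_action(q_values, vigilance, rng=np.random.default_rng()):
    tau = 0.1 + vigilance                # high vigilance -> flat distribution
    z = q_values / tau
    p = np.exp(z - z.max())              # numerically stable softmax
    p /= p.sum()
    return rng.choice(len(q_values), p=p)
```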

The brain as a communication network

John D. Mollon, Chie Takahashi, Marina V. Danilova, What kind of network is the brain? Trends in Cognitive Sciences, Volume 26, Issue 4, 2022, Pages 312-324, DOI: 10.1016/j.tics.2022.01.007.

The different areas of the cerebral cortex are linked by a network of white matter, comprising the myelinated axons of pyramidal cells. Is this network a neural net, in the sense that representations of the world are embodied in the structure of the net, its pattern of nodes, and connections? Or is it a communications network, where the same physical substrate carries different information from moment to moment? This question is part of the larger question of whether the brain is better modeled by connectionism or by symbolic artificial intelligence (AI), but we review it in the specific context of the psychophysics of stimulus comparison and the format and protocol of information transmission over the long-range tracts of the brain.

A hypothesis that human perception can only happen in real time if prediction mechanisms run ahead to bridge the gap caused by the processing of inputs, which cannot actually be done in real time (plus further post-processing and adjustment of past perceptions)

Hinze Hogendoorn, Perception in real-time: predicting the present, reconstructing the past, Trends in Cognitive Sciences, Volume 26, Issue 2, 2022, DOI: 10.1016/j.tics.2021.11.003.

We feel that we perceive events in the environment as they unfold in real-time. However, this intuitive view of perception is impossible to implement in the nervous system due to biological constraints such as neural transmission delays. I propose a new way of thinking about real-time perception: at any given moment, instead of representing a single timepoint, perceptual mechanisms represent an entire timeline. On this timeline, predictive mechanisms predict ahead to compensate for delays in incoming sensory input, and reconstruction mechanisms retroactively revise perception when those predictions do not come true. This proposal integrates and extends previous work to address a crucial gap in our understanding of a fundamental aspect of our everyday life: the experience of perceiving the present.
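
As a concrete reading of the proposal, one can picture the timeline as a buffer written at two ends: delayed samples reconstruct the past, and an extrapolation rule fills in the entries up to the present. A toy sketch; the fixed delay and the linear extrapolation are illustrative assumptions, not the author's model:

```python
# Timeline buffer: incoming sensory samples arrive DELAY steps late, a
# predictive step extrapolates to the present, and late-arriving samples
# retroactively revise earlier entries.

DELAY = 3                       # transmission delay, in time steps

class Timeline:
    def __init__(self):
        self.estimates = {}     # time step -> estimated stimulus value

    def receive(self, t_now, sample):
        t_sampled = t_now - DELAY
        self.estimates[t_sampled] = sample           # reconstruct the past
        # predict forward from the two most recent confirmed samples
        prev = self.estimates.get(t_sampled - 1, sample)
        rate = sample - prev
        for k in range(1, DELAY + 1):
            self.estimates[t_sampled + k] = sample + rate * k

    def present(self, t_now):
        return self.estimates.get(t_now)             # the "perceived now"
```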