Tag Archives: Deep Neural Networks

An interesting survey, written before the “generative AI” boom, of the integration of sub-symbolic systems (for learning) with symbolic systems (for reasoning)

Artur d’Avila Garcez, Luis C. Lamb, Neurosymbolic AI: The 3rd Wave, arXiv:2012.05876 [cs.AI], https://arxiv.org/abs/2012.05876v2.

Current advances in Artificial Intelligence (AI) and Machine Learning (ML) have achieved unprecedented impact across research communities and industry. Nevertheless, concerns about trust, safety, interpretability and accountability of AI were raised by influential thinkers. Many have identified the need for well-founded knowledge representation and reasoning to be integrated with deep learning and for sound explainability. Neural-symbolic computing has been an active area of research for many years seeking to bring together robust learning in neural networks with reasoning and explainability via symbolic representations for network models. In this paper, we relate recent and early research results in neurosymbolic AI with the objective of identifying the key ingredients of the next wave of AI systems. We focus on research that integrates in a principled way neural network-based learning with symbolic knowledge representation and logical reasoning. The insights provided by 20 years of neural-symbolic computing are shown to shed new light onto the increasingly prominent role of trust, safety, interpretability and accountability of AI. We also identify promising directions and challenges for the next decade of AI research from the perspective of neural-symbolic systems.

Equivalence between Transformers and SVMs

Davoud Ataee Tarzanagh, Yingcong Li, Christos Thrampoulidis, Samet Oymak, Transformers as Support Vector Machines, arXiv:2308.16898 [cs.LG], https://arxiv.org/abs/2308.16898.

Since its inception in “Attention Is All You Need”, transformer architecture has led to revolutionary advancements in NLP. The attention layer within the transformer admits a sequence of input tokens X and makes them interact through pairwise similarities computed as softmax(XQK⊤X⊤), where (K,Q) are the trainable key-query parameters. In this work, we establish a formal equivalence between the optimization geometry of self-attention and a hard-margin SVM problem that separates optimal input tokens from non-optimal tokens using linear constraints on the outer-products of token pairs. This formalism allows us to characterize the implicit bias of 1-layer transformers optimized with gradient descent: (1) Optimizing the attention layer with vanishing regularization, parameterized by (K,Q), converges in direction to an SVM solution minimizing the nuclear norm of the combined parameter W=KQ⊤. Instead, directly parameterizing by W minimizes a Frobenius norm objective. We characterize this convergence, highlighting that it can occur toward locally-optimal directions rather than global ones. (2) Complementing this, we prove the local/global directional convergence of gradient descent under suitable geometric conditions. Importantly, we show that over-parameterization catalyzes global convergence by ensuring the feasibility of the SVM problem and by guaranteeing a benign optimization landscape devoid of stationary points. (3) While our theory applies primarily to linear prediction heads, we propose a more general SVM equivalence that predicts the implicit bias with nonlinear heads. Our findings are applicable to arbitrary datasets and their validity is verified via experiments. We also introduce several open problems and research directions. We believe these findings inspire the interpretation of transformers as a hierarchy of SVMs that separates and selects optimal tokens.
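As a concrete reading of the formula in the abstract, here is a minimal NumPy sketch (shapes and values hypothetical, value projection omitted) of the pairwise similarity matrix softmax(XQK⊤X⊤); note that XQK⊤X⊤ = XW⊤X⊤ for the combined parameter W = KQ⊤ the paper analyzes.

```python
import numpy as np

def attention_scores(X, Q, K):
    """X: (T, d) token matrix; Q, K: (d, m) trainable query/key parameters."""
    S = X @ Q @ K.T @ X.T                      # pairwise similarities, (T, T)
    S = S - S.max(axis=-1, keepdims=True)      # stabilize the softmax
    P = np.exp(S)
    return P / P.sum(axis=-1, keepdims=True)   # row-wise softmax

T, d, m = 5, 8, 4
rng = np.random.default_rng(0)
X = rng.normal(size=(T, d))
Q = rng.normal(size=(d, m))
K = rng.normal(size=(d, m))
A = attention_scores(X, Q, K)                  # (5, 5) row-stochastic matrix
```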

Leveraging the unexplainability and opacity of NNs to generate random numbers

Y. Almardeny, A. Benavoli, N. Boujnah and E. Naredo, A Reinforcement Learning System for Generating Instantaneous Quality Random Sequences, IEEE Transactions on Artificial Intelligence, vol. 4, no. 3, pp. 402-415, June 2023, DOI: 10.1109/TAI.2022.3161893.

Random numbers are essential to most computer applications. Still, producing high-quality random sequences is a big challenge. Inspired by the success of artificial neural networks and reinforcement learning, we propose a novel and effective end-to-end learning system to generate pseudorandom sequences that operates under the upside-down reinforcement learning framework. It is based on manipulating the generalized information entropy metric to derive commands that instantaneously guide the agent toward the optimal random behavior. Using a wide range of evaluation tests, the proposed approach is compared against three state-of-the-art accredited pseudorandom number generators (PRNGs). The experimental results agree with our theoretical study and show that the proposed framework is a promising candidate for a wide range of applications.
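The abstract does not spell out the exact form of the generalized entropy metric; purely as an illustration of the flavor of guidance described, the sketch below scores a generated bit window with a Rényi entropy and uses its gap to the maximum achievable entropy as a reward-like signal.

```python
import numpy as np

def renyi_entropy(window, alpha=2.0, k=2):
    """Renyi entropy of order alpha for a sequence over k symbols."""
    counts = np.bincount(window, minlength=k)
    p = counts / counts.sum()
    p = p[p > 0]
    if np.isclose(alpha, 1.0):                 # Shannon limit as alpha -> 1
        return float(-(p * np.log2(p)).sum())
    return float(np.log2((p ** alpha).sum()) / (1.0 - alpha))

def randomness_reward(window, alpha=2.0, k=2):
    """Highest (zero) when the window's entropy reaches the maximum log2(k)."""
    return renyi_entropy(window, alpha, k) - np.log2(k)

rng = np.random.default_rng(1)
good = rng.integers(0, 2, size=1024)           # near-uniform bits -> reward ~ 0
bad = np.zeros(1024, dtype=int)                # constant sequence -> reward -1
print(randomness_reward(good), randomness_reward(bad))
```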

Embedding established physical knowledge into Deep Learning to improve its reliability

Michael Lutter, Jan Peters, Combining physics and deep learning to learn continuous-time dynamics models, The International Journal of Robotics Research, 2023;42(3):83-107, DOI: 10.1177/02783649231169492.

Deep learning has been widely used within learning algorithms for robotics. One disadvantage of deep networks is that these networks are black-box representations. Therefore, the learned approximations ignore the existing knowledge of physics or robotics. Especially for learning dynamics models, these black-box models are not desirable as the underlying principles are well understood and the standard deep networks can learn dynamics that violate these principles. To learn dynamics models with deep networks that guarantee physically plausible dynamics, we introduce physics-inspired deep networks that combine first principles from physics with deep learning. We incorporate Lagrangian mechanics within the model learning such that all approximated models adhere to the laws of physics and conserve energy. Deep Lagrangian Networks (DeLaN) parametrize the system energy using two networks. The parameters are obtained by minimizing the squared residual of the Euler–Lagrange differential equation. Therefore, the resulting model does not require specific knowledge of the individual system, is interpretable, and can be used as a forward, inverse, and energy model. Previously these properties were only obtained when using system identification techniques that require knowledge of the kinematic structure. We apply DeLaN to learning dynamics models and apply these models to control simulated and physical rigid body systems. The results show that the proposed approach obtains dynamics models that can be applied to physical systems for real-time control. Compared to standard deep networks, the physics-inspired models learn better models and capture the underlying structure of the dynamics.
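The following pure-NumPy sketch illustrates the DeLaN idea under toy assumptions (linear stand-ins for the two networks, finite-difference derivatives instead of automatic differentiation): the mass matrix is built from a Cholesky factor so it is positive definite by construction, and the loss is the squared residual of the Euler–Lagrange equation tau = d/dt dL/dqd - dL/dq.

```python
import numpy as np

def mass_matrix(Wl, q):
    """Toy 'network': a linear map from q to a lower-triangular factor."""
    n = q.size
    Lf = np.zeros((n, n))
    Lf[np.tril_indices(n)] = Wl @ q                 # n(n+1)/2 entries
    Lf[np.diag_indices(n)] = np.log1p(np.exp(np.diag(Lf)))  # softplus diagonal
    return Lf @ Lf.T + 1e-6 * np.eye(n)             # H(q) positive definite

def lagrangian(Wl, wv, q, qd):
    return 0.5 * qd @ mass_matrix(Wl, q) @ qd - wv @ q   # toy linear potential

def el_residual(Wl, wv, q, qd, qdd, tau, eps=1e-5):
    n = q.size
    grad = lambda f, x: np.array(
        [(f(x + eps * e) - f(x - eps * e)) / (2 * eps) for e in np.eye(n)])
    dL_dq = grad(lambda q_: lagrangian(Wl, wv, q_, qd), q)
    p = lambda q_, qd_: grad(lambda v: lagrangian(Wl, wv, q_, v), qd_)  # dL/dqd
    # chain rule: d/dt dL/dqd = (dp/dq) qd + (dp/dqd) qdd
    Jq = np.stack([(p(q + eps * e, qd) - p(q - eps * e, qd)) / (2 * eps)
                   for e in np.eye(n)], axis=1)
    Jqd = np.stack([(p(q, qd + eps * e) - p(q, qd - eps * e)) / (2 * eps)
                    for e in np.eye(n)], axis=1)
    tau_pred = Jq @ qd + Jqd @ qdd - dL_dq
    return float(np.sum((tau_pred - tau) ** 2))      # squared E-L residual

n = 2
rng = np.random.default_rng(5)
Wl = rng.normal(size=(3, n))                         # 3 = n(n+1)/2
wv = rng.normal(size=n)
q, qd, qdd, tau = (rng.normal(size=n) for _ in range(4))
print(el_residual(Wl, wv, q, qd, qdd, tau))
```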

Using CNNs trained with image data to predict time series data

Aniello De Santo, Antonino Ferraro, Antonio Galli, Vincenzo Moscato, Giancarlo Sperlì, Evaluating time series encoding techniques for Predictive Maintenance, Expert Systems with Applications, Volume 210, 2022, DOI: 10.1016/j.eswa.2022.118435.

Predictive Maintenance has become an important component in modern industrial scenarios, as a way to minimize down-times and fault rate for different equipment. In this sense, while machine learning and deep learning approaches are promising due to their accurate predictive abilities, their data-heavy requirements make them significantly limited in real world applications. Since one of the main issues to overcome is lack of consistent training data, recent work has explored the possibility of adapting well-known deep-learning models for image recognition, by exploiting techniques to encode time series as images. In this paper, we propose a framework for evaluating some of the best known time series encoding techniques, together with Convolutional Neural Network-based image classifiers applied to predictive maintenance tasks. We conduct an extensive empirical evaluation of these approaches for the failure prediction task on two real-world datasets (PAKDD2020 Alibaba AI OPS Competition and NASA bearings), also comparing their performances with respect to the state-of-the-art approaches. We further discuss advantages and limitations of the exploited models when coupled with proper data augmentation techniques.
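As an example of the encoding family such an evaluation typically covers, the sketch below computes a Gramian Angular Summation Field, one of the best-known time-series-to-image transforms (whether it matches the paper's exact variant is an assumption); the resulting 2D array can be fed to a CNN image classifier.

```python
import numpy as np

def gasf(x):
    """Gramian Angular Summation Field of a 1D series x."""
    x = np.asarray(x, dtype=float)
    # rescale to [-1, 1] so that arccos is defined
    x = 2 * (x - x.min()) / (x.max() - x.min()) - 1
    phi = np.arccos(np.clip(x, -1, 1))             # polar-coordinate angles
    return np.cos(phi[:, None] + phi[None, :])     # G[i, j] = cos(phi_i + phi_j)

series = np.sin(np.linspace(0, 4 * np.pi, 64))
image = gasf(series)        # (64, 64) array, ready for an image classifier
```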

Reducing noise and outliers in time series with singular spectrum analysis, and using deep learning for change detection

Muktesh Gupta, Rajesh Wadhvani, Akhtar Rasool, Real-time Change-Point Detection: A deep neural network-based adaptive approach for detecting changes in multivariate time series data, Expert Systems with Applications, Volume 209, 2022, DOI: 10.1016/j.eswa.2022.118260.

The behavior of a time series may be affected by various factors. Changes in mean, variance, frequency, and auto-correlation are the most common. Change-Point Detection (CPD) aims to track down abrupt statistical characteristic changes in time series that can benefit many applications in different domains. As demonstrated in recently introduced CPD methodologies, deep learning approaches have the potential to identify more subtle changes. However, due to improper handling of data and insufficient training, these methodologies generate more false alarms and are not efficient enough in detecting change-points. In real-time CPD algorithms, preprocessed data plays a vital role in increasing the algorithm’s efficiency and minimizing false alarm rates. Therefore, preprocessing of data should be a part of the algorithm, but in the existing methods, preprocessing of data is done initially, and then the whole dataset is passed to the CPD algorithm. A new three-phase architecture is proposed to address this issue, in which all phases, from preprocessing to CPD, work in an adaptive manner. The phases are integrated into a pipeline, allowing the algorithm to work in real-time. Our proposed strategy performs optimally and consistently based on performance metrics resulting from experiments on real-world datasets and artifacts. This work effectively addresses the issue of non-stationary data normalization using deep learning approaches. To reduce noise and outliers from the data, a recursive version of singular spectrum analysis is introduced. It is demonstrated that the method’s performance has significantly improved by combining adaptive preprocessing with deep learning CPD techniques.
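Plain batch singular spectrum analysis already conveys the denoising building block (the paper's contribution is a recursive, online variant, which is not reproduced here). A minimal NumPy sketch: embed the series in a Hankel trajectory matrix, keep the leading SVD components, and reconstruct by diagonal averaging.

```python
import numpy as np

def ssa_denoise(x, window=20, rank=3):
    x = np.asarray(x, dtype=float)
    N, L = x.size, window
    K = N - L + 1
    X = np.column_stack([x[i:i + L] for i in range(K)])   # Hankel matrix (L, K)
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    Xr = (U[:, :rank] * s[:rank]) @ Vt[:rank]             # low-rank part
    # diagonal averaging (Hankelization) back to a 1D series
    out = np.zeros(N)
    counts = np.zeros(N)
    for j in range(K):
        out[j:j + L] += Xr[:, j]
        counts[j:j + L] += 1
    return out / counts

t = np.linspace(0, 6 * np.pi, 300)
noisy = np.sin(t) + 0.4 * np.random.default_rng(2).normal(size=t.size)
clean = ssa_denoise(noisy, window=40, rank=2)
```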

NOTE: See also C. Ma, L. Zhang, W. Pedrycz and W. Lu, “The Long-Term Prediction of Time Series: A Granular Computing-Based Design Approach,” in IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 52, no. 10, pp. 6326-6338, Oct. 2022, doi: 10.1109/TSMC.2022.3144395.

See also https://babel.isa.uma.es/kipr/?p=1548

Example of a non-NN approach that produces better results in classification tasks than NNs

Zhiying Jiang, Matthew Yang, Mikhail Tsirlin, Raphael Tang, Yiqin Dai, Jimmy Lin, Low-Resource Text Classification: A Parameter-Free Classification Method with Compressors, Findings of the Association for Computational Linguistics: ACL 2023.

Deep neural networks (DNNs) are often used for text classification due to their high accuracy. However, DNNs can be computationally intensive, requiring millions of parameters and large amounts of labeled data, which can make them expensive to use, to optimize, and to transfer to out-of-distribution (OOD) cases in practice. In this paper, we propose a non-parametric alternative to DNNs that’s easy, lightweight, and universal in text classification: a combination of a simple compressor like gzip with a k-nearest-neighbor classifier. Without any training parameters, our method achieves results that are competitive with non-pretrained deep learning methods on six in-distribution datasets. It even outperforms BERT on all five OOD datasets, including four low-resource languages. Our method also excels in the few-shot setting, where labeled data are too scarce to train DNNs effectively.
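The method itself fits in a few lines: a gzip-based normalized compression distance (NCD) combined with a k-nearest-neighbor vote. A minimal sketch (the toy training set below is hypothetical):

```python
import gzip
from collections import Counter

def clen(s: str) -> int:
    """Length of the gzip-compressed UTF-8 encoding of s."""
    return len(gzip.compress(s.encode("utf-8")))

def ncd(a: str, b: str) -> float:
    """Normalized compression distance between two texts."""
    ca, cb, cab = clen(a), clen(b), clen(a + " " + b)
    return (cab - min(ca, cb)) / max(ca, cb)

def classify(text, train, k=3):
    """train: list of (text, label) pairs; majority vote over k neighbors."""
    neighbors = sorted(train, key=lambda tl: ncd(text, tl[0]))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

train = [("the match ended with a late goal", "sports"),
         ("shares fell after the earnings report", "finance"),
         ("the striker scored twice in the derby", "sports"),
         ("the central bank raised interest rates", "finance")]
print(classify("bonds rallied as rates dropped", train, k=3))
```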

Discrete Q-learning used, along with a deep CNN for localization, for mobile robot navigation

Amirhossein Shantia, Rik Timmers, Yiebo Chong, Cornel Kuiper, Francesco Bidoia, Lambert Schomaker, Marco Wiering, Two-stage visual navigation by deep neural networks and multi-goal reinforcement learning, Robotics and Autonomous Systems, Volume 138, 2021, DOI: 10.1016/j.robot.2021.103731.

In this paper, we propose a two-stage learning framework for visual navigation in which the experience of the agent during exploration of one goal is shared to learn to navigate to other goals. We train a deep neural network for estimating the robot’s position in the environment using ground truth information provided by a classical localization and mapping approach. The second simpler multi-goal Q-function learns to traverse the environment by using the provided discretized map. Transfer learning is applied to the multi-goal Q-function from a maze structure to a 2D simulator and is finally deployed in a 3D simulator where the robot uses the estimated locations from the position estimator deep network. In the experiments, we first compare different architectures to select the best deep network for location estimation, and then compare the effects of the multi-goal reinforcement learning method to traditional reinforcement learning. The results show a significant improvement when multi-goal reinforcement learning is used. Furthermore, the results of the location estimator show that a deep network can learn and generalize in different environments using camera images with high accuracy in both position and orientation.
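The multi-goal idea can be illustrated in tabular form (the paper's setting is richer; the states, goals, and rewards below are hypothetical): a single Q-table indexed by (state, goal, action) lets one transition update value estimates for every goal at once.

```python
import numpy as np

def multi_goal_q_update(Q, s, a, s_next, goals, goal_reward,
                        alpha=0.1, gamma=0.95):
    """Q has shape (n_states, n_goals, n_actions); one transition, all goals."""
    for g in goals:
        r = goal_reward(s_next, g)               # e.g. 1.0 on reaching goal g
        done = r > 0
        target = r if done else r + gamma * Q[s_next, g].max()
        Q[s, g, a] += alpha * (target - Q[s, g, a])
    return Q

n_states, n_goals, n_actions = 25, 4, 4
Q = np.zeros((n_states, n_goals, n_actions))
reward = lambda s, g: 1.0 if s == [4, 20, 22, 7][g] else 0.0
Q = multi_goal_q_update(Q, s=3, a=1, s_next=4,
                        goals=range(n_goals), goal_reward=reward)
```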

“Early exit” deep neural networks (i.e., networks that provide outputs at intermediate points)

Scardapane, S., Scarpiniti, M., Baccarelli, E. et al., Why Should We Add Early Exits to Neural Networks?, Cogn Comput 12, 954–966 (2020), DOI: 10.1007/s12559-020-09734-4.

Deep neural networks are generally designed as a stack of differentiable layers, in which a prediction is obtained only after running the full stack. Recently, some contributions have proposed techniques to endow the networks with early exits, allowing predictions to be obtained at intermediate points of the stack. These multi-output networks have a number of advantages, including (i) significant reductions of the inference time, (ii) reduced tendency to overfitting and vanishing gradients, and (iii) capability of being distributed over multi-tier computation platforms. In addition, they connect to the wider themes of biological plausibility and layered cognitive reasoning. In this paper, we provide a comprehensive introduction to this family of neural networks, by describing in a unified fashion the way these architectures can be designed, trained, and actually deployed in time-constrained scenarios. We also describe in-depth their application scenarios in 5G and Fog computing environments, as well as some of the open research questions connected to them.
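In the simplest case, early-exit inference amounts to running the stack stage by stage and returning from the first auxiliary head whose softmax confidence clears a threshold. A schematic NumPy sketch with hypothetical linear stages and heads:

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def early_exit_predict(x, stages, exit_heads, threshold=0.9):
    """stages: feature functions; exit_heads: per-stage classifiers (logits)."""
    h = x
    for depth, (stage, head) in enumerate(zip(stages, exit_heads)):
        h = stage(h)
        p = softmax(head(h))
        if p.max() >= threshold or depth == len(stages) - 1:
            return int(p.argmax()), depth      # prediction + exit point

rng = np.random.default_rng(3)
stages = [lambda h, W=rng.normal(size=(16, 16)): np.tanh(W @ h)
          for _ in range(3)]
heads = [lambda h, W=rng.normal(size=(5, 16)): W @ h for _ in range(3)]
label, exit_at = early_exit_predict(rng.normal(size=16), stages, heads)
print(label, exit_at)   # deeper stages are skipped once confidence is high
```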

Simultaneous localization, mapping and semantic labelling in mobile robots

Akira Taniguchi, Yoshinobu Hagiwara, Tadahiro Taniguchi, Tetsunari Inamura, Improved and scalable online learning of spatial concepts and language models with mapping, Autonomous Robots 44(6), 2020, DOI: 10.1007/s10514-020-09905-0.

We propose a novel online learning algorithm, called SpCoSLAM 2.0, for spatial concepts and lexical acquisition with high accuracy and scalability. Previously, we proposed SpCoSLAM as an online learning algorithm based on an unsupervised Bayesian probabilistic model that integrates multimodal place categorization, lexical acquisition, and SLAM. However, our original algorithm had limited estimation accuracy owing to the influence of the early stages of learning, and increased computational complexity with added training data. Therefore, we introduce techniques such as fixed-lag rejuvenation to reduce the calculation time while maintaining an accuracy higher than that of the original algorithm. The results show that, in terms of estimation accuracy, the proposed algorithm exceeds the original algorithm and is comparable to batch learning. In addition, the calculation time of the proposed algorithm does not depend on the amount of training data and becomes constant for each step of the scalable algorithm. Our approach will contribute to the realization of long-term spatial language interactions between humans and robots.
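Fixed-lag rejuvenation can be sketched generically (this is a schematic particle-filter skeleton, not SpCoSLAM itself): after resampling, only the latent assignments inside the last `lag` steps are re-drawn, so per-step cost stays constant instead of growing with the accumulated data. The `redraw` callable below is a hypothetical stand-in for re-sampling from the model posterior.

```python
import numpy as np

rng = np.random.default_rng(4)

def rejuvenation_step(particles, log_w, lag, redraw):
    """particles: (P, T) int array of per-step latent assignments."""
    P, T = particles.shape
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    particles = particles[rng.choice(P, size=P, p=w)]   # resample
    for s in range(max(0, T - lag), T):                 # rejuvenate only the
        particles[:, s] = redraw(particles, s)          # last `lag` steps
    return particles

# dummy usage: re-draw assignments uniformly at random inside the lag window
parts = rng.integers(0, 3, size=(8, 20))
logw = rng.normal(size=8)
parts = rejuvenation_step(parts, logw, lag=5,
                          redraw=lambda p, s: rng.integers(0, 3, size=p.shape[0]))
```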