Estimating the bandwidth of a communication channel to adjust the bitrate in high-definition video streaming, using Pareto and Gamma distributions (which are conjugate) in a Bayesian estimation framework

Javadtalab, A.; Semsarzadeh, M.; Khanchi, A.; Shirmohammadi, S.; Yassine, A., Continuous One-Way Detection of Available Bandwidth Changes for Video Streaming Over Best-Effort Networks, IEEE Transactions on Instrumentation and Measurement, vol. 64, no. 1, pp. 190-203, Jan. 2015. DOI: 10.1109/TIM.2014.2331423

Video streaming over best-effort networks, such as the Internet, is now a significant application used by most Internet users. However, best-effort networks are characterized by dynamic and unpredictable changes in the available bandwidth, which adversely affect the quality of video. As such, it is important to have real-time detection mechanisms of bandwidth changes to ensure that video is adapted to the available bandwidth and transmitted at the highest quality. In this paper, we propose a Bayesian instantaneous end-to-end bandwidth change prediction model and method to detect and predict one-way bandwidth changes at the receiver. Unlike existing congestion detection mechanisms, which use network parameters such as packet loss probability, round trip time (RTT), or jitter, our approach uses the weighted interarrival time of video packets at the receiver side. Furthermore, our approach is continuous, since it measures available bandwidth changes with each incoming video packet, and therefore detects congestion occurrence in less than 200 ms, on average, which is significantly faster than existing approaches. In addition, it is a one-way scheme, since it only takes into account the characteristics of the incoming path and not the outgoing path, as opposed to other approaches, which use RTT and are hence less accurate. In this paper, we provide extensive experimental simulations and a real-world network implementation. Our results indicate that the proposed detection method is superior to existing solutions.
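A side note to make the conjugacy concrete: with the Pareto minimum x_m known, a Gamma(a, b) prior on the Pareto shape parameter α stays Gamma after observing Pareto-distributed samples, with posterior Gamma(a + n, b + Σ ln(x_i/x_m)). A minimal sketch of that update (my own illustration under these assumptions, not the paper's weighted-interarrival scheme; all names are mine):

```python
import math

class ParetoGammaEstimator:
    """Bayesian estimation of the Pareto shape parameter alpha with a
    conjugate Gamma prior. The Pareto minimum x_m (smallest plausible
    packet interarrival time) is assumed known."""

    def __init__(self, x_m, a=1.0, b=1.0):
        self.x_m = x_m
        self.a = a  # Gamma shape: pseudo-count of observations
        self.b = b  # Gamma rate: pseudo-sum of log-ratios

    def update(self, x):
        """Conjugate update with one sample x >= x_m:
        Gamma(a, b) -> Gamma(a + 1, b + ln(x / x_m))."""
        self.a += 1.0
        self.b += math.log(x / self.x_m)

    def shape_mean(self):
        """Posterior mean of alpha."""
        return self.a / self.b

    def expected_interarrival(self):
        """Pareto mean alpha * x_m / (alpha - 1), evaluated at the
        posterior mean of alpha (finite only for alpha > 1). A rising
        value suggests the available bandwidth has dropped."""
        alpha = self.shape_mean()
        return alpha * self.x_m / (alpha - 1.0) if alpha > 1.0 else math.inf

# Usage with packet interarrival times in seconds:
est = ParetoGammaEstimator(x_m=1e-3)
for dt in (1.2e-3, 1.5e-3, 1.1e-3, 1.3e-3, 1.2e-3):
    est.update(dt)
print(est.shape_mean(), est.expected_interarrival())
```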

A good review of related work on graph-based SLAM algorithms that apply reduction techniques to the graph to improve long-term operation, together with the proposal of a new reduction method

Carlevaris-Bianco, N.; Kaess, M.; Eustice, R.M., Generic Node Removal for Factor-Graph SLAM, IEEE Transactions on Robotics, vol. 30, no. 6, pp. 1371-1385, Dec. 2014. DOI: 10.1109/TRO.2014.2347571

This paper reports on a generic factor-based method for node removal in factor-graph simultaneous localization and mapping (SLAM), which we call generic linear constraints (GLCs). The need for a generic node removal tool is motivated by long-term SLAM applications, whereby nodes are removed in order to control the computational cost of graph optimization. GLC is able to produce a new set of linearized factors over the elimination clique that can represent either the true marginalization (i.e., dense GLC) or a sparse approximation of the true marginalization using a Chow-Liu tree (i.e., sparse GLC). The proposed algorithm improves upon commonly used methods in two key ways: First, it is not limited to graphs with strictly full-state relative-pose factors and works equally well with other low-rank factors, such as those produced by monocular vision. Second, the new factors are produced in such a way that accounts for measurement correlation, which is a problem encountered in other methods that rely strictly upon pairwise measurement composition. We evaluate the proposed method over multiple real-world SLAM graphs and show that it outperforms other recently proposed methods in terms of Kullback–Leibler divergence. Additionally, we experimentally demonstrate that the proposed GLC method provides a principled and flexible tool to control the computational complexity of long-term graph SLAM, with results shown for 34.9 h of real-world indoor–outdoor data covering 147.4 km collected over 27 mapping sessions spanning a period of 15 months.
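For my own reference, the dense half of the idea is plain marginalization of the information matrix via a Schur complement over the elimination clique; a linear factor can then be rebuilt from a (possibly low-rank) square root of that marginal. A small numpy sketch of those two steps (my illustration, not the authors' code; the Chow-Liu sparsification and relinearization details are omitted):

```python
import numpy as np

def marginalize(Lambda, idx_remove):
    """Remove variables idx_remove from information matrix Lambda via
    the Schur complement; returns the dense marginal information over
    the remaining (elimination-clique) variables."""
    keep = np.setdiff1d(np.arange(Lambda.shape[0]), idx_remove)
    L_xx = Lambda[np.ix_(idx_remove, idx_remove)]
    L_cx = Lambda[np.ix_(keep, idx_remove)]
    L_cc = Lambda[np.ix_(keep, keep)]
    return L_cc - L_cx @ np.linalg.solve(L_xx, L_cx.T)

def linear_factor(Lambda_marg, eps=1e-9):
    """Rebuild a linear factor G with G.T @ G == Lambda_marg, keeping
    only non-negligible eigen-directions so that low-rank information
    (e.g., from monocular vision) is handled gracefully."""
    w, V = np.linalg.eigh(Lambda_marg)
    mask = w > eps
    return np.sqrt(w[mask])[:, None] * V[:, mask].T

# Tiny example: three scalar variables, remove the middle one.
Lam = np.array([[4.0, 1.0, 0.0],
                [1.0, 3.0, 1.0],
                [0.0, 1.0, 2.0]])
Lam_marg = marginalize(Lam, np.array([1]))
G = linear_factor(Lam_marg)
assert np.allclose(G.T @ G, Lam_marg)
```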

A good summary and classification of state-of-the-art motion planning algorithms, and the proposal of a new one that improves the expected computational cost

Rickert, M.; Sieverling, A.; Brock, O., Balancing Exploration and Exploitation in Sampling-Based Motion Planning, IEEE Transactions on Robotics, vol. 30, no. 6, pp. 1305-1317, Dec. 2014. DOI: 10.1109/TRO.2014.2340191

We present the exploring/exploiting tree (EET) algorithm for motion planning. The EET planner deliberately trades probabilistic completeness for computational efficiency. This tradeoff enables the EET planner to outperform state-of-the-art sampling-based planners by up to three orders of magnitude. We show that these considerable speedups apply for a variety of challenging real-world motion planning problems. The performance improvements are achieved by leveraging workspace information to continuously adjust the sampling behavior of the planner. When the available information captures the planning problem’s inherent structure, the planner’s sampler becomes increasingly exploitative. When the available information is less accurate, the planner automatically compensates by increasing local configuration space exploration. We show that active balancing of exploration and exploitation based on workspace information can be a key ingredient to enabling highly efficient motion planning in practical scenarios.
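The adaptation idea is easy to caricature: keep a single exploration weight, draw samples either from the workspace guidance or uniformly, and shift the weight after every success or failure. A toy sketch (guided_sample, uniform_sample, and extend are hypothetical hooks; the actual EET adapts per workspace region and can degenerate to RRT-like behavior):

```python
import random

def eet_style_iteration(tree, guided_sample, uniform_sample, extend, gamma):
    """One loop iteration of an EET-flavored planner (simplified).
    gamma in [0, 1] is the exploration weight: low gamma exploits the
    workspace guidance, high gamma falls back to uniform exploration."""
    explore = random.random() < gamma
    target = uniform_sample() if explore else guided_sample()
    if extend(tree, target):             # the tree grew toward the target
        gamma = max(0.0, gamma * 0.9)    # success: become more exploitative
    else:
        gamma = min(1.0, gamma + 0.1)    # failure: explore more
    return gamma
```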

SLAM posed as a least-squares optimization problem, with the computational cost reduced by using spherical covariance matrices that approximate the original, sparse ones

Heng Wang, Shoudong Huang, Kasra Khosoussi, Udo Frese, Gamini Dissanayake, Bingbing Liu, Dimensionality reduction for point feature SLAM problems with spherical covariance matrices, Automatica, Volume 51, January 2015, Pages 149-157, ISSN 0005-1098. DOI: 10.1016/j.automatica.2014.10.114

The main contribution of this paper is the dimensionality reduction for multiple-step 2D point feature based Simultaneous Localization and Mapping (SLAM), which is an extension of our previous work on one-step SLAM (Wang et al., 2013). It has been proved that SLAM with multiple robot poses and a number of point feature positions as variables is equivalent to an optimization problem with only the robot orientations as variables, when the associated uncertainties can be described using spherical covariance matrices. This reduces the dimension of the original problem from 3m + 2n to m only (where m is the number of poses and n is the number of features). The optimization problem after dimensionality reduction can be solved numerically using unconstrained optimization algorithms. While dimensionality reduction may not provide computational savings for all nonlinear optimization problems, for some SLAM problems we can achieve benefits such as improvements in time consumption and convergence. For the special case of two-step SLAM when the orientation information from odometry is not incorporated, an algorithm that is guaranteed to obtain the globally optimal solution (in the maximum likelihood sense) is derived. Simulation and experimental datasets are used to verify the equivalence between the reduced nonlinear optimization problem and the original full optimization problem, as well as the proposed new algorithm for obtaining the globally optimal solution for two-step SLAM.
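The elimination trick reads like variable projection: for fixed orientations the problem is linear in every position variable, so positions are solved in closed form inside the cost and only the m orientations are exposed to the optimizer. A hedged sketch (build_linear_system and orientation_cost are placeholders for the actual odometry and observation equations):

```python
import numpy as np
from scipy.optimize import minimize

def reduced_cost(thetas, build_linear_system, orientation_cost):
    """SLAM cost as a function of the m robot orientations only. For
    fixed orientations the pose and feature positions enter linearly,
    so they are eliminated by a linear least-squares solve; what
    remains is that residual plus the orientation-only odometry terms.
    This realizes the reduction from 3m + 2n variables to m."""
    A, b = build_linear_system(thetas)          # linear in all positions
    z, *_ = np.linalg.lstsq(A, b, rcond=None)   # optimal positions for thetas
    r = A @ z - b
    return float(r @ r) + orientation_cost(thetas)

# Hypothetical usage with any unconstrained optimizer:
# res = minimize(reduced_cost, theta0, args=(build_linear_system, ori_cost))
```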

Reinforcement learning for discovering the parameters of a control law derived from the physical (port-Hamiltonian) model of a system

S.P. Nageshrao, G.A.D. Lopes, D. Jeltsema, R. Babuška, Passivity-based reinforcement learning control of a 2-DOF manipulator arm, Mechatronics, Volume 24, Issue 8, December 2014, Pages 1001-1007, ISSN 0957-4158, DOI: 10.1016/j.mechatronics.2014.10.005.

Passivity-based control (PBC) is commonly used for the stabilization of port-Hamiltonian (PH) systems. The PH framework is suitable for multi-domain systems, for example mechatronic devices or micro-electro-mechanical systems. Passivity-based control synthesis for PH systems involves solving partial differential equations, which can be cumbersome. Rather than explicitly solving these equations, in our approach the control law is parameterized and the unknown parameter vector is learned using an actor–critic reinforcement learning algorithm. The key advantages of combining learning with PBC are: (i) the complexity of the control design procedure is reduced, (ii) prior knowledge about the system, given in the form of a PH model, speeds up the learning process, (iii) physical meaning can be attributed to the learned control law. In this paper we extend the learning-based PBC method to a regulation problem and present the experimental results for a two-degree-of-freedom manipulator. We show that the learning algorithm is capable of achieving feedback regulation in the presence of model uncertainties.
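A rough idea of the learning component: a generic actor-critic loop for a linear-in-parameters control law u = phi(x) @ xi, standing in for the parameterized PBC law (env, phi, and psi are hypothetical hooks, and the paper's exact update rules may differ):

```python
import numpy as np

def actor_critic_episode(env, phi, psi, xi, w, gamma=0.98,
                         a_actor=1e-3, a_critic=1e-2, sigma=0.1, steps=500):
    """Tune the control-law parameters xi (actor) and the critic weights
    w for V(x) = psi(x) @ w using TD(0) and exploration perturbations."""
    x = env.reset()
    for _ in range(steps):
        du = sigma * np.random.randn()                 # exploration noise
        u = phi(x) @ xi + du                           # parameterized control law
        x_next, r, done = env.step(u)
        delta = r + gamma * (psi(x_next) @ w) - psi(x) @ w   # TD error
        w = w + a_critic * delta * psi(x)              # critic update
        xi = xi + a_actor * delta * du * phi(x)        # actor update
        x = x_next
        if done:
            break
    return xi, w
```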

A reinforcement learning controller to tune sub-controllers

Kevin Van Vaerenbergh, Peter Vrancx, Yann-Michaël De Hauwere, Ann Nowé, Erik Hostens, Christophe Lauwerys, Tuning hydrostatic two-output drive-train controllers using reinforcement learning, Mechatronics, Volume 24, Issue 8, December 2014, Pages 975-985, ISSN 0957-4158. DOI: 10.1016/j.mechatronics.2014.07.005

When controlling a complex system consisting of several subsystems, a simple divide-and-conquer approach is to design a controller for each subsystem separately. However, this does not necessarily result in good overall control behavior. Especially when there are strong interactions between the subsystems, the selfish behavior of one controller might deteriorate the performance of the other subsystems. An alternative approach is to design a global controller for the entire mechatronic system. Such a design procedure might result in better overall behavior; however, it requires a lot more effort, especially when the interactions between the different subsystems cannot be modeled exactly or if the number of parameters is large.
In this paper we present a hybrid approach that overcomes the problems encountered when using several independent subsystems. Starting from such a system with individual subsystem controllers, we add a global layer which uses reinforcement learning to simultaneously tune the lower-level controllers. While each subsystem still has its own individual controller, the reinforcement learning layer tunes these controllers in order to optimize global system behavior. This mitigates the problem of subsystems behaving selfishly, without the added complexity of designing a global controller for the entire system. Our approach is validated on a hydrostatic drive train.
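The hybrid architecture can be sketched as an outer learning loop over the concatenated sub-controller parameters, scored only by the global reward. A minimal REINFORCE-style stand-in (run_episode is a hypothetical hook; the paper's specific RL algorithm may differ):

```python
import numpy as np

def tune_subcontrollers(run_episode, theta, iters=100, sigma=0.05, lr=0.1):
    """Tune the concatenated parameter vector 'theta' of all subsystem
    controllers at once. run_episode(theta) runs the full system with
    those gains and returns one scalar reward for *global* behavior,
    so no subsystem can improve itself at the others' expense."""
    baseline = run_episode(theta)
    for _ in range(iters):
        eps = sigma * np.random.randn(*theta.shape)    # perturb all gains jointly
        reward = run_episode(theta + eps)
        # Gradient estimate: move along perturbations that beat the baseline.
        theta = theta + lr * (reward - baseline) * eps / sigma**2
        baseline = 0.9 * baseline + 0.1 * reward       # running average return
    return theta
```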

A sample-efficient, learning-rate-free reinforcement learning algorithm that alleviates the slow learning speed of standard methods such as Q-learning (with a brief review of reinforcement learning algorithms)

J.C. van Rooijen, I. Grondman, R. Babuška, Learning rate free reinforcement learning for real-time motion control using a value-gradient based policy, Mechatronics, Volume 24, Issue 8, December 2014, Pages 966-974, ISSN 0957-4158. DOI: 10.1016/j.mechatronics.2014.05.007

Reinforcement learning (RL) is a framework that enables a controller to find an optimal control policy for a task in an unknown environment. Although RL has been successfully used to solve optimal control problems, learning is generally slow. The main causes are the inefficient use of information collected during interaction with the system and the inability to use prior knowledge of the system or the control task. In addition, the learning speed heavily depends on the learning rate parameter, which is difficult to tune.
In this paper, we present a sample-efficient, learning-rate-free version of the Value-Gradient Based Policy (VGBP) algorithm. The main difference between VGBP and other frequently used algorithms, such as Sarsa, is that in VGBP the learning agent has direct access to the reward function, rather than just the immediate reward values. Furthermore, the agent learns a process model. This enables the algorithm to select control actions by optimizing over the right-hand side of the Bellman equation. We demonstrate the fast learning convergence in simulations and experiments with the underactuated pendulum swing-up task. In addition, we present experimental results for a more complex 2-DOF robotic manipulator.
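The core of VGBP in one expression: because the agent knows r(x, u) and has learned a model f(x, u), each action is picked by optimizing the right-hand side of the Bellman equation directly, so no learning rate appears in action selection. A one-dimensional sketch (reward, model, and value are placeholder callables):

```python
from scipy.optimize import minimize_scalar

def vgbp_action(x, reward, model, value, gamma=0.98, u_bounds=(-3.0, 3.0)):
    """Choose u maximizing r(x, u) + gamma * V(f(x, u)), i.e. the
    right-hand side of the Bellman equation, using the known reward
    function and the learned process model."""
    def neg_q(u):
        x_next = model(x, u)        # learned model prediction
        return -(reward(x, u) + gamma * value(x_next))
    return minimize_scalar(neg_q, bounds=u_bounds, method="bounded").x
```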

Estimating the state of a human teleoperator and studying its influence on control performance

Yunyi Jia, Ning Xi, Shuang Liu, Yunxia Wang, Xin Li, and Sheng Bi, Quality of teleoperator adaptive control for telerobotic operations, The International Journal of Robotics Research, vol. 33, pp. 1765-1781, December 2014, first published on November 13, 2014. DOI: 10.1177/0278364914556124

Extensive studies have been conducted on telerobotic operations for decades due to their widespread applications in a variety of areas. Most studies have focused on two major issues: stability and telepresence. Few have studied the influence of the operation status of the teleoperator on the performance of telerobotic operations. As a subnormal operation status of the teleoperator may result in insufficient and even incorrect operations, the quality of teleoperator (QoT) has an important impact on the performance of telerobotic operations in terms of efficiency and safety, even if both stability and telepresence are guaranteed. Therefore, this paper investigates the online identification of the QoT and its application to telerobotic operations. The QoT is identified based on five QoT indicators which are generated from the teleoperator’s brain EEG signals. A QoT adaptive control method is designed to adapt the velocity and responsivity of the robotic system to the operation status of the teleoperator such that the teleoperation efficiency and safety can be enhanced. The online QoT identification method was evaluated with various teleoperators and the QoT adaptive control method was implemented on a mobile manipulator teleoperation system. The experimental results demonstrated the effectiveness and advantages of the proposed methods.
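At its simplest, the control side amounts to scaling velocity and responsivity with a QoT score assembled from the indicators. A toy sketch (indicator names and weights are purely illustrative; the paper derives five indicators from EEG):

```python
def qot_adapt(v_cmd, indicators, v_max=1.0, weights=None):
    """Scale the commanded velocity by a QoT score in [0, 1] built as a
    weighted sum of the indicators, so the robot slows down when the
    operator's state degrades. Returns (adapted velocity, score)."""
    weights = weights or {k: 1.0 / len(indicators) for k in indicators}
    qot = sum(weights[k] * v for k, v in indicators.items())
    qot = min(max(qot, 0.0), 1.0)
    return qot * min(v_cmd, v_max), qot

# e.g. hypothetical indicators, each already normalized to [0, 1]:
v, q = qot_adapt(0.8, {"attention": 0.9, "alertness": 0.7, "workload_ok": 0.8})
```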

Probabilistic models of several sensors, plus a method for distinguishing different hypotheses in the posterior of a particle filter (PF)

V. Alvarez-Santos, A. Canedo-Rodriguez, R. Iglesias, X.M. Pardo, C.V. Regueiro, M. Fernandez-Delgado, Route learning and reproduction in a tour-guide robot, Robotics and Autonomous Systems, Volume 63, Part 2, January 2015, Pages 206-213, ISSN 0921-8890. DOI: 10.1016/j.robot.2014.07.013

Traditionally, route information is introduced into tour-guide robots by experts in robotics. In the tour-guide robot that we are developing, we allow the robot to learn new routes while following an instructor. In this paper we describe the route recording process that takes place while following a human, as well as how those routes are later reproduced.

A key element of both route recording and reproduction is a robust multi-sensorial localization algorithm that we have designed, which is able to combine various sources of information to obtain an estimate of the robot’s pose. In this work we detail how the algorithm works, and how we use it to record routes. Moreover, we describe how our robot reproduces routes, including path planning within route points, and dynamic obstacle avoidance for safe navigation. Finally, we show through several trajectories how the robot was able to learn and reproduce different routes.
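The fusion step of such a multi-sensor localization filter reduces to multiplying per-sensor likelihoods into the particle weights. A compact sketch (motion and the sensor_models dict are placeholders for the paper's probabilistic models):

```python
import numpy as np

def pf_update(particles, weights, control, measurements, sensor_models, motion):
    """One particle-filter step: propagate with the motion model, fuse
    every available sensor by multiplying its likelihood into the
    weights (valid under conditional independence), then resample."""
    particles = motion(particles, control)                 # predict
    for name, z in measurements.items():
        weights = weights * sensor_models[name](particles, z)
    weights = weights / weights.sum()
    # Systematic resampling.
    n = len(weights)
    positions = (np.arange(n) + np.random.rand()) / n
    idx = np.minimum(np.searchsorted(np.cumsum(weights), positions), n - 1)
    return particles[idx], np.full(n, 1.0 / n)
```

Distinct hypotheses could then be told apart by clustering the resampled particles and tracking the clusters' weights, which is the spirit of the disambiguation method summarized above.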