Tag Archives: Useful For Teaching

Reinforcement learning when a human is the one providing the rewards to the algorithm

W. Bradley Knox, Peter Stone, Framing reinforcement learning from human reward: Reward positivity, temporal discounting, episodicity, and performance, Artificial Intelligence, Volume 225, August 2015, Pages 24-50, ISSN 0004-3702, DOI: 10.1016/j.artint.2015.03.009.

Several studies have demonstrated that reward from a human trainer can be a powerful feedback signal for control-learning algorithms. However, the space of algorithms for learning from such human reward has hitherto not been explored systematically. Using model-based reinforcement learning from human reward, this article investigates the problem of learning from human reward through six experiments, focusing on the relationships between reward positivity, which is how generally positive a trainer’s reward values are; temporal discounting, the extent to which future reward is discounted in value; episodicity, whether task learning occurs in discrete learning episodes instead of one continuing session; and task performance, the agent’s performance on the task the trainer intends to teach. This investigation is motivated by the observation that an agent can pursue different learning objectives, leading to different resulting behaviors. We search for learning objectives that lead the agent to behave as the trainer intends.
We identify and empirically support a “positive circuits” problem with low discounting (i.e., high discount factors) for episodic, goal-based tasks that arises from an observed bias among humans towards giving positive reward, resulting in an endorsement of myopic learning for such domains. We then show that converting simple episodic tasks to be non-episodic (i.e., continuing) reduces and in some cases resolves issues present in episodic tasks with generally positive reward and—relatedly—enables highly successful learning with non-myopic valuation in multiple user studies. The primary learning algorithm introduced in this article, which we call “vi-tamer”, is the first algorithm to successfully learn non-myopically from reward generated by a human trainer; we also empirically show that such non-myopic valuation facilitates higher-level understanding of the task. Anticipating the complexity of real-world problems, we perform further studies—one with a failure state added—that compare (1) learning when states are updated asynchronously with local bias—i.e., states quickly reachable from the agent’s current state are updated more often than other states—to (2) learning with the fully synchronous sweeps across each state in the vi-tamer algorithm. With these locally biased updates, we find that the general positivity of human reward creates problems even for continuing tasks, revealing a distinct research challenge for future work.
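The "positive circuits" problem can be seen with a few lines of arithmetic. The sketch below is only an illustration, not the paper's vi-tamer algorithm: the reward values (1.0 for a step toward the goal, 0.5 for a step away), the three-step distance to the goal, and the function names are all assumptions chosen for this example. It compares finishing an episodic task against circling forever when every reward is positive.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over a finite reward sequence."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

# Hypothetical human rewards: always positive, larger for "good" steps.
TOWARD, AWAY = 1.0, 0.5
STEPS_TO_GOAL = 3   # assumed distance from the start state to the goal
HORIZON = 2000      # long enough to approximate an endless circuit

def straight_to_goal(gamma):
    # The episode ends at the goal, so no reward is collected afterwards.
    return discounted_return([TOWARD] * STEPS_TO_GOAL, gamma)

def circle_forever(gamma):
    # Step away, step back, repeat: the goal is never reached.
    rewards = [AWAY if t % 2 == 0 else TOWARD for t in range(HORIZON)]
    return discounted_return(rewards, gamma)

for gamma in (0.0, 0.99):
    print(gamma, straight_to_goal(gamma), circle_forever(gamma))
```

With a discount factor of 0 (fully myopic) the first step toward the goal looks better than the first step away. With a discount factor of 0.99 the circuit is worth roughly 75 against roughly 3 for finishing the task, so an agent maximizing return would prefer never to reach the goal.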

Heuristic, real-time search with weighted heuristic function

Nicolás Rivera, Jorge A. Baier, Carlos Hernández, Incorporating weights into real-time heuristic search, Artificial Intelligence, Volume 225, August 2015, Pages 1-23, ISSN 0004-3702, DOI: 10.1016/j.artint.2015.03.008.

Multiplying the heuristic function by a weight greater than one is a well-known technique in heuristic search. When this technique is applied to A* with an admissible heuristic it yields substantial runtime savings, at the expense of sacrificing solution optimality. Its applicability to real-time heuristic search, a search approach that builds upon heuristic search, however, has only been explored by a few studies. In this article we present two new approaches to using weights in real-time heuristic search, applicable to a wide range of algorithms. The first one, weighted lookahead, is a variant of an existing approach by Shimbo and Ishida, and utilizes the weight while the algorithm performs lookahead search. The second one, weighted update, incorporates the weight to the edges of the search graph during the learning phase. We implemented both techniques within LSS-LRTA* and evaluated them in path-planning benchmarks. We show that weighted lookahead outperforms an existing approach by Shimbo and Ishida but that it does not improve over existing approaches that do not use weights. Weighted update, on the other hand, yields performance improvements of up to one order of magnitude both in solution cost and total search time. To illustrate further the generality of weighted update, we incorporate the technique in two other well-known real-time heuristic search algorithms: LRTA*-LS and daLSS-LRTA*, and we empirically show significant improvements for LRTA*-LS and modest but still important improvements for daLSS-LRTA*. We analyze the properties of weighted update in depth, showing, among other things, that it guarantees termination. Convergence behavior of LSS-LRTA*, modified to use weighted update, is also analyzed. In such a setting, we prove solutions are w-optimal, and provide additional bounds on solution quality that in practice are tighter than w-optimality.
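For teaching purposes, the basic idea of weighting the heuristic can be sketched with plain weighted A* on a 4-connected grid. This is not the weighted lookahead or weighted update technique from the paper, and it is not a real-time algorithm; the grid encoding ('#' for blocked cells), the Manhattan heuristic, and the function name are assumptions made for this example.

```python
import heapq

def weighted_a_star(grid, start, goal, w=1.0):
    """Search a grid of strings ('#' = blocked) ordering nodes by g + w*h.

    Returns (path_cost, expansions); path_cost is None if goal is unreachable.
    """
    rows, cols = len(grid), len(grid[0])

    def h(cell):
        # Manhattan distance: admissible and consistent for unit-cost moves.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    best_g = {start: 0}
    open_heap = [(w * h(start), 0, start)]
    closed = set()
    expansions = 0
    while open_heap:
        _, g_cur, cur = heapq.heappop(open_heap)
        if cur in closed:
            continue  # stale entry for an already expanded cell
        if cur == goal:
            return g_cur, expansions
        closed.add(cur)
        expansions += 1
        r, c = cur
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if not (0 <= nr < rows and 0 <= nc < cols):
                continue
            if grid[nr][nc] == '#' or nxt in closed:
                continue
            new_g = g_cur + 1
            if new_g < best_g.get(nxt, float('inf')):
                best_g[nxt] = new_g
                heapq.heappush(open_heap, (new_g + w * h(nxt), new_g, nxt))
    return None, expansions

grid = ["....",
        ".##.",
        "...."]
for w in (1.0, 2.0):
    print(w, weighted_a_star(grid, (0, 0), (2, 3), w))
```

With w = 1 this is ordinary A* and returns an optimal cost. With w > 1 the search is greedier: it typically expands fewer nodes, and the returned cost is at most w times the optimum when the heuristic is consistent.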

Analysis of the deterioration of several Kalman Filters depending on the amount of uncertainty in the observations, when the observation model is non-linear

Mark R. Morelande and Ángel F. García-Fernández, Analysis of Kalman Filter Approximations for Nonlinear Measurements, IEEE Transactions on Signal Processing, vol. 61, no. 22, 2013, DOI: 10.1109/TSP.2013.2279367.

A theoretical analysis is presented of the correction step of the Kalman filter (KF) and its various approximations for the case of a nonlinear measurement equation with additive Gaussian noise. The KF is based on a Gaussian approximation to the joint density of the state and the measurement. The analysis metric is the Kullback-Leibler divergence of this approximation from the true joint density. The purpose of the analysis is to provide a quantitative tool for understanding and assessing the performance of the KF and its variants in nonlinear scenarios. This is illustrated using a numerical example.
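As a reminder of what the correction step looks like under a Gaussian approximation of the joint density, here is the standard textbook form. The notation is an assumption of this note, not taken from the paper: prior state $x \sim \mathcal{N}(\bar{x}, P)$, measurement $z = h(x) + w$ with $w \sim \mathcal{N}(0, R)$, and $q$ for the Gaussian approximation of the true joint density $p$. The direction of the divergence shown is one common reading of the abstract.

```latex
\begin{aligned}
\bar{z} &= \mathbb{E}[h(x)], \\
\Psi    &= \mathbb{E}\big[(x - \bar{x})\,(h(x) - \bar{z})^{\top}\big], \\
S       &= \mathbb{E}\big[(h(x) - \bar{z})\,(h(x) - \bar{z})^{\top}\big] + R, \\[4pt]
\hat{x} &= \bar{x} + \Psi S^{-1} (z - \bar{z}), \\
P^{+}   &= P - \Psi S^{-1} \Psi^{\top}, \\[4pt]
D_{\mathrm{KL}}(p \,\|\, q) &= \iint p(x, z)\, \log \frac{p(x, z)}{q(x, z)} \, dx \, dz .
\end{aligned}
```

The KF variants differ in how they approximate the three expectations: the extended KF linearizes $h$ around $\bar{x}$, while sigma-point filters such as the unscented KF evaluate $h$ at a set of deterministically chosen points.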

Robot kidnapping detection based on support vector machines

Dylan Campbell, Mark Whitty, Metric-based detection of robot kidnapping with an SVM classifier, Robotics and Autonomous Systems, Volume 69, July 2015, Pages 40-51, ISSN 0921-8890, DOI: 10.1016/j.robot.2014.08.004.

Kidnapping occurs when a robot is unaware that it has not correctly ascertained its position, potentially causing severe map deformation and reducing the robot’s functionality. This paper presents metric-based techniques for real-time kidnap detection, utilising either linear or SVM classifiers to identify all kidnapping events during the autonomous operation of a mobile robot. In contrast, existing techniques either solve specific cases of kidnapping, such as elevator motion, without addressing the general case or remove dependence on local pose estimation entirely, an inefficient and computationally expensive approach. Three metrics that measured the quality of a pose estimate were evaluated and a joint classifier was constructed by combining the most discriminative quality metric with a fourth metric that measured the discrepancy between two independent pose estimates. A multi-class Support Vector Machine classifier was also trained using all four metrics and produced better classification results than the simpler joint classifier, at the cost of requiring a larger training dataset. While metrics specific to 3D point clouds were used, the approach can be generalised to other forms of data, including visual, provided that two independent ways of estimating pose are available.

A nice SLAM approach based on hybrid Normal Distribution Transform (NDT) + occupancy grid maps intended for long term operation in dynamic environments

Erik Einhorn, Horst-Michael Gross, Generic NDT mapping in dynamic environments and its application for lifelong SLAM, Robotics and Autonomous Systems, Volume 69, July 2015, Pages 28-39, ISSN 0921-8890, DOI: 10.1016/j.robot.2014.08.008.

In this paper, we present a new, generic approach for Simultaneous Localization and Mapping (SLAM). First of all, we propose an abstraction of the underlying sensor data using Normal Distribution Transform (NDT) maps that are suitable for making our approach independent from the used sensor and the dimension of the generated maps. We present several modifications for the original NDT mapping to handle free-space measurements explicitly. We additionally describe a method to detect and handle dynamic objects such as moving persons. This enables the usage of the proposed approach in highly dynamic environments. In the second part of this paper we describe our graph-based SLAM approach that is designed for lifelong usage. Therefore, the memory and computational complexity is limited by pruning the pose graph in an appropriate way.

Abstract data-type for exchanging information in real-time systems, prioritizing the access to newest data rather than to oldest

Dantam, N.T.; Lofaro, D.M.; Hereid, A.; Oh, P.Y.; Ames, A.D.; Stilman, M., The Ach Library: A New Framework for Real-Time Communication, IEEE Robotics & Automation Magazine, vol. 22, no. 1, pp. 76-85, March 2015, DOI: 10.1109/MRA.2014.2356937.

Correct real-time software is vital for robots in safety-critical roles such as service and disaster response. These systems depend on software for locomotion, navigation, manipulation, and even seemingly innocuous tasks such as safely regulating battery voltage. A multiprocess software design increases robustness by isolating errors to a single process, allowing the rest of the system to continue operation. This approach also assists with modularity and concurrency. For real-time tasks, such as dynamic balance and force control of manipulators, it is critical to communicate the latest data sample with minimum latency. There are many communication approaches intended for both general-purpose and real-time needs [9], [13], [15], [17], [19]. Typical methods focus on reliable communication or network transparency and accept a tradeoff of increased message latency or the potential to discard newer data. By focusing instead on the specific case of real-time communication on a single host, we reduce communication latency and guarantee access to the latest sample. We present a new interprocess communication (IPC) library, Ach, which addresses this need, and discuss its application for real-time multiprocess control on three humanoid robots (Figure 1). (Ach is available at http://www.golems.org/projects/ach.html. The name Ach comes from the common abbreviation for the motor neurotransmitter Acetylcholine and the computer networking term ACK.)
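The "newest data first" idea can be illustrated with a toy in-process channel. This is emphatically not the Ach API: the class and method names below are hypothetical, and it uses threads and a lock inside one process, whereas the library targets communication between processes. It only shows the semantics described above: a writer never blocks on slow readers, old samples are overwritten, and a reader can always jump to the latest sample.

```python
import threading

class LatestSampleChannel:
    """Fixed-size circular buffer where readers fetch the newest sample."""

    def __init__(self, capacity=4):
        self._buf = [None] * capacity
        self._capacity = capacity
        self._count = 0                  # total samples ever written
        self._lock = threading.Lock()

    def put(self, sample):
        # Overwrite the oldest slot; the writer never waits for readers.
        with self._lock:
            self._buf[self._count % self._capacity] = sample
            self._count += 1

    def get_latest(self):
        """Return (sequence_number, sample) for the newest sample, or None."""
        with self._lock:
            if self._count == 0:
                return None
            newest = self._count - 1
            return newest, self._buf[newest % self._capacity]

channel = LatestSampleChannel(capacity=4)
for reading in range(10):        # a fast sensor producing ten samples
    channel.put(reading)
print(channel.get_latest())      # a slow controller sees only the newest one
```

A FIFO queue would instead hand the slow reader the oldest unread sample first, which is the head-of-line behavior that a control loop wants to avoid.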

On the role of emotions in cognition, in particular in cognitive control

Michael Inzlicht, Bruce D. Bartholow, Jacob B. Hirsh, 2015, Emotional foundations of cognitive control, Trends in Cognitive Sciences, Volume 19, Issue 3, March 2015, Pages 126-132, DOI: 10.1016/j.tics.2015.01.004.

Often seen as the paragon of higher cognition, here we suggest that cognitive control is dependent on emotion. Rather than asking whether control is influenced by emotion, we ask whether control itself can be understood as an emotional process. Reviewing converging evidence from cybernetics, animal research, cognitive neuroscience, and social and personality psychology, we suggest that cognitive control is initiated when goal conflicts evoke phasic changes to emotional primitives that both focus attention on the presence of goal conflicts and energize conflict resolution to support goal-directed behavior. Critically, we propose that emotion is not an inert byproduct of conflict but is instrumental in recruiting control. Appreciating the emotional foundations of control leads to testable predictions that can spur future research.

A nice review of the problem of kinematic modeling of wheeled mobile robots and a new approach that delays the use of coordinate frames

Alonzo Kelly and Neal Seegmiller, 2015, Recursive kinematic propagation for wheeled mobile robots, The International Journal of Robotics Research, 34: 288-313, DOI: 10.1177/0278364914551773.

The problem of wheeled mobile robot kinematics is formulated using the transport theorem of vector algebra. Doing so postpones the introduction of coordinates until after the expressions for the relevant Jacobians have been derived. This approach simplifies the derivation while also providing the solution to the general case in 3D, including motion over rolling terrain. Angular velocity remains explicit rather than encoded as the time derivative of a rotation matrix. The equations are derived and can be implemented recursively using a single equation that applies to all cases. Acceleration kinematics are uniquely derivable in reasonable effort. The recursive formulation also leads to efficient computer implementations that reflect the modularity of real mechanisms.
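For reference, the transport theorem relates the time derivative of a vector as seen from two frames in relative rotation. The notation below is an assumption of this note rather than the paper's: $a$ is the fixed frame, $b$ the moving frame, $\boldsymbol{\omega}$ the angular velocity of $b$ relative to $a$, $o$ the origin of $b$, and $\mathbf{r}$ the position of point $p$ relative to $o$.

```latex
\left.\frac{d\mathbf{u}}{dt}\right|_{a}
  = \left.\frac{d\mathbf{u}}{dt}\right|_{b}
  + \boldsymbol{\omega} \times \mathbf{u}
\qquad\Longrightarrow\qquad
\mathbf{v}_{p}^{\,a}
  = \mathbf{v}_{o}^{\,a}
  + \left.\frac{d\mathbf{r}}{dt}\right|_{b}
  + \boldsymbol{\omega} \times \mathbf{r}
```

Both relations are coordinate-free, which is what allows coordinates to be introduced only after the Jacobians have been derived. Applying the second one link by link, from the body frame down to each wheel contact point, gives a recursive velocity propagation of the kind the abstract describes.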

Interesting and gentle introduction to WCET analysis and synchronous design for hard real-time systems

Pascal Raymond, Claire Maiza, Catherine Parent-Vigouroux, Fabienne Carrier, Mihail Asavoae, 2015, Timing analysis enhancement for synchronous program, Real-Time Systems, Volume 51, Issue 2, pp 192-220, DOI: 10.1007/s11241-015-9219-y.

Real-time critical systems can be considered as correct if they compute both right and fast enough. Functionality aspects (computing right) can be addressed using high level design methods, such as the synchronous approach that provides languages, compilers and verification tools. Real-time aspects (computing fast enough) can be addressed with static timing analysis, that aims at discovering safe bounds on the worst-case execution time (WCET) of the binary code. In this paper, we aim at improving the estimated WCET in the case where the binary code comes from a high-level synchronous design. The key idea is that some high-level functional properties may imply that some execution paths of the binary code are actually infeasible, and thus, can be removed from the worst-case candidates. In order to automatize the method, we show (1) how to trace semantic information between the high-level design and the executable code, (2) how to use a model-checker to prove infeasibility of some execution paths, and (3) how to integrate such infeasibility information into an existing timing analysis framework. Based on a realistic example, we show that there is a large possible improvement for a reasonable computation time overhead.
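The key idea, removing infeasible paths from the worst-case candidates, can be shown on a toy program with two consecutive branches. This is only a sketch under assumed block names and cycle costs: analysis tools commonly encode paths implicitly as an integer linear program rather than enumerating them, and here the infeasibility facts are simply given instead of being proved by a model-checker.

```python
from itertools import product

def wcet(stages, cost, infeasible_pairs=()):
    """Longest path through a sequence of stages, one block chosen per stage.

    Paths containing both blocks of any pair in infeasible_pairs are skipped.
    """
    worst = 0
    for path in product(*stages):
        blocks = set(path)
        if any(a in blocks and b in blocks for a, b in infeasible_pairs):
            continue  # this path can never execute
        worst = max(worst, sum(cost[b] for b in path))
    return worst

# Hypothetical program: two if/else statements in sequence.
stages = [["entry"], ["heavy_a", "light_a"], ["heavy_b", "light_b"], ["exit"]]
cost = {"entry": 1, "heavy_a": 10, "light_a": 2,
        "heavy_b": 10, "light_b": 2, "exit": 1}

print(wcet(stages, cost))                            # all paths considered
print(wcet(stages, cost, [("heavy_a", "heavy_b")]))  # exclusive branches
```

If the high-level design guarantees that the two heavy branches are guarded by mutually exclusive conditions, the estimate drops from 22 to 14 cycles without any change to the binary.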

Demonstration that students benefit from using colors while teaching electrical circuit analysis

Reisslein, J.; Johnson, A.M.; Reisslein, M., (2015), Color Coding of Circuit Quantities in Introductory Circuit Analysis Instruction, IEEE Transactions on Education, vol. 58, no. 1, pp. 7-14, DOI: 10.1109/TE.2014.2312674.

Learning the analysis of electrical circuits represented by circuit diagrams is often challenging for novice students. An open research question in electrical circuit analysis instruction is whether color coding of the mathematical symbols (variables) that denote electrical quantities can improve circuit analysis learning. The present study compared two groups of high school students undergoing their first introductory learning of electrical circuit analysis. One group learned with circuit variables in black font. The other group learned with colored circuit variables, with blue font indicating variables related to voltage, red font indicating those related to current, and black font indicating those related to resistance. The color group achieved significantly higher post-test scores, gave higher ratings for liking the instruction and finding it helpful, and had lower ratings of cognitive load than the black-font group. These results indicate that color coding of the notations for quantities in electrical circuit diagrams aids the circuit analysis learning of novice students.