Online Structure, Parameter, and Utility Updating of Bayesian Decision Networks for Decision-Theoretic Agents
Authors: Iheanyi Umez-Eronini, F. Sahin (Rochester Institute of Technology)
Abstract
The design of planners for agents, such as robots, typically must tackle two problems. First, a set of rules or function mapping agent/environment states to agent actions must be determined ahead of time, usually utilizing a small set of sample data from the agent's actual environment, sensors, and effectors, or from simulation data. Second, when the agent is actually in operation, it needs to be able to update its rules or state-action mapping to account for situations that were not encountered in pervious data or modeled in simulation. In the simplest implementation, rule based planners utilize prior knowledge to generate decisions for significant potential agent states and behaviors to handle all other states. In traditional learning methods, prior knowledge is used to select the more significant samples in the search space. Since searching the entire space of possible agent states is a computationally intractable problem in all but the most trivial of cases, prior knowledge is essential for developing agent planners. However, the use of prior knowledge does not guarantee optimal behavior and rules based on prior knowledge break down when the environment changes or an assumption turns out to be invalid.
Rule based planners can handle incorrect assumptions or unanticipated data by incorporating some randomness in their response, however, this leads to sub-optimal or even potentially bad decision making by the agent. Learning algorithms suffer from the same problem of incomplete prior data though reinforcement learning algorithms, such as Q-Learning are able to adapt to a changing environment or unplanned states, however, because of its slow speed, Q-Learning does not provided significant additional value in real-time applications.
Bayesian networks (BNs) have attracted a lot of attention in the realm of decision making because the graph representations of these networks encode conditional dependencies (and independencies) and causal links between variables. Causal links encode temporal information as well, allowing decisions to be made in the proper order. Since the graph encodes the dependencies among all variables, this model readily handles situations where some data entries are missing or unknown. Influence diagrams are made up of three types of nodes, chance, decision, and utility. Chance nodes contain the possible states of a particular variable, or the observed state of that variable. Decision nodes contain sets of actions; a causally linked set of decision nodes form plans. Utility nodes define a utility function, giving a value for each possible action, given the set of observations. Maximizing (or minimizing) utility over a set of decisions allows one to determine the best or optimal decision given current (an previous) knowledge.
Given a particular structure, there are algorithms to learn the network parameters (conditional probability distributions among variables) from incomplete or complete data. There are also algorithms to update these parameters as new data is collected - allowing the network to better model the dependencies between variables. However, the number of possible network representations for a given set of nodes is exponential in the number of nodes. As complexity increases, it may not be possible, even for a domain expert, to specify the exact dependencies between variables. There are algorithms available to learn the structure of a network from data which utilizes various heuristics to minimize the search space and score different network representations. These methods allow learning a complex structure in an acceptable amount of time, even without having full knowledge of dependencies among variables. By utilizing parameter learning algorithms, the network is able to improve the model from the limited data available for learning.
