Controller for partially observable systems
Abstract
A controller is provided, operable to control a system on the basis of measurement data received from a plurality of sensors indicative of a state of the system, with at least partial autonomy, even in environments in which it is not possible to fully determine the state of the system on the basis of such sensor measurement data. The controller includes: a system model, defining at least a set of probabilities for the dynamical evolution of the system and corresponding measurement models for the plurality of sensors of the system; a stochastic estimator operable to receive measurement data from the sensors and, with reference to the system model, to generate a plurality of samples each representative of the state of the system; a rule set corresponding to the system model, defining, for each of a plurality of possible samples representing possible states of the system, information defining an action to be carried out in the system; and an action selector, operable to receive an output of the stochastic estimator and to select, with reference to the rule set, information defining one or more corresponding actions to be performed in the system.
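The four components named in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the class names, the bootstrap-resampling estimator, and the majority-vote choice of a representative sample state are all layered on top of the abstract's description.

```python
import random

# Minimal sketch of the claimed architecture: a particle-filter style
# stochastic estimator plus a rule-set-driven action selector.
# All names and the resampling scheme are illustrative assumptions.

class ParticleEstimator:
    """Stochastic estimator: holds samples ("particles") of the system state."""

    def __init__(self, transition_model, measurement_model, n_particles=50):
        self.transition = transition_model    # system model: next state from (state, action)
        self.measure = measurement_model      # measurement model: likelihood of obs given state
        self.particles = [0] * n_particles    # crude initial belief: all mass on state 0

    def update(self, action, observation):
        # Propagate every particle through the probabilistic system model,
        # then resample in proportion to the measurement likelihood.
        proposed = [self.transition(p, action) for p in self.particles]
        weights = [self.measure(observation, p) for p in proposed]
        self.particles = random.choices(proposed, weights=weights, k=len(proposed))
        return self.particles

class ActionSelector:
    """Looks up the rule set for a representative sample of the state."""

    def __init__(self, rule_set):
        self.rule_set = rule_set              # maps sample state -> action

    def select(self, particles):
        # Take the most common particle as the representative sample state.
        representative = max(set(particles), key=particles.count)
        return self.rule_set[representative]
```

As a deterministic walk-through: with a system model `lambda s, a: s + a` and a measurement model that strongly favours the observed state, one update step concentrates the particles on the consistent state, and the selector then reads the corresponding action out of the rule set.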
9 Claims
1. A controller, for controlling a system on the basis of measurement data received from a plurality of sensors indicative of a state of the system, wherein the controller comprises:
a system model and corresponding measurement models for the plurality of sensors of the system;

a stochastic estimator for receiving measurement data from the plurality of sensors and for generating, with reference to the system model, a plurality of samples each representative of the state of the system;

a rule set corresponding to the system model, defining, for each of a plurality of possible samples representing possible states of the system, information defining an action to be carried out in the system; and

an action selector, for receiving an output of the stochastic estimator and for selecting, with reference to the rule set, information defining one or more corresponding actions to be performed in the system;

wherein the controller is configured to:

(i) construct an initial partially observable Markov decision process (POMDP) model representing the dynamics of the system to be controlled, wherein the POMDP model comprises a representation of the states of the system, a measurement model, one or more control actions, and measures of benefit likely to arise from the selection of particular control actions;

(ii) transform the initial POMDP model into a subsidiary Markov decision process (MDP) model, comprising generating a sample state space representation for the subsidiary model, and generating an initial probabilistic system model and control rule set using the sample state representation; and

(iii) use observations of the system and of the environment by a plurality of sensors to update the control rule set and the probabilistic system model of the subsidiary MDP based upon the observed effects of selected control actions and with reference to the measures of benefit.
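Steps (i) and (ii) of the claim can be illustrated with a toy two-state example. The dictionary layout, the one-dimensional discretisation of the belief over a "fault" state, and the myopic (single-step expected-benefit) initialisation of the rule set are assumptions made purely for this sketch; the claim itself prescribes none of them.

```python
# Toy sketch of claim steps (i)-(ii): build a two-state POMDP, then derive
# a subsidiary MDP whose sample states discretise the belief in "fault".
# Structure and names are illustrative, not the patent's own.

def build_pomdp():
    """Step (i): states, measurement model, actions, measures of benefit."""
    return {
        "states": ["ok", "fault"],
        "actions": ["run", "repair"],
        "measurement": {"ok": {"green": 0.9, "red": 0.1},      # p(obs | state)
                        "fault": {"green": 0.2, "red": 0.8}},
        "benefit": {("ok", "run"): 1.0, ("ok", "repair"): -0.5,
                    ("fault", "run"): -1.0, ("fault", "repair"): 0.5},
    }

def to_subsidiary_mdp(pomdp, n_bins=5):
    """Step (ii): sample state space plus an initial control rule set."""
    # Each sample state is a discretised belief: the probability of "fault".
    sample_states = [i / (n_bins - 1) for i in range(n_bins)]
    rule_set = {}
    for b in sample_states:
        belief = {"ok": 1.0 - b, "fault": b}
        # Initial rule: the action with the highest single-step expected benefit.
        rule_set[b] = max(pomdp["actions"],
                          key=lambda a: sum(belief[s] * pomdp["benefit"][(s, a)]
                                            for s in pomdp["states"]))
    return {"sample_states": sample_states, "rule_set": rule_set}
```

Under these assumed numbers, a belief fully on "ok" yields the rule "run" and a belief fully on "fault" yields "repair"; intermediate sample states interpolate between the two.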
2. A method for controlling a system that enables autonomous operation of the system in an environment in which selected control actions have uncertain consequences, the method comprising:
(i) constructing an initial partially observable Markov decision process (POMDP) model representing the dynamics of the system to be controlled, wherein the POMDP model includes a representation of the states of the system, a measurement model, one or more control actions, and measures of benefit likely to arise from the selection of particular control actions;

(ii) transforming the initial POMDP model into a subsidiary Markov decision process (MDP) model, including generating a sample state space representation for the subsidiary model, and generating an initial probabilistic system model and control rule set using the sample state representation; and

(iii) using observations of the system and of the environment by a plurality of sensors to update the control rule set and the probabilistic system model of the subsidiary MDP based upon the observed effects of selected control actions and with reference to the measures of benefit.

Dependent claims: 3, 4, 5.
6. A computer-readable medium having a computer program, executable by a computer, comprising:

a program code arrangement having program code for controlling a system that enables autonomous operation of the system in an environment in which selected control actions have uncertain consequences, by performing the following:

(i) constructing an initial partially observable Markov decision process (POMDP) model representing the dynamics of the system to be controlled, wherein the POMDP model includes a representation of the states of the system, a measurement model, one or more control actions, and measures of benefit likely to arise from the selection of particular control actions;

(ii) transforming the initial POMDP model into a subsidiary Markov decision process (MDP) model, including generating a sample state space representation for the subsidiary model, and generating an initial probabilistic system model and control rule set using the sample state representation; and

(iii) using observations of the system and of the environment by a plurality of sensors to update the control rule set and the probabilistic system model of the subsidiary MDP based upon the observed effects of selected control actions and with reference to the measures of benefit.

Dependent claims: 7, 8, 9.
Specification