INFORMATION PROCESSING DEVICE, INFORMATION PROCESSING METHOD, AND PROGRAM
First Claim
1. An information processing device comprising:
calculating means configured to calculate a current-state series candidate that is a state series for an agent capable of actions reaching the current state, based on a state transition probability model obtained by performing learning of said state transition probability model stipulated by a state transition probability that a state will be transitioned according to each of actions performed by an agent capable of actions, and an observation probability that a predetermined observation value will be observed from said state, using an action performed by said agent, and an observation value observed at said agent when said agent performs an action; and
determining means configured to determine an action to be performed next by said agent using said current-state series candidate in accordance with a predetermined strategy.
Abstract
An information processing device includes: a calculating unit configured to calculate a current-state series candidate that is a state series for an agent capable of actions reaching the current state, based on a state transition probability model obtained by performing learning of the state transition probability model stipulated by a state transition probability that a state will be transitioned according to each of actions performed by an agent capable of actions, and an observation probability that a predetermined observation value will be observed from the state, using an action performed by the agent, and an observation value observed at the agent when the agent performs an action; and a determining unit configured to determine an action to be performed next by the agent using the current-state series candidate in accordance with a predetermined strategy.
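The two units described above can be illustrated with a minimal sketch. It assumes discrete states, actions, and observation values, and it uses a Viterbi-style search to obtain the most likely state series ending at the current state; this excerpt does not name the algorithm, so that choice is an assumption. The names `viterbi_with_actions` and `choose_action`, the toy three-state model, and the goal-reaching strategy are all hypothetical and serve only as illustration.

```python
import math


def _log(p):
    """Log of a probability, with log(0) mapped to -inf."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi_with_actions(pi, A, B, actions, observations):
    """Return the most likely state series ending at the current state.

    pi[i]      : initial probability of state i
    A[u][i][j] : probability of moving from state i to j under action u
    B[i][o]    : probability of observing value o in state i
    actions[t] : action performed between observations[t] and observations[t + 1]
    """
    n = len(pi)
    delta = [_log(pi[i]) + _log(B[i][observations[0]]) for i in range(n)]
    back = []  # back[t][j] = best predecessor of state j at step t + 1
    for t in range(1, len(observations)):
        u = actions[t - 1]
        new_delta, ptr = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: delta[i] + _log(A[u][i][j]))
            score = delta[best_i] + _log(A[u][best_i][j])
            new_delta.append(score + _log(B[j][observations[t]]))
            ptr.append(best_i)
        delta = new_delta
        back.append(ptr)
    # Trace back from the most likely current state.
    state = max(range(n), key=lambda i: delta[i])
    series = [state]
    for ptr in reversed(back):
        state = ptr[state]
        series.append(state)
    series.reverse()
    return series


def choose_action(A, series, goal):
    """Hypothetical strategy: pick the action most likely to reach `goal`
    from the last state of the current-state series candidate."""
    current = series[-1]
    return max(range(len(A)), key=lambda u: A[u][current][goal])


# Toy model (assumed): three states in a row, action 0 = left, action 1 = right.
A = [
    [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 0.9, 0.1]],  # left
    [[0.1, 0.9, 0.0], [0.0, 0.1, 0.9], [0.0, 0.0, 1.0]],  # right
]
B = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
pi = [1 / 3, 1 / 3, 1 / 3]

series = viterbi_with_actions(pi, A, B, actions=[1, 1], observations=[0, 1, 2])
next_action = choose_action(A, series, goal=1)
print(series, next_action)
```

Here the agent moved right twice while observing values 0, 1, and 2, so the sketch estimates that it now sits in the last state and, under the assumed goal-reaching strategy, selects the "left" action to move toward state 1. A learned model would replace the hand-written `A`, `B`, and `pi`.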
16 Claims
1. An information processing device comprising:
calculating means configured to calculate a current-state series candidate that is a state series for an agent capable of actions reaching the current state, based on a state transition probability model obtained by performing learning of said state transition probability model stipulated by a state transition probability that a state will be transitioned according to each of actions performed by an agent capable of actions, and an observation probability that a predetermined observation value will be observed from said state, using an action performed by said agent, and an observation value observed at said agent when said agent performs an action; and
determining means configured to determine an action to be performed next by said agent using said current-state series candidate in accordance with a predetermined strategy.
Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
13. An information processing method comprising the steps of:
calculating of a current-state series candidate that is a state series for an agent capable of actions reaching the current state, based on a state transition probability model obtained by performing learning of said state transition probability model stipulated by a state transition probability that a state will be transitioned according to each of actions performed by an agent capable of actions, and an observation probability that a predetermined observation value will be observed from said state, using an action performed by said agent, and an observation value observed at said agent when said agent performs an action; and
determining an action to be performed next by said agent using said current-state series candidate in accordance with a predetermined strategy.
14. A program causing a computer serving as:
calculating means configured to calculate a current-state series candidate that is a state series for an agent capable of actions reaching the current state, based on a state transition probability model obtained by performing learning of said state transition probability model stipulated by a state transition probability that a state will be transitioned according to each of actions performed by an agent capable of actions, and an observation probability that a predetermined observation value will be observed from said state, using an action performed by said agent, and an observation value observed at said agent when said agent performs an action; and
determining means configured to determine an action to be performed next by said agent using said current-state series candidate in accordance with a predetermined strategy.
15. An information processing device comprising:
a calculating unit configured to calculate a current-state series candidate that is a state series for an agent capable of actions reaching the current state, based on a state transition probability model obtained by performing learning of said state transition probability model stipulated by a state transition probability that a state will be transitioned according to each of actions performed by an agent capable of actions, and an observation probability that a predetermined observation value will be observed from said state, using an action performed by said agent, and an observation value observed at said agent when said agent performs an action; and
a determining unit configured to determine an action to be performed next by said agent using said current-state series candidate in accordance with a predetermined strategy.
16. A program causing a computer serving as:
a calculating unit configured to calculate a current-state series candidate that is a state series for an agent capable of actions reaching the current state, based on a state transition probability model obtained by performing learning of said state transition probability model stipulated by a state transition probability that a state will be transitioned according to each of actions performed by an agent capable of actions, and an observation probability that a predetermined observation value will be observed from said state, using an action performed by said agent, and an observation value observed at said agent when said agent performs an action; and
a determining unit configured to determine an action to be performed next by said agent using said current-state series candidate in accordance with a predetermined strategy.
Specification