Information providing device and non-transitory computer readable medium storing information providing program
1 Assignment
0 Petitions
Abstract
An information providing device includes an agent ECU that sets a reward function through the use of history data on a response, from a driver, to an operation proposal for an in-vehicle component, and calculates a probability distribution of performance of each of actions constructing an action space in each of states constructing a state space, through reinforced learning based on the reward function. The agent ECU calculates a dispersion degree of the probability distribution. The agent ECU makes a trial-and-error operation proposal to select a target action from a plurality of candidates and output the target action when the dispersion degree of the probability distribution is equal to or larger than a threshold, and makes a definitive operation proposal to fix and output a target action when the value of the dispersion degree of the probability distribution is smaller than the threshold.
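The threshold rule in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation. It assumes the dispersion degree is the Shannon entropy of the action probabilities in one state, that a definitive proposal fixes the most probable action, and that a trial-and-error proposal samples a candidate in proportion to its probability. The names `dispersion_degree` and `propose` are hypothetical.

```python
import math
import random


def dispersion_degree(probs):
    """Assumed dispersion measure: Shannon entropy (in nats) of one
    state's action probabilities. Zero-probability actions contribute 0."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def propose(probs, threshold, rng=random):
    """Return (proposal_type, action_index) for one state.

    Dispersion below the threshold -> definitive proposal.
    Dispersion at or above the threshold -> trial-and-error proposal.
    """
    if dispersion_degree(probs) < threshold:
        # Assumption: "fix a target action" means take the most probable action.
        action = max(range(len(probs)), key=probs.__getitem__)
        return "definitive", action
    # Assumption: candidates are sampled in proportion to their probability.
    action = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return "trial_and_error", action
```

With a sharply peaked distribution such as `[0.97, 0.01, 0.01, 0.01]` and a threshold of 0.5, this sketch returns a definitive proposal for action 0. A uniform distribution over four actions has entropy ln 4, which exceeds that threshold, so a candidate is sampled instead.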
9 Citations
11 Claims
1. An information providing device comprising:
an agent electronic control unit including
  a state space construction unit that is configured to define a state of a vehicle by associating a plurality of types of vehicle data with one another, and construct a state space as a set of a plurality of states,
  an action space construction unit that is configured to define, as an action, data indicating contents of an operation of an in-vehicle component that is performed through a response, from a driver, to an operation proposal for the in-vehicle component, and construct an action space as a set of a plurality of actions,
  a reinforced learning unit that is configured to accumulate a history of the response, from the driver, to the operation proposal for the in-vehicle component, set a reward function as an index representing an appropriateness degree of the operation proposal for the in-vehicle component while using the accumulated history, and calculate a probability distribution of performance of each of the actions constructing the action space in each of the states constructing the state space, through reinforced learning based on the reward function,
  a dispersion degree computation unit that is configured to compute a dispersion degree of the probability distribution that is calculated by the reinforced learning unit, and
  an information providing unit that is configured to make a definitive operation proposal to fix a target action as a target of the operation proposal and output the target action when the dispersion degree of the probability distribution that is computed by the dispersion degree computation unit is smaller than a threshold, and make a trial-and-error operation proposal to select the target action as the target of the operation proposal from a plurality of candidates and output the target action when the dispersion degree of the probability distribution that is computed by the dispersion degree computation unit is equal to or larger than the threshold.
Dependent claims: 2, 3
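The state space and action space construction recited in claim 1 might look like the sketch below. It is only an illustration under stated assumptions: the vehicle data types and their discretized values are hypothetical, a state is modeled as one combination of values across all data types, and the action space is modeled as the distinct operations found in a hypothetical list of operations the driver performed in response to proposals. The function names are hypothetical as well.

```python
from itertools import product


def build_state_space(vehicle_data):
    """Associate every value of each vehicle data type with every value of
    the others; each combination is treated as one state (assumption)."""
    keys = list(vehicle_data)
    return [
        dict(zip(keys, combo))
        for combo in product(*(vehicle_data[k] for k in keys))
    ]


def build_action_space(performed_operations):
    """Collect the distinct operations, keeping first-seen order."""
    return list(dict.fromkeys(performed_operations))


# Hypothetical vehicle data types and discretized values.
vehicle_data = {
    "time_band": ["morning", "evening"],
    "speed_band": ["low", "high"],
    "location": ["home", "office"],
}
states = build_state_space(vehicle_data)

# Hypothetical operations performed after the driver responded to proposals.
actions = build_action_space(
    ["play_radio", "set_destination_home", "play_radio"]
)
```

Three data types with two values each give eight states here. The number of states grows multiplicatively with every added data type, which is why the values are assumed to be coarse bands rather than raw sensor readings.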
4. An information providing device comprising:
an agent electronic control unit including
  a state space construction unit that is configured to define a state of a vehicle by associating a plurality of types of vehicle data with one another, and construct a state space as a set of a plurality of states,
  an action space construction unit that is configured to define, as an action, data indicating contents of an operation of an in-vehicle component that is performed through a response, from a driver, to an operation proposal for the in-vehicle component, and construct an action space as a set of a plurality of actions,
  a reinforced learning unit that is configured to accumulate a history of the response, from the driver, to the operation proposal for the in-vehicle component, set a reward function as an index representing an appropriateness degree of the operation proposal for the in-vehicle component while using the accumulated history, and calculate a probability distribution of performance of each of the actions constructing the action space in each of the states constructing the state space, through reinforced learning based on the reward function,
  a dispersion degree computation unit that is configured to compute a dispersion degree of the state space by summating the dispersion degree of the probability distribution that is calculated by the reinforced learning unit as to the plurality of the states constructing the state space, and
  an information providing unit that is configured to make a definitive operation proposal to fix a target action as a target of the operation proposal and output the target action when the dispersion degree of the state space that is computed by the dispersion degree computation unit is smaller than a threshold, and make a trial-and-error operation proposal to select the target action as the target of the operation proposal from a plurality of candidates and output the target action when the dispersion degree of the state space that is computed by the dispersion degree computation unit is equal to or larger than the threshold.
Dependent claims: 5, 6, 7, 8, 9
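Claim 4 differs from claim 1 in that the threshold is applied to a dispersion degree of the whole state space, obtained by summing the per-state dispersion degrees. A rough sketch of that summation follows. It assumes, as an illustration only, that the per-state dispersion degree is Shannon entropy and that the learned distribution is stored as a mapping from a state identifier to a list of action probabilities. The names `entropy`, `state_space_dispersion`, and `proposal_mode` are hypothetical.

```python
import math


def entropy(probs):
    """Assumed per-state dispersion degree: Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def state_space_dispersion(policy):
    """Sum the per-state dispersion degrees over every state.

    `policy` maps a state identifier to that state's action probabilities.
    """
    return sum(entropy(probs) for probs in policy.values())


def proposal_mode(policy, threshold):
    """Definitive below the threshold, trial-and-error at or above it."""
    if state_space_dispersion(policy) < threshold:
        return "definitive"
    return "trial_and_error"


# Hypothetical learned distribution over two actions in two states.
policy = {
    "s0": [1.0, 0.0],  # fully settled state, entropy 0
    "s1": [0.5, 0.5],  # undecided state, entropy ln 2
}
```

For this example the summed dispersion is ln 2 (about 0.693), so a threshold of 1.0 yields a definitive proposal and a threshold of 0.5 yields a trial-and-error proposal. Because the sum grows with the number of states, a threshold for this variant would presumably need to scale with the size of the state space.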
10. A non-transitory computer readable medium that stores an information providing program, comprising:
the information providing program that is programmed to cause a computer to realize
  a state space construction function of defining a state of a vehicle by associating a plurality of types of vehicle data with one another, and constructing a state space as a set of a plurality of states,
  an action space construction function of defining, as an action, data indicating contents of an operation of an in-vehicle component that is performed through a response, from a driver, to an operation proposal for the in-vehicle component, and constructing an action space as a set of a plurality of actions,
  a reinforced learning function of accumulating a history of the response, from the driver, to the operation proposal for the in-vehicle component, setting a reward function as an index representing an appropriateness degree of the operation proposal for the in-vehicle component while using the accumulated history, and calculating a probability distribution of performance of each of the actions constructing the action space in each of the states constructing the state space, through reinforced learning based on the reward function,
  a dispersion degree computation function of computing a dispersion degree of the probability distribution that is calculated through the reinforced learning function, and
  an information providing function of making a definitive operation proposal to fix a target action as a target of the operation proposal and output the target action when the dispersion degree of the probability distribution that is computed through the dispersion degree computation function is smaller than a threshold, and making a trial-and-error operation proposal to select the target action as the target of the operation proposal from a plurality of candidates and output the target action when the dispersion degree of the probability distribution that is computed through the dispersion degree computation function is equal to or larger than the threshold.
11. A non-transitory computer readable medium that stores an information providing program, comprising:
the information providing program that is programmed to cause a computer to realize
  a state space construction function of defining a state of a vehicle by associating a plurality of types of vehicle data with one another, and constructing a state space as a set of a plurality of states,
  an action space construction function of defining, as an action, data indicating contents of an operation of an in-vehicle component that is performed through a response, from a driver, to an operation proposal for the in-vehicle component, and constructing an action space as a set of a plurality of actions,
  a reinforced learning function of accumulating a history of the response, from the driver, to the operation proposal for the in-vehicle component, setting a reward function as an index representing an appropriateness degree of the operation proposal for the in-vehicle component while using the accumulated history, and calculating a probability distribution of performance of each of the actions constructing the action space in each of the states constructing the state space, through reinforced learning based on the reward function,
  a dispersion degree computation function of computing a dispersion degree of the state space by summating the dispersion degree of the probability distribution that is calculated through the reinforced learning function as to the plurality of the states constructing the state space, and
  an information providing function of making a definitive operation proposal to fix a target action as a target of the operation proposal and output the target action when the dispersion degree of the state space that is computed through the dispersion degree computation function is smaller than a threshold, and making a trial-and-error operation proposal to select the target action as the target of the operation proposal from a plurality of candidates and output the target action when the dispersion degree of the state space that is computed through the dispersion degree computation function is equal to or larger than the threshold.
Specification