Information providing device and non-transitory computer readable medium storing information providing program

US 9,939,791 B2
Filed: 03/07/2017
Issued: 04/10/2018
Est. Priority Date: 03/11/2016
Status: Active Grant

Abstract

An information providing device includes an agent ECU that sets a reward function through the use of history data on a response, from a driver, to an operation proposal for an in-vehicle component, and calculates a probability distribution of performance of each of actions constructing an action space in each of states constructing a state space, through reinforced learning based on the reward function. The agent ECU calculates a dispersion degree of the probability distribution. The agent ECU makes a trial-and-error operation proposal to select a target action from a plurality of candidates and output the target action when the dispersion degree of the probability distribution is equal to or larger than a threshold, and makes a definitive operation proposal to fix and output a target action when the value of the dispersion degree of the probability distribution is smaller than the threshold.
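The threshold rule in the abstract can be illustrated with a minimal sketch. It assumes Shannon entropy as the dispersion degree (claim 7 names entropy; the abstract does not fix a measure) and probability-weighted sampling for the trial-and-error branch. The function names, the action names in the usage note, and the threshold value are hypothetical.

```python
import math
import random


def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def propose(action_probs, threshold, rng=random):
    """Return (mode, action) for the present state.

    action_probs: dict mapping a (hypothetical) action name to its probability.
    threshold: assumed cut-off on the entropy of the distribution.
    """
    actions = list(action_probs)
    probs = [action_probs[a] for a in actions]
    if entropy(probs) < threshold:
        # Low dispersion: fix the single most probable action and output it.
        return "definitive", max(actions, key=action_probs.get)
    # High dispersion: select one of several candidates, weighted by probability.
    return "trial_and_error", rng.choices(actions, weights=probs, k=1)[0]
```

With a sharply peaked distribution such as `{"play_radio": 0.98, "play_cd": 0.01, "navigate_home": 0.01}` and a threshold of 0.5, the sketch returns a definitive proposal; with a uniform distribution over the same three actions it falls back to a trial-and-error proposal.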

11 Claims

1. An information providing device comprising:
- an agent electronic control unit including
  - a state space construction unit that is configured to define a state of a vehicle by associating a plurality of types of vehicle data with one another, and construct a state space as a set of a plurality of states,
  - an action space construction unit that is configured to define, as an action, data indicating contents of an operation of an in-vehicle component that is performed through a response, from a driver, to an operation proposal for the in-vehicle component, and construct an action space as a set of a plurality of actions,
  - a reinforced learning unit that is configured to accumulate a history of the response, from the driver, to the operation proposal for the in-vehicle component, set a reward function as an index representing an appropriateness degree of the operation proposal for the in-vehicle component while using the accumulated history, and calculate a probability distribution of performance of each of the actions constructing the action space in each of the states constructing the state space, through reinforced learning based on the reward function,
  - a dispersion degree computation unit that is configured to compute a dispersion degree of the probability distribution that is calculated by the reinforced learning unit, and
  - an information providing unit that is configured to make a definitive operation proposal to fix a target action as a target of the operation proposal and output the target action when the dispersion degree of the probability distribution that is computed by the dispersion degree computation unit is smaller than a threshold, and make a trial-and-error operation proposal to select the target action as the target of the operation proposal from a plurality of candidates and output the target action when the dispersion degree of the probability distribution that is computed by the dispersion degree computation unit is equal to or larger than the threshold.
- 2. The information providing device according to claim 1, wherein the reinforced learning unit is configured to set, as the reward function, a frequency of performing the operation of the in-vehicle component through the driver's response to the operation proposal for the in-vehicle component, and update the reward function in accordance with a change in an operation history of the operation of the in-vehicle component when the in-vehicle component is operated in accordance with the operation proposal for the in-vehicle component.
- 3. The information providing device according to claim 1, wherein the state space construction unit is configured to construct the state space as a set of states as a group of data that associate an operation situation of the in-vehicle component, characteristics of a passenger or passengers of the vehicle and a running situation of the vehicle with one another.
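Claims 2 and 3 can be read together as a frequency-based reward over composite states. The following is a minimal sketch under that reading, not the patented implementation: the `State` field names, the `FrequencyReward` class, and the choice of relative frequency (count divided by the per-state total) are all assumptions made for illustration.

```python
from collections import defaultdict, namedtuple

# Hypothetical state layout loosely following claim 3: one state associates the
# component's operation situation, passenger characteristics and running situation.
State = namedtuple("State", ["component_status", "passengers", "running_situation"])


class FrequencyReward:
    """Reward derived from how often the driver performed an action in a state."""

    def __init__(self):
        # counts[state][action] = number of operations logged for that pair
        self._counts = defaultdict(lambda: defaultdict(int))

    def record_operation(self, state, action):
        """Update the operation history after the driver acts on a proposal."""
        self._counts[state][action] += 1

    def reward(self, state, action):
        """Relative frequency of `action` among the operations logged in `state`."""
        per_state = self._counts.get(state, {})
        total = sum(per_state.values())
        return per_state.get(action, 0) / total if total else 0.0
```

Because the reward is recomputed from the counts on every call, each newly recorded operation changes the reward immediately, which is one simple way to update the reward function as the operation history changes.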

4. An information providing device comprising:
- an agent electronic control unit including
  - a state space construction unit that is configured to define a state of a vehicle by associating a plurality of types of vehicle data with one another, and construct a state space as a set of a plurality of states,
  - an action space construction unit that is configured to define, as an action, data indicating contents of an operation of an in-vehicle component that is performed through a response, from a driver, to an operation proposal for the in-vehicle component, and construct an action space as a set of a plurality of actions,
  - a reinforced learning unit that is configured to accumulate a history of the response, from the driver, to the operation proposal for the in-vehicle component, set a reward function as an index representing an appropriateness degree of the operation proposal for the in-vehicle component while using the accumulated history, and calculate a probability distribution of performance of each of the actions constructing the action space in each of the states constructing the state space, through reinforced learning based on the reward function,
  - a dispersion degree computation unit that is configured to compute a dispersion degree of the state space by summating the dispersion degree of the probability distribution that is calculated by the reinforced learning unit as to the plurality of the states constructing the state space, and
  - an information providing unit that is configured to make a definitive operation proposal to fix a target action as a target of the operation proposal and output the target action when the dispersion degree of the state space that is computed by the dispersion degree computation unit is smaller than a threshold, and make a trial-and-error operation proposal to select the target action as the target of the operation proposal from a plurality of candidates and output the target action when the dispersion degree of the state space that is computed by the dispersion degree computation unit is equal to or larger than the threshold.
- 5. The information providing device according to claim 4, wherein the reinforced learning unit is configured to adopt, as a policy, mapping from each of the states constructing the state space to each of the actions constructing the action space, set, as a state value function, an expected value of a cumulative reward that is obtained when the policy is followed in each of the states, estimate, as an optimal action value function, an expected value of a cumulative reward that is always obtained when an optimal policy is followed after a predetermined action is selected from the action space in each of the states constructing the state space on an assumption that the optimal policy is the policy that maximizes the state value function in all the states constructing the state space, and calculate the probability distribution based on the estimated optimal action value function, and the information providing unit is configured to make the definitive operation proposal targeting an action that maximizes the optimal action value function in a present state, when the dispersion degree of the state space that is computed by the dispersion degree computation unit is smaller than the threshold.
- 6. The information providing device according to claim 5, wherein the information providing unit is configured to make the trial-and-error operation proposal with such a tendency as to enhance a frequency of selecting an action as a target as a probability density of the probability distribution of the action in the present state rises, when the dispersion degree of the state space that is computed by the dispersion degree computation unit is equal to or larger than the threshold.
- 7. The information providing device according to claim 5, wherein the dispersion degree computation unit is configured to define, as an entropy, the dispersion degree of the probability distribution of performance of each of the actions constructing the action space in each of the states constructing the state space, and define the dispersion degree of the state space as an average entropy, and the information providing unit is configured to select the definitive operation proposal or the trial-and-error operation proposal with such a tendency as to enhance a frequency of making the trial-and-error operation proposal as an ϵ-value increases, while using an ϵ-greedy method in which a value of the average entropy is set as the ϵ-value.
- 8. The information providing device according to claim 4, wherein the reinforced learning unit is configured to set, as the reward function, a frequency of performing the operation of the in-vehicle component through the driver's response to the operation proposal for the in-vehicle component, and update the reward function in accordance with a change in an operation history of the operation of the in-vehicle component when the in-vehicle component is operated in accordance with the operation proposal for the in-vehicle component.
- 9. The information providing device according to claim 4, wherein the state space construction unit is configured to construct the state space as a set of states as a group of data that associate an operation situation of the in-vehicle component, characteristics of a passenger or passengers of the vehicle and a running situation of the vehicle with one another.
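The mechanism recited in claims 5 to 7 can be sketched as follows. This is an illustration under stated assumptions rather than the claimed method: tabular Q-learning stands in for estimating the optimal action value function, a softmax over Q-values is assumed as the way to derive the probability distribution, and the average entropy is divided by log(number of actions) so that it can serve as an ϵ-value in [0, 1]. All function names and the `alpha`, `gamma` and `temperature` parameters are hypothetical.

```python
import math
import random


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step (assumed estimator of the optimal action value)."""
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)


def action_distribution(q, state, actions, temperature=1.0):
    """Softmax over Q-values: an assumed mapping from values to probabilities."""
    values = [q.get((state, a), 0.0) / temperature for a in actions]
    peak = max(values)  # subtract the maximum for numerical stability
    weights = [math.exp(v - peak) for v in values]
    total = sum(weights)
    return [w / total for w in weights]


def average_entropy(q, states, actions):
    """Mean per-state entropy, normalized to [0, 1]; needs at least two actions."""
    def h(probs):
        return -sum(p * math.log(p) for p in probs if p > 0.0)

    mean = sum(h(action_distribution(q, s, actions)) for s in states) / len(states)
    return mean / math.log(len(actions))


def propose(q, state, states, actions, rng=random):
    """Epsilon-greedy choice between proposal modes, with epsilon = average entropy."""
    epsilon = average_entropy(q, states, actions)
    if rng.random() < epsilon:
        # Trial and error: more probable actions are selected more often.
        probs = action_distribution(q, state, actions)
        return "trial_and_error", rng.choices(actions, weights=probs, k=1)[0]
    # Definitive: the action that maximizes the estimated action value.
    return "definitive", max(actions, key=lambda a: q.get((state, a), 0.0))
```

Under these assumptions an untrained table gives a uniform distribution in every state, so the ϵ-value is close to 1 and almost every proposal is trial and error; as the Q-values separate, the average entropy falls and definitive proposals become more frequent.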

10. A non-transitory computer readable medium that stores an information providing program, comprising:
- the information providing program that is programmed to cause a computer to realize
  - a state space construction function of defining a state of a vehicle by associating a plurality of types of vehicle data with one another, and constructing a state space as a set of a plurality of states,
  - an action space construction function of defining, as an action, data indicating contents of an operation of an in-vehicle component that is performed through a response, from a driver, to an operation proposal for the in-vehicle component, and constructing an action space as a set of a plurality of actions,
  - a reinforced learning function of accumulating a history of the response, from the driver, to the operation proposal for the in-vehicle component, setting a reward function as an index representing an appropriateness degree of the operation proposal for the in-vehicle component while using the accumulated history, and calculating a probability distribution of performance of each of the actions constructing the action space in each of the states constructing the state space, through reinforced learning based on the reward function,
  - a dispersion degree computation function of computing a dispersion degree of the probability distribution that is calculated through the reinforced learning function, and
  - an information providing function of making a definitive operation proposal to fix a target action as a target of the operation proposal and output the target action when the dispersion degree of the probability distribution that is computed through the dispersion degree computation function is smaller than a threshold, and making a trial-and-error operation proposal to select the target action as the target of the operation proposal from a plurality of candidates and output the target action when the dispersion degree of the probability distribution that is computed through the dispersion degree computation function is equal to or larger than the threshold.

11. A non-transitory computer readable medium that stores an information providing program, comprising:
- the information providing program that is programmed to cause a computer to realize
  - a state space construction function of defining a state of a vehicle by associating a plurality of types of vehicle data with one another, and constructing a state space as a set of a plurality of states,
  - an action space construction function of defining, as an action, data indicating contents of an operation of an in-vehicle component that is performed through a response, from a driver, to an operation proposal for the in-vehicle component, and constructing an action space as a set of a plurality of actions,
  - a reinforced learning function of accumulating a history of the response, from the driver, to the operation proposal for the in-vehicle component, setting a reward function as an index representing an appropriateness degree of the operation proposal for the in-vehicle component while using the accumulated history, and calculating a probability distribution of performance of each of the actions constructing the action space in each of the states constructing the state space, through reinforced learning based on the reward function,
  - a dispersion degree computation function of computing a dispersion degree of the state space by summating the dispersion degree of the probability distribution that is calculated through the reinforced learning function as to the plurality of the states constructing the state space, and
  - an information providing function of making a definitive operation proposal to fix a target action as a target of the operation proposal and output the target action when the dispersion degree of the state space that is computed through the dispersion degree computation function is smaller than a threshold, and making a trial-and-error operation proposal to select the target action as the target of the operation proposal from a plurality of candidates and output the target action when the dispersion degree of the state space that is computed through the dispersion degree computation function is equal to or larger than the threshold.

Current Assignee: Toyota Jidosha Kabushiki Kaisha (Toyota Motor Corporation)
Original Assignee: Toyota Jidosha Kabushiki Kaisha (Toyota Motor Corporation)
Inventors: Koga, Ko
Primary Examiner(s): Shaawat, Mussa A
Application Number: US15/452,106
Publication Number: US 20170261947A1
Time in Patent Office: 399 Days
CPC Class Codes

- B60R 16/0373: Voice control in general G10L
- B60W 40/09: Driving style or behaviour
- B60W 50/10: Interpretation of driver re...
- G05B 13/0265: the criterion being a learn...
- G06N 5/00: Computing arrangements usin...
- G06N 5/04: Inference or reasoning models
