
FRAMEWORK AND METHODS OF DIVERSE EXPLORATION FOR FAST AND SAFE POLICY IMPROVEMENT

  • US 20190228309A1
  • Filed: 01/24/2019
  • Published: 07/25/2019
  • Est. Priority Date: 01/25/2018
  • Status: Active Grant
First Claim

1. A method of learning and deploying a set of behavior policies for an artificial agent, selected from a set of behavior policies, each having a statistically expected return no worse than a lower bound of policy performance which excludes a portion of the set of behavior policies, comprising iteratively improving a behavior policy for each iteration of policy improvement, employing a diverse exploration strategy which strives for behavior diversity in a space of stochastic policies by deploying a diverse set comprising a plurality of behavior policies which are ensured as being safe during each iteration of policy improvement and assessing performance of the artificial agent.
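The loop recited in claim 1 can be illustrated with a minimal Python sketch. Everything below is an assumption made for illustration and is not taken from the patent: the toy multi-armed bandit environment, the importance-sampling return estimate, the normal-approximation lower confidence bound, and every function and parameter name (`lower_bound`, `perturb`, `safe_diverse_set`, `rho_minus`, and so on). It shows only one possible reading: generate several perturbed stochastic policies, keep those whose estimated lower bound is no worse than the baseline's, deploy that safe set to gather data, and repeat.

```python
import math
import random
import statistics


def lower_bound(returns, z=1.645):
    """One-sided normal-approximation lower confidence bound on the mean (assumed test)."""
    if len(returns) < 2:
        return float("-inf")
    stderr = statistics.stdev(returns) / math.sqrt(len(returns))
    return statistics.fmean(returns) - z * stderr


def perturb(policy, scale, rng):
    """Return a nearby stochastic policy: noisy action probabilities, renormalized."""
    weights = [max(1e-6, p + rng.gauss(0.0, scale)) for p in policy]
    total = sum(weights)
    return [w / total for w in weights]


def is_returns(candidate, data):
    """Importance-weighted returns of `candidate` from logged (behavior, action, reward)."""
    return [candidate[a] / behavior[a] * r for behavior, a, r in data]


def safe_diverse_set(baseline, data, rho_minus, k, scale, rng):
    """Sample k diverse candidates; keep those whose lower bound is at least rho_minus."""
    candidates = [perturb(baseline, scale, rng) for _ in range(k)]
    return [c for c in candidates if lower_bound(is_returns(c, data)) >= rho_minus]


def deploy(policy, arm_means, n, rng):
    """Run `policy` for n steps on a toy bandit and log each interaction."""
    log = []
    for _ in range(n):
        action = rng.choices(range(len(policy)), weights=policy)[0]
        log.append((policy, action, rng.gauss(arm_means[action], 0.1)))
    return log


def improve(arm_means, iterations=5, k=8, n=100, scale=0.1, seed=0):
    """Iterate: build a safe diverse set, deploy it, then pick a new baseline."""
    rng = random.Random(seed)
    baseline = [1.0 / len(arm_means)] * len(arm_means)
    data = deploy(baseline, arm_means, n, rng)
    for _ in range(iterations):
        # Performance threshold: lower bound on the current baseline's return.
        rho_minus = lower_bound(is_returns(baseline, data))
        # Fall back to the baseline alone if no candidate passes the safety test.
        deployed = safe_diverse_set(baseline, data, rho_minus, k, scale, rng) or [baseline]
        for policy in deployed:
            data += deploy(policy, arm_means, n, rng)
        # New baseline: highest estimated return among deployed policies and the old one.
        baseline = max(deployed + [baseline],
                       key=lambda p: statistics.fmean(is_returns(p, data)))
    return baseline
```

Calling `improve([0.2, 0.5, 0.8])` returns a probability vector over the three hypothetical arms. The safety threshold here is re-estimated each iteration from logged data; a real system would need a bound with stated guarantees rather than this normal approximation.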
