METHOD OF UPDATING POLICY FOR CONTROLLING ACTION OF ROBOT AND ELECTRONIC DEVICE PERFORMING THE METHOD

US 20200134505A1
Filed: 06/13/2019
Published: 04/30/2020
Est. Priority Date: 10/30/2018
Status: Active Grant

1 Assignment

0 Petitions

Abstract

The tendency of a robot's actions may vary with the learning data used to train it. The learning data may be generated by an agent performing a task identical or similar to that of the robot. An apparatus and method for updating a policy for controlling an action of a robot may update the robot's policy using a plurality of learning datasets generated by a plurality of heterogeneous agents, so that the robot may act appropriately even in an unpredicted environment.

4 Citations

19 Claims

1. A method of updating a policy associated with controlling an action of a robot, the method comprising:
- receiving a plurality of learning datasets generated by a plurality of heterogeneous agents;
  
  generating a weighted learning database based on the plurality of learning datasets and weight sets associated with the plurality of heterogeneous agents; and
  
  updating the policy associated with controlling the action of the robot based on the weighted learning database to generate an updated policy.
  - 2. The method of claim 1, wherein a first agent of the plurality of heterogeneous agents is configured to generate a first learning dataset of the plurality of learning datasets such that the first learning dataset includes a plurality of learning data items including a current state, the action, and a reward, the current state including information on a surrounding environment of the first agent measured by the first agent, the action being performed by the first agent for the current state, and the reward being an assessment value of the action.
  - 3. The method of claim 1, wherein the plurality of learning datasets include a first learning dataset generated by a first agent of the plurality of heterogeneous agents and a second learning dataset generated by a second agent of the plurality of heterogeneous agents, and the weight sets include a first weight set associated with the first agent and a second weight set associated with the second agent, and the generating the weighted learning database comprises:
    - generating at least one first weighted learning data item based on the first learning dataset and the first weight set;
      
      generating at least one second weighted learning data item based on the second learning dataset and the second weight set; and
      
      generating the weighted learning database including the first weighted learning data item and the second weighted learning data item.
  - 4. The method of claim 3, wherein the generating of the first weighted learning data item comprises:
    - calculating a number of data items corresponding to the first weight set for the first agent; and
      
      generating the first weighted learning data item based on the number of data items and the first learning dataset.
  - 5. The method of claim 1, wherein the updating the policy comprises:
    - updating the policy such that a reward value for the action of the robot increases.
  - 6. The method of claim 1, further comprising:
    - acquiring direct learning data of the robot generated based on the updated policy;
      
      generating a direct learning database including the direct learning data; and
      
      updating the policy based on the direct learning database.
  - 7. The method of claim 6, wherein the weighted learning database includes the direct learning database such that the updating the policy based on the direct learning database comprises:
    - updating the policy based on the weighted learning database.
  - 8. The method of claim 6, wherein the updating the policy based on the direct learning database comprises:
    - updating the policy in response to a set number of items of the direct learning data being generated.
  - 9. The method of claim 6, wherein the updating the policy based on the direct learning database comprises:
    - updating the policy in response to a reward value calculated based on the policy being greater than or equal to a set value.
  - 10. The method of claim 6, wherein the acquiring the direct learning data of the robot based on the updated policy comprises:
    - generating a current state of the robot using at least one sensor associated with the robot;
      
      controlling the action of the robot using the updated policy;
      
      calculating a reward for the action of the robot; and
      
      generating the direct learning data including the current state of the robot, the action of the robot, and the reward for the action of the robot.
  - 11. A non-transitory computer-readable medium comprising computer readable instructions that, when executed by a computer, cause the computer to perform the method of claim 1.
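Claims 1 to 4 leave open how a weight set becomes a weighted learning database. The following is a minimal Python sketch, assuming each agent's weight set reduces to a single non-negative scalar and that, per claim 4, the number of items taken from an agent is that weight times the size of its dataset, sampled with replacement so that weights above 1 oversample an agent. `LearningItem` and `build_weighted_database` are hypothetical names, not taken from the patent.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class LearningItem:
    """One learning data item as in claim 2: current state, action, reward."""
    state: tuple
    action: int
    reward: float


def build_weighted_database(
    datasets: Dict[str, Sequence[LearningItem]],
    weights: Dict[str, float],
    seed: int = 0,
) -> List[LearningItem]:
    """Merge per-agent datasets into one weighted learning database.

    Assumption (hypothetical): each agent's weight set is a single scalar,
    and the number of items taken from that agent is round(weight * size).
    Agents with no weight entry contribute nothing.
    """
    rng = random.Random(seed)
    database: List[LearningItem] = []
    for agent_id, dataset in datasets.items():
        # Claim 4: number of data items corresponding to the agent's weight.
        count = round(weights.get(agent_id, 0.0) * len(dataset))
        if dataset and count > 0:
            # Sample with replacement so weights above 1.0 oversample an agent.
            database.extend(rng.choices(list(dataset), k=count))
    return database


# Usage: a simulated agent (action 0) and a human demonstrator (action 1).
sim = [LearningItem((0.0,), 0, 0.1), LearningItem((0.1,), 0, 0.2), LearningItem((0.2,), 0, 0.3)]
human = [LearningItem((0.0,), 1, 0.9), LearningItem((0.1,), 1, 0.8)]
db = build_weighted_database({"sim": sim, "human": human}, {"sim": 1.0, "human": 2.0})
print(len(db))  # 7: three items from sim, four from human
```

Sampling with replacement is one design choice among several; an implementation could equally duplicate whole datasets or attach per-item weights to the loss instead.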

12. An electronic device configured to update a policy associated with controlling an action of a robot, the electronic device comprising:
- a memory configured to store a program for updating the action of the robot; and
  
  a processor configured to execute the program to receive a plurality of learning datasets generated by a plurality of heterogeneous agents, generate a weighted learning database based on the plurality of learning datasets and weight sets associated with the plurality of heterogeneous agents, acquire direct learning data of the robot generated based on the weighted learning database and the policy associated with controlling the action of the robot, and update the policy based on at least the direct learning data.
  - 13. The electronic device of claim 12, wherein the processor is configured to update the policy by updating the policy in response to a set number of items of the direct learning data being generated.
  - 14. The electronic device of claim 12, wherein the processor is configured to update the policy by updating the policy in response to a reward value calculated based on the policy being greater than or equal to a set value.
  - 15. The electronic device of claim 12, wherein the processor is configured to update the policy by updating the policy such that a reward value for the action of the robot increases.
  - 16. The electronic device of claim 12, wherein the processor is configured to acquire the direct learning data by generating a current state of the robot using at least one sensor associated with the robot, controlling the action of the robot using the policy, calculating a reward for the action of the robot, and generating the direct learning data including the current state, the action, and the reward for the action of the robot.
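Claims 13, 14 and 16 describe how direct learning data is gathered and when an update fires. A minimal sketch follows, assuming the sensor, policy, actuator and reward function are plain Python callables and that the "reward value calculated based on the policy" in claim 14 is the mean reward over the collected direct items. All function names are hypothetical. Claims 13 and 14 are separate alternatives; combining them with a logical OR is a choice made only for this illustration.

```python
from dataclasses import dataclass
from statistics import fmean
from typing import Callable, List, Optional, Tuple

State = Tuple[float, ...]


@dataclass(frozen=True)
class LearningItem:
    state: State
    action: int
    reward: float


def acquire_direct_item(
    read_sensors: Callable[[], State],
    policy: Callable[[State], int],
    actuate: Callable[[int], None],
    reward_fn: Callable[[State, int], float],
) -> LearningItem:
    """Claim 16 steps in order; the callables are hypothetical interfaces."""
    state = read_sensors()             # current state from at least one sensor
    action = policy(state)             # choose the action with the policy
    actuate(action)                    # control the robot
    reward = reward_fn(state, action)  # assess the action
    return LearningItem(state, action, reward)


def should_update(
    direct_db: List[LearningItem],
    set_count: int,
    set_reward: Optional[float] = None,
) -> bool:
    """Return True when either assumed update trigger fires."""
    if len(direct_db) >= set_count:  # claim 13: a set number of items generated
        return True
    if set_reward is not None and direct_db:
        # Claim 14, assuming the reward value is the mean direct reward.
        return fmean(item.reward for item in direct_db) >= set_reward
    return False


# Usage with stub hardware: one sensor reading, a threshold policy, no-op actuator.
item = acquire_direct_item(
    read_sensors=lambda: (0.5,),
    policy=lambda s: 1 if s[0] > 0 else 0,
    actuate=lambda a: None,
    reward_fn=lambda s, a: float(a),
)
```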

17. A method of updating a policy associated with controlling an action of a robot, the method comprising:
- receiving a plurality of learning datasets generated by a plurality of heterogeneous agents;
  
  generating a weighted learning database based on the plurality of learning datasets and weight sets associated with the plurality of heterogeneous agents;
  
  acquiring direct learning data of the robot generated based on the weighted learning database and the policy associated with controlling the action of the robot; and
  
  updating the policy based on at least the direct learning data.
  - 18. The method of claim 17, wherein the updating of the policy based on the direct learning data comprises:
    - updating the policy based on the weighted learning database and the direct learning data.
  - 19. The method of claim 17, wherein the acquiring of the direct learning data comprises:
    - generating a current state of the robot using at least one sensor associated with the robot;
      
      controlling the action of the robot using the policy;
      
      calculating a reward for the action of the robot; and
      
      generating the direct learning data including the current state, the action, and the reward for the action of the robot.
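Claim 18 updates the policy from the weighted learning database together with the direct learning data, and claims 5 and 15 require the update to increase the reward. The claims do not fix a learning algorithm, so this sketch swaps in the simplest stand-in: a tabular policy that, for each discrete state, picks the action with the highest mean observed reward across both databases. It is an illustration under that assumption, not the patented method; `update_policy` is a hypothetical name.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Hashable, Iterable, Tuple


@dataclass(frozen=True)
class LearningItem:
    state: Hashable  # assumed discrete, so it can key a table
    action: int
    reward: float


def update_policy(
    weighted_db: Iterable[LearningItem],
    direct_db: Iterable[LearningItem],
) -> Dict[Hashable, int]:
    """Greedy tabular policy over the combined databases (claim 18)."""
    totals: Dict[Tuple[Hashable, int], float] = defaultdict(float)
    counts: Dict[Tuple[Hashable, int], int] = defaultdict(int)
    for item in list(weighted_db) + list(direct_db):
        key = (item.state, item.action)
        totals[key] += item.reward
        counts[key] += 1

    # For each state keep the action with the highest mean reward; on a tie
    # the first action seen wins because the comparison is strict.
    best: Dict[Hashable, Tuple[int, float]] = {}
    for (state, action), total in totals.items():
        mean = total / counts[(state, action)]
        if state not in best or mean > best[state][1]:
            best[state] = (action, mean)
    return {state: action for state, (action, _) in best.items()}


weighted_db = [LearningItem("s0", 0, 0.2), LearningItem("s0", 1, 0.5)]
direct_db = [LearningItem("s0", 1, 0.9), LearningItem("s1", 0, 1.0)]
policy = update_policy(weighted_db, direct_db)
print(policy)  # {'s0': 1, 's1': 0}
```

Because items from heavily weighted agents appear more often in the weighted database, they pull the mean rewards toward their own experience, which is one way the weight sets can shape the updated policy.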

Current Assignee
Samsung Electronics Co. Ltd.
Original Assignee
Samsung Electronics Co. Ltd.
Inventors
Jang, Jun-Won; Kim, Kyung-Rock; Ha, Taesin

Granted Patent

US 11,631,028 B2
CPC Class Codes

B25J 9/163   learning, adaptive, model b...

G05B 19/4155   characterised by programme ...

G05B 2219/49065   Execute learning mode first...

G06N 20/00   Machine learning

G06N 3/006   based on simulated virtual ...

G06N 3/008   based on physical entities ...

G06N 3/088   Non-supervised learning, e....