METHOD OF UPDATING POLICY FOR CONTROLLING ACTION OF ROBOT AND ELECTRONIC DEVICE PERFORMING THE METHOD
Abstract
A tendency of an action of a robot may vary based on learning data used for training. The learning data may be generated by an agent performing an identical or similar task to a task of the robot. An apparatus and method for updating a policy for controlling an action of a robot may update the policy of the robot using a plurality of learning data sets generated by a plurality of heterogeneous agents, such that the robot may appropriately act even in an unpredicted environment.
19 Claims
1. A method of updating a policy associated with controlling an action of a robot, the method comprising:
receiving a plurality of learning datasets generated by a plurality of heterogeneous agents;
generating a weighted learning database based on the plurality of learning datasets and weight sets associated with the plurality of heterogeneous agents; and
updating the policy associated with controlling the action of the robot based on the weighted learning database to generate an updated policy.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
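The three steps of claim 1 can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: each learning dataset is modeled as a list of (state, action) pairs, each agent's weight set is reduced to a single non-negative number, the weighted learning database is built by sampling agents in proportion to those weights, and the policy is a plain lookup table updated by majority vote. The function names `build_weighted_database` and `update_policy` are hypothetical.

```python
import random
from collections import defaultdict


def build_weighted_database(datasets, weights, size, seed=0):
    """Sample `size` records, picking the source agent in proportion to its weight.

    datasets: dict of agent name -> list of (state, action) pairs (assumed format)
    weights:  dict of agent name -> non-negative float (assumed scalar weights)
    """
    rng = random.Random(seed)
    # Only agents with data and a positive weight can contribute records.
    agents = [a for a in datasets if datasets[a] and weights.get(a, 0.0) > 0.0]
    if not agents:
        return []
    agent_weights = [weights[a] for a in agents]
    database = []
    for _ in range(size):
        agent = rng.choices(agents, weights=agent_weights, k=1)[0]
        database.append(rng.choice(datasets[agent]))
    return database


def update_policy(policy, database):
    """Return a new tabular policy that, for every state seen in the
    database, selects the action that occurs most often for that state."""
    counts = defaultdict(lambda: defaultdict(int))
    for state, action in database:
        counts[state][action] += 1
    updated = dict(policy)  # leave the original policy untouched
    for state, action_counts in counts.items():
        updated[state] = max(action_counts, key=action_counts.get)
    return updated
```

In this sketch, raising one agent's weight makes its demonstrations more frequent in the database, so they are more likely to win the vote for a given state. A real system would more plausibly train a parametric policy on the weighted data; the table is used only to keep the example short.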
12. An electronic device configured to update a policy associated with controlling an action of a robot, the electronic device comprising:
a memory configured to store a program for updating the action of the robot; and
a processor configured to execute the program to,
receive a plurality of learning datasets generated by a plurality of heterogeneous agents,
generate a weighted learning database based on the plurality of learning datasets and weight sets associated with the plurality of heterogeneous agents,
acquire direct learning data of the robot generated based on the weighted learning database and the policy associated with controlling the action of the robot, and
update the policy based on at least the direct learning data.
Dependent claims: 13, 14, 15, 16
17. A method of updating a policy associated with controlling an action of a robot, the method comprising:
receiving a plurality of learning datasets generated by a plurality of heterogeneous agents;
generating a weighted learning database based on the plurality of learning datasets and weight sets associated with the plurality of heterogeneous agents;
acquiring direct learning data of the robot generated based on the weighted learning database and the policy associated with controlling the action of the robot; and
updating the policy based on at least the direct learning data.
Dependent claims: 18, 19
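Claims 12 and 17 add a step in which the robot generates its own direct learning data before the policy is updated. The sketch below shows one way to read that step, under loudly stated assumptions that are not taken from the patent: the weighted learning database is a list of (state, action) pairs, the robot tries its current policy action plus every action the agents demonstrated for a state, a hypothetical `reward_fn(state, action)` scores each attempt, and the update keeps the best-scoring action per state. The function names are likewise hypothetical.

```python
from collections import defaultdict


def acquire_direct_data(policy, weighted_db, reward_fn):
    """Let the robot act on states drawn from the weighted database.

    For each state, the robot tries the actions demonstrated by the agents
    plus its own current policy action (if it has one) and logs the reward.
    Returns a list of (state, action, reward) records.
    """
    candidates = defaultdict(set)
    for state, action in weighted_db:
        candidates[state].add(action)
    direct_data = []
    for state, actions in candidates.items():
        if state in policy:
            actions = actions | {policy[state]}
        for action in sorted(actions):  # sorted for a reproducible order
            direct_data.append((state, action, reward_fn(state, action)))
    return direct_data


def update_policy_from_direct_data(policy, direct_data):
    """Return a new tabular policy that keeps, per state, the action with
    the highest reward observed in the robot's own direct data."""
    best = {}
    for state, action, reward in direct_data:
        if state not in best or reward > best[state][1]:
            best[state] = (action, reward)
    updated = dict(policy)  # leave the original policy untouched
    for state, (action, _reward) in best.items():
        updated[state] = action
    return updated
```

The point of the extra step in this reading is that the agents' data only proposes candidate behavior, while the robot's own experience decides which candidate enters the policy. That matters when the agents differ physically from the robot.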
Specification