Method and system for mapping states and actions of an intelligent agent
First Claim
1. A method for mapping states and actions of an intelligent artificial agent, the method comprising the steps of:
creating, by said at least one or more processors, at least a policy defining a behavior of said intelligent artificial agent, said policy comprising a set of policies for said intelligent artificial agent, each of said policies comprising at least a full or partial agent state information for mapping to an agent's action;
creating a policy manifold, said policy manifold comprising a point on a surface associated with a policy, said surface comprising a set of surface points where each of said surface point is associated with each of said policies, said policy manifold further comprising at least a policy coordinate for each of said surface points;
associating each of said policies to said policy coordinate on said policy manifold;
organizing the policy coordinates of said policy manifold based on a property and distance of each of said policies on said policy manifold, wherein distances between policy coordinates are configured to reflect policy dissimilarities among each of said policies;
comparing actions produced by each policy having the same state information to determine dissimilarity between neighboring policies;
applying a learning update to the coordinates of at least one of said policies having dissimilarities among neighboring policies, said learning update being configured to modify said policy coordinate to have a shorter distance to policies with lesser dissimilarities, wherein said policy manifold is configured to show greater smoothness when dissimilarity between policies is smaller between policies whose coordinates in the policy manifold have a shorter distance.
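The comparing and learning-update steps of claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. It assumes deterministic policies stored as state-to-action dictionaries, a dissimilarity defined as the fraction of shared states on which two policies pick different actions, and a simple weighted pull toward similar policies. The names `dissimilarity`, `update_coordinate`, and `rate` are hypothetical and do not appear in the source.

```python
import math

def dissimilarity(policy_a, policy_b, states):
    """Fraction of the shared states on which the two policies act differently."""
    differing = sum(1 for s in states if policy_a[s] != policy_b[s])
    return differing / len(states)

def update_coordinate(coords, policies, states, index, rate=0.5):
    """Move coords[index] toward the other policies, weighted by similarity.

    Assumption: similarity = 1 - dissimilarity, so a policy with lesser
    dissimilarity pulls harder and ends up at a shorter distance.
    """
    x = coords[index]
    pull = [0.0] * len(x)
    total = 0.0
    for j, other in enumerate(coords):
        if j == index:
            continue
        weight = 1.0 - dissimilarity(policies[index], policies[j], states)
        total += weight
        for k in range(len(x)):
            pull[k] += weight * (other[k] - x[k])
    if total == 0.0:
        return list(x)  # no similar policy, leave the coordinate unchanged
    return [x[k] + rate * pull[k] / total for k in range(len(x))]

# Hypothetical example data: three policies evaluated on the same four states.
states = ["s0", "s1", "s2", "s3"]
policies = [
    {s: "left" for s in states},                               # p0
    {"s0": "left", "s1": "left", "s2": "left", "s3": "right"},  # p1, close to p0
    {s: "right" for s in states},                              # p2, opposite of p0
]
coords = [[0.0, 0.0], [4.0, 0.0], [-1.0, 0.0]]

new_p0 = update_coordinate(coords, policies, states, index=0)
# new_p0 is [2.0, 0.0]: p0 moved toward p1 (dissimilarity 0.25)
# and was not pulled toward p2 (dissimilarity 1.0).
```

In this sketch the distance from p0 to p1 shrinks from 4 to 2, while the distance to the fully dissimilar p2 grows from 1 to 3, which is the qualitative behavior the claim describes.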
Abstract
A method and system comprise providing means and method for producing, modifying, and/or exploiting the structure of a policy manifold. Each of the policies at least comprises information for mapping state and/or sensory information as input to action preferences as output. One or more processing units assign each of the policies a policy coordinate on a policy manifold. The policy coordinate may in part be determined by a dissimilarity matrix or other means for organizing the coordinates of the policies on the policy manifold according to the properties of the policies and the topology of the policy manifold. The policy manifold comprises a dimensionality that is lower than a combined dimensionality of the input and the output, wherein the policy manifold at least in part determines a behavior of the intelligent artificial agent.
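One way to picture how a dissimilarity matrix could determine policy coordinates is the sketch below. The abstract does not name an embedding technique, so this example swaps in metric multidimensional scaling by plain gradient descent on the squared difference between coordinate distances and dissimilarities. The function name `embed` and the parameters `dim`, `steps`, `rate`, and `seed` are assumptions made for illustration.

```python
import math
import random

def embed(dissim, dim=2, steps=2000, rate=0.05, seed=0):
    """Assign each policy a low-dimensional coordinate so that coordinate
    distances approximate the given pairwise dissimilarities."""
    rng = random.Random(seed)
    n = len(dissim)
    coords = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n)]
    for _ in range(steps):
        for i in range(n):
            grad = [0.0] * dim
            for j in range(n):
                if i == j:
                    continue
                diff = [coords[i][k] - coords[j][k] for k in range(dim)]
                dist = math.sqrt(sum(d * d for d in diff)) or 1e-9
                err = dist - dissim[i][j]  # positive when the points are too far apart
                for k in range(dim):
                    grad[k] += err * diff[k] / dist
            for k in range(dim):
                coords[i][k] -= rate * grad[k]
    return coords

# Hypothetical dissimilarity matrix for three policies.
dissim = [
    [0.0, 0.6, 1.0],
    [0.6, 0.0, 0.8],
    [1.0, 0.8, 0.0],
]
coords = embed(dissim)
```

Here three policies, each of which might map a high-dimensional input to an output, are placed on a two-dimensional surface, so the manifold has a lower dimensionality than the combined input and output in the sense the abstract describes.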
16 Claims
1. A method for mapping states and actions of an intelligent artificial agent, the method comprising the steps of:
creating, by said at least one or more processors, at least a policy defining a behavior of said intelligent artificial agent, said policy comprising a set of policies for said intelligent artificial agent, each of said policies comprising at least a full or partial agent state information for mapping to an agent's action;
creating a policy manifold, said policy manifold comprising a point on a surface associated with a policy, said surface comprising a set of surface points where each of said surface point is associated with each of said policies, said policy manifold further comprising at least a policy coordinate for each of said surface points;
associating each of said policies to said policy coordinate on said policy manifold;
organizing the policy coordinates of said policy manifold based on a property and distance of each of said policies on said policy manifold, wherein distances between policy coordinates are configured to reflect policy dissimilarities among each of said policies;
comparing actions produced by each policy having the same state information to determine dissimilarity between neighboring policies;
applying a learning update to the coordinates of at least one of said policies having dissimilarities among neighboring policies, said learning update being configured to modify said policy coordinate to have a shorter distance to policies with lesser dissimilarities, wherein said policy manifold is configured to show greater smoothness when dissimilarity between policies is smaller between policies whose coordinates in the policy manifold have a shorter distance.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. A system comprising:
one or more processing units being configured to assemble a set of policies for an intelligent artificial agent, each of said policies at least comprising information for mapping state or sensory information as input to action preferences as output;
said one or more processing units being further configured to implement said set of policies in a table, wherein table entries represent a probability of choosing said action preferences given said state and sensory information;
said one or more processing units being further configured to generate a dissimilarity matrix between at least a pair of said policies;
said one or more processing units being further configured to assign each of said policies a policy coordinate on a policy manifold, said policy coordinate being at least in part determined by said dissimilarity matrix, said policy manifold comprising a dimensionality being lower than a combined dimensionality of said input and said output, wherein said policy manifold at least in part determines a behavior of the intelligent artificial agent;
said one or more processing units being further configured to organize the policies of said policy manifold based on a property and distance of each of said policies on said policy manifold, wherein distances between policies in the policy manifold are configured to reflect policy dissimilarities among each of said policies;
a source of information data, said information data source comprising a device, said information data comprising said policy manifold of said device, wherein a learning update is applied to a policy of said policy manifold to produce policies with less dissimilarities;
a sensory device, said sensory device is configured to receive said information data of said device; and
said one or more processing units being further configured to apply an update to neighboring policies of said updated policy, and to repeat the process to generate a dissimilarity matrix and assign policy coordinates subsequent to the applied update, in which policies having similar output for similar input are assigned neighboring coordinates to provide a smoothness to said policy manifold and a distance between policy coordinates corresponds to a degree of dissimilarity between the associated policies, said inputs and said outputs are associated with coordinates or coordinate ranges in the policy manifold, and only policies whose coordinates overlap with said input and said output coordinates or coordinate ranges are allowed to access said inputs and said outputs.
Dependent claims: 11
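The table representation, the dissimilarity matrix, and the coordinate-based access rule recited in claim 10 can be sketched as follows. This is an illustration under stated assumptions, not the claimed system. Each policy table is assumed to be a dictionary from state to a list of action probabilities, the dissimilarity is assumed to be the mean total variation distance over shared states, and an input or output is assumed to own an axis-aligned coordinate range. The names `total_variation`, `table_dissimilarity`, `dissimilarity_matrix`, and `may_access` are hypothetical.

```python
def total_variation(p, q):
    """Total variation distance between two action-probability lists."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def table_dissimilarity(table_a, table_b):
    """Mean total variation distance over the states both tables define."""
    shared = sorted(table_a.keys() & table_b.keys())
    return sum(total_variation(table_a[s], table_b[s]) for s in shared) / len(shared)

def dissimilarity_matrix(tables):
    """Symmetric matrix of pairwise policy dissimilarities."""
    n = len(tables)
    return [[table_dissimilarity(tables[i], tables[j]) for j in range(n)]
            for i in range(n)]

def may_access(policy_coord, coord_range):
    """True if the policy coordinate lies inside the range owned by an
    input or output (one (low, high) pair per manifold dimension)."""
    return all(low <= x <= high for x, (low, high) in zip(policy_coord, coord_range))

# Hypothetical tables: probabilities of the actions [stop, go] in each state.
tables = [
    {"near": [1.0, 0.0], "far": [0.0, 1.0]},
    {"near": [0.5, 0.5], "far": [0.0, 1.0]},
    {"near": [0.0, 1.0], "far": [1.0, 0.0]},
]
matrix = dissimilarity_matrix(tables)
# matrix[0][1] is 0.25, matrix[1][2] is 0.75, matrix[0][2] is 1.0

# Hypothetical sensor that owns the region [0, 0.5] x [0, 0.5] of the manifold.
sensor_range = [(0.0, 0.5), (0.0, 0.5)]
inside = may_access([0.2, 0.1], sensor_range)   # True
outside = may_access([0.9, 0.1], sensor_range)  # False
```

A matrix like this one could then be passed to an embedding step that assigns coordinates, after which `may_access` would gate which policies read a given input or drive a given output.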
12. A non-transitory computer-readable storage medium with an executable program stored thereon, wherein the program instructs one or more processors to perform the following steps:
assembling a set of policies, for an intelligent artificial agent, each of said policies at least comprising information for mapping state or sensory information as input to action preferences as output;
creating a policy manifold, said policy manifold comprising a point on a surface, said surface comprising a set of surface points associated with each of said policies, said policy manifold further comprising at least a policy coordinate for each of said surface points;
generating a dissimilarity matrix between said policies;
assigning each of said policies a policy coordinate on said policy manifold, said policy coordinate being at least in part determined by said dissimilarity matrix, said policy manifold comprising a dimensionality being lower than a combined dimensionality of said input and said output, wherein said policy manifold at least in part determines a behavior of the intelligent artificial agent;
organizing the policies of said policy manifold based on a property and distance of each of said policies on said policy manifold, wherein distances between policies in the policy manifold are configured to reflect policy dissimilarities among each of said policies;
applying a learning update to the coordinates of at least one of said policies having dissimilarities among neighboring policies, said learning update being configured to modify said policy coordinate to have a shorter distance to policies with lesser dissimilarities, wherein said policy manifold is configured to show greater smoothness when dissimilarity between policies is smaller between policies whose coordinates in the policy manifold have a shorter distance.
Dependent claims: 13, 14, 15, 16
Specification