Method and system for mapping states and actions of an intelligent agent

US 9,311,600 B1
Filed: 06/02/2013
Issued: 04/12/2016
Est. Priority Date: 06/03/2012
Status: Active Grant

Abstract

A method and system comprise providing means and method for producing, modifying, and/or exploiting the structure of a policy manifold. Each of the policies at least comprises information for mapping state and/or sensory information as input to action preferences as output. One or more processing units assign each of the policies a policy coordinate on a policy manifold. The policy coordinate may in part be determined by a dissimilarity matrix or other means for organizing the coordinates of the policies on the policy manifold according to the properties of the policies and the topology of the policy manifold. The policy manifold comprises a dimensionality that is lower than a combined dimensionality of the input and the output, wherein the policy manifold at least in part determines a behavior of the intelligent artificial agent.
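
As a rough illustration of the dissimilarity matrix mentioned in the abstract, the following Python sketch represents each policy as a table from state to action probabilities and scores a pair of policies by comparing their action preferences on the states they share. The table layout, the total-variation measure, the example policies, and the names `policy_dissimilarity` and `dissimilarity_matrix` are assumptions made for illustration; the patent text does not specify a particular measure.

```python
from itertools import combinations


def policy_dissimilarity(p, q):
    """Mean total-variation distance between two tabular policies.

    Each policy is a dict mapping a state to a list of action probabilities.
    Assumes both policies list the same actions in the same order.
    """
    shared = sorted(set(p) & set(q))
    if not shared:
        raise ValueError("policies share no states")
    total = 0.0
    for state in shared:
        total += 0.5 * sum(abs(a - b) for a, b in zip(p[state], q[state]))
    return total / len(shared)


def dissimilarity_matrix(policies):
    """Symmetric matrix of pairwise dissimilarities with a zero diagonal."""
    n = len(policies)
    matrix = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        matrix[i][j] = matrix[j][i] = policy_dissimilarity(policies[i], policies[j])
    return matrix


# Hypothetical two-state, two-action policies (action order: left, right).
left = {"s0": [1.0, 0.0], "s1": [1.0, 0.0]}
mostly_left = {"s0": [0.75, 0.25], "s1": [1.0, 0.0]}
right = {"s0": [0.0, 1.0], "s1": [0.0, 1.0]}

D = dissimilarity_matrix([left, mostly_left, right])
for row in D:
    print(row)
```

With these example tables, `left` and `mostly_left` score 0.125 while `left` and `right` score 1.0, so a coordinate assignment driven by this matrix would place the first two policies close together and the third far away.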

24 Citations

16 Claims

1. A method for mapping states and actions of an intelligent artificial agent, the method comprising the steps of:
- creating, by said at least one or more processors, at least a policy defining a behavior of said intelligent artificial agent, said policy comprising a set of policies for said intelligent artificial agent, each of said policies comprising at least a full or partial agent state information for mapping to an agent's action;
  
  creating a policy manifold, said policy manifold comprising a point on a surface associated with a policy, said surface comprising a set of surface points where each of said surface point is associated with each of said policies, said policy manifold further comprising at least a policy coordinate for each of said surface points;
  
  associating each of said policies to said policy coordinate on said policy manifold;
  
  organizing the policy coordinates of said policy manifold based on a property and distance of each of said policies on said policy manifold, wherein distances between policy coordinates are configured to reflect policy dissimilarities among each of said policies;
  
  comparing actions produced by each policy having the same state information to determine dissimilarity between neighboring policies;
  
  applying a learning update to the coordinates of at least one of said policies having dissimilarities among neighboring policies, said learning update being configured to modify said policy coordinate to have a shorter distance to policies with lesser dissimilarities, wherein said policy manifold is configured to show greater smoothness when dissimilarity between policies is smaller between policies whose coordinates in the policy manifold have a shorter distance.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9)
  - 2. The method as recited in claim 1, further comprising the step of assigning similar policies to neighboring coordinates to help provide a smoothness to said policy manifold, wherein said policy manifold is less smoother if said neighboring policies are more similar to distant policies than said updated policies.
  - 3. The method as recited in claim 2, in which said distance between policy coordinates corresponds to a degree of dissimilarity between the associated policies.
  - 4. The method as recited in claim 3, in which an input and an outputs of said policy are associated with said policy coordinates in the policy manifold, and only policies whose coordinates overlap with said input and said output coordinates are allowed to access said input and said outputs.
  - 5. The method as recited in claim 1, further comprising the step of applying an update to at least one neighboring policy coordinate of a modified policy coordinate of said policy manifold, wherein the policy manifold is smoother if policies of neighboring policy coordinates of said modified policy coordinate is more similar to policies of said modified policy coordinates than distant policies.
  - 6. The method as recited in claim 5, further comprising the step of organizing at least one policy in a policy manifold according to a property of said at least one policy or a topology of said policy manifold.
  - 7. The method as recited in claim 5, further comprising the step of repeating the steps of creating a policy defining a behavior of said intelligent artificial agent and assigning similar policy coordinates subsequent to the step of applying said learning update.
  - 8. The method as recited in claim 5, in which said at least one or more processors generating a dissimilarity matrix between at least a pair of said policies, and said policy coordinate is at least in part determined by said dissimilarity matrix.
  - 9. The method as recited in claim 1, further comprising the step of generating a grid pattern for said coordinates, said policy further comprising a starting condition when said policy may begin or continue, said policy further comprising a termination condition when said policy may terminate.
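
One way to read the learning update of claim 1 is as a step that pulls a policy's coordinate toward the policies it resembles, so that distance on the manifold tracks dissimilarity. The sketch below takes a gradient step on a squared-error "stress" between coordinate distances and dissimilarities. This is a technique borrowed from metric multidimensional scaling and is not named in the claims; the function names, the learning rate `lr`, and the example values are all hypothetical.

```python
import math


def stress(coords, dissim):
    """Sum of squared gaps between manifold distances and policy dissimilarities."""
    total = 0.0
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            total += (math.dist(coords[i], coords[j]) - dissim[i][j]) ** 2
    return total


def update_coordinate(coords, dissim, i, lr=0.1):
    """Return a new coordinate for policy i after one gradient step on the stress."""
    grad = [0.0] * len(coords[i])
    for j, other in enumerate(coords):
        if j == i:
            continue
        dist = math.dist(coords[i], other)
        if dist == 0.0:
            continue  # direction is undefined for coincident points, so skip
        gap = dist - dissim[i][j]  # positive when the pair sits too far apart
        for k in range(len(grad)):
            grad[k] += 2.0 * gap * (coords[i][k] - other[k]) / dist
    return [c - lr * g for c, g in zip(coords[i], grad)]


# Hypothetical starting layout: three policies on a 2-D manifold.
coords = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
# Hypothetical dissimilarities: policies 0 and 1 are similar, policy 2 is not.
dissim = [
    [0.0, 0.125, 1.0],
    [0.125, 0.0, 0.875],
    [1.0, 0.875, 0.0],
]

moved = update_coordinate(coords, dissim, i=1)
updated = [coords[0], moved, coords[2]]
print("policy 1 moves from", coords[1], "to", moved)
print("stress before:", stress(coords, dissim), "after:", stress(updated, dissim))
```

In this example policy 1 starts farther from policy 0 than their small dissimilarity warrants, so the step shortens that distance and lowers the overall stress. Repeating the step over all policies is one plausible way to obtain the smoother layout the claims describe.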

10. A system comprising:
- one or more processing units being configured to assemble a set of policies for an intelligent artificial agent, each of said policies at least comprising information for mapping state or sensory information as input to action preferences as output;
  
  said one or more processing units being further configured to implement said set of policies in a table, wherein table entries represent a probability of choosing said action preferences given said state and sensory information;
  
  said one or more processing units being further configured to generate a dissimilarity matrix between at least a pair of said policies;
  
  said one or more processing units being further configured to assign each of said policies a policy coordinate on a policy manifold, said policy coordinate being at least in part determined by said dissimilarity matrix, said policy manifold comprising a dimensionality being lower than a combined dimensionality of said input and said output, wherein said policy manifold at least in part determines a behavior of the intelligent artificial agent;
  
  said one or more processing units being further configured to organize the policies of said policy manifold based on a property and distance of each of said policies on said policy manifold, wherein distances between policies in the policy manifold are configured to reflect policy dissimilarities among each of said policies;
  
  a source of information data, said information data source comprising a device, said information data comprising said policy manifold of said device, wherein a learning update is applied to a policy of said policy manifold to produce policies with less dissimilarities;
  
  a sensory device, said sensory device is configured to receive said information data of said device; and
  
  said one or more processing units being further configured to apply an update to neighboring policies of said updated policy, and to repeat the process to generate a dissimilarity matrix and assign policy coordinates subsequent to the applied update, in which policies having similar output for similar input are assigned neighboring coordinates to provide a smoothness to said policy manifold and a distance between policy coordinates corresponds to a degree of dissimilarity between the associated policies, said inputs and said outputs are associated with coordinates or coordinate ranges in the policy manifold, and only policies whose coordinates overlap with said input and said output coordinates or coordinate ranges are allowed to access said inputs and said outputs.
- Dependent claims (11)
  - 11. The system as recited in claim 10, said one or more processing units being further configured to be operable to generate a grid pattern for said coordinates.
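
The access restriction at the end of claim 10, under which only policies whose coordinates overlap an input's or output's coordinate range may use it, can be pictured with a small Python sketch. It assumes each input or output channel owns an axis-aligned box on the manifold and treats "overlap" as the policy coordinate lying inside that box. The `CoordinateRange` class, the `accessible_channels` helper, and the channel names are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoordinateRange:
    """Axis-aligned box on the manifold: one inclusive (low, high) pair per dimension."""
    bounds: tuple

    def contains(self, coord):
        return len(coord) == len(self.bounds) and all(
            low <= value <= high for value, (low, high) in zip(coord, self.bounds)
        )


def accessible_channels(policy_coord, channels):
    """Sorted names of the channels whose range contains the policy coordinate."""
    return sorted(
        name for name, rng in channels.items() if rng.contains(policy_coord)
    )


# Hypothetical channels on a 2-D manifold: one input and one output.
channels = {
    "camera": CoordinateRange(((0.0, 0.5), (0.0, 1.0))),
    "gripper": CoordinateRange(((0.4, 1.0), (0.0, 1.0))),
}

print(accessible_channels((0.45, 0.2), channels))  # inside both boxes
print(accessible_channels((0.9, 0.2), channels))   # inside the gripper box only
print(accessible_channels((2.0, 2.0), channels))   # outside every box
```

A policy sitting where the two boxes intersect can read the input and drive the output, while a policy elsewhere on the manifold is limited to whichever channel, if any, covers its coordinate.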

12. A non-transitory computer-readable storage medium with an executable program stored thereon, wherein the program instructs one or more processors to perform the following steps:
- assembling a set of policies, for an intelligent artificial agent, each of said policies at least comprising information for mapping state or sensory information as input to action preferences as output;
  
  creating a policy manifold, said policy manifold comprising a point on a surface, said surface comprising a set of surface points associated with each of said policies, said policy manifold further comprising at least a policy coordinate for each of said surface points;
  
  generating a dissimilarity matrix between said policies;
  
  assigning each of said policies a policy coordinate on said policy manifold, said policy coordinate being at least in part determined by said dissimilarity matrix, said policy manifold comprising a dimensionality being lower than a combined dimensionality of said input and said output, wherein said policy manifold at least in part determines a behavior of the intelligent artificial agent;
  
  organizing the policies of said policy manifold based on a property and distance of each of said policies on said policy manifold, wherein distances between policies in the policy manifold are configured to reflect policy dissimilarities among each of said policies;
  
  applying a learning update to the coordinates of at least one of said policies having dissimilarities among neighboring policies, said learning update being configured to modify said policy coordinate to have a shorter distance to policies with lesser dissimilarities, wherein said policy manifold is configured to show greater smoothness when dissimilarity between policies is smaller between policies whose coordinates in the policy manifold have a shorter distance.
- Dependent claims (13, 14, 15, 16)
  - 13. The program instructing the processor as recited in claim 12, in which similar policies are assigned neighboring coordinates to provide a smoothness to said policy manifold, and a distance between policy coordinates corresponds to a degree of dissimilarity between the associated policies.
  - 14. The program instructing the processor as recited in claim 13, in which said input and said outputs are associated with coordinates or coordinate ranges in the policy manifold, and only policies whose coordinates overlap with said input and said output coordinates or coordinate ranges are allowed to access said input and said outputs.
  - 15. The program instructing the processor as recited in claim 12, in which said update is a result of a learning process of a neural network.
  - 16. The program instructing the processor as recited in claim 12, further comprising the step of generating a grid pattern for said updated policy coordinates.

Current Assignee: Sony Corporation (Sony Group Corp.), Sony Corporation Of America (Sony Group Corp.)
Original Assignee: Cogitai, Inc.
Inventors: Ring, Mark Bishop
Primary Examiner(s): Rifkin, Ben
Assistant Examiner(s): Tran, Mai T
Application Number: US13/907,936
Time in Patent Office: 1,045 Days
Field of Search: None
US Class Current: 1/1
CPC Class Codes:
B25J 9/163   learning, adaptive, model b...
G05B 2219/33002   Artificial intelligence AI,...
G06N 20/00   Machine learning
G06N 3/006   based on simulated virtual ...
G06N 5/043   Distributed expert systems;...
