Convergent actor critic-based fuzzy reinforcement learning apparatus and method

US 20020198854A1
Filed: 12/21/2001
Published: 12/26/2002
Est. Priority Date: 03/30/2001
Status: Active Grant

Abstract

A system is controlled by an actor-critic based fuzzy reinforcement learning algorithm that provides instructions to a processor of the system for applying actor-critic based fuzzy reinforcement learning. The system includes a database of fuzzy-logic rules for mapping input data to output commands for modifying a system state, and a reinforcement learning algorithm for updating the fuzzy-logic rules database based on effects on the system state of the output commands mapped from the input data. The reinforcement learning algorithm is configured to converge at least one parameter of the system state to at least approximately an optimum value following multiple mapping and updating iterations. The reinforcement learning algorithm may be based on an update equation including a derivative with respect to said at least one parameter of a logarithm of a probability function for taking a selected action when a selected state is encountered. The reinforcement learning algorithm may be configured to update the at least one parameter based on said update equation. The system may include a wireless transmitter.
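The loop described in the abstract (map inputs through fuzzy rules, act, observe the effect, update the rule parameters) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the patent: the two rules with linear membership functions, the binary action, the sigmoid policy, the toy reward, the step sizes, and the names `memberships`, `action_probability`, and `train`. Episodes are a single step long, so the critic acts only as a learned baseline.

```python
import math
import random


def memberships(x):
    """Degrees to which x in [0, 1] satisfies the antecedents of two
    hypothetical rules, 'x is LOW' and 'x is HIGH' (linear memberships)."""
    return [1.0 - x, x]


def action_probability(x, theta):
    """Probability of choosing action 1 in state x.

    Each rule i contributes its tunable parameter theta[i], weighted by how
    strongly the rule fires; a sigmoid turns the blend into a probability.
    """
    logit = sum(m * t for m, t in zip(memberships(x), theta))
    return 1.0 / (1.0 + math.exp(-logit))


def train(steps=5000, actor_rate=0.2, critic_rate=0.1, seed=0):
    """Run the map/act/update loop and return (actor params, critic weights)."""
    rng = random.Random(seed)
    theta = [0.0, 0.0]  # actor: one parameter per fuzzy rule
    w = [0.0, 0.0]      # critic: one value weight per fuzzy rule
    for _ in range(steps):
        x = rng.random()                      # observe a state
        mu = memberships(x)
        p = action_probability(x, theta)
        action = 1 if rng.random() < p else 0
        # Toy environment (assumption): reward 1 when the action matches
        # the side of 0.5 on which the state lies, else 0.
        target = 1 if x > 0.5 else 0
        reward = 1.0 if action == target else 0.0
        # Critic's estimate of expected reward in this state.
        value = sum(m * wi for m, wi in zip(mu, w))
        advantage = reward - value
        for i in range(2):
            # Derivative of log P(action | x) with respect to theta[i]
            # for a sigmoid policy: (action - p) * mu[i].
            score = (action - p) * mu[i]
            theta[i] += actor_rate * advantage * score
            w[i] += critic_rate * advantage * mu[i]
    return theta, w


if __name__ == "__main__":
    theta, w = train()
    print("rule parameters:", theta)
    print("P(action 1 | x = 0.9):", action_probability(0.9, theta))
```

In this toy setting the parameter attached to the LOW rule is driven negative and the one attached to the HIGH rule positive, so the policy comes to favor action 1 for large `x` and action 0 for small `x`.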

12 Claims

1. A software program for providing instructions to a processor which controls a system for applying actor-critic based fuzzy reinforcement learning, comprising:

a database of fuzzy-logic rules for mapping input data to output commands for modifying a system state; and

a reinforcement learning algorithm for updating the fuzzy-logic rules database based on effects on the system state of the output commands mapped from the input data, and wherein the reinforcement learning algorithm is configured to converge at least one parameter of the system state towards at least approximately an optimum value following multiple mapping and updating iterations.

2. The software program of claim 1, wherein the reinforcement learning algorithm is based on an update equation including a derivative with respect to said at least one parameter of a logarithm of a probability function for taking a selected action when a selected state is encountered.

3. The software program of claim 2, wherein the reinforcement learning algorithm is configured to update the at least one parameter based on said update equation.

4. The software program of any of claims 1-3, wherein the system includes a wireless transmitter.

5. A method of controlling a system including a processor for applying actor-critic based fuzzy reinforcement learning, comprising the operations:

mapping input data to output commands for modifying a system state according to fuzzy-logic rules;

updating the fuzzy-logic rules based on effects on the system state of the output commands mapped from the input data; and

converging at least one parameter of the system state towards at least approximately an optimum value following multiple mapping and updating iterations.

6. The method of claim 5, wherein the updating operation includes taking a derivative with respect to said at least one parameter of a logarithm of a probability function for taking a selected action when a selected state is encountered.

7. The method of claim 6, wherein the updating operation includes updating the at least one parameter based on said derivative.

8. The method of any of claims 5-7, wherein the system includes a wireless transmitter.

9. A system controlled by an actor-critic based fuzzy reinforcement learning algorithm which provides instructions to a processor of the system for applying actor-critic based fuzzy reinforcement learning, comprising:

the processor;

at least one system component whose actions are controlled by said processor;

at least one storage medium accessible by said processor, including data stored therein corresponding to:

a database of fuzzy-logic rules for mapping input data to output commands for modifying a system state; and

a reinforcement learning algorithm for updating the fuzzy-logic rules database based on effects on the system state of the output commands mapped from the input data, and wherein the reinforcement learning algorithm is configured to converge at least one parameter of the system state towards at least approximately an optimum value following multiple mapping and updating iterations.

10. The system of claim 9, wherein the reinforcement learning algorithm is based on an update equation including a derivative with respect to said at least one parameter of a logarithm of a probability function for taking a selected action when a selected state is encountered.

11. The system of claim 10, wherein the reinforcement learning algorithm is configured to update the at least one parameter based on said update equation.

12. The system of any of claims 9-11, wherein said at least one system component comprises a wireless transmitter.
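
Claims 2, 6, and 10 describe the update equation only in words. As a reading aid, a generic update of that kind can be written as follows. The notation is an assumption and is not drawn from the patent, whose own equation appears in the specification and is not reproduced on this page.

```latex
\theta \;\leftarrow\; \theta + \alpha \,\delta\, \frac{\partial}{\partial \theta} \ln \pi_{\theta}(a \mid s)
```

Here $\theta$ stands for the "at least one parameter", $\pi_{\theta}(a \mid s)$ for the probability of taking the selected action $a$ when the selected state $s$ is encountered, $\alpha$ for a step size, and $\delta$ for an evaluation signal of the sort a critic would supply.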

Current Assignee: Intelligent Inference Systems Corporation
Original Assignee: Intelligent Inference Systems Corporation
Inventors: Vengerov, David; Berenji, Hamid R.

Granted Patent: US 6,917,925 B2
US Class Current: 706/12
CPC Class Codes: G06N 7/023 Learning or tuning the para...
