Multi-agent reinforcement learning for integrated and networked adaptive traffic signal control

US 9,818,297 B2
Filed: 12/10/2012
Issued: 11/14/2017
Est. Priority Date: 12/16/2011
Status: Active Grant


Abstract

A system and method of multi-agent reinforcement learning for integrated and networked adaptive traffic controllers (MARLIN-ATC). Agents linked to traffic signals generate control actions for an optimal control policy based on traffic conditions at the intersection and one or more other intersections. The agent provides a control action considering the control policy for the intersection and one or more neighboring intersections. Due to the cascading effect of the system, each agent implicitly considers the whole traffic environment, which results in an overall optimized control policy.
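As a rough illustration of the kind of learning step the abstract describes, the sketch below applies a standard tabular Q-learning update to a table indexed by the joint state and joint action of an agent and one neighbour. The update rule, the constants ALPHA and GAMMA, the two-phase ACTIONS tuple, and the function name q_update are all assumptions made for illustration; the abstract does not specify them.

```python
from collections import defaultdict

ALPHA = 0.1        # learning rate (assumed value)
GAMMA = 0.9        # discount factor (assumed value)
ACTIONS = (0, 1)   # hypothetical phase indices the agent can select

def q_update(q, joint_state, joint_action, reward, next_joint_state, neighbour_next_action):
    """Apply one tabular Q-learning step to a joint (agent, neighbour) table.

    joint_state and joint_action are (own, neighbour) tuples. The neighbour's
    next action is taken as given, e.g. from its exchanged policy (assumption).
    """
    best_next = max(
        q[(next_joint_state, (a, neighbour_next_action))] for a in ACTIONS
    )
    key = (joint_state, joint_action)
    q[key] += ALPHA * (reward + GAMMA * best_next - q[key])
    return q[key]

q = defaultdict(float)              # unseen entries start at 0.0
s, s_next = ("s0", "n0"), ("s1", "n1")
first = q_update(q, s, (0, 1), 10.0, s_next, 0)
second = q_update(q, s, (0, 1), 10.0, s_next, 0)
```

Keying the table on (own, neighbour) pairs is one simple way to let an agent's value estimates depend on what an adjacent intersection is doing; with several neighbours, one such table per neighbour would be a natural extension.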

18 Claims

1. A system for adaptive traffic signal control comprising:
- an agent comprising:
  - a processor;
  - a communication interface for coupling to a traffic signal array at a first intersection and to one or more other agents; and
  - a memory storing computer readable instructions that, when executed by the processor, cause the processor to generate and provide to the traffic signal array a control action for the traffic signal array by continuously updating in real-time a joint control policy for causing the agent to collaborate with the one or more other agents in communication with the agent, the one or more other agents controlling selected neighbouring traffic signal arrays located at other intersections neighbouring the first intersection along two dimensions, the joint control policy comprising a traffic optimization policy simultaneously considering both of the two dimensions, determination of the joint control policy comprising:
    - tracking the control action at each update of the joint control policy, and updating of a Q-value or a Q-factor of the joint control policy to improve a cumulative reward, the updating of the joint control policy being based on:
      - the tracked control actions;
      - respective selected control actions and individual control policies exchanged by the agent with the one or more other agents for negotiation, each individual control policy defining a mapping from a traffic state to a control action for the respective agent; and
      - gain messages exchanged by the agent with the one or more other agents comprising, for the exchanged selected control actions and individual control policies, maximum gain values determined by each agent to be obtainable by the respective agent changing its selected control action to the selected actions of the other agents.
- Dependent claims (2 to 9):
  - 2. The system of claim 1, wherein each other intersection is adjacent to the first intersection.
  - 3. The system of claim 1, wherein the agent adapts the joint control policy to stochastic traffic patterns.
  - 4. The system of claim 1, further comprising:
    - a traffic condition module, executed on the processor, configured to observe local traffic conditions at the traffic signal array that are used, in conjunction with the joint control policy, by the agent to generate the control action.
  - 5. The system of claim 4, wherein the joint control policy used by the agent to generate the control action considers local traffic conditions at the selected neighbouring traffic signal arrays.
  - 6. The system of claim 4, wherein the updating of the joint control policy is based on a state vector for the agent comprising an index of a current green phase of the traffic signal array, elapsed time of a current phase and maximum queue lengths determined based on the observed traffic conditions.
  - 7. The system of claim 4, wherein the cumulative reward is defined as any reduction in total cumulative delay at the traffic signal array based on the observed traffic conditions, and wherein determination of the cumulative reward differs between agents.
  - 8. The system of claim 1, wherein the agent determines the joint control policy via the application of game theory.
  - 9. The system of claim 1, wherein the agent continuously updates in real-time the joint control policy with two or more other selected neighbouring traffic signal arrays located at the other intersections.
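To make the state vector of claim 6 and the reward of claim 7 more concrete, the sketch below builds a state from a green phase index, the elapsed time of the current phase, and one maximum queue length per phase, and scores a decision step by the reduction in cumulative delay. The names AgentState, build_state and delay_reward, the choice of seconds and vehicle counts as units, and taking the maximum over the lanes served by each phase are assumptions for illustration only; the claims do not fix these details.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

@dataclass(frozen=True)
class AgentState:
    green_phase: int              # index of the current green phase
    elapsed_s: int                # elapsed time of the current phase (assumed seconds)
    max_queues: Tuple[int, ...]   # one maximum queue length per phase (assumed vehicles)

def build_state(green_phase: int, elapsed_s: int,
                queues_by_phase: Sequence[Sequence[int]]) -> AgentState:
    """Reduce observed lane queues to one maximum per phase (assumed grouping)."""
    max_queues = tuple(max(lanes, default=0) for lanes in queues_by_phase)
    return AgentState(green_phase, elapsed_s, max_queues)

def delay_reward(prev_cum_delay_s: float, curr_cum_delay_s: float) -> float:
    """Reward as the reduction in total cumulative delay between two decision points."""
    return prev_cum_delay_s - curr_cum_delay_s

state = build_state(1, 12, [[3, 5], [2, 0]])
reward = delay_reward(120.0, 95.0)   # delay fell by 25 s, so the reward is positive
```

Making the state a frozen dataclass keeps it hashable, so it can be used directly as a key in a tabular Q-value store.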

10. A method for adaptive traffic signal control comprising:
- storing computer-readable instructions in a memory of an agent;
- executing the computer-readable instructions with a processor of the agent, causing the agent to:
  - generate a control action for a traffic signal array at a first intersection with which the agent is in communication by continuously updating in real-time a joint control policy with one or more other agents in communication with the agent, the one or more other agents controlling selected neighbouring traffic signal arrays located at other intersections neighbouring the first intersection along two dimensions, the joint control policy for causing the agent to collaborate with the one or more other agents, the joint control policy comprising a traffic optimization policy simultaneously considering both of the two dimensions, determination of the joint control policy comprising:
    - tracking the control action at each update of the joint control policy, updating of a Q-value or a Q-factor of the joint control policy to improve a cumulative reward, the updating of the joint control policy being based on:
      - the tracked control actions;
      - respective selected control actions and individual control policies exchanged by the agent with the one or more other agents for negotiation, each individual control policy defining a mapping from a traffic state to a control action for the respective agent; and
      - gain messages exchanged by the agent with the one or more other agents comprising, for the exchanged selected control actions and individual control policies, maximum gain values determined by each agent to be obtainable by the respective agent changing its selected control action to the selected actions of the other agents; and
- providing the control action to the traffic signal array via a communication interface of the agent.
- Dependent claims (11 to 18):
  - 11. The method of claim 10, wherein each other intersection is adjacent to the first intersection.
  - 12. The method of claim 10, further comprising adapting the joint control policy to stochastic traffic patterns.
  - 13. The method of claim 10, further comprising:
    - observing, by a traffic condition module of the agent, the traffic condition module executed on the processor, local traffic conditions at the traffic signal array that are used, in conjunction with the joint control policy, by the agent to generate the control action.
  - 14. The method of claim 13, wherein the joint control policy used by the agent to generate the control action considers local traffic conditions at the selected neighbouring traffic signal arrays.
  - 15. The method of claim 13, wherein the updating of the joint control policy is based on a state vector for the agent comprising an index of a current green phase of the traffic signal array, elapsed time of a current phase and maximum queue lengths determined based on the observed traffic conditions.
  - 16. The method of claim 13, wherein the cumulative reward is defined as any reduction in total cumulative delay at the traffic signal array based on the observed traffic conditions, and wherein determination of the cumulative reward differs between agents.
  - 17. The method of claim 10, wherein the agent determines the joint control policy via the application of game theory.
  - 18. The method of claim 10, wherein the agent continuously updates in real-time the joint control policy with two or more selected neighbouring traffic signal arrays located at the other intersections.
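The gain-message exchange recited in claims 1 and 10 can be pictured with a small best-response loop: each agent reports the largest gain it could obtain by changing its own selected action while the others hold theirs, and the agent reporting the largest positive gain switches. This is only one plausible reading, sketched under stated assumptions; the loop structure, the names max_gain, negotiate and toy_utility, and the PAYOFF table are hypothetical and are not taken from the claims.

```python
from typing import Callable, Dict, Tuple

ACTIONS = (0, 1)  # hypothetical phase indices

# Toy payoffs keyed by (action of A, action of B) -> (utility of A, utility of B).
PAYOFF = {(0, 0): (1.0, 1.0), (0, 1): (0.0, 0.0),
          (1, 0): (0.0, 0.0), (1, 1): (3.0, 3.0)}

def toy_utility(agent: str, selected: Dict[str, int]) -> float:
    u_a, u_b = PAYOFF[(selected["A"], selected["B"])]
    return u_a if agent == "A" else u_b

def max_gain(agent: str, selected: Dict[str, int],
             utility: Callable[[str, Dict[str, int]], float]) -> Tuple[int, float]:
    """Best action and gain for `agent` if only it changes its selected action."""
    current = utility(agent, selected)
    best_action, best_gain = selected[agent], 0.0
    for a in ACTIONS:
        gain = utility(agent, {**selected, agent: a}) - current
        if gain > best_gain:
            best_action, best_gain = a, gain
    return best_action, best_gain

def negotiate(selected: Dict[str, int],
              utility: Callable[[str, Dict[str, int]], float],
              max_rounds: int = 10) -> Dict[str, int]:
    """Repeat: exchange gain values, let the agent with the largest gain switch."""
    selected = dict(selected)
    for _ in range(max_rounds):
        offers = {ag: max_gain(ag, selected, utility) for ag in selected}
        winner = max(offers, key=lambda ag: offers[ag][1])
        action, gain = offers[winner]
        if gain <= 0:
            break  # no agent can improve by changing alone
        selected[winner] = action
    return selected

result = negotiate({"A": 0, "B": 1}, toy_utility)
print(result)  # {'A': 1, 'B': 1}
```

In a learning setting the utility would presumably come from each agent's Q-values for the joint action rather than from a fixed table; the fixed table is used here only so the example can be run as written.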

Current Assignee: Pragmatek Transport Innovations, Inc.
Original Assignee: Pragmatek Transport Innovations, Inc.
Inventors: El-Tantawy, Samah; Abdulhai, Baher
Primary Examiner(s): NGUYEN, LAURA N
Application Number: US14/364,998
Publication Number: US 20150102945A1
Time in Patent Office: 1,800 Days
Field of Search: None
CPC Class Codes:
G08G 1/081 Plural intersections under ...
G08G 1/083 Controlling the allocation ...
