Multi-agent reinforcement learning for integrated and networked adaptive traffic signal control
Abstract
A system and method of multi-agent reinforcement learning for integrated and networked adaptive traffic controllers (MARLIN-ATC). Agents linked to traffic signals generate control actions for an optimal control policy based on traffic conditions at their own intersection and at one or more other intersections. Each agent provides a control action that considers the control policy for its own intersection and for one or more neighboring intersections. Due to the cascading effect of the system, each agent implicitly considers the whole traffic environment, which results in an overall optimized control policy.
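The abstract and claims describe each agent coordinating with the agents at neighbouring intersections along two dimensions. As a minimal sketch, assuming a rectangular street grid in which the two dimensions are the north-south and east-west axes (the claims do not fix the geometry), each agent's neighbour set could be derived as follows. The function `grid_neighbours` and the (row, column) coordinates are hypothetical names for illustration.

```python
from typing import Dict, List, Tuple

Coord = Tuple[int, int]  # hypothetical (row, column) position of an intersection


def grid_neighbours(rows: int, cols: int) -> Dict[Coord, List[Coord]]:
    """Map each intersection to its neighbours along the two grid axes.

    Assumption: neighbours are the adjacent intersections to the north,
    south, west and east; edge and corner intersections have fewer.
    """
    neighbours: Dict[Coord, List[Coord]] = {}
    for r in range(rows):
        for c in range(cols):
            candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            neighbours[(r, c)] = [
                (nr, nc) for nr, nc in candidates if 0 <= nr < rows and 0 <= nc < cols
            ]
    return neighbours


if __name__ == "__main__":
    topo = grid_neighbours(3, 3)
    print(topo[(1, 1)])  # centre: [(0, 1), (2, 1), (1, 0), (1, 2)]
    print(topo[(0, 0)])  # corner: [(1, 0), (0, 1)]
```

Because neighbour sets overlap, an agent's decision can influence agents two or more hops away over successive exchanges, which is one way to read the cascading effect described in the abstract.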
29 Citations
18 Claims
1. A system for adaptive traffic signal control comprising:
an agent comprising:
a processor;
a communication interface for coupling to a traffic signal array at a first intersection and to one or more other agents; and
a memory storing computer-readable instructions that, when executed by the processor, cause the processor to generate and provide to the traffic signal array a control action for the traffic signal array by continuously updating in real-time a joint control policy for causing the agent to collaborate with the one or more other agents in communication with the agent, the one or more other agents controlling selected neighbouring traffic signal arrays located at other intersections neighbouring the first intersection along two dimensions, the joint control policy comprising a traffic optimization policy simultaneously considering both of the two dimensions, determination of the joint control policy comprising:
tracking the control action at each update of the joint control policy; and
updating of a Q-value or a Q-factor of the joint control policy to improve a cumulative reward, the updating of the joint control policy being based on:
the tracked control actions;
respective selected control actions and individual control policies exchanged by the agent with the one or more other agents for negotiation, each individual control policy defining a mapping from a traffic state to a control action for the respective agent; and
gain messages exchanged by the agent with the one or more other agents comprising, for the exchanged selected control actions and individual control policies, maximum gain values determined by each agent to be obtainable by the respective agent changing its selected control action to the selected actions of the other agents.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9.
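Claim 1 ties the joint control policy to Q-value updates that track each control action and draw on the control actions and individual policies exchanged with neighbours. The sketch below is one hedged reading, not the patented method: tabular Q-learning over the joint action of an agent and a single neighbour, where the neighbour's exchanged individual policy (a mapping from state to action) predicts its next action in the bootstrap target. The class `JointQAgent`, the learning rate `alpha`, the discount `gamma`, the string states, and the negative-queue reward are all assumptions.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Hashable, List, Tuple

State = Hashable  # hypothetical discretized traffic state (e.g. binned queue lengths)
Action = int      # hypothetical signal phase index


class JointQAgent:
    """Tabular Q-learning over the joint action of this agent and one neighbour.

    Assumptions (not from the patent): a single neighbour, discrete states and
    actions, a one-step Q-learning target, and the neighbour's exchanged
    individual policy (state -> action) used to predict its next action.
    """

    def __init__(self, actions: List[Action], alpha: float = 0.1, gamma: float = 0.9) -> None:
        self.actions = actions
        self.alpha = alpha  # learning rate
        self.gamma = gamma  # discount factor for the cumulative reward
        # Q[(state, own_action, neighbour_action)] -> estimated discounted return
        self.q: DefaultDict[Tuple[State, Action, Action], float] = defaultdict(float)
        # Control actions tracked at each update
        self.history: List[Tuple[State, Action, Action]] = []

    def update(
        self,
        state: State,
        own_action: Action,
        neighbour_action: Action,
        reward: float,
        next_state: State,
        neighbour_policy: Callable[[State], Action],
    ) -> float:
        """Apply one Q-value update for the joint action and return the new value."""
        self.history.append((state, own_action, neighbour_action))
        predicted = neighbour_policy(next_state)  # neighbour's next action from its policy
        best_next = max(self.q[(next_state, a, predicted)] for a in self.actions)
        key = (state, own_action, neighbour_action)
        target = reward + self.gamma * best_next
        self.q[key] += self.alpha * (target - self.q[key])
        return self.q[key]

    def best_action(self, state: State, neighbour_action: Action) -> Action:
        """Greedy own action given the neighbour's selected action."""
        return max(self.actions, key=lambda a: self.q[(state, a, neighbour_action)])


if __name__ == "__main__":
    agent = JointQAgent(actions=[0, 1])  # 0 = north-south green, 1 = east-west green
    # Hypothetical reward: negative total queue length observed after the action
    q = agent.update("s0", 0, 1, reward=-4.0, next_state="s1", neighbour_policy=lambda s: 1)
    print(q)  # -0.4
    print(agent.best_action("s0", neighbour_action=1))  # 1, since Q(s0, 1, 1) = 0.0 > -0.4
```

With all Q-values starting at zero, the first update gives 0.1 x (-4.0 + 0.9 x 0) = -0.4, so the greedy choice for that state shifts to the other phase.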
10. A method for adaptive traffic signal control comprising:
storing computer-readable instructions in a memory of an agent;
executing the computer-readable instructions with a processor of the agent, causing the agent to:
generate a control action for a traffic signal array at a first intersection with which the agent is in communication by continuously updating in real-time a joint control policy with one or more other agents in communication with the agent, the one or more other agents controlling selected neighbouring traffic signal arrays located at other intersections neighbouring the first intersection along two dimensions, the joint control policy for causing the agent to collaborate with the one or more other agents, the joint control policy comprising a traffic optimization policy simultaneously considering both of the two dimensions, determination of the joint control policy comprising:
tracking the control action at each update of the joint control policy;
updating of a Q-value or a Q-factor of the joint control policy to improve a cumulative reward, the updating of the joint control policy being based on:
the tracked control actions;
respective selected control actions and individual control policies exchanged by the agent with the one or more other agents for negotiation, each individual control policy defining a mapping from a traffic state to a control action for the respective agent; and
gain messages exchanged by the agent with the one or more other agents comprising, for the exchanged selected control actions and individual control policies, maximum gain values determined by each agent to be obtainable by the respective agent changing its selected control action to the selected actions of the other agents; and
providing the control action to the traffic signal array via a communication interface of the agent.
Dependent claims: 11, 12, 13, 14, 15, 16, 17, 18.
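The gain messages recited in claims 1 and 10 resemble the Maximum Gain Message (MGM) rule from distributed constraint optimization, in which each agent reports the largest improvement it could make and only agents whose gain beats all of their neighbours' gains switch actions. The patent does not name MGM, so the sketch below is an assumed illustration rather than the claimed negotiation. Candidate actions are limited to the agent's current action and its neighbours' selected actions, following one reading of the claim wording. The function `negotiation_round`, the `utility` callback, and breaking ties by the larger agent id are hypothetical.

```python
from typing import Callable, Dict, List

AgentId = str
Action = int
# Hypothetical local utility: (agent, own action, neighbours' selected actions) -> value
Utility = Callable[[AgentId, Action, Dict[AgentId, Action]], float]


def negotiation_round(
    selected: Dict[AgentId, Action],
    neighbours: Dict[AgentId, List[AgentId]],
    utility: Utility,
) -> Dict[AgentId, Action]:
    """One MGM-style round: compute gains, exchange them, and let local winners switch."""
    gains: Dict[AgentId, float] = {}
    best: Dict[AgentId, Action] = {}
    for agent, current in selected.items():
        # Selected actions received from neighbours in this round
        view = {n: selected[n] for n in neighbours[agent]}
        base = utility(agent, current, view)
        # Candidates: keep the current action or adopt a neighbour's selected action
        candidates = [current, *view.values()]
        candidate = max(candidates, key=lambda a: utility(agent, a, view))
        best[agent] = candidate
        # Gain message: the most this agent could gain by changing its action
        gains[agent] = utility(agent, candidate, view) - base
    updated = dict(selected)
    for agent in selected:
        rivals = [(gains[n], n) for n in neighbours[agent]]
        # Switch only if the gain is positive and beats every neighbour's (gain, id)
        # pair; ties on gain go to the larger id, so two neighbours never switch together.
        if gains[agent] > 0 and all((gains[agent], agent) > r for r in rivals):
            updated[agent] = best[agent]
    return updated


if __name__ == "__main__":
    # Hypothetical three-intersection arterial A - B - C; 0 = NS green, 1 = EW green
    chain = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
    preferred = {"A": 0, "B": 1, "C": 1}

    def utility(agent: AgentId, action: Action, view: Dict[AgentId, Action]) -> float:
        # Reward matching neighbours' phases, plus a small bonus for the preferred phase
        matches = sum(1.0 for b in view.values() if b == action)
        return matches + (0.5 if action == preferred[agent] else 0.0)

    state = {"A": 0, "B": 0, "C": 1}
    state = negotiation_round(state, chain, utility)
    print(state)  # {'A': 0, 'B': 0, 'C': 0}
```

In the example, B and C both report a gain of 0.5; the id tie-break lets only C switch, so adjacent agents never change phase in the same round, and a second round leaves every phase unchanged.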
Specification