Method and system for vision-centric deep-learning-based road situation analysis

US 9,760,806 B1
Filed: 05/11/2016
Issued: 09/12/2017
Est. Priority Date: 05/11/2016
Status: Active Grant


Abstract

In accordance with various embodiments of the disclosed subject matter, a method and a system for vision-centric deep-learning-based road situation analysis are provided. The method can include: receiving real-time traffic environment visual input from a camera; determining, using a ROLO engine, at least one initial region of interest from the real-time traffic environment visual input by using a CNN training method; verifying the at least one initial region of interest to determine if a detected object in the at least one initial region of interest is a candidate object to be tracked; using LSTMs to track the detected object based on the real-time traffic environment visual input, and predicting a future status of the detected object by using the CNN training method; and determining if a warning signal is to be presented to a driver of a vehicle based on the predicted future status of the detected object.
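
The steps in the abstract can be read as a per-frame control loop. The Python below is a minimal sketch of that flow, not the patented implementation: `Region`, `FutureStatus`, `analyze_frame`, and the three callables passed to it are hypothetical stand-ins for the ROLO engine and the LSTM tracker, and the time-to-collision threshold in `should_warn` is an assumed decision rule that the abstract does not specify.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List


@dataclass
class Region:
    """Hypothetical region of interest: a labelled box in pixel coordinates."""
    x: float
    y: float
    w: float
    h: float
    label: str
    score: float


@dataclass
class FutureStatus:
    """Hypothetical predicted state of a tracked object."""
    distance_m: float          # predicted gap to the vehicle
    closing_speed_mps: float   # positive when the gap is shrinking


def should_warn(status: FutureStatus, min_ttc_s: float) -> bool:
    """Assumed rule: warn when time-to-collision drops below a threshold."""
    if status.closing_speed_mps <= 0:
        return False  # object is holding its distance or moving away
    return status.distance_m / status.closing_speed_mps < min_ttc_s


def analyze_frame(
    frame: Any,
    propose: Callable[[Any], Iterable[Region]],
    verify: Callable[[Any, Region], bool],
    track_and_predict: Callable[[Any, Region], FutureStatus],
    min_ttc_s: float = 2.0,
) -> List[Region]:
    """Run the abstract's steps once on a single frame.

    Returns the regions whose predicted status triggers a warning.
    """
    warnings: List[Region] = []
    for region in propose(frame):              # initial regions of interest
        if not verify(frame, region):          # keep only candidate objects
            continue
        status = track_and_predict(frame, region)  # tracking and prediction
        if should_warn(status, min_ttc_s):     # warning decision
            warnings.append(region)
    return warnings
```

In a real system the three callables would wrap trained networks; here they are plain functions so that the control flow can be exercised with stubs.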

20 Claims

1. A method for vision-centric deep-learning-based road situation analysis, comprising:
- receiving real-time traffic environment visual input from at least one camera;
  
  determining, using a recurrent you only look once (ROLO) engine, at least one initial region of interest from the real-time traffic environment visual input by using a convolutional neural networks (CNN) training method;
  
  verifying, using the recurrent you only look once (ROLO) engine, the at least one initial region of interest to determine if a detected object in the at least one initial region of interest is a candidate object to be tracked by using the CNN training method;
  
  in response to determining the detected object is a candidate object, tracking, using a plurality of long short-term memory units (LSTMs), the detected object based on the real-time traffic environment visual input, and predicting a future status of the detected object by using the CNN training method; and
  
  determining if a warning signal is to be presented to a driver of a vehicle based on the predicted future status of the detected object.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
  - 2. The method of claim 1, wherein tracking the detected object further comprises:
    - tracking the detected object based at least in part on real-time signals of the detected object from a lidar sensor and an infrared sensor.
  - 3. The method of claim 1, wherein the future status of the detected object is determined by calculating a distance between the detected object and the vehicle, a speed of the detected object, and a moving direction of the detected object.
  - 4. The method of claim 1, wherein the candidate object to be tracked comprises:
    - a road line, another vehicle near the vehicle, a pedestrian, an obstacle in front of the vehicle, and a traffic sign.
  - 5. The method of claim 1, wherein the CNN training method comprises:
    - a pre-training phase of convolutional layers for feature learning;
      
      a you only look once (YOLO) training phase for object detection; and
      
      a LSTM training phase for object tracking.
  - 6. The method of claim 5, wherein the pre-training phase of convolutional layers comprises generating a feature cube to represent visual features of a plurality of detected objects.
  - 7. The method of claim 6, wherein the YOLO training phase for object detection comprises translating the feature cube to a tensor representation.
  - 8. The method of claim 5, before the LSTM training phase, further comprising:
    - encoding the feature cube into feature vectors.
  - 9. The method of claim 5, wherein the LSTM training phase for object tracking is performed together with a Kalman filter.
  - 10. The method of claim 1, wherein the CNN training method comprises using a convolutional neural network having a plurality of convolutional layers followed by two fully connected layers.
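
Claim 3 names three quantities that make up the future status: distance, speed, and moving direction. The sketch below shows one way to derive them from two successive tracked positions. It is an illustration under stated assumptions, not the claimed LSTM-based prediction: the function name `future_status`, the vehicle-centred coordinate frame in metres, and the constant-velocity extrapolation over `horizon` seconds are all hypothetical.

```python
import math
from typing import Dict, Tuple

# Assumed frame: the vehicle sits at the origin, coordinates are in metres.
Point = Tuple[float, float]


def future_status(prev: Point, curr: Point, dt: float, horizon: float) -> Dict[str, object]:
    """Estimate distance, speed, and heading of an object from two positions.

    prev, curr: object positions at consecutive time steps
    dt:         time between the two positions, in seconds
    horizon:    how far ahead to extrapolate, in seconds
    """
    if dt <= 0:
        raise ValueError("dt must be positive")

    # Finite-difference velocity between the two observations.
    vx = (curr[0] - prev[0]) / dt
    vy = (curr[1] - prev[1]) / dt

    distance = math.hypot(curr[0], curr[1])          # current gap to the vehicle
    speed = math.hypot(vx, vy)                       # magnitude of the velocity
    heading_deg = math.degrees(math.atan2(vy, vx))   # direction of motion

    # Constant-velocity extrapolation (an assumption, not the patent's method).
    predicted = (curr[0] + vx * horizon, curr[1] + vy * horizon)

    return {
        "distance_m": distance,
        "speed_mps": speed,
        "heading_deg": heading_deg,
        "predicted_position": predicted,
        "predicted_distance_m": math.hypot(predicted[0], predicted[1]),
    }
```

For example, an object that moves from 20 m to 18 m directly ahead in half a second is closing at 4 m/s, and the constant-velocity assumption places it 14 m away one second later.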

11. A system for vision-centric deep-learning-based road situation analysis, comprising:
- at least one camera for receiving real-time traffic environment visual input;
  
  a recurrent you only look once (ROLO) engine configured for:
  
  determining at least one initial region of interest from the real-time traffic environment visual input by using a convolutional neural networks (CNN) training method, and verifying the at least one initial region of interest to determine if a detected object in the at least one initial region of interest is a candidate object to be tracked by using the CNN training method;
  
  a plurality of long short-term memory units (LSTMs) configured for:
  
  in response to determining the detected object is a candidate object, tracking the detected object based on the real-time traffic environment visual input, and predicting a future status of the detected object by using the CNN training method; and
  
  a decision making agent for determining if a warning signal is to be presented to a driver of a vehicle based on the predicted future status of the detected object.
- Dependent Claims (12, 13, 14, 15, 16, 17, 18, 19, 20)
  - 12. The system of claim 11, further comprising:
    - a sensor fusion configured for processing real-time signals of the detected object from a lidar sensor and an infrared sensor.
  - 13. The system of claim 11, wherein the plurality of long short-term memory units (LSTMs) are further configured for calculating a distance between the detected object and the vehicle, a speed of the detected object, and a moving direction of the detected object.
  - 14. The system of claim 11, further comprising:
    - a road line recognition module for determining if the detected object is a road line;
      
      a pedestrian detection module for determining if the detected object is a pedestrian;
      
      an obstacle detection module for determining if the detected object is an obstacle in front of the vehicle; and
      
      a traffic sign recognition module for determining if the detected object is a traffic sign.
  - 15. The system of claim 11, wherein the recurrent you only look once (ROLO) engine comprises a convolutional neural networks (CNN) for generating a feature cube to represent visual features of a plurality of detected objects.
  - 16. The system of claim 15, wherein the convolutional neural networks (CNN) is further configured for translating the feature cube to a tensor representation.
  - 17. The system of claim 15, wherein the convolutional neural networks (CNN) is further configured for encoding the feature cube into feature vectors before a LSTM training phase.
  - 18. The system of claim 17, wherein the plurality of long short-term memory units (LSTMs) are further configured for performing the LSTM training phase for object tracking together with a Kalman filter.
  - 19. The system of claim 15, wherein the convolutional neural networks (CNN) has a plurality of convolutional layers followed by two fully connected layers.
  - 20. The system of claim 11, further comprising a human-computer interface to present the warning signal to the driver of the vehicle.
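
Claims 9 and 18 pair the LSTM tracking phase with a Kalman filter but do not state the filter's model. As a rough illustration of what such a filter does to a sequence of tracked positions (for example, one coordinate of a bounding-box centre), the sketch below implements a one-dimensional constant-velocity Kalman filter in plain Python. The class name `ConstantVelocityKalman1D`, the constant-velocity state model, and every noise parameter are assumptions made for this example.

```python
class ConstantVelocityKalman1D:
    """1-D Kalman filter with state (position, velocity).

    Assumed model: position advances by velocity * dt each step, and only
    the position is measured. The covariance is stored as three scalars:
    [[c_pp, c_pv], [c_pv, c_vv]].
    """

    def __init__(self, position, velocity=0.0, p_var=1.0, v_var=1.0,
                 process_var=1e-2, meas_var=1.0):
        self.p = position
        self.v = velocity
        self.c_pp = p_var
        self.c_pv = 0.0
        self.c_vv = v_var
        self.q = process_var   # added to both diagonal terms each step
        self.r = meas_var      # variance of the position measurement

    def predict(self, dt):
        """Propagate state and covariance: P = F P F^T + Q, F = [[1, dt], [0, 1]]."""
        self.p += self.v * dt
        c_pp = self.c_pp + 2.0 * dt * self.c_pv + dt * dt * self.c_vv + self.q
        c_pv = self.c_pv + dt * self.c_vv
        c_vv = self.c_vv + self.q
        self.c_pp, self.c_pv, self.c_vv = c_pp, c_pv, c_vv

    def update(self, z):
        """Correct the state with a position measurement z (H = [1, 0])."""
        s = self.c_pp + self.r          # innovation variance
        k_p = self.c_pp / s             # gain on position
        k_v = self.c_pv / s             # gain on velocity
        y = z - self.p                  # innovation
        self.p += k_p * y
        self.v += k_v * y
        # P = (I - K H) P, written out for the 2x2 case.
        c_pp = (1.0 - k_p) * self.c_pp
        c_pv = (1.0 - k_p) * self.c_pv
        c_vv = self.c_vv - k_v * self.c_pv
        self.c_pp, self.c_pv, self.c_vv = c_pp, c_pv, c_vv


# Usage: feed positions of an object moving 2 units per step.
kf = ConstantVelocityKalman1D(position=0.0)
for step in range(1, 51):
    kf.predict(dt=1.0)
    kf.update(2.0 * step)
```

After the loop, `kf.v` approaches the true velocity of 2 units per step even though the filter only ever sees positions, which is the property that makes it useful for smoothing a tracker's output.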

Current Assignee
TCL Research America, Inc. (TCL Technology Group Corp.)
Original Assignee
TCL Research America, Inc. (TCL Technology Group Corp.)
Inventors
Ning, Guanghan; Wang, Haohong; Bo, Wenqiang; Ren, Xiaobo
Primary Examiner(s)
Lu, Tom Y

Application Number

US15/152,094
Time in Patent Office

489 Days
Field of Search

382/103, 382/104, 382/106, 382/107
US Class Current
CPC Class Codes

B60W 2050/143   Alarm means

B60W 2050/146   Display means

B60W 2420/40   Photo, light or radio wave ...

B60W 2420/408   Radar; Laser, e.g. lidar

B60W 2554/00   Input parameters relating t...

B60W 50/14   Means for informing the dri...

G06N 3/044   Recurrent networks, e.g. Ho...

G06N 3/045   Combinations of networks

G06N 3/08   Learning methods

G06T 2207/20081   Training; Learning

G06T 2207/20084   Artificial neural networks ...

G06T 2207/30261   Obstacle

G06T 7/277   involving stochastic approa...

G06V 10/454   Integrating the filters int...

G06V 20/56   exterior to a vehicle by us...
