Triggering video surveillance using embedded voice, speech, or sound recognition

US 10,582,167 B2
Filed: 08/31/2015
Issued: 03/03/2020
Est. Priority Date: 08/31/2015
Status: Active Grant

First Claim

1. A method comprising:

receiving, by a computer system, an audio signal captured from an area to be monitored via video surveillance;

recognizing, by the computer system via an embedded recognition component, a voice, speech phrase, or environmental sound in the audio signal;

determining, by the computer system, that the recognized voice, speech phrase, or environmental sound corresponds to a predefined trigger condition;

in response to the determining, detecting, by the computer system, whether the audio signal includes one or more aspects that characterize the audio signal as being from a pre-recorded television program or pre-recorded piece of music; and

if the one or more aspects are not detected in the audio signal:

transmitting, by the computer system, a signal to one or more video capture devices to begin video recording of the area; and

transmitting, by the computer system, an alert to a mobile device of an individual indicating that video surveillance of the area has been initiated.
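
Read as a control flow, claim 1 orders its steps so that the pre-recorded-media check runs only after a trigger match, and video recording plus the mobile alert happen only when that check comes back negative. The sketch is a minimal illustration of that ordering under stated assumptions, not the assignee's implementation: the recognizer, the TV/music detector, the camera start functions and the alert sender are hypothetical callables (`recognize`, `looks_prerecorded`, `start_recording`, `send_alert`), and the trigger labels are made up.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Recognition:
    """Result from a (hypothetical) embedded recognizer."""
    kind: str   # "voice", "phrase", or "sound"
    label: str  # e.g. "help me" or "glass_break"


@dataclass
class SurveillanceTrigger:
    # All four callables are hypothetical stand-ins for device-specific components.
    recognize: Callable[[bytes], Optional[Recognition]]
    looks_prerecorded: Callable[[bytes], bool]
    start_recording: List[Callable[[], None]]
    send_alert: Callable[[str], None]
    triggers: Set[Tuple[str, str]]  # predefined trigger conditions

    def on_audio(self, audio: bytes) -> bool:
        """Return True if video recording was started for this audio signal."""
        rec = self.recognize(audio)  # embedded recognition step
        if rec is None or (rec.kind, rec.label) not in self.triggers:
            return False  # nothing recognized, or no predefined trigger matched
        # Only after a trigger match: screen out TV programs and recorded music.
        if self.looks_prerecorded(audio):
            return False  # recording is not started (cf. dependent claim 19)
        for start in self.start_recording:
            start()  # signal each video capture device
        self.send_alert("Video surveillance of the area has been initiated.")
        return True


# Usage with toy stand-ins: b"crash" is "recognized" as breaking glass,
# and any audio prefixed with b"TV:" is treated as pre-recorded.
log = []
trigger = SurveillanceTrigger(
    recognize=lambda audio: Recognition("sound", "glass_break") if b"crash" in audio else None,
    looks_prerecorded=lambda audio: audio.startswith(b"TV:"),
    start_recording=[lambda: log.append("camera-1 recording")],
    send_alert=lambda msg: log.append("alert: " + msg),
    triggers={("sound", "glass_break"), ("phrase", "help me")},
)
print(trigger.on_audio(b"crash"))      # True
print(trigger.on_audio(b"TV: crash"))  # False
print(log)
```

Keeping the media check behind the trigger match mirrors the claim's "in response to the determining" language and means the possibly more expensive detector only runs on candidate events.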

1 Assignment

0 Petitions

Abstract

Techniques for automatically triggering video surveillance using embedded voice, speech, or sound recognition are provided. In one embodiment, a computer system can receive an audio signal captured from an area to be monitored via video surveillance. The computer system can further recognize, via an embedded recognition component, a voice, speech phrase, or environmental sound in the audio signal, and can determine that the recognized voice, speech phrase, or environmental sound corresponds to a predefined trigger condition. The computer system can then automatically transmit a signal to one or more video capture devices to begin video recording of the area.

9 Citations

24 Claims

1. A method comprising:
- receiving, by a computer system, an audio signal captured from an area to be monitored via video surveillance;
  
  recognizing, by the computer system via an embedded recognition component, a voice, speech phrase, or environmental sound in the audio signal;
  
  determining, by the computer system, that the recognized voice, speech phrase, or environmental sound corresponds to a predefined trigger condition;
  
  in response to the determining, detecting, by the computer system, whether the audio signal includes one or more aspects that characterize the audio signal as being from a pre-recorded television program or pre-recorded piece of music; and
  
  if the one or more aspects are not detected in the audio signal:
  
  transmitting, by the computer system, a signal to one or more video capture devices to begin video recording of the area; and
  
  transmitting, by the computer system, an alert to a mobile device of an individual indicating that video surveillance of the area has been initiated.
- Dependent claims (2, 3, 4, 5, 6, 19, 20, 21, 22, 23, 24):
  - 2. The method of claim 1 wherein the recognizing of the voice, speech phrase, or environmental sound is performed entirely locally by the embedded recognition component, without interacting with any remote computing resources.
  - 3. The method of claim 1 wherein the predefined condition indicates that a security breach or an emergency situation has occurred in the area to be monitored.
  - 4. The method of claim 1 wherein the predefined condition is configured by a user of the computer system.
  - 5. The method of claim 4 wherein the area to be monitored is within a home, and wherein the predefined condition is configured by a homeowner or occupant of the home.
  - 6. The method of claim 1 further comprising presenting a user interface on the mobile device of the individual that includes controls for controlling operation of the one or more video capture devices.
  - 19. The method of claim 1 wherein if the one or more aspects are detected, the computer system avoids transmitting the signal to the one or more video capture devices to begin the video recording.
  - 20. The method of claim 1 further comprising:
    - upon determining that the recognized voice, speech phrase, or environmental sound corresponds to the predefined trigger condition, providing the audio signal to the individual for screening, prior to transmitting the signal to the one or more video capture devices to begin the video recording.
  - 21. The method of claim 1 further comprising, subsequently to transmitting the alert:
    - identifying an occurrence of a predefined termination event; and
      
      in response to identifying the occurrence of the predefined termination event, transmitting a signal to the one or more video capture devices to stop the video recording.
  - 22. The method of claim 21 wherein the predefined termination event is detection of a particular object in video recorded by the one or more video capture devices.
  - 23. The method of claim 21 wherein the predefined termination event is detection of a particular voice, speech phrase, or voice indicating that video surveillance of the area is no longer needed.
  - 24. The method of claim 1 wherein determining that the recognized voice, speech phrase, or environmental sound corresponds to the predefined trigger condition comprises:
    - determining that the recognized voice is an unknown voice that does not correspond to any of a group of known users enrolled into the computer system.
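
Claim 24 treats an unknown voice, one that matches none of the enrolled users, as a trigger condition, and claims 21 to 23 stop recording on a termination event such as a particular phrase. The claims do not say how voices are compared; this sketch assumes a common speaker-verification approach (fixed-length voice embeddings compared by cosine similarity against each enrolled user, with a hypothetical threshold of 0.8) and a hypothetical set of stop phrases. The embeddings would come from the embedded recognizer; here they are toy vectors.

```python
import math
from typing import Mapping, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def is_unknown_voice(
    embedding: Sequence[float],
    enrolled: Mapping[str, Sequence[float]],
    threshold: float = 0.8,  # hypothetical; would be tuned per device
) -> bool:
    """True if the voice is not close enough to any enrolled user (cf. claim 24).

    With no enrolled users, every voice counts as unknown.
    """
    return all(cosine(embedding, ref) < threshold for ref in enrolled.values())


# Hypothetical phrases treated as a termination event (cf. claims 21 and 23).
STOP_PHRASES = {"all clear", "stop recording"}


def is_termination_phrase(phrase: str) -> bool:
    """True if a recognized phrase should stop the video recording."""
    return phrase.strip().lower() in STOP_PHRASES


enrolled = {"alice": [1.0, 0.0, 0.0], "bob": [0.0, 1.0, 0.0]}
print(is_unknown_voice([0.9, 0.1, 0.0], enrolled))  # False: close to "alice"
print(is_unknown_voice([0.0, 0.0, 1.0], enrolled))  # True: matches nobody
print(is_termination_phrase(" All clear "))          # True
```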

7. A non-transitory computer readable medium having stored thereon program code executable by a processor, the program code comprising:
- code that causes the processor to receive an audio signal captured from an area to be monitored via video surveillance;
  
  code that causes the processor to recognize, via an embedded recognition component, a voice, speech phrase, or environmental sound in the audio signal;
  
  code that causes the processor to determine that the recognized voice, speech phrase, or environmental sound corresponds to a predefined trigger condition;
  
  in response to the determining, code that causes the processor to detect whether the audio signal includes one or more aspects that characterize the audio signal as being from a pre-recorded television program or pre-recorded piece of music; and
  
  if the one or more aspects are not detected in the audio signal:
  
  code that causes the processor to transmit a signal to one or more video capture devices to begin video recording of the area; and
  
  code that causes the processor to transmit an alert to a mobile device of an individual indicating that video surveillance of the area has been initiated.
- Dependent claims (8, 9, 10, 11, 12):
  - 8. The non-transitory computer readable medium of claim 7 wherein the recognizing of the voice, speech phrase, or environmental sound is performed entirely locally by the embedded recognition component, without interacting with any remote computing resources.
  - 9. The non-transitory computer readable medium of claim 7 wherein the predefined condition indicates that a security breach or an emergency situation has occurred in the area to be monitored.
  - 10. The non-transitory computer readable medium of claim 7 wherein the predefined condition is configured by a user of the computer system.
  - 11. The non-transitory computer readable medium of claim 10 wherein the area to be monitored is within a home, and wherein the predefined condition is configured by a homeowner or occupant of the home.
  - 12. The non-transitory computer readable medium of claim 7 wherein a user interface is presented on the mobile device of the individual that includes controls for controlling operation of the one or more video capture devices.

13. A computer system comprising:
- a processor; and
  
  a non-transitory computer readable medium having stored thereon executable program code which, when executed by the processor, causes the processor to:
  
  receive an audio signal captured from an area to be monitored via video surveillance;
  
  recognize, via an embedded recognition component, a voice, speech phrase, or environmental sound in the audio signal;
  
  determine that the recognized voice, speech phrase, or environmental sound corresponds to a predefined trigger condition;
  
  in response to the determining, detect whether the audio signal includes one or more aspects that characterize the audio signal as being from a pre-recorded television program or pre-recorded piece of music; and
  
  if the one or more aspects are not detected in the audio signal:
  
  transmit a signal to one or more video capture devices to begin video recording of the area; and
  
  transmit an alert to a mobile device of an individual indicating that video surveillance of the area has been initiated.
- Dependent claims (14, 15, 16, 17, 18):
  - 14. The computer system of claim 13 wherein the recognizing of the voice, speech phrase, or environmental sound is performed entirely locally by the embedded recognition component, without interacting with any remote computing resources.
  - 15. The computer system of claim 13 wherein the predefined condition indicates that a security breach or an emergency situation has occurred in the area to be monitored.
  - 16. The computer system of claim 13 wherein the predefined condition is configured by a user of the computer system.
  - 17. The computer system of claim 16 wherein the area to be monitored is within a home, and wherein the predefined condition is configured by a homeowner or occupant of the home.
  - 18. The computer system of claim 13 wherein a user interface is presented on the mobile device of the individual that includes controls for controlling operation of the one or more video capture devices.

Current Assignee
Sensory Incorporated
Original Assignee
Sensory Incorporated
Inventors
Mozer, Todd F.
Primary Examiner(s)
Sosanya, Obafemi O

Application Number

US14/840,463
Publication Number

US 20170064262A1
Time in Patent Office

1,646 Days
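
The "Time in Patent Office" value is consistent with the number of calendar days between the filing date (08/31/2015) and the issue date (03/03/2020) listed above. A quick check, assuming that is how the figure is defined:

```python
from datetime import date

filed = date(2015, 8, 31)   # Filed
issued = date(2020, 3, 3)   # Issued
days_in_office = (issued - filed).days  # calendar days, leap day 2016-02-29 and 2020-02-29 included
print(days_in_office)  # 1646
```
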
CPC Class Codes

G08B 13/1672   using sonic detecting means...

G08B 13/19682   Graphic User Interface [GUI...

G08B 13/19689   Remote control of cameras, ...

G08B 13/19695   Arrangements wherein non-vi...

H04N 7/188   Capturing isolated or inter...