Direction based end-pointing for speech recognition

US 10,566,012 B1
Filed: 10/12/2018
Issued: 02/18/2020
Est. Priority Date: 02/25/2013
Status: Active Grant

Abstract

A speech recognition system utilizing automatic speech recognition techniques such as end-pointing techniques in conjunction with beamforming and/or signal processing to isolate speech from one or more speaking users from multiple received audio signals and to detect the beginning and/or end of the speech based at least in part on the isolation. Audio capture devices such as microphones may be arranged in a beamforming array to receive the multiple audio signals. Multiple audio sources including speech may be identified in different beams and processed.
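As an illustration of the beamforming step mentioned in the abstract, the following is a minimal delay-and-sum sketch, not the implementation described in the patent. It assumes integer sample delays per microphone are already known for each candidate direction; the function names and the `steering` table are hypothetical.

```python
from typing import Dict, List, Sequence


def delay_and_sum(channels: Sequence[Sequence[float]], delays: Sequence[int]) -> List[float]:
    """Advance each channel by its non-negative steering delay (in samples) and average."""
    if len(channels) != len(delays):
        raise ValueError("need one delay per channel")
    if any(d < 0 for d in delays):
        raise ValueError("delays must be non-negative")
    # Only keep samples for which every delayed channel still has data.
    usable = min(len(ch) - d for ch, d in zip(channels, delays))
    return [
        sum(ch[i + d] for ch, d in zip(channels, delays)) / len(channels)
        for i in range(usable)
    ]


def mean_power(signal: Sequence[float]) -> float:
    """Average squared amplitude; 0.0 for an empty signal."""
    return sum(x * x for x in signal) / len(signal) if signal else 0.0


def strongest_direction(
    channels: Sequence[Sequence[float]],
    steering: Dict[str, Sequence[int]],
) -> str:
    """Pick the candidate direction whose beam has the highest mean power."""
    return max(steering, key=lambda name: mean_power(delay_and_sum(channels, steering[name])))


# Toy input: the same pulse reaches mic1 two samples after mic0.
mic0 = [0.0, 0.0, 1.0, -1.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0]
mic1 = [0.0, 0.0, 0.0, 0.0, 1.0, -1.0, 0.5, 0.0, 0.0, 0.0]
steering = {"left": [0, 2], "right": [2, 0]}  # hypothetical delay table
best = strongest_direction([mic0, mic1], steering)
```

When the steering delays match the true arrival offsets, the channels add coherently and the beam's power is highest, which is why the "left" entry wins for this toy input. Real arrays typically need fractional delays and per-frequency weighting, which this sketch omits.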

20 Claims

1. A system, comprising:
- at least one processor; and
  
  at least one computer-readable medium encoded with instructions which, when executed by the at least one processor, cause the system to:
  
  receive audio signals from a microphone array, the audio signals representing at least first speech of a first user,
  
  determine, using the audio signals, that first audio originated from a first direction,
  
  process the audio signals to generate a first audio signal corresponding to the first direction,
  
  determine, based on at least one characteristic of the first audio signal, that the first audio signal represents second speech of the first user, and
  
  based at least in part on determining that the first audio signal represents the second speech, cause speech recognition processing to be performed using the first audio signal to determine first text corresponding to at least a portion of the second speech.
  - 2. The system of claim 1, wherein the at least one computer-readable medium is encoded with additional instruction which, when executed by the at least one processor, further cause the system to:
    - determine, using the audio signals, that second audio originated from a second direction;
      
      process the audio signals to generate a second audio signal corresponding to the second direction;
      
      determine, based on at least one characteristic of the second audio signal, that the second audio signal represents a sound other than speech of the first user; and
      
      based at least in part on determining that the second audio signal represents the sound, refrain from performing speech recognition processing using the second audio signal.
  - 3. The system of claim 2, wherein the sound overlaps in time at least partially with the second speech.
  - 4. The system of claim 3, wherein the second speech begins before the sound.
  - 5. The system of claim 1, wherein the at least one computer-readable medium is encoded with additional instruction which, when executed by the at least one processor, further cause the system to:
    - determine, using the audio signals, that second audio originated from a second direction;
      
      process the audio signals to generate a second audio signal corresponding to the second direction; and
      
      determine, based on at least one characteristic of the second audio signal, that the second audio signal represents third speech of a second user, the third speech being louder than the second speech, and overlapping in time at least partially with the second speech.
  - 6. The system of claim 5, wherein the at least one computer-readable medium is encoded with additional instruction which, when executed by the at least one processor, further cause the system to:
    - refrain from performing speech recognition processing using the second audio signal.
  - 7. The system of claim 5, wherein the at least one computer-readable medium is encoded with additional instruction which, when executed by the at least one processor, further cause the system to:
    - perform speech recognition processing using the second audio signal to determine second text corresponding to at least a portion of the third speech.
  - 8. The system of claim 1, wherein the at least one computer-readable medium is encoded with additional instruction which, when executed by the at least one processor, further cause the system to:
    - determine, based on at least one characteristic of the first audio signal, an end point of the second speech.
  - 9. The system of claim 8, wherein the at least one computer-readable medium is encoded with additional instruction which, when executed by the at least one processor, further cause the system to:
    - determine the end point at least in part by determining that the first audio signal has ceased to indicate a presence of the second speech.
  - 10. The system of claim 1, wherein the at least one computer-readable medium is encoded with additional instruction which, when executed by the at least one processor, further cause the system to:
    - determine that the first audio has begun originating from a second direction;
      
      process the audio signals to generate a second audio signal corresponding to the second direction; and
      
      perform speech recognition processing using the second audio signal to determine second text corresponding to at least a further portion of the second speech.
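Claims 8 and 9 recite determining an end point once the directional signal stops indicating speech. One common way to approximate this, shown here only as a hedged sketch and not as the claimed method, is a frame-energy detector with a hangover period. The frame length, threshold, and hangover count are illustrative assumptions.

```python
from typing import List, Optional, Sequence, Tuple


def frame_energies(signal: Sequence[float], frame_len: int) -> List[float]:
    """Mean squared amplitude of each full, non-overlapping frame."""
    return [
        sum(x * x for x in signal[i:i + frame_len]) / frame_len
        for i in range(0, len(signal) - frame_len + 1, frame_len)
    ]


def find_endpoints(
    signal: Sequence[float],
    frame_len: int = 4,
    threshold: float = 0.01,
    hangover_frames: int = 2,
) -> Optional[Tuple[int, int]]:
    """Return (start_sample, end_sample) of the first speech segment, or None.

    Speech starts at the first frame whose energy reaches the threshold and
    ends once `hangover_frames` consecutive frames fall below it, so pauses
    shorter than the hangover do not end the segment.
    """
    energies = frame_energies(signal, frame_len)
    start = None
    quiet = 0
    for idx, energy in enumerate(energies):
        if energy >= threshold:
            if start is None:
                start = idx
            quiet = 0
        elif start is not None:
            quiet += 1
            if quiet >= hangover_frames:
                end = idx - quiet + 1  # first frame of the quiet run
                return start * frame_len, end * frame_len
    if start is None:
        return None
    # Signal ended before the hangover elapsed: close at the last loud frame.
    return start * frame_len, (len(energies) - quiet) * frame_len


# Toy beam output: silence, speech, a one-frame pause, more speech, silence.
beam = [0.0] * 8 + [0.5, -0.5] * 4 + [0.0] * 4 + [0.5, -0.5] * 2 + [0.0] * 12
segment = find_endpoints(beam)
```

Running the detector on a beamformed signal rather than a raw microphone channel is the point of the direction-based approach: sounds from other directions are attenuated before the energy test, so they are less likely to delay the end point.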

11. A method, comprising:
- receiving audio signals from a microphone array, the audio signals representing at least first speech of a first user;
  
  determining, using the audio signals, that first audio originated from a first direction;
  
  processing the audio signals to generate a first audio signal corresponding to the first direction;
  
  determining, based on at least one characteristic of the first audio signal, that the first audio signal represents second speech of the first user; and
  
  based at least in part on determining that the first audio signal represents the second speech, causing speech recognition processing to be performed using the first audio signal to determine first text corresponding to at least a portion of the second speech.
  - 12. The method of claim 11, further comprising:
    - determining, using the audio signals, that second audio originated from a second direction;
      
      processing the audio signals to generate a second audio signal corresponding to the second direction;
      
      determining, based on at least one characteristic of the second audio signal, that the second audio signal represents a sound other than speech of the first user; and
      
      based at least in part on determining that the second audio signal represents the sound, refraining from performing speech recognition processing using the second audio signal.
  - 13. The method of claim 12, wherein the sound overlaps in time at least partially with the second speech.
  - 14. The method of claim 13, wherein the second speech begins before the sound.
  - 15. The method of claim 11, further comprising:
    - determining, using the audio signals, that second audio originated from a second direction;
      
      processing the audio signals to generate a second audio signal corresponding to the second direction; and
      
      determining, based on at least one characteristic of the second audio signal, that the second audio signal represents third speech of a second user, the third speech being louder than the second speech, and overlapping in time at least partially with the second speech.
  - 16. The method of claim 15, further comprising:
    - refraining from performing speech recognition processing using the second audio signal.
  - 17. The method of claim 15, further comprising:
    - performing speech recognition processing using the second audio signal to determine second text corresponding to at least a portion of the third speech.
  - 18. The method of claim 11, further comprising:
    - determining, based on at least one characteristic of the first audio signal, an end point of the second speech.
  - 19. The method of claim 18, wherein determining the end point comprises determining that the first audio signal has ceased to indicate a presence of the second speech.
  - 20. The method of claim 11, further comprising:
    - determining that the first audio has begun originating from a second direction;
      
      processing the audio signals to generate a second audio signal corresponding to the second direction; and
      
      performing speech recognition processing using the second audio signal to determine second text corresponding to at least a further portion of the second speech.
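The gating behavior in claims 11 and 12 (recognize the beam that carries the user's speech, refrain on a beam that carries another sound) can be sketched as a small routing function. This is an illustrative assumption about structure only: `is_target_speech` and `recognize` are hypothetical callables standing in for whatever classifier and recognizer a real system would use.

```python
from typing import Callable, Dict, List, Sequence

Signal = Sequence[float]


def route_beams(
    beams: Dict[str, Signal],
    is_target_speech: Callable[[Signal], bool],
    recognize: Callable[[Signal], str],
) -> Dict[str, str]:
    """Run recognition only on beams judged to carry the target user's speech."""
    results: Dict[str, str] = {}
    for direction, signal in beams.items():
        if is_target_speech(signal):
            results[direction] = recognize(signal)
        # Otherwise refrain: the beam is never passed to the recognizer.
    return results


# Stand-in components for the example (assumptions, not real models).
recognized_inputs: List[Signal] = []


def toy_is_speech(signal: Signal) -> bool:
    # Placeholder "characteristic": mean power above a fixed level.
    return sum(x * x for x in signal) / len(signal) > 0.1


def toy_recognize(signal: Signal) -> str:
    recognized_inputs.append(signal)
    return "text for %d samples" % len(signal)


beams = {"first": [0.5, -0.5, 0.5, -0.5], "second": [0.01, -0.01, 0.0, 0.01]}
texts = route_beams(beams, toy_is_speech, toy_recognize)
```

Keeping the decision separate from the recognizer means a rejected beam costs only the classification step, and the same routing works if more than one beam is accepted, as in claim 17.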

Current Assignee: Amazon Technologies, Inc. (Amazon.com, Inc.)
Original Assignee: Amazon Technologies, Inc. (Amazon.com, Inc.)
Inventors: Basye, Kenneth John; Adams, Jeffrey Penrod
Primary Examiner(s): Patel, Shreyans A
Application Number: US16/158,775
Time in Patent Office: 494 Days

CPC Class Codes
- G10L 15/00   Speech recognition G10L17/0...
- G10L 2021/02166   Microphone arrays; Beamforming
- G10L 25/78   Detection of presence or ab...
- G10L 25/87   Detection of discrete point...
