Direction based end-pointing for speech recognition

US 10,102,850 B1
Filed: 02/25/2013
Issued: 10/16/2018
Est. Priority Date: 02/25/2013
Status: Active Grant


Abstract

A speech recognition system utilizing automatic speech recognition techniques such as end-pointing techniques in conjunction with beamforming and/or signal processing to isolate speech from one or more speaking users from multiple received audio signals and to detect the beginning and/or end of the speech based at least in part on the isolation. Audio capture devices such as microphones may be arranged in a beamforming array to receive the multiple audio signals. Multiple audio sources including speech may be identified in different beams and processed.
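As an illustration of the end-pointing idea in the abstract, the following is a minimal sketch of energy-based end-pointing on a single beamformed signal. It is not the patented method: the function names `frame_energies` and `find_endpoints`, the fixed energy `threshold`, and the `hangover` count of quiet frames are all assumptions chosen for illustration.

```python
def frame_energies(samples, frame_len):
    """Mean squared amplitude of each full, non-overlapping frame."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def find_endpoints(energies, threshold, hangover):
    """Return (start, end) frame indices of the first speech segment, or None.

    start is the first frame whose energy reaches `threshold` (assumed value).
    end is the last such frame seen before `hangover` consecutive quiet
    frames, or before the input runs out.
    """
    start = None
    last_speech = None
    quiet = 0
    for i, energy in enumerate(energies):
        if energy >= threshold:
            if start is None:
                start = i
            last_speech = i
            quiet = 0
        elif start is not None:
            quiet += 1
            if quiet >= hangover:
                break  # enough trailing silence: declare the end of speech
    if start is None:
        return None
    return (start, last_speech)
```

In a multi-beam setting, a routine like this could be run once per beamformed signal so that each speaker's beginning and end are tracked separately.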

14 Claims

1. A computing device for performing speech recognition on audio received by a microphone array, comprising:
- at least one processor; and
  
  a memory device including instructions which, when executed by the at least one processor, cause the computing device to:
  
  receive a plurality of audio signals from the microphone array;
  
  process the plurality of audio signals to generate a plurality of beamformed signals;
  
  determine a first plurality of characteristics of a first beamformed signal of the plurality of beamformed signals;
  
  detect, using the first plurality of characteristics, that speech is represented in the first beamformed signal;
  
  determine a second plurality of characteristics of a second beamformed signal of the plurality of beamformed signals;
  
  detect, using the second plurality of characteristics, that speech is represented in the second beamformed signal;
  
  select the first beamformed signal, the first beamformed signal corresponding to a first utterance;
  
  cause speech recognition to be performed on the first beamformed signal;
  
  determine an end of the speech represented in the first beamformed signal using the first plurality of characteristics of the first beamformed signal;
  
  perform at least one operation based at least in part on a result of the speech recognition performed on the first beamformed signal;
  
  select the second beamformed signal during performance of speech recognition on the first beamformed signal, the second beamformed signal corresponding to a second utterance;
  
  cause speech recognition to be performed on the second beamformed signal; and
  
  determine an end of the speech represented in the second beamformed signal using the second plurality of characteristics of the second beamformed signal.
- 2. The computing device of claim 1, wherein the first plurality of characteristics of the first beamformed signal includes a classification of the first beamformed signal, wherein the classification of the first beamformed signal comprises a classification of the first beamformed signal as speech.
- 3. The computing device of claim 1, wherein the first utterance and the second utterance are overlapping in time.
- 4. The computing device of claim 1, wherein the at least one processor is configured to select the first beamformed signal based at least in part on the speech of the first beamformed signal and a comparison of a volume of the first beamformed signal to a volume of the second beamformed signal.
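The selection recited in claim 4 (speech presence plus a volume comparison between beams) can be sketched roughly as follows. This is an illustrative assumption, not the claimed implementation: `rms` stands in for "volume", and `select_beam` and the `has_speech` flags (the output of some per-beam speech detector) are hypothetical names.

```python
import math


def rms(samples):
    """Root-mean-square amplitude, used here as a stand-in for volume."""
    if not samples:
        return 0.0
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def select_beam(beams, has_speech):
    """Return the index of the loudest beam flagged as speech, or None.

    beams: list of sample lists, one per beamformed signal.
    has_speech: list of booleans from a (hypothetical) speech detector.
    """
    candidates = [i for i, flag in enumerate(has_speech) if flag]
    if not candidates:
        return None
    return max(candidates, key=lambda i: rms(beams[i]))
```

For example, with three beams where only the second and third are flagged as speech, the louder of those two is chosen even if the first beam has the highest volume overall.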

5. A method comprising:
- receiving a plurality of audio signals from a microphone array;
  
  processing the plurality of audio signals to generate a plurality of beamformed signals, the plurality of beamformed signals comprising a first beamformed signal corresponding to a first utterance and a second beamformed signal corresponding to a second utterance, wherein the first utterance and second utterance are overlapping in time;
  
  determining a first plurality of characteristics of the first beamformed signal;
  
  detecting, using the first plurality of characteristics, that speech is represented in the first beamformed signal;
  
  determining a second plurality of characteristics of the second beamformed signal;
  
  detecting, using the second plurality of characteristics, that speech is represented in the second beamformed signal;
  
  causing speech recognition to be performed on the first beamformed signal;
  
  performing at least one operation based at least in part on a result of the speech recognition performed on the first beamformed signal;
  
  selecting the second beamformed signal during performance of speech recognition on the first beamformed signal; and
  
  causing speech recognition to be performed on the second beamformed signal.
- 6. The method of claim 5, wherein the microphone array is a circular array and processing the received audio signals to generate the plurality of beamformed signals comprises using a fixed beamformer.
- 7. The method of claim 5, further comprising selecting the first beamformed signal based at least in part on a comparison of energy levels of the first beamformed signal and second beamformed signal.
- 8. The method of claim 5, further comprising:
  - determining an end of the speech represented in the first beamformed signal using the first plurality of characteristics.
- 9. The method of claim 8, wherein determining the end of the speech represented in the first beamformed signal comprises one or more of:
  - determining an energy change in the first beamformed signal or identifying a third beamformed signal with an energy level higher than the energy level of the first beamformed signal.
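The two end-of-speech cues named in claim 9 can be combined in a small predicate. The sketch below is a simplified reading under stated assumptions: the "energy change" is modelled as a drop below a fraction (`drop_ratio`, an arbitrary illustrative value) of the previous frame energy, and any beam in `other_energies` is treated as a candidate "third" beam. The function name `speech_ended` is hypothetical.

```python
def speech_ended(first_energy, prev_first_energy, other_energies, drop_ratio=0.5):
    """Return True when either end-of-speech cue fires for the first beam.

    Cue 1: the first beam's frame energy fell below `drop_ratio` times its
           previous value (an assumed model of an "energy change").
    Cue 2: some other beam now has a higher energy than the first beam.
    """
    energy_dropped = first_energy < drop_ratio * prev_first_energy
    louder_beam = any(e > first_energy for e in other_energies)
    return energy_dropped or louder_beam
```

A caller would evaluate this once per frame for the currently selected beam, passing in the energies of the remaining beams.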

10. A computing device, comprising:
- at least one processor; and
  
  a memory device including instructions which, when executed by the at least one processor, cause the computing device to:
  
  receive a plurality of audio signals from a microphone array;
  
  process the plurality of audio signals to generate a plurality of beamformed signals, the plurality of beamformed signals comprising a first beamformed signal corresponding to a first utterance and a second beamformed signal corresponding to a second utterance, wherein the first utterance and second utterance are overlapping in time;
  
  determine a first plurality of characteristics of the first beamformed signal;
  
  detect, using the first plurality of characteristics, that speech is represented in the first beamformed signal;
  
  determine a second plurality of characteristics of the second beamformed signal;
  
  detect, using the second plurality of characteristics, that speech is represented in the second beamformed signal;
  
  cause speech recognition to be performed on the first beamformed signal;
  
  perform at least one operation based at least in part on a result of the speech recognition performed on the first beamformed signal;
  
  select the second beamformed signal during performance of speech recognition on the first beamformed signal; and
  
  cause speech recognition to be performed on the second beamformed signal.
- 11. The computing device of claim 10, wherein the microphone array is a circular array and processing the received audio signals to generate the plurality of beamformed signals comprises using a fixed beamformer.
- 12. The computing device of claim 10, wherein the at least one processor is further configured to select the first beamformed signal based at least in part on a comparison of energy levels of the first beamformed signal and second beamformed signal.
- 13. The computing device of claim 10, wherein the memory device includes additional instructions which, when executed by the at least one processor, further cause the computing device to:
  - determine an end of the speech represented in the first beamformed signal using the first plurality of characteristics.
- 14. The computing device of claim 13 wherein the at least one processor is further configured to determine the end of the first speech represented in the first beamformed signal based at least in part on one or more of:
  - an energy change in the first beamformed signal or a third beamformed signal with an energy level higher than the energy level of the first beamformed signal.
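Claims 6 and 11 recite a circular microphone array with a fixed beamformer but do not say which beamformer is used. One common fixed design is delay-and-sum, sketched below as an assumption rather than as the patent's design. The sketch assumes evenly spaced microphones on a circle, a far-field source, integer-sample delays, and a nominal speed of sound; `steering_delays` and `delay_and_sum` are hypothetical names.

```python
import math

SPEED_OF_SOUND = 343.0  # metres per second, approximate value in room-temperature air


def steering_delays(num_mics, radius, azimuth, sample_rate):
    """Integer sample delays that time-align a far-field source at `azimuth`.

    Microphones are assumed evenly spaced on a circle of `radius` metres,
    with microphone 0 at angle 0. Angles are in radians.
    """
    arrivals = []
    for m in range(num_mics):
        mic_angle = 2 * math.pi * m / num_mics
        # Arrival time relative to the array centre: microphones facing
        # the source hear the wavefront earlier (negative value).
        arrivals.append(-radius * math.cos(azimuth - mic_angle) / SPEED_OF_SOUND)
    latest = max(arrivals)
    # Delay every channel so that it lines up with the latest arrival.
    return [round((latest - t) * sample_rate) for t in arrivals]


def delay_and_sum(channels, delays):
    """Shift each channel by its delay, then average across channels."""
    n = len(channels[0])
    out = [0.0] * n
    for channel, delay in zip(channels, delays):
        for t in range(delay, n):
            out[t] += channel[t - delay]
    return [value / len(channels) for value in out]
```

Running `steering_delays` once per look direction gives a fixed set of delays, and applying `delay_and_sum` with each set yields one beamformed signal per direction.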

Current Assignee
Amazon Technologies, Inc. (Amazon.com, Inc.)
Original Assignee
Amazon Technologies, Inc. (Amazon.com, Inc.)
Inventors
Basye, Kenneth John; Adams, Jeffrey Penrod
Primary Examiner(s)
Patel, Shreyans A

Application Number

US13/775,954
Time in Patent Office

2,059 Days
Field of Search

704/233
US Class Current
CPC Class Codes

G10L 15/00   Speech recognition G10L17/0...

G10L 2021/02166   Microphone arrays; Beamforming

G10L 25/78   Detection of presence or ab...

G10L 25/87   Detection of discrete point...
