Speech end-pointer

US 8,554,564 B2
Filed: 04/25/2012
Issued: 10/08/2013
Est. Priority Date: 06/15/2005
Status: Active Grant

Abstract

A rule-based end-pointer isolates spoken utterances contained within an audio stream from background noise and non-speech transients. The rule-based end-pointer includes a plurality of rules to determine the beginning and/or end of a spoken utterance based on various speech characteristics. The rules may analyze an audio stream or a portion of an audio stream based upon an event, a combination of events, the duration of an event, or a duration relative to an event. The rules may be manually or dynamically customized depending upon factors that may include characteristics of the audio stream itself, an expected response contained within the audio stream, or environmental conditions.
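
The idea of duration-based rules that can be customized may be easier to see in a small sketch. The Python below is an illustration only, not the patented implementation: the `DurationRule` class, the `customize` helper, the counter name `silence_ms`, and all numeric values are assumptions introduced here.

```python
from dataclasses import dataclass, replace
from typing import Dict


@dataclass(frozen=True)
class DurationRule:
    """Hypothetical rule: fires when a running duration exceeds a threshold."""
    name: str
    counter: str         # name of the running counter the rule inspects
    threshold_ms: float  # duration the counter must exceed for the rule to fire

    def fires(self, counters: Dict[str, float]) -> bool:
        # A counter that has not been seen yet is treated as zero.
        return counters.get(self.counter, 0.0) > self.threshold_ms


def customize(rule: DurationRule, scale: float) -> DurationRule:
    """Return a copy of the rule with its threshold scaled.

    The scale could reflect, for example, a longer expected response.
    """
    return replace(rule, threshold_ms=rule.threshold_ms * scale)


# Assumed values: end the utterance after 300 ms of silence by default.
silence_rule = DurationRule("end_on_silence", "silence_ms", 300.0)
patient_rule = customize(silence_rule, 2.0)  # tolerate pauses up to 600 ms

counters = {"silence_ms": 400.0}
default_fires = silence_rule.fires(counters)
patient_fires = patient_rule.fires(counters)
```

With 400 ms of accumulated silence, the default rule fires while the scaled rule does not, which is one simple way a rule set could be adapted to an expected response without changing the rule logic.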

20 Claims

1. A speech end-pointer system, comprising:
- a computer processor;
- a voice triggering module configured to identify a portion of an audio stream comprising a speech segment; and
- a rule module in communication with the voice triggering module, the rule module comprising a plurality of rules used by the computer processor to analyze the audio stream and detect a beginning and an end of the speech segment, where the plurality of rules comprises one or more rules based on an energy counter;
- where the beginning of the speech segment and the end of the speech segment represent boundaries between speech and non-speech portions of the audio stream; and
- where the computer processor is configured to determine whether a frame of the audio stream has energy above a background noise level and increment the energy counter by a length of the frame in response to a determination that the frame has energy above the background noise level.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11)
  - 2. The system of claim 1, where the plurality of rules includes a rule configured to set the beginning of the speech segment or the end of the speech segment based on a comparison between the energy counter and a threshold.
  - 3. The system of claim 1, where the plurality of rules includes a rule configured to set the beginning of the speech segment or the end of the speech segment based on a comparison between a lack of energy counter and a threshold.
  - 4. The system of claim 1, where the plurality of rules includes a rule configured to set the beginning of the speech segment or the end of the speech segment based on a comparison between an isolated energy event counter and a threshold.
  - 5. The system of claim 1, where the plurality of rules includes a first rule configured to set the beginning of the speech segment or the end of the speech segment based on a comparison between the energy counter and a first threshold, and a second rule configured to set the beginning of the speech segment or the end of the speech segment based on a comparison between a lack of energy counter and a second threshold.
  - 6. The system of claim 1, where the plurality of rules includes a first rule configured to set the beginning of the speech segment or the end of the speech segment based on a comparison between the energy counter and a first threshold, a second rule configured to set the beginning of the speech segment or the end of the speech segment based on a comparison between a lack of energy counter and a second threshold, and a third rule configured to set the beginning of the speech segment or the end of the speech segment based on a comparison between an isolated energy event counter and a third threshold.
  - 7. The system of claim 1, where the plurality of rules comprises one or more rules based on a lack of energy counter;
    - where the computer processor is configured to increment the lack of energy counter by the length of the frame in response to a determination that the frame does not have energy above the background noise level.
  - 8. The system of claim 7, where the computer processor is configured to execute the rule module and set the beginning of the speech segment or the end of the speech segment in response to a determination that the frame has energy above the background noise level and the energy counter is above a continuous non-voiced energy threshold.
  - 9. The system of claim 7, where the computer processor is configured to execute the rule module and set the beginning of the speech segment or the end of the speech segment in response to a determination that the frame does not have energy above the background noise level and the lack of energy counter is above a continuous silence threshold.
  - 10. The system of claim 1, where the plurality of rules comprises a rule based on an isolated energy event counter;
    - where the computer processor is configured to execute the rule module and set the beginning of the speech segment or the end of the speech segment in response to a determination that the isolated energy event counter is above a maximum allowed isolated energy event threshold.
  - 11. The system of claim 10, where the computer processor is configured to execute the rule module and increment the isolated energy event counter in response to an identification of a plosive surrounded by silence in the audio stream.
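
Claims 10 and 11 refer to an isolated energy event counter that is incremented when a plosive surrounded by silence is identified. The claims do not say how such an event is identified, so the sketch below makes its own assumptions: an isolated event is taken to be a short run of above-noise frames with enough silence on both sides. The function name and the parameters `max_burst_ms`, `min_gap_ms`, and `max_allowed_events` are hypothetical.

```python
from typing import Iterable


def count_isolated_events(above_noise: Iterable[bool],
                          frame_ms: float = 10.0,
                          max_burst_ms: float = 30.0,
                          min_gap_ms: float = 50.0) -> int:
    """Count short energy bursts that have silence before and after them.

    `above_noise` holds one flag per frame: True when the frame energy is
    above the background noise level. All thresholds are assumed values.
    """
    events = 0
    energy_ms = 0.0          # length of the current run of energetic frames
    silence_ms = 0.0         # length of the current run of silent frames
    silence_before_ms = 0.0  # silence that preceded the current burst
    pending = False          # a short burst ended; waiting for trailing silence
    for active in above_noise:
        if active:
            if energy_ms == 0.0:
                silence_before_ms = silence_ms  # a new burst starts here
            energy_ms += frame_ms
            silence_ms = 0.0
            pending = False  # more energy arrived, so the burst was not isolated
        else:
            if energy_ms > 0.0:
                # The burst just ended: it is a candidate only if it was short
                # and was preceded by enough silence.
                pending = (energy_ms <= max_burst_ms
                           and silence_before_ms >= min_gap_ms)
                energy_ms = 0.0
            silence_ms += frame_ms
            if pending and silence_ms >= min_gap_ms:
                events += 1
                pending = False
    return events


# 60 ms silence, 20 ms burst, 60 ms silence, 200 ms of sustained energy,
# 60 ms silence, 10 ms burst, 60 ms silence.
flags = ([False] * 6 + [True] * 2 + [False] * 6 + [True] * 20
         + [False] * 6 + [True] * 1 + [False] * 6)
isolated_events = count_isolated_events(flags)

# A rule in the style of claim 10, with an assumed limit of one event.
max_allowed_events = 1
rule_fires = isolated_events > max_allowed_events
```

In this example the two short bursts are counted and the 200 ms run is not, so the counter exceeds the assumed limit and the rule would set a segment boundary.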

12. A speech end-pointing method, comprising:
- receiving an audio stream;
- analyzing energy and noise characteristics of a frame of the audio stream by a computer processor to determine whether the frame has energy above a background noise level;
- incrementing an energy counter by a length of the frame in response to a determination by the computer processor that the frame has energy above the background noise level;
- incrementing a lack of energy counter by the length of the frame in response to a determination by the computer processor that the frame does not have energy above the background noise level; and
- applying a plurality of rules by the computer processor to detect a beginning and an end of a speech segment of the audio stream based on the energy counter and the lack of energy counter.
- Dependent claims (13, 14, 15, 16, 17, 18, 19)
  - 13. The method of claim 12, where the beginning of the speech segment and the end of the speech segment represent boundaries between speech and non-speech portions of the audio stream.
  - 14. The method of claim 12, where the plurality of rules includes a rule configured to set the beginning of the speech segment or the end of the speech segment based on a comparison between the energy counter and a first threshold, and where the plurality of rules includes a second rule configured to set the beginning of the speech segment or the end of the speech segment based on a comparison between the lack of energy counter and a second threshold.
  - 15. The method of claim 12, where the step of applying the plurality of rules comprises setting the beginning of the speech segment or the end of the speech segment in response to a determination that the frame has energy above the background noise level and the energy counter is above a continuous non-voiced energy threshold.
  - 16. The method of claim 12, where the step of applying the plurality of rules comprises setting the beginning of the speech segment or the end of the speech segment in response to a determination that the frame does not have energy above the background noise level and the lack of energy counter is above a continuous silence threshold.
  - 17. The method of claim 12, further comprising setting the beginning of the speech segment or the end of the speech segment by the computer processor in response to a determination that an isolated energy event counter is above a maximum allowed isolated energy event threshold.
  - 18. The method of claim 17, further comprising incrementing the isolated energy event counter in response to an identification by the computer processor of a plosive surrounded by silence in the audio stream.
  - 19. The method of claim 12, further comprising:
    - resetting the lack of energy counter in response to the determination by the computer processor that the frame has energy above the background noise level; and
      
      resetting the energy counter in response to the determination by the computer processor that the frame does not have energy above the background noise level.
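
The steps of claim 12, together with the counter resets of claim 19, can be sketched as a single loop over frames. This is a minimal illustration under stated assumptions, not the patented implementation: frame energy is computed as mean squared amplitude, the background noise level is a fixed number, the energy counter is used to set the beginning and the lack of energy counter to set the end (the claims leave open which boundary each rule sets), and all names and thresholds are hypothetical.

```python
from typing import Iterable, Optional, Sequence, Tuple


def frame_energy(samples: Sequence[float]) -> float:
    """Mean squared amplitude of one non-empty frame (an assumed measure)."""
    return sum(s * s for s in samples) / len(samples)


def find_endpoints(frames: Iterable[Sequence[float]],
                   noise_level: float,
                   frame_ms: float,
                   speech_threshold_ms: float,
                   silence_threshold_ms: float
                   ) -> Tuple[Optional[float], Optional[float]]:
    """Return (begin_ms, end_ms) of the first speech segment, or None values."""
    energy_ms = 0.0    # energy counter
    silence_ms = 0.0   # lack of energy counter
    begin: Optional[float] = None
    end: Optional[float] = None
    t = 0.0            # time at the end of the current frame
    for frame in frames:
        t += frame_ms
        if frame_energy(frame) > noise_level:
            energy_ms += frame_ms   # increment by the length of the frame
            silence_ms = 0.0        # reset, as in claim 19
            if begin is None and energy_ms > speech_threshold_ms:
                begin = t - energy_ms   # start of the energetic run
        else:
            silence_ms += frame_ms  # increment by the length of the frame
            energy_ms = 0.0         # reset, as in claim 19
            if begin is not None and silence_ms > silence_threshold_ms:
                end = t - silence_ms    # start of the silent run
                break
    return begin, end


# Synthetic stream: 50 ms silence, 100 ms of loud frames, 100 ms silence.
quiet = [0.0] * 4
loud = [0.5] * 4
stream = [quiet] * 5 + [loud] * 10 + [quiet] * 10
endpoints = find_endpoints(stream, noise_level=0.01, frame_ms=10.0,
                           speech_threshold_ms=30.0,
                           silence_threshold_ms=50.0)
```

For this synthetic stream the sketch places the beginning at 50 ms and the end at 150 ms. A real system would likely estimate the noise level adaptively rather than fix it in advance.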

20. A non-transitory computer-readable medium with instructions stored thereon, where the instructions are executable by a computer processor to cause the computer processor to perform the steps of:
- receiving an audio stream;
- analyzing energy and noise characteristics of a frame of the audio stream to determine whether the frame has energy above a background noise level;
- incrementing an energy counter by a length of the frame in response to a determination that the frame has energy above the background noise level;
- incrementing a lack of energy counter by the length of the frame in response to a determination that the frame does not have energy above the background noise level; and
- applying a plurality of rules to detect a beginning and an end of a speech segment of the audio stream based on the energy counter and the lack of energy counter.

Current Assignee: Blackberry Limited
Original Assignee: QNX Software Systems Limited (Canada) (Blackberry Limited)
Inventors: Hetherington, Phil; Escott, Alex
Primary Examiner(s): PULLIAS, JESSE SCOTT
Application Number: US13/455,886
Publication Number: US 20120265530A1
Time in Patent Office: 531 Days
Field of Search: None
US Class Current: 704/253
CPC Class Codes: G10L 25/87 Detection of discrete point...
