Speech end-pointer

US 8,170,875 B2
Filed: 06/15/2005
Issued: 05/01/2012
Est. Priority Date: 06/15/2005
Status: Active Grant

9 Assignments

0 Petitions

Abstract

A rule-based end-pointer isolates spoken utterances contained within an audio stream from background noise and non-speech transients. The rule-based end-pointer includes a plurality of rules to determine the beginning and/or end of a spoken utterance based on various speech characteristics. The rules may analyze an audio stream or a portion of an audio stream based upon an event, a combination of events, the duration of an event, or a duration relative to an event. The rules may be manually or dynamically customized depending upon factors that may include characteristics of the audio stream itself, an expected response contained within the audio stream, or environmental conditions.

134 Citations

17 Claims

1. A system for determining at least one of a beginning or an end of a speech segment, the system comprising:
- a computer processing unit configured to access a memory to determine at least one of the beginning or the end of the speech segment, where the memory comprises,
  
  a voice triggering module executable on the computer processing unit to identify a triggering characteristic in a speech segment of an audio stream; and
  
  a rule module executable on the computer processing unit and in communication with the voice triggering module, the rule module comprising a first rule that counts a number of isolated energy events preceding the triggering characteristic, and a second rule that determines that a frame of the audio stream that precedes the triggering characteristic is outside of the beginning or the end of the speech segment when a number of allowed isolated energy events in the audio stream preceding the trigger characteristic is exceeded.
- Dependent claims (2, 3, 4, 5, 6, 7):
  - 2. The system of claim 1, where the triggering characteristic comprises a vowel.
  - 3. The system of claim 1, where the triggering characteristic comprises an S or X sound.
  - 4. The system of claim 1, where the rule module analyzes a lack of energy in the speech segment of the audio stream before or after the triggering characteristic.
  - 5. The system of claim 1, where the rule module analyzes energy in the speech segment of the audio stream before or after the triggering characteristic.
  - 6. The system of claim 1, where the rule module analyzes an elapsed time in speech segment of the audio stream before or after the triggering characteristic.
  - 7. The system of claim 1, where the rule module detects the beginning and end of the speech segment.

8. A method of determining at least one of a beginning or end of an audio speech segment, the method comprising:
- receiving a portion of an audio stream that includes a speech segment;
  
  identifying a triggering characteristic in the speech segment;
  
  applying at least one decision rule to the speech segment of the audio stream to count a number of isolated energy events in the audio stream that precede the triggering characteristic; and
  
  determining that a frame of the audio stream is outside of an endpoint of the speech segment when a number of allowed isolated energy events is exceeded.
- Dependent claims (9, 10, 11, 12, 13, 14):
  - 9. The method of claim 8, where the triggering characteristic comprises a vowel.
  - 10. The method of claim 8, where the triggering characteristic comprises an S or X sound.
  - 11. The method of claim 8, further comprising analyzing a lack of energy in one or more frames before or after the speech segment of the audio stream that includes the triggering characteristic.
  - 12. The method of claim 8, further comprising analyzing energy in one or more frames before or after the speech segment of the audio stream that includes the triggering characteristic.
  - 13. The method of claim 8, further comprising analyzing an elapsed time in the one or more frames before or after the portion of the audio stream that includes the triggering characteristic.
  - 14. The method of claim 8, further comprising detecting the beginning and end of the audio speech segment.
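
The counting rule of claim 8 can be illustrated with a minimal sketch. It is not the patented implementation: the claims do not define an isolated energy event numerically, so the sketch assumes one is a run of at most `max_event_frames` frames at or above `energy_threshold`, bounded by quieter frames. The function name, the per-frame energy list, and all parameter defaults are hypothetical.

```python
from typing import List


def find_segment_start(energies: List[float], trigger: int,
                       energy_threshold: float = 0.1,
                       max_event_frames: int = 3,
                       allowed_events: int = 1) -> int:
    """Walk backward from the trigger frame and return the index of the
    first frame that is still treated as inside the speech segment."""
    start = trigger
    i = trigger - 1

    # Energy contiguous with the trigger belongs to the utterance itself.
    while i >= 0 and energies[i] >= energy_threshold:
        start = i
        i -= 1

    events = 0
    while i >= 0:
        # Skip the quiet gap that precedes the current start.
        while i >= 0 and energies[i] < energy_threshold:
            i -= 1
        if i < 0:
            break

        # Measure the run of energetic frames that ends at frame i.
        run_end = i
        while i >= 0 and energies[i] >= energy_threshold:
            i -= 1
        run_start = i + 1

        # Assumption: only short runs count as isolated energy events;
        # longer runs are treated as speech and simply included.
        if run_end - run_start + 1 <= max_event_frames:
            events += 1
            if events > allowed_events:
                # Allowance exceeded: this run and every earlier frame
                # are outside the segment.
                break
        start = run_start

    return start
```

With `energies = [0.0, 0.5, 0.0, 0.0, 0.5, 0.0, 0.5, 0.0, 0.9, 0.9, 0.9]`, a trigger at frame 9, and one allowed event, the function returns 6: the burst at frame 6 is kept, and the burst at frame 4 exceeds the allowance, so frame 4 and everything before it fall outside the segment.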

15. A system for determining at least one of a beginning or an end of an audio speech segment in an audio stream, the system comprising:
- a computer processing unit configured to access a memory to determine at least one of the beginning or the end of the audio speech segment in the audio stream, where the memory comprises,
  
  a voice triggering module executable on the computer processing unit to identify a portion of the audio stream comprising a periodic audio signal; and
  
  an end-pointer module executable on the computer processing unit and in communication with the voice triggering module, the end-pointer module configured to vary an amount of the audio stream input to a recognition device based on a plurality of rules, where the end-pointer module is further configured to determine whether one or more portions of the audio stream before or after the portion of the audio stream comprising the periodic audio signal contain speech by applying a rule that counts a number of isolated energy events in the audio stream and upon determination that more than a predetermined number of isolated energy events after the portion of the audio stream comprising the periodic audio signal occurred identifies a frame immediately preceding a last isolated energy event as the end of the audio speech segment, to exclude, from the audio speech segment input to the recognition device, a portion of the audio stream that contains one or more isolated energy events.
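
The end-of-segment rule in claim 15 can be sketched as follows, under stated assumptions. Every run of frames at or above a hypothetical `threshold` that starts after the periodic portion is counted as an isolated energy event, and when the allowance is never exceeded the end is placed on the last frame of the last counted event. The function names and defaults are illustrative and do not come from the patent.

```python
from typing import List, Tuple


def energy_runs_after(energies: List[float], after: int,
                      threshold: float = 0.1) -> List[Tuple[int, int]]:
    """Return (first, last) frame indices of each run of frames at or
    above the threshold that starts after frame `after`."""
    runs = []
    i = after + 1
    n = len(energies)
    while i < n:
        if energies[i] >= threshold:
            first = i
            while i < n and energies[i] >= threshold:
                i += 1
            runs.append((first, i - 1))
        else:
            i += 1
    return runs


def find_segment_end(energies: List[float], periodic_end: int,
                     max_events: int = 1,
                     threshold: float = 0.1) -> int:
    """Return the index of the last frame passed on to the recognizer."""
    end = periodic_end
    runs = energy_runs_after(energies, periodic_end, threshold)
    for count, (first, last) in enumerate(runs, start=1):
        if count > max_events:
            # More than the predetermined number of events occurred:
            # the end is the frame immediately preceding this event.
            return first - 1
        end = last
    return end
```

For `energies = [0.9, 0.9, 0.9, 0.0, 0.4, 0.0, 0.0, 0.4, 0.0, 0.4, 0.0]` with the periodic portion ending at frame 2 and `max_events=1`, the second burst at frame 7 exceeds the allowance, so the function returns 6 and the bursts at frames 7 and 9 are excluded.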

16. A non-transitory computer readable medium having stored therein data representing instructions executable by a programmed processor for determining at least one of a beginning or end of an audio speech segment, the non-transitory computer readable medium comprising instructions operative for:
- converting sound waves associated with an audio speech segment into electrical signals;
  
  analyzing the electrical signals to identify a periodic portion of the audio speech segment;
  
  analyzing the electrical signals to identify isolated energy events in the audio speech segment;
  
  counting a number of individual isolated energy events in the audio speech segment; and
  
  setting the end of the audio speech segment, upon determination that more than a predetermined number of individual isolated energy events occurred after the periodic portion of the audio speech segment, to exclude isolated energy events occurring after the predetermined number of isolated energy events.
- Dependent claims (17):
  - 17. The non-transitory computer readable medium of claim 16, further comprising setting a beginning of the audio speech segment upon determination that more than a predetermined number of individual isolated energy events occurred before the periodic portion of the audio speech segment.
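
Claim 16 does not say how the periodic portion is identified. One common technique, assumed here purely for illustration, is to look for a strong peak in the normalized autocorrelation of a frame. The function names, the lag range, and the 0.5 decision threshold below are hypothetical.

```python
from typing import Sequence


def periodicity_score(frame: Sequence[float],
                      min_lag: int, max_lag: int) -> float:
    """Return the largest autocorrelation value over the lag range,
    normalized by the frame energy (0.0 for an all-zero frame)."""
    energy = sum(x * x for x in frame)
    if energy == 0.0:
        return 0.0
    best = 0.0
    last_lag = min(max_lag, len(frame) - 1)
    for lag in range(min_lag, last_lag + 1):
        # Correlate the frame with a copy of itself delayed by `lag`.
        corr = sum(frame[i] * frame[i - lag] for i in range(lag, len(frame)))
        best = max(best, corr / energy)
    return best


def is_periodic(frame: Sequence[float], min_lag: int, max_lag: int,
                threshold: float = 0.5) -> bool:
    """Assumed decision rule: a frame is periodic when its score
    reaches the (hypothetical) threshold."""
    return periodicity_score(frame, min_lag, max_lag) >= threshold
```

A frame holding several cycles of a tone scores close to 1 when the lag range covers the tone's period, while noise-like frames score near 0. Frames flagged this way could then supply the periodic portion around which the isolated energy events of claims 16 and 17 are counted.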

Current Assignee: Blackberry Limited
Original Assignee: QNX Software Systems Limited (Canada) (Blackberry Limited)
Inventors: Hetherington, Phil; Escott, Alex
Primary Examiner(s): Smits, Talivaldis Ivars
Assistant Examiner(s): Pullias, Jesse Scott
Application Number: US11/152,922
Publication Number: US 20060287859A1
Time in Patent Office: 2,512 Days
Field of Search: 704/253, 704/248
US Class Current: 704/253
CPC Class Codes: G10L 25/87 Detection of discrete point...
