Methods and apparatus for buffering data for use in accordance with a speech recognition system

US 8,781,832 B2
Filed: 03/26/2008
Issued: 07/15/2014
Est. Priority Date: 08/22/2005
Status: Active Grant

Abstract

Techniques are disclosed for overcoming errors in speech recognition systems. For example, a technique for processing acoustic data in accordance with a speech recognition system comprises the following steps/operations. Acoustic data is obtained in association with the speech recognition system. The acoustic data is recorded using a combination of a first buffer area and a second buffer area, such that the recording of the acoustic data using the combination of the two buffer areas at least substantially minimizes one or more truncation errors associated with operation of the speech recognition system.

30 Claims

1. A method for processing acoustic data to reduce one or more truncation errors associated with operation of a speech recognition system, the method comprising acts of:
- continuously recording acoustic data in a circular buffer;
  
  when an indication that the speech recognition system is being addressed is detected, starting recording of acoustic data in a second buffer that is separate from the circular buffer;
  
  obtaining combined acoustic data at least in part by prepending first acoustic data recorded in the circular buffer to a beginning of second acoustic data recorded in the second buffer; and
  
  analyzing the combined acoustic data, which comprises data from the circular buffer and data from the second buffer, to identify a likely speech endpoint in the combined acoustic data, wherein the act of analyzing comprises using a boundary between the first and second acoustic data as a reference location wherein the act of analyzing the combined acoustic data comprises an act of identifying, among one or more regions in the combined acoustic data likely to correspond to silence, a region of silence closest to the reference location.
- - 2. The method of claim 1, wherein the act of obtaining combined acoustic data comprises an act of forming a composite buffer area comprising the first acoustic data prepended to the second acoustic data.
  - 3. The method of claim 2, wherein:
    - the composite buffer area contains, at a start of the first acoustic data prepended to the second acoustic data, oldest acoustic data in the circular buffer;
      
      acoustic data recorded in the circular buffer immediately before the indication that the speech recognition system is being addressed ends the first acoustic data; and
      
      in the composite buffer area, the acoustic data recorded in the circular buffer immediately before the indication that the speech recognition system is being addressed is contiguous in memory with acoustic data which is recorded in the second buffer immediately following the indication that the speech recognition system is being addressed.
  - 4. The method of claim 2, wherein the act of analyzing the combined acoustic data comprises processing acoustic data in the composite buffer area to detect one or more features indicating silence.
  - 5. The method of claim 4, wherein a location in the region of silence closest to the reference location is used as a location in the composite buffer area at which speech intended for the speech recognition system to process begins.
  - 6. The method of claim 2, further comprising an act of decoding acoustic data in the composite buffer area into text.
  - 7. The method of claim 2, wherein the act of forming the composite buffer area comprises:
    - copying the first acoustic data recorded in the circular buffer to the composite buffer area.
  - 8. The method of claim 1, wherein the region of silence closest to the reference location is in the first acoustic data if the indication that the speech recognition system is being addressed was given after speech started.
  - 9. The method of claim 1, wherein the recording of acoustic data in the second buffer continues until an indication that the speech recognition system is no longer being addressed is detected and a feature indicating silence is detected in the acoustic data recorded in the second buffer.
  - 10. The method of claim 1, further comprising:
    - stopping recording of acoustic data in the circular buffer when recording of acoustic data is started in the second buffer;
      
      stopping recording of acoustic data in the second buffer and restarting recording of acoustic data in the circular buffer, when an indication that the speech recognition system is no longer being addressed is detected and a feature indicating silence is detected in the acoustic data recorded in the second buffer.
  - 11. The method of claim 10, wherein the indication that the speech recognition system is being addressed comprises a microphone on event, and the indication that the speech recognition system is no longer being addressed comprises a microphone off event.
  - 12. The method of claim 1, wherein the second buffer comprises a linear buffer.
  - 13. The method of claim 1, wherein the circular buffer and the second buffer are at least part of a single storage data structure.
  - 14. The method of claim 1, wherein the circular buffer and the second buffer are at least part of separate storage data structures.
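
As an illustration of claims 1 to 3 and 7, the following Python sketch keeps a fixed-size circular buffer and then forms a composite buffer by copying the circular buffer contents, oldest first, ahead of the second buffer. The names `CircularBuffer` and `build_composite`, the use of integer samples, and the capacity of 4 are assumptions made for illustration; the claims do not prescribe any of them.

```python
from typing import List, Tuple


class CircularBuffer:
    """Fixed-capacity ring buffer; the oldest samples are overwritten once full."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._data: List[int] = [0] * capacity
        self._capacity = capacity
        self._write = 0   # index of the next slot to overwrite
        self._count = 0   # number of valid samples currently held

    def record(self, sample: int) -> None:
        self._data[self._write] = sample
        self._write = (self._write + 1) % self._capacity
        self._count = min(self._count + 1, self._capacity)

    def snapshot(self) -> List[int]:
        """Return the held samples ordered oldest to newest."""
        if self._count < self._capacity:
            return self._data[:self._count]
        return self._data[self._write:] + self._data[:self._write]


def build_composite(ring: CircularBuffer, second: List[int]) -> Tuple[List[int], int]:
    """Prepend the ring contents to the second buffer.

    Returns the composite list and the boundary index, i.e. the position
    of the first sample that came from the second buffer.
    """
    first = ring.snapshot()
    return first + list(second), len(first)


ring = CircularBuffer(capacity=4)
for s in [1, 2, 3, 4, 5, 6]:      # samples 1 and 2 are overwritten
    ring.record(s)
second_buffer = [7, 8, 9]          # recorded after the "being addressed" indication
composite, boundary = build_composite(ring, second_buffer)
print(composite, boundary)         # [3, 4, 5, 6, 7, 8, 9] 4
```

The returned boundary index plays the role of the reference location in claim 1: everything before it came from the circular buffer, everything from it onward came from the second buffer.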

15. Apparatus for processing acoustic data to reduce one or more truncation errors associated with operation of a speech recognition system, comprising:
- at least one memory comprising a circular buffer and a second buffer that is separate from the circular buffer; and
  
  at least one processor coupled to the memory and operative to:
  
  continuously record acoustic data in the circular buffer;
  
  when an indication that the speech recognition system is being addressed is detected, start recording of acoustic data in a second buffer;
  
  obtain combined acoustic data at least in part by prepending first acoustic data recorded in the circular buffer to a beginning of second acoustic data recorded in the second buffer; and
  
  analyze the combined acoustic data, which comprises data from the circular buffer and data from the second buffer, to identify a likely speech endpoint in the combined acoustic data, wherein the act of analyzing comprises using a boundary between the first and second acoustic data as a reference location wherein the at least one processor is further operative to analyze the combined acoustic data at least in part by identifying, among one or more regions in the combined acoustic data likely to correspond to silence, a region of silence closest to the reference location.
- - 16. The apparatus of claim 15, wherein prepending the first acoustic data comprises copying the acoustic data recorded in the circular buffer to a composite buffer area such that the composite buffer area comprises the first acoustic data prepended to the second acoustic data.
  - 17. The apparatus of claim 15, wherein the region of silence closest to the reference location is in the first acoustic data if the indication that the speech recognition system is being addressed was given after speech started.
  - 18. The apparatus of claim 15, wherein the at least one processor is further operative to:
    - stop recording of acoustic data in the circular buffer when recording of acoustic data is started in the second buffer; and
      
      stop recording of acoustic data in the second buffer and restart recording of acoustic data in the circular buffer, when an indication that the speech recognition system is no longer being addressed is detected and a feature indicating silence is detected in the acoustic data recorded in the second buffer.
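
The buffer switching recited in claims 10, 11 and 18 can be pictured as a two-state recorder. The sketch below is one possible reading, not the patented implementation: the `Recorder` class, the `is_silence` flag supplied by the caller (standing in for a silence detector), and the choice to empty the circular buffer after each utterance are all assumptions.

```python
from collections import deque
from enum import Enum
from typing import Deque, List, Optional


class Mode(Enum):
    CIRCULAR = "circular"   # idle: recording into the circular buffer
    SECOND = "second"       # addressed: recording into the second buffer


class Recorder:
    def __init__(self, ring_capacity: int) -> None:
        self.ring: Deque[int] = deque(maxlen=ring_capacity)
        self.second: List[int] = []
        self.mode = Mode.CIRCULAR
        self.mic_off_seen = False

    def mic_on(self) -> None:
        """Microphone-on event: stop circular recording, start the second buffer."""
        if self.mode is Mode.CIRCULAR:
            self.second = []
            self.mic_off_seen = False
            self.mode = Mode.SECOND

    def mic_off(self) -> None:
        """Microphone-off event: remember it, but keep recording until silence."""
        if self.mode is Mode.SECOND:
            self.mic_off_seen = True

    def feed(self, sample: int, is_silence: bool) -> Optional[List[int]]:
        """Record one sample; return the combined data when an utterance ends."""
        if self.mode is Mode.CIRCULAR:
            self.ring.append(sample)
            return None
        self.second.append(sample)
        if self.mic_off_seen and is_silence:
            combined = list(self.ring) + self.second
            self.ring.clear()              # assumption: next cycle starts empty
            self.mode = Mode.CIRCULAR      # restart circular recording
            return combined
        return None


rec = Recorder(ring_capacity=3)
for s in [0, 0, 5, 6]:            # speech (5, 6) starts before the mic-on event
    rec.feed(s, is_silence=(s == 0))
rec.mic_on()
rec.feed(7, is_silence=False)
rec.mic_off()
rec.feed(8, is_silence=False)     # still speech, so recording continues
result = rec.feed(0, is_silence=True)
print(result, rec.mode)           # [0, 5, 6, 7, 8, 0] Mode.CIRCULAR
```

Requiring both the microphone-off event and detected silence before switching back is what keeps the tail of an utterance from being cut off when the user releases the button early.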

19. At least one article of manufacture for use in processing acoustic data to reduce one or more truncation errors associated with operation of a speech recognition system, comprising at least one machine readable medium having encoded thereon one or more programs which when executed implement acts of:
- continuously recording acoustic data in a circular buffer;
  
  when an indication that the speech recognition system is being addressed is detected, starting recording of acoustic data in a second buffer that is separate from the circular buffer;
  
  obtaining combined acoustic data at least in part by prepending first acoustic data recorded in the circular buffer to a beginning of second acoustic data recorded in the second buffer; and
  
  analyzing the combined acoustic data, which comprises data from the circular buffer and data from the second buffer, to identify a likely speech endpoint in the combined acoustic data, wherein the act of analyzing comprises using a boundary between the first and second acoustic data as a reference location wherein the act of analyzing the combined acoustic data comprises an act of identifying, among one or more regions in the combined acoustic data likely to correspond to silence, a region of silence closest to the reference location.
- - 20. The at least one article of manufacture of claim 19, wherein prepending the first acoustic data comprises copying the acoustic data recorded in the circular buffer to a composite buffer area such that the composite buffer area comprises the first acoustic data prepended to the second acoustic data.
  - 21. The at least one article of manufacture of claim 19, wherein the one or more programs further implement:
    - stopping recording of acoustic data in the circular buffer when recording of acoustic data is started in the second buffer; and
      
      stopping recording of acoustic data in the second buffer and restarting recording of acoustic data in the circular buffer, when an indication that the speech recognition system is no longer being addressed is detected and a feature indicating silence is detected in the acoustic data recorded in the second buffer.

22. A method for processing acoustic data in accordance with a speech recognition system, the method comprising acts of:
- recording acoustic data in at least one recording medium;
  
  detecting, at a first time, a user-generated input event instructing the speech recognition system to start speech recognition processing, the first time corresponding to a first location of the recorded acoustic data recorded in the at least one recording medium;
  
  searching in the recorded acoustic data to identify a silence region having the shortest distance, among all silence regions in the recorded acoustic data, relative to the first location in the recorded acoustic data corresponding to the first time at which the user-generated input event was detected; and
  
  identifying a location in the identified silence region as a start location for speech recognition processing of at least a portion of the recorded acoustic data, wherein:
  
  if the recorded acoustic data is such that the identified silence region entirely follows the first location, the start location for speech recognition processing follows the first location; and
  
  if the recorded acoustic data is such that the identified silence region entirely precedes the first location, the start location for speech recognition processing precedes the first location.
- - 23. The method of claim 22, further comprising:
    - detecting, at a second time later than the first time, an indication to stop speech recognition processing, the second time corresponding to a second location of the recorded acoustic data;
      
      continuing to record acoustic data after the second time; and
      
      performing speech recognition processing on at least a portion of the recorded acoustic data recorded after the second time.
  - 24. The method of claim 23, further comprising:
    - searching for acoustic data representing silence in the acoustic data recorded after the second time;
      
      identifying a third location having acoustic data representing silence; and
      
      performing speech recognition processing on the recorded acoustic data between the second and third locations.
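
A minimal sketch of the search described in claims 22 to 24 follows. It is written under several assumptions that the claims leave open: silence is detected with a simple amplitude threshold over a minimum run length, distance is measured in samples to the nearest edge of a region, the start location is taken to be the last silent sample of the chosen region, and the third location is the first silent sample at or after the second location. All function names are hypothetical.

```python
from typing import List, Optional, Tuple

Region = Tuple[int, int]  # half-open [start, end) range of sample indices


def find_silence_regions(samples: List[int], threshold: int, min_len: int) -> List[Region]:
    """Return runs of at least min_len samples whose magnitude is below threshold."""
    regions: List[Region] = []
    run_start: Optional[int] = None
    for i, value in enumerate(samples):
        if abs(value) < threshold:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= min_len:
                regions.append((run_start, i))
            run_start = None
    if run_start is not None and len(samples) - run_start >= min_len:
        regions.append((run_start, len(samples)))
    return regions


def distance(region: Region, location: int) -> int:
    """Distance from location to the region; zero if the location lies inside it."""
    start, end = region
    if location < start:
        return start - location
    if location >= end:
        return location - (end - 1)
    return 0


def start_location(samples: List[int], first_location: int,
                   threshold: int = 10, min_len: int = 2) -> Optional[int]:
    """Pick a start location inside the silence region nearest to first_location."""
    regions = find_silence_regions(samples, threshold, min_len)
    if not regions:
        return None
    _, end = min(regions, key=lambda r: distance(r, first_location))
    return end - 1  # assumption: last silent sample, just before speech resumes


def third_location(samples: List[int], second_location: int,
                   threshold: int = 10, min_len: int = 2) -> Optional[int]:
    """First silent location at or after second_location, if any."""
    for start, end in find_silence_regions(samples, threshold, min_len):
        if end > second_location:
            return max(start, second_location)
    return None


audio = [0, 0, 0, 50, 60, 70, 80, 90, 0, 0, 40]
late_press = start_location(audio, first_location=4)      # input event after speech began
early_press = start_location([50, 60, 0, 0, 0, 70, 80], first_location=1)
stop_at = third_location(audio, second_location=6)        # stop indication mid-speech
print(late_press, early_press, stop_at)                   # 2 4 8
```

In the first call the nearest silence region precedes the input event, so the start location (2) precedes the first location (4). In the second call the region follows the event, so the start location (4) follows the first location (1). These correspond to the two "wherein" branches of claim 22.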

25. A system for processing acoustic data in accordance with a speech recognition system, the system comprising:
- at least one memory for storing executable instructions;
  
  at least one processor programmed by the executable instructions to:
  
  record acoustic data in at least one recording medium;
  
  detect, at a first time, a user-generated input event instructing the speech recognition system to start speech recognition processing, the first time corresponding to a first location of the recorded acoustic data recorded in the at least one recording medium;
  
  search in the recorded acoustic data to identify a silence region having the shortest distance, among all silence regions in the recorded acoustic data, relative to the first location in the recorded acoustic data corresponding to the first time at which the user-generated input event was detected; and
  
  identify a location in the identified silence region as a start location for speech recognition processing of at least a portion of the recorded acoustic data, wherein:
  
  if the recorded acoustic data is such that the identified silence region entirely follows the first location, the start location for speech recognition processing follows the first location; and
  
  if the recorded acoustic data is such that the identified silence region entirely precedes the first location, the start location for speech recognition processing precedes the first location.
- - 26. The system of claim 25, wherein the at least one processor is further programmed to:
    - detect, at a second time later than the first time, an indication to stop speech recognition processing, the second time corresponding to a second location of the recorded acoustic data;
      
      continue to record acoustic data after the second time; and
      
      perform speech recognition processing on at least a portion of the recorded acoustic data recorded after the second time.
  - 27. The system of claim 26, wherein the at least one processor is further programmed to:
    - search for acoustic data representing silence in the acoustic data recorded after the second time;
      
      identify a third location having acoustic data representing silence; and
      
      perform speech recognition processing on the recorded acoustic data between the second and third locations.

28. At least one computer readable memory encoded with instructions that, when executed, perform a method for processing acoustic data in accordance with a speech recognition system, the method comprising acts of:
- recording acoustic data in at least one recording medium;
  
  detecting, at a first time, a user-generated input event instructing the speech recognition system to start speech recognition processing, the first time corresponding to a first location of the recorded acoustic data recorded in the at least one recording medium;
  
  searching in the recorded acoustic data to identify a silence region having the shortest distance, among all silence regions in the recorded acoustic data, relative to the first location in the recorded acoustic data corresponding to the first time at which the user-generated input event was detected; and
  
  identifying a location in the identified silence region as a start location for speech recognition processing of at least a portion of the recorded acoustic data, wherein:
  
  if the recorded acoustic data is such that the identified silence region entirely follows the first location, the start location for speech recognition processing follows the first location; and
  
  if the recorded acoustic data is such that the identified silence region entirely precedes the first location, the start location for speech recognition processing precedes the first location.
- - 29. The at least one computer readable memory of claim 28, wherein the method further comprises:
    - detecting, at a second time later than the first time, an indication to stop speech recognition processing, the second time corresponding to a second location of the recorded acoustic data;
      
      continuing to record acoustic data after the second time; and
      
      performing speech recognition processing on at least a portion of the recorded acoustic data recorded after the second time.
  - 30. The at least one computer readable memory of claim 29, wherein the method further comprises:
    - searching for acoustic data representing silence in the acoustic data recorded after the second time;
      
      identifying a third location having acoustic data representing silence; and
      
      performing speech recognition processing on the recorded acoustic data between the second and third locations.

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Inventors: Comerford, Liam D.; Frank, David Carl; Lewis, Burn L.; Rachevsky, Leonid; Viswanathan, Mahesh
Primary Examiner(s): He, Jialong
Application Number: US12/056,001
Publication Number: US 20080172228A1
Time in Patent Office: 2,302 Days
Field of Search: 704/248, 704/253
US Class Current: 704/253
CPC Class Codes: G10L 15/28 Constructional details of s...