Enhanced stability prediction for incrementally generated speech recognition hypotheses

US 20130110492A1
Filed: 05/01/2012
Published: 05/02/2013
Est. Priority Date: 11/01/2011
Status: Active Grant

Abstract

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for predicting the stability of speech recognition results. In one aspect, a method includes determining a length of time, or a number of occasions, in which a word has remained in an incremental speech recognizer's top hypothesis, and assigning a stability metric to the word based on the length of time or number of occasions.
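
As a rough illustration of the two quantities named in the abstract, the sketch below measures how many occasions, and for how long, a word has stayed in the recognizer's top hypothesis. It is not taken from the specification: the `persistence` function, the timestamped `history` list, and the plain word-membership test are all assumptions made for the example.

```python
from typing import List, Tuple

# Assumed representation: (timestamp in seconds, words of the top hypothesis).
Hypothesis = Tuple[float, List[str]]


def persistence(history: List[Hypothesis], word: str) -> Tuple[int, float]:
    """Return (occasions, seconds) for which `word` has remained in the top
    hypothesis, counting backwards from the most recent hypothesis."""
    if not history or word not in history[-1][1]:
        return 0, 0.0
    latest_time = history[-1][0]
    occasions = 0
    first_seen = latest_time
    for timestamp, words in reversed(history):
        if word not in words:
            break  # the unbroken run of appearances ends here
        occasions += 1
        first_seen = timestamp
    return occasions, latest_time - first_seen


# Hypothetical sequence of top hypotheses while a user says "call mom at ...".
history = [
    (0.2, ["call"]),
    (0.4, ["call", "more"]),
    (0.6, ["call", "mom"]),
    (0.8, ["call", "mom", "at"]),
]

call_occasions, call_seconds = persistence(history, "call")
mom_occasions, mom_seconds = persistence(history, "mom")
```

Either value could then feed a stability metric; how the count or duration is mapped to a score is left open here.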

34 Claims

1. A computer-implemented method comprising:
- receiving multiple, partial incremental speech recognition hypotheses that are each output by an incremental speech recognizer as a top partial incremental speech recognition hypothesis at a different point in time;
  
  identifying a segment that occurs in a particular one of the multiple, partial incremental speech recognition hypotheses;
  
  determining a quantity of consecutive, partial incremental speech recognition hypotheses that (i) are output by the incremental speech recognizer as the top partial incremental speech recognition hypotheses at different points in time immediately after the one particular partial incremental speech recognition hypothesis is output, and (ii) include the segment; and
  
  assigning, by the one or more computers, a stability metric to the segment based on the quantity of consecutive, partial incremental speech recognition hypotheses that (i) are output by the incremental speech recognizer as the top partial incremental speech recognition hypotheses at different points in time immediately after the one particular partial incremental speech recognition hypothesis is output, and (ii) include the segment.
- Dependent claims (2, 3, 5, 6, 7, 8, 9, 10, 11, 31, 32):
  - 2. The method of claim 1, wherein assigning the stability metric to the segment is further based on a right context of the segment.
  - 3. The method of claim 1, wherein the segment comprises a word or sub-word.
  - 5. The method of claim 1, wherein the stability metric is assigned to the segment after the one particular partial incremental speech recognition hypothesis that includes the segment has been output by the incremental speech recognizer.
  - 6. The method of claim 1, comprising:
    - receiving an audio signal corresponding to an utterance; and
      
      performing incremental speech recognition on the audio signal to generate the multiple, partial incremental speech recognition hypotheses.
  - 7. The method of claim 1, comprising:
    - displaying the segment that occurs in a particular one of the multiple, partial incremental speech recognition hypotheses on a user interface, the displayed segment having a visual characteristic;
      
      determining that the stability metric of the segment satisfies a threshold; and
      
      altering the visual characteristic of the displayed segment on the user interface only after determining that the stability metric satisfies the threshold.
  - 8. The method of claim 1, comprising:
    - determining that the stability metric satisfies a threshold; and
      
      translating the segment to a different language only after determining that the stability metric satisfies the threshold.
  - 9. The method of claim 1, comprising:
    - determining that the stability metric satisfies a threshold; and
      
      submitting the segment to a search engine as part of a search query only after determining that the stability metric satisfies the threshold.
  - 10. The method of claim 1, comprising:
    - determining that the stability metric satisfies a threshold; and
      
      displaying a representation of the segment on a user interface only after determining that the stability metric satisfies the threshold.
  - 11. The method of claim 1, wherein the receiving, identifying, determining and assigning are performed by one or more computers.
  - 31. The method of claim 1, wherein identifying the segment comprises identifying a particular segment that does not occur in a partial incremental speech recognition hypothesis that immediately precedes the one particular partial incremental speech recognition hypothesis.
  - 32. The method of claim 1, comprising:
    - determining that the stability metric of the segment from the particular one of the multiple, partial incremental speech recognition hypotheses satisfies a threshold;
      
      upon determining that the segment does not satisfy the threshold, displaying the segment on a user interface, the displayed segment having a first visual characteristic; and
      
      upon determining that the segment satisfies the threshold, displaying the segment on the user interface, the displayed segment having a second visual characteristic different from the first visual characteristic.
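
The counting step of claim 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: a segment is represented as a hypothetical `(position, word)` pair, a hypothesis "includes" the segment when it has that word at that position, and `stability_metric` with its `saturation` parameter is an invented mapping from the count to a score between 0 and 1.

```python
from typing import List, Sequence, Tuple

# Assumed representation of a segment: (word position, word).
Segment = Tuple[int, str]


def includes(hypothesis: Sequence[str], segment: Segment) -> bool:
    """True if the hypothesis has the segment's word at the segment's position."""
    position, word = segment
    return 0 <= position < len(hypothesis) and hypothesis[position] == word


def consecutive_count(hypotheses: List[Sequence[str]], index: int,
                      segment: Segment) -> int:
    """Count the consecutive top hypotheses output immediately after
    hypotheses[index] that still include the segment."""
    if not includes(hypotheses[index], segment):
        raise ValueError("segment does not occur in the chosen hypothesis")
    count = 0
    for later in hypotheses[index + 1:]:
        if not includes(later, segment):
            break  # the run of consecutive hypotheses is broken
        count += 1
    return count


def stability_metric(count: int, saturation: int = 3) -> float:
    """Hypothetical mapping: the score grows with the count, capped at 1.0."""
    return min(count / saturation, 1.0)


# Hypothetical top hypotheses, one per point in time.
hypotheses = [
    ["call"],
    ["call", "more"],
    ["call", "mom"],
    ["call", "mom", "at"],
    ["call", "mom", "at", "home"],
]

more_count = consecutive_count(hypotheses, 1, (1, "more"))  # replaced at once
mom_count = consecutive_count(hypotheses, 2, (1, "mom"))    # survives twice
call_count = consecutive_count(hypotheses, 0, (0, "call"))  # survives throughout
```

In this example "more" is overwritten by the very next hypothesis and scores 0.0, while "call" persists through every later hypothesis and reaches the cap of 1.0.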

4. (canceled)

12. A system comprising:
- one or more computers and one or more storage devices storing instructions that are operable, if executed by the one or more computers, to cause the one or more computers to perform operations comprising:
  
  receiving multiple, partial incremental speech recognition hypotheses that are each output by an incremental speech recognizer as a top partial incremental speech recognition hypothesis at a different point in time;
  
  identifying a segment that occurs in a particular one of the multiple, partial incremental speech recognition hypotheses;
  
  determining a quantity of consecutive, partial incremental speech recognition hypotheses that (i) are output by the incremental speech recognizer as the top partial incremental speech recognition hypotheses at different points in time immediately after the one particular partial incremental speech recognition hypothesis is output, and (ii) include the segment; and
  
  assigning, by the one or more computers, a stability metric to the segment based on the quantity of consecutive, partial incremental speech recognition hypotheses that (i) are output by the incremental speech recognizer as the top partial incremental speech recognition hypotheses at different points in time immediately after the one particular partial incremental speech recognition hypothesis is output, and (ii) include the segment.
- Dependent claims (13, 14, 16, 17, 18, 20, 21, 33):
  - 13. The system of claim 12, wherein assigning the stability metric to the segment is further based on a right context of the segment.
  - 14. The system of claim 12, wherein the segment comprises a word or sub-word.
  - 16. The system of claim 12, wherein the stability metric is assigned to the segment after the one particular partial incremental speech recognition hypothesis that includes the segment has been output by the incremental speech recognizer.
  - 17. The system of claim 12, wherein the operations comprise:
    - receiving an audio signal corresponding to an utterance; and
      
      performing incremental speech recognition on the audio signal to generate the multiple, partial incremental speech recognition hypotheses.
  - 18. The system of claim 12, wherein the operations comprise:
    - displaying the segment that occurs in a particular one of the multiple, partial incremental speech recognition hypotheses on a user interface, the displayed segment having a visual characteristic;
      
      determining that the stability metric of the segment satisfies a threshold; and
      
      altering the visual characteristic of the displayed segment on the user interface only after determining that the stability metric satisfies the threshold.
  - 20. The system of claim 12, wherein the operations comprise:
    - determining that the stability metric satisfies a threshold; and
      
      submitting the segment to a search engine as part of a search query only after determining that the stability metric satisfies the threshold.
  - 21. The system of claim 12, wherein the operations comprise:
    - determining that the stability metric satisfies a threshold; and
      
      displaying a representation of the segment on a user interface only after determining that the stability metric satisfies the threshold.
  - 33. The system of claim 12, wherein identifying the segment comprises identifying a particular segment that does not occur in a partial incremental speech recognition hypothesis that immediately precedes the one particular partial incremental speech recognition hypothesis.
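
The threshold-gated operations in the dependent claims (altering a visual characteristic, submitting to a search engine, displaying a representation) might look like the sketch below. The threshold value, the function names, the reading of "satisfies" as greater-than-or-equal, and the use of square brackets as a stand-in for a visual characteristic are all assumptions for illustration.

```python
from typing import List, Tuple

# Assumed input: (segment text, stability metric in the range 0..1).
ScoredSegment = Tuple[str, float]

THRESHOLD = 0.5  # hypothetical value; the claims do not fix one


def render(scored: List[ScoredSegment], threshold: float = THRESHOLD) -> str:
    """Show segments that satisfy the threshold as plain text and wrap the
    others in brackets (standing in for, say, gray or italic text)."""
    return " ".join(
        text if score >= threshold else f"[{text}]" for text, score in scored
    )


def stable_query(scored: List[ScoredSegment],
                 threshold: float = THRESHOLD) -> str:
    """Build a search query only from segments that satisfy the threshold."""
    return " ".join(text for text, score in scored if score >= threshold)


scored = [("call", 1.0), ("mom", 0.67), ("at", 0.33), ("home", 0.0)]

display_text = render(scored)
query_text = stable_query(scored)
```

With these example scores, `display_text` is `call mom [at] [home]` and `query_text` is `call mom`, so downstream actions only ever see segments that are unlikely to change.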

15. (canceled)

19. (canceled)

22. A computer-readable storage device storing software comprising instructions executable by one or more computers, which, upon such execution, cause the one or more computers to perform operations comprising:
- receiving multiple, partial incremental speech recognition hypotheses that are each output by an incremental speech recognizer as a top partial incremental speech recognition hypothesis at a different point in time;
  
  identifying a segment that occurs in a particular one of the multiple, partial incremental speech recognition hypotheses;
  
  determining a quantity of consecutive, partial incremental speech recognition hypotheses that (i) are output by the incremental speech recognizer as the top partial incremental speech recognition hypotheses at different points in time immediately after the one particular partial incremental speech recognition hypothesis is output, and (ii) include the segment; and
  
  assigning, by the one or more computers, a stability metric to the segment based on the quantity of consecutive, partial incremental speech recognition hypotheses that (i) are output by the incremental speech recognizer as the top partial incremental speech recognition hypotheses at different points in time immediately after the one particular partial incremental speech recognition hypothesis is output, and (ii) include the segment.
- Dependent claims (23, 24, 26, 27, 28, 29, 34):
  - 23. The device of claim 22, wherein assigning the stability metric to the segment is further based on a right context of the segment.
  - 24. The device of claim 22, wherein the segment comprises a word or sub-word.
  - 26. The device of claim 22, where the stability metric is assigned to the segment after the one particular partial incremental speech recognition hypothesis that includes the segment has been output by the incremental speech recognizer.
  - 27. The device of claim 22, wherein the operations comprise:
    - receiving an audio signal corresponding to an utterance; and
      
      performing incremental speech recognition on the audio signal to generate the multiple, partial incremental speech recognition hypotheses.
  - 28. The device of claim 22, wherein the operations comprise:
    - displaying the segment that occurs in a particular one of the multiple, partial incremental speech recognition hypotheses on a user interface, the displayed segment having a visual characteristic;
      
      determining that the stability metric of the segment satisfies a threshold; and
      
      altering the visual characteristic of the displayed segment on the user interface only after determining that the stability metric satisfies the threshold.
  - 29. The device of claim 22, wherein the operations comprise:
    - determining that the stability metric satisfies a threshold; and
      
      submitting the segment to a search engine as part of a search query only after determining that the stability metric satisfies the threshold.
  - 34. The device of claim 22, wherein identifying the segment comprises identifying a particular segment that does not occur in a partial incremental speech recognition hypothesis that immediately precedes the one particular partial incremental speech recognition hypothesis.
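
Claims 31, 33 and 34 identify a segment that does not occur in the immediately preceding hypothesis. One way to picture that step is sketched below, again assuming (hypothetically) that a segment is a `(position, word)` pair and that hypotheses are compared position by position; `new_segments` is an illustrative name, not one from the patent.

```python
from typing import List, Sequence, Tuple

# Assumed representation of a segment: (word position, word).
Segment = Tuple[int, str]


def new_segments(previous: Sequence[str],
                 current: Sequence[str]) -> List[Segment]:
    """Return the segments of `current` that the immediately preceding
    hypothesis did not contain at the same position."""
    return [
        (position, word)
        for position, word in enumerate(current)
        if position >= len(previous) or previous[position] != word
    ]


revised = new_segments(["call", "more"], ["call", "mom"])
extended = new_segments(["call", "mom"], ["call", "mom", "at"])
first = new_segments([], ["call"])
```

Each newly identified segment is a natural starting point for the consecutive count, since its stability is still unknown at the moment it first appears.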

25. (canceled)

30. (canceled)

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Google Inc. (Alphabet Inc.)
Inventors: Gruenstein, Alexander H.; McGraw, Ian C.

Granted Patent: US 8,909,512 B2
US Class Current: 704/2
CPC Class Codes

G03F 7/707   Chucks, e.g. chucking or un...

G03F 7/70708   being electrostatic; Electr...

G03F 7/70875   Temperature, e.g. temperatu...

G03F 7/70908   Hygiene, e.g. preventing ap...

G10L 15/08   Speech classification or se...

G10L 2015/223   Execution procedure of a sp...

H01L 21/6831   using electrostatic chucks
