Adapting enhanced acoustic models

US 9,263,034 B1
Filed: 07/13/2010
Issued: 02/16/2016
Est. Priority Date: 07/13/2010
Status: Active Grant

Abstract

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for enhancing speech recognition accuracy. In one aspect, a method includes receiving voice queries, obtaining, for one or more of the voice queries, feedback information that references an action taken by a user that submitted the voice query after reviewing a result of the voice query, generating, for the one or more voice queries, a posterior recognition confidence measure that reflects a probability that the voice query was correctly recognized, wherein the posterior recognition confidence measure is generated based at least on the feedback information for the voice query, selecting a subset of the one or more voice queries based on the posterior recognition confidence measures, and adapting an acoustic model using the subset of the voice queries.
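The flow described in the abstract (score each voice query from user feedback, keep the confident ones, hand them to adaptation) can be sketched roughly as follows. This is only an illustration under stated assumptions: the `VoiceQuery` fields, the two confidence values, and the 0.8 threshold are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass
class VoiceQuery:
    audio_id: str         # hypothetical handle to the stored utterance
    transcription: str    # recognizer output that the user reviewed
    clicked_result: bool  # feedback: user selected a search result


def posterior_confidence(query: VoiceQuery) -> float:
    """Toy posterior: selecting a search result raises confidence."""
    # Both values are illustrative assumptions, not taken from the patent.
    return 0.9 if query.clicked_result else 0.4


def select_for_adaptation(queries, threshold=0.8):
    """Keep only the queries whose confidence satisfies the threshold."""
    return [q for q in queries if posterior_confidence(q) >= threshold]


def adaptation_pairs(queries):
    """(audio, transcription) pairs a hypothetical trainer would consume."""
    return [(q.audio_id, q.transcription) for q in queries]


queries = [
    VoiceQuery("utt-001", "pizza near me", clicked_result=True),
    VoiceQuery("utt-002", "pete's a near me", clicked_result=False),
    VoiceQuery("utt-003", "weather tomorrow", clicked_result=True),
]
selected = select_for_adaptation(queries)
pairs = adaptation_pairs(selected)
print(pairs)
```

Only the first and third queries reach the adaptation set here; the second is left out because its user never selected a result.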

20 Claims

1. A computer-implemented method comprising:
- receiving voice queries submitted by at least a first user and a second user that is different from the first user;
  
  for each of the voice queries submitted by at least the first user and the second user, generating a score indicative of a probability that a transcription of the voice query is correct, wherein the score is generated based at least on feedback information that indicates an action, other than an explicit selection of the transcription of the voice query, taken by a respective user that submitted the voice query after reviewing the transcription of the voice query, wherein, for at least a particular voice query of the voice queries, the score is generated based on feedback indicating that the user that submitted the particular voice query has selected a search result provided by a search engine, wherein the search engine provided the search result in response to receiving the transcription of the particular voice query as input to the search engine;
  
  selecting a subset of the voice queries whose scores satisfy a threshold; and
  
  adapting an acoustic model using the subset of the voice queries.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14)
  - 2. The method of claim 1, comprising processing a third voice query submitted by a different, third user using the adapted acoustic model.
  - 3. The method of claim 1, wherein the acoustic model is not adapted using the voice queries whose scores do not satisfy the threshold.
  - 4. The method of claim 1, wherein the feedback information indicates that the user that submitted the voice query did not select the transcription of the voice query within a predetermined period of time after reviewing the transcription.
  - 5. The method of claim 1, wherein the feedback information indicates whether the user that submitted the voice query submitted another voice query within a predetermined period of time after reviewing the transcription.
  - 6. The method of claim 1, wherein the feedback information indicates whether the user that submitted the voice query revealed a list of alternate transcriptions of the voice query, after reviewing the transcription.
  - 7. The method of claim 1, wherein the score is further based on a confidence measure generated by an automated speech recognizer for the transcription of the voice query.
  - 8. The method of claim 1, wherein the receiving the voice queries comprises receiving voice queries input through respective user devices;
    - and wherein the feedback information for a transcription indicates non-verbal user input that was provided, to the respective user device through which the voice query was received, after the user that submitted the voice query reviewed the transcription of the voice query.
  - 9. The method of claim 1, wherein generating a score for each of the voice queries comprises generating scores based on multiple different types of user actions indicated by feedback information, each of the multiple different types of user actions being taken by a respective user that submitted the voice query after the user reviewed the transcriptions of the voice query.
  - 10. The method of claim 9, wherein generating the scores based on the multiple different types of user actions comprises making different adjustments to the scores based on the type of user action performed by the user that submitted the voice query.
  - 11. The method of claim 1, wherein generating a score for each of the voice queries comprises generating, for at least one of the voice queries, a score based on user feedback that indicates a user action that implicitly indicates that the transcription is correct.
  - 12. The method of claim 1, wherein generating a score for each of the voice queries comprises:
    - determining, for at least one of the voice queries, a score indicative of a probability that a transcription of the voice query is correct, wherein the score is generated based on user feedback indicating that, after submitting the voice query, the user that submitted the voice query did not submit a voice query for at least a predetermined period of time.
  - 13. The method of claim 1, wherein the search result references a resource that the search engine determines to be responsive to the transcription of the voice query received by the search engine.
  - 14. The method of claim 1, wherein the search result includes a Universal Resource Identifier (URI) that refers to a web page.
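Claims 7, 9, and 10 describe a score that starts from the recognizer's own confidence and is adjusted differently depending on the type of user action. One possible way to express that, sketched here under assumptions, is to add a per-action offset in log-odds space. The action names, the offset values, and even their signs are hypothetical; the claims do not specify them.

```python
import math
from typing import Iterable

# Hypothetical log-odds adjustments per feedback type. The signs follow the
# assumption that selecting a result or not re-querying suggests a correct
# transcription, while a quick re-query or opening the alternates list
# suggests an error.
ADJUSTMENTS = {
    "selected_search_result": 2.0,
    "no_requery_within_window": 1.0,
    "requery_within_window": -1.5,
    "revealed_alternates": -1.0,
}


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def score(asr_confidence: float, actions: Iterable[str]) -> float:
    """Combine recognizer confidence with per-action adjustments."""
    # Clamp so that logit() never sees exactly 0 or 1.
    p = min(max(asr_confidence, 1e-6), 1.0 - 1e-6)
    z = logit(p)
    for action in actions:
        z += ADJUSTMENTS.get(action, 0.0)  # unknown actions leave the score unchanged
    return sigmoid(z)


raised = score(0.6, ["selected_search_result"])
lowered = score(0.6, ["requery_within_window", "revealed_alternates"])
print(raised > 0.6, lowered < 0.6)
```

Working in log-odds keeps the result inside (0, 1) no matter how many actions are observed, which is one reason to prefer it over adding offsets to the probability directly.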

15. A non-transitory computer-readable medium storing software comprising instructions executable by one or more computers which, upon such execution, cause the one or more computers to perform operations comprising:
- receiving voice queries submitted by at least a first user and a second user that is different from the first user;
  
  for each of the voice queries submitted by at least the first user and the second user, generating a score indicative of a probability that a transcription of the voice query is correct, wherein the score is generated based at least on feedback information that indicates an action, other than an explicit selection of the transcription of the voice query, taken by a respective user that submitted the voice query after reviewing the transcription of the voice query, wherein, for at least a particular voice query of the voice queries, the score is generated based on feedback indicating that the user that submitted the particular voice query has selected a search result provided by a search engine, wherein the search engine provided the search result in response to receiving the transcription of the particular voice query as input to the search engine;
  
  selecting a subset of the voice queries whose scores satisfy a threshold; and
  
  adapting an acoustic model using the subset of the voice queries.
- Dependent Claims (16, 17)
  - 16. The medium of claim 15, wherein the operations comprise processing a third voice query submitted by a different, third user using the adapted acoustic model.
  - 17. The medium of claim 15, wherein the acoustic model is not adapted using the voice queries whose scores do not satisfy the threshold.

18. A system comprising:
- one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:
  
  receiving voice queries submitted by at least a first user and a second user that is different from the first user;
  
  for each of the voice queries submitted by at least the first user and the second user, generating a score indicative of a probability that a transcription of the voice query is correct, wherein the score is generated based at least on feedback information that indicates an action, other than an explicit selection of the transcription of the voice query, taken by a respective user that submitted the voice query after reviewing the transcription of the voice query, wherein, for at least a particular voice query of the voice queries, the score is generated based on feedback indicating that the user that submitted the particular voice query has selected a search result provided by a search engine, wherein the search engine provided the search result in response to receiving the transcription of the particular voice query as input to the search engine;
  
  selecting a subset of the voice queries whose scores satisfy a threshold; and
  
  adapting an acoustic model using the subset of the voice queries.
- Dependent Claims (19, 20)
  - 19. The system of claim 18, wherein the operations comprise using the adapted acoustic model to process a third voice query submitted by a different, third user.
  - 20. The system of claim 18, wherein the acoustic model is not adapted using the voice queries whose scores do not satisfy the threshold.
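The claims leave the adaptation technique itself open. As a loose illustration only, the sketch below uses MAP adaptation of a single Gaussian mean, a common technique swapped in here and not stated in the claims. The prior mean is pulled toward feature frames taken from the selected subset of voice queries, and frames from queries below the threshold are simply never passed in (compare claims 3, 17, and 20). The function name, the `tau` prior weight, and the toy feature vectors are all assumptions.

```python
def map_adapt_mean(prior_mean, frames, tau=10.0):
    """MAP update of a Gaussian mean with hard frame assignments.

    new_mean = (tau * prior_mean + sum(frames)) / (tau + number_of_frames)
    A larger tau keeps the adapted mean closer to the prior.
    """
    n = len(frames)
    dim = len(prior_mean)
    sums = [sum(frame[d] for frame in frames) for d in range(dim)]
    return [(tau * prior_mean[d] + sums[d]) / (tau + n) for d in range(dim)]


# Toy 2-dimensional features from queries whose scores satisfied the threshold.
prior = [0.0, 0.0]
selected_frames = [[2.0, 4.0], [4.0, 8.0]]
adapted = map_adapt_mean(prior, selected_frames, tau=2.0)
print(adapted)  # [1.5, 3.0]
```

With no selected frames the function returns the prior mean unchanged. A model adapted this way could then be applied to queries from users who contributed no adaptation data, as in claims 2, 16, and 19.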

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Google Inc. (Alphabet Inc.)
Inventors: Strope, Brian; Beeferman, Douglas H.
Primary Examiner(s): Shah, Paras D
Application Number: US12/834,981
Time in Patent Office: 2,044 Days
Field of Search: 704/231, 704/236, 704/239, 704/251, 704/252
US Class Current: 1/1
CPC Class Codes:
G10L 15/01   Assessment or evaluation of...
G10L 15/065   Adaptation
G10L 15/07   to the speaker
G10L 15/10   using distance or distortio...
G10L 17/02   Preprocessing operations, e...
