Speech recognition using domain knowledge

US 9,646,606 B2
Filed: 10/08/2013
Issued: 05/09/2017
Est. Priority Date: 07/03/2013
Status: Active Grant

Abstract

In some implementations, data that indicates multiple candidate transcriptions for an utterance is received. For each of the candidate transcriptions, data relating to use of the candidate transcription as a search query is received, a score that is based on the received data is provided to a trained classifier, and a classifier output for the candidate transcription is received. One or more of the candidate transcriptions may be selected based on the classifier outputs.
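
The flow summarized in the abstract can be sketched in a few lines of Python. This is an illustrative sketch only, not the patented implementation: the two toy indexes, the `search` and `overlap_score` helpers, and the hand-set classifier weights are all assumptions introduced here for illustration.

```python
import math
from typing import Dict, List

# Hypothetical stand-ins for two domain-specific search indexes.
WEB_INDEX = ["weather in boston today", "boston weather forecast"]
MEDIA_INDEX = ["whether or not (song)", "boston legal season 1"]


def search(index: List[str], query: str) -> List[str]:
    """Toy search service: entries sharing a token with the query, best match first."""
    q = set(query.lower().split())
    hits = [(len(q & set(doc.split())), doc) for doc in index]
    hits = [h for h in hits if h[0] > 0]
    hits.sort(key=lambda h: -h[0])  # stable sort, so ties keep index order
    return [doc for _, doc in hits]


def overlap_score(query: str, results: List[str]) -> float:
    """Fraction of query tokens that appear in the top result (0.0 if none)."""
    tokens = query.lower().split()
    if not tokens or not results:
        return 0.0
    top = set(results[0].split())
    return sum(t in top for t in tokens) / len(tokens)


def classifier(scores: List[float]) -> float:
    """Stand-in for a trained classifier; weights are illustrative, not learned."""
    weights, bias = [3.0, 2.0], -2.0
    z = bias + sum(w * s for w, s in zip(weights, scores))
    return 1.0 / (1.0 + math.exp(-z))


def select_transcription(candidates: List[str]) -> str:
    """Score every candidate in both domains and keep the most likely one."""
    outputs: Dict[str, float] = {}
    for cand in candidates:
        first = overlap_score(cand, search(WEB_INDEX, cand))
        second = overlap_score(cand, search(MEDIA_INDEX, cand))
        outputs[cand] = classifier([first, second])
    return max(outputs, key=lambda c: outputs[c])


best = select_transcription(["whether in boston today", "weather in boston today"])
print(best)
```

With these toy indexes, the second candidate scores higher in the web domain and is selected. A real system would replace the stubs with calls to actual search services and a classifier trained on labeled data.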

21 Claims

1. A method of performing speech recognition that is performed by one or more computers of an automated speech recognizer, the method comprising:
- receiving, by the one or more computers, data that indicates multiple candidate transcriptions for an utterance, wherein the one or more computers are in communication with (i) a first search system that provides a search service of a first domain, and (ii) a second search system that provides a search service for a second domain, the second domain being different from the first domain;
  
  for each particular candidate transcription of the candidate transcriptions:
  
  receiving, by the one or more computers, data from the first search system that provides the search service for the first domain, the data from the first search system indicating first search results that the search service for the first domain identifies as relevant to the particular candidate transcription;
  
  determining, by the one or more computers, a first score based on the first search results that the search service for the first domain identifies as relevant to the particular candidate transcription;
  
  receiving, by the one or more computers, data from the second search system that provides the search service for the second domain, the data from the second search system indicating second search results that the search service for the second domain identifies as relevant to the particular candidate transcription;
  
  determining, by the one or more computers, a second score based on the second search results that the search service for the second domain identifies as relevant to the particular candidate transcription;
  
  providing, by the one or more computers, (i) the first score that is determined based on the first search results and (ii) the second score that is determined based on the second search results as input to a classifier, wherein the classifier has been trained, using scores that represent characteristics of different search results from different domains, to indicate a likelihood that a transcription is correct based on scores for multiple different domains; and
  
  receiving, by the one or more computers and from the trained classifier, a classifier output in response to at least the first score and the second score, the classifier output indicating a likelihood that the particular candidate transcription is correct;
  
  selecting, by the one or more computers, a transcription for the utterance, from among the multiple candidate transcriptions, based on the classifier outputs; and
  
  providing, by the one or more computers, the transcription as output of the automated speech recognizer.
  - 2. The method of claim 1, further comprising, for each particular candidate transcription of the candidate transcriptions, receiving, by the one or more computers, data indicating a frequency that users have submitted the particular candidate transcription as a search query; and
    - wherein providing, by the one or more computers, (i) the first score that is determined based on the first search results and (ii) the second score that is determined based on the second search results as input to the classifier comprises providing, by the one or more computers, the first score, the second score, and a third score that is indicative of the frequency that users have submitted the particular candidate transcription as a search query as input to the classifier.
  - 3. The method of claim 1, wherein determining, by the one or more computers, the first score based on the content comprises determining, by the one or more computers, a score that indicates a degree that one or more of the first search results match the candidate transcription.
  - 4. The method of claim 3, wherein determining, by the one or more computers, the score that indicates the degree that one or more of the first search results match the candidate transcription comprises determining, by the one or more computers, a score indicating a degree that a highest-ranked search result of the first search results matches the candidate transcription.
  - 5. The method of claim 3, wherein determining, by the one or more computers, the score that indicates the degree that one or more of the first search results match the candidate transcription comprises determining, by the one or more computers, a score indicating a degree that a second-ranked search result of the first search results matches the candidate transcription.
  - 6. The method of claim 1, wherein determining, by the one or more computers, the first score based on the first search results comprises accessing, by the one or more computers, a score that is generated by the search service for the first domain.
  - 7. The method of claim 1, further comprising, for each particular candidate transcription of the multiple candidate transcriptions, providing, by the one or more computers, the particular candidate transcription to the search service for the first domain as a query;
    - wherein receiving, by the one or more computers, the data indicating the first search results that the search service for the first domain identifies as relevant to the particular candidate transcription comprises receiving, by the one or more computers, the data indicating the first search results in response to providing the particular candidate transcription to the search service for the first domain as a query.
  - 8. The method of claim 1, wherein selecting, by the one or more computers, the transcription for the utterance, from among the multiple candidate transcriptions comprises:
    - ranking, by the one or more computers, the multiple candidate transcriptions based on the classifier outputs; and
      
      selecting, by the one or more computers, the highest-ranked candidate transcription.
  - 9. The method of claim 1, wherein providing, by the one or more computers, (i) the first score determined based on the first search results and (ii) the second score determined based on the second search results as input to the classifier comprises:
    - providing, by the one or more computers, the first score and the second score as input to a trained maximum entropy classifier.
  - 10. The method of claim 1, wherein the trained classifier has been trained using (i) a first set of scores corresponding to first features relevant to the first domain and (ii) a second set of feature scores for second features relevant to the second domain, wherein at least some of the second features are different from the first features.
  - 13. The method of claim 1, wherein receiving, by the one or more computers, the data indicating the first search results that the search service for the first domain identifies as relevant to the particular candidate transcription comprises:
    - receiving, by the one or more computers, data indicating search results that the search service for the first domain identifies in a first data collection; and
      
      wherein receiving, by the one or more computers, the data indicating the second search results that the search service for the second domain identifies as relevant to the particular candidate transcription comprises:
      
      receiving, by the one or more computers, data indicating search results that the search service for the second domain identifies in a second data collection that is different from the first data collection.
  - 14. The method of claim 1, wherein receiving, by the one or more computers, the data indicating the first search results that the search service for the first domain identifies as relevant to the particular candidate transcription comprises:
    - receiving, by the one or more computers, data indicating first search results that the search service for the first domain identifies in a first data collection, wherein the first data collection is selected from a group consisting of (i) data associated with web documents, (ii) data associated with a set of media items, (iii) data associated with a set of applications, and (iv) data associated with a set of voice commands; and
      
      wherein receiving, by the one or more computers, the data indicating the second search results that the search service for the second domain identifies as relevant to the particular candidate transcription comprises:
      
      receiving, by the one or more computers, data indicating search results that the search service for the second domain identifies in a second data collection, wherein the second data collection is different from the first data collection and is selected from the group consisting of (i) data associated with web documents, (ii) data associated with a set of media items, (iii) data associated with a set of applications, and (iv) data associated with a set of voice commands.
  - 15. The method of claim 1, wherein receiving, by the one or more computers, the data indicating the first search results that the search service for the first domain identifies as relevant to the particular candidate transcription comprises:
    - receiving, by the one or more computers, data indicating search results from a search service that provides a first type of information; and
      
      wherein receiving, by the one or more computers, the data indicating the second search results that the search service for the second domain identifies as relevant to the particular candidate transcription comprises:
      
      receiving, by the one or more computers, data indicating search results from a search service that provides a type of information different from the first type of information.
  - 16. The method of claim 1, further comprising, for each particular candidate transcription of the candidate transcriptions:
    - determining, by the one or more computers, whether the particular candidate transcription matches at least one of a set of one or more predetermined semantic patterns; and
      
      determining, by the one or more computers, a third score that indicates whether the particular candidate transcription matches at least one of a set of one or more predetermined semantic patterns;
      
      wherein providing, by the one or more computers, the first score and the second score as input to the classifier comprises providing, by the one or more computers, the first score, the second score, and the third score that indicates whether the particular candidate transcription matches at least one of a set of one or more predetermined semantic patterns as input to the classifier.
  - 17. The method of claim 1, wherein the search results in the first set of search results are ranked; and
    - wherein determining, by the one or more computers, the first score based on the first search results that the search service for the first domain identifies as relevant to the particular candidate transcription comprises:
      
      determining, by the one or more computers, as the first score, a score that is indicative of one or more characteristics of the search result that occurs at a particular predetermined position in the ranking of the first search results.
  - 19. The method of claim 1, further comprising, for each particular candidate transcription:
    - determining, by the one or more computers, a plurality of feature scores that each indicate a degree of match between the particular candidate transcription and a different grammar;
      
      wherein providing, by the one or more computers, the scores as input to the classifier comprises providing, by the one or more computers, the first score, the second score, and the plurality of feature scores to the classifier, the classifier having been trained using scores indicating characteristics of search results and scores indicating degrees of matches between transcriptions and the different grammars; and
      
      wherein receiving, by the one or more computers, the classifier output comprises receiving a classifier output determined based on the first score, the second score, the first domain-specific query submission score, and the second domain-specific query submission score.
  - 20. The method of claim 1, further comprising, for each particular candidate transcription:
    - determining, by the one or more computers, a first domain-specific query submission score that indicates a frequency that the particular candidate transcription was submitted as a query directed to the first domain; and
      
      determining, by the one or more computers, a second domain-specific query submission score that indicates a frequency that the particular candidate transcription was submitted as a query directed to the second domain;
      
      wherein providing, by the one or more computers, the scores as input to the classifier comprises providing, by the one or more computers, the first score, the second score, the first domain-specific query submission score, and the second domain-specific query submission score to the classifier, the classifier having been trained using examples of scores indicating characteristics of search results for multiple domains and using scores indicating domain-specific query submission scores for multiple domains; and
      
      wherein receiving, by the one or more computers, the classifier output comprises receiving, by the one or more computers, a classifier output determined based on the first score, the second score, the first domain-specific query submission score, and the second domain-specific query submission score.
  - 21. The method of claim 1, further comprising, for each particular candidate transcription:
    - determining, by the one or more computers, a first plurality of scores that includes the first score, each of the first plurality of scores being determined based on the first search results that the search service for the first domain identifies as relevant to the particular candidate transcription, wherein the first plurality of scores includes at least (ii) an aggregate measure that is determined based on multiple search results of the first search results, and (i) an individual measure that is determined based on only the characteristics of a single search result in the first search results;
      
      determining, by the one or more computers, a second plurality of scores that includes the second score, each of the second plurality of scores being determined based on the second search results that the search service for the second domain identifies as relevant to the particular candidate transcription, wherein the second plurality of scores includes at least (ii) an aggregate measure that is determined based on multiple search results of the second search results, and (i) an individual measure that is determined based on only the characteristics of a single search result in the second search results;
      
      wherein providing, by the one or more computers, the scores as input to the classifier comprises providing, by the one or more computers, the first plurality of scores and the second plurality of scores to the trained classifier, wherein at least one of the second plurality of scores indicates a search result characteristic that is not indicated by the first plurality of scores; and
      
      wherein receiving, by the one or more computers, the classifier output comprises receiving, by the one or more computers, a classifier output determined based on the first plurality of scores and the second plurality of scores.
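
Several of the dependent claims above add further classifier inputs: domain-specific query submission scores (claim 20, with a general query-frequency score in claim 2), a semantic-pattern score (claim 16), and per-domain aggregate and individual measures (claim 21). A minimal sketch of how such a feature vector might be assembled follows. The `Result.relevance` field, the regular-expression patterns, the log scaling of counts, and all function names are hypothetical choices made for illustration, not details taken from the patent.

```python
import math
import re
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Result:
    title: str
    relevance: float  # assumed per-result score reported by the search service


# Hypothetical "predetermined semantic patterns" (claim 16).
PATTERNS = [re.compile(r"^call \w+"), re.compile(r"^play .+ by .+$")]


def pattern_score(transcription: str) -> float:
    """1.0 if the transcription matches any predetermined pattern, else 0.0."""
    return 1.0 if any(p.search(transcription) for p in PATTERNS) else 0.0


def frequency_score(transcription: str, query_counts: Dict[str, int]) -> float:
    """Log-scaled count of how often the text was submitted as a query."""
    return math.log1p(query_counts.get(transcription, 0))


def domain_scores(results: List[Result]) -> List[float]:
    """Aggregate measure (mean relevance) and individual measure (top result)."""
    if not results:
        return [0.0, 0.0]
    aggregate = sum(r.relevance for r in results) / len(results)
    individual = results[0].relevance
    return [aggregate, individual]


def feature_vector(
    transcription: str,
    first_results: List[Result],
    second_results: List[Result],
    first_counts: Dict[str, int],
    second_counts: Dict[str, int],
) -> List[float]:
    """Concatenate per-domain scores, per-domain query frequencies, and the pattern score."""
    return (
        domain_scores(first_results)
        + domain_scores(second_results)
        + [
            frequency_score(transcription, first_counts),
            frequency_score(transcription, second_counts),
            pattern_score(transcription),
        ]
    )


text = "play hey jude by the beatles"
web = [Result("play hey jude by the beatles", 0.9), Result("hey jude lyrics", 0.5)]
media = [Result("Hey Jude - The Beatles", 0.8)]
vec = feature_vector(text, web, media, {text: 9}, {})
print(vec)
```

The vector here has seven entries: two per domain, two query-frequency scores, and one pattern flag. The classifier would need to be trained on vectors with the same layout.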

11. An automated speech recognizer system comprising:
- one or more computers in communication with (i) a first search system that provides a search service for a first domain, and (ii) a second search system that provides a search service for a second domain, the second domain being different from the first domain; and
  
  one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:
  
  receiving, by the one or more computers, data that indicates multiple candidate transcriptions for an utterance;
  
  for each particular candidate transcription of the candidate transcriptions:
  
  providing, by the one or more computers, first query data to the first search system, wherein the first query data specifies the particular candidate transcription as a query to the search service for the first domain;
  
  receiving, by the one or more computers, data from the first search system indicating first search results that the search service for the first domain identifies as relevant to the particular candidate transcription;
  
  determining, by the one or more computers, a first score based on the first search results that the search service for the first domain identifies as relevant to the particular candidate transcription;
  
  providing, by the one or more computers, second query data to the second search system, wherein the second query data specifies the particular candidate transcription as a query to the search service for the second domain;
  
  receiving, by the one or more computers, data from the second search system indicating second search results that the search service for the second domain identifies as relevant to the particular candidate transcription;
  
  determining, by the one or more computers, a second score based on the second search results that the search service for the second domain identifies as relevant to the particular candidate transcription;
  
  providing, by the one or more computers, (i) the first score that is determined based on the first search results and (ii) the second score that is determined based on the second search results as input to a classifier, wherein the classifier has been trained, using scores that represent characteristics of different search results from different domains, to indicate a likelihood that a transcription is correct based on scores for multiple different domains; and
  
  receiving, by the one or more computers and from the trained classifier, a classifier output in response to at least the first score and the second score, the classifier output indicating a likelihood that the particular candidate transcription is correct;
  
  selecting, by the one or more computers, a transcription for the utterance, from among the multiple candidate transcriptions, based on the classifier outputs; and
  
  providing, by the one or more computers, the transcription as output of the automated speech recognizer system.

12. A non-transitory computer-readable storage device encoded with a computer program, the program comprising instructions that, when executed by one or more computers of an automated speech recognizer, cause the one or more computers to perform operations comprising:
- receiving, by the one or more computers, data that indicates multiple candidate transcriptions for an utterance, wherein the one or more computers are in communication with (i) a first search system that provides a search service of a first domain, and (ii) a second search system that provides a search service for a second domain, the second domain being different from the first domain;
  
  for each particular candidate transcription of the candidate transcriptions:
  
  providing, by the one or more computers, first query data to the search service for the first domain, wherein the first query data specifies the particular candidate transcription as a query to the search service for the first domain;
  
  receiving, by the one or more computers, data from the first search system that provides the search service for the first domain, the data from the first search system indicating first search results that the search service for the first domain identifies as relevant to the particular candidate transcription;
  
  determining, by the one or more computers, a first score based on the first search results that the search service for the first domain identifies as relevant to the particular candidate transcription;
  
  providing, by the one or more computers, second query data to the search service for the second domain, wherein the second query data specifies the particular candidate transcription as a query to the search service for the second domain;
  
  receiving, by the one or more computers, data from the second search system that provides the search service for the second domain, the data from the second search system indicating second search results that the search service for the second domain identifies as relevant to the particular candidate transcription;
  
  determining, by the one or more computers, a second score based on the second search results that the search service for the second domain identifies as relevant to the particular candidate transcription;
  
  providing, by the one or more computers, (i) the first score that is determined based on the first search results and (ii) the second score that is determined based on the second search results as input to a classifier, wherein the classifier has been trained, using scores that represent characteristics of different search results from different domains, to indicate a likelihood that a transcription is correct based on scores for multiple different domains; and
  
  receiving, by the one or more computers and from the trained classifier, a classifier output in response to at least the first score and the second score, the classifier output indicating a likelihood that the particular candidate transcription is correct;
  
  selecting, by the one or more computers, a transcription for the utterance, from among the multiple candidate transcriptions, based on the classifier outputs; and
  
  providing, by the one or more computers, the transcription as output of the automated speech recognizer.
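
The independent claims call for a classifier that has been trained on scores from multiple domains, and claim 9 names a maximum entropy classifier as one option. For a two-class problem (correct or incorrect), a maximum entropy model is equivalent to logistic regression, so training can be sketched as below. The training rows, learning rate, and epoch count are invented for illustration; the text reproduced here does not specify them.

```python
import math
from typing import List, Sequence, Tuple

Row = Tuple[List[float], int]  # ([first-domain score, second-domain score], label)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict(w: List[float], b: float, x: List[float]) -> float:
    """Estimated probability that a transcription with scores x is correct."""
    return sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))


def train(rows: Sequence[Row], lr: float = 0.5, epochs: int = 5000):
    """Fit weights and bias by batch gradient descent on the mean log loss."""
    n_features = len(rows[0][0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in rows:
            err = predict(w, b, x) - y  # derivative of log loss w.r.t. the logit
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        w = [wi - lr * g / len(rows) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(rows)
    return w, b


# Made-up training examples: label 1 means the transcription was correct.
ROWS: List[Row] = [
    ([0.9, 0.8], 1), ([0.8, 0.3], 1), ([0.7, 0.9], 1),
    ([0.2, 0.1], 0), ([0.3, 0.4], 0), ([0.1, 0.3], 0),
]

w, b = train(ROWS)
print(predict(w, b, [0.9, 0.8]), predict(w, b, [0.2, 0.1]))
```

In practice a library implementation would usually be preferable to hand-written gradient descent; the loop is spelled out here only to show what "trained using scores" could mean concretely.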

18. A method of performing speech recognition that is performed by data processing apparatus of an automated speech recognizer, the method comprising:
- receiving, by the data processing apparatus, data that indicates multiple candidate transcriptions for an utterance, wherein the data processing apparatus is in communication with (i) a first search system that provides a search service of a first domain, and (ii) a second search system that provides a search service for a second domain, the second domain being different from the first domain;
  
  for each particular candidate transcription of the candidate transcriptions:
  
  receiving, by the data processing apparatus, data from the first search system that provides the search service for the first domain, the data from the first search system indicating first search results that the search service for the first domain identifies as relevant to the particular candidate transcription;
  
  determining, by the data processing apparatus, a first score based on the first search results that the search service for the first domain identifies as relevant to the particular candidate transcription, wherein the search results in the first set of search results are ranked, wherein determining the first score based on the first search results that the search service for the first domain identifies as relevant to the particular candidate transcription comprises:
  
  determining, by the data processing apparatus and as the first score, a score that is indicative of one or more characteristics of the search result that occurs at a particular predetermined position in the ranking of the first search results;
  
  receiving, by the data processing apparatus, data from the second search system that provides the search service for the second domain, the data from the second search system indicating second search results that the search service for the second domain identifies as relevant to the particular candidate transcription;
  
  determining, by the data processing apparatus, a second score based on the second search results that the search service for the second domain identifies as relevant to the particular candidate transcription;
  
  providing, by the data processing apparatus and to a trained classifier, (i) the first score that is determined based on the first search results and (ii) the second score that is determined based on the second search results; and
  
  receiving, by the data processing apparatus and from the trained classifier, a classifier output in response to at least the first score and the second score, the classifier output indicating a likelihood that the candidate transcription is a correct transcription;
  
  selecting, by the data processing apparatus, from among the multiple candidate transcriptions based on the classifier outputs; and
  
  providing, by the data processing apparatus, the transcription as output of the automated speech recognizer.
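
Claim 18, like dependent claim 17, derives the first score from the search result at a particular predetermined position in the ranking. One possible reading is sketched below, using token overlap (Jaccard similarity) between the transcription and the result title as the "characteristic". The similarity measure, the 0-based `position` parameter, and the function names are assumptions for illustration, since the claim does not fix which characteristic is used.

```python
from typing import List


def jaccard(a: str, b: str) -> float:
    """Token-set overlap between two strings, in [0.0, 1.0]."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def score_at_position(transcription: str, ranked_titles: List[str], position: int = 0) -> float:
    """Score taken from the result at a fixed 0-based rank; 0.0 if the ranking is too short."""
    if position < 0 or position >= len(ranked_titles):
        return 0.0
    return jaccard(transcription, ranked_titles[position])


ranked = ["Boston weather forecast", "weather in boston today"]
top = score_at_position("weather in boston today", ranked, 0)
second = score_at_position("weather in boston today", ranked, 1)
print(top, second)
```

Computing the score at positions 0 and 1 separately yields two distinct features, which mirrors the way claims 4 and 5 treat the highest-ranked and second-ranked results.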

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Google Inc. (Alphabet Inc.)
Inventors: Peng, Fuchun; Shahshahani, Ben; Roy, Howard Scott
Primary Examiner(s): Desir, Pierre-Louis
Assistant Examiner(s): Tzeng, Forrest F
Application Number: US14/048,199
Publication Number: US 20150012271A1
Time in Patent Office: 1,309 Days
Field of Search: 704246, 704251, 704257, 704270, 704275

CPC Class Codes
- G10L 15/00   Speech recognition G10L17/0...
- G10L 15/08   Speech classification or se...
- G10L 15/18   using natural language mode...
- G10L 15/1815   Semantic context, e.g. disa...
- G10L 15/183   using context dependencies,...
- G10L 15/22   Procedures used during a sp...
