Accuracy improvement of spoken queries transcription using co-occurrence information
Abstract
Techniques disclosed herein include systems and methods for voice-enabled searching. Techniques include a co-occurrence based approach to improve accuracy of the 1-best hypothesis for non-phrase voice queries, as well as for phrased voice queries. A co-occurrence model is used in addition to a statistical natural language model and acoustic model to recognize spoken queries, such as spoken queries for searching a search engine. Given an utterance and an associated list of automated speech recognition n-best hypotheses, the system rescores the different hypotheses using co-occurrence information. For each hypothesis, the system estimates a frequency of co-occurrence within web documents. Scores from a speech recognizer and a co-occurrence engine can then be combined to select a best hypothesis with a lower word error rate.
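The rescoring idea in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the toy corpus, the pairwise co-occurrence measure, the function names `cooccurrence_score` and `rescore`, and the equal weights are all assumptions chosen for illustration. The abstract only states that co-occurrence frequency is estimated within web documents and combined with the recognizer's score.

```python
from itertools import combinations

# Hypothetical toy corpus standing in for web documents.
CORPUS = [
    "cheap flights to boston from new york",
    "boston red sox schedule and tickets",
    "flights and hotels in boston",
    "austin texas weather forecast",
]


def cooccurrence_score(hypothesis, corpus):
    """Average, over all pairs of distinct terms in the hypothesis, of the
    fraction of documents that contain both terms (an assumed measure)."""
    terms = sorted(set(hypothesis.lower().split()))
    pairs = list(combinations(terms, 2))
    if not pairs or not corpus:
        return 0.0
    doc_terms = [set(doc.lower().split()) for doc in corpus]
    hits = sum(
        1 for a, b in pairs for doc in doc_terms if a in doc and b in doc
    )
    return hits / (len(pairs) * len(doc_terms))


def rescore(nbest, corpus, asr_weight=0.5, cooc_weight=0.5):
    """Return the hypothesis text with the highest weighted sum of the
    recognizer score and the co-occurrence score.

    nbest is a list of (text, asr_score) tuples; the weights are
    illustrative defaults, not values taken from the patent.
    """
    def combined(item):
        text, asr_score = item
        return asr_weight * asr_score + cooc_weight * cooccurrence_score(text, corpus)

    return max(nbest, key=combined)[0]


# The recognizer slightly prefers the wrong transcription, but its terms
# rarely co-occur in the corpus, so rescoring picks the other hypothesis.
nbest = [("flights to bossed on", 0.55), ("flights to boston", 0.50)]
best = rescore(nbest, CORPUS)
```

In this sketch the recognizer's top hypothesis loses because its terms almost never appear together in the corpus, which is the effect the abstract attributes to the co-occurrence engine.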
24 Claims
1. A method comprising:
receiving a spoken query;
identifying, via an automated speech recognition process, a plurality of transcription hypotheses based on the spoken query, each respective transcription hypothesis having a speech recognition score;
evaluating the plurality of transcription hypotheses using a co-occurrence identification process, the co-occurrence identification process comprising:
identifying a frequency that proposed query terms, from each respective transcription hypothesis, co-occur based on a corpus of documents;
assigning a co-occurrence score to each respective transcription hypothesis;
evaluating a weighting of the speech recognition score of each respective transcription hypothesis and a weighting of the co-occurrence score of each respective transcription hypothesis;
increasing a weighting of a co-occurrence score of a given transcription hypothesis of the plurality of transcription hypotheses to be greater than a weighting of a speech recognition score of the given transcription hypothesis of the plurality of transcription hypotheses when proposed query terms from the given transcription hypothesis are more than a threshold phrase length; and
selecting a best transcription hypothesis based on at least the weighting of the speech recognition score of each respective transcription hypothesis and the weighting of the co-occurrence score of each respective transcription hypothesis;
generating a text query corresponding to the best transcription hypothesis; and
receiving, from an information retrieval system, search results based on the text query corresponding to the best transcription hypothesis.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 15, 16, 17, 18

13. A system comprising:
a processor; and
non-transitory memory storing executable instructions that, when executed by the processor, cause the system to perform:
receiving a spoken query;
identifying, via an automated speech recognition process, a plurality of transcription hypotheses based on the spoken query, each respective transcription hypothesis having a speech recognition score;
evaluating the plurality of transcription hypotheses using a co-occurrence identification process, the co-occurrence identification process comprising:
identifying a frequency that proposed query terms, from each respective transcription hypothesis, co-occur based on a corpus of documents;
assigning a co-occurrence score to each respective transcription hypothesis;
evaluating a weighting of the speech recognition score of each respective transcription hypothesis and a weighting of the co-occurrence score of each respective transcription hypothesis;
increasing a weighting of a co-occurrence score of a given transcription hypothesis of the plurality of transcription hypotheses to be greater than a weighting of a speech recognition score of the given transcription hypothesis of the plurality of transcription hypotheses when proposed query terms from the given transcription hypothesis are more than a threshold phrase length; and
selecting a best transcription hypothesis based on at least the weighting of the speech recognition score of each respective transcription hypothesis and the weighting of the co-occurrence score of each respective transcription hypothesis;
generating a text query corresponding to the best transcription hypothesis; and
receiving, from an information retrieval system, search results based on the text query corresponding to the best transcription hypothesis.
Dependent claims: 19, 20, 21

14. One or more non-transitory computer-readable media storing executable instructions that, when executed by a processor, cause a system to:
receive a spoken query;
identify, via an automated speech recognition process, a plurality of transcription hypotheses based on the spoken query, each respective transcription hypothesis having a speech recognition score;
evaluate the plurality of transcription hypotheses using a co-occurrence identification process, the co-occurrence identification process comprising:
identifying a frequency that proposed query terms, from each respective transcription hypothesis, co-occur based on a corpus of documents;
assigning a co-occurrence score to each respective transcription hypothesis;
evaluating a weighting of the speech recognition score of each respective transcription hypothesis and a weighting of the co-occurrence score of each respective transcription hypothesis;
increasing a weighting of a co-occurrence score of a given transcription hypothesis of the plurality of transcription hypotheses to be greater than a weighting of a speech recognition score of the given transcription hypothesis of the plurality of transcription hypotheses when proposed query terms from the given transcription hypothesis are more than a threshold phrase length; and
selecting a best transcription hypothesis based on at least the weighting of the speech recognition score of each respective transcription hypothesis and the weighting of the co-occurrence score of each respective transcription hypothesis;
generate a text query corresponding to the best transcription hypothesis; and
receive, from an information retrieval system, search results based on the text query corresponding to the best transcription hypothesis.
Dependent claims: 22, 23, 24
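The weighting step shared by the independent claims (raising the co-occurrence weight above the speech recognition weight when a hypothesis has more terms than a threshold phrase length) can be sketched as follows. This is only an illustration under stated assumptions: the names `select_weights` and `select_best`, the threshold of 3 terms, the specific weight pairs, and the example scores are hypothetical and do not come from the claims, which specify no numeric values.

```python
def select_weights(hypothesis, threshold_len=3,
                   short_weights=(0.6, 0.4), long_weights=(0.3, 0.7)):
    """Return (asr_weight, cooc_weight) for one hypothesis.

    When the hypothesis has more terms than threshold_len, the
    co-occurrence weight is made larger than the recognizer weight.
    All numeric values here are assumed for illustration.
    """
    n_terms = len(hypothesis.split())
    return long_weights if n_terms > threshold_len else short_weights


def select_best(scored, threshold_len=3):
    """Pick the best hypothesis from (text, asr_score, cooc_score) tuples
    using the length-dependent weights above."""
    def combined(item):
        text, asr_score, cooc_score = item
        asr_w, cooc_w = select_weights(text, threshold_len)
        return asr_w * asr_score + cooc_w * cooc_score

    return max(scored, key=combined)[0]


# Long query: co-occurrence evidence dominates the decision.
long_query = [
    ("weather in austin texas today", 0.40, 0.80),
    ("whether in often texas today", 0.55, 0.10),
]
# Short query: the recognizer score keeps the larger weight.
short_query = [
    ("red socks", 0.60, 0.20),
    ("red sox", 0.50, 0.30),
]

best_long = select_best(long_query)
best_short = select_best(short_query)
```

With these assumed numbers the long query is decided mainly by co-occurrence evidence, while the short query still follows the recognizer's ranking.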
Specification