Accuracy improvement of spoken queries transcription using co-occurrence information

US 9,330,661 B2
Filed: 01/16/2014
Issued: 05/03/2016
Est. Priority Date: 07/31/2011
Status: Active Grant

Abstract

Techniques disclosed herein include systems and methods for voice-enabled searching. Techniques include a co-occurrence based approach to improve accuracy of the 1-best hypothesis for non-phrase voice queries, as well as for phrased voice queries. A co-occurrence model is used in addition to a statistical natural language model and acoustic model to recognize spoken queries, such as spoken queries for searching a search engine. Given an utterance and an associated list of automated speech recognition n-best hypotheses, the system rescores the different hypotheses using co-occurrence information. For each hypothesis, the system estimates a frequency of co-occurrence within web documents. Scores from a speech recognizer and a co-occurrence engine can be combined to select a best hypothesis with a lower word error rate.
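
The rescoring idea in the abstract can be illustrated with a minimal sketch. It assumes a simple linear interpolation of the two scores, which the abstract does not specify; the `Hypothesis` class, the `rescore` function, the `cooc_weight` parameter, and all score values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    text: str
    asr_score: float   # speech recognition score (assumed to lie in [0, 1])
    cooc_score: float  # co-occurrence score (assumed to lie in [0, 1])


def rescore(hypotheses, cooc_weight=0.5):
    """Return the hypotheses sorted best-first by an interpolated score.

    Linear interpolation is an assumption made for this sketch; the
    source only says the two scores "can be combined".
    """
    def combined(h):
        return (1.0 - cooc_weight) * h.asr_score + cooc_weight * h.cooc_score

    return sorted(hypotheses, key=combined, reverse=True)


# Made-up n-best list: the recognizer slightly prefers the wrong homophone,
# but its terms rarely co-occur in documents.
n_best = [
    Hypothesis("whether in boston", asr_score=0.62, cooc_score=0.10),
    Hypothesis("weather in boston", asr_score=0.58, cooc_score=0.90),
]
best = rescore(n_best)[0]
print(best.text)
```

With `cooc_weight=0.0` the ranking falls back to the recognizer's own ordering, which makes the effect of the co-occurrence term easy to compare.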

24 Claims

1. A method comprising:
- receiving a spoken query;
  
  identifying, via an automated speech recognition process, a plurality of transcription hypotheses based on the spoken query, each respective transcription hypothesis having a speech recognition score;
  
  evaluating the plurality of transcription hypotheses using a co-occurrence identification process, the co-occurrence identification process comprising:
  
  identifying a frequency that proposed query terms, from each respective transcription hypothesis, co-occur based on a corpus of documents;
  
  assigning a co-occurrence score to each respective transcription hypothesis;
  
  evaluating a weighting of the speech recognition score of each respective transcription hypothesis and a weighting of the co-occurrence score of each respective transcription hypothesis;
  
  increasing a weighting of a co-occurrence score of a given transcription hypothesis of the plurality of transcription hypotheses to be greater than a weighting of a speech recognition score of the given transcription hypothesis of the plurality of transcription hypotheses when proposed query terms from the given transcription hypothesis are more than a threshold phrase length; and
  
  selecting a best transcription hypothesis based on at least the weighting of the speech recognition score of each respective transcription hypothesis and the weighting of the co-occurrence score of each respective transcription hypothesis;
  
  generating a text query corresponding to the best transcription hypothesis; and
  
  receiving, from an information retrieval system, search results based on the text query corresponding to the best transcription hypothesis.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 15, 16, 17, 18)
  - 2. The method of claim 1, wherein the co-occurrence score of the given transcription hypothesis represents a measure of semantic relation of the proposed query terms based on identified co-occurrence frequencies including identifying non-sequential co-occurrences of the proposed query terms within the documents in the corpus.
  - 3. The method of claim 2, further comprising:
    - selecting the best transcription hypothesis based on a combination of speech recognition scores and co-occurrence scores from the plurality of transcription hypotheses.
  - 4. The method of claim 1, wherein identifying the frequency that proposed query terms co-occur based on the corpus of documents includes identifying co-occurrences of the proposed query terms within a predetermined number of consecutive words in an individual document of the corpus of documents.
  - 5. The method of claim 1, wherein evaluating the plurality of transcription hypotheses using the co-occurrence identification process includes the plurality being a group of transcription hypotheses selected as having best speech recognition scores based on a predetermined criterion.
  - 6. The method of claim 1, comprising:
    - increasing the weighting of the co-occurrence score of the given transcription hypothesis relative to a baseline weighting in response to identifying that the proposed query terms from the given transcription hypothesis co-occur within a predetermined number of consecutive words within an individual document of the corpus of documents.
  - 7. The method of claim 1, comprising:
    - rescoring the speech recognition score of each respective transcription hypothesis based on the co-occurrence score of the respective transcription hypothesis; and
      
      identifying the best transcription hypothesis as having a highest score based on the rescoring.
  - 8. The method of claim 1, wherein identifying the plurality of transcription hypotheses via the automated speech recognition process includes analyzing a waveform of the spoken query using an acoustic language model and a sequence-based statistical language model.
  - 9. The method of claim 8, wherein the statistical language model is trained on a first text corpus of natural language utterances and a second text corpus of search engine queries.
  - 10. The method of claim 1, wherein the spoken query is received from a voice search interface of a mobile client device.
  - 11. The method of claim 1, further comprising:
    - evaluating the plurality of transcription hypotheses using a class identification process, the class identification process including determining that a given query term, from a respective transcription hypothesis, corresponds to a specific class of terms, the class identification process assigning a classification score to the given query term, wherein selecting the best transcription hypothesis is further based on classification scores.
  - 12. The method of claim 1, further comprising:
    - evaluating the plurality of transcription hypotheses using a word relatedness identification process, the word relatedness identification process including evaluating a given query term, from a respective transcription hypothesis, using a lexical database, the word relatedness identification process assigning a word relatedness score to the given query term, the word relatedness score indicating a measure of semantic relation of the given query term to other words, wherein selecting the best transcription hypothesis is further based on word relatedness scores.
  - 15. The method of claim 1, comprising:
    - building a document index that identifies which words appear in which documents of the corpus of documents; and
      
      analyzing the document index using the proposed query terms from the given transcription hypothesis to identify a subset of documents of the corpus of documents in which the proposed query terms co-occur.
  - 16. The method of claim 1, wherein selecting the best transcription hypothesis is further based on the speech recognition score of each respective transcription hypothesis and the co-occurrence score of each respective transcription hypothesis.
  - 17. The method of claim 1, wherein identifying, via the automated speech recognition process, the plurality of transcription hypotheses based on the spoken query comprises analyzing phrase-based possibilities of the spoken query.
  - 18. The method of claim 1, comprising:
    - selecting a subset of the plurality of transcription hypotheses based on the speech recognition score of each respective transcription hypothesis of the subset being above a threshold score; and
      
      evaluating the subset of the plurality of transcription hypotheses using the co-occurrence identification process.
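
The document index of claim 15 and the windowed co-occurrence test of claim 4 can be sketched as follows. This is an illustrative reading under stated assumptions, not the patented implementation: the positional inverted index, whitespace tokenization, the function names, and the three-document corpus are all hypothetical.

```python
from collections import defaultdict


def build_index(corpus):
    """Map each word to {doc_id: [positions]} (a positional inverted index)."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in enumerate(corpus):
        # Whitespace tokenization is an assumption made for brevity.
        for pos, word in enumerate(text.lower().split()):
            index[word][doc_id].append(pos)
    return index


def docs_with_all_terms(index, terms):
    """Documents in which every term appears, in any order (non-sequential)."""
    doc_sets = [set(index.get(t, {})) for t in terms]
    return set.intersection(*doc_sets) if doc_sets else set()


def cooccur_in_window(index, terms, doc_id, window):
    """True if all terms fall within `window` consecutive words of the document."""
    positions = [index.get(t, {}).get(doc_id, []) for t in terms]
    starts = sorted(p for plist in positions for p in plist)
    # A qualifying window can always be shifted to start at a term occurrence,
    # so it is enough to try each occurrence as the window start.
    return any(
        all(any(start <= p < start + window for p in plist) for plist in positions)
        for start in starts
    )


def cooccurrence_frequency(index, terms, window=None):
    """Count documents where the terms co-occur, optionally within a window."""
    docs = docs_with_all_terms(index, terms)
    if window is None:
        return len(docs)
    return sum(cooccur_in_window(index, terms, d, window) for d in docs)


# Made-up corpus for illustration only.
corpus = [
    "the weather in boston is cold today",
    "boston is a city and the weather varies",
    "whether to visit boston depends on you",
]
index = build_index(corpus)
print(cooccurrence_frequency(index, ["weather", "boston"]))
print(cooccurrence_frequency(index, ["weather", "boston"], window=3))
```

A raw document count like this would still need to be normalized into a co-occurrence score; the claims leave that mapping open, so it is omitted here.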

13. A system comprising:
- a processor; and
  
  non-transitory memory storing executable instructions that, when executed by the processor, cause the system to perform:
  
  receiving a spoken query;
  
  identifying, via an automated speech recognition process, a plurality of transcription hypotheses based on the spoken query, each respective transcription hypothesis having a speech recognition score;
  
  evaluating the plurality of transcription hypotheses using a co-occurrence identification process, the co-occurrence identification process comprising:
  
  identifying a frequency that proposed query terms, from each respective transcription hypothesis, co-occur based on a corpus of documents;
  
  assigning a co-occurrence score to each respective transcription hypothesis;
  
  evaluating a weighting of the speech recognition score of each respective transcription hypothesis and a weighting of the co-occurrence score of each respective transcription hypothesis;
  
  increasing a weighting of a co-occurrence score of a given transcription hypothesis of the plurality of transcription hypotheses to be greater than a weighting of a speech recognition score of the given transcription hypothesis of the plurality of transcription hypotheses when proposed query terms from the given transcription hypothesis are more than a threshold phrase length; and
  
  selecting a best transcription hypothesis based on at least the weighting of the speech recognition score of each respective transcription hypothesis and the weighting of the co-occurrence score of each respective transcription hypothesis;
  
  generating a text query corresponding to the best transcription hypothesis; and
  
  receiving, from an information retrieval system, search results based on the text query corresponding to the best transcription hypothesis.
- Dependent claims (19, 20, 21)
  - 19. The system of claim 13, wherein identifying the frequency that proposed query terms co-occur based on the corpus of documents includes identifying co-occurrences of the proposed query terms within a predetermined number of consecutive words in an individual document of the corpus of documents.
  - 20. The system of claim 13, wherein the executable instructions, when executed by the processor, cause the system to perform:
    - increasing the weighting of the co-occurrence score of the given transcription hypothesis relative to a baseline weighting in response to identifying that the proposed query terms from the given transcription hypothesis co-occur within a predetermined number of consecutive words within an individual document of the corpus of documents.
  - 21. The system of claim 13, wherein the executable instructions, when executed by the processor, cause the system to perform:
    - evaluating the plurality of transcription hypotheses using a class identification process, the class identification process including determining that a given query term, from a respective transcription hypothesis, corresponds to a specific class of terms, the class identification process assigning a classification score to the given query term, wherein selecting the best transcription hypothesis is further based on classification scores.

14. One or more non-transitory computer-readable media storing executable instructions that, when executed by a processor, cause a system to:
- receive a spoken query;
  
  identify, via an automated speech recognition process, a plurality of transcription hypotheses based on the spoken query, each respective transcription hypothesis having a speech recognition score;
  
  evaluate the plurality of transcription hypotheses using a co-occurrence identification process, the co-occurrence identification process comprising:
  
  identifying a frequency that proposed query terms, from each respective transcription hypothesis, co-occur based on a corpus of documents;
  
  assigning a co-occurrence score to each respective transcription hypothesis;
  
  evaluating a weighting of the speech recognition score of each respective transcription hypothesis and a weighting of the co-occurrence score of each respective transcription hypothesis;
  
  increasing a weighting of a co-occurrence score of a given transcription hypothesis of the plurality of transcription hypotheses to be greater than a weighting of a speech recognition score of the given transcription hypothesis of the plurality of transcription hypotheses when proposed query terms from the given transcription hypothesis are more than a threshold phrase length; and
  
  selecting a best transcription hypothesis based on at least the weighting of the speech recognition score of each respective transcription hypothesis and the weighting of the co-occurrence score of each respective transcription hypothesis;
  
  generate a text query corresponding to the best transcription hypothesis; and
  
  receive, from an information retrieval system, search results based on the text query corresponding to the best transcription hypothesis.
- Dependent claims (22, 23, 24)
  - 22. The one or more non-transitory computer-readable media of claim 14, wherein identifying the frequency that proposed query terms co-occur based on the corpus of documents includes identifying co-occurrences of the proposed query terms within a predetermined number of consecutive words in an individual document of the corpus of documents.
  - 23. The one or more non-transitory computer-readable media of claim 14, wherein the executable instructions, when executed by the processor, cause the system to:
    - increase the weighting of the co-occurrence score of the given transcription hypothesis relative to a baseline weighting in response to identifying that the proposed query terms from the given transcription hypothesis co-occur within a predetermined number of consecutive words within an individual document of the corpus of documents.
  - 24. The one or more non-transitory computer-readable media of claim 14, wherein the executable instructions, when executed by the processor, cause the system to:
    - evaluate the plurality of transcription hypotheses using a class identification process, the class identification process including determining that a given query term, from a respective transcription hypothesis, corresponds to a specific class of terms, the class identification process assigning a classification score to the given query term, wherein selecting the best transcription hypothesis is further based on classification scores.
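
The length-dependent weighting recited in the independent claims (1, 13, and 14) can be sketched as a small rule: once a hypothesis has more query terms than a threshold phrase length, the co-occurrence weight is raised above the speech recognition weight. The function names, the threshold of 3 terms, and the weight values are hypothetical. The claims fix only the ordering of the two weights for long hypotheses, so the behavior below the threshold is an assumption.

```python
def score_weights(num_terms, threshold_length=3,
                  base_cooc_weight=0.4, long_cooc_weight=0.7):
    """Return (asr_weight, cooc_weight); the two weights sum to 1.

    All numeric defaults are illustrative assumptions, not values
    taken from the patent.
    """
    if num_terms > threshold_length:
        cooc = long_cooc_weight   # co-occurrence outweighs the recognizer
    else:
        cooc = base_cooc_weight   # assumed baseline for short queries
    return 1.0 - cooc, cooc


def combined_score(text, asr_score, cooc_score, threshold_length=3):
    """Weighted sum of both scores, with weights chosen by query length."""
    w_asr, w_cooc = score_weights(len(text.split()), threshold_length)
    return w_asr * asr_score + w_cooc * cooc_score


# Made-up scores: a six-term query, where co-occurrence evidence dominates.
good = combined_score("cheap flights from boston to paris", 0.55, 0.90)
bad = combined_score("cheap flights from boston two pairs", 0.65, 0.20)
print(good > bad)
```

Selecting the best hypothesis then reduces to taking the maximum of `combined_score` over the candidate list.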

Current Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Inventors: Mamou, Jonathan; Sethy, Abhinav; Ramabhadran, Bhuvana; Hoory, Ron; Vozila, Paul Joseph; Bodenstab, Nathan
Primary Examiner(s): Shah, Paras D
Assistant Examiner(s): Thomas-Homescu, Anne L
Application Number: US14/156,788
Publication Number: US 20140136197A1
Time in Patent Office: 838 Days
Field of Search: 704/231-257, 704/275
US Class Current: 1/1
CPC Class Codes:
  G06F 16/00   Information retrieval; Data...
  G06F 7/00   Methods or arrangements for...
  G10L 15/08   Speech classification or se...
  G10L 15/1815   Semantic context, e.g. disa...
  G10L 15/26   Speech to text systems G10L...