SEQUENTIAL SPEECH RECOGNITION WITH TWO UNEQUAL ASR SYSTEMS

US 20100082343A1
Filed: 09/29/2008
Published: 04/01/2010
Est. Priority Date: 09/29/2008
Status: Active Grant

2 Assignments

0 Petitions

Abstract

Sequential speech recognition using two unequal automatic speech recognition (ASR) systems may be provided. The system may provide two sets of vocabulary data. A determination may be made as to whether entries in one set of vocabulary data are likely to be confused with entries in the other set of vocabulary data. If confusion is likely, a decoy entry from one set of the vocabulary data may be placed in the other set of vocabulary data to ensure more efficient and accurate speech recognition processing may take place.
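
The control flow described in the abstract can be sketched as follows. This is a minimal illustration rather than the patented implementation: the names `recognize_sequentially`, `first_pass`, `second_pass`, and `decoys` are hypothetical, and each recognizer is assumed to be a callable that returns the matched entry or `None`. The fallback to the second recognizer when the first finds no match follows claims 2 and 20 rather than the abstract itself.

```python
from typing import Callable, Optional, Set

# Hypothetical recognizer interface: takes the input, returns a matched
# vocabulary entry, or None when nothing matches confidently enough.
Recognizer = Callable[[object], Optional[str]]


def recognize_sequentially(
    audio: object,
    first_pass: Recognizer,
    second_pass: Recognizer,
    decoys: Set[str],
) -> Optional[str]:
    """Try the small first vocabulary, deferring to the second when needed."""
    match = first_pass(audio)
    if match is None:
        # No match in the first vocabulary: try the second one.
        return second_pass(audio)
    if match in decoys:
        # The match is a decoy borrowed from the second vocabulary, so the
        # input probably belongs there; let the second recognizer decide.
        return second_pass(audio)
    return match
```

Under these assumptions, the second recognizer runs only when the first pass fails or lands on a decoy, which is where the efficiency gain described in the abstract would come from.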

121 Citations

20 Claims

1. A method for providing efficient speech recognition, the method comprising:
- providing a first plurality of vocabulary data;
  
  providing a second plurality of vocabulary data;
  
  adding at least one decoy entry to the first plurality of vocabulary data wherein the at least one decoy entry comprises at least one entry from the second plurality of vocabulary data;
  
  receiving an input comprising an audio signal;
  
  determining whether the input matches at least one entry in the first plurality of vocabulary data;
  
  in response to determining that the input matches the at least one entry in the first vocabulary data, determining whether the matched at least one entry comprises the at least one decoy entry in the first vocabulary data; and
  
  in response to determining that the matched at least one entry comprises the at least one decoy entry in the first vocabulary data, determining whether the input matches at least one entry in the second plurality of vocabulary data.
  - 2. The method of claim 1, further comprising:
    - in response to determining that the input does not match at least one entry in the first plurality of vocabulary data, determining whether the input matches at least one entry in the second plurality of vocabulary data.
  - 3. The method of claim 1, wherein adding at least one decoy entry to the first plurality of vocabulary data wherein the at least one decoy entry comprises at least one entry from the second plurality of vocabulary data comprises:
    - comparing each entry in the first plurality of vocabulary data to each entry in the second plurality of vocabulary data;
      
      determining whether at least one entry in the first plurality of vocabulary data is confusable with at least one entry in the second plurality of vocabulary data;
      
      in response to determining that at least one entry in the first plurality of vocabulary data is confusable with at least one entry in the second plurality of vocabulary data, adding the at least one entry in the second plurality of vocabulary data that is confusable with the at least one entry in the first plurality of vocabulary data to the first plurality of vocabulary data; and
      
      associating the added entry to the first plurality of vocabulary data with the at least one entry in the first plurality of vocabulary data as a decoy entry.
  - 4. The method of claim 3, wherein determining whether at least one entry in the first plurality of vocabulary data is confusable with at least one entry in the second plurality of vocabulary data comprises:
    - calculating a confusability score based on a comparison of each entry in the first plurality of vocabulary data with each entry in the second plurality of vocabulary data; and
      
      determining whether the calculated confusability score comprises a value indicating that at least one entry in the first plurality of vocabulary data is confusable with at least one entry in the second plurality of vocabulary data.
  - 5. The method of claim 4, wherein calculating the confusability score based on a comparison of each entry in the first plurality of vocabulary data with each entry in the second plurality of vocabulary data comprises:
    - comparing a phonetic representation of each entry in the first plurality of vocabulary data with a phonetic representation of each entry in the second plurality of vocabulary data; and
      
      calculating an edit distance between the phonetic representation of each entry in the first plurality of vocabulary data with the phonetic representation of each entry in the second plurality of vocabulary data.
  - 6. The method of claim 4, wherein calculating the confusability score based on a comparison of each entry in the first plurality of vocabulary data with each entry in the second plurality of vocabulary data comprises deriving a metric of natural confusability from a set of training data.
  - 7. The method of claim 1, further comprising:
    - receiving a new entry to the first plurality of vocabulary data; and
      
      determining whether the new entry in the first plurality of vocabulary data is confusable with at least one entry in the second plurality of vocabulary data; and
      
      in response to determining that the new entry in the first plurality of vocabulary data is confusable with at least one entry in the second plurality of vocabulary data, adding the at least one entry in the second plurality of vocabulary data that is confusable with the at least one entry in the first plurality of vocabulary data to the first plurality of vocabulary data.
  - 8. The method of claim 1, further comprising:
    - determining whether an entry in the first plurality of vocabulary data has been confused with an entry in the second plurality of vocabulary data; and
      
      creating a record of the confusion of the entry in the first plurality of vocabulary data with the entry in the second plurality of vocabulary data.
  - 9. The method of claim 8, further comprising adding a decoy entry comprising the entry in the second plurality of vocabulary data confused with the entry in the first plurality of vocabulary data to the first plurality of vocabulary data.
  - 10. The method of claim 4, wherein calculating the score based on a comparison of each entry in the first plurality of vocabulary data with each entry in the second plurality of vocabulary data comprises deriving a metric of natural confusability for each comparison from at least one set of training data, wherein the metric of natural confusability is associated with the entirety of the entry.
  - 11. The method of claim 6, wherein the likelihood of confusion for each compared phoneme is based on at least one of the following:
    - training data from at least one user and historical data from a plurality of users.
  - 12. The method of claim 1, wherein the first plurality of vocabulary data comprises fewer entries than the second plurality of vocabulary data.
  - 13. The method of claim 1, wherein determining whether the input matches at least one entry in the first plurality of vocabulary data comprises:
    - determining whether the input is within a confidence threshold for determining a match; and
      
      periodically adjusting the confidence threshold for determining a match according to training data from at least one user.
  - 14. The method of claim 13, wherein each entry in the first plurality of vocabulary data requires the same confidence threshold for determining a match with the input.
  - 15. The method of claim 13, wherein the confidence threshold for determining a match between each entry in the first plurality of vocabulary data and the input is separately configurable.
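
Claims 3 to 5 describe comparing phonetic representations of entries across the two vocabularies and using an edit distance as the confusability score. The sketch below illustrates one way this might look, under several assumptions that are not in the claims: vocabularies are dictionaries mapping an entry to a list of phoneme symbols, the distance is plain (unweighted) Levenshtein distance, and an entry counts as confusable when the distance is at most a hypothetical `max_distance`. The names `edit_distance` and `find_decoys` are illustrative.

```python
from typing import Dict, List, Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            curr.append(min(prev[j] + 1,          # delete pa
                            curr[j - 1] + 1,      # insert pb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def find_decoys(
    first_vocab: Dict[str, List[str]],
    second_vocab: Dict[str, List[str]],
    max_distance: int = 1,  # assumed threshold, not taken from the claims
) -> Dict[str, List[str]]:
    """Map each second-vocabulary entry that should become a decoy to the
    first-vocabulary entries it is confusable with."""
    decoys: Dict[str, List[str]] = {}
    for word2, phones2 in second_vocab.items():
        if word2 in first_vocab:
            continue  # already a genuine first-vocabulary entry
        for word1, phones1 in first_vocab.items():
            if edit_distance(phones1, phones2) <= max_distance:
                decoys.setdefault(word2, []).append(word1)
    return decoys


first_vocab = {"call bob": ["K", "AO", "L", "B", "AA", "B"]}
second_vocab = {
    "call rob": ["K", "AO", "L", "R", "AA", "B"],
    "weather": ["W", "EH", "DH", "ER"],
}
decoys = find_decoys(first_vocab, second_vocab)
```

In this toy example, "call rob" differs from "call bob" by one phoneme, so it would be added to the first vocabulary as a decoy associated with "call bob", while "weather" would not.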

16. A system for providing efficient speech recognition, the system comprising:
- a memory storage; and
  
  a processing unit coupled to the memory storage, wherein the processing unit is operative to:
  
  access a first plurality of vocabulary data;
  
  access a second plurality of vocabulary data;
  
  add at least one decoy entry to the first plurality of vocabulary data wherein the at least one decoy entry comprises at least one entry from the second plurality of vocabulary data, wherein being operative to add the at least one decoy entry to the first plurality of vocabulary data comprises being operative to:
  
  compare each entry in the first plurality of vocabulary data to each entry in the second plurality of vocabulary data,
  
  determine whether at least one entry in the first plurality of vocabulary data is confusable with at least one entry in the second plurality of vocabulary data,
  
  in response to determining that at least one entry in the first plurality of vocabulary data is confusable with at least one entry in the second plurality of vocabulary data, add the at least one entry in the second plurality of vocabulary data that is confusable with the at least one entry in the first plurality of vocabulary data to the first plurality of vocabulary data, and
  
  associate the added entry to the first plurality of vocabulary data with the at least one entry in the first plurality of vocabulary data as a decoy entry;
  
  receive an input comprising a speech signal;
  
  determine whether the input matches at least one entry in the first plurality of vocabulary data;
  
  in response to determining that the input matches the at least one entry in the first vocabulary data, determine whether the matched at least one entry comprises the at least one decoy entry in the first vocabulary data; and
  
  in response to determining that the matched at least one entry comprises the at least one decoy entry in the first vocabulary data, determine whether the input matches at least one entry in the second plurality of vocabulary data.
  - 17. The system of claim 16, wherein the processing unit is further operable to:
    - determine whether an entry in the first plurality of vocabulary data has been confused with an entry in the second plurality of vocabulary data;
      
      create at least one record of the confusion of the entry in the first plurality of vocabulary data with the entry in the second plurality of vocabulary data; and
      
      re-determine whether at least one entry in the first plurality of vocabulary data is confusable with at least one entry in the second plurality of vocabulary data based on the at least one record of the confusion of the entry in the first plurality of vocabulary data with the entry in the second plurality of vocabulary data.
  - 18. The system of claim 16, wherein being operative to access the first plurality of vocabulary data comprises being operative to access a locally stored first recognition resource operative to provide the first plurality of vocabulary data;
    - and wherein being operative to access the second plurality of vocabulary data comprises being operative to access a remotely stored second recognition resource operative to provide the second plurality of vocabulary data.
  - 19. The system of claim 16, wherein being operative to determine whether at least one entry in the first plurality of vocabulary data is confusable with at least one entry in the second plurality of vocabulary data comprises being operative to calculate a weighted edit distance between a pronunciation of each entry in the first plurality of vocabulary data and a pronunciation of each entry in the second plurality of vocabulary data.
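
Claim 19 refers to a weighted edit distance between pronunciations. The claims do not say how the weights are chosen, so the sketch below makes its own assumptions: substitution costs come from a hypothetical lookup table `sub_cost` keyed by unordered phoneme pairs, unlisted substitutions cost 1.0, and insertions and deletions share a single `indel_cost`.

```python
from typing import Dict, FrozenSet, Sequence


def weighted_edit_distance(
    a: Sequence[str],
    b: Sequence[str],
    sub_cost: Dict[FrozenSet[str], float],  # assumed table of pair costs
    indel_cost: float = 1.0,                # assumed insert/delete cost
) -> float:
    """Edit distance where acoustically similar phonemes are cheap to swap."""
    prev = [j * indel_cost for j in range(len(b) + 1)]
    for i, pa in enumerate(a, start=1):
        curr = [i * indel_cost]
        for j, pb in enumerate(b, start=1):
            if pa == pb:
                swap = 0.0
            else:
                # Unordered pair lookup; default to full cost if unlisted.
                swap = sub_cost.get(frozenset((pa, pb)), 1.0)
            curr.append(min(prev[j] + indel_cost,
                            curr[j - 1] + indel_cost,
                            prev[j - 1] + swap))
        prev = curr
    return prev[-1]


# Illustrative weights only: treat B/P as easily confused.
sub_cost = {frozenset(("B", "P")): 0.25}
close = weighted_edit_distance(["B", "AE", "T"], ["P", "AE", "T"], sub_cost)
far = weighted_edit_distance(["B", "AE", "T"], ["K", "AE", "T"], sub_cost)
```

With these illustrative weights, "bat" and "pat" score as closer (more confusable) than "bat" and "cat", which an unweighted distance would not distinguish.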

20. A computer-readable medium which stores a set of instructions which, when executed, performs a method for providing efficient speech recognition, the method executed by the set of instructions comprising:
- providing a first plurality of vocabulary data;
  
  providing a second plurality of vocabulary data, wherein the second plurality of vocabulary data comprises a larger number of entries than the first plurality of vocabulary data;
  
  adding at least one decoy entry to the first plurality of vocabulary data wherein the at least one decoy entry comprises at least one entry from the second plurality of vocabulary data, wherein adding at least one decoy entry to the first plurality of vocabulary data comprises:
  
  comparing each entry in the first plurality of vocabulary data to each entry in the second plurality of vocabulary data,
  
  determining whether at least one entry in the first plurality of vocabulary data is confusable with at least one entry in the second plurality of vocabulary data, wherein determining whether at least one entry in the first plurality of vocabulary data is confusable with at least one entry in the second plurality of vocabulary data comprises:
  
  calculating a confusion score based on a comparison of each entry in the first plurality of vocabulary data with each entry in the second plurality of vocabulary data, and
  
  determining whether the calculated confusion score comprises a value indicating that at least one entry in the first plurality of vocabulary data is confusable with at least one entry in the second plurality of vocabulary data;
  
  in response to determining that at least one entry in the first plurality of vocabulary data is confusable with at least one entry in the second plurality of vocabulary data, adding the at least one entry in the second plurality of vocabulary data that is confusable with the at least one entry in the first plurality of vocabulary data to the first plurality of vocabulary data according to at least one of the following:
  
  adding all of the at least one entries in the second plurality of vocabulary data to the first plurality of vocabulary data comprising a confusion score greater than a confusion threshold and adding a predefined number of the at least one entries in the second plurality of vocabulary data to the first plurality of vocabulary data comprising the highest confusion scores, and
  
  associating the added entry to the first plurality of vocabulary data with the at least one entry in the first plurality of vocabulary data as a decoy entry;
  
  receiving an input comprising an audible signal;
  
  determining whether the input matches at least one entry in the first plurality of vocabulary data, wherein determining whether the input matches at least one entry in the first plurality of vocabulary data comprises:
  
  assigning a recognition score based on a comparison of the input with each entry in the first plurality of vocabulary data,
  
  converting the recognition score associated with each entry in the first plurality of vocabulary data to a posterior probability,
  
  computing a confidence score as a difference between the posterior probability of the input with the highest recognition score and the input with the next highest recognition score, and
  
  determining whether the confidence score exceeds a confidence threshold associated with the entry comprising the highest recognition score in the first plurality of vocabulary data;
  
  in response to determining that the input matches the at least one entry in the first vocabulary data, determining whether the matched entry comprises the at least one decoy entry in the first vocabulary data;
  
  in response to determining that the matched entry comprises the at least one decoy entry in the first vocabulary data, determining whether the input matches at least one entry in the second plurality of vocabulary data; and
  
  in response to determining that the input does not match at least one entry in the first plurality of vocabulary data, determining whether the input matches at least one entry in the second plurality of vocabulary data.
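
The matching step in claim 20 converts recognition scores to posterior probabilities and takes the confidence score as the gap between the top two posteriors. The claim does not specify the conversion, so this sketch assumes the scores behave like log-likelihoods and normalizes them with a softmax. The function names, the per-entry `thresholds` dictionary, and the `default_threshold` value are all hypothetical, and `scores` is assumed to be non-empty.

```python
import math
from typing import Dict, Optional


def posteriors(scores: Dict[str, float]) -> Dict[str, float]:
    """Softmax over recognition scores (an assumed conversion)."""
    top = max(scores.values())  # subtract the max for numerical stability
    exp = {entry: math.exp(s - top) for entry, s in scores.items()}
    total = sum(exp.values())
    return {entry: v / total for entry, v in exp.items()}


def first_pass_match(
    scores: Dict[str, float],
    thresholds: Dict[str, float],
    default_threshold: float = 0.5,  # assumed value for unlisted entries
) -> Optional[str]:
    """Return the best entry if its confidence clears that entry's threshold."""
    post = posteriors(scores)
    ranked = sorted(post, key=post.get, reverse=True)
    best = ranked[0]
    runner_up = post[ranked[1]] if len(ranked) > 1 else 0.0
    # Confidence is the posterior gap between the best and next-best entries.
    confidence = post[best] - runner_up
    if confidence > thresholds.get(best, default_threshold):
        return best
    return None


clear = first_pass_match(
    {"call home": 5.0, "call rome": 1.0, "play jazz": 0.0}, thresholds={}
)
ambiguous = first_pass_match({"call home": 2.0, "call rome": 1.9}, thresholds={})
```

Here `clear` would be "call home", since one entry dominates, while `ambiguous` would be `None` because the two candidates are nearly tied. A `None` result corresponds to the "does not match" branch, where the input is passed on to the second vocabulary.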

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Corporation
Inventors: Levit, Michael; Buntschuh, Bruce Melvin; Chang, Shuangyu
Granted Patent: US 8,180,641 B2
US Class Current: 704/257
CPC Class Codes:
G10L 15/32 Multiple recognisers used i...
G10L 2015/0631 Creating reference template...
