System and method for identifying semantic intent from acoustic information

US 7,634,406 B2
Filed: 12/10/2004
Issued: 12/15/2009
Est. Priority Date: 12/10/2004
Status: Expired due to Fees

Abstract

In accordance with one embodiment of the present invention, unanticipated semantic intents are discovered in audio data in an unsupervised manner. For instance, the audio acoustics are clustered based on semantic intent and representative acoustics are chosen for each cluster. The human then need only listen to a small number of representative acoustics for each cluster (and possibly only one per cluster) in order to identify the unforeseen semantic intents.

33 Claims

1. A computer implemented method of processing acoustic information, comprising:
- extracting data representing a plurality of sets of acoustic information of interest from a data store;
  
  performing speech recognition using a computer with an application grammar and a second grammar on the data to obtain speech recognition results;
  
  identifying whether the speech recognition results are generated with the application grammar or the second grammar;
  
  performing a semantic analysis of the second data;
  
  clustering the sets of acoustic information into clusters based on the semantic analysis of the speech recognition results;
  
  ranking the clusters based on a number of instances of utterances contained in each cluster;
  
  removing clusters based on a consistency threshold calculation of each cluster that indicates when a cluster has a number of unlike utterances that have meaningful semantics;
  
  identifying, for each cluster, and storing an indicator of a set of acoustic information as being representative of a corresponding cluster, wherein the identified set of acoustic information is selected from the sets of acoustic information in each corresponding cluster;
  
  identifying the representative set of acoustic information for a given cluster as representing either a semantic intent covered by the application or as an unrepresented semantic intent based on the speech recognition results assigned to the given cluster were generated using the application grammar or the second grammar; and
  
  generating a revision to the application grammar to accommodate for the unrepresented semantic intent.
  - 2. The method of claim 1 wherein clustering comprises:
    - clustering the sets of acoustic information together if the corresponding speech recognition results have a similar semantic analysis.
  - 3. The method of claim 2 wherein the semantic analysis is indicative of semantic intent expressed by the speech recognition results and wherein clustering comprises:
    - clustering sets of acoustic information that have corresponding speech recognition results with similar semantic intent.
  - 4. The method of claim 3 wherein clustering comprises:
    - initializing clusters based on lexical items in the speech recognition results;
      
      assigning the speech recognition results to the initialized clusters;
      
      merging similar clusters; and
      
      re-assigning the speech recognition results to the clusters.
  - 5. The method of claim 4 wherein initializing clusters comprises:
    - generating a cluster corresponding to each lexical item in the speech recognition results; and
      
      computing a prior probability for each cluster based on a number of speech recognition results that include the lexical item corresponding to that cluster.
  - 6. The method of claim 4 wherein assigning includes:
    - generating a language model corresponding to each cluster based on the speech recognition results assigned to that cluster.
  - 7. The method of claim 6 wherein assigning comprises:
    - refining the assignment of speech recognition results to the clusters based on a probability that the language models corresponding to the clusters will produce the speech recognition results; and
      
      re-generating the language models corresponding to the clusters.
  - 8. The method of claim 4 wherein merging comprises:
    - computing a distance between a pair of clusters; and
      
      merging the pair of clusters into one cluster if the computed distance meets a threshold distance; and
      
      performing the steps of computing and merging for a plurality of pairs of clusters to obtain a refined set of clusters.
  - 9. The method of claim 8 wherein re-assigning comprises:
    - re-assigning the speech recognition results to the refined set of clusters to obtain a set of refined clusters.
  - 10. The method of claim 9 wherein clustering further comprises:
    - filtering the set of refined clusters based on a similarity of the speech recognition results to one another in each of the refined clusters, to obtain filtered clusters.
  - 11. The method of claim 10 wherein filtering comprises:
    - removing clusters that have a compactness value that is lower than a threshold compactness value, wherein the compactness value is determined from a normalized, pair-wise similarity of multiple utterances in a cluster, and wherein removing the clusters includes removing the cluster and the speech recognition results assigned to the removed cluster from being used in a generated language model.
  - 12. The method of claim 10 and further comprising:
    - ranking the filtered clusters based on the speech recognition results assigned to the filtered clusters.
  - 13. The method of claim 12 wherein identifying a set of acoustic information as being representative, comprises:
    - computing a distance in similarity between each speech recognition result assigned to a given cluster and other speech recognition results assigned to the given cluster; and
      
      identifying a speech recognition result having a smallest distance in similarity to the other speech recognition results in the given cluster; and
      
      identifying, as the representative set of acoustic information for the given cluster, the set of acoustic information corresponding to the identified speech recognition result.
  - 14. The method of claim 4 wherein merging comprises:
    - merging clusters until a defined auxiliary function changes by a predetermined amount.
  - 15. The method of claim 4 wherein merging comprises:
    - iteratively merging clusters and re-assigning speech recognition results to clusters.
  - 16. The method of claim 4 wherein extracting sets of acoustic information of interest comprises:
    - identifying acoustic information of interest stored by a human-machine interface application configured to process input speech indicative of semantic intent covered by the application.
  - 17. The method of claim 16 wherein the application comprises an automatic voice response application and wherein extracting comprises:
    - extracting stored call log information received by the application.
  - 18. The method of claim 17 wherein the stored call log information indicates whether the call log information was generated from a failed call in which a caller prematurely hung up.
  - 19. The method of claim 18 further comprising identifying the representative set of acoustic information for each given cluster as representing either a semantic intent covered by the application or as an unrepresented semantic intent, based on whether the speech recognition results assigned to the given cluster correspond to acoustic information from failed calls.
  - 20. The method of claim 18 wherein extracting comprises:
    - extracting only acoustic information generated from failed calls.
  - 21. The method of claim 1 wherein identifying acoustic information as being representative, comprises:
    - identifying the acoustic information based on a likelihood that speech represented by the acoustic information will be generated, given the corresponding cluster.
  - 22. The method of claim 1 wherein generating a revision comprises:
    - generating a language model for each cluster; and
      
      outputting, as the revision, the language model corresponding to clusters for which the representative set of acoustic information corresponds to an unrepresented semantic intent.
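The initialization in claim 5 and the threshold-based merging in claim 8 can be illustrated with a minimal Python sketch. Several details here are assumptions made for illustration and are not taken from the claims: the function names `init_clusters`, `jaccard_distance`, and `merge_pass` are hypothetical, priors are normalized by the total membership count, the cluster distance is a Jaccard distance over member sets, and "meets a threshold distance" is read as "distance at or below the threshold".

```python
from collections import defaultdict


def init_clusters(results):
    """Create one cluster per lexical item (claim 5).

    `results` is a list of recognized text strings. A result is placed in
    every cluster whose lexical item it contains.
    """
    clusters = defaultdict(set)
    for idx, text in enumerate(results):
        for word in set(text.lower().split()):
            clusters[word].add(idx)
    # Assumed normalization: prior is proportional to the number of results
    # that contain the cluster's lexical item.
    total = sum(len(members) for members in clusters.values())
    priors = {word: len(members) / total for word, members in clusters.items()}
    return dict(clusters), priors


def jaccard_distance(a, b):
    """Assumed distance between two clusters, based on shared members."""
    return 1.0 - len(a & b) / len(a | b)


def merge_pass(clusters, threshold):
    """Merge cluster pairs whose distance is at or below `threshold` (claim 8).

    Returns a list of (lexical_items, member_indices) tuples.
    """
    merged = [(frozenset([w]), set(m)) for w, m in sorted(clusters.items())]
    changed = True
    while changed:
        changed = False
        for i in range(len(merged)):
            for j in range(i + 1, len(merged)):
                if jaccard_distance(merged[i][1], merged[j][1]) <= threshold:
                    merged[i] = (merged[i][0] | merged[j][0],
                                 merged[i][1] | merged[j][1])
                    del merged[j]
                    changed = True
                    break
            if changed:
                break  # restart the scan after every merge
    return merged


results = ["check my balance", "account balance please",
           "talk to an agent", "agent please"]
clusters, priors = init_clusters(results)
refined = merge_pass(clusters, threshold=0.0)
```

With a threshold of 0.0, only clusters with identical membership are merged; a larger threshold merges more aggressively. The re-assignment step of claim 9 is not shown.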

23. A system for processing acoustic information, comprising:
- a computer memory including instructions to execute a clustering component configured to;
  
  cluster sets of acoustic information, from an application, into clusters based on a semantic analysis of speech recognition results of speech recognition performed on the sets of acoustic information;
  
  rank the clusters based on a number of instances of utterances contained in each cluster;
  
  remove clusters based on a consistency threshold calculation of each cluster that indicates when a cluster has a number of unlike utterances that have meaningful semantics;
  
  identify, for each cluster, a set of acoustic information as being representative of a corresponding cluster, wherein the identified set of acoustic information is selected from the sets of acoustic information in each corresponding cluster;
  
  identify the representative set of acoustic information for a given cluster as representing either a semantic intent covered by the application or as an unrepresented semantic intent based on the speech recognition results assigned to the given cluster were generated using an application grammar or a second grammar; and
  
  generate a revision to the application grammar to accommodate for the unrepresented semantic intent.
  - 24. The system of claim 23 wherein the semantic analysis is indicative of semantic intent expressed by the speech recognition results and wherein the clustering component is configured to cluster those sets of acoustic information that have corresponding speech recognition results with similar semantic intent.
  - 25. The system of claim 24 wherein the clustering component comprises:
    - a language model-based clustering component configured to initialize clusters based on lexical items in the speech recognition results, assign the speech recognition results to the initialized clusters, generate a language model for each cluster based on the speech recognition results assigned to each cluster, and refine assignment of the speech recognition results to the clusters based on probabilities generated by the language models.
  - 26. The system of claim 25 wherein the clustering component is configured to merge similar clusters with one another to obtain merged clusters, and to re-assigning the speech recognition results to the merged clusters.
  - 27. The system of claim 25 wherein the clustering component is configured to initialize the clusters by generating a cluster corresponding to each lexical item in the speech recognition results, and computing a prior probability for each cluster based on a number of speech recognition results that include the lexical item corresponding to that cluster.
  - 28. The system of claim 25 wherein the clustering component is configured to merge similar clusters by computing a distance between a pair of clusters, merging the pair of clusters into one cluster if the computed distance meets a threshold distance, and performing the steps of computing and merging for a plurality of pairs of clusters to obtain a refined set of clusters.
  - 29. The system of claim 28 wherein the clustering component is configured to re-assign the speech recognition results to the refined set of clusters to obtain a set of refined clusters.
  - 30. The system of claim 28 wherein the clustering component is configured to identify a set of acoustic information as being representative by computing a distance in similarity between each speech recognition result assigned to a given cluster and other speech recognition results assigned to the given cluster, identify a speech recognition result having a smallest distance in similarity to the other speech recognition results in the given cluster, and identify, as the representative set of acoustic information for the given cluster, the set of acoustic information corresponding to the identified speech recognition result.
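The representative selection recited in claim 30 (and claim 13) amounts to picking the utterance closest to all others in its cluster. The sketch below is one possible reading, not the patented implementation: the names `word_distance` and `representative` are hypothetical, the distance is assumed to be a Jaccard distance over word sets, and "smallest distance to the other results" is read as the smallest summed distance.

```python
def word_distance(a, b):
    """Assumed distance in similarity between two recognition results."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    union = wa | wb
    if not union:
        return 0.0
    return 1.0 - len(wa & wb) / len(union)


def representative(cluster):
    """Return the audio id whose recognition result is closest to the rest.

    `cluster` is a list of (audio_id, recognized_text) pairs.
    """
    if not cluster:
        raise ValueError("cluster must contain at least one utterance")

    def total_distance(i):
        # Sum of distances from result i to every other result in the cluster.
        return sum(word_distance(cluster[i][1], cluster[j][1])
                   for j in range(len(cluster)) if j != i)

    best = min(range(len(cluster)), key=total_distance)
    return cluster[best][0]


cluster = [("a.wav", "i want my balance"),
           ("b.wav", "my balance"),
           ("c.wav", "balance please")]
chosen = representative(cluster)
```

In this example the middle utterance shares words with both neighbors, so it has the smallest summed distance and its audio file is returned as the one a reviewer would listen to.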

31. A computer storage medium storing instructions which, when executed by a computer, cause the computer to process acoustic information by performing steps of:
- extracting a plurality of sets of acoustic information of interest from a data store;
  
  performing speech recognition on the acoustic information to obtain speech recognition results;
  
  identifying whether the speech recognition results are generated with the application grammar or the second grammar;
  
  clustering the sets of acoustic information into clusters based on a semantic analysis of the speech recognition results;
  
  ranking the clusters based on a number of instances of utterances contained in each cluster;
  
  removing clusters based on a consistency threshold calculation of each cluster that indicates when a cluster has a number of unlike utterances that have meaningful semantics;
  
  identifying, for each cluster, and storing an indicator of a set of acoustic information as being representative of a corresponding cluster, wherein the identified set of acoustic information is selected from the sets of acoustic information in each corresponding cluster;
  
  selecting, for each cluster, a set of acoustic information from the sets of acoustic information in a particular cluster as being representative of the particular cluster;
  
  identifying the representative set of acoustic information for a given cluster as representing either a semantic intent covered by the application or as an unrepresented semantic intent based on the speech recognition results assigned to the given cluster were generated using the application grammar or the second grammar; and
  
  generating a revision to the application grammar to accommodate for the unrepresented semantic intent.
  - 32. The computer storage medium of claim 31 wherein clustering comprises:
    - clustering the sets of acoustic information together if the corresponding speech recognition results have a similar semantic analysis.
  - 33. The computer storage medium of claim 32 wherein the semantic analysis is indicative of semantic intent expressed by the speech recognition results and wherein clustering comprises:
    - clustering sets of acoustic information that have corresponding speech recognition results with similar semantic intent.
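The ranking, consistency filtering, and covered/unrepresented labeling steps shared by the independent claims can be sketched as follows. This is an illustration under stated assumptions: `similarity`, `compactness`, and `review_queue` are hypothetical names, compactness is taken as the mean pair-wise Jaccard similarity (in the spirit of claim 11), and a cluster is labeled unrepresented when more than half of its results came from the second grammar, a majority rule that the claims do not specify.

```python
from itertools import combinations


def similarity(a, b):
    """Assumed pair-wise similarity between two recognition results."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    union = wa | wb
    if not union:
        return 1.0
    return len(wa & wb) / len(union)


def compactness(texts):
    """Mean pair-wise similarity; a single-utterance cluster counts as 1.0."""
    pairs = list(combinations(texts, 2))
    if not pairs:
        return 1.0
    return sum(similarity(a, b) for a, b in pairs) / len(pairs)


def review_queue(clusters, min_compactness):
    """Filter, label, and rank clusters.

    `clusters` maps a cluster name to a list of (text, grammar) pairs, where
    grammar is "application" or "second". Returns (name, size, label) tuples
    sorted by descending number of utterances.
    """
    kept = []
    for name, items in clusters.items():
        texts = [text for text, _ in items]
        if compactness(texts) < min_compactness:
            continue  # drop clusters made of unlike utterances
        from_second = sum(1 for _, grammar in items if grammar == "second")
        # Assumed majority rule for deciding which grammar produced the cluster.
        label = "unrepresented" if from_second > len(items) / 2 else "covered"
        kept.append((name, len(items), label))
    kept.sort(key=lambda row: row[1], reverse=True)
    return kept


clusters = {
    "balance": [("check balance", "application"),
                ("my balance", "application"),
                ("balance", "application")],
    "agent": [("talk to agent", "second"), ("agent please", "second")],
    "noise": [("uh hello", "second"), ("what time", "second")],
}
queue = review_queue(clusters, min_compactness=0.2)
```

Clusters labeled unrepresented are the candidates for a revision to the application grammar; the revision step itself is not shown.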

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Corporation
Inventors: Li, Xiao; Yu, Dong; Acero, Alejandro; Mahajan, Milind; Gunawardana, Asela J.
Primary Examiner(s): Dorvil, Richemond
Assistant Examiner(s): Godbold, Douglas
Application Number: US11/009,630
Publication Number: US 20060129397A1
Time in Patent Office: 1,831 Days
Field of Search: 704/9, 704/10, 704/242-245, 704/257, 704/270, 704/270.1
US Class Current: 704/244
CPC Class Codes:
G10L 15/1815 Semantic context, e.g. disa...
G10L 15/19 Grammatical context, e.g. d...
