Method of assessing degree of acoustic confusability, and system therefor

US 7,013,276 B2
Filed: 10/05/2001
Issued: 03/14/2006
Est. Priority Date: 10/05/2001
Status: Expired due to Term

Abstract

Predicting speech recognizer confusion where utterances can be represented by any combination of text form and audio file. The utterances are represented with an intermediate representation that directly reflects the acoustic characteristics of the utterances. Text representations of the utterances can be directly used for predicting confusability without access to audio file examples of the utterances. First embodiment: two text utterances are represented with strings of phonemes, and one of the strings of phonemes is transformed into the other string of phonemes, with the least cost of the transformation serving as a confusability measure. Second embodiment: two utterances are represented with an intermediate representation of sequences of acoustic events based on phonetic capabilities of speakers obtained from acoustic signals of the utterances, and the acoustic events are compared. Confusability of the utterances is predicted according to a formula 2K/(T), where K is a number of matched acoustic events and T is a total number of acoustic events.

33 Claims

1. A process comprising:
- representing each text form of two spoken phrases with a first and second strings of phonemes, respectively;
  
  assigning costs to transform the first string to the second string of phonemes according to a speech recognizer confusability of arbitrary pairs of audio files; and
  
  calculating an acoustic confusability measure as a least cost, according to the cost assigning, to transform the first string of phonemes to the second string of phonemes, thereby predicting when a speech recognizer will confuse the two spoken phrases by directly using text forms of the spoken phrases.
- Dependent claims (2, 3, 4):
  - 2. The process according to claim 1, wherein a dynamic programming minimum string edit distance algorithm computes the least cost.
  - 3. The process according to claim 1, further comprising using different cost values for each operation to transform depending on the characteristics of phonemes in the strings.
  - 4. The process according to claim 3, further comprising comparing the least cost to a threshold value to predict whether the speech recognizer will confuse the spoken phrases corresponding to the text forms.
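
Claims 1 to 4 together describe a least-cost transformation between two phoneme strings, computed by dynamic programming and compared to a threshold. The following is a minimal sketch of that idea, not the patented implementation. The phoneme labels, the `VOWELS` set, the cost values in `sub_cost`, and the function names are all hypothetical placeholders; the claims state that costs are assigned according to speech recognizer confusability of pairs of audio files, which this sketch does not attempt to reproduce.

```python
from typing import Callable, Sequence

# Hypothetical subset of vowel labels, used only to vary the substitution cost.
VOWELS = {"AA", "AE", "AH", "AO", "EH", "IH", "IY", "OW", "UW"}


def sub_cost(a: str, b: str) -> float:
    """Illustrative substitution cost: cheaper within the same broad class."""
    if a == b:
        return 0.0
    same_class = (a in VOWELS) == (b in VOWELS)
    return 0.5 if same_class else 1.0


def least_cost(
    src: Sequence[str],
    dst: Sequence[str],
    ins_cost: float = 1.0,
    del_cost: float = 1.0,
    sub: Callable[[str, str], float] = sub_cost,
) -> float:
    """Minimum total cost to transform src into dst (weighted edit distance)."""
    rows, cols = len(src) + 1, len(dst) + 1
    d = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        d[i][0] = d[i - 1][0] + del_cost  # delete every source phoneme
    for j in range(1, cols):
        d[0][j] = d[0][j - 1] + ins_cost  # insert every target phoneme
    for i in range(1, rows):
        for j in range(1, cols):
            d[i][j] = min(
                d[i - 1][j] + del_cost,
                d[i][j - 1] + ins_cost,
                d[i - 1][j - 1] + sub(src[i - 1], dst[j - 1]),
            )
    return d[-1][-1]


def is_confusable(src: Sequence[str], dst: Sequence[str], threshold: float) -> bool:
    """Predict confusion when the least cost does not exceed the threshold."""
    return least_cost(src, dst) <= threshold
```

Under these placeholder costs, a low least cost means the two phrases are close in phoneme space, so a pair is flagged as confusable when its cost falls at or below the threshold. Whether the comparison should be inclusive is an assumption of this sketch.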

5. A process comprising:
- representing text phrases with corresponding strings of phonemes;
  
  calculating an acoustic confusability measure as a least cost to transform one of the strings of phonemes into another of the strings of phonemes;
  
  determining and adjusting, using a speech recognizer, a threshold value according to confusability of example pairs of spoken phrases represented as audio files; and
  
  comparing the least cost to the threshold value to determine acoustic confusability of the text phrases when spoken to a speech recognizer.
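
Claim 5 refers to determining and adjusting a threshold from example pairs whose confusability has been measured with a speech recognizer, but the claim text does not say how the threshold is chosen. One possible approach, shown here purely as an assumption, is to pick the candidate threshold that agrees with the largest number of labelled examples. The function name `calibrate_threshold` and the example data format are hypothetical.

```python
from typing import Iterable, Tuple


def calibrate_threshold(examples: Iterable[Tuple[float, bool]]) -> float:
    """Pick a threshold from (least_cost, recognizer_confused) examples.

    A pair is predicted confusable when its cost is <= the threshold.
    The threshold that matches the most recognizer outcomes is returned;
    ties go to the smallest candidate.
    """
    data = list(examples)
    if not data:
        raise ValueError("at least one example pair is required")
    candidates = sorted({cost for cost, _ in data})
    best_t, best_correct = candidates[0], -1
    for t in candidates:
        # Count examples where the prediction agrees with the recognizer.
        correct = sum((cost <= t) == confused for cost, confused in data)
        if correct > best_correct:
            best_t, best_correct = t, correct
    return best_t
```

Re-running the calibration as more example pairs are collected would be one way to read the "adjusting" step, though the claim does not confirm this.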

6. A process, comprising:
- representing text phrases with corresponding strings of phonemes;
  
  calculating a distance between the strings of phonemes using a string edit distance algorithm as a least cost to transform according to a series of transformations one of the strings of phonemes into another of the strings of phonemes;
  
  determining and adjusting, using a speech recognizer, a threshold value according to confusability of example pairs of spoken phrases represented as audio files; and
  
  comparing the least cost to the threshold value to predict when a speech recognizer will confuse spoken phrases of the text phrases.
- Dependent claims (7, 8):
  - 7. The process according to claim 6, wherein each transformation is assigned a cost depending on characteristics of phonemes.
  - 8. The process according to claim 6, wherein the transformations comprise insertion of a phoneme, substitution of a phoneme with another phoneme and/or deletion of a phoneme.
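
Claim 8 names the three transformations (insertion, substitution, deletion) that make up the series in claim 6. As an illustration only, the sketch below recovers one such series by backtracking through a unit-cost edit distance table. The unit costs and the function name `edit_script` are assumptions; claim 7 contemplates costs that depend on phoneme characteristics instead.

```python
from typing import List, Optional, Sequence, Tuple

Op = Tuple[str, Optional[str], Optional[str]]


def edit_script(src: Sequence[str], dst: Sequence[str]) -> List[Op]:
    """Return one least-cost series of edits turning src into dst (unit costs)."""
    n, m = len(src), len(dst)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i
    for j in range(1, m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if src[i - 1] == dst[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + sub)

    # Walk back from the bottom-right cell, recording each non-match step.
    ops: List[Op] = []
    i, j = n, m
    while i > 0 or j > 0:
        sub = 0 if (i > 0 and j > 0 and src[i - 1] == dst[j - 1]) else 1
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + sub:
            if sub:
                ops.append(("substitute", src[i - 1], dst[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            ops.append(("delete", src[i - 1], None))
            i -= 1
        else:
            ops.append(("insert", None, dst[j - 1]))
            j -= 1
    ops.reverse()
    return ops
```

The number of recorded operations equals the unit-cost edit distance, since matching phonemes contribute no operation.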

9. A process comprising calculating an acoustic confusability measure as a least cost to transform a first string of phonemes corresponding to a first text phrase to a second string of phonemes corresponding to a second text phrase, the process further comprising:
- determining confusability of example pairs of audio files using a speech recognizer;
  
  calculating a confusability threshold value based upon the determined confusability, the threshold value compared to the least cost to determine when a speech recognizer will confuse spoken phrases of the two text phrases.

10. A computer system, comprising:
- a processor programmed to control the computer system according to a process comprising;
  
  representing each text form of two spoken phrases with a first and second strings of phonemes, respectively,
  
  assigning costs to transform the first string to the second string of phonemes according to a speech recognizer confusability of arbitrary pairs of audio files, and
  
  calculating an acoustic confusability measure as a least cost, according to the cost assigning, to transform the first string of phonemes to the second string of phonemes, thereby predicting when a speech recognizer will confuse two spoken phrases by directly using text representations of the spoken phrases.

11. A computer system, comprising:
- a processor programmed to represent text phrases with corresponding strings of phonemes, to calculate an acoustic confusability measure as a least cost to transform one of the strings of phonemes into another of the strings of phonemes, to determine and adjust, using a speech recognizer, a threshold value according to confusability of example pairs of spoken phrases represented as audio files, and to compare the least cost to the threshold value to determine acoustic confusability of the text phrases when spoken to a speech recognizer.

12. A computer system, comprising:
- a processor programmed to represent text phrases with corresponding strings of phonemes, to calculate a distance between the strings of phonemes using a string edit distance algorithm as a least cost to transform according to a series of transformations one of the strings of phonemes into another of the strings of phonemes, to determine and adjust, using a speech recognizer, a threshold value according to confusability of example pairs of spoken phrases represented as audio files, and to compare the least cost to the threshold value to predict when a speech recognizer will confuse spoken forms of the text phrases.
- Dependent claims (13):
  - 13. The computer system according to claim 12, wherein each transformation is assigned a cost depending on characteristics of phonemes.

14. A computer system programmed to calculate an acoustic confusability measure as a least cost to transform a first string of phonemes corresponding to a first text phrase to a second string of phonemes corresponding to a second text phrase, the system further comprising:
- a processor programmed to determine confusability of example pairs of audio files using a speech recognizer, and to calculate a confusability threshold value based upon the determined confusability, the threshold value compared to the least cost to determine when a speech recognizer will confuse spoken form of the two text phrases.

15. A process of predicting when a speech recognizer will confuse two spoken phrases where two utterances corresponding to the spoken phrases in any combination of text and audio file are available to a confusability prediction algorithm, comprising:
- representing each utterance with corresponding sequences of acoustic characteristics based on phonetic capabilities of speakers obtained from acoustic signals of the utterances;
  
  aligning the sequences of acoustic characteristics;
  
  comparing the sequences of acoustic characteristics; and
  
  calculating a metric of acoustic confusability according to a formula 2K/(T), wherein K is a number of acoustic characteristics that match from the comparing and T is a total number of acoustic characteristics in the sequences of acoustic characteristics for both utterances.
- Dependent claims (16, 17):
  - 16. The process according to claim 15, wherein the spoken phrases are in any language.
  - 17. The process according to claim 16, wherein the metric of acoustic confusability is used in telephony applications.
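
Claim 15 computes the metric 2K/(T) after aligning and comparing two sequences of acoustic characteristics. The claim does not specify the alignment method, so the sketch below substitutes a longest-common-subsequence alignment as a stand-in and counts its length as K. The event labels in the usage note and the function name `confusability_metric` are hypothetical.

```python
from typing import Sequence


def confusability_metric(a: Sequence[str], b: Sequence[str]) -> float:
    """Compute 2K/T, with K from an LCS alignment and T = len(a) + len(b)."""
    n, m = len(a), len(b)
    total = n + m
    if total == 0:
        # The formula is undefined for two empty sequences; 0.0 is an
        # arbitrary choice made for this sketch.
        return 0.0
    # lcs[i][j] = number of matched items aligning a[:i] with b[:j].
    lcs = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                lcs[i][j] = lcs[i - 1][j - 1] + 1
            else:
                lcs[i][j] = max(lcs[i - 1][j], lcs[i][j - 1])
    k = lcs[n][m]
    return 2 * k / total
```

For example, `confusability_metric(["stop", "vowel", "nasal"], ["stop", "vowel", "fricative"])` matches two of the three events in each sequence, giving 2 × 2 / 6. With this alignment the value ranges from 0 for no matches to 1 for identical sequences.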

18. A process comprising:
- representing two utterances that are in any combination of a text form and an audio file with corresponding sequences of acoustic characteristics based on phonetic capabilities of speakers obtained from acoustic signals of the utterances;
  
  aligning the sequences of acoustic characteristics;
  
  comparing the sequences of acoustic characteristics; and
  
  calculating a metric of acoustic confusability according to the comparing.
- Dependent claims (19):
  - 19. The process according to claim 18, wherein the utterances are in any language.

20. A process comprising:
- representing two utterances with corresponding sequences of acoustic characteristics based on phonetic capabilities of speakers obtained from acoustic signals of the utterances;
  
  aligning the sequences of acoustic characteristics;
  
  comparing the sequences of acoustic characteristics; and
  
  calculating a metric of acoustic confusability according to a formula 2K/(T), wherein K is a number of acoustic characteristics that match from the comparing and T is a total number of acoustic characteristics in the sequences of acoustic characteristics for both utterances.
- Dependent claims (21):
  - 21. The process according to claim 20, wherein the utterances are in any language.

22. A process, comprising:
- representing a first phrase and a second phrase, respectively, with a corresponding sequence of acoustic features and acoustic measures obtained from a dictionary of acoustic features and a database of prerecorded audio files;
  
  aligning the sequence of acoustic features and acoustic measures of the first phrase with the sequence of acoustic features and acoustic measures of the second phrase;
  
  comparing the aligned first phrase and second phrase sequences; and
  
  calculating a metric of acoustic confusability according to the comparing according to a formula 2K/(N+M), wherein K is a number of acoustic features and acoustic measures that match from the comparing and N is a number of acoustic features and acoustic measures in the first phrase and M is a number of acoustic features and acoustic measures in the second phrase.
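
As a worked illustration of the formula in claim 22, with numbers chosen only for the example: suppose the first phrase has N = 4 features and measures, the second has M = 6, and K = 3 of them match after alignment. The bound on the metric assumes each match pairs one item from each sequence, which the claim does not state explicitly.

```latex
\[
\frac{2K}{N+M} = \frac{2 \cdot 3}{4 + 6} = \frac{6}{10} = 0.6,
\qquad
0 \le \frac{2K}{N+M} \le 1 \quad \text{since } K \le \min(N, M).
\]
```

Reading T in claims 15, 20 and 23 as the combined count for both utterances, T = N + M, the two forms 2K/(T) and 2K/(N+M) would give the same value.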

23. A computer system, comprising:
- a processor programmed to predict when a speech recognizer will confuse two spoken phrases where two phrases in any combination of text and spoken form are available to a confusability prediction algorithm, according to a process comprising;
  
  representing each phrase with corresponding sequences of acoustic characteristics based on phonetic capabilities of speakers obtained from acoustic signals of the phrases;
  
  aligning the sequences of acoustic characteristics;
  
  comparing the sequences of acoustic characteristics; and
  
  calculating a metric of acoustic confusability according to a formula 2K/(T), wherein K is a number of acoustic characteristics that match from the comparing and T is a total number of acoustic characteristics in the sequences of acoustic characteristics for both phrases.
- Dependent claims (24, 25):
  - 24. The computer system according to claim 23, wherein the phrases are in any language.
  - 25. The computer system according to claim 24, wherein the metric of acoustic confusability is used in telephony applications.

26. A computer system, comprising:
- a processor programmed to represent two spoken phrases that are in any combination of a text form and an audio file with corresponding sequences of acoustic characteristics based on phonetic capabilities of speakers obtained from acoustic signals of the phrases, to align the sequences of acoustic characteristics, to compare the sequences of acoustic characteristics, and to calculate a metric of acoustic confusability according to the comparing.
- Dependent claims (27):
  - 27. The computer system according to claim 26, wherein the spoken phrases are in any language.

28. A computer system, comprising:
- a processor programmed to represent a first phrase and a second phrase, respectively, with a corresponding sequence of acoustic features and acoustic measures obtained from a dictionary of acoustic features and a database of prerecorded audio files, to align the sequence of acoustic features and acoustic measures of the first phrase with the sequence of acoustic features and acoustic measures of the second phrase, to compare the aligned first phrase and second phrase sequences, and to calculate a metric of acoustic confusability according to the comparing according to a formula 2K/(N+M), wherein K is a number of acoustic features and acoustic measures that match from the comparing and N is a number of acoustic features and acoustic measures in the first phrase and M is a number of acoustic features and acoustic measures in the second phrase.

29. A process, comprising:
- receiving two utterances in any combination of a text form and an audio file, the audio file having a speech signal recorded by any speaker and under any acoustic condition;
  
  representing the two utterances with an intermediate representation that directly reflects the salient acoustic characteristics of the two utterances when spoken; and
  
  predicting when a speech recognizer will confuse the two utterances when spoken by using the intermediate representation of the two utterances.
- Dependent claims (30):
  - 30. The process according to claim 29, wherein results from the predicting is used in a telephony application.

31. A computer system, comprising:
- a processor programmed to receive two phrases in any combination of text and spoken form, the spoken form spoken by any speaker and under any acoustic condition, to represent the two phrases with an intermediate representation that directly reflects the salient acoustic characteristics of the two phrases when spoken, and to predict when a speech recognizer will confuse the two phrases when spoken by using the intermediate representation of the two phrases.
- Dependent claims (32):
  - 32. The system according to claim 31, wherein results from the prediction is used in a telephony application.

33. A computer program, embodied on a computer-readable medium, comprising:
- an input segment receiving two utterances in any combination of a text form and an audio file, the audio file having speech signals recorded by any speaker and under any acoustic condition;
  
  a transformation segment representing the two utterances with an intermediate representation that directly reflects the salient acoustic characteristics of the two utterances when spoken; and
  
  a predicting segment predicting acoustic confusability of the two utterances when spoken to a speech recognizer using the intermediate representation of the two utterances.

Current Assignee: Mavenir Systems Incorporated (Mavenir Group Holdings Ltd.)
Original Assignee: Comverse Incorporated (Mavenir Group Holdings Ltd.)
Inventors: Denenberg, Lawrence A.; Bickley, Corine A.
Primary Examiner(s): Abebe, Daniel
Application Number: US09/971,012
Publication Number: US 20030069729A1
Time in Patent Office: 1,621 Days
Field of Search: 704/255, 704/256, 704/257
US Class Current: 704/255
CPC Class Codes: G10L 15/08 Speech classification or se...; G10L 15/10 using distance or distortio...
