PHONETIC DISTANCE MEASUREMENT SYSTEM AND RELATED METHODS

US 20100332230A1
Filed: 06/25/2009
Published: 12/30/2010
Est. Priority Date: 06/25/2009
Status: Active Grant

Abstract

Phonetic distances are empirically measured as a function of speech recognition engine recognition error rates. The error rates are determined by comparing a recognized speech file with a reference file. The phonetic distances can be normalized to earlier measurements. The phonetic distances/error rates can also be used to improve speech recognition engine grammar selection, as an aid in language training and evaluation, and in other applications.

23 Claims

1. A method of generating a phonetic distance matrix comprising:
- determining a plurality of error occurrences by comparing a recognized speech file with a reference file;
  
  determining a plurality of error rates corresponding to the plurality of error occurrences;
  
  determining a plurality of phonetic distances as a function of the plurality of error rates; and
  
  outputting a phonetic distance matrix based on the plurality of phonetic distances.
  - 2. The method of claim 1, wherein the plurality of error occurrences includes a plurality of substitution, insertion and deletion error occurrences and the plurality of error rates includes a corresponding plurality of substitution, insertion and deletion error rates.
  - 3. The method of claim 1, wherein determining the plurality of error rates includes dividing the plurality of error occurrences by a total number of phonetic element occurrences in the reference file.
  - 4. The method of claim 1, wherein determining the plurality of phonetic distances and outputting the phonetic distance matrix includes normalizing the phonetic distances to minimize a total separation between the phonetic distance matrix and an existing phonetic distance matrix.
  - 5. The method of claim 4, wherein normalizing the phonetic distances to minimize the total separation between the phonetic distance matrix and an existing phonetic distance matrix includes using a mapping function with three normalization coefficients.
  - 6. The method of claim 5, wherein the mapping function is:
    - $\text{Phonetic Distance}_{i,j} = \alpha_1 + \dfrac{\alpha_2}{\text{Error Rate}_{i,j} - \alpha_3}$;
      
      wherein $i$ and $j$ are indices of the phonetic elements, and $\alpha_1$, $\alpha_2$ and $\alpha_3$ are the three normalization coefficients.
  - 7. The method of claim 6, wherein the separation between the phonetic distance matrix and the existing phonetic distance matrix is defined as:
    - $L(\alpha_1, \alpha_2, \alpha_3) = \sum_{i,j} \left(\text{Existing Phonetic Distance}_{i,j} - \text{Phonetic Distance}_{i,j}\right)^2$.
  - 8. The method of claim 1, further comprising generating the recognized speech file by processing, with a speech recognition engine, an audio file of a speaker reading contents of the reference file.
  - 9. The method of claim 8, further comprising generating the audio file.
  - 10. The method of claim 1, wherein determining a plurality of error occurrences includes comparing a plurality of recognized speech and reference files.
  - 11. The method of claim 10, wherein the plurality of recognized speech files correspond to audio files of a plurality of different speakers.
  - 12. The method of claim 10, wherein the plurality of recognized speech files are generated by a plurality of different speech recognition engines.
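
The normalization in claims 4 to 7 can be illustrated with a small Python sketch. The mapping function and the separation L follow the claim text; everything else is an assumption made for illustration. In particular, the claims do not say how the minimum is found, so the grid search over candidate α₃ values combined with a closed-form least-squares fit of α₁ and α₂, and the names `phonetic_distance`, `separation`, `fit_linear` and `fit_coefficients`, are hypothetical.

```python
def phonetic_distance(error_rate, a1, a2, a3):
    """Mapping function of claim 6: a1 + a2 / (error_rate - a3)."""
    return a1 + a2 / (error_rate - a3)


def separation(error_rates, existing, a1, a2, a3):
    """Separation L of claim 7: sum of squared differences over all pairs."""
    return sum(
        (existing[pair] - phonetic_distance(rate, a1, a2, a3)) ** 2
        for pair, rate in error_rates.items()
    )


def fit_linear(xs, ys):
    """Closed-form ordinary least squares for y = a1 + a2 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct error rates")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a2 = sxy / sxx
    return mean_y - a2 * mean_x, a2


def fit_coefficients(error_rates, existing, a3_candidates):
    """Assumed fitting strategy (not from the claims): for each candidate
    a3 the model is linear in x = 1 / (rate - a3), so a1 and a2 have a
    closed-form solution; keep the candidate with the smallest separation.
    Candidates must differ from every error rate to avoid dividing by zero.
    Returns (loss, a1, a2, a3)."""
    pairs = sorted(error_rates)
    ys = [existing[p] for p in pairs]
    best = None
    for a3 in a3_candidates:
        xs = [1.0 / (error_rates[p] - a3) for p in pairs]
        a1, a2 = fit_linear(xs, ys)
        loss = separation(error_rates, existing, a1, a2, a3)
        if best is None or loss < best[0]:
            best = (loss, a1, a2, a3)
    return best
```

Here `error_rates` and `existing` are dictionaries keyed by phonetic element pair, for example `("B", "P")`. A general-purpose nonlinear optimizer could replace the grid search; the sketch only shows that the objective of claim 7 is an ordinary least-squares problem once α₃ is fixed.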

13. A phonetic distance measurement system comprising:
- a reference file;
  
  a recognized speech file;
  
  a comparison module configured to determine a plurality of error occurrences by comparing the recognized speech file and the reference file;
  
  an error rate module configured to determine a plurality of error rates corresponding to the plurality of error occurrences; and
  
  a measurement module configured to determine a plurality of phonetic distances as a function of the plurality of error rates.
  - 14. The system of claim 13, further comprising a dictionary, wherein the comparison module is further configured to access the dictionary to identify phonetic elements in the reference file and recognized speech file prior to determining the plurality of error occurrences.
  - 15. The system of claim 13, wherein the plurality of error occurrences the comparison module is configured to determine include substitution error occurrences, insertion error occurrences and deletion error occurrences.
  - 16. The system of claim 15, wherein the comparison module is further configured to identify the plurality of substitution error occurrences by corresponding phonetic element pairs and the insertion and deletion error occurrences by corresponding phonetic elements.
  - 17. The system of claim 13, wherein the measurement module is further configured to normalize the phonetic distances.
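
As a rough illustration of the comparison and error rate modules of claims 13 to 16, the Python sketch below aligns a reference sequence of phonetic elements with a recognized one and tallies the three error types. The claims do not name an alignment technique, so the edit-distance (Levenshtein) alignment used here is an assumption, as are the function names `count_errors` and `error_rates` and the phoneme labels in the usage note. Substitutions are keyed by phonetic element pair and insertions and deletions by single element, as in claim 16; rates are obtained by dividing by the number of phonetic elements in the reference, as in claim 3.

```python
from collections import Counter


def count_errors(reference, recognized):
    """Align two phonetic element sequences and count error occurrences.

    Returns (substitutions, insertions, deletions) as Counters.
    Substitutions are keyed by (reference element, recognized element).
    """
    n, m = len(reference), len(recognized)
    # cost[i][j] = edit distance between reference[:i] and recognized[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diff = 0 if reference[i - 1] == recognized[j - 1] else 1
            cost[i][j] = min(
                cost[i - 1][j - 1] + diff,  # match or substitution
                cost[i - 1][j] + 1,         # deletion
                cost[i][j - 1] + 1,         # insertion
            )

    subs, ins, dels = Counter(), Counter(), Counter()
    i, j = n, m
    # Walk back through the table to recover one optimal alignment.
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            diff = 0 if reference[i - 1] == recognized[j - 1] else 1
            if cost[i][j] == cost[i - 1][j - 1] + diff:
                if diff:
                    subs[(reference[i - 1], recognized[j - 1])] += 1
                i, j = i - 1, j - 1
                continue
        if i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            dels[reference[i - 1]] += 1
            i -= 1
        else:
            ins[recognized[j - 1]] += 1
            j -= 1
    return subs, ins, dels


def error_rates(counts, reference):
    """Divide each occurrence count by the number of reference elements."""
    total = len(reference)
    return {key: count / total for key, count in counts.items()}
```

For instance, comparing the reference `["K", "AE", "T", "S"]` with the recognized sequence `["K", "EH", "T"]` yields one substitution for the pair `("AE", "EH")` and one deletion of `"S"`. When several alignments have the same cost, the backtrace prefers a match or substitution over a deletion, and a deletion over an insertion; a different tie-breaking rule would be equally valid.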

18. A grammar development method for a speech recognition engine, the method comprising:
- generating a plurality of recognized speech files by processing a plurality of audio files of recorded speech with the speech recognition engine;
  
  determining a plurality of substitution, insertion and deletion error occurrences by comparing the plurality of recognized speech files with a plurality of corresponding reference files;
  
  determining a plurality of substitution, insertion and deletion error rates from the plurality of substitution, insertion and deletion error occurrences; and
  
  editing the grammar based on the plurality of error rates.
  - 19. The method of claim 18, wherein all of the plurality of audio files of recorded speech correspond to a distinct group of speakers of a language.

20. A language training and evaluation method comprising:
- generating an audio file of a speaker;
  
  generating a recognized speech file by processing the audio file with a speech recognition engine;
  
  determining a plurality of substitution error occurrences for a plurality of phonetic element pairs by comparing the recognized speech file with a reference file corresponding to the recognized speech file;
  
  determining a plurality of error rates based on the plurality of substitution error occurrences;
  
  comparing the plurality of error rates with optimal values; and
  
  identifying phonetic element pairs requiring improvement based on a set of results of comparing the plurality of error rates with the optimal values.
  - 21. The method of claim 20, wherein identifying phonetic element pairs requiring improvement includes displaying the results of the comparison to the speaker.
  - 22. The method of claim 20, wherein the determining a plurality of error rates based on the plurality of substitution error occurrences and comparing the plurality of error rates with optimal values includes determining a plurality of phonetic distances from the plurality of error rates.
  - 23. The method of claim 20, further comprising:
    - generating a later audio file of the speaker reading a text;
      
      generating a later recognized speech file by processing the later audio file with the speech recognition engine;
      
      determining a later plurality of substitution error occurrences for the plurality of phonetic element pairs by comparing the later recognized speech file with a reference file corresponding to the later recognized speech file;
      
      determining a later plurality of error rates based on the later plurality of substitution error occurrences;
      
      comparing the later plurality of error rates with the optimal values;
      
      identifying phonetic element pairs requiring improvement based on a later set of results of comparing the later plurality of error rates with the optimal values; and
      
      identifying a trend based on comparing the set of results with the later set of results.
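
The comparison and trend steps of claims 20 and 23 can be sketched in Python as follows. The claims do not define what "requiring improvement" or "trend" mean numerically, so this sketch assumes that a pair needs improvement when its substitution error rate exceeds the optimal value by more than a tolerance, and that the trend is the direction of change in that excess between two sessions. The names `pairs_needing_improvement` and `trend` and the `tolerance` parameter are hypothetical.

```python
def pairs_needing_improvement(error_rates, optimal, tolerance=0.0):
    """Return {pair: excess} for pairs whose rate exceeds the optimal value.

    Assumption: a pair with no listed optimal value has an optimum of 0.0.
    """
    flagged = {}
    for pair, rate in error_rates.items():
        excess = rate - optimal.get(pair, 0.0)
        if excess > tolerance:
            flagged[pair] = excess
    return flagged


def trend(earlier_results, later_results):
    """Label each pair by how its excess changed between two sessions.

    A pair missing from a result set is treated as having no excess.
    """
    labels = {}
    for pair in set(earlier_results) | set(later_results):
        before = earlier_results.get(pair, 0.0)
        after = later_results.get(pair, 0.0)
        if after < before:
            labels[pair] = "improving"
        elif after > before:
            labels[pair] = "worsening"
        else:
            labels[pair] = "unchanged"
    return labels
```

Each call to `pairs_needing_improvement` produces one "set of results" in the sense of claim 20; passing the earlier and later sets to `trend` gives a per-pair label that could be displayed to the speaker as in claim 21.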

Current Assignee
Adacel Systems, Inc. (Adacel Technologies Ltd.)
Original Assignee
Adacel Systems, Inc. (Adacel Technologies Ltd.)
Inventors
Shu, Chang-Qing

Granted Patent

US 9,659,559 B2
Field of Search
US Class Current

704/253
CPC Class Codes

G06F 40/56   Natural language generation

G10L 15/01   Assessment or evaluation of...

G10L 15/063   Training

G10L 15/142   Hidden Markov Models [HMMs]

G10L 15/144   Training of HMMs

G10L 15/187   Phonemic context, e.g. pron...

G10L 15/22   Procedures used during a sp...

G10L 2015/025   Phonemes, fenemes or fenone...

G10L 2015/221   Announcement of recognition...
