Systems and methods for comparing speech elements

US 7,752,045 B2
Filed: 10/07/2002
Issued: 07/06/2010
Est. Priority Date: 10/07/2002
Status: Active Grant


1 Assignment

0 Petitions

Abstract

A method for comparing a first audio data source with a plurality of audio data sources, wherein the first audio data source has an utterance spoken by a first person and the plurality of audio data sources have the same utterance spoken by a second person. The method includes performing a speech recognition function on the first audio data source to isolate at least one element of the first audio data source. The method also includes comparing the isolated element with a corresponding element in the plurality of audio data sources and determining whether the utterance spoken by the first person contained an error based on the comparison.
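The abstract does not say how an isolated element is matched to its "corresponding element" in the other recordings. Every recording is force-aligned against the same orthographic text, so one plausible reading is that each alignment yields the same sequence of units, and an element can be identified by its position and label. The sketch below assumes the aligner output is a list of (label, start, end) segments in seconds. `Segment`, `isolate_elements`, and `corresponding_elements` are hypothetical names and are not taken from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One element (e.g. a phone or word) located by a forced aligner.

    Hypothetical representation: the patent does not specify an output format.
    """
    label: str    # unit from the orthographic text (or its phone expansion)
    start: float  # seconds
    end: float    # seconds

    @property
    def duration(self) -> float:
        return self.end - self.start


def isolate_elements(alignment: list[Segment]) -> dict[tuple[int, str], Segment]:
    """Index each aligned element by (position, label).

    Every recording is aligned against the same text, so the same key is
    assumed to name the same element in learner and native-speaker audio.
    """
    return {(i, seg.label): seg for i, seg in enumerate(alignment)}


def corresponding_elements(
    key: tuple[int, str], references: list[dict[tuple[int, str], Segment]]
) -> list[Segment]:
    """Collect the matching element from each reference (native) recording."""
    return [ref[key] for ref in references if key in ref]
```

For example, with the learner's alignment of "cat" indexed by `isolate_elements`, `corresponding_elements((1, "ae"), natives)` returns the vowel segment from every native recording whose alignment contains that key; recordings that were aligned differently are simply skipped.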


30 Claims

1. A method for comparing a first audio data source with a plurality of audio data sources, wherein the first audio data source has an utterance spoken by a first person and the plurality of audio data sources have the same utterance spoken a plurality of different times by a second person, the method comprising:
- performing a forced alignment speech recognition function on the first audio data source to isolate at least one element of the first audio data source, the forced alignment speech recognition function using an orthographic text corresponding to the first audio data source;
  
  comparing the isolated element with a corresponding element in the plurality of audio data sources; and
  
  determining whether the utterance spoken by the first person contained an error based on the comparison.
- Dependent claims (2, 3, 4, 5, 6, 21, 22):
  - 2. The method of claim 1, wherein comparing the isolated element includes comparing an isolated duration element to a corresponding duration element in the plurality of audio data sources.
  - 3. The method of claim 1, wherein comparing the isolated element includes comparing an isolated recognition element to a corresponding recognition element in the plurality of audio data sources.
  - 4. The method of claim 1, further comprising creating the first audio data source from an input voice signal.
  - 5. The method of claim 1, wherein the utterance spoken by the first person is spoken in a non-native language and the utterance spoken by the second person is spoken in the same language, wherein the language is native to the second person.
  - 6. The method of claim 1, further comprising outputting an indication of one of a bad pronunciation of the utterance by the first person and a good pronunciation of the utterance by the first person.
  - 21. The method of claim 1, further comprising sending the orthographic text to a speech recognizer.
  - 22. The method of claim 1, wherein the plurality of audio data sources is comprised of the same utterance spoken a plurality of different times by a plurality of different persons.
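Claims 2, 3, and 6 name duration elements and recognition elements as things to compare and a good/bad pronunciation indication as an output, but the claims do not fix a decision rule. A minimal sketch, assuming a z-score test on durations and a rank test on per-element recognition scores (higher score meaning a better acoustic match, for example a log-likelihood reported by the aligner). The thresholds `max_z` and `min_fraction` and all function names are hypothetical.

```python
import math
import statistics


def duration_is_error(learner: float, natives: list[float], max_z: float = 2.0) -> bool:
    """Flag a duration more than max_z native standard deviations from the native mean."""
    mean = statistics.mean(natives)
    sd = statistics.stdev(natives)  # requires at least two native renditions
    if sd == 0:
        # All native renditions agree exactly; any deviation counts as an error.
        return not math.isclose(learner, mean)
    return abs(learner - mean) / sd > max_z


def score_is_error(learner: float, natives: list[float], min_fraction: float = 0.1) -> bool:
    """Flag a score that matches or exceeds fewer than min_fraction of the native scores."""
    at_or_below = sum(1 for s in natives if s <= learner)
    return at_or_below / len(natives) < min_fraction


def pronunciation_label(
    learner_duration: float,
    learner_score: float,
    native_durations: list[float],
    native_scores: list[float],
) -> str:
    """Return a good/bad indication (cf. claim 6) for one isolated element."""
    error = duration_is_error(learner_duration, native_durations) or score_is_error(
        learner_score, native_scores
    )
    return "bad pronunciation" if error else "good pronunciation"
```

Comparing against several native renditions, as claim 1 requires, is what makes a spread-based test like this possible; with a single reference recording only a fixed tolerance could be applied.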

7. An apparatus, comprising:
- an audio capture module for receiving a first spoken utterance, wherein the first spoken utterance is spoken in a language that is not native to a speaker of the first spoken utterance;
  
  a speech recognition module for performing a forced alignment speech recognition function on the first spoken utterance to isolate at least one element of the first spoken utterance, the forced alignment speech recognition function using an orthographic text corresponding to the first spoken utterance; and
  
  a comparison module for comparing the isolated element of the first spoken utterance with a corresponding element in audio data from a plurality of audio data sources and for determining whether the first spoken utterance contained an error based on the comparison, wherein the audio data from a plurality of audio data sources are spoken a plurality of different times in the language, wherein the language is native to each speaker of the audio data from a plurality of audio data sources, and wherein the first spoken utterance and the audio data from a plurality of audio data sources are substantially similar in content.
- Dependent claims (8, 9, 10, 11, 23, 24):
  - 8. The apparatus of claim 7, further comprising an audio database for storing the first spoken utterance.
  - 9. The apparatus of claim 7, further comprising a native speaker database for storing the audio data from a plurality of audio data sources.
  - 10. The apparatus of claim 7, further comprising a display module in communication with the comparison module.
  - 11. The apparatus of claim 7, further comprising a speech recognition interface for interfacing with a speech recognizer.
  - 23. The apparatus of claim 7, further comprising sending the orthographic text to a speech recognizer.
  - 24. The apparatus of claim 7, wherein the plurality of audio data sources are spoken by a plurality of different persons.
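Claim 7 divides the apparatus into capture, speech recognition, and comparison modules, and claims 9 and 11 add a native speaker database and an interface to a speech recognizer. One way to sketch that division, assuming the external recognizer exposes a forced-alignment call returning (label, start, end) tuples. `SpeechRecognizer.force_align` and the other class names are hypothetical; no specific recognizer API is implied, and the audio capture and display modules are left out.

```python
from dataclasses import dataclass, field
from statistics import mean, stdev
from typing import Protocol

# (element label, start seconds, end seconds), as produced by forced alignment
Alignment = list[tuple[str, float, float]]


class SpeechRecognizer(Protocol):
    """Hypothetical interface to an external recognizer (cf. claim 11)."""

    def force_align(self, audio: bytes, orthographic_text: str) -> Alignment: ...


@dataclass
class NativeSpeakerDatabase:
    """Native renditions of each prompt, stored as alignments (cf. claim 9)."""

    alignments: dict[str, list[Alignment]] = field(default_factory=dict)

    def add(self, text: str, alignment: Alignment) -> None:
        self.alignments.setdefault(text, []).append(alignment)

    def get(self, text: str) -> list[Alignment]:
        return self.alignments.get(text, [])


@dataclass
class SpeechRecognitionModule:
    recognizer: SpeechRecognizer

    def isolate(self, audio: bytes, text: str) -> Alignment:
        # Forced alignment: the recognizer is told what was said and only
        # has to locate each element in time.
        return self.recognizer.force_align(audio, text)


@dataclass
class ComparisonModule:
    natives: NativeSpeakerDatabase
    max_z: float = 2.0  # hypothetical tolerance, in native standard deviations

    def find_errors(self, text: str, learner: Alignment) -> list[str]:
        """Return labels of learner elements whose duration is atypical."""
        references = self.natives.get(text)
        errors = []
        for i, (label, start, end) in enumerate(learner):
            native = [
                ref[i][2] - ref[i][1]
                for ref in references
                if len(ref) > i and ref[i][0] == label
            ]
            if len(native) < 2:
                continue  # too few native renditions to estimate a spread
            mu, sd = mean(native), stdev(native)
            if sd > 0 and abs((end - start) - mu) / sd > self.max_z:
                errors.append(label)
        return errors
```

Keeping the recognizer behind a small interface mirrors the separate speech recognition interface of claim 11: the comparison logic only sees alignments, so a different recognizer can be swapped in without touching it.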

12. A system, comprising:
- an audio input device; and
  
  a comparison device in communication with the audio input device, wherein the comparison device includes:
  
  an audio capture module for receiving a first spoken utterance, wherein the first spoken utterance is spoken in a language that is not native to a speaker of the first spoken utterance;
  
  a speech recognition module for performing a forced alignment speech recognition function on the first spoken utterance to isolate at least one element of the first spoken utterance, the forced alignment speech recognition function using an orthographic text corresponding to the first spoken utterance; and
  
  a comparison module for comparing the isolated element of the first spoken utterance with a corresponding element in audio data from a plurality of audio data sources and for determining whether the first spoken utterance contained an error based on the comparison, wherein the audio data from a plurality of audio data sources are spoken a plurality of different times in the language, wherein the language is native to each speaker of the audio data from a plurality of audio data sources, and wherein the first spoken utterance and the audio data from a plurality of audio data sources are substantially similar in content.
- Dependent claims (13, 14, 15, 16, 17, 18, 25, 26):
  - 13. The system of claim 12, further comprising a speech recognizer in communication with the comparison device.
  - 14. The system of claim 12, further comprising:
    - an input device; and
      
      a display device.
  - 15. The system of claim 12, wherein the comparison device further comprises an audio database for storing the first spoken utterance.
  - 16. The system of claim 12, wherein the comparison device further comprises a native speaker database for storing the audio data from a plurality of data sources.
  - 17. The system of claim 12, wherein the comparison device further comprises a display module in communication with the comparison module.
  - 18. The system of claim 12, wherein the comparison device further comprises a speech recognition interface in communication with the speech recognizer.
  - 25. The system of claim 12, further comprising sending the orthographic text to the speech recognizer.
  - 26. The system of claim 12, wherein the plurality of audio data sources have the same utterance spoken a plurality of different times by a plurality of different persons.

19. An apparatus, comprising:
- means for performing a forced alignment speech recognition function on a first audio data source to isolate at least one element of the first audio data source, wherein the first audio data source has an utterance spoken by a first person, the means for performing the forced alignment speech recognition function using an orthographic text corresponding to the utterance spoken by the first person;
  
  means for comparing the isolated element with a corresponding element in a plurality of audio data sources, wherein the plurality of audio data sources have an utterance substantially similar in content to the utterance spoken by the first person spoken a plurality of different times by a second person; and
  
  means for determining whether the utterance spoken by the first person contained an error based on the comparison.
- Dependent claims (27, 28):
  - 27. The apparatus of claim 19, further comprising a speech recognizer, wherein the orthographic text is sent to the speech recognizer.
  - 28. The apparatus of claim 19, wherein the plurality of audio data sources is comprised of an utterance substantially similar in content to the utterance spoken by the first person spoken by a plurality of different persons.

20. A computer readable medium having stored thereon instructions which, when executed by a processor, cause the processor to:
- perform a forced alignment speech recognition function on a first audio data source to isolate at least one element of the first audio file, wherein the first audio data source has an utterance spoken by a first person, the forced alignment speech recognition function using an orthographic text corresponding to the first audio data source;
  
  compare the isolated element with a corresponding element in a plurality of audio data sources, wherein the plurality of audio data sources have an utterance substantially similar in content to the utterance spoken by the first person spoken a plurality of different times by a second person; and
  
  determine whether the utterance spoken by the first person contained an error based on the comparison.
- Dependent claims (29, 30):
  - 29. The computer readable medium of claim 20, further comprising instructions which, when executed by a processor, cause the processor to send the orthographic text to a speech recognizer.
  - 30. The computer readable medium of claim 20, wherein the plurality of audio data sources is comprised of an utterance substantially similar in content to the utterance spoken a plurality of different times by a plurality of different persons.


Current Assignee: Carnegie Mellon University, Carnegie Speech
Original Assignee: Carnegie Mellon University, Carnegie Speech Company, Inc.
Inventors: Eskenazi, Maxine
Primary Examiner(s): Hudspeth, David R
Assistant Examiner(s): Rider, Justin W
Application Number: US10/265,862
Publication Number: US 20070192093A1
Time in Patent Office: 2,829 Days
Field of Search: 704/246, 704/277, 704/236, 704/243
US Class Current: 704/243
CPC Class Codes:
- G09B 19/04   Speaking with audible prese...
- G09B 19/06   Foreign languages with audi...
- G10L 15/10   using distance or distortio...
