Method for assessing pronunciation abilities

US 8,271,281 B2
Filed: 06/27/2008
Issued: 09/18/2012
Est. Priority Date: 12/28/2007
Status: Active Grant

Abstract

Techniques for assessing pronunciation abilities of a user are provided. The techniques include recording a sentence spoken by a user, performing a classification of the spoken sentence, wherein the classification is performed with respect to at least one N-ordered class, and wherein the spoken sentence is represented by a set of at least one acoustic feature extracted from the spoken sentence, and determining a score based on the classification, wherein the score is used to determine an optimal set of at least one question to assess pronunciation ability of the user without human intervention.

15 Claims

1. A method for assessing pronunciation abilities of a user, the method comprising:
- presenting at least one text prompt to the user;
- recording at least one spoken input spoken by the user in a language in response to the at least one text prompt; and
- automatically assigning, by using at least one processor and at least one classifier, the user to one of a plurality of classes based on one or more features obtained from the at least one recorded spoken input, wherein each of the plurality of classes corresponds to a pronunciation ability in the language,

wherein the at least one classifier comprises a first classifier, wherein the plurality of classes comprises a first subset of classes comprising at least one class and a second subset of classes comprising at least one class, wherein the first subset of classes is disjoint from the second subset of classes, wherein:
- presenting the at least one text prompt comprises presenting a first text prompt to the user,
- recording the at least one spoken input comprises recording a first spoken input spoken by the user in response to the first text prompt,
- assigning the user to one of the plurality of classes comprises determining whether the user belongs to any class in the first subset of classes or to any class in the second subset of classes by applying the first classifier to one or more features obtained from the first spoken input, and

wherein the method further comprises:
- presenting a second text prompt to the user if it is determined that the user belongs to any class in the first subset of classes; and
- presenting a third prompt to the user if it is determined that the user belongs to any class in the second subset of classes, wherein the second prompt is different from the third prompt.

2. The method of claim 1, wherein the at least one classifier further comprises a second classifier different from the first classifier, the first subset of classes comprises a third subset of classes comprising at least one class and a fourth subset of classes comprising at least one class, wherein the third subset of classes is disjoint from the fourth subset, and wherein:
- recording the at least one spoken input further comprises recording a second spoken input spoken by the user in response to the second text prompt; and
- assigning the user to one of the plurality of classes further comprises determining whether the user belongs to any class in the third subset of classes or to any subset in the fourth subset of classes by applying the second classifier to the one or more features obtained from the first spoken input and/or one or more features obtained from the second spoken input.

3. The method of claim 1, wherein using the at least one classifier comprises using at least one classifier trained by using recordings of spoken input spoken by a plurality of speakers in response to being presented with the at least one text prompt.

4. The method of claim 1, wherein the one or more features comprise at least one feature selected from the group consisting of presence of one or more substitution errors, presence of one or more insertion errors, and presence of one or more deletion errors.
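
The error-type features named in claim 4 can be illustrated with a small sketch. The claims do not say how the errors are detected; the code below assumes, purely for illustration, that the expected and the recognized utterance are both available as phoneme sequences and that a standard Levenshtein alignment is used to label each mismatch. The names `ErrorCounts`, `count_errors`, and `presence_features` are hypothetical.

```python
from typing import List, NamedTuple, Sequence


class ErrorCounts(NamedTuple):
    substitutions: int
    insertions: int
    deletions: int


def count_errors(expected: Sequence[str], observed: Sequence[str]) -> ErrorCounts:
    """Align observed against expected (Levenshtein) and count each error type.

    Assumption: both inputs are token sequences (e.g. phonemes). When several
    minimum-cost alignments exist, this backtrace picks one of them.
    """
    n, m = len(expected), len(observed)
    # cost[i][j] = minimum number of edits turning expected[:i] into observed[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            mismatch = int(expected[i - 1] != observed[j - 1])
            cost[i][j] = min(
                cost[i - 1][j - 1] + mismatch,  # match or substitution
                cost[i - 1][j] + 1,             # deletion (expected token missing)
                cost[i][j - 1] + 1,             # insertion (extra observed token)
            )

    # Walk back from the end of both sequences, labelling each edit.
    subs = ins = dels = 0
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            mismatch = int(expected[i - 1] != observed[j - 1])
            if cost[i][j] == cost[i - 1][j - 1] + mismatch:
                subs += mismatch
                i, j = i - 1, j - 1
                continue
        if i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            dels += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return ErrorCounts(subs, ins, dels)


def presence_features(counts: ErrorCounts) -> List[int]:
    """Binary 'presence of error type' features, in the order used by claim 4."""
    return [int(counts.substitutions > 0),
            int(counts.insertions > 0),
            int(counts.deletions > 0)]


# Hypothetical example: the speaker says "d" for "dh" and drops the final "t".
counts = count_errors(["dh", "ah", "k", "ae", "t"], ["d", "ah", "k", "ae"])
features = presence_features(counts)
```

In this example the alignment yields one substitution and one deletion, so the presence vector marks substitution and deletion errors but no insertion error. A real system would obtain the observed sequence from a speech recognizer, which is outside the scope of this sketch.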

5. A method for assessing pronunciation abilities of a user, the method comprising:
- presenting at least one text prompt to the user;
- recording at least one spoken input spoken by the user in a language in response to the at least one text prompt; and
- automatically assigning, by using at least one processor and at least one classifier, the user to one of a plurality of classes based on one or more features obtained from the at least one recorded spoken input, wherein each of the plurality of classes corresponds to a pronunciation ability in the language,

wherein the at least one classifier comprises a first classifier, wherein the plurality of classes comprises a first subset of classes comprising at least one class and a second subset of classes comprising at least one class, wherein the first subset of classes is disjoint from the second subset of classes, and wherein:
- presenting the at least one text prompt comprises presenting a first text prompt to the user,
- recording the at least one spoken input comprises recording a first spoken input spoken by the user in response to the first text prompt,
- assigning the user to one of the plurality of classes comprises determining whether the user belongs to any class in the first subset of classes or to any class in the second subset of classes by applying the first classifier to one or more features obtained from the first spoken input,

wherein the at least one classifier further comprises a second classifier different from the first classifier, the first subset of classes comprises a third subset of classes comprising at least one class and a fourth subset of classes comprising at least one class, wherein the third subset of classes is disjoint from the fourth subset, and wherein assigning the user to one of the plurality of classes further comprises:
- determining whether the user belongs to any class in the third subset of classes or to any subset in the fourth subset of classes by applying the second classifier to one or more features obtained from the first spoken input.

6. A method comprising:
- presenting at least one text prompt to the user;
- recording at least one spoken input spoken by the user in a language in response to the at least one text prompt; and
- automatically assigning, by using at least one processor and at least one classifier, the user to one of a plurality of classes based on one or more features obtained from the at least one recorded spoken input, wherein each of the plurality of classes corresponds to a pronunciation ability in the language,

wherein the at least one classifier comprises a plurality of binary classifiers hierarchically organized according to a hierarchy, and wherein using the at least one classifier comprises:
- applying one or more classifiers in the plurality of binary classifiers, in accordance with the hierarchy, to the one or more features extracted from the at least one recorded spoken input.
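
One way to picture the hierarchy of binary classifiers recited in claim 6 is as a binary tree in which each internal node holds one binary classifier and each leaf is a pronunciation class. The sketch below is only an illustration under that assumption: the four class labels, the single score-like feature, and the threshold "classifiers" are all hypothetical stand-ins for trained models.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Union

Features = Sequence[float]


@dataclass
class Leaf:
    label: str  # one pronunciation-ability class


@dataclass
class Node:
    classifier: Callable[[Features], bool]  # True routes to the `high` branch
    low: Union["Node", Leaf]
    high: Union["Node", Leaf]


def assign_class(tree: Union[Node, Leaf], features: Features) -> str:
    """Apply binary classifiers from the root down until a leaf class is reached."""
    node = tree
    while isinstance(node, Node):
        node = node.high if node.classifier(features) else node.low
    return node.label


# Hypothetical hierarchy over four classes. Each lambda stands in for a trained
# binary classifier; features[0] is an assumed quality score in [0, 1].
tree = Node(
    classifier=lambda f: f[0] >= 0.5,
    low=Node(lambda f: f[0] >= 0.25, Leaf("poor"), Leaf("fair")),
    high=Node(lambda f: f[0] >= 0.75, Leaf("good"), Leaf("excellent")),
)

label = assign_class(tree, [0.6])
```

Only the classifiers along one root-to-leaf path are applied, so with a balanced tree the number of classifier evaluations grows with the logarithm of the number of classes rather than with the number of classes itself.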

7. A system for assessing pronunciation abilities of a user, the system comprising:
- at least one processor programmed to:
  - present at least one text prompt to the user;
  - record at least one spoken input spoken by the user in a language in response to the at least one text prompt; and
  - automatically assign, by using at least one classifier, the user to one of a plurality of classes based on one or more features obtained from the at least one recorded spoken input, wherein each of the plurality of classes corresponds to a pronunciation ability in the language,

wherein the at least one classifier comprises a first classifier, wherein the plurality of classes comprises a first subset of classes comprising at least one class and a second subset of classes comprising at least one class, wherein the first subset of classes is disjoint from the second subset of classes,

wherein the at least one processor is programmed to:
- present the at least one text prompt at least by presenting a first text prompt to the user;
- record the at least one spoken input at least by recording a first spoken input spoken by the user in response to the first text prompt; and
- assign the user to one of the plurality of classes at least by determining whether the user belongs to any class in the first subset of classes or to any class in the second subset of classes by applying the first classifier to one or more features obtained from the first spoken input, and

wherein the at least one processor is further programmed to:
- present a second text prompt to the user if it is determined that the user belongs to any class in the first subset of classes; and
- present a third prompt to the user if it is determined that the user belongs to any class in the second subset of classes, wherein the second prompt is different from the third prompt.

8. The system of claim 7, wherein the at least one classifier further comprises a second classifier different from the first classifier, the first subset of classes comprises a third subset of classes comprising at least one class and a fourth subset of classes comprising at least one class, wherein the third subset of classes is disjoint from the fourth subset, and wherein the at least one processor is further programmed to:
- record the at least one spoken input at least by recording a second spoken input spoken by the user in response to the second text prompt; and
- assign the user to one of the plurality of classes at least by determining whether the user belongs to any class in the third subset of classes or to any subset in the fourth subset of classes by applying the second classifier to the one or more features obtained from the first spoken input and/or one or more features obtained from the second spoken input.

9. The system of claim 7, wherein the user is a call center agent.
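
The prompt-selection flow of claim 7 (present a first prompt, classify the response into one of two disjoint subsets of classes, then choose the follow-up prompt accordingly) can be sketched as follows. This is a minimal illustration, not the patented implementation: audio capture, feature extraction, and the first classifier are passed in as hypothetical callables, and the demo values are invented.

```python
from typing import Callable, List, Sequence, Tuple

Features = Sequence[float]


def run_two_prompt_session(
    present: Callable[[str], None],
    record: Callable[[], bytes],
    extract_features: Callable[[bytes], Features],
    first_classifier: Callable[[Features], bool],
    first_prompt: str,
    second_prompt: str,
    third_prompt: str,
) -> Tuple[str, str]:
    """Return (subset, follow_up_prompt) chosen from the first spoken input."""
    present(first_prompt)
    features = extract_features(record())
    # Assumption: the first classifier returns True for "first subset of classes".
    if first_classifier(features):
        subset, follow_up = "first", second_prompt
    else:
        subset, follow_up = "second", third_prompt
    present(follow_up)
    return subset, follow_up


# Demo with stand-ins: prompts are collected in a list instead of displayed,
# and the "recording" is a fixed byte string.
shown: List[str] = []
result = run_two_prompt_session(
    present=shown.append,
    record=lambda: b"\x01\x02\x03",
    extract_features=lambda audio: [len(audio) / 10.0],
    first_classifier=lambda f: f[0] < 0.5,
    first_prompt="Please read: the cat sat on the mat.",
    second_prompt="Please read: she sells sea shells.",
    third_prompt="Please read: the sixth sheep is sick.",
)
```

Passing the I/O steps in as callables keeps the branching logic testable without a microphone or a trained model. The second classifier of claim 8 would be applied in the same way after the follow-up response is recorded.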

10. A system for assessing pronunciation abilities of a user, the system comprising:
- at least one processor programmed to:
  - present at least one text prompt to the user;
  - record at least one spoken input spoken by the user in a language in response to the at least one text prompt; and
  - automatically assign, by using at least one classifier, the user to one of a plurality of classes based on one or more features obtained from the at least one recorded spoken input, wherein each of the plurality of classes corresponds to a pronunciation ability in the language,

wherein the at least one classifier comprises a plurality of binary classifiers hierarchically organized according to a hierarchy, and wherein the at least one processor is programmed to use the at least one classifier by applying one or more classifiers in the plurality of binary classifiers, in accordance with the hierarchy, to the one or more features extracted from the at least one recorded spoken input.

11. At least one computer-readable storage medium storing instructions that, when executed by at least one processor, cause the at least one processor to perform a method for assessing pronunciation abilities of a user, the method comprising:
- presenting at least one text prompt to the user;
- recording at least one spoken input spoken by the user in a language in response to the at least one text prompt; and
- automatically assigning, by using at least one processor and at least one classifier, the user to one of a plurality of classes based on one or more features obtained from the at least one recorded spoken input, wherein each of the plurality of classes corresponds to a pronunciation ability in the language,

wherein the at least one classifier comprises a first classifier, wherein the plurality of classes comprises a first subset of classes comprising at least one class and a second subset of classes comprising at least one class, wherein the first subset of classes is disjoint from the second subset of classes, wherein:
- presenting the at least one text prompt comprises presenting a first text prompt to the user,
- recording the at least one spoken input comprises recording a first spoken input spoken by the user in response to the first text prompt,
- assigning the user to one of the plurality of classes comprises determining whether the user belongs to any class in the first subset of classes or to any class in the second subset of classes by applying the first classifier to one or more features obtained from the first spoken input, and

wherein the method further comprises:
- presenting a second text prompt to the user if it is determined that the user belongs to any class in the first subset of classes; and
- presenting a third prompt to the user if it is determined that the user belongs to any class in the second subset of classes, wherein the second prompt is different from the third prompt.

12. The at least one computer-readable storage medium of claim 11, wherein the at least one classifier further comprises a second classifier different from the first classifier, the first subset of classes comprises a third subset of classes comprising at least one class and a fourth subset of classes comprising at least one class, wherein the third subset of classes is disjoint from the fourth subset, and wherein:
- recording the at least one spoken input further comprises recording a second spoken input spoken by the user in response to the second text prompt; and
- assigning the user to one of the plurality of classes comprises determining whether the user belongs to any class in the third subset of classes or to any subset in the fourth subset of classes by applying the second classifier to the one or more features obtained from the first spoken input and/or one or more features obtained from the second spoken input.

13. The at least one computer-readable storage medium of claim 11, wherein using the at least one classifier comprises using at least one support vector machine classifier.

14. The at least one computer-readable storage medium of claim 11, wherein the user is a call center agent.

15. At least one computer-readable storage medium storing instructions that, when executed by at least one processor, cause the at least one processor to perform a method for assessing pronunciation abilities of a user, the method comprising:
- presenting at least one text prompt to the user;
- recording at least one spoken input spoken by the user in a language in response to the at least one text prompt; and
- automatically assigning, by using at least one processor and at least one classifier, the user to one of a plurality of classes based on one or more features obtained from the at least one recorded spoken input, wherein each of the plurality of classes corresponds to a pronunciation ability in the language,

wherein the at least one classifier comprises a plurality of binary classifiers hierarchically organized according to a hierarchy, and wherein using the at least one classifier comprises:
- applying one or more classifiers in the plurality of binary classifiers, in accordance with the hierarchy, to the one or more features extracted from the at least one recorded spoken input.

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Inventors: Jayadeva; Joshi, Sachindra; Pant, Himanshu; Verma, Ashish
Primary Examiner(s): Smits, Talivaldis Ivars
Application Number: US12/147,898
Publication Number: US 20090171661A1
Time in Patent Office: 1,544 Days
Field of Search: None
US Class Current: 704/250
CPC Class Codes:
G09B 19/04 Speaking with audible prese...
G10L 15/26 Speech to text systems G10L...
