Assessing speech prosody
Abstract
A method, system and computer readable storage medium for assessing speech prosody. The method includes the steps of: receiving input speech data; acquiring a prosody constraint; assessing prosody of the input speech data according to the prosody constraint; and providing an assessment result, where at least one of the steps is carried out using a computer device.
22 Claims
1. A method for assessing speech prosody, comprising:
receiving, by a computing device, spoken speech, the spoken speech being converted into input speech data representing the spoken speech;
processing, by the computing device, the input speech data to acquire an input language structure that corresponds to the input speech data and that represents part of speech role of words of the spoken speech;
obtaining, from a corpus of standard speech data comprising at least one example of standard speech data having a matching language structure as at least a portion of the input speech data, a language structure of standard speech;
traversing a decision tree that corresponds to the language structure of standard speech based on at least a portion of the input language structure to identify, for a word in the input language structure, an occurrence probability of phrase boundary location at the word, wherein a leaf node of the decision tree identifies a determined occurrence probability of phrase boundary location for a part of speech based on a first adjacent part of speech to the left of the part of speech and a second adjacent part of speech to the right of the part of speech;
acquiring a rhythm feature and a fluency feature of the input speech data based, at least in part, on the occurrence probability of phrase boundary location for the word;
acquiring, from the corpus of standard speech data, a prosody constraint based on the rhythm feature and the fluency feature;
assessing prosody of the input speech data according to the prosody constraint;
providing an assessment result based on the prosody constraint; and
based on the assessment result, either adding the input speech data to the corpus of standard speech data or outputting reference speech that indicates a correct way to say the spoken speech.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16
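The traversing step of claim 1 can be illustrated with a small sketch. This is not the patented implementation: the `Node` layout, the yes/no question format, the `<s>` and `</s>` padding tags, and every probability in the toy tree are assumptions made up for illustration. The claim only requires that a leaf yields a phrase boundary probability for a part of speech given its left and right neighbours.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical tree node: internal nodes ask a question, leaves hold a probability."""
    feature: Optional[str] = None      # "left", "current" or "right" (assumed question format)
    value: Optional[str] = None        # part-of-speech tag the question tests for
    yes: Optional["Node"] = None
    no: Optional["Node"] = None
    boundary_prob: Optional[float] = None  # set only on leaf nodes


def boundary_probability(tree: Node, left: str, current: str, right: str) -> float:
    """Walk from the root to a leaf using the word's part-of-speech context."""
    context = {"left": left, "current": current, "right": right}
    node = tree
    while node.boundary_prob is None:
        node = node.yes if context[node.feature] == node.value else node.no
    return node.boundary_prob


def boundary_profile(tree: Node, pos_tags: List[str]) -> List[float]:
    """Return one boundary probability per word; sentence edges are padded (assumption)."""
    padded = ["<s>"] + pos_tags + ["</s>"]
    return [
        boundary_probability(tree, padded[i - 1], padded[i], padded[i + 1])
        for i in range(1, len(padded) - 1)
    ]


# Toy tree with invented probabilities, for illustration only.
toy_tree = Node(
    feature="current", value="NOUN",
    yes=Node(
        feature="left", value="DET",
        yes=Node(
            feature="right", value="VERB",
            yes=Node(boundary_prob=0.8),
            no=Node(boundary_prob=0.4),
        ),
        no=Node(boundary_prob=0.2),
    ),
    no=Node(boundary_prob=0.05),
)

print(boundary_profile(toy_tree, ["DET", "NOUN", "VERB"]))  # [0.05, 0.8, 0.05]
```

In this toy example a noun preceded by a determiner and followed by a verb is the most likely place for a phrase boundary; a real tree would be trained on the corpus of standard speech data.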
17. A system for assessing speech prosody, comprising:
one or more processors;
an audio receiver configured to receive spoken speech; and
memory storing instructions that, when executed by one of the processors, cause the system to
convert the spoken speech into input speech data representing the spoken speech,
process the input speech data to acquire an input language structure that corresponds to the input speech data and that represents part of speech role of words of the spoken speech,
obtain, from a corpus of standard speech data comprising at least one example of standard speech data having a matching language structure as at least a portion of the input speech data, a language structure of standard speech,
traverse a decision tree that corresponds to the language structure of standard speech based on at least a portion of the input language structure to identify, for a word in the input language structure, an occurrence probability of phrase boundary location at the word, wherein a leaf node of the decision tree identifies a determined occurrence probability of phrase boundary location for a part of speech based on a first adjacent part of speech to the left of the part of speech and a second adjacent part of speech to the right of the part of speech,
acquire a rhythm feature and a fluency feature of the input speech data based, at least in part, on the occurrence probability of phrase boundary location for the word,
acquire, from the corpus of standard speech data, a prosody constraint based on the rhythm feature and the fluency feature,
assess prosody of the input speech data according to the prosody constraint,
provide an assessment result based on the prosody constraint, and
based on the assessment result, either add the input speech data to the corpus of standard speech data or output reference speech that indicates a correct way to say the spoken speech.
Dependent claims: 18, 19, 20
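The final element of claim 17 is an either/or action driven by the assessment result. A minimal sketch of that branch follows, assuming the assessment result has already been reduced to a pass/fail flag; the `Corpus` class, the `act_on_assessment` function, and the string identifiers are hypothetical names, and the claim itself does not say how the result is mapped to one action or the other.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Corpus:
    """Hypothetical stand-in for the corpus of standard speech data."""
    utterances: List[str] = field(default_factory=list)


def act_on_assessment(
    passed: bool, utterance_id: str, corpus: Corpus, reference_id: str
) -> Optional[str]:
    """Either grow the corpus with the input speech or return reference speech to play.

    `passed` is an assumed boolean reduction of the assessment result.
    """
    if passed:
        # Speech judged acceptable is added to the corpus of standard speech data.
        corpus.utterances.append(utterance_id)
        return None
    # Otherwise the reference speech showing the correct delivery is output.
    return reference_id


corpus = Corpus()
print(act_on_assessment(True, "learner-001", corpus, "reference-A"))   # None
print(act_on_assessment(False, "learner-002", corpus, "reference-A"))  # reference-A
print(corpus.utterances)                                               # ['learner-001']
```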
21. A computer-implemented method for assessing speech prosody comprising:
receiving, by a computing device, spoken speech, the spoken speech being converted into input speech data representing the spoken speech;
processing, by the computing device, the input speech data to acquire an input language structure that corresponds to the input speech data and that represents part of speech role of words of the spoken speech;
obtaining, from a corpus of standard speech data comprising at least one example of standard speech data having a matching language structure as at least a portion of the input speech data, a language structure of standard speech;
traversing a decision tree that corresponds to the language structure of standard speech based on at least a portion of the input language structure to identify, for a word in the input language structure, an occurrence probability of phrase boundary location at the word and a silence duration of phrase boundary location at the word, wherein a leaf node of the decision tree identifies a determined occurrence probability of phrase boundary location for a part of speech and a determined average silence duration for the part of speech each based on a first adjacent part of speech to the left of the part of speech and a second adjacent part of speech to the right of the part of speech;
acquiring a rhythm feature and a fluency feature of the input speech data, wherein the rhythm feature is acquired based, at least in part, on the occurrence probability of phrase boundary location for the word and wherein the fluency feature is acquired based, at least in part, on the silence duration of phrase boundary location for the word;
acquiring, from the corpus of standard speech data, a standard rhythm feature and a standard fluency feature based on the decision tree;
performing a first comparison of the rhythm feature to the standard rhythm feature;
performing a second comparison of the fluency feature to the standard fluency feature;
obtaining a prosody assessment result based on the first and second comparisons; and
based on the prosody assessment result, either adding the input speech data to the corpus of standard speech data or outputting reference speech data that indicates a correct way to say the spoken speech.
Dependent claims: 22
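The two comparisons in claim 21 can be sketched as follows. The claim does not specify a distance measure or a decision rule, so the mean absolute difference, the tolerance values, the per-word feature vectors, and the names `mean_abs_diff` and `assess` are all assumptions chosen for illustration. Rhythm is represented here as a per-word boundary indicator or probability, and fluency as a per-word silence duration in seconds.

```python
from typing import Dict, Sequence, Union


def mean_abs_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Average per-word gap between a learner feature and the standard feature."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and equally long")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def assess(
    rhythm: Sequence[float],
    std_rhythm: Sequence[float],
    fluency: Sequence[float],
    std_fluency: Sequence[float],
    rhythm_tol: float = 0.3,    # assumed tolerance, not taken from the patent
    fluency_tol: float = 0.2,   # assumed tolerance in seconds, not taken from the patent
) -> Dict[str, Union[float, bool]]:
    """Combine the first (rhythm) and second (fluency) comparisons into one result."""
    rhythm_gap = mean_abs_diff(rhythm, std_rhythm)      # first comparison
    fluency_gap = mean_abs_diff(fluency, std_fluency)   # second comparison
    return {
        "rhythm_gap": rhythm_gap,
        "fluency_gap": fluency_gap,
        "passed": rhythm_gap <= rhythm_tol and fluency_gap <= fluency_tol,
    }


# Learner paused once, after the second word, for 0.35 s; values are invented.
result = assess(
    rhythm=[0.0, 1.0, 0.0],
    std_rhythm=[0.05, 0.8, 0.05],
    fluency=[0.0, 0.35, 0.0],
    std_fluency=[0.0, 0.30, 0.0],
)
print(result["passed"])  # True
```

Requiring both gaps to stay within tolerance is only one possible way to obtain a single assessment result; a weighted score would fit the claim language equally well.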
Specification