Linguistic segmentation of speech
Abstract
A linguistic segmentation tool (115) includes an acoustic feature extraction component (302) and a lexical feature extraction component (311). The acoustic feature extraction component (302) extracts prosodic features from speech (e.g., pauses, pitch, energy, and rate). The lexical feature extraction component (311) extracts lexical features from a transcribed version of the speech (e.g., words, syntactic classifications of the words, and word structure). A language model is constructed based on the lexical features and an acoustic model is constructed based on the acoustic features. A statistical framework combines the outputs of the language model and the acoustic model to generate indications of potential linguistic features.
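The two feature streams named in the abstract can be illustrated with a minimal sketch. It is not the patented implementation. It assumes word-level timestamps and per-word pitch and energy values are already available, and the names `TimedWord`, `TOY_SYNTACTIC_CLASSES`, `lexical_features`, and `acoustic_features` are hypothetical. The syntactic classes come from a toy lookup table, where a real system would presumably use a trained tagger.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TimedWord:
    """One transcribed word with timing and prosody (all values assumed precomputed)."""
    text: str
    start: float     # word start time in seconds
    end: float       # word end time in seconds
    pitch_hz: float  # mean pitch over the word
    energy: float    # mean energy over the word


# Hypothetical toy lookup standing in for a real part-of-speech tagger.
TOY_SYNTACTIC_CLASSES: Dict[str, str] = {
    "the": "DET", "a": "DET", "and": "CONJ", "dog": "NOUN", "barks": "VERB",
}


def lexical_features(words: List[TimedWord]) -> List[Dict[str, str]]:
    """One lexical feature vector per word: the word and its syntactic class."""
    return [
        {
            "word": w.text.lower(),
            "syntactic_class": TOY_SYNTACTIC_CLASSES.get(w.text.lower(), "OTHER"),
        }
        for w in words
    ]


def acoustic_features(words: List[TimedWord]) -> List[Dict[str, float]]:
    """One acoustic feature vector per word: pause after it, pitch, energy, rate."""
    feats = []
    for i, w in enumerate(words):
        # Pause is the silence between this word and the next (0.0 for the last word).
        pause = max(0.0, words[i + 1].start - w.end) if i + 1 < len(words) else 0.0
        duration = w.end - w.start
        # Characters per second is a crude stand-in for speaking rate.
        rate = len(w.text) / duration if duration > 0 else 0.0
        feats.append(
            {"pause_after": pause, "pitch_hz": w.pitch_hz, "energy": w.energy, "rate": rate}
        )
    return feats


words = [
    TimedWord("The", 0.0, 0.2, 120.0, 0.5),
    TimedWord("dog", 0.25, 0.6, 110.0, 0.6),
    TimedWord("barks", 1.4, 1.9, 100.0, 0.4),
]
lex = lexical_features(words)
ac = acoustic_features(words)
```

In this toy input, the long silence after "dog" shows up as a large `pause_after` value, which is the kind of prosodic cue the abstract lists alongside pitch, energy, and rate.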
33 Claims
1. A linguistic segmentation tool comprising:
a lexical feature extraction component configured to receive text and generate lexical feature vectors relating to the text, the lexical feature vectors including words from the text and syntactic classes of the words;
an acoustic feature extraction component configured to receive an audio version of the text and generate acoustic feature vectors relating to the audio version of the text; and
a statistical framework component configured to generate linguistic features associated with the text based on the acoustic feature vectors and the lexical feature vectors.

12. A method for determining linguistic information for words corresponding to a transcribed version of an audio input stream including speech, the method comprising:
generating lexical features for the words, including a syntactic class associated with at least one of the words;
generating acoustic features for the audio input stream, the acoustic features being based on at least one of speaker pauses, speaker rate, speaker energy, and speaker pitch; and
generating the linguistic information based on the lexical features and the acoustic features.

21. A computing device for determining linguistic information for words corresponding to a transcribed version of an audio input stream that includes speech, the computing device comprising:
a processor; and
a computer memory coupled to the processor and containing programming instructions that when executed by the processor cause the processor to:
generate lexical features for the words, including a syntactic class associated with at least one of the words,
generate acoustic features for the audio input stream, the acoustic features being based on at least one of speaker pauses, speaker rate, speaker energy, and speaker pitch,
generate the linguistic information based on the lexical features and the acoustic features, and
output the generated linguistic information as meta-information embedded in the transcribed version of the audio input stream.

25. A method for associating meta-information with a document transcribed from speech, the method comprising:
building a language model based on lexical feature vectors extracted from the document, the lexical feature vectors including words and syntactic classifications of the words;
building an acoustic model based on acoustic feature vectors extracted from the speech; and
combining outputs of the language model and the acoustic model in a statistical framework that estimates a probability for associating the meta-information with the document.

32. A device comprising:
means for building a language model based on lexical feature vectors extracted from a document transcribed from human speech, the lexical feature vectors including a word and a syntactic classification of the word;
means for building an acoustic model based on acoustic feature vectors extracted from the speech; and
means for combining outputs of the language model and the acoustic model to estimate a probability for associating a linguistic feature with the document.

33. A computer-readable medium containing program instructions for execution by a processor, the program instructions, when executed by the processor, cause the processor to perform a method comprising:
generating lexical features for words corresponding to a transcribed version of speech, the lexical features including a syntactic class associated with at least one of the words;
generating acoustic features for the speech, the acoustic features based on at least one of speaker pauses, speaker rate, speaker energy, and speaker pitch; and
generating linguistic information for the words based on the lexical features and the acoustic features.

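The combining step recited in claims 25 and 32, and the embedded meta-information of claim 21, can be sketched as follows. The claims do not say which statistical framework is used, so this sketch assumes a log-linear interpolation of two per-word boundary probabilities, one from a language model and one from an acoustic model. The function names, the `lm_weight` parameter, the 0.5 threshold, and the `<s/>` marker are hypothetical choices made for illustration.

```python
import math
from typing import List


def combine_boundary_probability(p_lm: float, p_ac: float, lm_weight: float = 0.5) -> float:
    """Log-linear combination of two boundary probabilities, renormalized
    over the two outcomes (boundary, no boundary). Assumed scheme, not the patent's."""
    if not (0.0 < p_lm < 1.0 and 0.0 < p_ac < 1.0):
        raise ValueError("probabilities must lie strictly between 0 and 1")
    if not 0.0 <= lm_weight <= 1.0:
        raise ValueError("lm_weight must lie in [0, 1]")
    ac_weight = 1.0 - lm_weight
    log_yes = lm_weight * math.log(p_lm) + ac_weight * math.log(p_ac)
    log_no = lm_weight * math.log(1.0 - p_lm) + ac_weight * math.log(1.0 - p_ac)
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(log_yes, log_no)
    yes, no = math.exp(log_yes - m), math.exp(log_no - m)
    return yes / (yes + no)


def mark_boundaries(
    words: List[str],
    p_lm: List[float],
    p_ac: List[float],
    threshold: float = 0.5,
    lm_weight: float = 0.5,
) -> List[str]:
    """Embed a hypothetical '<s/>' marker after each word whose combined
    boundary probability exceeds the threshold."""
    out: List[str] = []
    for word, lm, ac in zip(words, p_lm, p_ac):
        out.append(word)
        if combine_boundary_probability(lm, ac, lm_weight) > threshold:
            out.append("<s/>")
    return out


marked = mark_boundaries(
    ["hello", "world", "goodbye"],
    p_lm=[0.2, 0.8, 0.1],
    p_ac=[0.1, 0.9, 0.2],
)
```

With equal weights, a word gets a marker only when the two models jointly favor a boundary. Shifting `lm_weight` toward 0 or 1 would let one model dominate, which may suit cases where the transcript or the audio is the less reliable source.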
Specification