PROSODIC AND LEXICAL ADDRESSEE DETECTION
Abstract
Prosodic features are used to discriminate computer-directed (human-computer, H-C) speech from human-directed (human-human, H-H) speech. Statistics and models describing energy/intensity patterns over time, speech/pause distributions, pitch patterns, vocal effort features, and speech segment duration patterns may be used for prosodic modeling. The prosodic features for at least a portion of an utterance are monitored over a period of time to determine a shape associated with the utterance. A score may be determined to assist in classifying the current utterance as human directed or computer directed without relying on knowledge of the utterances that precede or follow it. Outside data may be used to train lexical addressee detection systems for the human-human-computer (H-H-C) scenario: H-C training data can be obtained from a single-user H-C collection, and H-H speech can be modeled using general conversational speech. The H-C and H-H language models may also be adapted by interpolation with small amounts of matched H-H-C data.
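The abstract lists the kinds of prosodic measurements involved but not how they are computed. A minimal sketch of a few of them follows, assuming a mono floating-point signal and NumPy. The function names, the 25 ms/10 ms framing, and the fixed -40 dB speech/pause threshold are illustrative assumptions, not parameters taken from the patent. Pitch and vocal-effort features are omitted; the slope of a straight-line fit to the speech-frame energy contour stands in as one crude descriptor of the utterance's "shape".

```python
import numpy as np


def frame_energy_db(signal, sr, frame_ms=25.0, hop_ms=10.0):
    """Short-time energy (dB) of overlapping analysis frames."""
    signal = np.asarray(signal, dtype=float)
    frame = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    n_frames = 1 + max(0, (len(signal) - frame) // hop)
    energy = np.empty(n_frames)
    for i in range(n_frames):
        chunk = signal[i * hop: i * hop + frame]
        # Small floor avoids log10(0) on digital silence.
        energy[i] = 10.0 * np.log10(np.mean(chunk ** 2) + 1e-10)
    return energy


def prosodic_features(signal, sr, threshold_db=-40.0, hop_ms=10.0):
    """Hypothetical utterance-level summary of energy, pause and duration patterns."""
    energy = frame_energy_db(signal, sr, hop_ms=hop_ms)
    is_speech = energy > threshold_db  # crude energy-based speech/pause decision

    # Lengths (in frames) of contiguous speech segments.
    runs, count = [], 0
    for flag in is_speech:
        if flag:
            count += 1
        elif count:
            runs.append(count)
            count = 0
    if count:
        runs.append(count)
    seg_sec = np.array(runs, dtype=float) * hop_ms / 1000.0

    speech_energy = energy[is_speech]
    if speech_energy.size > 1:
        # Slope of a line fitted to the speech-frame energy contour: a crude "shape" cue.
        slope = float(np.polyfit(np.arange(speech_energy.size), speech_energy, 1)[0])
    else:
        slope = 0.0
    if speech_energy.size:
        mean_db = float(speech_energy.mean())
        range_db = float(speech_energy.max() - speech_energy.min())
    else:
        mean_db = range_db = float("nan")

    return {
        "mean_energy_db": mean_db,
        "energy_range_db": range_db,
        "energy_slope_db_per_frame": slope,
        "pause_fraction": float(1.0 - is_speech.mean()),
        "mean_segment_sec": float(seg_sec.mean()) if seg_sec.size else 0.0,
    }
```

With real recordings, the speech/pause threshold would more likely be set relative to each recording's noise floor than fixed, and a dedicated pitch tracker would supply the pitch-pattern features.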
20 Claims
1. A method for addressee detection, comprising:
receiving an utterance;
extracting prosodic features directly from a spoken signal corresponding to the utterance;
determining patterns of the prosodic features that characterize a speaking style of the utterance as either a human-computer (H-C) or a human-human (H-H) style, indicating whether speech is directed to a computer or another human;
characterizing a degree to which a new utterance conforms to the H-C or H-H styles; and
using the new utterance according to the characterization of the speaking style.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10.
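Claim 1 recites characterizing the degree to which a new utterance conforms to the H-C or H-H style, but the claim text does not name a model. One common way to express such a degree is a log-likelihood ratio between two class-conditional models. The sketch below assumes diagonal-covariance Gaussians over utterance-level prosodic feature vectors; the class and function names, the variance floor, and the equal-prior logistic mapping are all hypothetical choices for illustration.

```python
import math
from statistics import mean, pvariance


class DiagGaussian:
    """Diagonal-covariance Gaussian fitted to one style's prosodic feature vectors."""

    def __init__(self, vectors, var_floor=1e-6):
        columns = list(zip(*vectors))
        self.means = [mean(c) for c in columns]
        # Variance floor keeps a near-constant feature from dominating the score.
        self.vars = [max(pvariance(c), var_floor) for c in columns]

    def log_likelihood(self, x):
        return sum(
            -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
            for xi, m, v in zip(x, self.means, self.vars)
        )


def conformance_score(x, hc_model, hh_model):
    """Log-likelihood ratio: > 0 leans H-C (computer-directed), < 0 leans H-H."""
    return hc_model.log_likelihood(x) - hh_model.log_likelihood(x)


def prob_computer_directed(score):
    """Logistic mapping of the score, assuming equal class priors (numerically stable)."""
    if score >= 0:
        return 1.0 / (1.0 + math.exp(-score))
    z = math.exp(score)
    return z / (1.0 + z)


# Toy two-dimensional vectors [pause_fraction, energy_range_db]; the values are invented.
hc_model = DiagGaussian([[0.10, 5.0], [0.20, 6.0], [0.15, 5.5]])
hh_model = DiagGaussian([[0.40, 12.0], [0.50, 14.0], [0.45, 13.0]])
score = conformance_score([0.12, 5.2], hc_model, hh_model)
```

Because the score depends only on the current utterance's features, it is consistent with the abstract's point that classification need not rely on preceding or following utterances.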
11. A computer-readable medium storing computer-executable instructions for addressee detection, comprising:
receiving an utterance;
extracting prosodic features directly from a spoken signal corresponding to the utterance;
determining patterns of the prosodic features that characterize a speaking style of the utterance as either a human-computer (H-C) or a human-human (H-H) style, indicating whether speech is directed to a computer or another human;
characterizing a degree to which a new utterance conforms to the H-C or H-H styles; and
using the new utterance according to the characterization of the speaking style.
Dependent claims: 12, 13, 14, 15, 16.
17. A system for addressee detection, comprising:
a processor and memory;
an operating environment executing using the processor;
a display; and
an addressee manager that is configured to perform actions comprising:
receiving an utterance;
extracting prosodic features directly from a spoken signal corresponding to the utterance;
determining patterns of the prosodic features characterizing a speaking style of the utterance as directed toward a computer when the patterns indicate the utterance is directed to a computer;
characterizing the speaking style of the utterance as directed toward a human when the patterns indicate the utterance is directed to a human;
using a new utterance according to the characterization of the speaking style; and
using a language model that is trained using a combination of out-of-domain data and in-domain data.
Dependent claims: 18, 19, 20.
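Claim 17 recites a language model trained on a combination of out-of-domain and in-domain data, and the abstract describes adapting H-C and H-H language models by interpolation with small amounts of matched H-H-C data. A minimal sketch of linear interpolation, P(w) = λ·P_in(w) + (1 − λ)·P_out(w), follows. It assumes add-one smoothed unigram models for brevity (a practical system would more likely use n-gram models), and it assumes that the two adapted models are compared through a log-probability ratio to give a lexical addressee score. The class names, the weight `LAM`, and the toy corpora are hypothetical.

```python
import math
from collections import Counter


class UnigramLM:
    """Add-one smoothed unigram model over a fixed, shared vocabulary."""

    def __init__(self, sentences, vocab):
        self.counts = Counter(w for s in sentences for w in s.split())
        self.total = sum(self.counts.values())
        self.vocab_size = len(vocab)

    def prob(self, word):
        return (self.counts[word] + 1) / (self.total + self.vocab_size)


class InterpolatedLM:
    """P(w) = lam * P_in(w) + (1 - lam) * P_out(w)."""

    def __init__(self, in_domain_lm, out_of_domain_lm, lam):
        self.in_lm, self.out_lm, self.lam = in_domain_lm, out_of_domain_lm, lam

    def prob(self, word):
        return self.lam * self.in_lm.prob(word) + (1 - self.lam) * self.out_lm.prob(word)

    def log_prob(self, sentence):
        return sum(math.log(self.prob(w)) for w in sentence.split())


def lexical_score(sentence, hc_lm, hh_lm):
    """Log-probability ratio: > 0 leans computer-directed, < 0 leans human-directed."""
    return hc_lm.log_prob(sentence) - hh_lm.log_prob(sentence)


# Invented toy corpora standing in for the data sources the abstract names.
hc_out = ["play some music", "show me the weather", "play the next song"]      # single-user H-C
hh_out = ["yeah i think so", "did you see the game", "i think that is right"]  # conversational H-H
hc_in = ["play music for us"]   # small matched H-H-C sample, computer-directed
hh_in = ["yeah let us ask it"]  # small matched H-H-C sample, human-directed

vocab = {w for corpus in (hc_out, hh_out, hc_in, hh_in) for s in corpus for w in s.split()}
LAM = 0.3  # hypothetical weight; typically tuned on held-out in-domain data
hc_lm = InterpolatedLM(UnigramLM(hc_in, vocab), UnigramLM(hc_out, vocab), LAM)
hh_lm = InterpolatedLM(UnigramLM(hh_in, vocab), UnigramLM(hh_out, vocab), LAM)
```

Keeping the in-domain weight small lets a little matched H-H-C data shift the estimates without discarding the larger out-of-domain collections.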
Specification