Speech processing device and speech processing method
Abstract
Provided is a speech processing device that can accurately extract a conversation group from among a plurality of speakers, even when a conversation group of three or more people is present. The device (400) comprises: a spontaneous speech detection unit (420) and a direction-specific speech detection unit (430), which separately detect the speakers' uttered speech from a sound signal; a conversation establishment level calculation unit (450), which calculates, on the basis of the detected uttered speech, a conversation establishment level for every pairing of two speakers in each segment into which the determination time period is divided; an extended-period characteristic amount calculation unit (460), which calculates, for each pairing, an extended-period characteristic amount of the conversation establishment level over the determination time period; and a conversation-partner determination unit (470), which extracts a conversation group that is holding a conversation on the basis of the calculated extended-period characteristic amounts.
8 Claims
1. A speech processing device, comprising:
a speech detector that detects speech of individual speakers from acoustic signals;
a total-amount-of-speech calculator that calculates, for each of all pairs of the speakers and for each of segments defined by dividing a determination time period, a total amount of speech on the basis of the detected speech, the total amount of speech being a sum of amounts of speech of the pair of speakers in the segment;
an established-conversation calculator that calculates, for each of the pairs of the speakers and for each of the segments, a degree of established conversation on the basis of the detected speech, the degree of established conversation being a value indicating a rate of a time when one of the pair of the speakers gives speech and the other of the pair of the speakers gives no speech;
a long-time feature calculator that calculates, for each of the pairs of the speakers, a long-time feature obtained by integrating the degrees of established conversation calculated for the pair of the speakers within the determination time period; and
a conversational-partner determining unit that extracts a conversation group holding conversation from the speakers, on the basis of the calculated long-time features, wherein
the established-conversation calculator excludes, for each of the pairs of the speakers, the degree of established conversation of the segment with the sum of amounts of speech lower than a first threshold from the calculation of the long-time feature for the pair of the speakers, and
the conversational-partner determining unit determines that the speakers of the pair with the long-time feature greater than or equal to a second threshold belong to the same conversation group.
(Dependent claims: 2, 3, 4, 5, 6, 7)
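The per-pair calculation recited in claim 1 can be sketched in Python as follows. This is only an illustrative reading of the claim language, not the implementation described in the specification. It assumes each speaker's detected speech is a list of 0/1 flags per frame, takes the degree of established conversation to be the fraction of frames in which exactly one of the two speakers is speaking, and takes "integrating" to mean averaging over the retained segments. The names `segment_stats`, `long_time_feature`, `pair_features`, `segment_len` and `min_total_speech` are hypothetical.

```python
from itertools import combinations


def segment_stats(a, b):
    """Return (total amount of speech, degree of established conversation)
    for one segment, given two speakers' 0/1 speech flags per frame."""
    total_speech = sum(a) + sum(b)  # sum of both speakers' speech, in frames
    # Assumption: the degree is the share of frames where exactly one speaks.
    exclusive = sum(1 for x, y in zip(a, b) if x != y)
    degree = exclusive / len(a) if a else 0.0
    return total_speech, degree


def long_time_feature(a, b, segment_len, min_total_speech):
    """Average the per-segment degrees over the determination time period,
    skipping segments whose total speech is below the first threshold."""
    degrees = []
    for start in range(0, len(a), segment_len):
        seg_a = a[start:start + segment_len]
        seg_b = b[start:start + segment_len]
        total, degree = segment_stats(seg_a, seg_b)
        if total >= min_total_speech:  # exclusion by the first threshold
            degrees.append(degree)
    # Assumption: a pair with no retained segment gets a feature of 0.0.
    return sum(degrees) / len(degrees) if degrees else 0.0


def pair_features(activity, segment_len, min_total_speech):
    """Compute the long-time feature for every pair of speakers."""
    return {
        (p, q): long_time_feature(activity[p], activity[q],
                                  segment_len, min_total_speech)
        for p, q in combinations(sorted(activity), 2)
    }


# Hypothetical data: two 4-frame segments; nobody speaks in the second one.
activity = {
    "A": [1, 1, 0, 0, 0, 0, 0, 0],
    "B": [0, 0, 1, 1, 0, 0, 0, 0],
    "C": [1, 1, 1, 1, 0, 0, 0, 0],
}
features = pair_features(activity, segment_len=4, min_total_speech=2)
```

In this toy data the silent second segment is excluded, so the pair (A, B), who alternate cleanly in the first segment, keeps a feature of 1.0. Without the exclusion, the silent segment would pull the average down to 0.5.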
8. A speech processing method, comprising:
detecting speech of individual speakers from acoustic signals;
calculating, for each of all of pairs of the speakers and for each of segments defined by dividing a determination time period, a total amount of speech on the basis of the detected speech, the total amount of speech being a sum of amounts of speech of the pair of speakers in the segment;
calculating, for each of the pairs of the speakers and for each of the segments, a degree of established conversation on the basis of the detected speech, the degree of established conversation being a value indicating a rate of a time when one of the pair of the speakers gives speech and the other of the pair of the speakers gives no speech;
calculating, for each of the pairs of the speakers, a long-time feature obtained by integrating the degrees of established conversation calculated for the pair of the speakers within the determination time period; and
extracting a conversation group holding conversation from the speakers on the basis of the calculated long-time features, wherein
for each of the pairs of the speakers in said calculating the degree of established conversation, the degree of established conversation of the segment with the sum of amounts of speech lower than a first threshold is excluded from the calculation of the long-time feature of the pair of the speakers, and
in said extracting the conversation group, the speakers of the pair of speakers with the long-time feature greater than or equal to a second threshold are determined to belong to the same conversation group.
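The final extracting step can be illustrated with a small Python sketch. The claim states only that the two speakers of a pair whose long-time feature meets the second threshold belong to the same conversation group. Merging such pairs transitively, so that A-B and B-C produce one group of three, is an assumption made here, as are the union-find approach and the names `conversation_groups`, `features` and `threshold`.

```python
def conversation_groups(speakers, features, threshold):
    """Group speakers whose pairwise long-time feature is >= threshold.

    `features` maps (speaker, speaker) tuples to long-time feature values.
    Assumption: qualifying pairs are merged transitively with union-find.
    """
    parent = {s: s for s in speakers}

    def find(s):
        # Follow parent links to the root, shortening the path on the way.
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for (p, q), value in features.items():
        if value >= threshold:  # comparison against the second threshold
            parent[find(p)] = find(q)

    groups = {}
    for s in speakers:
        groups.setdefault(find(s), set()).add(s)
    # Speakers with no qualifying partner are not reported as a group.
    return [g for g in groups.values() if len(g) > 1]


# Hypothetical long-time features for four speakers.
example_features = {
    ("A", "B"): 0.8, ("B", "C"): 0.7, ("A", "C"): 0.3,
    ("A", "D"): 0.1, ("B", "D"): 0.2, ("C", "D"): 0.1,
}
groups = conversation_groups(["A", "B", "C", "D"], example_features, 0.6)
```

With these values A, B and C end up in one group even though the A-C feature is below the threshold, which is a consequence of the transitive-merge assumption. A stricter reading would require every pair within a group to meet the threshold.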
Specification