Speech processing device and speech processing method

US 9,064,501 B2
Filed: 09/14/2011
Issued: 06/23/2015
Est. Priority Date: 09/28/2010
Status: Active Grant

Abstract

Provided is a speech processing device that can accurately extract a conversation group from among a plurality of speakers, even when a conversation group of three or more people is present. The device (400) comprises: a spontaneous speech detection unit (420) and a direction-specific speech detection unit (430), which separately detect each speaker's uttered speech from a sound signal; a conversation establishment level calculation unit (450), which calculates, on the basis of the detected uttered speech, a conversation establishment level for every pairing of two people and for each segment into which the determination time period is divided; an extended-period characteristic amount calculation unit (460), which calculates, for each pairing, an extended-period characteristic amount of the conversation establishment levels over the determination time period; and a conversation-partner determination unit (470), which extracts a conversation group holding a conversation on the basis of the calculated extended-period characteristic amounts.

8 Claims

1. A speech processing device, comprising:
- a speech detector that detects speech of individual speakers from acoustic signals;
  
  a total-amount-of-speech calculator that calculates, for each of all pairs of the speakers and for each of segments defined by dividing a determination time period, a total amount of speech on the basis of the detected speech, the total amount of speech being a sum of amounts of speech of the pair of speakers in the segment;
  
  an established-conversation calculator that calculates, for each of the pairs of the speakers and for each of the segments, a degree of established conversation on the basis of the detected speech, the degree of established conversation being a value indicating a rate of a time when one of the pair of the speakers gives speech and the other of the pair of the speakers gives no speech;
  
  a long-time feature calculator that calculates, for each of the pairs of the speakers, a long-time feature obtained by integrating the degrees of established conversation calculated for the pair of the speakers within the determination time period; and
  
  a conversational-partner determining unit that extracts a conversation group holding conversation from the speakers, on the basis of the calculated long-time features, wherein
  
  the established-conversation calculator excludes, for each of the pairs of the speakers, the degree of established conversation of the segment with the sum of amounts of speech lower than a first threshold from the calculation of the long-time feature for the pair of the speakers, and
  
  the conversational-partner determining unit determines that the speakers of the pair with the long-time feature greater than or equal to a second threshold belong to the same conversation group.
  - 2. The speech processing device according to claim 1, wherein the acoustic signals are acoustic signals of speech received by a speech receiving section having variable directivity, the speech receiving section being disposed close to a user being one of the speakers, and the speech processing device further comprises an output sound controller that controls the directivity of the speech receiving section toward one of the speakers other than the user of the conversation group if the extracted conversation group includes the user.
  - 3. The speech processing device according to claim 2, wherein the output sound controller performs predetermined signal processing on the acoustic signals and outputs the acoustic signals after the predetermined signal processing to a speaker of a hearing aid on the user.
  - 4. The speech processing device according to claim 2, wherein the speech detector detects speech of a speaker sitting in each of predetermined directions relative to the user, and the output sound controller controls the directivity of the speech receiving section toward one of the speakers other than the user in the extracted conversation group.
  - 5. The speech processing device according to claim 1, wherein if the long-time features are uniformly high in several pairs of all the pairs, the conversational-partner determining unit determines that the speakers of the several pairs belong to the same conversation group.
  - 6. The speech processing device according to claim 1, wherein if a difference between the highest long-time feature and the second highest long-time feature is equal to or greater than a predetermined threshold in a pair including a user, the conversational-partner determining unit determines a speaker other than the user corresponding to the highest long-time feature to be an only conversational partner of the user.
  - 7. The speech processing device according to claim 1, wherein the determination time period is a period from the last start of conversation in which the user participates to a current time.
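
The calculation recited in claim 1 can be sketched roughly as follows. This is only an illustrative reading, not the patented implementation. It assumes that speech detection has already produced per-frame 0/1 voice-activity flags, measures the amount of speech as a count of active frames, takes the degree of established conversation as the fraction of frames in which exactly one of the two speakers is active, and treats "integrating" as a plain average over the retained segments. All function names, the sample data, and the threshold values are hypothetical.

```python
from itertools import combinations


def segment_stats(a, b):
    """Return (total amount of speech, degree of established conversation)
    for one segment. a and b are equal-length lists of 0/1 flags per frame."""
    total_speech = sum(a) + sum(b)  # sum of both speakers' active frames
    # Frames where exactly one of the two speakers is talking (assumed measure).
    exclusive = sum(1 for x, y in zip(a, b) if x != y)
    return total_speech, exclusive / len(a)


def long_time_feature(vad_a, vad_b, seg_len, first_threshold):
    """Average the per-segment degrees, skipping segments whose total
    amount of speech is lower than first_threshold."""
    degrees = []
    for start in range(0, len(vad_a) - seg_len + 1, seg_len):
        a = vad_a[start:start + seg_len]
        b = vad_b[start:start + seg_len]
        total, degree = segment_stats(a, b)
        if total >= first_threshold:  # segments below the threshold are excluded
            degrees.append(degree)
    return sum(degrees) / len(degrees) if degrees else 0.0


def same_group_pairs(vad, seg_len, first_threshold, second_threshold):
    """Return the speaker pairs whose long-time feature reaches second_threshold."""
    pairs = []
    for p, q in combinations(sorted(vad), 2):
        feature = long_time_feature(vad[p], vad[q], seg_len, first_threshold)
        if feature >= second_threshold:
            pairs.append((p, q))
    return pairs


# Hypothetical data: 8 frames per speaker, split into two 4-frame segments.
vad = {
    "user":  [1, 1, 0, 0, 1, 0, 1, 0],
    "alice": [0, 0, 1, 1, 0, 1, 0, 1],  # alternates turns with the user
    "bob":   [0, 0, 0, 0, 1, 1, 1, 1],  # talks over both of them
}
print(same_group_pairs(vad, seg_len=4, first_threshold=3, second_threshold=0.8))
# [('alice', 'user')]
```

In this toy data the first segment of every pair involving "bob" contains only two active frames, so it falls below the first threshold and is left out of the average, which is the exclusion the wherein clause describes.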

8. A speech processing method, comprising:
- detecting speech of individual speakers from acoustic signals;
  
  calculating, for each of all of pairs of the speakers and for each of segments defined by dividing a determination time period, a total amount of speech on the basis of the detected speech, the total amount of speech being a sum of amounts of speech of the pair of speakers in the segment;
  
  calculating, for each of the pairs of the speakers and for each of the segments, a degree of established conversation on the basis of the detected speech, the degree of established conversation being a value indicating a rate of a time when one of the pair of the speakers gives speech and the other of the pair of the speakers gives no speech;
  
  calculating, for each of the pairs of the speakers, a long-time feature obtained by integrating the degrees of established conversation calculated for the pair of the speakers within the determination time period; and
  
  extracting a conversation group holding conversation from the speakers on the basis of the calculated long-time features, wherein
  
  for each of the pairs of the speakers in said calculating the degree of established conversation, the degree of established conversation of the segment with the sum of amounts of speech lower than a first threshold is excluded from the calculation of the long-time feature of the pair of the speakers, and
  
  in said extracting the conversation group, the speakers of the pair of speakers with the long-time feature greater than or equal to a second threshold are determined to belong to the same conversation group.
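
The extracting step decides group membership pair by pair, so a group of three or more speakers has to be assembled from several qualifying pairs. The sketch below shows one possible way to do that, by merging pairs that share a speaker with a small union-find structure. The transitive merge is an assumption made for illustration and is not stated in the claims; the function name `conversation_groups` and the sample speakers are hypothetical.

```python
def conversation_groups(speakers, linked_pairs):
    """Merge speaker pairs that share a member into groups.

    speakers: iterable of speaker labels.
    linked_pairs: pairs already judged to belong to the same group,
    for example because their long-time feature met the second threshold.
    """
    parent = {s: s for s in speakers}

    def find(s):
        # Walk up to the representative, shortening the path on the way.
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for p, q in linked_pairs:
        parent[find(p)] = find(q)  # join the two speakers' sets

    groups = {}
    for s in speakers:
        groups.setdefault(find(s), []).append(s)
    # Speakers with no qualifying partner are not reported as a group.
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


speakers = ["user", "alice", "bob", "carol", "dave", "eve"]
linked = [("user", "alice"), ("alice", "bob"), ("carol", "dave")]
print(conversation_groups(speakers, linked))
# [['alice', 'bob', 'user'], ['carol', 'dave']]
```

A stricter rule, such as requiring every pair inside a group to qualify, would be an equally plausible reading and would only change the merge condition.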

Current Assignee: Panasonic Intellectual Property Management Co., Ltd. (Panasonic Holdings Corporation)
Original Assignee: Panasonic Intellectual Property Management Co., Ltd. (Panasonic Holdings Corporation)
Inventors: Yamada, Maki; Endo, Mitsuru
Primary Examiner(s): ZHU, RICHARD Z

Application Number: US13/816,502
Publication Number: US 20130144622A1
Time in Patent Office: 1,378 Days
Field of Search: 381/313
US Class Current: 1/1
CPC Class Codes:
G10L 2021/02087   the noise being separate sp...
G10L 2021/065   Aids for the handicapped in...
G10L 2025/783   based on threshold decision
G10L 25/00   Speech or voice analysis te...
G10L 25/06   the extracted parameters be...
G10L 25/48   specially adapted for parti...
G10L 25/78   Detection of presence or ab...
H04R 2225/43   Signal processing in hearin...
H04R 25/407   Circuits for combining sign...
H04R 25/552   Binaural
H04R 25/558   Remote control, e.g. of amp...
