Non-speech section detecting method and non-speech section detecting device

US 8,798,991 B2
Filed: 11/13/2012
Issued: 08/05/2014
Est. Priority Date: 12/18/2007
Status: Active Grant

Abstract

A non-speech section detecting device generating a plurality of frames having a given time length on the basis of sound data obtained by sampling sound, and detecting a non-speech section having a frame not containing voice data based on speech uttered by a person, the device including: a calculating part calculating a bias of a spectrum obtained by converting sound data of each frame into components on a frequency axis; a judging part judging whether the bias is greater than or equal to a given threshold or alternatively smaller than or equal to a given threshold; a counting part counting the number of consecutive frames judged as having a bias greater than or equal to the threshold or alternatively smaller than or equal to the threshold; a count judging part judging whether the obtained number of consecutive frames is greater than or equal to a given value.
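
The abstract's procedure (compute a per-frame spectral bias, compare it with a threshold, count consecutive qualifying frames, compare the count with a given value) can be sketched roughly as follows. This is an illustrative sketch only, not the patented implementation: the abstract does not define "bias" here, so `spectral_bias` assumes a log ratio of high-band to low-band energy, and the function names, the `above` flag, and the naive DFT are all hypothetical choices.

```python
import cmath
import math


def spectral_bias(frame):
    """Assumed bias measure: log10 ratio of high-band to low-band energy."""
    n = len(frame)
    # Naive DFT power for bins 1 .. n/2 - 1 (DC skipped); adequate for short frames.
    power = []
    for k in range(1, n // 2):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))
        power.append(abs(coeff) ** 2)
    mid = len(power) // 2
    low = sum(power[:mid]) + 1e-12   # small floor avoids division by zero
    high = sum(power[mid:]) + 1e-12
    return math.log10(high / low)


def detect_runs(biases, threshold, min_run, above=True):
    """Return (start, end) frame ranges, end exclusive, where the bias stays
    on the chosen side of `threshold` for at least `min_run` consecutive frames."""
    sections = []
    run_start = None
    for i, b in enumerate(biases):
        qualifies = b >= threshold if above else b <= threshold
        if qualifies:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= min_run:
                sections.append((run_start, i))
            run_start = None
    # Flush a run that reaches the end of the input.
    if run_start is not None and len(biases) - run_start >= min_run:
        sections.append((run_start, len(biases)))
    return sections
```

The `above` flag mirrors the abstract's "greater than or equal to ... or alternatively smaller than or equal to" wording; which direction indicates non-speech depends on how the bias is defined.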

5 Claims

1. A non-speech section detecting device generating a plurality of frames having a given time length on the basis of sound data obtained by sampling sound, and detecting a non-speech section having a frame not including voice data based on speech uttered by a person, the device comprising:
- a first calculating part configured to calculate, for each frame of the plurality of frames, a value, wherein the value is one of a power of sound data, a pitch of sound data, or a bias of a spectrum obtained by converting sound data into components on a frequency axis;
  
  a second calculating part configured to calculate, for a pair of consecutive frames, a variation between the calculated values calculated for the frames in the pair and configured to judge whether the calculated variation is smaller than or equal to a given threshold, and performing, for each pair of consecutive frames in the plurality of frames, the calculating of a variation and the judging;
  
  a counting part configured to count a number of variations judged as smaller than or equal to the threshold;
  
  a count judging part configured to judge whether the counted number is greater than or equal to a given value; and
  
  a detecting part configured to detect, when the counted number is judged as greater than or equal to the given value, a section of the sound data as a non-speech section.
- 2. The non-speech section detecting device according to claim 1, further comprising a second judging part configured to judge whether any of the variations calculated by the second calculating part exceeds a second threshold greater than said given threshold, wherein when the second judging part judges any of the variations as exceeding the second threshold, the detecting part excludes a sound data section including the frames corresponding to a variation which exceeds the second threshold, from being detected as a non-speech section.
  - 3. The non-speech section detecting device according to claim 2, further comprising:
    - a satisfaction counting part configured to count the number of variations which exceed the second threshold;
      
      a given number judging part configured to judge whether the number of variations counted in the satisfaction counting part is smaller than or equal to a third threshold; and
      
      a second detecting part configured to detect, in a case that the number of variations counted in the satisfaction counting part is judged to be less than the third threshold, a section of the sound data is designated as a non-speech section.
  - 4. The non-speech section detecting device according to claim 2, further comprising a third calculating part configured to calculate a maximum value of at least two of the calculated variations, wherein the judging part treats the maximum value calculated by the third calculating part, as a variation of the frames corresponding to the at least two calculated variations.
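
One possible reading of the refinements in claims 2 to 4 is sketched below, operating on a list of frame-to-frame variations for a single candidate section. It is an interpretation under stated assumptions rather than the claimed embodiment: the names `sliding_max`, `is_non_speech`, and `max_large` are hypothetical, and the way the three claims are combined in one function is a choice made for illustration.

```python
def sliding_max(variations, width=2):
    """Claim 4 style smoothing (assumed form): replace the variations by the
    maximum over each window of `width` consecutive variations."""
    if width < 2:
        raise ValueError("width must cover at least two variations")
    if len(variations) < width:
        return list(variations)
    return [max(variations[i:i + width])
            for i in range(len(variations) - width + 1)]


def is_non_speech(variations, threshold, min_count, second_threshold, max_large=0):
    """Judge one candidate section from its variations.

    threshold        -- the "given threshold" of claim 1
    min_count        -- the "given value" of claim 1
    second_threshold -- claim 2: must be greater than `threshold`
    max_large        -- assumed stand-in for the third threshold of claim 3;
                        0 reproduces the strict exclusion of claim 2
    """
    if second_threshold <= threshold:
        raise ValueError("second_threshold must exceed threshold")
    small = sum(1 for v in variations if v <= threshold)
    large = sum(1 for v in variations if v > second_threshold)
    if large > max_large:
        # Too many abrupt changes: exclude the section from detection.
        return False
    return small >= min_count
```

With `max_large=0` any single variation above the second threshold vetoes the section, which corresponds to claim 2; a positive `max_large` tolerates a few such variations in the spirit of claim 3.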

5. A non-speech section detecting method of generating a plurality of frames having a given time length on the basis of sound data obtained by sampling sound, and detecting a non-speech section having a frame not including voice data based on speech uttered by a person, the method comprising:
- calculating, for each frame of the plurality of frames, a value, wherein the value is one of a power of sound data, or a pitch of sound data, or a bias of a spectrum obtained by converting sound data into components on a frequency axis, using a processor;
  
  calculating, for a pair of consecutive frames, a variation between the calculated values calculated for the frames in the pair and judging whether the calculated variation is smaller than or equal to a given threshold, and performing, for each pair of consecutive frames in the plurality of frames, the calculating of a variation and the judging using the processor;
  
  counting a number of variations judged as smaller than or equal to the threshold using the processor;
  
  judging whether the counted number of variations is greater than or equal to a given value using the processor; and
  
  detecting, when the counted number of variations is judged as greater than or equal to the given value, a section of the sound data as a non-speech section using the processor.
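
The method steps of claim 5 can be illustrated with a short sketch that uses frame power as the per-frame value. Several details are assumptions not fixed by the claim text: frames are non-overlapping, power is expressed in decibels, the variation is the absolute difference between consecutive frames, and the count is taken over a run of consecutive small variations that resets when a variation exceeds the threshold. All function and parameter names are hypothetical.

```python
import math


def split_frames(samples, frame_len):
    """Split samples into non-overlapping frames of `frame_len` samples,
    dropping any incomplete frame at the end."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def frame_power_db(frame):
    """Mean-square power of one frame in dB, with a small floor for silence."""
    mean_sq = sum(x * x for x in frame) / len(frame)
    return 10.0 * math.log10(mean_sq + 1e-12)


def detect_non_speech(samples, frame_len, var_threshold, min_count):
    """Return (first_frame, last_frame) index pairs, inclusive, for sections
    whose consecutive-frame power variation stays <= var_threshold for at
    least min_count variations in a row."""
    values = [frame_power_db(f) for f in split_frames(samples, frame_len)]
    # variations[i] belongs to the frame pair (i, i + 1).
    variations = [abs(b - a) for a, b in zip(values, values[1:])]
    sections = []
    count = 0
    # The infinite sentinel flushes a run that reaches the end of the data.
    for i, v in enumerate(variations + [float("inf")]):
        if v <= var_threshold:
            count += 1
        else:
            if count >= min_count:
                # The run covers variations i-count .. i-1, i.e. frames i-count .. i.
                sections.append((i - count, i))
            count = 0
    return sections
```

The underlying idea is that stationary background noise changes little from frame to frame, whereas speech produces larger swings in power, so a long run of small variations suggests a non-speech section.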

Current Assignee: Fujitsu Limited
Original Assignee: Fujitsu Limited
Inventors: Washio, Nobuyuki; Hayakawa, Shoji
Primary Examiner(s): YEN, ERIC L
Application Number: US13/675,317
Publication Number: US 20130073281A1
Time in Patent Office: 630 Days
Field of Search: 704/208, 704/210, 704/248
US Class Current: 704/208
CPC Class Codes:
G10L 2025/783 based on threshold decision
G10L 25/78 Detection of presence or ab...
