Method and apparatus for transmitting speech data to a remote device in a distributed speech recognition system

US 8,494,849 B2
Filed: 06/20/2005
Issued: 07/23/2013
Est. Priority Date: 06/20/2005
Status: Active Grant

Abstract

A method of transmitting speech data to a remote device in a distributed speech recognition system includes the steps of: dividing an input speech signal into frames; calculating, for each frame, a voice activity value representative of the presence of speech activity in the frame; grouping the frames into multiframes, each multiframe including a predetermined number of frames; calculating, for each multiframe, a voice activity marker representative of the number of frames in the multiframe representing speech activity; and selectively transmitting, on the basis of the voice activity marker associated with each multiframe, the multiframes to the remote device.
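The steps listed in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the patented implementation: the frame length, multiframe size, the energy-based voice activity value, the transmit rule (marker at or above a single cutoff), and all function and constant names are assumptions introduced here. Frames are taken as non-overlapping for simplicity.

```python
from typing import List, Sequence

FRAME_LEN = 4               # samples per frame (hypothetical, tiny for illustration)
FRAMES_PER_MULTIFRAME = 3   # frames per multiframe (hypothetical)
ENERGY_FLOOR = 0.5          # per-frame energy cutoff (hypothetical)
MIN_ACTIVE_FRAMES = 2       # marker value needed to transmit (hypothetical)


def split_into_frames(signal: Sequence[float]) -> List[List[float]]:
    """Divide the signal into consecutive frames, dropping any partial last frame."""
    count = len(signal) // FRAME_LEN
    return [list(signal[i * FRAME_LEN:(i + 1) * FRAME_LEN]) for i in range(count)]


def voice_activity_value(frame: Sequence[float]) -> int:
    """Return 1 if the mean energy of the frame exceeds the cutoff, else 0."""
    energy = sum(sample * sample for sample in frame) / len(frame)
    return 1 if energy > ENERGY_FLOOR else 0


def group_into_multiframes(flags: Sequence[int]) -> List[List[int]]:
    """Group per-frame values into multiframes of fixed size."""
    count = len(flags) // FRAMES_PER_MULTIFRAME
    return [list(flags[i * FRAMES_PER_MULTIFRAME:(i + 1) * FRAMES_PER_MULTIFRAME])
            for i in range(count)]


def multiframes_to_transmit(signal: Sequence[float]) -> List[int]:
    """Return the indices of the multiframes that would be sent."""
    flags = [voice_activity_value(frame) for frame in split_into_frames(signal)]
    # The marker of a multiframe is its number of frames flagged as speech.
    markers = [sum(multiframe) for multiframe in group_into_multiframes(flags)]
    return [index for index, marker in enumerate(markers)
            if marker >= MIN_ACTIVE_FRAMES]


# One silent multiframe, one fully active, one with a single active frame.
signal = [0.0] * 12 + [1.0] * 12 + [0.0] * 8 + [1.0] * 4
print(multiframes_to_transmit(signal))
```

With these assumed constants, only the second multiframe reaches the cutoff and is selected.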

17 Claims

1. A method of transmitting speech data to a remote device in a distributed speech recognition system, comprising:
- dividing an input speech signal into frames;
  
  calculating, for each frame, a voice activity value representative of the presence of speech activity in said frame; and
  
  grouping said frames into multiframes, each multiframe comprising a predetermined number of consecutive frames corresponding to the smallest transmission unit transferred to said remote device;
  
  comprising the steps of:
  
  calculating, for each multiframe, a voice activity marker representative of the number of frames in said multiframe having a voice activity value representing speech activity, said voice activity marker being indicative of speech activity in said multiframe;
  
  comparing the number of frames representing speech activity in a current multiframe with a first threshold;
  
  selectively transmitting the current multiframe to said remote device on the basis of said voice activity marker associated with the current multiframe, if the number of frames representing speech activity in the current multiframe is greater than or equal to said first threshold indicative of speech activity; and
  
  if the number of frames representing speech activity in the current multiframe is less than said first threshold, comparing a plurality of frames in at least one of a tail portion and a head portion of the current multiframe with a second threshold indicative of speech activity,

  wherein said second threshold is less than said first threshold, and

  wherein said plurality of frames is less than the total number of frames in the current multiframe, and

  selectively transmitting the current multiframe to said remote device if the number of frames representing speech activity in the plurality of frames is greater than or equal to said second threshold indicative of speech activity.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11):
  - 2. The method of claim 1, wherein comparing a plurality of frames in at least one of a tail portion and a head portion of the current multiframe with a second threshold indicative of speech activity comprises:
    - comparing the number of frames representing speech activity in said tail portion of the current multiframe with said second threshold; and
      
      if the number of frames representing speech activity in said tail portion is greater than said second threshold, performing the steps of:
      
      associating a first value with the marker of said current multiframe, indicative of speech activity, if a multiframe immediately preceding the current multiframe had associated therewith a marker of the first value; and
      
      associating a second value with the marker of said current multiframe, indicative of no speech activity, if said multiframe immediately preceding the current multiframe had associated therewith a marker of the second value.
  - 3. The method of claim 2, further comprising, if the number of frames representing speech activity in said tail portion is lower than or equal to said second threshold:
    - comparing the number of frames representing speech activity in said head portion of the current multiframe with a third threshold; and
      
      if the number of frames representing speech activity in said head portion is lower than said third threshold, associating a second value with the marker of said current multiframe, indicative of no speech activity.
  - 4. The method of claim 3, comprising, in case the number of frames representing speech activity in said head portion is greater than or equal to said third threshold, the steps of:
    - associating a first value with the marker of said current multiframe, indicative of speech activity, if a multiframe immediately subsequent to the current multiframe has associated therewith a marker of the first value; and
      
      associating a second value with the marker of said current multiframe, indicative of no speech activity, if said multiframe immediately subsequent to the current multiframe has associated therewith a marker of the second value.
  - 5. The method of claim 4, wherein each multiframe comprises 24 frames, of which 8 initial frames represent said tail portion and 8 last frames represent said head portion.
  - 6. The method of claim 1, wherein said voice activity value is a voice activity detection flag computed on every speech frame of said input speech signal.
  - 7. The method of claim 1, comprising a step of:
    - calculating a background noise average energy on a plurality of initial multiframes of said input speech signal;
      
      comparing said background noise average energy with a background noise energy threshold; and
      
      if said background noise average energy is below said background noise energy threshold, setting, as said voice activity value, a voice activity detection flag computed on every speech frame of said input speech signal.
  - 8. The method of claim 7, comprising, in case said background noise average energy is over said background noise energy threshold, the steps of:
    - calculating, for each frame, a frame energy representative of the energy of the corresponding portion of the input speech signal;
      
      comparing said frame energy with an energy threshold, setting a frame energy flag to the value “1” if said frame energy is over said energy threshold and to the value “0” otherwise; and
      
      setting, as said voice activity value, said frame energy flag.
  - 9. The method of claim 8, wherein said energy threshold is computed by applying a multiplicative coefficient to said background noise average energy.
  - 10. A distributed speech recognition system comprising a front-end module for performing processing of a speech signal and a back-end module for carrying out recognition on said processed speech signal, said front-end module being capable of operating according to the method of claim 1.
  - 11. A non-transitory computer-readable storage medium having stored thereon a program comprising software code portions capable of performing, when the computer program is run on a speech recognition system, the method of claim 1.
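
The marker logic of claims 1 to 5 can be sketched as below. This is a hedged illustration, not the patentee's implementation. The 24-frame multiframe with an 8-frame tail portion (initial frames) and an 8-frame head portion (last frames) comes from claim 5; the three threshold values, the constant and function names, and the two-pass structure are assumptions. Claim 1 words the second comparison as "greater than or equal to" while claim 2 uses "greater than"; the sketch follows claim 2. It also assumes that the first multiframe has a "no speech" predecessor, and that an undecided following multiframe counts as "no speech" when claim 4 looks ahead.

```python
from typing import List, Optional, Sequence

TAIL_LEN = 8           # initial frames of the multiframe (claim 5)
HEAD_LEN = 8           # last frames of the multiframe (claim 5)
FIRST_THRESHOLD = 18   # hypothetical value
SECOND_THRESHOLD = 4   # hypothetical value, must stay below FIRST_THRESHOLD
THIRD_THRESHOLD = 4    # hypothetical value

SPEECH, NO_SPEECH = 1, 0


def first_pass(multiframes: Sequence[Sequence[int]]) -> List[Optional[int]]:
    """Mark SPEECH where the active-frame count reaches the first threshold.

    Multiframes below the first threshold stay undecided (None).
    """
    return [SPEECH if sum(mf) >= FIRST_THRESHOLD else None for mf in multiframes]


def resolve_markers(multiframes: Sequence[Sequence[int]]) -> List[int]:
    """Assign a final SPEECH / NO_SPEECH marker to every multiframe."""
    provisional = first_pass(multiframes)
    markers: List[int] = []
    for i, mf in enumerate(multiframes):
        if provisional[i] == SPEECH:
            markers.append(SPEECH)
            continue
        tail_count = sum(mf[:TAIL_LEN])
        head_count = sum(mf[-HEAD_LEN:])
        if tail_count > SECOND_THRESHOLD:
            # Claim 2: copy the marker of the immediately preceding multiframe.
            # Assumption: the first multiframe has a NO_SPEECH predecessor.
            markers.append(markers[i - 1] if i > 0 else NO_SPEECH)
        elif head_count < THIRD_THRESHOLD:
            # Claim 3: too little activity at the end of the multiframe.
            markers.append(NO_SPEECH)
        else:
            # Claim 4: copy the marker of the immediately subsequent multiframe.
            # Assumption: an undecided or missing successor counts as NO_SPEECH.
            following = provisional[i + 1] if i + 1 < len(multiframes) else None
            markers.append(SPEECH if following == SPEECH else NO_SPEECH)
    return markers


silence = [0] * 24
speech = [1] * 24
speech_onset = [0] * 18 + [1] * 6   # activity only in the head portion
speech_decay = [1] * 6 + [0] * 18   # activity only in the tail portion

sequence = [silence, speech_onset, speech, speech_decay, silence, speech_decay]
print(resolve_markers(sequence))
```

In this example the onset multiframe is kept because the multiframe after it is speech, the first decay multiframe is kept because the one before it is speech, and the final decay multiframe is dropped because it follows silence.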

12. A user terminal comprising a front-end module of a speech recognition system distributed over a communications network, said front-end module comprising:
- a feature extraction block configured to divide an input speech signal into frames, and configured to calculate, for each frame, a voice activity value representative of the presence of speech activity in said frame;
  
  a bitstream formatting block configured to group said frames into multiframes, each multiframe comprising a predetermined number of consecutive frames corresponding to the smallest transmission unit transferred from said front-end module to a remote back-end module of said distributed speech recognition system;
  
  a marker block configured to calculate, for each multiframe, a voice activity marker representative of the number of frames in said multiframe having a voice activity value representing speech activity, said voice activity marker being indicative of speech activity in said multiframe; and
  
  a decision block configured to compare the number of frames representing speech activity in a current multiframe with a first threshold and to selectively transmit the current multiframe on the basis of said voice activity marker associated with the current multiframe over said communications network to the remote back-end module of said distributed speech recognition system, if the number of frames representing speech activity in the current multiframe is greater than or equal to said first threshold indicative of speech activity, and

  if the number of frames representing speech activity in the current multiframe is less than said first threshold, said decision block further configured to compare a plurality of frames in at least one of a tail portion and a head portion of the current multiframe with a second threshold indicative of speech activity,

  wherein said second threshold is less than said first threshold, and

  wherein said plurality of frames is less than the total number of frames in the current multiframe, and

  said decision block further configured to selectively transmit the current multiframe to said remote device if the number of frames representing speech activity in the plurality of frames is greater than or equal to said second threshold indicative of speech activity.
- Dependent claims (13, 14, 15, 16, 17):
  - 13. The user terminal of claim 12, wherein said decision block is further configured to eliminate bitstream regions corresponding to the multiframes having associated therewith a marker indicating no speech activity.
  - 14. The user terminal of claim 13, wherein said decision block is further configured to renumber the transmitted multiframes in order to provide the remote back-end module with a coherent bitstream.
  - 15. The user terminal of claim 13, wherein said marker block comprises a module configured to calculate a background noise average energy on a plurality of initial multiframes of said input speech signal, and to compare said background noise average energy with a background noise energy threshold.
  - 16. The user terminal of claim 15, wherein, if said background noise average energy is below said background noise energy threshold, said voice activity value is a voice activity detection flag computed by said feature extraction block on every speech frame of said input speech signal.
  - 17. The user terminal of claim 15, wherein, if said background noise average energy is over said background noise energy threshold, said voice activity value is a frame energy flag obtained, for each frame, by comparing a frame energy, computed by said feature extraction block and representative of the energy of a corresponding portion of the input speech signal, with an energy threshold, and setting said frame energy flag to the value “1” if said frame energy is over said energy threshold and to the value “0” otherwise.
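
Two pieces of the dependent claims lend themselves to a short sketch: the choice between the voice activity detection flag and the frame energy flag (claims 7 to 9 and 15 to 17), and the elimination and renumbering of multiframes (claims 13 and 14). The code below is an assumption-laden illustration rather than the claimed implementation: the threshold and coefficient values and all names are hypothetical, the background noise is averaged over a number of initial frames rather than initial multiframes, and a noise average exactly equal to the threshold (a case the claims do not address) is treated as "over".

```python
from typing import List, Sequence, Tuple

NOISE_ENERGY_THRESHOLD = 0.01   # hypothetical background noise energy threshold
ENERGY_COEFFICIENT = 3.0        # hypothetical multiplicative coefficient (claim 9)


def choose_voice_activity_values(vad_flags: Sequence[int],
                                 frame_energies: Sequence[float],
                                 noise_frames: int) -> List[int]:
    """Pick the per-frame voice activity values.

    Quiet background: use the VAD flags as they are.
    Noisy background: use energy flags against a noise-scaled threshold.
    """
    noise_average = sum(frame_energies[:noise_frames]) / noise_frames
    if noise_average < NOISE_ENERGY_THRESHOLD:
        return list(vad_flags)
    # Claim 9: the energy threshold is the noise average times a coefficient.
    energy_threshold = ENERGY_COEFFICIENT * noise_average
    return [1 if energy > energy_threshold else 0 for energy in frame_energies]


def drop_and_renumber(multiframes: Sequence[str],
                      markers: Sequence[int]) -> List[Tuple[int, str]]:
    """Keep only multiframes marked as speech and number them consecutively."""
    kept = [mf for mf, marker in zip(multiframes, markers) if marker == 1]
    return list(enumerate(kept))


quiet = choose_voice_activity_values([0, 0, 1, 0], [0.001, 0.002, 0.5, 0.4], 2)
noisy = choose_voice_activity_values([1, 1, 1, 1], [0.1, 0.1, 0.5, 0.2], 2)
print(quiet, noisy)
print(drop_and_renumber(["mf0", "mf1", "mf2", "mf3"], [0, 1, 0, 1]))
```

Renumbering from zero is one possible way to give the back-end a gap-free sequence; the claims only require that the resulting bitstream be coherent.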


Current Assignee: Telecom Italia S.p.A.
Original Assignee: Telecom Italia S.p.A.
Inventors: Collotta, Ivano Salvatore; Ettorre, Donato; Fodrini, Maurizio; Gallo, Pierluigi; Spagnolo, Roberto
Primary Examiner(s): COLUCCI, MICHAEL C
Application Number: US11/922,500
Publication Number: US 20090222263A1
Time in Patent Office: 2,955 Days
Field of Search: 704/216, 704/233, 704/462, 704/220, 704/266, 704/258, 704/253, 704/245, 704/223, 704/222, 704/221, 704/219, 704/211, 704/208, 370/528, 370/462, 370/347, 370/315, 379/269, 375/369, 375/317, 375/219, 846/22, 846/09
US Class Current: 704/233
CPC Class Codes:
G10L 15/30 Distributed recognition, e....
G10L 25/78 Detection of presence or ab...
