Method and Apparatus for Transmitting Speech Data To a Remote Device In a Distributed Speech Recognition System

US 20090222263A1
Filed: 06/20/2005
Published: 09/03/2009
Est. Priority Date: 06/20/2005
Status: Active Grant

Abstract

A method of transmitting speech data to a remote device in a distributed speech recognition system includes the steps of: dividing an input speech signal into frames; calculating, for each frame, a voice activity value representative of the presence of speech activity in the frame; grouping the frames into multiframes, each multiframe including a predetermined number of frames; calculating, for each multiframe, a voice activity marker representative of the number of frames in the multiframe representing speech activity; and selectively transmitting, on the basis of the voice activity marker associated with each multiframe, the multiframes to the remote device.
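
The steps listed in the abstract can be sketched roughly as follows. This is only an illustrative sketch, not the patented implementation: the function name `select_multiframes`, the non-overlapping framing, the mean-square-energy test used as a stand-in voice activity value, and the single `min_speech_frames` threshold used as the marker rule are all assumptions made for the example.

```python
from typing import List, Sequence


def select_multiframes(samples: Sequence[float],
                       frame_len: int,
                       frames_per_multiframe: int,
                       energy_threshold: float,
                       min_speech_frames: int) -> List[List[List[float]]]:
    """Return only the multiframes whose marker indicates speech activity."""
    # 1. Divide the input signal into frames (assumption: non-overlapping;
    #    a trailing partial frame is dropped).
    frames = [list(samples[i:i + frame_len])
              for i in range(0, len(samples) - frame_len + 1, frame_len)]

    # 2. Per-frame voice activity value (assumption: a simple
    #    mean-square-energy test stands in for a real detector).
    flags = [1 if sum(x * x for x in f) / frame_len > energy_threshold else 0
             for f in frames]

    selected = []
    # 3. Group frames into multiframes of a predetermined size
    #    (a trailing partial multiframe is dropped).
    last_start = len(frames) - frames_per_multiframe
    for start in range(0, last_start + 1, frames_per_multiframe):
        end = start + frames_per_multiframe
        # 4. Marker derived from the number of speech frames in the multiframe
        #    (assumption: one plain threshold).
        marker = 1 if sum(flags[start:end]) >= min_speech_frames else 0
        # 5. Selectively "transmit": keep only multiframes marked as speech.
        if marker:
            selected.append(frames[start:end])
    return selected
```

With `frames_per_multiframe=24` the grouping matches the multiframe size given in claim 24; the other parameter values are left to the caller because the abstract does not state them.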

36 Claims

1-18. (canceled)

19. A method of transmitting speech data to a remote device in a distributed speech recognition system, comprising:
- dividing an input speech signal into frames;
  
  calculating, for each frame, a voice activity value representative of the presence of speech activity in said frame; and
  
  grouping said frames into multiframes, each multiframe comprising a predetermined number of frames;
  
  further comprising the steps of:
  
  calculating, for each multiframe, a voice activity marker representative of the number of frames in said multiframe having a voice activity value representing speech activity, said voice activity marker being indicative of speech activity in said multiframe; and
  
  selectively transmitting, on the basis of said voice activity marker associated with each multiframe, said multiframes to said remote device.
- Dependent Claims (20, 21, 22, 23, 24, 25, 26, 27, 28, 35, 36)
- - 20. The method of claim 19, wherein said step of calculating said voice activity marker comprises:
    - comparing the number of frames representing speech activity in a current multiframe with a first threshold;
      
      if the number of frames representing speech activity in the current multiframe is greater than or equal to said first threshold, associating a first value with said marker, indicative of speech activity.
  - 21. The method of claim 20, comprising, in case the number of frames representing speech activity in the current multiframe is lower than said first threshold, the steps of:
    - comparing the number of frames representing speech activity in a tail portion of the current multiframe with a second threshold; and
      
      if the number of frames representing speech activity in said tail portion is greater than said second threshold, performing the steps of:
      
      associating a first value with the marker of said current multiframe, indicative of speech activity, if a multiframe immediately preceding the current multiframe had associated therewith a marker of the first value; and
      
      associating a second value with the marker of said current multiframe, indicative of no speech activity, if said multiframe immediately preceding the current multiframe had associated therewith a marker of the second value.
  - 22. The method of claim 21, comprising, in case the number of frames representing speech activity in said tail portion is lower than or equal to said second threshold, the steps of:
    - comparing the number of frames representing speech activity in a head portion of the current multiframe with a third threshold; and
      
      if the number of frames representing speech activity in said head portion is lower than said third threshold, associating a second value with the marker of said current multiframe, indicative of no speech activity.
  - 23. The method of claim 22, comprising, in case the number of frames representing speech activity in said head portion is greater than or equal to said third threshold, the steps of:
    - associating a first value with the marker of said current multiframe, indicative of speech activity, if a multiframe immediately subsequent to the current multiframe has associated therewith a marker of the first value; and
      
      associating a second value with the marker of said current multiframe, indicative of no speech activity, if said multiframe immediately subsequent to the current multiframe has associated therewith a marker of the second value.
  - 24. The method of claim 23, wherein each multiframe comprises 24 frames, of which 8 initial frames represent said tail portion and 8 last frames represent said head portion.
  - 25. The method of claim 19, wherein said voice activity value is a voice activity detection flag computed on every speech frame of said input speech signal.
  - 26. The method of claim 19, comprising a step of:
    - calculating a background noise average energy on a plurality of initial multiframes of said input speech signal;
      
      comparing said background noise average energy with a background noise energy threshold; and
      
      if said background noise average energy is below said background noise energy threshold, setting, as said voice activity value, a voice activity detection flag computed on every speech frame of said input speech signal.
  - 27. The method of claim 26, comprising, in case said background noise average energy is over said background noise energy threshold, the steps of:
    - calculating, for each frame, a frame energy representative of the energy of the corresponding portion of the input speech signal;
      
      comparing said frame energy with an energy threshold, setting a frame energy flag to the value “1” if said frame energy is over said energy threshold and to the value “0” otherwise; and
      
      setting, as said voice activity value, said frame energy flag.
  - 28. The method of claim 27, wherein said energy threshold is computed by applying a multiplicative coefficient to said background noise average energy.
  - 35. A distributed speech recognition system comprising a front-end module for performing processing of a speech signal and a back-end module for carrying out recognition on said processed speech signal, said front-end module being capable of operating according to the method of claim 19.
  - 36. A computer program product, loadable in the memory of at least one computer and comprising software code portions capable of performing the method of claim 19.
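
The marker decision described in claims 20 to 24 can be illustrated with a short sketch. It is a reading of the claim text under stated assumptions, not the applicant's implementation. The claims do not give values for the three thresholds, so `t1`, `t2` and `t3` are left as parameters. The claims also do not say what happens when the neighbouring multiframe does not exist (first or last multiframe) or when two adjacent multiframes each defer to the other; the sketch assumes "no speech activity" in those cases. The names `classify` and `multiframe_markers` are hypothetical.

```python
from typing import List, Optional, Sequence

SPEECH, NO_SPEECH = 1, 0  # the "first value" and "second value" of the claims


def classify(flags: Sequence[int], t1: int, t2: int, t3: int,
             tail_len: int = 8, head_len: int = 8) -> str:
    """Return which rule decides the marker of one multiframe."""
    if sum(flags) >= t1:               # claim 20: enough speech frames overall
        return "speech"
    if sum(flags[:tail_len]) > t2:     # claim 21: tail portion = initial frames
        return "prev"                  # copy the preceding multiframe's marker
    if sum(flags[-head_len:]) < t3:    # claim 22: head portion = last frames
        return "none"
    return "next"                      # claim 23: copy the subsequent marker


def multiframe_markers(multiframes: Sequence[Sequence[int]],
                       t1: int, t2: int, t3: int) -> List[int]:
    """Compute one marker per multiframe from per-frame voice activity flags."""
    rules = [classify(mf, t1, t2, t3) for mf in multiframes]
    n = len(rules)
    markers: List[Optional[int]] = [
        SPEECH if r == "speech" else NO_SPEECH if r == "none" else None
        for r in rules
    ]
    # Propagate neighbour markers until nothing changes any more.
    changed = True
    while changed:
        changed = False
        for i, rule in enumerate(rules):
            if markers[i] is not None:
                continue
            j = i - 1 if rule == "prev" else i + 1
            if not 0 <= j < n:
                markers[i] = NO_SPEECH     # assumption: no neighbour available
                changed = True
            elif markers[j] is not None:
                markers[i] = markers[j]
                changed = True
    # Assumption: multiframes that only defer to each other count as no speech.
    return [NO_SPEECH if m is None else m for m in markers]
```

Each multiframe is passed as a list of 24 per-frame flags (0 or 1), so that the default `tail_len` and `head_len` of 8 correspond to the portions named in claim 24. Because claim 23 depends on the marker of the subsequent multiframe, a streaming implementation would need at least one multiframe of look-ahead; the sketch sidesteps this by working on a complete list.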

29. A user terminal comprising a front-end module of a speech recognition system distributed over a communications network, said front end module comprising:
- a feature extraction block for dividing an input speech signal into frames, and for calculating, for each frame, a voice activity value representative of the presence of speech activity in said frame;
  
  a bitstream formatting block for grouping said frames into multiframes, each multiframe comprising a predetermined number of frames;
  
  said front-end module further comprising:
  
  a marker block for calculating, for each multiframe, a voice activity marker representative of the number of frames in said multiframe having a voice activity value representing speech activity, said voice activity marker being indicative of speech activity in said multiframe; and
  
  a decision block for selectively transmitting, on the basis of said voice activity marker associated with each multiframe, said multiframes over said communications network to a remote back-end module of said distributed speech recognition system.
- Dependent Claims (30, 31, 32, 33, 34)
- - 30. The user terminal of claim 29, wherein said decision block eliminates bitstream regions corresponding to the multiframes having associated therewith a marker indicating no speech activity.
  - 31. The user terminal of claim 30, wherein said decision block renumbers the transmitted multiframes in order to provide the remote back-end module with a coherent bitstream.
  - 32. The user terminal of claim 30, wherein said marker block comprises a module for calculating a background noise average energy on a plurality of initial multiframes of said input speech signal, and for comparing said background noise average energy with a background noise energy threshold.
  - 33. The user terminal of claim 32, wherein, in case said background noise average energy is below said background noise energy threshold, said voice activity value is a voice activity detection flag computed by said feature extraction block on every speech frame of said input speech signal.
  - 34. The user terminal of claim 32, wherein, in case said background noise average energy is over said background noise energy threshold, said voice activity value is a frame energy flag obtained, for each frame, by comparing a frame energy, computed by said feature extraction block and representative of the energy of a corresponding portion of the input speech signal, with an energy threshold, and setting said frame energy flag to the value “1” if said frame energy is over said energy threshold and to the value “0” otherwise.
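
Claims 30 to 34 (and the parallel method claims 26 to 28) can be pictured with the following sketch. It is a hedged illustration only: the function names, the use of a plain arithmetic mean for the background noise average, and the consecutive renumbering starting at 0 are assumptions, and the claims do not say which branch applies when the noise average exactly equals its threshold (the sketch then uses the energy flag).

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def voice_activity_values(frame_energies: Sequence[float],
                          vad_flags: Sequence[int],
                          n_initial_frames: int,
                          noise_threshold: float,
                          coeff: float) -> List[int]:
    """Choose the per-frame voice activity value (claims 32-34, 26-28)."""
    initial = frame_energies[:n_initial_frames]
    # Background noise average energy over the initial frames
    # (assumption: arithmetic mean; 0.0 if no frames are available).
    noise_avg = sum(initial) / len(initial) if initial else 0.0
    if noise_avg < noise_threshold:
        # Low background noise: use the detector's own per-frame flag.
        return list(vad_flags)
    # High background noise: threshold is a multiple of the noise average
    # (claim 28), and the frame energy flag becomes the voice activity value.
    energy_threshold = coeff * noise_avg
    return [1 if e > energy_threshold else 0 for e in frame_energies]


def drop_and_renumber(multiframes: Sequence[T],
                      markers: Sequence[int]) -> List[Tuple[int, T]]:
    """Drop multiframes marked as no speech and renumber the rest (claims 30-31)."""
    kept = [mf for mf, m in zip(multiframes, markers) if m == 1]
    # Assumption: "renumbering" means consecutive counters starting at 0.
    return list(enumerate(kept))
```

Here `n_initial_frames` would be the number of initial multiframes used for the noise estimate multiplied by the number of frames per multiframe; the claims leave both that count and the value of `coeff` open.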

Current Assignee
Telecom Italia S.p.A.
Original Assignee
Telecom Italia S.p.A.
Inventors
Gallo, Pierluigi; Spagnolo, Roberto; Fodrini, Maurizio; Ettorre, Donato; Collotta, Ivano Salvatore

Granted Patent

US 8,494,849 B2
US Class Current

704/233
CPC Class Codes

G10L 15/30 Distributed recognition, e....

G10L 25/78 Detection of presence or ab...
