Method and apparatus for transmitting speech data to a remote device in a distributed speech recognition system
First Claim
1. A method of transmitting speech data to a remote device in a distributed speech recognition system, comprising:
dividing an input speech signal into frames;
calculating, for each frame, a voice activity value representative of the presence of speech activity in said frame; and
grouping said frames into multiframes, each multiframe comprising a predetermined number of consecutive frames corresponding to the smallest transmission unit transferred to said remote device;
comprising the steps of:
calculating, for each multiframe, a voice activity marker representative of the number of frames in said multiframe having a voice activity value representing speech activity, said voice activity marker being indicative of speech activity in said multiframe;
comparing the number of frames representing speech activity in a current multiframe with a first threshold;
selectively transmitting the current multiframe to said remote device on the basis of said voice activity marker associated with the current multiframe, if the number of frames representing speech activity in the current multiframe is greater than or equal to said first threshold indicative of speech activity; and
if the number of frames representing speech activity in the current multiframe is less than said first threshold, comparing a plurality of frames in at least one of a tail portion and a head portion of the current multiframe with a second threshold indicative of speech activity, wherein said second threshold is less than said first threshold, and wherein said plurality of frames is less than the total number of frames in the current multiframe, and selectively transmitting the current multiframe to said remote device if the number of frames representing speech activity in the plurality of frames is greater than or equal to said second threshold indicative of speech activity.
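The two-threshold decision recited in claim 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patentee's implementation: the function name `should_transmit`, the threshold values, and the use of equal-length head and tail portions are hypothetical choices. The claim itself only requires that the second threshold be lower than the first and that the examined portion contain fewer frames than the whole multiframe.

```python
from typing import Sequence

# Hypothetical parameters: none of these values come from the claim.
FIRST_THRESHOLD = 6    # speech frames required in the whole multiframe
SECOND_THRESHOLD = 2   # lower threshold applied to a head or tail portion
PORTION_LEN = 4        # frames in each head/tail portion (< multiframe size)


def should_transmit(vad_flags: Sequence[bool],
                    t1: int = FIRST_THRESHOLD,
                    t2: int = SECOND_THRESHOLD,
                    portion_len: int = PORTION_LEN) -> bool:
    """Two-threshold transmit decision for one multiframe (sketch)."""
    if not t2 < t1:
        raise ValueError("second threshold must be less than the first")
    if not 0 < portion_len < len(vad_flags):
        raise ValueError("portion must be shorter than the multiframe")

    # Voice activity marker: number of frames flagged as speech.
    marker = sum(vad_flags)
    if marker >= t1:
        return True

    # Below the first threshold: test the head and tail portions
    # against the lower second threshold.
    head = sum(vad_flags[:portion_len])
    tail = sum(vad_flags[-portion_len:])
    return head >= t2 or tail >= t2
```

For a 24-frame multiframe whose only speech frames are three near the end, the marker (3) fails the first test, but the tail portion (3 speech frames) passes the second, so the multiframe is sent. One plausible reading of this design is that the lower head/tail test keeps multiframes that contain only the onset or end of an utterance, which the stricter whole-multiframe test alone would drop.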
Abstract
A method of transmitting speech data to a remote device in a distributed speech recognition system includes the steps of: dividing an input speech signal into frames; calculating, for each frame, a voice activity value representative of the presence of speech activity in the frame; grouping the frames into multiframes, each multiframe including a predetermined number of frames; calculating, for each multiframe, a voice activity marker representative of the number of frames in the multiframe representing speech activity; and selectively transmitting the multiframes to the remote device on the basis of the voice activity marker associated with each multiframe.
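The framing, per-frame voice activity, and multiframe grouping steps summarized in the abstract might look like the following Python sketch. The abstract does not say how the voice activity value is computed, so a simple mean-energy threshold is swapped in here as an assumption. The frame length, multiframe size, and energy threshold are hypothetical values, and frames are taken as non-overlapping for simplicity even though practical front-ends usually use overlapping analysis windows.

```python
from typing import List, Sequence

# Hypothetical parameters, not taken from the patent.
FRAME_LEN = 80            # samples per frame (e.g. 10 ms at 8 kHz)
MULTIFRAME_SIZE = 24      # frames per multiframe
ENERGY_THRESHOLD = 1e-3   # mean-energy threshold for the assumed VAD


def split_frames(signal: Sequence[float],
                 frame_len: int = FRAME_LEN) -> List[Sequence[float]]:
    """Divide the signal into non-overlapping frames; drop a partial tail."""
    n = len(signal) // frame_len
    return [signal[i * frame_len:(i + 1) * frame_len] for i in range(n)]


def voice_activity(frame: Sequence[float],
                   threshold: float = ENERGY_THRESHOLD) -> bool:
    """Assumed per-frame VAD: mean energy above a fixed threshold."""
    energy = sum(x * x for x in frame) / len(frame)
    return energy > threshold


def group_multiframes(flags: List[bool],
                      size: int = MULTIFRAME_SIZE) -> List[List[bool]]:
    """Group consecutive frame flags; an incomplete last group is dropped."""
    return [flags[i:i + size] for i in range(0, len(flags) - size + 1, size)]


def voice_activity_markers(signal: Sequence[float]) -> List[int]:
    """Return, per multiframe, the number of frames flagged as speech."""
    flags = [voice_activity(f) for f in split_frames(signal)]
    return [sum(mf) for mf in group_multiframes(flags)]
```

Each marker is simply a count between 0 and the multiframe size; the selective transmission step then decides per multiframe from that count.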
17 Claims
12. A user terminal comprising a front-end module of a speech recognition system distributed over a communications network, said front-end module comprising:
a feature extraction block configured to divide an input speech signal into frames, and configured to calculate, for each frame, a voice activity value representative of the presence of speech activity in said frame;
a bitstream formatting block configured to group said frames into multiframes, each multiframe comprising a predetermined number of consecutive frames corresponding to the smallest transmission unit transferred from said front-end module to a remote back-end module of said distributed speech recognition system;
a marker block configured to calculate, for each multiframe, a voice activity marker representative of the number of frames in said multiframe having a voice activity value representing speech activity, said voice activity marker being indicative of speech activity in said multiframe; and
a decision block configured to compare the number of frames representing speech activity in a current multiframe with a first threshold and to selectively transmit the current multiframe on the basis of said voice activity marker associated with the current multiframe over said communications network to the remote back-end module of said distributed speech recognition system, if the number of frames representing speech activity in the current multiframe is greater than or equal to said first threshold indicative of speech activity, and if the number of frames representing speech activity in the current multiframe is less than said first threshold, said decision block further configured to compare a plurality of frames in at least one of a tail portion and a head portion of the current multiframe with a second threshold indicative of speech activity, wherein said second threshold is less than said first threshold, and wherein said plurality of frames is less than the total number of frames in the current multiframe, and said decision block further configured to selectively transmit the current multiframe to said remote device if the number of frames representing speech activity in the plurality of frames is greater than or equal to said second threshold indicative of speech activity.
(Dependent claims: 13, 14, 15, 16, 17)
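Claim 12 recasts the method as cooperating blocks in a user terminal. The hypothetical `DsrFrontEnd` class below is one way to picture that split in a streaming setting: `push_frame` plays the role of the bitstream formatting block, the sum of the flags is the marker block, and `_decide` is the decision block. The feature extraction block is assumed to run upstream and supply each frame's feature vector together with its voice activity flag, and the `send` callback merely stands in for the network link to the back-end. All names and parameter values are assumptions, not part of the claim.

```python
from typing import Callable, List, Tuple

# (feature vector, voice activity flag) as produced by an upstream
# feature extraction block (assumed, not shown).
Frame = Tuple[List[float], bool]


class DsrFrontEnd:
    """Hypothetical streaming front-end mirroring the blocks of claim 12."""

    def __init__(self, send: Callable[[List[List[float]]], None],
                 multiframe_size: int = 24, t1: int = 6, t2: int = 2,
                 portion_len: int = 4) -> None:
        if not (t2 < t1 and 0 < portion_len < multiframe_size):
            raise ValueError("need t2 < t1 and 0 < portion_len < size")
        self.send = send  # stand-in for the link to the back-end module
        self.multiframe_size = multiframe_size
        self.t1, self.t2, self.portion_len = t1, t2, portion_len
        self._buffer: List[Frame] = []

    def push_frame(self, features: List[float], is_speech: bool) -> None:
        """Bitstream formatting block: buffer frames into multiframes."""
        self._buffer.append((features, is_speech))
        if len(self._buffer) == self.multiframe_size:
            multiframe, self._buffer = self._buffer, []
            if self._decide([flag for _, flag in multiframe]):
                # Only the feature payload is forwarded to the back-end.
                self.send([feat for feat, _ in multiframe])

    def _decide(self, flags: List[bool]) -> bool:
        """Decision block using the marker (count of speech frames)."""
        marker = sum(flags)  # marker block
        if marker >= self.t1:
            return True
        k = self.portion_len
        return sum(flags[:k]) >= self.t2 or sum(flags[-k:]) >= self.t2
```

For example, `DsrFrontEnd(sent.append)` with `sent = []` collects every transmitted multiframe in a list. Frames left in the buffer after the last complete multiframe are held until enough further frames arrive.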
Specification