System and method for computing and transmitting parameters in a distributed voice recognition system
2 Assignments
0 Petitions
Abstract
A system and method for extracting acoustic features and speech activity on a device and transmitting them in a distributed voice recognition system. The distributed voice recognition system includes a local voice recognition (VR) engine in a subscriber unit and a server VR engine on a server. The local VR engine comprises a feature extraction (FE) module that extracts features from a speech signal, and a voice activity detection (VAD) module that detects voice activity within a speech signal. The system includes filters, framing and windowing modules, power spectrum analyzers, a neural network, a nonlinear element, and other components to selectively provide an advanced front end vector, including predetermined portions of the voice activity detection indication and the extracted features, from the subscriber unit to the server. The system also includes a module that generates additional feature vectors on the server from the received features using a feed-forward multilayer perceptron (MLP) and provides them to the speech server.
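The abstract says the server derives additional feature vectors from the received features with a feed-forward MLP, without giving the network's shape. A minimal pure-Python sketch of such a forward pass follows, assuming tanh hidden units, a linear output layer, and illustrative layer sizes (39 inputs, 64 hidden units, 24 outputs). The weights are random placeholders where a deployed system would load trained ones, concatenating the MLP outputs with the received features is likewise an assumption, and the helper names `dense`, `mlp_forward`, and `random_layer` are hypothetical.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Layer = Tuple[List[List[float]], List[float]]  # (weights[out][in], bias[out])


def dense(x: Vector, weights: List[List[float]], bias: Vector) -> Vector:
    """Affine layer: y[j] = sum_i weights[j][i] * x[i] + bias[j]."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def mlp_forward(x: Vector, layers: List[Layer]) -> Vector:
    """Feed-forward pass: tanh after every hidden layer, linear output layer."""
    h = x
    for k, (weights, bias) in enumerate(layers):
        h = dense(h, weights, bias)
        if k < len(layers) - 1:  # hidden layers only
            h = [math.tanh(v) for v in h]
    return h


def random_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    """Placeholder weights (hypothetical); a real system would load trained parameters."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


rng = random.Random(0)
layers = [random_layer(39, 64, rng), random_layer(64, 24, rng)]  # assumed sizes
received = [0.01 * i for i in range(39)]  # features received from the subscriber unit
extra = mlp_forward(received, layers)     # MLP-derived additional features
server_vector = received + extra          # assumed: concatenate before recognition
print(len(server_vector))  # 63
```

The hidden nonlinearity and sizes are free design choices here; only the feed-forward structure (no recurrence, layer outputs feeding the next layer) reflects the abstract.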
95 Citations
109 Claims
1. In a voice recognition system comprising a front end and a back end, a feature extraction module comprising:
a processing sub-module; and
a feature extraction sub-module communicatively coupled to said processing sub-module;
wherein a digital signal provided from said processing sub-module is downsampled in a downsampling module.
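Claim 1 places a downsampling module between the processing sub-module and feature extraction but does not say how the downsampling is done. One conventional approach, sketched here as an assumption, is to low-pass filter the signal to prevent aliasing and then keep every M-th sample. The Hamming-windowed sinc design, the 31-tap length, the example log-energy feature, and all function names are illustrative choices, not details taken from the claim.

```python
import math
from typing import List


def lowpass_fir(num_taps: int, cutoff: float) -> List[float]:
    """Hamming-windowed sinc low-pass filter.

    `cutoff` is a fraction of the input sampling rate (0 < cutoff < 0.5).
    """
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        t = n - mid
        ideal = 2 * cutoff if t == 0 else math.sin(2 * math.pi * cutoff * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    gain = sum(taps)
    return [h / gain for h in taps]  # normalize to unity gain at DC


def downsample(signal: List[float], factor: int, num_taps: int = 31) -> List[float]:
    """Anti-alias filter (causal FIR convolution), then keep every `factor`-th sample."""
    taps = lowpass_fir(num_taps, 0.5 / factor)
    filtered = []
    for n in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k < 0:
                break
            acc += h * signal[n - k]
        filtered.append(acc)
    return filtered[::factor]


def log_energy(x: List[float]) -> float:
    """A simple example feature computed on the downsampled signal."""
    return math.log(sum(v * v for v in x) + 1e-10)


tone = [math.sin(2 * math.pi * 100 * n / 8000) for n in range(800)]  # 100 Hz at 8 kHz
low_rate = downsample(tone, 2)
print(len(low_rate))  # 400
print(log_energy(low_rate))
```

The cutoff is set to the new Nyquist frequency (half the output rate); a longer filter narrows the transition band at a higher computational cost.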
19. In a voice recognition system comprising a front end and a back end, a voice activity detection module comprising:
a processing sub-module; and
a voice activity detection sub-module communicatively coupled to said processing sub-module;
wherein a digital signal provided from said processing sub-module is downsampled in a downsampling module.
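Claim 19 applies voice activity detection to the downsampled signal without specifying the detector. A simple frame-energy detector illustrates the idea: a frame is flagged as speech when its energy exceeds an estimated noise floor by a margin. Using the quietest frame as the noise estimate, the 10 dB margin, and the 80-sample frame are assumptions made for this sketch; practical detectors usually track the noise level adaptively.

```python
import math
from typing import List


def frame_energies_db(signal: List[float], frame_len: int) -> List[float]:
    """Mean-square energy of consecutive non-overlapping frames, in dB."""
    energies = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        power = sum(x * x for x in frame) / frame_len
        energies.append(10 * math.log10(power + 1e-12))
    return energies


def energy_vad(signal: List[float], frame_len: int, margin_db: float = 10.0) -> List[bool]:
    """Flag frames whose energy exceeds the estimated noise floor by `margin_db`."""
    energies = frame_energies_db(signal, frame_len)
    noise_floor = min(energies)  # crude noise estimate: the quietest frame
    return [e > noise_floor + margin_db for e in energies]


quiet = [0.001] * 80
burst = [0.5 * math.sin(2 * math.pi * n / 16) for n in range(80)]
print(energy_vad(quiet + burst + quiet, frame_len=80))  # [False, True, False]
```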
35. A voice recognition system comprising a front end and a back end, comprising:
a processing sub-module;
a feature extraction sub-module communicatively coupled to said processing sub-module, wherein a digital signal provided from said processing sub-module is downsampled in a first downsampling module; and
a voice activity detection sub-module communicatively coupled to said processing sub-module, wherein the digital signal provided from said processing sub-module is downsampled in a second downsampling module.
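Claim 35 feeds the same processed signal to two separate downsampling modules, one for feature extraction and one for voice activity detection. A sketch of that data flow follows, assuming the branches use different factors (2 and 4 on a 16 kHz input here) and that each branch frames its signal with the same 25 ms window and 10 ms shift, so the feature frames and VAD frames stay aligned in time. The block-average decimator is a deliberately crude stand-in for a proper anti-aliasing filter, and the function names are hypothetical.

```python
from typing import Dict, List


def decimate(signal: List[float], factor: int) -> List[float]:
    """Average each block of `factor` samples: a boxcar anti-alias filter plus sample dropping."""
    return [sum(signal[i:i + factor]) / factor
            for i in range(0, len(signal) - factor + 1, factor)]


def frame(signal: List[float], frame_len: int, shift: int) -> List[List[float]]:
    return [signal[s:s + frame_len] for s in range(0, len(signal) - frame_len + 1, shift)]


def dual_branch(processed: List[float], fs: int, fe_factor: int, vad_factor: int,
                frame_ms: float = 25.0, shift_ms: float = 10.0) -> Dict[str, List[List[float]]]:
    """Send one processed signal through two downsamplers and frame each branch.

    Frame length and shift are fixed in milliseconds, so both branches produce
    frames on the same time grid even though their sample rates differ.
    """
    branches = {}
    for name, factor in (("fe", fe_factor), ("vad", vad_factor)):
        rate = fs // factor
        low = decimate(processed, factor)
        branches[name] = frame(low, round(rate * frame_ms / 1000), round(rate * shift_ms / 1000))
    return branches


out = dual_branch([0.0] * 16000, fs=16000, fe_factor=2, vad_factor=4)  # assumed rates
print(len(out["fe"]), len(out["vad"]))        # 98 98
print(len(out["fe"][0]), len(out["vad"][0]))  # 200 100
```

Both branches yield the same number of frames for one second of input, which is what lets a later stage pair each feature frame with its VAD decision.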
59. A voice recognition system comprising a front end and a back end, comprising:
a framing module;
a windowing module communicatively coupled to said framing module;
a first transformation module communicatively coupled to said windowing module;
a power spectrum module communicatively coupled to said first transformation module;
a first filtering module communicatively coupled to said power spectrum module;
a second transformation module communicatively coupled to said first filtering module;
a second filter module communicatively coupled to said second transformation module;
a third filter module communicatively coupled to said second filter module;
a first downsampling module communicatively coupled to said second filter module;
a third transformation module communicatively coupled to said first downsampling module;
a normalization module communicatively coupled to said third transformation module;
a compressor module communicatively coupled to said normalization module;
a bitstream processor communicatively coupled to said compressor module;
a second downsampling module communicatively coupled to said second filter module;
a fourth transformation module communicatively coupled to said second downsampling module;
an estimation module communicatively coupled to said fourth transformation module;
a threshold detector communicatively coupled to said estimation module;
a fourth filter module communicatively coupled to said threshold detector.
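Claim 59 lists the front-end modules in order but does not define them. Read as a conventional cepstral front end, which is an assumption, the first transformation, power spectrum, first filtering, and second transformation modules correspond to a DFT, a squared-magnitude spectrum, a mel-spaced triangular filter bank, and a logarithm, and the third transformation to a DCT. The sketch implements only that chain. It omits the second and third filter modules, the downsampling, normalization, compression, and bitstream stages, and the whole VAD branch, and it applies the DCT directly to the log filter-bank energies even though the claim places the third transformation after the first downsampling module. The 8 kHz rate, 25 ms/10 ms framing, 256-point DFT, 23 filters, and 13 coefficients are illustrative values.

```python
import cmath
import math
from typing import List


def frames(x: List[float], frame_len: int, shift: int) -> List[List[float]]:
    return [x[s:s + frame_len] for s in range(0, len(x) - frame_len + 1, shift)]


def hamming(frame: List[float]) -> List[float]:
    n = len(frame)
    return [v * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1))) for i, v in enumerate(frame)]


def power_spectrum(frame: List[float], nfft: int) -> List[float]:
    """|DFT|^2 for bins 0..nfft/2 (direct O(N^2) DFT, chosen for clarity, not speed)."""
    x = frame + [0.0] * (nfft - len(frame))  # zero-pad to nfft
    out = []
    for k in range(nfft // 2 + 1):
        s = sum(x[n] * cmath.exp(-2j * math.pi * k * n / nfft) for n in range(nfft))
        out.append(abs(s) ** 2)
    return out


def mel(f: float) -> float:
    return 2595 * math.log10(1 + f / 700)


def mel_filterbank(num_filters: int, nfft: int, fs: float) -> List[List[float]]:
    """Triangular filters equally spaced on the mel scale from 0 Hz to fs/2."""
    top = mel(fs / 2)
    # num_filters + 2 mel-spaced points (Hz): lower edge, centres, upper edge.
    points_hz = [700 * (10 ** (top * i / (num_filters + 1) / 2595) - 1)
                 for i in range(num_filters + 2)]
    bins = [f * nfft / fs for f in points_hz]  # fractional DFT bin positions
    bank = []
    for m in range(1, num_filters + 1):
        lo, mid, hi = bins[m - 1], bins[m], bins[m + 1]
        row = []
        for k in range(nfft // 2 + 1):
            if lo <= k <= mid:
                row.append((k - lo) / (mid - lo))
            elif mid < k <= hi:
                row.append((hi - k) / (hi - mid))
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def dct2(v: List[float], num_coeffs: int) -> List[float]:
    """Unnormalized DCT-II, keeping the first num_coeffs coefficients."""
    n = len(v)
    return [sum(v[i] * math.cos(math.pi * c * (i + 0.5) / n) for i in range(n))
            for c in range(num_coeffs)]


def cepstra(x: List[float], fs: float = 8000, num_filters: int = 23,
            num_ceps: int = 13, nfft: int = 256) -> List[List[float]]:
    bank = mel_filterbank(num_filters, nfft, fs)
    out = []
    for fr in frames(x, round(0.025 * fs), round(0.010 * fs)):   # framing
        ps = power_spectrum(hamming(fr), nfft)                   # windowing, DFT, power
        energies = [sum(w * p for w, p in zip(row, ps)) for row in bank]  # mel filtering
        log_e = [math.log(e + 1e-10) for e in energies]           # log compression
        out.append(dct2(log_e, num_ceps))                         # decorrelating transform
    return out


tone = [math.sin(2 * math.pi * 440 * n / 8000) for n in range(800)]
ceps = cepstra(tone)
print(len(ceps), len(ceps[0]))  # 8 13
```

A production front end would use an FFT instead of the direct DFT; the direct form is kept here only so the block needs nothing beyond the standard library.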
60. A method for extracting at least one feature from a speech signal, comprising:
processing a speech signal;
downsampling said processed speech signal to provide a downsampled signal; and
extracting the at least one feature from said downsampled signal.
72. A method for voice activity detection, comprising:
processing a speech signal;
downsampling said processed speech signal to provide a downsampled signal; and
detecting voice activity of said downsampled signal.
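Frame-by-frame detection decisions are usually noisy. Claim 59 couples a fourth filter module to the output of the threshold detector, which suggests some smoothing of the decisions, but neither that claim nor claim 72 says what the smoothing is. A common technique, shown here as an assumed example rather than the claimed method, is a hangover rule: require a few consecutive speech frames before declaring speech, then hold the speech decision for a few frames after the raw detector drops out. The function name and parameter values are illustrative.

```python
from typing import List


def apply_hangover(raw: List[bool], min_speech: int = 3, hangover: int = 5) -> List[bool]:
    """Smooth frame-level VAD flags with an onset requirement and a hangover period.

    Speech is declared only after `min_speech` consecutive raw speech frames (so the
    first frames of a segment are reported late), and the decision is held for
    `hangover` frames after the raw detector stops reporting speech.
    """
    smoothed = []
    run = 0   # consecutive raw speech frames
    hold = 0  # hangover frames still to be emitted
    for flag in raw:
        run = run + 1 if flag else 0
        if run >= min_speech:
            hold = hangover
            smoothed.append(True)
        elif hold > 0:
            hold -= 1
            smoothed.append(True)
        else:
            smoothed.append(False)
    return smoothed


raw = [False, True, False, False, True, True, True, True, False, False, False]
print(apply_hangover(raw, min_speech=3, hangover=2))
# [False, False, False, False, False, False, True, True, True, True, False]
```

The isolated speech frame at index 1 is suppressed, and the segment is extended by two frames at its end, which helps avoid clipping weak word endings.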
82. A method for determining speech signal characteristics, comprising:
processing a speech signal;
downsampling said processed speech signal by a first value to provide a first downsampled signal;
extracting at least one feature from said first downsampled signal;
downsampling said processed speech signal by a second value to provide a second downsampled signal; and
detecting voice activity from said second downsampled signal.
101. A system for processing speech, comprising:
a terminal feature extraction submodule for extracting at least one feature from the speech; and
a terminal compression module for distinguishing the presence of voice activity from silence in the speech to determine voice activity data, compressing the at least one feature, and selectively combining and transmitting the at least one feature with selected voice activity data.
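Claim 101 compresses the extracted features before transmission but leaves the compression scheme open. Uniform scalar quantization, sketched below, is about the simplest possible stand-in and is not presented as the patent's method; standardized distributed speech recognition front ends such as the ETSI ones use split vector quantization instead. The clipping range, the 8-bit code size, and the example feature values are assumptions.

```python
from typing import List


def quantize(features: List[float], lo: float, hi: float, bits: int = 8) -> List[int]:
    """Map each feature to an integer code in [0, 2**bits - 1] on a uniform grid over [lo, hi]."""
    step = (hi - lo) / ((1 << bits) - 1)
    return [round((min(max(f, lo), hi) - lo) / step) for f in features]  # clip, then round


def dequantize(codes: List[int], lo: float, hi: float, bits: int = 8) -> List[float]:
    """Reconstruct feature values at the grid points (error at most step/2 inside [lo, hi])."""
    step = (hi - lo) / ((1 << bits) - 1)
    return [lo + c * step for c in codes]


feats = [-12.0, -3.3, 0.0, 4.75, 9.9]       # example feature values
codes = quantize(feats, lo=-10.0, hi=10.0)  # assumed dynamic range
print(codes)
print(dequantize(codes, lo=-10.0, hi=10.0))
```

With 8-bit codes each feature costs one byte instead of a 4- or 8-byte float, at the price of a bounded reconstruction error and clipping of out-of-range values (here, -12.0 is clipped to -10.0).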
105. A distributed voice recognition system for transmitting speech activity, comprising:
a subscriber unit, comprising:
a processing/feature extraction element receiving speech activity and converting the speech activity into features;
a voice activity detector for detecting voice activity within said speech and providing at least one voice activity indication; and
a processor for selectively combining the features with the at least one voice activity indication into advanced front end features; and
a transmitter for transmitting the advanced front end features to a remote device.
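Claim 105 has a processor combine the features with at least one voice activity indication into advanced front end features before transmission. The byte layout below (a one-byte VAD flag, a one-byte feature count, then 8-bit feature codes) and the policy of transmitting only speech frames are hypothetical; they show one way such a combination could be serialized with Python's standard `struct` module. Codes must lie in 0..255 for the `B` format, otherwise `struct.pack` raises `struct.error`.

```python
import struct
from typing import List, Tuple


def pack_afe_frame(codes: List[int], vad_flag: bool) -> bytes:
    """Serialize one frame: 1-byte VAD flag, 1-byte feature count, then 8-bit feature codes."""
    return struct.pack(f"<BB{len(codes)}B", int(vad_flag), len(codes), *codes)


def unpack_afe_frame(payload: bytes) -> Tuple[bool, List[int]]:
    """Inverse of pack_afe_frame, as a receiving server might apply it."""
    flag, n = struct.unpack_from("<BB", payload, 0)
    codes = list(struct.unpack_from(f"<{n}B", payload, 2))
    return bool(flag), codes


def select_frames(frames: List[List[int]], flags: List[bool]) -> List[bytes]:
    """Hypothetical selection policy: pack and send only frames flagged as speech."""
    return [pack_afe_frame(c, f) for c, f in zip(frames, flags) if f]


packet = pack_afe_frame([1, 2, 255], True)
print(packet)                    # b'\x01\x03\x01\x02\xff'
print(unpack_afe_frame(packet))  # (True, [1, 2, 255])
```

Dropping silence frames saves bandwidth, but a real design would also need frame indices or timestamps so the server can reconstruct timing; that is omitted here.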
107. A subscriber unit, comprising:
means for extracting a plurality of features of a speech signal;
means for detecting voice activity within the speech signal and providing an indication of the detected voice activity; and
a transmitter coupled to the feature extraction means and the voice activity detection means and configured to selectively transmit the indication of detected voice activity in selective combination with the plurality of features to a remote device.
109. A system for generating feature vectors, comprising:
a time derivative computation block for computing feature time derivatives;
a feature concatenation block for combining feature time derivatives with features;
a dual branch processor receiving data from said feature concatenation block, comprising:
a first branch comprising a multiple frame assembly module; and
a second branch comprising a nonlinear transformation module and a dimensionality reduction and decorrelation module; and
a processing concatenation block for concatenating data computed by said first branch and said second branch.
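Claim 109 names the blocks of a feature-vector generator but not their internals. The sketch fills them in with common choices, all of which are assumptions: regression deltas over +/-2 frames for the time derivative block, +/-1-frame stacking for the multiple frame assembly module, `tanh` for the nonlinear transformation, and a truncated DCT-II as a stand-in for the dimensionality reduction and decorrelation module (a trained projection such as PCA would be the more usual choice). Function names are hypothetical.

```python
import math
from typing import List

Frames = List[List[float]]


def deltas(feats: Frames, width: int = 2) -> Frames:
    """Time derivative block: regression over +/- width frames, edges clamped."""
    T = len(feats)
    denom = 2 * sum(k * k for k in range(1, width + 1))
    out = []
    for t in range(T):
        d = [0.0] * len(feats[t])
        for k in range(1, width + 1):
            nxt, prv = feats[min(t + k, T - 1)], feats[max(t - k, 0)]
            for i in range(len(d)):
                d[i] += k * (nxt[i] - prv[i])
        out.append([v / denom for v in d])
    return out


def stack_context(feats: Frames, left: int = 1, right: int = 1) -> Frames:
    """Branch 1, multiple frame assembly: concatenate each frame with its neighbours."""
    T = len(feats)
    return [sum((feats[min(max(t + o, 0), T - 1)] for o in range(-left, right + 1)), [])
            for t in range(T)]


def nonlinear_reduce(feats: Frames, out_dim: int) -> Frames:
    """Branch 2: tanh, then a truncated DCT-II as a stand-in decorrelating projection."""
    result = []
    for f in feats:
        g = [math.tanh(v) for v in f]
        n = len(g)
        result.append([sum(g[i] * math.cos(math.pi * c * (i + 0.5) / n) for i in range(n))
                       for c in range(out_dim)])
    return result


def feature_vectors(static: Frames, out_dim: int = 4) -> Frames:
    combined = [s + d for s, d in zip(static, deltas(static))]  # feature concatenation block
    branch1 = stack_context(combined)
    branch2 = nonlinear_reduce(combined, out_dim)
    return [a + b for a, b in zip(branch1, branch2)]  # processing concatenation block


static = [[float(t), 0.0, 1.0] for t in range(5)]  # 5 frames of 3 toy features
vectors = feature_vectors(static)
print(len(vectors), len(vectors[0]))  # 5 22
```

Each output vector here has 3 x 6 = 18 values from the stacked branch plus 4 from the reduced branch; the two branches trade temporal context against a compact, decorrelated summary.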
Specification