System and method for merging audio data streams for use in speech recognition applications
First Claim
1. A method for merging at least a first and second audio data stream for use in a speech recognition application, the method comprising:
transforming the first audio data stream from the time domain to the frequency domain;
transforming the second audio data stream from the time domain to the frequency domain;
determining a first feature data set for the first transformed audio stream for a first range of frequencies;
determining a second feature data set for the second transformed audio stream for a second range of frequencies; and
combining predetermined feature data from the first and second feature data sets to form a merged feature data set;
wherein the first, second and merged feature data sets each have a zeroeth cepstral coefficient and an equal number (N) of additional cepstral coefficients, and the merged feature data set is formed by selecting only a predetermined ratio of only lowest numbered additional cepstral coefficients from the first and second feature data sets; and
wherein the additional cepstral coefficients of the merged feature data set include, from lowest to highest, first all of the selected additional cepstral coefficients from the first feature data set and then all of the selected additional cepstral coefficients from the second feature data set.
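The selection-and-concatenation step of the claim can be sketched in a few lines: each feature data set is a vector of a zeroth cepstral coefficient followed by N additional coefficients, a predetermined ratio picks only the lowest-numbered additional coefficients from each set, and the merged set lists the first stream's selections before the second's. The function name, the example ratio, and the choice of taking the zeroth coefficient from the first set are illustrative assumptions, not details fixed by the patent text.

```python
def merge_feature_sets(first, second, ratio=0.5):
    """Merge two cepstral feature vectors [c0, c1, ..., cN] per the
    claimed scheme: take the k lowest-numbered additional coefficients
    from `first` and the (N - k) lowest-numbered from `second`, where
    k is set by a predetermined ratio. The zeroth coefficient is taken
    from `first` here as an assumption; the claim only requires that
    the merged set have one."""
    assert len(first) == len(second), "claim requires equal N"
    n = len(first) - 1                 # N additional coefficients
    k = int(n * ratio)                 # how many come from the first set
    merged = [first[0]]                # zeroth cepstral coefficient
    merged += first[1:1 + k]           # lowest-numbered, stream 1 first
    merged += second[1:1 + (n - k)]    # then lowest-numbered, stream 2
    return merged                      # same length as the inputs: 1 + N
```

With `ratio=0.5` and N=4, the merged set keeps coefficients 1-2 from the first stream and 1-2 from the second, preserving the overall vector length expected by a downstream decoder.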
Abstract
A system and method for merging audio data streams receive audio data streams from separate inputs, independently transform each data stream from the time to the frequency domain, and generate separate feature data sets for the transformed data streams. Feature data from each of the separate feature data sets is selected to form a merged feature data set that is output to a decoder for recognition purposes. The separate inputs can include an ear microphone and a mouth microphone.
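The per-stream front end described in the abstract, transforming a time-domain frame to the frequency domain and deriving a cepstral feature set from it, can be sketched as below. This is a minimal illustrative recipe (DFT, log-magnitude, then a DCT-II); the patent does not fix these exact operations, and windowing and mel filtering common in speech front ends are omitted for brevity.

```python
import cmath
import math

def dft(frame):
    # Discrete Fourier transform: time domain -> frequency domain.
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                for t in range(n))
            for k in range(n)]

def cepstral_features(frame, n_coeffs):
    # Illustrative feature extraction: take the log magnitude of the
    # non-redundant half of the spectrum, then a DCT-II, yielding a
    # zeroth coefficient c0 plus n_coeffs additional coefficients.
    spectrum = dft(frame)
    half = len(spectrum) // 2 + 1
    logmag = [math.log(abs(x) + 1e-12) for x in spectrum[:half]]
    return [sum(logmag[m] * math.cos(math.pi * c * (m + 0.5) / half)
                for m in range(half))
            for c in range(n_coeffs + 1)]
```

Running this front end independently on frames from each input (e.g. an ear microphone and a mouth microphone) produces the separate feature data sets that the merging step then combines.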
13 Claims
Independent claim 1 is set out above under First Claim. View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13)
Specification