System and method for merging audio data streams for use in speech recognition applications
First Claim
1. A method for merging at least a first and second audio data stream for use in a speech recognition application, the method comprising:
transforming the first audio data stream from the time domain to the frequency domain;
transforming the second audio data stream from the time domain to the frequency domain;
determining a first feature data set for the first transformed audio stream for a first range of frequencies;
determining a second feature data set for the second transformed audio stream for a second range of frequencies; and
combining predetermined feature data from the first and second feature data sets to form a merged feature data set;
wherein the first, second and merged feature data sets each have a zeroeth cepstral coefficient and an equal number (N) of additional cepstral coefficients, and the merged feature data set is formed by selecting only a predetermined ratio of only lowest numbered additional cepstral coefficients from the first and second feature data sets; and
wherein the additional cepstral coefficients of the merged feature data set include, from lowest to highest, first all of the selected additional cepstral coefficients from the first feature data set and then all of the selected additional cepstral coefficients from the second feature data set.
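The selection-and-concatenation step of the claim can be sketched in a few lines: each feature data set is a vector of a zeroth cepstral coefficient followed by N additional coefficients, a predetermined ratio picks only the lowest-numbered additional coefficients from each set, and the merged set lists the first stream's selections before the second's. The function name, the example ratio, and the choice of taking the zeroth coefficient from the first set are illustrative assumptions, not details fixed by the patent text.

```python
def merge_feature_sets(first, second, ratio=0.5):
    """Merge two cepstral feature vectors [c0, c1, ..., cN] per the
    claimed scheme: take the k lowest-numbered additional coefficients
    from `first` and the (N - k) lowest-numbered from `second`, where
    k is set by a predetermined ratio. The zeroth coefficient is taken
    from `first` here as an assumption; the claim only requires that
    the merged set have one."""
    assert len(first) == len(second), "claim requires equal N"
    n = len(first) - 1                 # N additional coefficients
    k = int(n * ratio)                 # how many come from the first set
    merged = [first[0]]                # zeroth cepstral coefficient
    merged += first[1:1 + k]           # lowest-numbered, stream 1 first
    merged += second[1:1 + (n - k)]    # then lowest-numbered, stream 2
    return merged                      # same length as the inputs: 1 + N
```

With `ratio=0.5` and N=4, the merged set keeps coefficients 1-2 from the first stream and 1-2 from the second, preserving the overall vector length expected by a downstream decoder.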
Abstract
A system and method for merging audio data streams receive audio data streams from separate inputs, independently transform each data stream from the time to the frequency domain, and generate separate feature data sets for the transformed data streams. Feature data from each of the separate feature data sets is selected to form a merged feature data set that is output to a decoder for recognition purposes. The separate inputs can include an ear microphone and a mouth microphone.
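The per-stream front end described in the abstract, transforming a time-domain frame to the frequency domain and deriving a cepstral feature set from it, can be sketched as below. This is a minimal illustrative recipe (DFT, log-magnitude, then a DCT-II); the patent does not fix these exact operations, and windowing and mel filtering common in speech front ends are omitted for brevity.

```python
import cmath
import math

def dft(frame):
    # Discrete Fourier transform: time domain -> frequency domain.
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                for t in range(n))
            for k in range(n)]

def cepstral_features(frame, n_coeffs):
    # Illustrative feature extraction: take the log magnitude of the
    # non-redundant half of the spectrum, then a DCT-II, yielding a
    # zeroth coefficient c0 plus n_coeffs additional coefficients.
    spectrum = dft(frame)
    half = len(spectrum) // 2 + 1
    logmag = [math.log(abs(x) + 1e-12) for x in spectrum[:half]]
    return [sum(logmag[m] * math.cos(math.pi * c * (m + 0.5) / half)
                for m in range(half))
            for c in range(n_coeffs + 1)]
```

Running this front end independently on frames from each input (e.g. an ear microphone and a mouth microphone) produces the separate feature data sets that the merging step then combines.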
13 Claims
Independent claim 1 is set out above under First Claim. View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13)
Specification