Microphone array based speech recognition system and target speech extracting method of the system

US 8,249,867 B2
Filed: 09/30/2008
Issued: 08/21/2012
Est. Priority Date: 12/11/2007
Status: Expired due to Fees

First Claim


1. A microphone-array-based speech recognition system comprising:

a signal separator configured to separate mixed signals input through a plurality of microphones into sound-source signals by an ICA algorithm;

a target speech extractor configured to extract one target speech spoken for speech recognition from the sound-source signals separated by the signal separator; and

a speech recognition unit configured to recognize a desired speech from the extracted target speech, wherein the target speech extractor is configured to extract feature vector sequences from the separated sound-source signals, calculate logarithm likelihood ratios (LLRs) of the extracted feature vector sequences, calculate a maximum value by using the calculated LLRs, compare the maximum value with a predetermined threshold value, and determine the maximum value to be the target speech if the maximum value is larger than the threshold value.


Abstract

A microphone-array-based speech recognition system using blind source separation (BSS) and a target speech extraction method in the system are provided. The speech recognition system performs independent component analysis (ICA) to separate mixed signals input through a plurality of microphones into sound-source signals, extracts one target speech spoken for speech recognition from the separated sound-source signals by using a Gaussian mixture model (GMM) or a hidden Markov model (HMM), and automatically recognizes a desired speech from the extracted target speech. Accordingly, it is possible to obtain a high speech recognition rate even in a noisy environment.
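The separation step described in the abstract can be illustrated with a toy two-microphone example. This is only a sketch of the general ICA idea (whiten the mixtures, then find the rotation that makes the outputs maximally non-Gaussian); this record does not specify which ICA algorithm the patent uses, so the kurtosis-based rotation search, the function names, the synthetic sources, and the mixing weights below are all assumptions chosen for illustration.

```python
import math
import random


def whiten(x1, x2):
    """Center two channels, decorrelate them, and scale each to unit variance."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    a = [t - m1 for t in x1]
    b = [t - m2 for t in x2]
    caa = sum(t * t for t in a) / n
    cbb = sum(t * t for t in b) / n
    cab = sum(p * q for p, q in zip(a, b)) / n
    # Rotation angle that diagonalizes the symmetric 2x2 covariance matrix.
    phi = 0.5 * math.atan2(2.0 * cab, caa - cbb)
    c, s = math.cos(phi), math.sin(phi)
    u = [c * p + s * q for p, q in zip(a, b)]
    v = [-s * p + c * q for p, q in zip(a, b)]
    su = math.sqrt(sum(t * t for t in u) / n)
    sv = math.sqrt(sum(t * t for t in v) / n)
    return [t / su for t in u], [t / sv for t in v]


def excess_kurtosis(z):
    """Excess kurtosis of a zero-mean, unit-variance sequence."""
    return sum(t ** 4 for t in z) / len(z) - 3.0


def separate_two_channels(x1, x2, steps=90):
    """Toy ICA: search the rotation of the whitened data that maximizes non-Gaussianity."""
    u, v = whiten(x1, x2)
    best_score, best_pair = -1.0, (u, v)
    for k in range(steps):
        theta = (math.pi / 2.0) * k / steps
        c, s = math.cos(theta), math.sin(theta)
        y1 = [c * p + s * q for p, q in zip(u, v)]
        y2 = [-s * p + c * q for p, q in zip(u, v)]
        score = abs(excess_kurtosis(y1)) + abs(excess_kurtosis(y2))
        if score > best_score:
            best_score, best_pair = score, (y1, y2)
    return best_pair


def correlation(a, b):
    """Pearson correlation, used here only to check the recovered signals."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / math.sqrt(va * vb)


# Hypothetical sources and mixing weights standing in for two microphones.
random.seed(0)
n = 1000
s1 = [math.sin(0.05 * t) for t in range(n)]
s2 = [random.uniform(-1.0, 1.0) for _ in range(n)]
x1 = [0.8 * a + 0.3 * b for a, b in zip(s1, s2)]
x2 = [0.4 * a + 0.7 * b for a, b in zip(s1, s2)]
y1, y2 = separate_two_channels(x1, x2)
```

As with any ICA method, the recovered signals come back in arbitrary order and with arbitrary sign and scale, which is why a later stage has to decide which separated signal is the target speech.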

26 Citations


16 Claims

1. A microphone-array-based speech recognition system comprising:
- a signal separator configured to separate mixed signals input through a plurality of microphones into sound-source signals by an ICA algorithm;
  
  a target speech extractor configured to extract one target speech spoken for speech recognition from the sound-source signals separated by the signal separator; and
  
  a speech recognition unit configured to recognize a desired speech from the extracted target speech, wherein the target speech extractor is configured to extract feature vector sequences from the separated sound-source signals, calculate logarithm likelihood ratios (LLRs) of the extracted feature vector sequences, calculate a maximum value by using the calculated LLRs, compare the maximum value with a predetermined threshold value, and determine the maximum value to be the target speech if the maximum value is larger than the threshold value.
- View Dependent Claims (2, 3, 4, 5, 6)
  - 2. The microphone-array-based speech recognition system of claim 1, further comprising an additional information provider configured to transmit additional information used for extraction of the target speech to the target speech extractor.
  - 3. The microphone-array-based speech recognition system of claim 2, wherein, in a case where the additional information is provided, the target speech extractor is configured to perform a hypothesis test for the separated sound-source signals by using a Gaussian mixture model (GMM), and to determine a sound-source signal having the highest reliability as the target speech.
  - 4. The microphone-array-based speech recognition system of claim 3, wherein the additional information is gender information, speech-music information, speech-noise information, or speaker's identification information.
  - 5. The microphone-array-based speech recognition system of claim 1, wherein the target speech extractor is configured to determine that the target speech does not exist in the separated sound-source signals if the maximum value is smaller than the threshold value.
  - 6. The microphone-array-based speech recognition system of claim 1, wherein, in a case where the additional information for the target speech is not provided, the target speech extractor is configured to calculate an LLR-based reliability by using a hidden Markov model (HMM) as a speech-recognition acoustic model.
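The decision rule shared by claims 1 and 5 (take the largest LLR, accept it only if it exceeds a predetermined threshold, otherwise report that no target speech is present) can be sketched as follows. The function name `select_target` and the example values are hypothetical. The claims do not say what happens when the maximum exactly equals the threshold; this sketch assumes that case counts as "no target".

```python
from typing import Optional, Sequence


def select_target(llrs: Sequence[float], threshold: float) -> Optional[int]:
    """Return the index of the separated source with the largest LLR,
    or None when no source exceeds the threshold (assumed tie handling)."""
    if not llrs:
        return None
    best = max(range(len(llrs)), key=lambda i: llrs[i])
    return best if llrs[best] > threshold else None


# One LLR per separated sound-source signal (illustrative numbers only).
chosen = select_target([-1.2, 3.4, 0.5], threshold=1.0)
rejected = select_target([-1.2, 0.4], threshold=1.0)
```

Here `chosen` identifies the second separated signal as the target speech, while `rejected` is `None` because no LLR exceeds the threshold.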

7. A target speech extraction method for a microphone-array-based speech recognition system, comprising:
- separating mixed signals input through a plurality of microphones into sound-source signals by an ICA;
  
  extracting one target speech spoken for speech recognition from the separated sound-source signals; and
  
  recognizing a desired speech from the extracted target speech, wherein the extracting of the target speech comprises:
  
  extracting feature vector sequence Xⁱ from the separated sound-source signals;
  
  calculating an ith LLR (logarithm likelihood ratio) LLR_i of the extracted feature vector sequence;
  
  calculating a maximum value using the LLR_i;
  
  comparing the maximum value with a predetermined threshold value; and
  
  determining the maximum value to be the target speech when the maximum value is larger than the threshold value.
- View Dependent Claims (8, 9, 10, 11, 12, 13, 14, 15, 16)
  - 8. The target speech extraction method of claim 7, further comprising determining that the target speech does not exist in the separated sound-source signals when the maximum value is smaller than the threshold value.
  - 9. The target speech extraction method of claim 7, wherein, when additional information representing that the target speech is a female speech is provided, the LLR_i is calculated as expressed by
  - 10. The target speech extraction method of claim 7, wherein, when additional information representing that the target speech is a male speech is provided, the LLR_i is calculated as expressed by
  - 11. The target speech extraction method of claim 7, wherein, when additional information representing speech-music information is provided, the LLR_i is calculated as expressed by
  - 12. The target speech extraction method of claim 7, wherein, when additional information representing speech-noise information is provided, the LLR_i is calculated as expressed by
  - 13. The target speech extraction method of claim 7, wherein, when additional information representing that the target speech is a speech of a specific speaker is provided, the LLR_i is calculated as expressed by
  - 14. The target speech extraction method of claim 7, wherein, when additional information representing that the target speech is a speech of a specific property A (Property_A) is provided, the LLR_i is calculated as expressed by
  - 15. The target speech extraction method of claim 7, wherein the extracting of the target speech comprises:
    - in a case where the additional information for the target speech is not provided, performing primary speech recognition for the separated sound-source signals by using an HMM (hidden Markov model) as a speech-recognition acoustic model;
      
      calculating a closest HMM and a state column thereof for a sequence of the words obtained through the speech recognition;
      
      calculating LLR_i by using the HMMs;
      
      calculating a maximum value by using the calculated LLR_i;
      
      comparing the maximum value with a predetermined threshold value; and
      
      determining the maximum value to be the target speech if the maximum value is larger than the threshold value.
  - 16. The target speech extraction method of claim 15, wherein the LLR_i is calculated by using the following equation:
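The equations referenced in claims 9 to 14 and claim 16 are not reproduced in this record, so the following is not the patent's formula. It is a generic sketch of how a log-likelihood ratio between a target model and a competing model (for example, a female-speech GMM against a male-speech GMM) is commonly computed. The scalar features, the per-frame normalization, the function names, and the model parameters are all assumptions made for illustration.

```python
import math


def gmm_log_likelihood(frames, gmm):
    """Sum over frames of log(sum_k w_k * N(x | mean_k, var_k)).

    `gmm` is a list of (weight, mean, variance) tuples. Scalar features are
    used for brevity; real feature vectors would be multi-dimensional."""
    total = 0.0
    for x in frames:
        terms = [
            math.log(w) - 0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)
            for w, mean, var in gmm
        ]
        peak = max(terms)  # log-sum-exp for numerical stability
        total += peak + math.log(sum(math.exp(t - peak) for t in terms))
    return total


def llr(frames, target_gmm, competing_gmm):
    """Per-frame average log-likelihood ratio (normalization is an assumption)."""
    diff = gmm_log_likelihood(frames, target_gmm) - gmm_log_likelihood(frames, competing_gmm)
    return diff / len(frames)


# Hypothetical single-component models and a short feature sequence.
target_model = [(1.0, 0.0, 1.0)]
competing_model = [(1.0, 3.0, 1.0)]
score = llr([0.1, -0.2, 0.3], target_model, competing_model)
```

A positive score means the frames fit the target model better than the competing one. One such score per separated signal would then feed the maximum-and-threshold comparison recited in claim 7.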


Current Assignee: Electronics and Telecommunications Research Institute
Original Assignee: Electronics and Telecommunications Research Institute
Inventors: Lee, Yun Keun; Kang, Jeom Ja; Kang, Byung Ok; Kim, Kap Kee; Lee, Sung Joo; Jung, Ho Young; Chung, Hoon; Park, Jeon Gue; Jeon, Hyung Bae; Cho, Hoon Young
Primary Examiner(s): Chawan, Vijay B
Application Number: US12/242,819
Publication Number: US 20090150146A1
Time in Patent Office: 1,421 Days
Field of Search: 704/233, 704/226-228, 704/208, 704/205, 704/240, 704/256, 704/222, 704/220, 704/275, 704/231, 381/92, 381/94.1, 381/94.2, 381/94.3, 381/66, 381/60, 381/122, 381/314, 381/88.01
US Class Current: 704/233
CPC Class Codes:
G10L 15/20   Speech recognition techniqu...
G10L 2021/02166   Microphone arrays; Beamforming
G10L 21/0272   Voice signal separating
