Automatic Speech Recognition System

US 20090018828A1
Filed: 11/12/2004
Published: 01/15/2009
Est. Priority Date: 11/12/2003
Status: Abandoned Application

First Claim

1. An automatic speech recognition system, which recognizes speeches in acoustic signals detected by a plurality of microphones as character information, the system comprising:

a sound source localization module which localizes a sound direction corresponding to a specified speaker based on the acoustic signals detected by the plurality of microphones;

a feature extractor which extracts features of speech signals contained in one or more pieces of information detected by the plurality of microphones;

an acoustic model memory which stores direction-dependent acoustic models that are adjusted to a plurality of directions at intervals;

an acoustic model composition module which composes an acoustic model adjusted to the sound direction, which is localized by the sound source localization module, based on the direction-dependent acoustic models in the acoustic model memory, the acoustic model composition module storing the acoustic model in the acoustic model memory; and

a speech recognition module which recognizes the features extracted by the feature extractor as character information using the acoustic model composed by the acoustic model composition module.

1 Assignment

0 Petitions

Abstract

An automatic speech recognition system includes: a sound source localization module for localizing a sound direction of a speaker based on the acoustic signals detected by the plurality of microphones; a sound source separation module for separating a speech signal of the speaker from the acoustic signals according to the sound direction; an acoustic model memory which stores direction-dependent acoustic models that are adjusted to a plurality of directions at intervals; an acoustic model composition module which composes an acoustic model adjusted to the sound direction, which is localized by the sound source localization module, based on the direction-dependent acoustic models, the acoustic model composition module storing the acoustic model in the acoustic model memory; and a speech recognition module which recognizes the features extracted by a feature extractor as character information using the acoustic model composed by the acoustic model composition module.

479 Citations

12 Claims

1. An automatic speech recognition system, which recognizes speeches in acoustic signals detected by a plurality of microphones as character information, the system comprising:
- a sound source localization module which localizes a sound direction corresponding to a specified speaker based on the acoustic signals detected by the plurality of microphones;
  
  a feature extractor which extracts features of speech signals contained in one or more pieces of information detected by the plurality of microphones;
  
  an acoustic model memory which stores direction-dependent acoustic models that are adjusted to a plurality of directions at intervals;
  
  an acoustic model composition module which composes an acoustic model adjusted to the sound direction, which is localized by the sound source localization module, based on the direction-dependent acoustic models in the acoustic model memory, the acoustic model composition module storing the acoustic model in the acoustic model memory; and
  
  a speech recognition module which recognizes the features extracted by the feature extractor as character information using the acoustic model composed by the acoustic model composition module.
- Dependent Claims (3, 4, 6, 7)
  - 3. A system according to claim 1, wherein the sound source localization module is configured to execute a process comprising:
    - performing a frequency analysis for the acoustic signals detected by the microphones to extract harmonic relationships;
      
      acquiring an intensity difference and a phase difference for the harmonic relationships extracted through the plurality of microphones;
      
      acquiring belief factors for a sound direction based on the intensity difference and the phase difference, respectively; and
      
      determining a most probable sound direction.
  - 4. A system according to claim 1, wherein the sound source localization module employs scattering theory that generates a model for an acoustic signal, which scatters on a surface of a member to which the microphones are attached, according to a sound direction so as to specify the sound direction for the speaker with the intensity difference and the phase difference detected from the plurality of microphones.
  - 6. A system according to claim 1, wherein the acoustic model composition module is configured to compose an acoustic model for the sound direction by applying weighted linear summation to the direction-dependent acoustic models in the acoustic model memory, and weights introduced into the linear summation are determined by training.
  - 7. A system according to claim 1, further comprising a speaker identification module, wherein the acoustic model memory possesses the direction-dependent acoustic models for respective speakers, and wherein the acoustic model composition module is configured to execute a process comprising:
    - referring to direction-dependent acoustic models of a speaker who is identified by the speaker identifying module and to a sound direction localized by the sound source localization module;
      
      composing an acoustic model for the sound direction based on the direction-dependent acoustic models in the acoustic model memory; and
      
      storing the acoustic model in the acoustic model memory.
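
The weighted linear summation of claim 6 can be illustrated with a minimal sketch. It assumes each direction-dependent acoustic model is reduced to a flat vector of numeric parameters, which the claims do not specify. The function name `compose_acoustic_model`, the directions, and the weight values are hypothetical; in the claimed system the weights are determined by training.

```python
from typing import Dict, List, Sequence


def compose_acoustic_model(
    models: Dict[int, Sequence[float]],
    weights: Dict[int, float],
) -> List[float]:
    """Return the weighted linear sum of direction-dependent parameter vectors."""
    if not models:
        raise ValueError("no direction-dependent models available")
    length = len(next(iter(models.values())))
    composed = [0.0] * length
    for direction, params in models.items():
        if len(params) != length:
            raise ValueError("all models must have the same number of parameters")
        # Directions without a weight contribute nothing to the sum.
        w = weights.get(direction, 0.0)
        for i, value in enumerate(params):
            composed[i] += w * value
    return composed


# Hypothetical memory: models adjusted to -30, 0 and +30 degrees (assumed values).
memory = {-30: [1.0, 2.0], 0: [3.0, 4.0], 30: [5.0, 6.0]}
# Hypothetical weights for a source localized at 15 degrees; the claim says
# such weights come from training, so these are placeholders.
weights_15 = {-30: 0.0, 0: 0.5, 30: 0.5}

composed = compose_acoustic_model(memory, weights_15)
# The composition module stores the composed model back in the memory.
memory[15] = composed
```

Storing the result under the localized direction mirrors the claim language that the composition module stores the composed model in the acoustic model memory, so that a later utterance from the same direction could reuse it.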

2. An automatic speech recognition system, which recognizes speeches of a specified speaker in acoustic signals detected by a plurality of microphones as character information, the system comprising:
- a sound source localization module which localizes a sound direction corresponding to the specified speaker based on the acoustic signals detected by the plurality of microphones;
  
  a sound source separation module which separates speech signals of the specified speaker from the acoustic signals based on the sound direction localized by the sound source localization module;
  
  a feature extractor which extracts features of the speech signals separated by the sound source separation module;
  
  an acoustic model memory which stores direction-dependent acoustic models that are adjusted to a plurality of directions at intervals;
  
  an acoustic model composition module which composes an acoustic model adjusted to the sound direction, which is localized by the sound source localization module, based on the direction-dependent acoustic models in the acoustic model memory, the acoustic model composition module storing the acoustic model in the acoustic model memory; and
  
  a speech recognition module which recognizes the features extracted by the feature extractor as character information using the acoustic model composed by the acoustic model composition module.
- Dependent Claims (5, 9, 10, 11, 12)
  - 5. A system according to claim 2, wherein the sound source separation module employs an active direction-pass filter so as to separate speeches, the filter being configured to execute a process comprising:
    - separating speeches by a narrower directional band when a sound direction, which is localized by the sound source localization module, lies close to a front, which is defined by an arrangement of the plurality of microphones; and
      
      separating speeches by a wider directional band when the sound direction lies apart from the front.
  - 9. A system according to claim 2, wherein the sound source localization module is configured to execute a process comprising:
    - performing a frequency analysis for the acoustic signals detected by the microphones to extract harmonic relationships;
      
      acquiring an intensity difference and a phase difference for the harmonic relationships extracted through the plurality of microphones;
      
      acquiring belief factors for a sound direction based on the intensity difference and the phase difference, respectively; and
      
      determining a most probable sound direction.
  - 10. A system according to claim 2, wherein the sound source localization module employs scattering theory that generates a model for an acoustic signal, which scatters on a surface of a member to which the microphones are attached, according to a sound direction so as to specify the sound direction for the speaker with the intensity difference and the phase difference detected from the plurality of microphones.
  - 11. A system according to claim 2, wherein the acoustic model composition module is configured to compose an acoustic model for the sound direction by applying weighted linear summation to the direction-dependent acoustic models in the acoustic model memory, and weights introduced into the linear summation are determined by training.
  - 12. A system according to claim 2, further comprising a speaker identification module, wherein the acoustic model memory possesses the direction-dependent acoustic models for respective speakers, and wherein the acoustic model composition module is configured to execute a process comprising:
    - referring to direction-dependent acoustic models of a speaker who is identified by the speaker identifying module and to a sound direction localized by the sound source localization module;
      
      composing an acoustic model for the sound direction based on the direction-dependent acoustic models in the acoustic model memory; and
      
      storing the acoustic model in the acoustic model memory.
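
The active direction-pass filter of claim 5 can be sketched as follows. The claim only requires a narrower directional band near the front and a wider one away from it; the linear growth of the band, the 10 and 30 degree half-widths, and the names `pass_band_half_width` and `select_subbands` are assumptions made for illustration. The sketch also assumes that a direction estimate is already available for each frequency sub-band.

```python
from typing import List, Sequence


def pass_band_half_width(
    direction_deg: float,
    narrow_deg: float = 10.0,  # assumed half-width at the front (0 degrees)
    wide_deg: float = 30.0,    # assumed half-width at the side (90 degrees)
) -> float:
    """Half-width of the pass band, growing linearly from front to side."""
    off_front = min(abs(direction_deg), 90.0)
    return narrow_deg + (wide_deg - narrow_deg) * off_front / 90.0


def select_subbands(
    subband_directions: Sequence[float],
    target_deg: float,
) -> List[int]:
    """Return indices of sub-bands whose direction lies inside the pass band."""
    half_width = pass_band_half_width(target_deg)
    return [
        i
        for i, direction in enumerate(subband_directions)
        if abs(direction - target_deg) <= half_width
    ]


# Speaker at the front: narrow band, only nearby sub-bands pass.
front_pass = select_subbands([-5.0, 8.0, 15.0, 40.0], target_deg=0.0)
# Speaker at the side: wider band, more sub-bands pass.
side_pass = select_subbands([60.0, 85.0, 110.0, 130.0], target_deg=90.0)
```

Sub-bands that pass the filter would then be resynthesized into the separated speech signal; that step is omitted here.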

8. An automatic speech recognition system, which recognizes speeches of a specified speaker in acoustic signals detected by a plurality of microphones as character information, the system comprising:
- a sound source localization module which localizes a sound direction corresponding to the specified speaker based on the acoustic signals detected by the plurality of microphones;
  
  a stream tracking module which stores the sound direction localized by the sound source localization module so as to estimate a direction in which the specified speaker is moving, the stream tracking module estimating a current position of the speaker according to the estimated direction;
  
  a sound source separation module which separates speech signals of the specified speaker from the acoustic signals based on a sound direction, which is determined by the current position of the speaker estimated by the stream tracking module;
  
  a feature extractor which extracts features of the speech signals separated by the sound source separation module;
  
  an acoustic model memory which stores direction-dependent acoustic models that are adjusted to a plurality of directions at intervals;
  
  an acoustic model composition module which composes an acoustic model adjusted to the sound direction, which is localized by the sound source localization module, based on the direction-dependent acoustic models in the acoustic model memory, the acoustic model composition module storing the acoustic model in the acoustic model memory; and
  
  a speech recognition module which recognizes the features extracted by the feature extractor as character information using the acoustic model composed by the acoustic model composition module.
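
The stream tracking module of claim 8 stores localized directions, estimates the direction in which the speaker is moving, and from that estimates the current position. A minimal sketch is given below, assuming constant angular velocity between the oldest and newest stored observations. The class name `StreamTracker`, the history length, and the extrapolation rule are hypothetical, and angle wrap-around is not handled.

```python
from collections import deque
from typing import Deque, Tuple


class StreamTracker:
    """Keeps recent (time, direction) observations and extrapolates linearly."""

    def __init__(self, history: int = 5) -> None:
        # Only the most recent `history` observations are retained.
        self._obs: Deque[Tuple[float, float]] = deque(maxlen=history)

    def store(self, time_s: float, direction_deg: float) -> None:
        """Store a direction reported by the sound source localization module."""
        self._obs.append((time_s, direction_deg))

    def angular_velocity(self) -> float:
        """Estimated movement in degrees per second (0.0 if unknown)."""
        if len(self._obs) < 2:
            return 0.0
        (t0, d0), (t1, d1) = self._obs[0], self._obs[-1]
        if t1 == t0:
            return 0.0
        return (d1 - d0) / (t1 - t0)

    def estimate(self, time_s: float) -> float:
        """Estimate the speaker direction at `time_s` from the stored stream."""
        if not self._obs:
            raise ValueError("no observations stored")
        t_last, d_last = self._obs[-1]
        return d_last + self.angular_velocity() * (time_s - t_last)


tracker = StreamTracker()
tracker.store(0.0, 10.0)
tracker.store(1.0, 20.0)
tracker.store(2.0, 30.0)
predicted = tracker.estimate(3.0)  # speaker moving at 10 degrees per second
```

The estimated direction would be passed to the sound source separation module in place of the raw localization result, which is the role claim 8 assigns to the tracker.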

Current Assignee: Honda Motor Co., Ltd. (Honda Motor Company)
Original Assignee: Honda Motor Co., Ltd. (Honda Motor Company)
Inventors: Okuno, Hiroshi; Tsujino, Hiroshi; Nakadai, Kazuhiro
Application Number: US10/579,235
Publication Number: US 20090018828A1
US Class Current: 704/234
CPC Class Codes:
G10L 15/20   Speech recognition techniqu...
G10L 2021/02166   Microphone arrays; Beamforming
G10L 21/028   using properties of sound s...
