Controlling an apparatus based on speech

US 7,885,818 B2
Filed: 09/22/2003
Issued: 02/08/2011
Est. Priority Date: 10/23/2002
Status: Active Grant

4 Assignments

0 Petitions

Abstract

An apparatus with a speech control unit includes a microphone array having multiple microphones for receiving respective audio signals, and a beam forming module for extracting a speech signal of a user, from the audio signals. A keyword recognition system recognizes a predetermined keyword that is spoken by the user and which is represented by a particular audio signal and is arranged to control the beam forming module, on basis of the recognition. A speech recognition unit creates an instruction for the apparatus based on recognized speech items of the speech signal. As a consequence, the speech control unit is more selective for those parts of the audio signals for speech recognition which correspond to speech items spoken by the user.
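
The abstract does not say which beam forming technique the module uses. As an illustration only, the sketch below assumes a simple delay-and-sum beamformer: it computes per-microphone delays for an assumed user position, then aligns and averages the channels so that sound from that position is enhanced. The function names, the 343 m/s speed of sound, and the whole-sample delays are assumptions made for this example, not details taken from the patent.

```python
import math

SPEED_OF_SOUND = 343.0  # metres per second; assumed room-temperature value


def steering_delays(mic_positions, source_position, sample_rate):
    """Delay of each microphone, in whole samples, relative to the nearest one.

    Positions are (x, y) coordinates in metres. Using the full distance
    (not only the angle) lets the focus depend on orientation and distance.
    """
    distances = [math.dist(mic, source_position) for mic in mic_positions]
    nearest = min(distances)
    return [round((d - nearest) / SPEED_OF_SOUND * sample_rate) for d in distances]


def delay_and_sum(signals, delays):
    """Advance each channel by its delay, then average the aligned channels."""
    length = min(len(signal) - delay for signal, delay in zip(signals, delays))
    return [
        sum(signal[n + delay] for signal, delay in zip(signals, delays)) / len(signals)
        for n in range(length)
    ]
```

When the delays match the true source position, the channels add coherently and the source keeps its full amplitude; sound arriving from elsewhere is misaligned and is attenuated by the averaging.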

33 Citations

19 Claims

1. A system for controlling an apparatus on basis of speech, comprising:
- a microphone array, comprising multiple microphones for receiving respective audio signals;
  
  a beam forming module for extracting a speech signal of a user, from the audio signals as received by the microphones, by means of enhancing first components of the audio signals which represent an utterance originating from a first position of the user relative to the microphone array;
  
  a speech recognition unit for creating an instruction for the apparatus based on recognized speech items of the speech signal; and
  
  a keyword recognition system for recognition of a predetermined keyword that is spoken by the user and which is represented by a particular audio signal;
  
  a speech control unit being arranged to control the beam forming module, on basis of the recognition of the predetermined keyword, in order to enhance second components of the audio signals which represent a subsequent utterance originating from a second position of the user relative to the microphone array;
  
  wherein the recognition of the predetermined keyword at the second position calibrates the beam forming module to follow the user from the first position to the second position so that the subsequent utterance originating from the second position are accepted while utterances of other users at other positions are discarded, the second position including an orientation and a distance relative to the microphone array, and the speech control unit being configured to discriminate between sounds originating from users who are located in front of each other relative the microphone array;
  
  wherein the subsequent utterance originating from the second position will be discarded if not preceded by the recognition of the predetermined keyword originating from the second position; and
  
  wherein the keyword recognition system is arranged to recognize the predetermined keyword that is spoken by another user and the speech control unit being arranged to control the beam forming module, on basis of this recognition, in order to enhance third components of the audio signals which represent another utterance originating from a third orientation of the other user relative to the microphone array.
- Dependent claims (2 to 13):
  - 2. The system as claimed in claim 1, wherein a first one of the microphones of the microphone array is arranged to provide the particular audio signal to the keyword recognition system.
  - 3. The system as claimed in claim 1, wherein the beam forming module is arranged to determine a first position of the user relative to the microphone array.
  - 4. An apparatus comprising:
    - a system for controlling the apparatus on basis of speech as claimed in claim 1; and
      
      processing means for execution of the instruction being created by the speech control unit.
  - 5. The apparatus as claimed in claim 4, the apparatus arranged to show that the predetermined keyword has been recognized.
  - 6. The apparatus as claimed in claim 5, further comprising audio generating means for generating an audio signal in order to show that the predetermined keyword has been recognized.
  - 7. A consumer electronics system comprising the apparatus as claimed in claim 4.
  - 8. The system of claim 1, wherein the user is informed by indications that the speech control unit is not active, is in an active state and ready to receive the utterance, or is in a state of calibration.
  - 9. The system of claim 8, wherein the indications include an animal in a sleeping state indicating that the speech control unit is not active, and in an awake state indicating that the speech control unit is in the active state.
  - 10. The system of claim 9, wherein progress of the active state is indicated by an angle of ears of the animal.
  - 11. The system of claim 10, wherein the ears are fully raised at a beginning of the active state, and fully down at an end of the active state.
  - 12. The system of claim 9, wherein the animal has an understanding look when the utterance is recognized and a puzzled look when the utterance is not recognized.
  - 13. The system of claim 1, wherein the beam forming module is connected to the microphone array, and the keyword recognition system is connected to one microphone of the microphone array for detecting the predetermined keyword, the keyword recognition system being further connected to the beam forming module for providing the detected predetermined keyword to the beam forming module.
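
The gating behaviour recited in claim 1 can be sketched as a small piece of control logic. This is a simplified illustration, not the claimed implementation: it assumes that each utterance arrives already transcribed and tagged with an estimated position (orientation in degrees, distance in metres), and the class name, the keyword string, and the tolerance values are all hypothetical. Orientation wrap-around at 360 degrees is ignored for brevity.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Position = Tuple[float, float]  # (orientation in degrees, distance in metres)


@dataclass
class SpeechControlSketch:
    keyword: str = "hello device"   # hypothetical predetermined keyword
    tolerance_deg: float = 10.0     # assumed angular width of the focus
    tolerance_m: float = 0.5        # assumed depth of the focus
    focus: Optional[Position] = None

    def _in_focus(self, position: Position) -> bool:
        if self.focus is None:
            return False
        return (
            abs(position[0] - self.focus[0]) <= self.tolerance_deg
            and abs(position[1] - self.focus[1]) <= self.tolerance_m
        )

    def handle(self, text: str, position: Position) -> Optional[str]:
        """Return the utterance if it is accepted, or None if it is discarded."""
        if text == self.keyword:
            # Keyword recognized: recalibrate the focus to where it was spoken,
            # whether by the same user at a new position or by another user.
            self.focus = position
            return None
        if self._in_focus(position):
            return text  # would be passed on to speech recognition
        return None  # not preceded by the keyword from this position
```

Because the focus includes distance as well as orientation, a second speaker standing directly behind the first (same orientation, larger distance) falls outside the focus and is discarded until that speaker says the keyword.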

14. A method of controlling an apparatus on basis of speech, comprising the acts of:
- receiving respective audio signals by means of a microphone array, comprising multiple microphones;
  
  extracting a speech signal of a user, from the audio signals as received by the microphones, by means of enhancing first components of the audio signals which represent an utterance originating from a first position of the user relative to the microphone array;
  
  recognizing a predetermined keyword that is spoken by the user based on a particular audio signal and controlling the extraction of the speech signal of the user, on basis of the recognition of the predetermined keyword, in order to enhance second components of the audio signals which represent a subsequent utterance originating from a second position of the user relative to the microphone array while discarding utterances of other users at other positions, the second position including an orientation and a distance relative to the microphone array so that sounds originating from users who are located in front of each other relative the microphone array are discriminated;
  
  creating an instruction for the apparatus based on recognized speech items of the speech signal;
  
  discarding the subsequent utterance originating from the second position if not preceded by the recognition of the predetermined keyword originating from the second position; and
  
  recognizing the predetermined keyword that is spoken by another user and extracting a speech signal of the user, on basis of this recognition, in order to enhance third components of the audio signals which represent another utterance originating from a third orientation of the other user relative to the microphone array.
- Dependent claims (15 to 19):
  - 15. The method of claim 14, further comprising the act of informing the user by indications that the apparatus is not active, is in an active state and ready to receive the utterance, or is in a state of calibration.
  - 16. The method of claim 15, wherein the indications include an animal in a sleeping state indicating that the speech control unit is not active, and in an awake state indicating that the speech control unit is in the active state.
  - 17. The method of claim 16, wherein progress of the active state is indicated by an angle of ears of the animal.
  - 18. The method of claim 17, wherein the ears are fully raised at a beginning of the active state, and fully down at an end of the active state.
  - 19. The method of claim 16, wherein the animal has an understanding look when the utterance is recognized and a puzzled look when the utterance is not recognized.
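
The ear-angle indication in claims 10, 11, 17 and 18 fixes only the two end points: ears fully raised at the beginning of the active state and fully down at its end. The sketch below assumes a linear decline between those end points and a 90 degree "fully raised" angle; both are assumptions for illustration, as are the function and parameter names.

```python
def ear_angle(elapsed_s: float, active_window_s: float, raised_deg: float = 90.0) -> float:
    """Ear angle in degrees for a point in time within the active state.

    Returns raised_deg at the start of the active state and 0.0 at its end,
    interpolating linearly in between (the linear shape is an assumption).
    """
    if active_window_s <= 0:
        raise ValueError("active_window_s must be positive")
    # Clamp progress to [0, 1] so times outside the window stay at the end points.
    progress = min(max(elapsed_s / active_window_s, 0.0), 1.0)
    return raised_deg * (1.0 - progress)
```

For a hypothetical 10 second active window, `ear_angle(0, 10)` gives the fully raised angle and `ear_angle(10, 10)` gives fully down, so the user can read the remaining listening time from the animal's ears.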

Current Assignee: Cerence Operating Company (Cerence Inc.)
Original Assignee: Koninklijke Philips Electronics N.V. (Koninklijke Philips N.V.)
Inventors: Vignoli, Fabio
Primary Examiner(s): Dorvil; Richemond
Assistant Examiner(s): SAINT CYR, LEONARD
Application Number: US10/532,469
Publication Number: US 20060074686A1
Time in Patent Office: 2,696 Days
Field of Search: 704/275
US Class Current: 704/275

CPC Class Codes

G10L 15/26   Speech to text systems G10L...
G10L 2015/223   Execution procedure of a sp...
G10L 2021/02087   the noise being separate sp...
G10L 2021/02166   Microphone arrays; Beamforming
G10L 21/0272   Voice signal separating
