Truly handsfree speech recognition in high noise environments

US 8,645,132 B2
Filed: 08/24/2011
Issued: 02/04/2014
Est. Priority Date: 08/24/2011
Status: Active Grant

1 Assignment

0 Petitions

Abstract

Embodiments of the present invention improve content manipulation systems and methods using speech recognition. In one embodiment, the present invention includes a method comprising configuring a recognizer to recognize utterances in the presence of a background audio signal having particular audio characteristics. A composite signal comprising a first audio signal and a spoken utterance of a user is received by the recognizer, where the first audio signal comprises the particular audio characteristics used to configure the recognizer so that the recognizer is desensitized to the first audio signal. The spoken utterance is recognized in the presence of the first audio signal when the spoken utterance is one of the predetermined utterances. An operation is performed on the first audio signal.
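
Claim 1 adds a step to this flow that the abstract leaves implicit. The first audio signal is identified, and a recognition set is selected for it: the predetermined utterances and the commands they trigger. The Python sketch below shows only that dispatch logic. It assumes the desensitized recognizer already returns the text of a spotted utterance, or `None`. The audio types, the utterances, and the names `AudioItem`, `RECOGNITION_SETS` and `handle` are hypothetical illustrations and are not taken from the patent. The operations mirror those recited in claims 7, 11 and 12.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class AudioItem:
    """Hypothetical stand-in for the first audio signal played by the speaker."""
    kind: str            # identified type, e.g. "music" (claim 4) or "synthesized_speech" (claim 6)
    title: str
    playing: bool = True
    saved: bool = False


Operation = Callable[[AudioItem], str]


def save(item: AudioItem) -> str:
    # Claim 7: save the first audio signal for later access by the user.
    item.saved = True
    return f"saved {item.title}"


def stop(item: AudioItem) -> str:
    # Claim 12: interrupt the first audio signal and stop it from continuing.
    item.playing = False
    return f"stopped {item.title}"


def identify(item: AudioItem) -> str:
    # Claim 11: report information about the first audio signal.
    return f"{item.title} ({item.kind})"


# Hypothetical recognition sets, one per identified audio type. Each maps the
# predetermined utterances it contains to the operation they trigger.
RECOGNITION_SETS: Dict[str, Dict[str, Operation]] = {
    "music": {"save this song": save, "stop": stop, "what song is this": identify},
    "synthesized_speech": {"save that": save, "stop": stop},
}


def handle(item: AudioItem, recognized: Optional[str]) -> Optional[str]:
    """Dispatch a recognizer result using the set selected for the identified audio.

    `recognized` is the text of the utterance the recognizer spotted, or None.
    Returns the operation's result, or None if no command applies.
    """
    commands = RECOGNITION_SETS[item.kind]   # selection driven by the identified signal
    if recognized is None or recognized not in commands:
        return None                          # not a predetermined utterance for this set
    return commands[recognized](item)        # operate on the first audio signal
```

For a song, `handle(song, "save this song")` saves it. The phrase `"save that"` is ignored because it belongs only to the synthesized-speech set. This illustrates, in a simplified way, how different identified signals can be associated with different commands.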

23 Citations

22 Claims

1. A method comprising:
- receiving a first audio signal;
  
  identifying the first audio signal;
  
  selecting one of a plurality of recognition sets to recognize one or more predetermined utterances based on the identified first audio signal;
  
  configuring a recognizer to recognize the one or more predetermined utterances in the presence of a non-random information bearing background audio signal having particular audio characteristics, said configuring desensitizing the recognizer to signals having said particular audio characteristics;
  
  receiving, in the recognizer, a composite signal comprising the first audio signal and a spoken utterance of a user, wherein the first audio signal is generated by an electronic speaker, wherein the first audio signal comprises said particular audio characteristics used to configure the recognizer so that the recognizer is desensitized to the first audio signal;
  
  recognizing the spoken utterance in the presence of the first audio signal when the spoken utterance of the user is one of the predetermined utterances;
  
  executing a command corresponding to a particular one of the predetermined utterances having been recognized; and
  
  performing an operation corresponding to the command on the first audio signal in response to the command, wherein when different audio signals are identified, different recognition sets are dynamically selected and used to configure the recognizer so that the recognizer is desensitized to the identified audio signals, and wherein the different audio signals are associated with different commands executed when an utterance is recognized.

2. The method of claim 1 wherein the recognizer comprises a phrase spotting algorithm, wherein the recognizer recognizes the predetermined utterances from an ongoing stream of background audio signals, and wherein configuring the recognizer comprises:
- receiving, in the recognizer, training samples comprising the one or more predetermined utterances in the presence of said non-random information bearing background audio signal having said particular audio characteristics;

  optimizing phrase spotting parameters based on recognition results of the training samples; and

  configuring the recognizer with said optimized phrase spotting parameters.

3. The method of claim 2 wherein configuring the recognizer further comprises:
- optimizing acoustic models based on recognition results of the training samples; and

  configuring the recognizer with said optimized acoustic models.

4. The method of claim 1 wherein the background audio signal is music and wherein the first audio signal is a song.

5. The method of claim 1 wherein the first audio signal is a song and wherein the operation manipulates the song according to a spoken command of the user.

6. The method of claim 1 wherein the background audio signal is synthesized speech and wherein the first audio signal is one or more words of the synthesized speech.

7. The method of claim 1 wherein the operation saves the first audio signal for later access by the user.

8. The method of claim 1 wherein the operation associates a preference of the user with the first audio signal.

9. The method of claim 1 wherein the operation shares the first audio signal with other users.

10. The method of claim 1 wherein the operation purchases the first audio signal for the user.

11. The method of claim 1 wherein the operation identifies information about the first audio signal.

12. The method of claim 1 wherein the operation interrupts the first audio signal and stops it from continuing.
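
Claim 2 does not say which phrase spotting parameters are optimized or how. As one hypothetical illustration, assume the only parameter is a detection threshold on a per-sample confidence score. Also assume the training samples include two kinds of recording: the phrase spoken over the background audio, and background audio alone. The sketch below picks the threshold that minimizes a weighted count of false rejects and false accepts. The function `tune_threshold` and its `false_accept_weight` argument are assumptions, not the patented procedure.

```python
from typing import List, Tuple


def tune_threshold(
    samples: List[Tuple[float, bool]],
    false_accept_weight: float = 1.0,
) -> float:
    """Choose a detection threshold from scored training samples (hypothetical).

    samples: (confidence, contains_phrase) pairs, where confidence is the
             recognizer's score for a training sample recorded over the
             background audio and contains_phrase marks whether the
             predetermined utterance is really present.
    A sample is accepted when confidence >= threshold. The returned threshold
    minimizes false_rejects + false_accept_weight * false_accepts; ties go to
    the lowest threshold.
    """
    if not samples:
        raise ValueError("need at least one training sample")
    scores = sorted({s for s, _ in samples})
    candidates = scores + [scores[-1] + 1.0]   # last candidate rejects everything
    best_threshold, best_cost = candidates[0], float("inf")
    for t in candidates:
        false_rejects = sum(1 for s, pos in samples if pos and s < t)
        false_accepts = sum(1 for s, pos in samples if not pos and s >= t)
        cost = false_rejects + false_accept_weight * false_accepts
        if cost < best_cost:                   # strict '<' keeps the lowest tied threshold
            best_threshold, best_cost = t, cost
    return best_threshold
```

Raising `false_accept_weight` tends to push the threshold up. That trades some missed commands for fewer spurious triggers from the music or synthesized speech in the background.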

13. An apparatus comprising:
- a processor;
  
  a recognizer, the recognizer configured to recognize one or more predetermined utterances in the presence of a non-random information bearing background audio signal having particular audio characteristics to desensitize the recognizer to signals having said particular audio characteristics;
  
  and a microphone to receive a composite signal comprising a first audio signal and a spoken utterance of a user, wherein the first audio signal is generated by an electronic speaker, wherein the first audio signal comprises said particular audio characteristics used to configure the recognizer so that the recognizer is desensitized to the first audio signal, wherein the spoken utterance is recognized in the presence of the first audio signal when the spoken utterance of the user is one of the predetermined utterances, wherein a command is executed by said processor corresponding to a particular one of the predetermined utterances having been recognized; and
  
  wherein an operation corresponding to the command is performed on the first audio signal in response to the command, wherein, before configuring the recognizer, the first audio signal is identified, and wherein one of a plurality of recognition sets is selected to recognize said one or more predetermined utterances based on the identified first audio signal, and wherein when different audio signals are identified, different recognition sets are dynamically selected and used to configure the recognizer so that the recognizer is desensitized to the identified audio signals, and wherein the different audio signals are associated with different commands executed when an utterance is recognized.

14. The apparatus of claim 13 wherein the recognizer comprises a phrase spotting algorithm, wherein the recognizer recognizes the predetermined utterances from an ongoing stream of background audio signals, and wherein configuring the recognizer comprises:
- receiving, in the recognizer, training samples comprising the one or more predetermined utterances in the presence of said non-random information bearing background audio signal having said particular audio characteristics;

  optimizing phrase spotting parameters based on recognition results of the training samples; and

  configuring the recognizer with said optimized phrase spotting parameters.

15. The apparatus of claim 14 wherein configuring the recognizer further comprises:
- optimizing acoustic models based on recognition results of the training samples; and

  configuring the recognizer with said optimized acoustic models.

16. The apparatus of claim 13 wherein the operation saves the first audio signal for later access by the user.

17. The apparatus of claim 13 wherein the operation associates a preference of the user with the first audio signal.

18. The apparatus of claim 13 wherein the operation shares the first audio signal with other users.

19. The apparatus of claim 13 wherein the operation purchases the first audio signal for the user.

20. The apparatus of claim 13 wherein the apparatus comprises one of a mobile phone, a tablet computer, and an electronic reader.

21. The apparatus of claim 13 wherein the recognizer is operable on said processor.

22. The apparatus of claim 13 wherein the processor is a first electronic circuit and the recognizer is a second electronic circuit.
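
Claim 14, like claim 2, has the recognizer spot the predetermined utterances in an ongoing stream of background audio. The Python sketch below shows one way that streaming decision could work. It assumes a hypothetical upstream model that emits one phrase-confidence score per audio frame. A detection fires when the mean score over a short window reaches a threshold, which could be one tuned on training samples. Further detections are then suppressed for a refractory period so that a single utterance triggers a single command. `spot`, `window` and `refractory` are illustrative names, not the patent's.

```python
from collections import deque
from typing import Deque, Iterable, Iterator


def spot(
    frame_scores: Iterable[float],
    threshold: float,
    window: int = 3,
    refractory: int = 5,
) -> Iterator[int]:
    """Yield frame indices where the phrase is spotted in a continuous stream.

    frame_scores: per-frame phrase confidence from a hypothetical recognizer
    already configured for the background audio.
    A detection fires when the mean of the last `window` scores reaches
    `threshold`; detections are then suppressed for the next `refractory` frames.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    recent: Deque[float] = deque(maxlen=window)   # sliding window of recent scores
    cooldown = 0
    for i, score in enumerate(frame_scores):
        recent.append(score)
        if cooldown > 0:                          # still inside the refractory period
            cooldown -= 1
            continue
        if len(recent) == window and sum(recent) / window >= threshold:
            yield i
            cooldown = refractory
```

Averaging over a window smooths out single-frame spikes that loud passages of background music might cause. The refractory period prevents a long utterance from firing its command repeatedly.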

Current Assignee: Sensory Incorporated
Original Assignee: Sensory Incorporated
Inventors: Mozer, Todd F.; Rogers, Jeff; Vermeulen, Pieter J.; Shaw, Jonathan
Primary Examiner(s): Desir, Pierre-Louis
Assistant Examiner(s): Shan, Jie

Application Number: US13/217,146
Publication Number: US 20130054235A1
Time in Patent Office: 895 Days
Field of Search: 704/233
US Class Current: 704/233

CPC Class Codes:
G10L 15/063   Training
G10L 15/20   Speech recognition techniqu...
G10L 15/22   Procedures used during a sp...
