TRULY HANDSFREE SPEECH RECOGNITION IN HIGH NOISE ENVIRONMENTS

US 20130054235A1
Filed: 08/24/2011
Published: 02/28/2013
Est. Priority Date: 08/24/2011
Status: Active Grant

1 Assignment

0 Petitions

Abstract

Embodiments of the present invention improve content manipulation systems and methods using speech recognition. In one embodiment, the present invention includes a method comprising configuring a recognizer to recognize utterances in the presence of a background audio signal having particular audio characteristics. A composite signal comprising a first audio signal and a spoken utterance of a user is received by the recognizer, where the first audio signal comprises the particular audio characteristics used to configure the recognizer so that the recognizer is desensitized to the first audio signal. The spoken utterance is recognized in the presence of the first audio signal when the spoken utterance is one of the predetermined utterances. An operation is performed on the first audio signal.
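
The abstract and claim 1 leave the recognizer itself abstract. The minimal Python sketch below assumes recognition has already happened and illustrates only the command half of the method: a recognized predetermined utterance is looked up and the matching operation is applied to the first audio signal. The `Track` class, the phrases, and the operations (loosely modeled on claims 7, 8 and 12) are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Track:
    """Hypothetical stand-in for the first audio signal (e.g. a song)."""
    title: str
    playing: bool = True
    saved: bool = False
    liked: bool = False


# Hypothetical operations on the first audio signal, loosely mirroring
# claims 7 (save), 8 (associate a preference) and 12 (interrupt).
def save(track: Track) -> None:
    track.saved = True


def like(track: Track) -> None:
    track.liked = True


def stop(track: Track) -> None:
    track.playing = False


# Each predetermined utterance maps to exactly one command.
COMMANDS: Dict[str, Callable[[Track], None]] = {
    "save this song": save,
    "i like this song": like,
    "stop": stop,
}


def handle_utterance(recognized: Optional[str], track: Track) -> bool:
    """Run the command for a recognized predetermined utterance.

    `recognized` is assumed to come from a recognizer already desensitized
    to audio like `track`; None means nothing was spotted.
    Returns True if a command was executed.
    """
    if recognized is None or recognized not in COMMANDS:
        return False  # not a predetermined utterance: ignore it
    COMMANDS[recognized](track)  # perform the operation on the first audio signal
    return True
```

For instance, `handle_utterance("stop", track)` sets `track.playing` to `False`, while an out-of-vocabulary string returns `False` and leaves the track untouched.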

23 Claims

1. A method comprising:
- configuring a recognizer to recognize one or more predetermined utterances in the presence of a non-random information bearing background audio signal having particular audio characteristics, said configuring desensitizing the recognizer to signals having said particular audio characteristics;
  
  receiving, in the recognizer, a composite signal comprising a first audio signal and a spoken utterance of a user, wherein the first audio signal is generated by an electronic speaker, wherein the first audio signal comprises said particular audio characteristics used to configure the recognizer so that the recognizer is desensitized to the first audio signal;
  
  recognizing the spoken utterance in the presence of the first audio signal when the spoken utterance of the user is one of the predetermined utterances;
  
  executing a command corresponding to a particular one of the predetermined utterances having been recognized; and
  
  performing an operation corresponding to the command on the first audio signal in response to the command.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 19)
  - 2. The method of claim 1 wherein the recognizer comprises a phrase spotting algorithm, wherein the recognizer recognizes the predetermined utterances from an ongoing stream of background audio signals, and wherein configuring the recognizer comprises:
    - receiving, in the recognizer, training samples comprising the one or more predetermined utterances in the presence of said non-random information bearing background audio signal having said particular audio characteristics;
      
      optimizing phrase spotting parameters based on recognition results of the training samples; and
      
      configuring the recognizer with said optimized phrase spotting parameters.
  - 3. The method of claim 2 wherein configuring the recognizer further comprises:
    - optimizing acoustic models based on recognition results of the training samples; and
      
      configuring the recognizer with said optimized acoustic models.
  - 4. The method of claim 1 wherein the background audio signal is music and wherein the first audio signal is a song.
  - 5. The method of claim 1 wherein the first audio signal is a song and wherein the operation manipulates the song according to a spoken command of the user.
  - 6. The method of claim 1 wherein the background audio signal is synthesized speech and wherein the first audio signal is one or more words of the synthesized speech.
  - 7. The method of claim 1 wherein the operation saves the first audio signal for later access by the user.
  - 8. The method of claim 1 wherein the operation associates a preference of the user with the first audio signal.
  - 9. The method of claim 1 wherein the operation shares the first audio signal with other users.
  - 10. The method of claim 1 wherein the operation purchases the first audio signal for the user.
  - 11. The method of claim 1 wherein the operation identifies information about the first audio signal.
  - 12. The method of claim 1 wherein the operation interrupts the first audio signal and stops it from continuing.
  - 13. The method of claim 1 further comprising:
    - before said configuring, identifying the first audio signal; and
      
      selecting one of a plurality of recognition sets to recognize said one or more predetermined utterances based on the identified first audio signal, wherein when different audio signals are identified, different recognition sets are dynamically selected and used to configure the recognizer so that the recognizer is desensitized to the identified audio signals, and wherein the different audio signals are associated with different commands executed when an utterance is recognized.
  - 19. The apparatus of claim 1 wherein the operation shares the first audio signal with other users.
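
Claim 13 describes identifying the first audio signal and then dynamically choosing one of several recognition sets, each associated with its own commands. As a hypothetical sketch, assuming the playback system already knows whether it is emitting music (claim 4) or synthesized speech (claim 6), the selection can be a lookup keyed by that label. The model names and phrases are invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet


@dataclass(frozen=True)
class RecognitionSet:
    """Hypothetical vocabulary plus model tuned for one background type."""
    model_name: str          # e.g. a model trained over that background audio
    phrases: FrozenSet[str]  # predetermined utterances active for this set


# Hypothetical recognition sets, one per kind of first audio signal.
RECOGNITION_SETS: Dict[str, RecognitionSet] = {
    "music": RecognitionSet(
        model_name="am_music_background",
        phrases=frozenset({"skip song", "save this song", "what song is this"}),
    ),
    "synthesized_speech": RecognitionSet(
        model_name="am_tts_background",
        phrases=frozenset({"stop reading", "repeat that"}),
    ),
}


def select_recognition_set(identified_signal: str) -> RecognitionSet:
    """Pick the recognition set for the identified first audio signal.

    The identification itself is assumed to happen upstream.
    Raises KeyError if no set exists for that signal type.
    """
    return RECOGNITION_SETS[identified_signal]
```

Keeping the phrase sets separate means a command such as "repeat that" is only active while synthesized speech is playing, which matches the claim's pairing of different audio signals with different commands.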

14. An apparatus comprising:
- a processor;
  
  a recognizer, the recognizer being configured to recognize one or more predetermined utterances in the presence of a non-random information bearing background audio signal having particular audio characteristics to desensitize the recognizer to signals having said particular audio characteristics; and
  
  a microphone to receive a composite signal comprising a first audio signal and a spoken utterance of a user, wherein the first audio signal is generated by an electronic speaker, wherein the first audio signal comprises said particular audio characteristics used to configure the recognizer so that the recognizer is desensitized to the first audio signal, wherein the spoken utterance is recognized in the presence of the first audio signal when the spoken utterance of the user is one of the predetermined utterances, wherein a command is executed by said processor corresponding to a particular one of the predetermined utterances having been recognized; and
  
  wherein an operation corresponding to the command is performed on the first audio signal in response to the command.
- Dependent Claims (15, 16, 17, 18, 20, 21, 22, 23)
  - 15. The apparatus of claim 14 wherein the recognizer comprises a phrase spotting algorithm, wherein the recognizer recognizes the predetermined utterances from an ongoing stream of background audio signals, and wherein configuring the recognizer comprises:
    - receiving, in the recognizer, training samples comprising the one or more predetermined utterances in the presence of said non-random information bearing background audio signal having said particular audio characteristics;
      
      optimizing phrase spotting parameters based on recognition results of the training samples; and
      
      configuring the recognizer with said optimized phrase spotting parameters.
  - 16. The apparatus of claim 15 wherein configuring the recognizer further comprises:
    - optimizing acoustic models based on recognition results of the training samples; and
      
      configuring the recognizer with said optimized acoustic models.
  - 17. The apparatus of claim 14 wherein the operation saves the first audio signal for later access by the user.
  - 18. The apparatus of claim 14 wherein the operation associates a preference of the user with the first audio signal.
  - 20. The apparatus of claim 14 wherein the operation purchases the first audio signal for the user.
  - 21. The apparatus of claim 14 wherein the apparatus comprises one of a mobile phone, a tablet computer, and an electronic reader.
  - 22. The apparatus of claim 14 wherein the recognizer is operable on said processor.
  - 23. The apparatus of claim 14 wherein the processor is a first electronic circuit and the recognizer is a second electronic circuit.
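
Claims 2 and 15 say that phrase spotting parameters are optimized from the recognition results of training samples recorded over the background audio, without naming the parameters or the criterion. A minimal sketch, assuming the only parameter is a single detection threshold and that the goal is to minimize false rejects plus weighted false accepts, is an exhaustive search over the observed scores. The function name and the `fa_weight` parameter are hypothetical, and the acoustic-model optimization of claims 3 and 16 is not shown.

```python
from typing import Sequence, Tuple


def tune_threshold(
    samples: Sequence[Tuple[float, bool]],
    fa_weight: float = 1.0,
) -> float:
    """Choose a phrase-spotting detection threshold from training results.

    Each sample is (score, is_target): the spotter's confidence on a training
    clip recorded over the background audio, and whether the clip really
    contained a predetermined utterance. A clip is accepted when
    score >= threshold. The returned threshold minimizes
        false_rejects + fa_weight * false_accepts,
    with ties going to the lowest such threshold.
    """
    if not samples:
        raise ValueError("need at least one training sample")
    # Only thresholds equal to an observed score, or one above the maximum
    # (reject everything), produce distinct accept/reject outcomes.
    candidates = sorted({score for score, _ in samples})
    candidates.append(candidates[-1] + 1.0)
    best_t, best_cost = candidates[0], float("inf")
    for t in candidates:
        false_rejects = sum(1 for s, target in samples if target and s < t)
        false_accepts = sum(1 for s, target in samples if not target and s >= t)
        cost = false_rejects + fa_weight * false_accepts
        if cost < best_cost:  # strict < keeps the lowest threshold on ties
            best_t, best_cost = t, cost
    return best_t
```

For example, with scores `[(0.9, True), (0.8, True), (0.7, False), (0.4, False), (0.6, True)]`, the default weighting picks 0.6, while `fa_weight=2.0` penalizes false accepts more heavily and picks 0.8.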

Current Assignee
Sensory Incorporated
Original Assignee
Sensory Incorporated
Inventors
Mozer, Todd F., Vermeulen, Pieter J., Shaw, Jonathan, Rogers, Jeff

Granted Patent

US 8,645,132 B2
US Class Current

704/233
CPC Class Codes

G10L 15/063   Training

G10L 15/20   Speech recognition techniqu...

G10L 15/22   Procedures used during a sp...
