Correcting for impulse noise in speech recognition systems

US 10,811,011 B2
Filed: 11/21/2018
Issued: 10/20/2020
Est. Priority Date: 11/21/2018
Status: Active Grant

Abstract

System and method for correcting for impulse noise in speech recognition systems. One example system includes a microphone, a speaker, and an electronic processor. The electronic processor is configured to receive an audio signal representing an utterance. The electronic processor is configured to detect, within the utterance, the impulse noise, and, in response, generate an annotated utterance including a timing of the impulse noise. The electronic processor is configured to segment the annotated utterance into silence, voice content, and other content, and, when a length of the other content is greater than or equal to an average word length for the annotated utterance, determine, based on the voice content, an intent portion and an entity portion. The electronic processor is configured to generate a voice prompt based on the timing of the impulse noise and the intent portion and/or the entity portion, and to play the voice prompt.
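
The gating test in the abstract (other content at least as long as an average word) can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Segment` type, the function names, and the choice to sum all "other" segments before comparing are assumptions made for the example. The average word length follows the division described in claim 2.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """Hypothetical segment record: kind is 'silence', 'voice', or 'other'."""
    kind: str
    start: float  # seconds
    end: float    # seconds

    @property
    def length(self) -> float:
        return self.end - self.start


def average_word_length(segments: List[Segment], word_count: int) -> float:
    """Total voice-content duration divided by the number of words."""
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    voice_total = sum(s.length for s in segments if s.kind == "voice")
    return voice_total / word_count


def may_have_missed_word(segments: List[Segment], word_count: int) -> bool:
    """True when 'other' content is at least as long as an average word.

    Assumption: the lengths of all 'other' segments are summed.
    """
    other_total = sum(s.length for s in segments if s.kind == "other")
    return other_total >= average_word_length(segments, word_count)


segments = [
    Segment("silence", 0.0, 0.5),
    Segment("voice", 0.5, 1.5),
    Segment("other", 1.5, 2.0),   # span masked by the impulse noise
    Segment("voice", 2.0, 3.0),
]
# The word count would come from a speech recognizer (claim 3).
print(may_have_missed_word(segments, word_count=5))
```

With two seconds of voice content and five recognized words, the average word lasts 0.4 seconds, so a 0.5 second gap is long enough to have hidden a word.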

20 Claims

1. A system for correcting for an impulse noise in speech recognition systems, the system comprising:
- a microphone;
  
  a speaker; and
  
  an electronic processor, communicatively coupled to the microphone and the speaker, and configured to receive, via the microphone, an audio signal representing an utterance;
  
  detect, within the utterance, the impulse noise;
  
  in response to detecting the impulse noise, generate an annotated utterance including a timing of the impulse noise;
  
  segment the annotated utterance into silence, voice content, and other content, wherein the other content indicates that some voice content of the utterance has been missed due to the impulse noise; and
  
  when a length of the other content is greater than or equal to an average word length for the annotated utterance, determine, based on the voice content, an intent portion of the annotated utterance and an entity portion of the annotated utterance;
  
  generate a voice prompt based on the timing of the impulse noise and at least one of the group consisting of a timing of the intent portion and a timing of the entity portion; and
  
  play the voice prompt via the speaker.
  - 2. The system of claim 1, wherein the electronic processor is further configured to:
    - generate the average word length for the annotated utterance by dividing a total length of the voice content by a number of words for the utterance.
  - 3. The system of claim 2, wherein the number of words for the utterance is received from a speech recognizer.
  - 4. The system of claim 1, wherein the electronic processor is further configured to:
    - determine whether a user intent is recognized based on the intent portion; and
      
      when the user intent is not recognized and the timing of the impulse noise corresponds to the timing of the intent portion, determine a potential intent based on the entity portion; and
      
      generate the voice prompt based on the potential intent.
  - 5. The system of claim 4, wherein the voice prompt includes a request to repeat a word of the utterance based on the entity portion.
  - 6. The system of claim 4, further comprising:
    - a memory communicatively coupled to the electronic processor, wherein the electronic processor is further configured to, when the user intent is not recognized and the timing of the impulse noise does not correspond to the timing of the intent portion, write the entity portion to the memory; and
      
      wherein the voice prompt includes a request to repeat the intent portion.
  - 7. The system of claim 1, wherein the electronic processor is further configured to:
    - determine whether a user intent can be recognized based on the intent portion; and
      
      when the user intent is recognized and the timing of the impulse noise corresponds to the timing of the entity portion, generate the voice prompt to include a request to repeat a word of the utterance based on the entity portion.
  - 8. The system of claim 7, wherein the request to repeat a word of the utterance includes a request to repeat the word based on at least one of the group consisting of a word immediately preceding the impulse noise, a word immediately following the impulse noise, and a word identified based on the timing of the impulse noise and the entity portion.
  - 9. The system of claim 1, wherein the electronic processor is further configured to:
    - determine whether a user intent can be recognized based on the intent portion; and
      
      when the user intent is recognized, detect a possible repetition portion of the entity portion based on the timing of the impulse noise; and
      
      generate an adjusted entity portion based on the entity portion and the possible repetition portion.
  - 10. The system of claim 9, wherein the electronic processor is configured to generate the adjusted entity portion by removing an entity from the entity portion based on the possible repetition portion.
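
The branching described in claims 4 through 8 can be sketched as a single decision function. This is only an illustration under stated assumptions: "corresponds to" is read here as time-interval overlap, the `potential_intents` lookup table, the `memory` list, the fallback prompt, and all prompt wording are hypothetical, and none of these names come from the patent.

```python
from typing import Dict, List, Optional, Tuple

Span = Tuple[float, float]  # (start, end) in seconds


def overlaps(a: Span, b: Span) -> bool:
    """Assumed meaning of 'timing corresponds to': the intervals overlap."""
    return a[0] < b[1] and b[0] < a[1]


def choose_prompt(
    intent_recognized: bool,
    noise: Span,
    intent_span: Span,
    entity_span: Span,
    entity_text: str,
    word_before_noise: Optional[str],
    potential_intents: Dict[str, str],
    memory: List[str],
) -> Optional[str]:
    if not intent_recognized:
        if overlaps(noise, intent_span):
            # Claim 4: infer a potential intent from the entity that was heard.
            guess = potential_intents.get(entity_text)
            if guess is not None:
                return f"Did you want to {guess} {entity_text}?"
            return "Please repeat your request."  # hypothetical fallback
        # Claim 6: keep the entity, ask only for the intent.
        memory.append(entity_text)
        return f"What would you like to do with {entity_text}?"
    if overlaps(noise, entity_span):
        # Claims 7 and 8: intent is known, ask for the masked entity word.
        if word_before_noise:
            return f"Please repeat the word after '{word_before_noise}'."
        return "Please repeat the last part of your request."
    return None  # noise did not touch the intent or the entity


memory: List[str] = []
prompt = choose_prompt(
    intent_recognized=False,
    noise=(0.2, 0.4),
    intent_span=(0.0, 0.5),
    entity_span=(0.5, 1.5),
    entity_text="channel five",
    word_before_noise=None,
    potential_intents={"channel five": "switch to"},
    memory=memory,
)
print(prompt)
```

Storing the entity in `memory` in the claim 6 branch means a follow-up answer only needs to supply the intent, which matches the claim's request to repeat the intent portion alone.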

11. A method for correcting for an impulse noise in speech recognition systems, the method comprising:
- receiving, via a microphone communicatively coupled to an electronic processor, an audio signal representing an utterance;
  
  detecting, with the electronic processor, the impulse noise within the utterance;
  
  in response to detecting the impulse noise, generating, with the electronic processor, an annotated utterance including a timing of the impulse noise;
  
  segmenting, with the electronic processor, the annotated utterance into silence, voice content, and other content, wherein the other content indicates that some voice content of the utterance has been missed due to the impulse noise; and
  
  when a length of the other content is greater than or equal to an average word length for the annotated utterance, determining, with the electronic processor, based on the voice content, an intent portion of the annotated utterance and an entity portion of the annotated utterance;
  
  generating, with the electronic processor, a voice prompt based on the timing of the impulse noise and at least one of the group consisting of a timing of the intent portion and a timing of the entity portion; and
  
  playing, with the electronic processor, the voice prompt via a speaker communicatively coupled to the electronic processor.
  - 12. The method of claim 11, further comprising:
    - generating the average word length for the annotated utterance by dividing a total length of the voice content by a number of words for the annotated utterance.
  - 13. The method of claim 12, further comprising:
    - receiving the number of words for the annotated utterance from a speech recognizer.
  - 14. The method of claim 11, further comprising:
    - determining whether a user intent is recognized based on the intent portion; and
      
      when the user intent is not recognized and the timing of the impulse noise corresponds to the timing of the intent portion, determining a potential intent based on the entity portion; and
      
      generating the voice prompt based on the potential intent.
  - 15. The method of claim 14, wherein generating the voice prompt includes generating a request to repeat a word of the utterance based on the entity portion.
  - 16. The method of claim 14, further comprising:
    - when the user intent is not recognized and the timing of the impulse noise does not correspond to the timing of the intent portion, writing the entity portion to a memory communicatively coupled to the electronic processor; and
      
      generating the voice prompt includes generating a request to repeat the intent portion.
  - 17. The method of claim 11, further comprising:
    - determining whether a user intent can be recognized based on the intent portion; and
      
      when the user intent is recognized, generating the voice prompt to include a request to repeat a word of the utterance based on the entity portion.
  - 18. The method of claim 17, wherein the request to repeat a word of the utterance includes a request to repeat the word based on at least one of the group consisting of a word immediately preceding the impulse noise, a word immediately following the impulse noise, and a word identified based on the timing of the impulse noise and the entity portion.
  - 19. The method of claim 11, wherein the electronic processor is further configured to:
    - determine whether a user intent can be recognized based on the intent portion; and
      
      when the user intent is recognized, detect a possible repetition portion of the entity portion based on the timing of the impulse noise; and
      
      generate an adjusted entity portion based on the entity portion and the possible repetition portion.
  - 20. The method of claim 19, wherein the electronic processor is configured to generate the adjusted entity portion by removing an entity from the entity portion based on the possible repetition portion.
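
Claims 19 and 20 (like claims 9 and 10) describe removing an entity that appears to be a repetition around the impulse noise. The claims do not say how a repetition is detected, so the sketch below assumes one simple heuristic: if the word just before the noise and the word just after it are the same, the speaker probably repeated it, and the earlier copy is dropped. The `Word` tuple layout and the function name are hypothetical.

```python
from typing import List, Tuple

Word = Tuple[str, float, float]  # (text, start, end), sorted by start time


def adjust_entities(words: List[Word], noise: Tuple[float, float]) -> List[Word]:
    """Return the entity words with a suspected repetition removed.

    Assumed heuristic: compare the last word ending at or before the noise
    with the first word starting at or after it; if they match, drop the
    earlier one.
    """
    noise_start, noise_end = noise
    before = [i for i, w in enumerate(words) if w[2] <= noise_start]
    after = [i for i, w in enumerate(words) if w[1] >= noise_end]
    if before and after:
        i, j = before[-1], after[0]
        if words[i][0].lower() == words[j][0].lower():
            return words[:i] + words[i + 1:]
    return list(words)


entity_words = [
    ("unit", 0.0, 0.3),
    ("twelve", 0.3, 0.7),
    ("twelve", 1.0, 1.4),  # repeated after the noise at 0.7 to 1.0 s
]
print(adjust_entities(entity_words, noise=(0.7, 1.0)))
```

Dropping the earlier copy rather than the later one is a design choice for this example: the word spoken just before the noise is the one more likely to have been clipped.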

Current Assignee: Motorola Solutions, Inc.
Original Assignee: Motorola Solutions, Inc.
Inventors: Blanco, Alejandro G.
Primary Examiner(s): Woo, Stella L.
Application Number: US16/197,981
Publication Number: US 20200160859A1
Time in Patent Office: 699 Days

CPC Class Codes

G10L 15/04   Segmentation; Word boundary...

G10L 15/1815   Semantic context, e.g. disa...

G10L 15/22   Procedures used during a sp...

G10L 25/84   for discriminating voice fr...
