Targeted detection of regions in speech processing data streams

US 9,224,387 B1
Filed: 12/04/2012
Issued: 12/29/2015
Est. Priority Date: 12/04/2012
Status: Active Grant

Abstract

In speech processing systems, a special audio trigger indication is configured to efficiently isolate and mark incorrect speech processing results. The trigger indication may be configured to be easily recognizable by a speech processing device under various ASR and acoustic conditions. Once a speech processing device recognizes the trigger indication, incorrectly processed speech processing results are marked and may be isolated and prioritized for review by training and upgrading processes.

31 Citations

23 Claims

1. A method of identifying speech processing results for further processing performed by a speech recognition processing component, the method comprising:
- obtaining a model for a trigger word, wherein the trigger word identifies audio data for further processing;
- receiving first audio data comprising first speech from a user;
- performing speech processing on the first audio data to obtain speech processing results;
- returning the speech processing results to the user;
- receiving second audio data comprising second speech from the user;
- determining that the second audio data comprises the trigger word using the trigger word model;
- after the determining, creating an indicator identifying at least one of the first audio data and the speech processing results for further processing by a speech recognition training component; and
- sending the indicator, the first audio data and the speech processing results to the speech recognition training component.
  - 2. The method of claim 1, wherein performing speech processing comprises performing natural language processing and wherein speech processing results comprise performing an action.
  - 3. The method of claim 1, further comprising:
    - identifying the first audio data or the speech processing results using the indicator; and
    - adjusting an acoustic model, language model, or a natural language understanding capability based at least in part on the first audio data or the speech processing results.
  - 4. The method of claim 1, wherein the model comprises a hidden Markov model and a Gaussian mixture model.
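
The flow recited in claim 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: `FeedbackSession`, `Indicator`, `recognize`, `contains_trigger`, and `training_sink` are hypothetical names, and the trigger word model (which claim 4 describes as comprising a hidden Markov model and a Gaussian mixture model) is replaced by a stand-in callable.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Indicator:
    """Hypothetical marker flagging an utterance for the training component."""
    utterance_id: int
    reason: str = "trigger_word"


@dataclass
class FeedbackSession:
    # Stand-ins for the real components (assumptions for this sketch).
    recognize: Callable[[bytes], str]          # speech processing
    contains_trigger: Callable[[bytes], bool]  # trigger word model
    # Stand-in for the speech recognition training component.
    training_sink: List[Tuple[Indicator, bytes, str]] = field(default_factory=list)
    _last: Optional[Tuple[int, bytes, str]] = None
    _next_id: int = 0

    def handle_audio(self, audio: bytes) -> Optional[str]:
        """Process one utterance; return results, or None if it was a trigger."""
        if self._last is not None and self.contains_trigger(audio):
            utt_id, first_audio, results = self._last
            indicator = Indicator(utterance_id=utt_id)
            # Send the indicator, first audio data, and results for training.
            self.training_sink.append((indicator, first_audio, results))
            self._last = None
            return None
        # Ordinary speech: process it and return the results to the user.
        results = self.recognize(audio)
        self._last = (self._next_id, audio, results)
        self._next_id += 1
        return results
```

In this sketch the trigger only flags the immediately preceding utterance; a real system could keep a longer history and choose which earlier audio to mark.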

5. A method performed by a speech recognition processing component, the method comprising:
- transmitting speech processing results, wherein the speech processing results were determined from first audio data;
- receiving second audio data indicating that the speech processing results are incorrect;
- identifying third audio data as corresponding to incorrect speech processing results, wherein the third audio data includes at least a portion of the first audio data;
- after the identifying, creating an indicator identifying the third audio data for further processing by a speech recognition training component; and
- sending the third audio data, the indicator and the speech processing results to the speech recognition training component.
  - 6. The method of claim 5, further comprising:
    - identifying the third audio data using the indicator; and
    - adjusting an acoustic model, language model, or natural language understanding capability based at least in part on the third audio data.
  - 7. The method of claim 5, wherein the second audio data comprises a trigger phrase.
  - 8. The method of claim 7, further comprising configuring the trigger phrase based at least in part on user input.
  - 9. The method of claim 7, further comprising configuring the trigger phrase based at least in part on an ability of a speech recognition device to recognize the trigger phrase.
  - 10. The method of claim 5, further comprising processing the second audio data with a natural language processing module to determine that the second audio data comprises speech indicating that the speech processing results are incorrect.
  - 11. The method of claim 10, wherein the speech comprises at least one of the following phrases: “That's not right,” “wrongo,” or “Incorrectomundo”.
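
Claims 10 and 11 describe deciding, from the second audio data, that the earlier results were incorrect, and claim 5 requires the flagged third audio data to include at least a portion of the first audio data. A rough sketch of both steps is shown below. It assumes the second audio has already been transcribed to text and that audio is held as a sequence of samples; the function names and the simple phrase matching are illustrative assumptions, not the natural language processing module the claims refer to.

```python
import re
from typing import Optional, Sequence

# Phrases listed in claim 11; matching on a text transcript is an assumption.
CORRECTION_PHRASES = ("that's not right", "wrongo", "incorrectomundo")


def _normalize(text: str) -> str:
    """Lowercase, unify curly apostrophes, and replace punctuation with spaces."""
    text = text.lower().replace("\u2019", "'")
    return re.sub(r"[^a-z' ]+", " ", text).strip()


def indicates_incorrect(transcript: str) -> bool:
    """Return True if the transcript contains one of the correction phrases."""
    normalized = " ".join(_normalize(transcript).split())
    padded = f" {normalized} "  # pad so phrases match on word boundaries
    return any(f" {phrase} " in padded for phrase in CORRECTION_PHRASES)


def select_third_audio(first_audio: Sequence[int], start: int = 0,
                       end: Optional[int] = None) -> Sequence[int]:
    """Return the portion of the first audio to flag (the whole clip by default)."""
    return first_audio[start:end]
```

For example, `indicates_incorrect("No, that’s not right!")` is true, while a transcript such as "play the wrong song" does not match because "wrongo" must appear as a whole word.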

12. A computing device configured to perform speech recognition processing, the computing device comprising:
- a processor;
- a memory device including instructions operable to be executed by the processor to perform a set of actions, configuring the processor:
  - to transmit speech processing results, wherein the speech processing results were determined based at least in part on first audio data;
  - to receive second audio indicating that the speech recognition results are incorrect;
  - to identify third audio data as corresponding to incorrect speech recognition results;
  - to, after identifying the third audio data, create an indicator identifying the third audio data for further processing by a speech recognition training component; and
  - to send the third audio data, the indicator and the speech processing results to the speech recognition training component.
  - 13. The computing device of claim 12, wherein the second audio data comprises a trigger phrase.
  - 14. The computing device of claim 13, in which the processor is further configured to configure the trigger phrase based at least in part on user input.
  - 15. The computing device of claim 13, in which the processor is further configured to configure the trigger phrase based at least in part on an ability of a speech recognition device to recognize the trigger phrase.
  - 16. The computing device of claim 12, in which the processor is further configured to process the second audio data with a natural language processing module to determine that the second audio data comprises speech indicating that the speech processing results are incorrect.
  - 17. The computing device of claim 16, wherein the speech comprises at least one of the following phrases: “That's not right,” “wrongo,” or “Incorrectomundo”.
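
Claims 14 and 15 (like claims 8 and 9) describe configuring the trigger phrase based on user input and on how well a speech recognition device can recognize it. One possible way to combine the two is sketched below. The function name, the per-phrase counts, and the 0.9 threshold are all assumptions made for illustration; the claims do not specify how recognizability is measured.

```python
from typing import Mapping, Optional, Tuple


def choose_trigger_phrase(
    stats: Mapping[str, Tuple[int, int, int]],
    user_choice: Optional[str] = None,
    min_detection_rate: float = 0.9,
) -> Optional[str]:
    """Pick a trigger phrase from candidates.

    stats maps phrase -> (detections, trials, false_alarms): hypothetical
    counts measured on the device under its typical acoustic conditions.
    """
    def detection_rate(phrase: str) -> float:
        detections, trials, _ = stats[phrase]
        return detections / trials if trials else 0.0

    # Honor the user's preferred phrase if the device recognizes it reliably.
    if user_choice in stats and detection_rate(user_choice) >= min_detection_rate:
        return user_choice

    eligible = [p for p in stats if detection_rate(p) >= min_detection_rate]
    if not eligible:
        return None
    # Prefer fewer false alarms, then a higher detection rate.
    return min(eligible, key=lambda p: (stats[p][2], -detection_rate(p)))
```

Returning `None` when no candidate clears the threshold leaves it to the caller to fall back to a default phrase or to collect more measurements.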

18. A non-transitory computer-readable storage medium storing processor-executable instructions for controlling a computing device configured to perform speech recognition processing, the storage medium comprising:
- program code to transmit speech processing results to a user, wherein the speech processing results were determined based at least in part on first audio data;
- program code to receive second audio indicating that the speech recognition results are incorrect;
- program code to identify third audio data as corresponding to incorrect speech recognition results;
- program code to, after identifying the third audio data, create an indicator identifying the third audio data for further processing by a speech recognition training component; and
- program code to send the third audio data, the indicator and the speech processing results to the speech recognition training component.
  - 19. The non-transitory computer-readable storage medium of claim 18, wherein the second audio data comprises a trigger phrase.
  - 20. The non-transitory computer-readable storage medium of claim 19, further comprising program code to configure the trigger phrase based at least in part on user input.
  - 21. The non-transitory computer-readable storage medium of claim 19, further comprising program code to configure the trigger phrase based at least in part on an ability of a speech recognition device to recognize the trigger phrase.
  - 22. The non-transitory computer-readable storage medium of claim 18, further comprising program code to process the second audio data with a natural language processing module to determine that the second audio data comprises speech indicating that the speech processing results are incorrect.
  - 23. The non-transitory computer-readable storage medium of claim 22, wherein the speech comprises at least one of the following phrases: “That's not right,” “wrongo,” or “Incorrectomundo”.

Current Assignee: Amazon Technologies, Inc. (Amazon.com, Inc.)
Original Assignee: Amazon Technologies, Inc. (Amazon.com, Inc.)
Inventors: Slifka, Janet Louise
Primary Examiner(s): Godbold, Douglas
Application Number: US13/693,645
Time in Patent Office: 1,120 Days
Field of Search: 704/231, 704/235, 704/244
US Class Current: 1/1
CPC Class Codes:
- G10L 15/00   Speech recognition G10L17/0...
- G10L 15/063   Training
- G10L 15/065   Adaptation
- G10L 15/08   Speech classification or se...
- G10L 2015/088   Word spotting
- G10L 2015/221   Announcement of recognition...
