Targeted detection of regions in speech processing data streams

US 9,916,826 B1
Filed: 12/22/2015
Issued: 03/13/2018
Est. Priority Date: 12/04/2012
Status: Active Grant

Abstract

In speech processing systems, a special audio trigger indication is configured to efficiently isolate and mark incorrect speech processing results. The trigger indication may be configured to be easily recognizable by a speech processing device under various ASR and acoustic conditions. Once a speech processing device recognizes the trigger indication, incorrectly processed speech processing results are marked and may be isolated and prioritized for review by training and upgrading processes.
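The trigger mechanism described in the abstract can be sketched roughly as follows. This is a minimal illustration only, not the patented implementation: the trigger phrase `"that was wrong"`, the `Word` type, and the rule that everything spoken before the trigger is the flagged region are all assumptions introduced here for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence

# Hypothetical trigger phrase; the abstract does not name a specific one.
TRIGGER_PHRASE = ("that", "was", "wrong")


@dataclass
class Word:
    """One recognized word with its time span in the audio (seconds)."""
    text: str
    start_s: float
    end_s: float


def find_trigger(words: List[Word],
                 trigger: Sequence[str] = TRIGGER_PHRASE) -> Optional[int]:
    """Return the index of the first word of the trigger phrase, or None."""
    n = len(trigger)
    for i in range(len(words) - n + 1):
        window = tuple(w.text.lower() for w in words[i:i + n])
        if window == tuple(trigger):
            return i
    return None


def mark_for_review(words: List[Word]) -> Optional[dict]:
    """Mark the words spoken before the trigger as an incorrect region.

    Assumption: the region to flag is everything preceding the trigger.
    """
    idx = find_trigger(words)
    if idx is None or idx == 0:
        return None
    flagged = words[:idx]
    return {
        "text": " ".join(w.text for w in flagged),
        "start_s": flagged[0].start_s,
        "end_s": flagged[-1].end_s,
        "priority": "review",  # isolated and prioritized for training review
    }
```

A real system would detect the trigger acoustically under varying conditions, as the abstract notes; matching on recognized text is used here only to keep the sketch short.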

22 Citations

21 Claims

1. A method performed by a speech recognition processing component, the method comprising:
- receiving first audio data;
  
  determining, using the first audio data, speech processing results;
  
  determining second data indicating that the speech processing results include a first incorrect portion;
  
  determining third audio data as corresponding to the first incorrect portion, wherein the third audio data includes at least a portion of the first audio data;
  
  after determining third audio data corresponding to the first incorrect position, generating an indicator associated with the third audio data; and
  
  sending the third audio data, the indicator and the first incorrect portion to a speech recognition training component.
  - 2. The method of claim 1, wherein the second data comprises audio data and the method further comprises:
    - determining, using the second data, second speech processing results;
      
      determining, using the second speech processing results, natural language processing results; and
      
      determining, using the natural language processing results that the speech processing results include the first incorrect portion.
  - 3. The method of claim 2, further comprising determining the second data comprises a trigger phrase.
  - 4. The method of claim 1, further comprising:
    - receiving the first audio data from a first device; and
      
      receiving the second data from the first device.
  - 5. The method of claim 1, further comprising:
    - receiving the first audio data from a first device; and
      
      receiving the second data from a second device.
  - 6. The method of claim 1, further comprising determining the second data using the speech processing results.
  - 7. The method of claim 1, further comprising:
    - determining a second indicator including an identifier corresponding to a user associated with the first audio data; and
      
      sending the second indicator to the speech recognition training component for further training corresponding to the user.
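The steps of claim 1, together with the user identifier of claim 7, might be sketched as below. This is an illustrative reading under stated assumptions, not the claimed implementation: speech recognition itself is taken as already done, the "second data" is modeled as a hypothetical `ErrorReport` carrying sample offsets, and `TrainingQueue` is a hypothetical stand-in for the speech recognition training component.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ErrorReport:
    """Hypothetical form of the 'second data': where the error lies."""
    start_sample: int
    end_sample: int
    incorrect_text: str  # the first incorrect portion of the results


@dataclass
class TrainingItem:
    audio: List[int]            # third audio data
    indicator: dict             # indicator associated with the third audio
    incorrect_portion: str
    user_id: Optional[str] = None  # second indicator (claim 7), if known


@dataclass
class TrainingQueue:
    """Hypothetical stand-in for the speech recognition training component."""
    items: List[TrainingItem] = field(default_factory=list)

    def send(self, item: TrainingItem) -> None:
        self.items.append(item)


def flag_incorrect_region(first_audio: List[int],
                          report: ErrorReport,
                          queue: TrainingQueue,
                          user_id: Optional[str] = None) -> TrainingItem:
    # Third audio data: the part of the first audio covering the error.
    start = max(0, report.start_sample)
    end = min(len(first_audio), report.end_sample)
    if start >= end:
        raise ValueError("error report does not overlap the audio")
    third_audio = first_audio[start:end]

    # Generate the indicator only after the third audio data is determined.
    indicator = {"flag": "incorrect", "start_sample": start, "end_sample": end}

    item = TrainingItem(third_audio, indicator, report.incorrect_text, user_id)
    queue.send(item)  # audio, indicator and incorrect portion go together
    return item
```

For example, calling `flag_incorrect_region(samples, ErrorReport(20, 50, "sum music"), queue)` would queue samples 20 through 49 with an indicator recording those offsets.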

8. A computing system, comprising:
- at least one processor;
  
  memory including instructions operable to be executed by the at least one processor to perform a set of actions, configuring the computing system to:
  
  receive first audio data;
  
  determine, using the first audio data, speech processing results;
  
  determine second data indicating that the speech processing results include a first incorrect portion;
  
  determine third audio data as corresponding to the first incorrect portion, wherein the third audio data includes at least a portion of the first audio data;
  
  after determining third audio data corresponding to the first incorrect position, generate an indicator associated with the third audio data; and
  
  send the third audio data, the indicator and the first incorrect portion to a speech recognition training component.
  - 9. The computing system of claim 8, wherein the second data comprises audio data and the computing system is further configured to:
    - determine, using the second data, second speech processing results;
      
      determine, using the second speech processing results, natural language processing results; and
      
      determine, using the natural language processing results that the speech processing results include the first incorrect portion.
  - 10. The computing system of claim 9, wherein the computing system is further configured to determine the second data comprises a trigger phrase.
  - 11. The computing system of claim 9, wherein the computing system is further configured to determine the second data using the speech processing results.
  - 12. The computing system of claim 9, wherein the computing system is further configured to:
    - determine a second indicator including an identifier corresponding to a user associated with the first audio data; and
      
      send the second indicator to the speech recognition training component for further training corresponding to the user.
  - 13. The computing system of claim 8, wherein the computing system is further configured to:
    - receive the first audio data from a first device; and
      
      receive the second data from the first device.
  - 14. The computing system of claim 8, wherein the computing system is further configured to:
    - receive the first audio data from a first device; and
      
      receive the second data from a second device.

15. A non-transitory computer-readable storage medium storing non-transitory processor-executable instructions for controlling a computing system, comprising:
- program code to receive first audio data;
  
  program code to determine, using the first audio data, speech processing results;
  
  program code to determine second data indicating that the speech processing results include a first incorrect portion;
  
  program code to determine third audio data as corresponding to the first incorrect portion, wherein the third audio data includes at least a portion of the first audio data;
  
  program code to, after determining third audio data corresponding to the first incorrect position, generate an indicator associated with the third audio data; and
  
  program code to send the third audio data, the indicator and the first incorrect portion to a speech recognition training component.
  - 16. The non-transitory computer-readable storage medium of claim 15, wherein the second data comprises audio data and the non-transitory processor-executable instructions further comprise:
    - program code to determine, using the second data, second speech processing results;
      
      program code to determine, using the second speech processing results, natural language processing results; and
      
      program code to determine, using the natural language processing results that the speech processing results include the first incorrect portion.
  - 17. The non-transitory computer-readable storage medium of claim 16, wherein the non-transitory processor-executable instructions further comprise program code to determine the second data comprises a trigger phrase.
  - 18. The non-transitory computer-readable storage medium of claim 15, wherein the non-transitory processor-executable instructions further comprise:
    - program code to receive the first audio data from a first device; and
      
      program code to receive the second data from the first device.
  - 19. The non-transitory computer-readable storage medium of claim 15, wherein the non-transitory processor-executable instructions further comprise:
    - program code to receive the first audio data from a first device; and
      
      program code to receive the second data from a second device.
  - 20. The non-transitory computer-readable storage medium of claim 15, wherein the non-transitory processor-executable instructions further comprise program code to determine the second data using the speech processing results.
  - 21. The non-transitory computer-readable storage medium of claim 15, wherein the non-transitory processor-executable instructions further comprise:
    - program code to determine a second indicator including an identifier corresponding to a user associated with the first audio data; and
      
      program code to send the second indicator to the speech recognition training component for further training corresponding to the user.

Current Assignee: Amazon Technologies, Inc. (Amazon.com, Inc.)
Original Assignee: Amazon Technologies, Inc. (Amazon.com, Inc.)
Inventors: Slifka, Janet Louise
Primary Examiner(s): Godbold, Douglas
Application Number: US14/978,689
Time in Patent Office: 812 Days

CPC Class Codes

G10L 15/00   Speech recognition G10L17/0...
G10L 15/063   Training
G10L 15/065   Adaptation
G10L 15/08   Speech classification or se...
G10L 2015/088   Word spotting
G10L 2015/221   Announcement of recognition...