Context-aware speech processing

US 9,502,029 B1
Filed: 06/25/2012
Issued: 11/22/2016
Est. Priority Date: 06/25/2012
Status: Active Grant

First Claim

Patent Images

1. One or more non-transitory computer-readable media storing computer-executable instructions that, when executed by at least one processor, configure the at least one processor to perform operations comprising:

determining context data associated with conditions contemporaneous with speech uttered by a user and received by a user device, the determining comprising retrieving social graph data associated with the user, or accessing social graph data from the user device;

determining a correspondence of the context data to one or more previously defined speech contexts for processing speech;

when the correspondence is below a pre-determined threshold;

generating an additional speech context using the context data; and

designating the additional speech context as a current speech context; and

when the correspondence is at or above the pre-determined threshold, designating one of the previously defined speech contexts as the current speech context;

acquiring speech waveforms over a period of time or until a pre-determined amount of acquired speech waveforms has been acquired, wherein the acquired speech waveforms correspond to speech that is spoken in the conditions corresponding to the context data;

generating, using the acquired speech waveforms, an acoustic model for processing waveforms representing speech that is spoken in the conditions to determine one or more phonemes, wherein the waveforms are different from the acquired speech waveforms used to generate the acoustic model;

comparing accuracy of the acoustic model with accuracy of a previously stored acoustic model;

when the compared accuracy of the acoustic model reaches a pre-determined threshold, designating the acoustic model for use in the current speech context;

determining a language model associated with the current speech context; and

processing, with the language model, one or more phonemes from the speech that is spoken in the conditions to generate text.

View all claims

1 Assignment

Timeline View

Assignment View

0 Petitions

Accused Products

Abstract

Described herein are systems and methods for context-aware speech processing. A speech context is determined based on context data associated with a user uttering speech. The speech context and the speech uttered in that speech context may be used to build acoustic models for that speech context. An acoustic model for use in speech processing may be selected based on the determined speech context. A language model for use in speech processing may also be selected based on the determined speech context. Using the acoustic and language models, the speech may be processed to recognize the speech from the user.

60 Citations

View as Search Results

20 Claims

1. One or more non-transitory computer-readable media storing computer-executable instructions that, when executed by at least one processor, configure the at least one processor to perform operations comprising:
- determining context data associated with conditions contemporaneous with speech uttered by a user and received by a user device, the determining comprising retrieving social graph data associated with the user, or accessing social graph data from the user device;
  
  determining a correspondence of the context data to one or more previously defined speech contexts for processing speech;
  
  when the correspondence is below a pre-determined threshold;
  
  generating an additional speech context using the context data; and
  
  designating the additional speech context as a current speech context; and
  
  when the correspondence is at or above the pre-determined threshold, designating one of the previously defined speech contexts as the current speech context;
  
  acquiring speech waveforms over a period of time or until a pre-determined amount of acquired speech waveforms has been acquired, wherein the acquired speech waveforms correspond to speech that is spoken in the conditions corresponding to the context data;
  
  generating, using the acquired speech waveforms, an acoustic model for processing waveforms representing speech that is spoken in the conditions to determine one or more phonemes, wherein the waveforms are different from the acquired speech waveforms used to generate the acoustic model;
  
  comparing accuracy of the acoustic model with accuracy of a previously stored acoustic model;
  
  when the compared accuracy of the acoustic model reaches a pre-determined threshold, designating the acoustic model for use in the current speech context;
  
  determining a language model associated with the current speech context; and
  
  processing, with the language model, one or more phonemes from the speech that is spoken in the conditions to generate text.
- View Dependent Claims (2, 3, 4)
- - 2. The computer-readable media of claim 1, the context data further comprising one or more of user data, application data, or environmental data.
  - 3. The computer-readable media of claim 1, the instructions further comprising:
    - processing the speech that is spoken in the conditions using the acoustic model to determine the one or more phonemes.
  - 4. The computer-readable media of claim 1, the instructions further comprising:
    - processing the speech that is spoken in the conditions with the acoustic model to determine one or more phonemes;
      
      determining a language model associated with the current speech context; and
      
      processing the one or more phonemes with the language model to generate text.

5. A method comprising:
- determining social graph data associated with speech received by a user device in at least one condition;
  
  determining that a correspondence of the social graph data to one or more previously defined speech contexts is below a pre-determined threshold, and designating one of the previously defined speech contexts as a current speech context;
  
  when the correspondence is below a pre-determined threshold;
  
  generating an additional speech context using the social graph data; and
  
  designating the additional speech context as a current speech context;
  
  acquiring speech waveforms at least one of over a period of time or until a pre-determined amount of speech waveforms has been acquired, wherein the acquired speech waveforms correspond to speech of one or more users that is spoken in the at least one condition associated with the social graph data; and
  
  generating, using the acquired speech waveforms, an acoustic model for processing the speech waveforms representing speech that is spoken in the at least one condition to determine one or more phonemes, wherein the speech waveforms are different from the acquired speech waveforms used to generate the acoustic model;
  
  comparing accuracy of the acoustic model with accuracy of a previously stored acoustic model;
  
  when the compared accuracy of the acoustic model reaches a pre-determined threshold, designating the acoustic model for use in the current speech context;
  
  determining a language model associated with the current speech context; and
  
  processing, with the language model, one or more phonemes from the speech that is spoken in the at least one condition to generate text.
- View Dependent Claims (6, 7, 8, 9, 10, 11, 12, 13, 14, 15)
- - 6. The method of claim 5, the determining the social graph data further comprising one or more of retrieving user data associated with the user, accessing application data from the user device, or accessing environmental data associated with the user device.
  - 7. The method of claim 6, the user data further comprising one or more of user identity, user demographics, or a user schedule.
  - 8. The method of claim 6, the application data comprising one or more of an application executing on the user device, an application state, or a use history of the application on the user device.
  - 9. The method of claim 6, the environmental data comprising one or more of user device location, sensor data, ambient sound, date/time, or proximity to other users.
  - 10. The method of claim 5, comprising:
    - generating an additional speech context using the social graph data, wherein the one or more previously defined speech contexts do not include the additional speech context; and
      
      designating the additional speech context as a current speech context.
  - 11. The method of claim 5, further comprising:
    - designating at least one of the one or more previously defined speech contexts as a current speech context using the correspondence.
  - 12. The method of claim 5, wherein generating the acoustic model further comprises:
    - generating the acoustic model using the received speech.
  - 13. The method of claim 5, further comprising:
    - processing the speech that is spoken in the one or more conditions with the acoustic model to determine the one or more phonemes.
  - 14. The method of claim 5, further comprising:
    - determining an additional speech context associated with the social graph data, wherein the one or more previously defined speech contexts do not include the additional speech context; and
      
      determining one or more language models associated with the additional speech context.
  - 15. The method of claim 5, further comprising:
    - determining an additional speech context associated with the social graph data, wherein the one or more previously defined speech contexts do not include the additional speech context;
      
      determining one or more language models associated with the additional speech context; and
      
      processing the speech that is spoken in the one or more conditions with the acoustic model and the one or more language models.

16. A system, comprising:
- at least one memory storing computer-executable instructions and speech uttered by a user; and
  
  at least one processor configured to access the at least one memory and execute the computer-executable instructions to;
  
  access the speech uttered by the user;
  
  determine social graph data associated with conditions present during utterance of the speech;
  
  determine that a correspondence of the social graph data to one or more previously defined speech contexts is below a pre-determined threshold;
  
  when the correspondence is below a pre-determined threshold;
  
  generating an additional speech context using the social graph data; and
  
  designating the additional speech context as a current speech context;
  
  acquire speech waveforms over a period of time and/or until a pre-determined amount of the speech waveforms has been acquired, wherein the speech waveforms correspond to speech of one or more users that is spoken in the conditions;
  
  generate, using the speech waveforms, an acoustic model for processing one or more waveforms representing speech that is spoken in the conditions to determine one or more phonemes, wherein the one or more waveforms are different from the speech waveforms used to generate the acoustic model;
  
  compare accuracy of the acoustic model with accuracy of a previously stored acoustic model;
  
  when the compared accuracy of the acoustic model reaches a pre-determined threshold, designate the acoustic model for use in a current speech context;
  
  determine a language model associated with the current speech context; and
  
  process, with the language model, one or more phonemes from the speech that is spoken in the conditions to generate text.
- View Dependent Claims (17, 18, 19, 20)
- - 17. The system of claim 16, the instructions further executable by the at least one processor to:
    - determine a speech context associated with the social graph data, wherein the one or more previously defined speech contexts do not include the additional speech context;
      
      store the acoustic model; and
      
      associate the acoustic model with the additional speech context.
  - 18. The system of claim 16, the instructions further executable by the at least one processor to:
    - determine an additional speech context associated with the social graph data, wherein the one or more previously defined speech contexts do not include the additional speech context;
      
      determine the acoustic model associated with the additional speech context; and
      
      determine the one or more phonemes from the speech that is spoken in the conditions using the acoustic model.
  - 19. The system of claim 16, the instructions further executable by the at least one processor to:
    - determine an additional speech context associated with the social graph data, wherein the one or more previously defined speech contexts do not include the additional speech context;
      
      determine a language model associated with the additional speech context; and
      
      generate text from the speech that is spoken in the conditions using the language model.
  - 20. The system of claim 16, the instructions further executable by the at least one processor to:
    - determine an additional speech context associated with the social graph data, wherein the one or more previously defined speech contexts do not include the additional speech context;
      
      determine the one or more phonemes from the speech that is spoken in the conditions using the acoustic model;
      
      determine a language model associated with the additional speech context; and
      
      generate, from the one or more phonemes, text using the language model.

Specification

Resources

Litigation Campaign Assessment

Current Assignee
Amazon Technologies, Inc. (Amazon.com, Inc.)
Original Assignee
Amazon Technologies, Inc. (Amazon.com, Inc.)
Inventors
Bell, Matthew P., Watanabe, Yuzo, Polansky, Stephen M.
Primary Examiner(s)
Sirjani, Fariba

Application Number

US13/531,867
Time in Patent Office

1,611 Days
Field of Search

704/235
US Class Current

1/1
CPC Class Codes

G10L 15/18   using natural language mode...

G10L 15/183   using context dependencies,...

G10L 15/22   Procedures used during a sp...

G10L 2015/227   of the speaker; Human-fact...

G10L 2015/228   of application context

Context-aware speech processing

First Claim

1 Assignment

0 Petitions

Accused Products

Abstract

60 Citations

20 Claims

Specification

Solutions

Use Cases

Quick Links

Context-aware speech processing

First Claim

1 Assignment

Subscription Required

Subscription Required

0 Petitions

Subscription Required

Accused Products

Subscription Required

Abstract

60 Citations

20 Claims

Specification

Subscription Required

Solutions

Use Cases

Quick Links