Context-aware speech processing
First Claim
1. One or more non-transitory computer-readable media storing computer-executable instructions that, when executed by at least one processor, configure the at least one processor to perform operations comprising:
determining context data associated with conditions contemporaneous with speech uttered by a user and received by a user device, the determining comprising retrieving social graph data associated with the user, or accessing social graph data from the user device;
determining a correspondence of the context data to one or more previously defined speech contexts for processing speech;
when the correspondence is below a pre-determined threshold:
generating an additional speech context using the context data; and
designating the additional speech context as a current speech context; and
when the correspondence is at or above the pre-determined threshold, designating one of the previously defined speech contexts as the current speech context;
acquiring speech waveforms over a period of time or until a pre-determined amount of acquired speech waveforms has been acquired, wherein the acquired speech waveforms correspond to speech that is spoken in the conditions corresponding to the context data;
generating, using the acquired speech waveforms, an acoustic model for processing waveforms representing speech that is spoken in the conditions to determine one or more phonemes, wherein the waveforms are different from the acquired speech waveforms used to generate the acoustic model;
comparing accuracy of the acoustic model with accuracy of a previously stored acoustic model;
when the compared accuracy of the acoustic model reaches a pre-determined threshold, designating the acoustic model for use in the current speech context;
determining a language model associated with the current speech context; and
processing, with the language model, one or more phonemes from the speech that is spoken in the conditions to generate text.
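The context-determination branch of this claim (measure the correspondence of new context data to previously defined speech contexts, designate an existing context when the correspondence meets the threshold, otherwise generate an additional context) can be sketched as follows. The `SpeechContext` structure, the Jaccard similarity used as the correspondence measure, and the 0.5 threshold are illustrative assumptions, not details taken from the claim.

```python
from dataclasses import dataclass


@dataclass
class SpeechContext:
    """A previously defined speech context (illustrative structure)."""
    name: str
    features: set  # context features, e.g. derived from social graph data


def correspondence(context_data: set, ctx: SpeechContext) -> float:
    """Jaccard similarity between current context data and a stored context
    (an assumed measure; the claim only requires *a* correspondence)."""
    union = context_data | ctx.features
    if not union:
        return 1.0
    return len(context_data & ctx.features) / len(union)


def select_current_context(context_data, known_contexts, threshold=0.5):
    """Designate an existing context when correspondence is at or above the
    threshold; otherwise generate an additional context from the context data."""
    best = max(known_contexts,
               key=lambda c: correspondence(context_data, c),
               default=None)
    if best is not None and correspondence(context_data, best) >= threshold:
        return best  # a previously defined context becomes the current context
    new_ctx = SpeechContext(name=f"context-{len(known_contexts)}",
                            features=set(context_data))
    known_contexts.append(new_ctx)  # the additional speech context
    return new_ctx
```

A caller would pass whatever context features it extracts (location, noise level, social graph attributes) as the `context_data` set.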
Abstract
Described herein are systems and methods for context-aware speech processing. A speech context is determined based on context data associated with a user uttering speech. The speech context and the speech uttered in that speech context may be used to build acoustic models for that speech context. An acoustic model for use in speech processing may be selected based on the determined speech context. A language model for use in speech processing may also be selected based on the determined speech context. Using the acoustic and language models, the speech may be processed to recognize what the user said.
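The abstract's overall flow (determine the speech context, select an acoustic model and a language model for that context, then decode) can be sketched as below. The model registries, the `determine_context` callback, and the `"generic"` fallback are all hypothetical stand-ins, not the patent's API.

```python
def recognize(waveform, context_data, acoustic_models, language_models,
              determine_context, default="generic"):
    """Pick an acoustic model and a language model by speech context,
    then decode the waveform into text. Every helper here is an assumed
    stand-in; the patent does not name these functions."""
    ctx = determine_context(context_data)
    am = acoustic_models.get(ctx, acoustic_models[default])
    lm = language_models.get(ctx, language_models[default])
    phonemes = am(waveform)   # acoustic model: waveform -> phonemes
    return lm(phonemes)       # language model: phonemes -> text
```

With trivial callables standing in for real models, `recognize` simply routes the waveform through whichever pair matches the determined context, falling back to the default pair otherwise.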
60 Citations
20 Claims
1. (Recited above as the First Claim; dependent claims 2-4 not shown.)
5. A method comprising:
determining social graph data associated with speech received by a user device in at least one condition;
determining a correspondence of the social graph data to one or more previously defined speech contexts;
when the correspondence is at or above a pre-determined threshold, designating one of the previously defined speech contexts as a current speech context;
when the correspondence is below the pre-determined threshold:
generating an additional speech context using the social graph data; and
designating the additional speech context as the current speech context;
acquiring speech waveforms at least one of over a period of time or until a pre-determined amount of speech waveforms has been acquired, wherein the acquired speech waveforms correspond to speech of one or more users that is spoken in the at least one condition associated with the social graph data; and
generating, using the acquired speech waveforms, an acoustic model for processing the speech waveforms representing speech that is spoken in the at least one condition to determine one or more phonemes, wherein the speech waveforms are different from the acquired speech waveforms used to generate the acoustic model;
comparing accuracy of the acoustic model with accuracy of a previously stored acoustic model;
when the compared accuracy of the acoustic model reaches a pre-determined threshold, designating the acoustic model for use in the current speech context;
determining a language model associated with the current speech context; and
processing, with the language model, one or more phonemes from the speech that is spoken in the at least one condition to generate text.
(Dependent claims 6-15 not shown.)
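The acquisition-and-acceptance loop in this claim (acquire waveforms over a period of time or until a pre-determined amount is reached, train a candidate acoustic model, and designate it only when its accuracy reaches a threshold relative to the previously stored model) might look like the following sketch. `train`, `evaluate`, and all the limit values are hypothetical stand-ins for real training tooling.

```python
import time


def acquire_waveforms(source, max_seconds=5.0, max_samples=100):
    """Acquire waveforms over a period of time OR until a pre-determined
    amount has been acquired (both stopping conditions are from the claim;
    the specific limit values are assumed)."""
    acquired = []
    deadline = time.monotonic() + max_seconds
    for wf in source:
        acquired.append(wf)
        if len(acquired) >= max_samples or time.monotonic() >= deadline:
            break
    return acquired


def maybe_designate(train, evaluate, waveforms, stored_accuracy,
                    accuracy_threshold=0.85):
    """Train a candidate acoustic model and designate it for the current
    context only when its measured accuracy reaches the threshold and does
    not fall below the previously stored model's accuracy."""
    candidate = train(waveforms)
    accuracy = evaluate(candidate)  # measured on waveforms NOT used to train
    if accuracy >= accuracy_threshold and accuracy >= stored_accuracy:
        return candidate, accuracy   # designated for the current speech context
    return None, stored_accuracy     # keep the previously stored model
```

Evaluating on waveforms disjoint from the training set mirrors the claim's requirement that the processed waveforms differ from those used to generate the model.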
16. A system, comprising:
at least one memory storing computer-executable instructions and speech uttered by a user; and
at least one processor configured to access the at least one memory and execute the computer-executable instructions to:
access the speech uttered by the user;
determine social graph data associated with conditions present during utterance of the speech;
determine a correspondence of the social graph data to one or more previously defined speech contexts;
when the correspondence is below a pre-determined threshold:
generate an additional speech context using the social graph data; and
designate the additional speech context as a current speech context;
acquire speech waveforms over a period of time and/or until a pre-determined amount of the speech waveforms has been acquired, wherein the speech waveforms correspond to speech of one or more users that is spoken in the conditions;
generate, using the speech waveforms, an acoustic model for processing one or more waveforms representing speech that is spoken in the conditions to determine one or more phonemes, wherein the one or more waveforms are different from the speech waveforms used to generate the acoustic model;
compare accuracy of the acoustic model with accuracy of a previously stored acoustic model;
when the compared accuracy of the acoustic model reaches a pre-determined threshold, designate the acoustic model for use in a current speech context;
determine a language model associated with the current speech context; and
process, with the language model, one or more phonemes from the speech that is spoken in the conditions to generate text.
(Dependent claims 17-20 not shown.)
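The final step shared by all three independent claims (processing phonemes with a language model to generate text) can be illustrated with a toy greedy lexicon decoder. Real systems use statistical language models; the pronunciation lexicon and greedy longest-match strategy here are invented stand-ins for illustration only.

```python
def decode_phonemes(phonemes, lexicon):
    """Greedily match the longest pronunciation in the lexicon at each
    position of the phoneme sequence, emitting the corresponding word.
    A toy stand-in for the claimed language model."""
    # Try longer pronunciations first so the longest match wins.
    entries = sorted(lexicon.items(), key=lambda kv: -len(kv[1]))
    words, i = [], 0
    while i < len(phonemes):
        for word, pron in entries:
            if phonemes[i:i + len(pron)] == pron:
                words.append(word)
                i += len(pron)
                break
        else:
            i += 1  # skip a phoneme with no lexicon match
    return " ".join(words)
```

With an ARPAbet-style lexicon, the phoneme sequence produced by the acoustic model is segmented into words and joined into output text.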
Specification