Speech to Text Conversion
First Claim
1. A computer-implemented speech-to-text conversion method, comprising:
receiving a voice input from a user of an electronic device and contextual metadata that describes a context of the electronic device at a time when the voice input is received;
identifying a plurality of base language models, wherein each base language model corresponds to a distinct textual corpus of content;
using the contextual metadata to generate an interpolated language model based on contributions from the plurality of base language models, wherein the contributions are weighted according to a weighting for each of the base language models; and
using the interpolated language model to convert the received voice input to a textual output.
Abstract
Methods, computer program products and systems are described for speech-to-text conversion. A voice input is received from a user of an electronic device and contextual metadata is received that describes a context of the electronic device at a time when the voice input is received. Multiple base language models are identified, where each base language model corresponds to a distinct textual corpus of content. Using the contextual metadata, an interpolated language model is generated based on contributions from the base language models. The contributions are weighted according to a weighting for each of the base language models. The interpolated language model is used to convert the received voice input to a textual output. The voice input is received at a computer server system that is remote to the electronic device. The textual output is transmitted to the electronic device.
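The weighted combination described in the abstract can be illustrated with a minimal sketch. It assumes unigram base models and simple linear interpolation, which the abstract does not specify; the function names, the "app" metadata key, and the 3:1 weighting rule are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def train_unigram(corpus):
    """Maximum-likelihood unigram model from whitespace-tokenized sentences."""
    counts = Counter(w for sentence in corpus for w in sentence.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def weights_from_context(metadata, base_models):
    """Hypothetical rule: favor the base model named after the active app.

    The returned weights are normalized so that they sum to 1.
    """
    raw = {name: (3.0 if name == metadata.get("app") else 1.0)
           for name in base_models}
    norm = sum(raw.values())
    return {name: value / norm for name, value in raw.items()}


def interpolate(base_models, weights):
    """Linear interpolation: P(w) = sum_i weight_i * P_i(w)."""
    vocab = set().union(*base_models.values())
    return {w: sum(weights[name] * model.get(w, 0.0)
                   for name, model in base_models.items())
            for w in vocab}


def best_hypothesis(candidates, lm, floor=1e-9):
    """Pick the candidate transcription with the highest log-probability.

    Unknown words receive a small floor probability (an assumption).
    """
    def log_prob(text):
        return sum(math.log(lm.get(w, floor)) for w in text.lower().split())
    return max(candidates, key=log_prob)


# Two base models, each trained on a distinct (toy) textual corpus.
base_models = {
    "maps": train_unigram(["directions to main street",
                           "pizza near main street"]),
    "email": train_unigram(["send mail to john",
                            "reply to main office"]),
}

# Contextual metadata received alongside the voice input.
weights = weights_from_context({"app": "maps"}, base_models)
lm = interpolate(base_models, weights)
print(best_hypothesis(["mane street", "main street"], lm))
```

In a real recognizer the interpolated model would score hypotheses produced by an acoustic decoder; here a hand-written candidate list stands in for that step.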
29 Claims
1. A computer-implemented speech-to-text conversion method, comprising:
receiving a voice input from a user of an electronic device and contextual metadata that describes a context of the electronic device at a time when the voice input is received;
identifying a plurality of base language models, wherein each base language model corresponds to a distinct textual corpus of content;
using the contextual metadata to generate an interpolated language model based on contributions from the plurality of base language models, wherein the contributions are weighted according to a weighting for each of the base language models; and
using the interpolated language model to convert the received voice input to a textual output.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. A computer-implemented system for converting speech to text, the system comprising:
a plurality of base language models, each base language model corresponding to a particular semantic category;
an interpolated language model that is linked to the plurality of base language models; and
wherein each link between the interpolated language model and each of the base language models is associated with a weight.
Dependent claims: 11, 12, 13, 14, 15, 16
17. A computer-readable storage device encoded with a computer program product, the computer program product including instructions for speech-to-text conversion that, when executed, cause data processing apparatus to perform operations comprising:
receiving a voice input from a user of an electronic device and contextual metadata that describes a context of the electronic device at a time when the voice input is received;
identifying a plurality of base language models, wherein each base language model corresponds to a distinct textual corpus of content;
using the contextual metadata to generate an interpolated language model based on contributions from the plurality of base language models, wherein the contributions are weighted according to a weighting for each of the base language models; and
using the interpolated language model to convert the received voice input to a textual output.
Dependent claims: 18, 19, 20, 21, 22, 23, 24, 25, 26
27. A computer-implemented method, comprising:
extracting pairs from a historical log of query search results that includes a plurality of search queries and corresponding search results, each pair including a query and a website that corresponds to a search result for the query;
generating a bipartite cluster graph based on the extracted pairs of queries and corresponding websites;
training a plurality of language models based on clusters identified in the bipartite cluster graph;
based on sample data obtained from input by one or more users into a web form, the sample data comprising one or more sample queries, identifying K clusters from the cluster graph that are most significant to the sample queries, K being an integer; and
generating an interpolated language model for the web form based on weighted contributions from the language models trained for each of the identified K clusters.
Dependent claims: 28, 29
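The steps of claim 27 can be sketched end to end on toy data. The claim does not say how clusters are found or how significance is measured, so this sketch substitutes two simple stand-ins: clusters are the connected components of the query-website bipartite graph, and a cluster's significance is its word overlap with the sample queries. All function names and the example log are hypothetical.

```python
from collections import Counter, defaultdict


def extract_pairs(log):
    """Turn (query, [websites]) log entries into (query, website) pairs."""
    return [(query, site) for query, sites in log for site in sites]


def cluster_queries(pairs):
    """Group queries by connected component of the bipartite graph.

    Connected components (via union-find) are an assumed stand-in for
    whatever clustering the patent actually uses.
    """
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for query, site in pairs:
        parent[find(("q", query))] = find(("s", site))

    clusters = defaultdict(set)
    for node in list(parent):
        if node[0] == "q":
            clusters[find(node)].add(node[1])
    return list(clusters.values())


def train_unigram(queries):
    """Maximum-likelihood unigram model over the queries in one cluster."""
    counts = Counter(w for q in queries for w in q.split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def top_k_weights(clusters, sample_queries, k):
    """Score clusters by word overlap with the sample queries; keep the top K.

    Returns {cluster_index: weight}, weights normalized over the K clusters.
    """
    sample_words = Counter(w for q in sample_queries for w in q.split())
    scores = [sum(sample_words[w] for q in cluster for w in q.split())
              for cluster in clusters]
    top = sorted(range(len(clusters)), key=scores.__getitem__, reverse=True)[:k]
    total = sum(scores[i] for i in top) or 1
    return {i: scores[i] / total for i in top}


def interpolated_model(clusters, weights):
    """Weighted mix of the per-cluster models for the selected clusters."""
    models = {i: train_unigram(clusters[i]) for i in weights}
    vocab = set().union(*models.values())
    return {w: sum(weights[i] * models[i].get(w, 0.0) for i in models)
            for w in vocab}


# Toy historical log: each query with the websites from its search results.
log = [
    ("cheap flights", ["kayak.example", "expedia.example"]),
    ("flights to paris", ["expedia.example"]),
    ("pasta recipe", ["food.example"]),
    ("easy pasta sauce", ["food.example"]),
]
clusters = cluster_queries(extract_pairs(log))
# Sample queries typed into a (hypothetical) travel web form.
weights = top_k_weights(clusters, ["flights to rome"], k=1)
form_lm = interpolated_model(clusters, weights)
```

With K = 1 only the travel cluster contributes, so the resulting model for the form assigns probability to words such as "flights" and none to "pasta".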
Specification