Language Modeling Using Entities
First Claim
1. A computer-implemented method, comprising:
obtaining a plurality of text samples;
for each of one or more text samples in the plurality of text samples:
determining that at least one term in the text sample corresponds to a first entity in a data structure of entities, wherein the data structure includes representations of a plurality of entities and defines relationships among particular ones of the plurality of entities;
determining classes to which the first entity within the data structure of entities belongs; and
annotating the text sample with one or more labels that indicate respective classes to which the first entity corresponding to the at least one term belongs;
generating a class-based training set of text samples by substituting the one or more terms in the one or more text samples with respective class identifiers for the one or more terms that correspond to the respective labels for the one or more terms;
training a class-based language model using the class-based training set of text samples;
training a plurality of class-specific language models; and
performing speech recognition on an utterance using the class-based language model and at least one class-specific language model from among the plurality of class-specific language models.
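The annotation and substitution steps recited above can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and does not come from the patent: the `Entity` record, the toy `ENTITIES` table, the `$CLASS` identifier format, and the whitespace tokenization. Because an entity may belong to several classes, the sketch emits one class-based sample per possible class assignment; the claim itself does not prescribe how ambiguous terms are handled.

```python
from dataclasses import dataclass, field
from itertools import product


@dataclass
class Entity:
    """Hypothetical entity record: a name, its classes, and related entities."""
    name: str
    classes: set[str]
    related: set[str] = field(default_factory=set)


# Toy stand-in for the "data structure of entities" (illustrative data only).
ENTITIES: dict[str, Entity] = {
    "paris": Entity("Paris", {"city"}, {"france"}),
    "france": Entity("France", {"country"}, {"paris"}),
    "casablanca": Entity("Casablanca", {"city", "movie"}),
}


def annotate(sample: str) -> list[tuple[str, list[str]]]:
    """Return (term, sorted classes) for every token that matches an entity."""
    labels = []
    for token in sample.lower().split():
        entity = ENTITIES.get(token)
        if entity is not None:
            labels.append((token, sorted(entity.classes)))
    return labels


def class_based_variants(sample: str) -> list[str]:
    """Replace each matched term with a class identifier.

    Terms with several classes yield one output sample per class choice.
    """
    options = []
    for token in sample.lower().split():
        entity = ENTITIES.get(token)
        if entity is None:
            options.append([token])
        else:
            options.append(["$" + c.upper() for c in sorted(entity.classes)])
    return [" ".join(choice) for choice in product(*options)]


if __name__ == "__main__":
    print(annotate("flights from paris to casablanca"))
    print(class_based_variants("flights to paris"))
    print(class_based_variants("watch casablanca tonight"))
```

A real system would presumably match multi-word terms and use context to choose among candidate classes; the exact-token lookup here only shows the shape of the data flow.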
Abstract
Among other things, this document describes a computer-implemented method. The method can include obtaining a plurality of text samples. For each of one or more text samples in the plurality of text samples, the text sample can be annotated with one or more labels that indicate respective classes to which one or more terms in the text sample are assigned, wherein annotating the text sample comprises determining that at least one term in the text sample corresponds to a first entity in a data structure of interconnected entities and determining a classification of the first entity within the data structure of interconnected entities. The method can include generating a class-based training set of text samples. A class-based language model can be trained using the class-based training set of text samples. A plurality of class-specific language models can be trained.
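One common way to combine a class-based language model with class-specific language models is to multiply the probability of a class identifier in context by the probability of the concrete term within that class. The sketch below assumes that factorization, which the abstract does not spell out. It also assumes unsmoothed bigram counts for the class-based model, unigram counts for each class-specific model, and a tiny hand-written training set. All data, function names, and the `$CLASS` identifier format are hypothetical.

```python
from collections import Counter, defaultdict

# Illustrative class-based training samples (terms already replaced by class ids).
CLASS_BASED_SAMPLES = [
    "flights to $CITY",
    "flights to $CITY",
    "hotels in $CITY",
    "watch $MOVIE tonight",
]

# Illustrative term occurrences per class, used for the class-specific models.
CLASS_MEMBERS = {
    "$CITY": ["paris", "paris", "boston", "casablanca"],
    "$MOVIE": ["casablanca", "vertigo"],
}


def train_bigram(samples):
    """Count bigrams over the class-based samples, with sentence boundaries."""
    counts = defaultdict(Counter)
    for sample in samples:
        tokens = ["<s>"] + sample.split() + ["</s>"]
        for prev, cur in zip(tokens, tokens[1:]):
            counts[prev][cur] += 1
    return counts


def bigram_prob(counts, prev, cur):
    """Maximum-likelihood bigram probability (no smoothing)."""
    total = sum(counts[prev].values())
    return counts[prev][cur] / total if total else 0.0


def train_class_specific(members):
    """Build one unigram distribution over terms for each class."""
    models = {}
    for class_id, terms in members.items():
        term_counts = Counter(terms)
        total = sum(term_counts.values())
        models[class_id] = {t: n / total for t, n in term_counts.items()}
    return models


def score(words, slots, bigrams, class_models):
    """Score a hypothesis; slots[i] is a class id or the literal word."""
    prob = 1.0
    tokens = ["<s>"] + slots + ["</s>"]
    for prev, cur in zip(tokens, tokens[1:]):
        prob *= bigram_prob(bigrams, prev, cur)
    for word, slot in zip(words, slots):
        if slot in class_models:
            prob *= class_models[slot].get(word, 0.0)
    return prob


BIGRAMS = train_bigram(CLASS_BASED_SAMPLES)
CLASS_MODELS = train_class_specific(CLASS_MEMBERS)

if __name__ == "__main__":
    print(score(["flights", "to", "paris"], ["flights", "to", "$CITY"],
                BIGRAMS, CLASS_MODELS))
```

In a speech recognizer, a score of this kind would be combined with acoustic scores when ranking candidate transcriptions; the sketch covers only the language-model side, and a practical model would add smoothing so unseen sequences do not score zero.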
20 Claims
1. A computer-implemented method, recited in full under First Claim above.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14
15. One or more computer-readable devices having instructions stored thereon that, when executed by one or more processors, cause performance of operations comprising:
obtaining a plurality of text samples;
for each of one or more text samples in the plurality of text samples:
determining that at least one term in the text sample corresponds to a first entity in a data structure of entities, wherein the data structure includes representations of a plurality of entities and defines relationships among particular ones of the plurality of entities;
determining classes to which the first entity within the data structure of entities belongs; and
annotating the text sample with one or more labels that indicate respective classes to which the first entity corresponding to the at least one term belongs;
generating a class-based training set of text samples by substituting the one or more terms in the one or more text samples with respective class identifiers for the one or more terms that correspond to the respective labels for the one or more terms;
training a class-based language model using the class-based training set of text samples;
training a plurality of class-specific language models; and
performing speech recognition on an utterance using the class-based language model and at least one class-specific language model from among the plurality of class-specific language models.
Dependent claims: 16, 17, 18
19. A system, comprising:
one or more computers configured to provide:
a data structure that includes representations of a plurality of entities and that maps relationships among particular ones of the plurality of entities;
an entity classifier that assigns particular entities from among the plurality of entities in the data structure to one or more respective classes;
one or more corpora of text samples;
a named-entity recognition engine that identifies particular terms in a first set of text samples that correspond to entities represented in the data structure;
a training sample generator that generates a training set of text samples by replacing the particular terms in the first set of text samples with class identifiers that indicate respective classes for the particular terms that are determined based on the classes that the entity classifier has assigned to the entities represented in the data structure that correspond to the particular terms; and
a training engine that generates one or more language models using the training set of text samples.
Dependent claims: 20
Specification