Generating context-based spell corrections of entity names
Abstract
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for correcting entity names. One method includes receiving texts and deriving a plurality of name-context pairs from the texts. The method further includes calculating a context consistency measure for each name-context pair and storing context-entity name data representing the name-context pairs. Another method includes identifying an entity name and one or more context terms from a query and generating candidate names for the entity name. The method further includes determining a score for each of the candidate names, selecting a number of top scoring candidate names, and using the selected candidate names to respond to the query.
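The query-side method in the abstract (generate candidate names, score them, keep the top few) can be illustrated with a minimal sketch. The abstract does not say how candidates are generated or how the score is computed, so everything specific here is an assumption: the `CONSISTENCY` table and its values are made-up illustration data, `correct_name`, `top_k`, and `floor` are hypothetical names, candidates come from string similarity via Python's `difflib`, and the score is simply similarity multiplied by the stored consistency of each context term.

```python
import difflib

# Hypothetical context-entity name data: (name, context term) -> consistency.
# The values are invented for illustration only.
CONSISTENCY = {
    ("britney spears", "album"): 0.6,
    ("britney spears", "tour"): 0.3,
    ("brittany spaniel", "puppies"): 0.5,
    ("brittany spaniel", "breeders"): 0.4,
}
KNOWN_NAMES = sorted({name for name, _ in CONSISTENCY})


def correct_name(entity_name, context_terms, top_k=2, floor=0.01):
    """Return up to top_k candidate names, best first.

    Assumed scoring: string similarity times the consistency of every
    context term; unseen (name, term) pairs fall back to `floor`.
    """
    # Candidate generation by surface similarity (an assumption).
    candidates = difflib.get_close_matches(
        entity_name, KNOWN_NAMES, n=10, cutoff=0.4
    )
    scored = []
    for cand in candidates:
        score = difflib.SequenceMatcher(None, entity_name, cand).ratio()
        for term in context_terms:
            score *= CONSISTENCY.get((cand, term), floor)
        scored.append((score, cand))
    scored.sort(reverse=True)
    return [cand for _, cand in scored[:top_k]]


if __name__ == "__main__":
    print(correct_name("britny spears", ["album"]))
    print(correct_name("britny spears", ["puppies"]))
```

In this toy setup the same misspelled name resolves differently depending on the context term, which is the behaviour the abstract attributes to context-based correction; a real system would presumably use a richer candidate generator and scoring function.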
20 Claims
1. A system comprising:
one or more computers including one or more storage devices storing instructions that when executed by the one or more computers cause the one or more computers to perform operations comprising:
receiving a particular sequence of terms that occurs in a document, wherein the document includes multiple sequences of terms outside of the particular sequence;
determining that the particular sequence of terms includes one or more first terms that refer to a particular entity;
selecting one or more second terms from the particular sequence of terms that includes the one or more first terms that refer to the particular entity;
generating, for each of the selected one or more second terms that are from the particular sequence of terms that includes the one or more first terms that refer to the particular entity, a name-context pair that includes (i) the one or more first terms that refer to the particular entity, and (ii) the selected one or more second terms;
determining, for each distinct name-context pair, a context consistency measure that is an estimate of a probability that the selected one or more second terms of the respective name-context pair will appear in another sequence of terms based on the occurrence in the other sequence of terms of the one or more first terms that refer to the particular entity of the respective name-context pair; and
storing context-entity name data that associates one or more of the distinct name-context pairs with the corresponding context consistency measure in a memory.
Dependent claims: 2, 3, 4, 5, 6, 7, 19, 20
8. A computer-implemented method comprising:
receiving a particular sequence of terms that occurs in a document, wherein the document includes multiple sequences of terms outside of the particular sequence;
determining that the particular sequence of terms includes one or more first terms that refer to a particular entity;
selecting one or more second terms from the particular sequence of terms that includes the one or more first terms that refer to the particular entity;
generating, for each of the selected one or more second terms that are from the particular sequence of terms that includes the one or more first terms that refer to the particular entity, a name-context pair that includes (i) the one or more first terms that refer to the particular entity, and (ii) the selected one or more second terms;
determining, for each distinct name-context pair, a context consistency measure that is an estimate of a probability that the selected one or more second terms of the respective name-context pair will appear in another sequence of terms based on the occurrence in the other sequence of terms of the one or more first terms that refer to the particular entity of the respective name-context pair; and
storing context-entity name data that associates one or more of the distinct name-context pairs with the corresponding context consistency measure in a memory.
Dependent claims: 9, 10, 11, 12, 13, 14
15. A computer storage medium storing instructions that, when executed by data processing apparatus, cause the one or more computers to perform operations comprising:
receiving a particular sequence of terms that occurs in a document, wherein the document includes multiple sequences of terms outside of the particular sequence;
determining that the particular sequence of terms includes one or more first terms that refer to a particular entity;
selecting one or more second terms from the particular sequence of terms that includes the one or more first terms that refer to the particular entity;
generating, for each of the selected one or more second terms that are from the particular sequence of terms that includes the one or more first terms that refer to the particular entity, a name-context pair that includes (i) the one or more first terms that refer to the particular entity, and (ii) the selected one or more second terms;
determining, for each distinct name-context pair, a context consistency measure that is an estimate of a probability that the selected one or more second terms of the respective name-context pair will appear in another sequence of terms based on the occurrence in the other sequence of terms of the one or more first terms that refer to the particular entity of the respective name-context pair; and
storing context-entity name data that associates one or more of the distinct name-context pairs with the corresponding context consistency measure in a memory.
Dependent claims: 16, 17, 18
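The operations recited in the independent claims (find the entity terms in a sequence, pair them with other terms from the same sequence, estimate how often each context term accompanies the name, store the result) can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the function names, the whitespace tokenizer, the stopword filter, the in-memory dictionary, and the sample sequences are all hypothetical, and the consistency measure is assumed to be the plain relative frequency "sequences containing both name and term" divided by "sequences containing the name".

```python
from collections import Counter


def find_names(terms, entity_names):
    """Return the known entity names (assumed list) that occur in `terms`."""
    found = []
    for name in entity_names:
        name_terms = name.lower().split()
        n = len(name_terms)
        if any(terms[i:i + n] == name_terms for i in range(len(terms) - n + 1)):
            found.append(name.lower())
    return found


def name_context_pairs(sequence, entity_names, stopwords=frozenset()):
    """Yield (names found, distinct name-context pairs) for one sequence."""
    terms = sequence.lower().split()  # naive tokenizer, an assumption
    names = find_names(terms, entity_names)
    pairs = set()
    for name in names:
        # Context terms: every other non-stopword term in the same sequence.
        for term in set(terms) - set(name.split()) - stopwords:
            pairs.add((name, term))
    return names, pairs


def context_consistency(sequences, entity_names, stopwords=frozenset()):
    """Estimate P(context term appears | name appears) per distinct pair."""
    name_counts = Counter()
    pair_counts = Counter()
    for sequence in sequences:
        names, pairs = name_context_pairs(sequence, entity_names, stopwords)
        name_counts.update(names)
        pair_counts.update(pairs)
    # "Storing in a memory" is modelled here as a plain dict.
    return {pair: count / name_counts[pair[0]]
            for pair, count in pair_counts.items()}


if __name__ == "__main__":
    sample = [  # made-up sequences for illustration
        "britney spears new album",
        "britney spears tour dates",
        "britney spears album review",
    ]
    table = context_consistency(sample, ["britney spears"], frozenset({"new"}))
    for pair, measure in sorted(table.items()):
        print(pair, round(measure, 3))
```

Counting each pair at most once per sequence keeps the measure between 0 and 1, so it can be read as the estimated probability the claims describe; smoothing or a different estimator would be equally compatible with the claim language.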
Specification