Generating context-based spell corrections of entity names
First Claim
1. A system comprising:
- one or more computers including one or more storage devices storing instructions that when executed by the one or more computers cause the one or more computers to perform operations comprising:
receiving a query comprising three or more terms;
identifying, from among the terms of the query, an entity name and two or more context terms;
obtaining a plurality of candidate corrected spellings for the entity name;
determining a respective count of co-occurrences of each context term with each candidate corrected spelling for the entity name, in a plurality of texts, comprising:
counting, as one co-occurrence, each distinct text from the plurality of texts in which the context term and the candidate corrected spelling both appear at least once;
or counting, as one co-occurrence, each distinct window of text from the plurality of texts in which the context term and the candidate corrected spelling both appear at least once;
determining a score for each candidate corrected spelling for the entity name based at least on the respective counts of co-occurrences of each context term with the respective candidate corrected spelling for the entity name, in the plurality of texts;
selecting one or more of the candidate corrected spellings for the entity name based at least on the scores; and
using the selected one or more candidate corrected spellings to generate a response to the query.
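The counting, scoring, and selection steps of the first claim can be illustrated with a minimal Python sketch. The claim leaves several details open, so the following are assumptions made only for illustration: texts are lowercased and split on whitespace; a "window" is a non-overlapping run of `window` tokens; and the score is the sum of log(1 + count) over the context terms, which is one plausible choice and not the patent's scoring function. The function names (`count_cooccurrences`, `score_candidates`, `select_top`) and the example query are hypothetical. In either counting mode, a text or window contributes at most one co-occurrence per (candidate, context term) pair, however many times the two appear in it.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List, Optional, Sequence, Tuple


def _contains(tokens: Sequence[str], phrase: Tuple[str, ...]) -> bool:
    """Return True if `phrase` occurs as a contiguous run of tokens in `tokens`."""
    n = len(phrase)
    return any(tuple(tokens[i:i + n]) == phrase for i in range(len(tokens) - n + 1))


def count_cooccurrences(
    texts: Iterable[str],
    candidates: Sequence[str],
    context_terms: Sequence[str],
    window: Optional[int] = None,
) -> Counter:
    """Count, per (candidate, context term), the distinct texts (window=None)
    or distinct non-overlapping token windows (assumed definition) in which
    both appear at least once."""
    cand_toks = {c: tuple(c.lower().split()) for c in candidates}
    term_toks = {t: tuple(t.lower().split()) for t in context_terms}
    counts: Counter = Counter()
    for text in texts:
        tokens = text.lower().split()  # assumption: whitespace tokenization
        if window is None:
            units = [tokens]
        else:
            units = [tokens[i:i + window] for i in range(0, len(tokens), window)]
        for unit in units:
            for cand, ct in cand_toks.items():
                if not _contains(unit, ct):
                    continue
                for term, tt in term_toks.items():
                    if _contains(unit, tt):
                        counts[(cand, term)] += 1  # at most once per text/window
    return counts


def score_candidates(
    counts: Counter, candidates: Sequence[str], context_terms: Sequence[str]
) -> Dict[str, float]:
    """Hypothetical score: sum of log(1 + count) over the context terms."""
    return {
        c: sum(math.log1p(counts[(c, t)]) for t in context_terms) for c in candidates
    }


def select_top(scores: Dict[str, float], k: int = 1) -> List[str]:
    """Return the k highest-scoring candidates."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Usage: hypothetical query "brittney spears toxic tour", entity name "brittney spears".
texts = [
    "britney spears sang toxic on her tour",
    "britney spears toxic album review",
    "brittany spears is a name",
    "toxic waste tour of the plant",
]
candidates = ["britney spears", "brittany spears"]
context = ["toxic", "tour"]
counts = count_cooccurrences(texts, candidates, context)
scores = score_candidates(counts, candidates, context)
print(select_top(scores, k=1))  # ['britney spears']
```

The log(1 + count) form is only one way to combine counts; it keeps a single very frequent context term from dominating, and a candidate that never co-occurs with any context term scores zero.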
2 Assignments
0 Petitions
Abstract
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for correcting entity names. One method includes receiving texts and deriving a plurality of name-context pairs from the texts. The method further includes calculating a context consistency measure for each name-context pair and storing context-entity name data representing the name-context pairs. Another method includes identifying an entity name and one or more context terms from a query and generating candidate names for the entity name. The method further includes determining a score for each of the candidate names, selecting a number of top scoring candidate names, and using the selected candidate names to respond to the query.
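The first method in the abstract (deriving name-context pairs from texts and computing a context consistency measure for each pair) can be sketched as follows. The abstract does not define the measure, so this sketch assumes, hypothetically, that every other token in a text containing a known entity name is a context term for that name, and that the consistency of a pair (name, term) is the fraction of all name co-occurrences of the term that involve that name, an estimate of P(name | term). `derive_name_context_pairs` and `context_consistency` are illustrative names, not taken from the patent.

```python
from collections import Counter
from typing import Dict, FrozenSet, Iterable, Sequence, Tuple

Pair = Tuple[str, str]


def derive_name_context_pairs(
    texts: Iterable[str],
    names: Sequence[str],
    stopwords: FrozenSet[str] = frozenset(),
) -> Counter:
    """Count, for each (name, context term) pair, the texts in which the known
    entity name appears together with the term (assumed: any other token)."""
    name_toks = {n: tuple(n.lower().split()) for n in names}
    pairs: Counter = Counter()
    for text in texts:
        tokens = text.lower().split()
        for name, nt in name_toks.items():
            k = len(nt)
            starts = [i for i in range(len(tokens) - k + 1) if tuple(tokens[i:i + k]) == nt]
            if not starts:
                continue
            # Exclude the tokens that make up the name itself.
            covered = {j for i in starts for j in range(i, i + k)}
            context = {t for j, t in enumerate(tokens) if j not in covered and t not in stopwords}
            for term in context:
                pairs[(name, term)] += 1  # one count per text
    return pairs


def context_consistency(pairs: Counter) -> Dict[Pair, float]:
    """Hypothetical measure: share of a term's name co-occurrences that
    involve this name, i.e. an estimate of P(name | term)."""
    per_term: Counter = Counter()
    for (_, term), c in pairs.items():
        per_term[term] += c
    return {(name, term): c / per_term[term] for (name, term), c in pairs.items()}


# Usage: the returned dict stands in for the stored context-entity name data.
texts = [
    "britney spears toxic tour",
    "britney spears toxic video",
    "jessica simpson tour",
]
pairs = derive_name_context_pairs(texts, ["britney spears", "jessica simpson"])
consistency = context_consistency(pairs)
print(consistency[("britney spears", "tour")])  # 0.5
```

Under this assumed measure, a context term such as "toxic" that appears only alongside one name is fully consistent with it (1.0), while "tour", shared by two names, splits its weight between them.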
32 Citations
42 Claims
1. A system comprising the elements recited in full under First Claim above.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 40.
14. A computer-implemented method, comprising:
receiving, by one or more computers, a query comprising three or more terms;
identifying, by the one or more computers and from among the terms of the query, an entity name and two or more context terms;
obtaining, by the one or more computers, a plurality of candidate corrected spellings for the entity name;
determining a respective count of co-occurrences of each context term with each candidate corrected spelling for the entity name, in a plurality of texts, comprising:
counting, as one co-occurrence, each distinct text from the plurality of texts in which the context term and the candidate corrected spelling both appear at least once;
or counting, as one co-occurrence, each distinct window of text from the plurality of texts in which the context term and the candidate corrected spelling both appear at least once;
determining a score for each candidate corrected spelling for the entity name based at least on the respective counts of co-occurrences of each context term with the respective candidate corrected spelling for the entity name, in the plurality of texts;
selecting, by the one or more computers, one or more of the candidate corrected spellings for the entity name based at least on the scores; and
using, by the one or more computers, the selected one or more candidate corrected spellings to generate a response to the query.
Dependent claims: 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 41.
27. A non-transitory computer storage medium storing instructions that, when executed by data processing apparatus, cause the data processing apparatus to perform operations comprising:
receiving a query comprising three or more terms;
identifying, from among the terms of the query, an entity name and two or more context terms;
obtaining a plurality of candidate corrected spellings for the entity name;
determining a respective count of co-occurrences of each context term with each candidate corrected spelling for the entity name, in a plurality of texts, comprising:
counting, as one co-occurrence, each distinct text from the plurality of texts in which the context term and the candidate corrected spelling both appear at least once;
or counting, as one co-occurrence, each distinct window of text from the plurality of texts in which the context term and the candidate corrected spelling both appear at least once;
determining a score for each candidate corrected spelling for the entity name based at least on the respective counts of co-occurrences of each context term with the respective candidate corrected spelling for the entity name, in the plurality of texts;
selecting one or more of the candidate corrected spellings for the entity name based at least on the scores; and
using the selected one or more candidate corrected spellings to generate a response to the query.
Dependent claims: 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 42.
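Each independent claim recites obtaining a plurality of candidate corrected spellings for the entity name without specifying how. One common approach, assumed here and not stated in the claims, is to take every known entity name within a small Levenshtein edit distance of the name in the query. The vocabulary `known_names`, the threshold `max_distance`, and the function names are hypothetical.

```python
from typing import List, Sequence


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def candidate_spellings(name: str, known_names: Sequence[str], max_distance: int = 2) -> List[str]:
    """Return known names within `max_distance` edits of `name`, closest first."""
    target = name.lower()
    scored = sorted((levenshtein(target, k.lower()), k) for k in known_names)
    return [k for d, k in scored if d <= max_distance]


known = ["britney spears", "brittany spears", "jessica simpson"]
print(candidate_spellings("brittney spears", known))  # ['britney spears', 'brittany spears']
```

Edit distance alone ignores the query's context terms, so candidates obtained this way would still be ranked by the context-based co-occurrence scores the claims describe.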
Specification