Person disambiguation using name entity extraction-based clustering
Abstract
Described is a technology for disambiguating data corresponding to persons that are located from search results, so that different persons having the same name can be clearly distinguished. Name entity extraction locates words (terms) that are within a certain distance of persons' names in the search results. The terms are used in disambiguating search results that correspond to different persons having the same name, such as location information, organization information, career information, and/or partner information. In one example, each person is represented as a vector, and similarity among vectors is calculated based on weighting that corresponds to nearness of the terms to a person, and/or the types of terms. Based on the similarity data, the person vectors that represent the same person are then merged into one cluster, so that each cluster represents (to a high probability) only one distinct person.
17 Claims
1. A method comprised of steps that are each performed by one or more computers, the steps manipulating data and information stored by the one or more computers, the steps of the method comprising:
disambiguating person data located from one or more sets of search results, including extracting information about a person based on name entity extraction, and calculating similarity data, wherein the calculating similarity data comprises using a vector space model, wherein using the vector space model comprises determining a vector for a person, the vector comprising a plurality of entity features including one or more entity locations related to the person, one or more entity organizations related to the person, and one or more entities that the person has been associated with the person, wherein calculating similarity data comprises using a calculation in which each entity feature of the person vector has an entity weight and a nearness weight, and wherein the calculation comprises, for each entity feature, combining the corresponding entity weight and nearness weight with an entity weight and nearness weight of a same entity feature of another person vector and aggregating the combined weights of the entity features.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11.
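The similarity calculation recited in claim 1 can be illustrated with a minimal sketch. The claim only says that the weights are "combined" and "aggregated"; the choice of multiplication for combining and summation for aggregating is an assumption made here for illustration, as are the names `PersonVector` and `similarity` and the feature labels.

```python
from typing import Dict, Tuple

# Hypothetical representation (not from the patent text):
# feature name -> (entity_weight, nearness_weight)
PersonVector = Dict[str, Tuple[float, float]]


def similarity(a: PersonVector, b: PersonVector) -> float:
    """Aggregate combined weights over the entity features shared by a and b.

    Assumption: "combining" is modeled as a product of the four weights and
    "aggregating" as a sum; the claim does not fix either operation.
    """
    total = 0.0
    for feature in a.keys() & b.keys():
        entity_w_a, nearness_w_a = a[feature]
        entity_w_b, nearness_w_b = b[feature]
        total += (entity_w_a * nearness_w_a) * (entity_w_b * nearness_w_b)
    return total


# Illustrative vectors for two search results mentioning the same name.
p1: PersonVector = {"loc:Seattle": (1.0, 0.5), "org:Contoso": (2.0, 1.0)}
p2: PersonVector = {"loc:Seattle": (1.0, 1.0), "org:Fabrikam": (2.0, 0.5)}

score = similarity(p1, p2)  # only "loc:Seattle" is shared
```

Under these assumptions, features that appear in only one vector contribute nothing, so two results with no entities in common score zero.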
12. A computer-readable medium having computer executable instructions, which when executed perform steps, comprising:
disambiguating person data located from one or more text snippets, including:
receiving the text snippets from a search engine in response to a query comprising a person name, the snippets including the person name;
for each snippet extracting therefrom entity names of entities related to the person name, computing weights of the names of the entities according to their respective text distances in the snippet from the person name in the snippet, and constructing a person feature vector comprised of features that correspond to the names of the entities and each feature having the computed weight of its corresponding entity name;
calculating similarity measures between the person feature vectors, each similarity measure representing similarity between two different person feature vectors, where for a given first person feature vector and a given second person feature vector, weights of features of the first person feature vector are combined with weights of the same features from the second person feature vector to compute the similarity measure between the first and second person feature vectors; and
clustering the person feature vectors into clusters of similar feature vectors based on the similarity measures.
Dependent claims: 13, 14, 15, 16.
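The per-snippet step of claim 12 (weighting entity names by their text distance from the person name) can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: entity extraction is replaced by lookup in a caller-supplied list of known entity names, distance is counted in word tokens, and the weight `1 / (1 + distance)` is a hypothetical choice. All function names are invented for the example.

```python
import re
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    """Lowercase word tokens; a stand-in for real text processing."""
    return re.findall(r"\w+", text.lower())


def find_positions(tokens: List[str], phrase: List[str]) -> List[int]:
    """Start indexes at which the token sequence `phrase` occurs in `tokens`."""
    if not phrase:
        return []
    n = len(phrase)
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == phrase]


def person_vector(snippet: str, person: str, entities: List[str]) -> Dict[str, float]:
    """Map each entity found in the snippet to a distance-based weight.

    Assumption: weight = 1 / (1 + token distance to the nearest occurrence
    of the person name), so closer entities weigh more.
    """
    tokens = tokenize(snippet)
    person_pos = find_positions(tokens, tokenize(person))
    vector: Dict[str, float] = {}
    if not person_pos:
        return vector
    for entity in entities:
        entity_pos = find_positions(tokens, tokenize(entity))
        if not entity_pos:
            continue
        distance = min(abs(e - p) for e in entity_pos for p in person_pos)
        vector[entity] = 1.0 / (1.0 + distance)
    return vector


vec = person_vector(
    "Jane Doe joined Contoso in Seattle last year",
    "Jane Doe",
    ["Contoso", "Seattle", "Fabrikam"],
)
```

Here "Contoso" is three tokens from the name and "Seattle" five, so "Contoso" receives the larger weight, and "Fabrikam" is omitted because it does not occur in the snippet.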
17. A method of disambiguating names performed by one or more computers, the method comprising the following steps performed by the one or more computers:
receiving from a search engine text snippets, the text snippets have been found by the search engine in response to a query comprising a person name, the snippets including the person name;
storing, by the one or more computers, the received text snippets;
for each stored snippet, finding therein the person name and names of entities that are related to a person having the person name, computing weights of the names of the entities according to their respective text distances in the snippet from the person name in the snippet, and constructing a person feature vector comprised of features that correspond to the names of the entities and each feature having the computed weight of its corresponding entity name;
calculating, by processing of the one or more computers, similarity measures between the person feature vectors, each similarity measure representing similarity between two different person feature vectors, where for a given first person feature vector and a given second person feature vector, weights of features of the first person feature vector are combined with weights of the same features from the second person feature vector to compute the similarity measure between the first and second person feature vectors;
executing, by the one or more computers, a clustering algorithm to form clusters of the person feature vectors based on the similarity measures;
merging clusters based on their having in-common same names of entities, each merged cluster representing the same person name; and
disambiguating the person name by treating each cluster as representing a different person having the same person name.
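The clustering and merging steps of claim 17 can be sketched in two passes. The claim does not name a clustering algorithm or a merge criterion beyond "in-common same names of entities", so the following are assumptions for illustration only: a single-link grouping of vectors whose dot product meets a hypothetical `threshold`, followed by merging any clusters that share at least one entity name. The function names and sample vectors are invented.

```python
from typing import Dict, List, Set, Tuple

Vector = Dict[str, float]  # entity name -> weight


def dot(a: Vector, b: Vector) -> float:
    """Similarity over shared features (product-and-sum is an assumption)."""
    return sum(w * b[f] for f, w in a.items() if f in b)


def cluster(vectors: List[Vector], threshold: float) -> List[List[int]]:
    """Single-link grouping via union-find; returns lists of vector indexes."""
    parent = list(range(len(vectors)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if dot(vectors[i], vectors[j]) >= threshold:
                parent[find(j)] = find(i)

    groups: Dict[int, List[int]] = {}
    for i in range(len(vectors)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def merge_on_shared_entities(
    clusters: List[List[int]], vectors: List[Vector]
) -> List[List[int]]:
    """Merge clusters that have at least one entity name in common."""
    merged: List[Tuple[List[int], Set[str]]] = []
    for members in clusters:
        members = list(members)
        names: Set[str] = set()
        for i in members:
            names.update(vectors[i])
        keep = []
        for other_members, other_names in merged:
            if names & other_names:
                members += other_members
                names |= other_names
            else:
                keep.append((other_members, other_names))
        merged = keep + [(sorted(members), names)]
    return sorted(m for m, _ in merged)


vectors = [
    {"Contoso": 0.5, "Seattle": 0.5},
    {"Contoso": 0.5, "Seattle": 0.25},
    {"Seattle": 0.1, "Fabrikam": 0.5},
    {"Tailspin": 1.0},
]
initial = cluster(vectors, threshold=0.2)
final = merge_on_shared_entities(initial, vectors)
```

With these sample values, the first pass groups vectors 0 and 1, and the second pass folds vector 2 into that group because it shares "Seattle". Vector 3 stays alone, and each resulting cluster is then treated as one distinct person with the queried name.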
Specification