Entity clustering via data services
2 Assignments
0 Petitions
Abstract
A method is provided for forming an entity cluster. In this method, a plurality of entities found in one or more data sources are identified. An entity may represent a word or a phrase found in the one or more data sources. The entities may then be organized into groups, each of which has a master entity and a set of subordinate entities. The groups are formed using a first comparison criterion. Then, using a second comparison criterion, a first group is associated with a second group. The second comparison criterion may compare the master entities of the first and second groups. Based on the association between the first group and the second group, the method can then determine that a first entity is related to a second entity.
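The two-stage scheme summarized above (groups formed first, then associated by comparing only their master entities) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the groups are assumed to already exist as a mapping from each master to its subordinates, the second comparison criterion is a hypothetical acronym test, and associations are merged transitively with a union-find structure, which the source does not specify. The names `is_acronym_of`, `second_criterion`, and `GroupAssociations` are illustrative only.

```python
from itertools import combinations


def is_acronym_of(short: str, long: str) -> bool:
    # Hypothetical helper: does `short` spell the initials of the multi-word `long`?
    words = long.split()
    return len(words) > 1 and short.lower() == "".join(w[0] for w in words).lower()


def second_criterion(master_a: str, master_b: str) -> bool:
    # Hypothetical second comparison criterion; it looks at master entities only.
    return is_acronym_of(master_a, master_b) or is_acronym_of(master_b, master_a)


class GroupAssociations:
    """Associates groups through their masters and answers relatedness queries."""

    def __init__(self, groups: dict[str, list[str]]) -> None:
        # Map every entity (master or subordinate) to the master of its group.
        self.group_of = {e: master for master, subs in groups.items() for e in [master, *subs]}
        # Union-find over masters (an assumption): each group starts in its own set.
        self.parent = {master: master for master in groups}
        # Compare each pair of masters once; subordinates are never consulted.
        for a, b in combinations(groups, 2):
            if second_criterion(a, b):
                self._union(a, b)

    def _find(self, master: str) -> str:
        while self.parent[master] != master:
            self.parent[master] = self.parent[self.parent[master]]  # path halving
            master = self.parent[master]
        return master

    def _union(self, a: str, b: str) -> None:
        self.parent[self._find(a)] = self._find(b)

    def related(self, e1: str, e2: str) -> bool:
        # Two entities are related when their groups ended up associated.
        return self._find(self.group_of[e1]) == self._find(self.group_of[e2])
```

For example, with groups `{"International Business Machines": ["Intl Business Machines"], "IBM": ["I.B.M.", "IBM Corp"], "Big Blue": []}`, `related("Intl Business Machines", "IBM Corp")` returns `True` because the two masters pass the acronym test, while `related("IBM Corp", "Big Blue")` returns `False`. Because only masters are compared, the association pass needs at most g(g-1)/2 comparisons for g groups rather than one per pair of entities.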
27 Claims
1. A method comprising:
identifying, using one or more processors, a plurality of entities found in one or more data sources, each entity representing a word or a phrase found in the one or more data sources;
sorting the plurality of entities based on a sorting length, the sorting length being a value based on a function of an entity length of each of the entities, the sorting comprising:
sorting a first portion in non-descending order, the first portion including the entities that meet a length condition based on the sorting length, the length condition being based on one or more lengths of one or more of the extracted entities; and
sorting a second portion in non-descending order, the second portion including the entities that do not meet the length condition based on the sorting length, the second portion being located after the first portion;
using a first comparison criteria, organizing the sorted plurality of entities into groups by selecting a first entity in the sorted plurality of entities and comparing the selected entity to one or more remaining entities of the sorted plurality of entities, each group identifying one of the entities as a master entity and a set of entities as subordinate entities;
using a second comparison criteria, associating a first group from the groups with a second group from the groups, the second comparison criteria comparing only a first master entity associated with the first group with a second master entity associated with the second group; and
determining that a first entity is related to a second entity based on the association between the first group and the second group.
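Claim 1's sorting step places the entities that meet a length condition first and the remaining entities after them, each portion in non-descending order of sorting length. The sketch below illustrates one possible reading. The claim leaves both the function of entity length and the length condition open, so the choices here are hypothetical: the sorting length is the character count of the trimmed entity, and the condition is a sorting length at least the mean over all entities. `sorting_length` and `sort_entities` are illustrative names.

```python
from statistics import mean


def sorting_length(entity: str) -> int:
    # Hypothetical choice of function: the character count of the trimmed entity.
    return len(entity.strip())


def sort_entities(entities: list[str]) -> list[str]:
    """Two-portion sort: entities meeting the length condition first, the rest after."""
    if not entities:
        return []
    # Hypothetical length condition: sorting length at least the mean over all entities.
    threshold = mean(sorting_length(e) for e in entities)
    first = [e for e in entities if sorting_length(e) >= threshold]
    second = [e for e in entities if sorting_length(e) < threshold]
    # Each portion is sorted in non-descending order; sorted() is stable, so ties keep input order.
    return sorted(first, key=sorting_length) + sorted(second, key=sorting_length)


if __name__ == "__main__":
    print(sort_entities(["IBM", "International Business Machines", "IBM Corp",
                         "Big Blue", "Intl Business Machines Corp"]))
```

With these five entities the mean sorting length is 15.4, so the two long names lead (27 then 31 characters) and the short ones follow (3, 8, 8): the result is `["Intl Business Machines Corp", "International Business Machines", "IBM", "IBM Corp", "Big Blue"]`. The same ordering could be obtained with a single sort on the key `(sorting_length(e) < threshold, sorting_length(e))`; two explicit sorts are used here only to mirror the claim's wording.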
14. A computer system comprising:
at least one processor;
an entity extraction module implemented by the at least one processor and configured to identify a plurality of entities found in one or more data sources, each entity representing a word or a phrase found in the one or more data sources;
a sorting module implemented by the at least one processor and configured to sort the plurality of entities based on a sorting length, the sorting length being a value based on a function of an entity length of each of the entities, the sorting comprising:
sorting a first portion in non-descending order, the first portion including the entities that meet a length condition based on the sorting length, the length condition being based on one or more lengths of one or more of the extracted entities; and
sorting a second portion in non-descending order, the second portion including the entities that do not meet the length condition based on the sorting length, the second portion being located after the first portion;
an entity matching module implemented by the at least one processor and configured to:
organize the sorted plurality of entities into groups using a first comparison criteria by selecting a first entity in the sorted plurality of entities and comparing the selected entity to one or more remaining entities of the sorted plurality of entities, each group identifying one of the entities as a master entity and a set of entities as subordinate entities;
associate a first group from the groups with a second group from the groups using a second comparison criteria that compares only a first master entity associated with the first group with a second master entity associated with the second group; and
a group association module implemented by the at least one processor and configured to determine that a first entity is related to a second entity based on the association between the first group and the second group.
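The entity matching module of claim 14 builds groups greedily: it takes the first entity of the sorted list as a master, compares it with the remaining entities, and moves every match into that master's group before repeating on whatever is left. A minimal Python sketch follows. The first comparison criterion is a hypothetical stand-in: two entities match when their normalized forms are equal, where normalization lowercases, strips punctuation, and drops an illustrative list of corporate suffixes (`SUFFIXES`). `Group`, `normalize`, `first_criterion`, and `group_entities` are illustrative names, not taken from the source.

```python
import re
from dataclasses import dataclass, field

# Hypothetical suffixes ignored by the first comparison criterion.
SUFFIXES = {"inc", "corp", "co", "ltd"}


@dataclass
class Group:
    master: str
    subordinates: list[str] = field(default_factory=list)


def normalize(entity: str) -> str:
    # Lowercase, strip punctuation, and drop the hypothetical suffixes.
    cleaned = re.sub(r"[^a-z0-9\s]", "", entity.lower())
    return " ".join(t for t in cleaned.split() if t not in SUFFIXES)


def first_criterion(master: str, candidate: str) -> bool:
    # Hypothetical first comparison criterion: equal normalized forms.
    return normalize(master) == normalize(candidate)


def group_entities(sorted_entities: list[str]) -> list[Group]:
    remaining = list(sorted_entities)
    groups: list[Group] = []
    while remaining:
        # Select the first entity still unassigned; it becomes the group's master.
        master = remaining.pop(0)
        subordinates, unmatched = [], []
        for candidate in remaining:
            # Compare the selected entity against each remaining entity.
            if first_criterion(master, candidate):
                subordinates.append(candidate)
            else:
                unmatched.append(candidate)
        groups.append(Group(master, subordinates))
        remaining = unmatched
    return groups
```

Given `["IBM", "I.B.M.", "IBM Corp", "Big Blue"]`, the sketch yields a group with master `"IBM"` and subordinates `["I.B.M.", "IBM Corp"]`, followed by a singleton group for `"Big Blue"`. Since each candidate is compared only against the current master, the order produced by the sorting module determines which entity becomes a master.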
27. A non-transitory computer-readable storage medium storing instructions for causing a processor to implement operations, the operations comprising:
identifying a plurality of entities found in one or more data sources, each entity representing a word or a phrase found in the one or more data sources;
sorting the plurality of entities based on a sorting length, the sorting length being a value based on a function of an entity length of each of the entities, the sorting comprising:
sorting a first portion in non-descending order, the first portion including the entities that meet a length condition based on the sorting length, the length condition being based on one or more lengths of one or more of the extracted entities; and
sorting a second portion in non-descending order, the second portion including the entities that do not meet the length condition based on the sorting length, the second portion being located after the first portion;
using a first comparison criteria, organizing the sorted plurality of entities into groups by selecting a first entity in the sorted plurality of entities and comparing the selected entity to one or more remaining entities of the sorted plurality of entities, each group identifying one of the entities as a master entity and a set of entities as subordinate entities;
using a second comparison criteria, associating a first group from the groups with a second group from the groups, the second comparison criteria comparing only a first master entity associated with the first group with a second master entity associated with the second group; and
determining that a first entity is related to a second entity based on the association between the first group and the second group.
Specification