
Entity clustering via data services

  • US 8,572,089 B2
  • Filed: 12/15/2011
  • Issued: 10/29/2013
  • Est. Priority Date: 12/15/2011
  • Status: Active Grant
First Claim

1. A method comprising:

  • identifying, using one or more processors, a plurality of entities found in one or more data sources, each entity representing a word or a phrase found in the one or more data sources;

    sorting the plurality of entities based on a sorting length, the sorting length being a value based on a function of an entity length of each of the entities, the sorting comprising:

    sorting a first portion in non-descending order, the first portion including the entities that meet a length condition based on the sorting length, the length condition being based on one or more lengths of one or more of the extracted entities; and

    sorting a second portion in non-descending order, the second portion including the entities that do not meet the length condition based on the sorting length, the second portion being located after the first portion;

    using a first comparison criteria, organizing the sorted plurality of entities into groups by selecting a first entity in the sorted plurality of entities and comparing the selected entity to one or more remaining entities of the sorted plurality of entities, each group identifying one of the entities as a master entity and a set of entities as subordinate entities;

    using a second comparison criteria, associating a first group from the groups with a second group from the groups, the second comparison criteria comparing only a first master entity associated with the first group with a second master entity associated with the second group; and

    determining that a first entity is related to a second entity based on the association between the first group and the second group.
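
The steps recited in claim 1 can be illustrated with a minimal Python sketch. This is not the patented implementation. The claim leaves the sorting-length function, the length condition, and both comparison criteria open, so the choices below are assumptions made only for illustration: the sorting length is the mean entity length, the length condition is "at least as long as the sorting length", the first criterion is equality of normalized word tokens, and the second criterion is any shared word token between two master entities. All function names are hypothetical.

```python
import re
from statistics import mean


def tokens(entity):
    """Lowercased alphanumeric word tokens of an entity (assumed normalization)."""
    return frozenset(re.findall(r"[a-z0-9]+", entity.lower()))


def two_part_sort(entities):
    """Sort in two portions: entities meeting the length condition come first."""
    if not entities:
        return []
    # Assumption: the sorting length is the mean of the entity lengths.
    sorting_length = mean(len(e) for e in entities)

    def by_length(e):
        # Non-descending by length, ties broken lexicographically.
        return (len(e), e)

    first = sorted((e for e in entities if len(e) >= sorting_length), key=by_length)
    second = sorted((e for e in entities if len(e) < sorting_length), key=by_length)
    return first + second  # the second portion is located after the first


def build_groups(sorted_entities):
    """First comparison criterion (assumed): identical token sets."""
    remaining = list(sorted_entities)
    groups = []
    while remaining:
        master = remaining.pop(0)  # select the first remaining entity as master
        subordinates, rest = [], []
        for candidate in remaining:
            if tokens(candidate) == tokens(master):
                subordinates.append(candidate)
            else:
                rest.append(candidate)
        groups.append({"master": master, "subordinates": subordinates})
        remaining = rest
    return groups


def associate_groups(groups):
    """Second comparison criterion (assumed): masters share a token.

    Only master entities are compared, never subordinates.
    """
    links = set()
    for i in range(len(groups)):
        for j in range(i + 1, len(groups)):
            if tokens(groups[i]["master"]) & tokens(groups[j]["master"]):
                links.add((i, j))
    return links


def group_index(entity, groups):
    """Index of the group that holds the entity as master or subordinate."""
    for i, group in enumerate(groups):
        if entity == group["master"] or entity in group["subordinates"]:
            return i
    raise ValueError(f"unknown entity: {entity!r}")


def are_related(a, b, groups, links):
    """Entities are related if their groups are associated.

    Treating members of the same group as related is an added assumption.
    """
    i, j = sorted((group_index(a, groups), group_index(b, groups)))
    return i == j or (i, j) in links
```

With the hypothetical input `["IBM", "ibm corp", "IBM Corp.", "Apple", "apple inc", "Apple Inc."]`, the longer spellings land in the first portion and become masters, and the short forms "IBM" and "Apple" form their own groups that are then linked to the longer groups through the master-only comparison. Comparing masters only keeps the second pass proportional to the number of groups rather than the number of entities.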

  • 2 Assignments