
Resolving entities from multiple data sources for assistant systems

  • US 10,803,050 B1
  • Filed: 07/27/2018
  • Issued: 10/13/2020
  • Est. Priority Date: 04/20/2018
  • Status: Active Grant
First Claim

1. A method comprising, by one or more computing systems:

  • accessing a plurality of records based on data collected from a plurality of data sources, wherein the plurality of accessed records describes attributes of a plurality of entities, and wherein the records are grouped by their corresponding data source;

  • deduping the plurality of records by an entity-deduping module, wherein the entity-deduping module processes each group of records to associate each record within the group describing a particular entity with a unique entity identifier;

  • selecting, for each particular entity, one of the plurality of data sources as a core source, wherein the group of records associated with the core source is selected as the core group of records;

  • identifying, for a particular record in the core group of records for each particular entity, a candidate set comprising one or more records from the non-core groups of records that satisfy one or more conditions to be in the candidate set for the particular record;

  • generating, for each pair of records between a record in the core group and a record in the candidate set for each particular entity, a feature vector based on a measure of similarities of respective attributes in the pair of records;

  • computing, for each pair of records, a probability that the pair of records describe a common entity by processing the feature vector by a machine-learning classifier; and

  • linking, for each pair of records, the record in the candidate set to a globally unique entity identifier identifying a unique entity if the probability exceeds a threshold.
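
The first two steps of claim 1 (accessing records grouped by data source, then deduping each group so that records describing the same entity share a unique entity identifier) can be illustrated with a minimal Python sketch. The claim does not say how the entity-deduping module decides that two records match. This sketch assumes, purely for illustration, that two records from the same source are duplicates when their normalized `name` and `city` values are identical. `Record`, `group_by_source`, `normalize`, and `dedupe_group` are hypothetical names, not taken from the patent.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Record:
    source: str          # data source the record was collected from
    attrs: dict          # attribute name -> value, e.g. {"name": ..., "city": ...}
    entity_id: str = ""  # per-source entity identifier, set by dedupe_group


def group_by_source(records: list) -> dict:
    """Group records by their corresponding data source."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec.source].append(rec)
    return dict(groups)


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return " ".join(text.split())


def dedupe_group(group: list) -> list:
    """Give every record in one source's group an entity identifier.

    Assumed matching rule (illustrative only): two records describe the same
    entity when their normalized name and city are identical.
    """
    ids = {}
    for rec in group:
        key = (normalize(rec.attrs.get("name", "")), normalize(rec.attrs.get("city", "")))
        if key not in ids:
            ids[key] = f"{rec.source}:{len(ids)}"
        rec.entity_id = ids[key]
    return group
```

Identifiers are scoped to a source (`yelp:0`, `maps:0`), so deduping never merges records across sources. In claim 1, cross-source matching is left to the later classifier steps.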
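
The core-source and candidate-set steps of claim 1 resemble what entity-resolution work commonly calls blocking: cheap conditions prune the record pairs that the classifier must later score. The claim leaves both the selection criterion and the candidate conditions open. The minimal sketch below therefore assumes, hypothetically, that the source contributing the most records is the core source (chosen once for all entities rather than per entity, for simplicity). It also assumes that a non-core record qualifies as a candidate when it shares at least one name token and the same city with the core record. The helper names are illustrative.

```python
import re


def tokens(text: str) -> set:
    """Lowercased alphanumeric tokens of an attribute value."""
    return set(re.findall(r"\w+", text.lower()))


def select_core_source(groups: dict) -> str:
    """Hypothetical criterion: the source with the most records is the core source.

    max() returns the first source encountered on ties.
    """
    return max(groups, key=lambda src: len(groups[src]))


def candidate_set(core_rec: dict, groups: dict, core: str) -> list:
    """Collect non-core records that satisfy the assumed blocking conditions:
    same city (case-insensitive) and at least one shared name token."""
    core_tokens = tokens(core_rec["name"])
    out = []
    for src, recs in groups.items():
        if src == core:
            continue  # candidates come only from non-core groups
        for rec in recs:
            same_city = rec["city"].lower() == core_rec["city"].lower()
            if same_city and core_tokens & tokens(rec["name"]):
                out.append(rec)
    return out
```

For example, with a core record `{"name": "Joe's Pizza", "city": "NYC"}`, a non-core `{"name": "Joes Pizza", "city": "nyc"}` becomes a candidate because of the shared token `pizza`, while `{"name": "Pizza Hut", "city": "LA"}` is excluded because the cities differ.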
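
The last three steps of claim 1 (a feature vector per record pair, a classifier probability, and threshold-based linking) can be sketched with per-attribute token Jaccard similarities fed into a logistic-regression classifier. The claim only requires "a machine-learning classifier" and "a threshold". The attribute list, the Jaccard features, the hand-set weights and bias, the 0.5 threshold, and the assumption that each core record already carries a `global_id` are all illustrative choices, not details from the patent.

```python
import math
import re

ATTRS = ("name", "city", "phone")  # hypothetical attribute set


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity in [0, 1]; 0.0 if both values are empty."""
    ta = set(re.findall(r"\w+", a.lower()))
    tb = set(re.findall(r"\w+", b.lower()))
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def feature_vector(core_rec: dict, cand: dict) -> list:
    """One similarity score per attribute of the (core, candidate) pair."""
    return [jaccard(core_rec.get(a, ""), cand.get(a, "")) for a in ATTRS]


# Hypothetical logistic-regression parameters; a real system would learn
# these from labelled match / non-match pairs.
WEIGHTS = [4.0, 2.0, 3.0]
BIAS = -4.5


def match_probability(features: list) -> float:
    """Sigmoid of a linear score: P(pair describes a common entity)."""
    z = BIAS + sum(w * f for w, f in zip(WEIGHTS, features))
    return 1.0 / (1.0 + math.exp(-z))


def link(core_rec: dict, candidates: list, threshold: float = 0.5) -> list:
    """Attach the core record's global id to every candidate whose match
    probability exceeds the threshold; return the linked candidates."""
    linked = []
    for cand in candidates:
        if match_probability(feature_vector(core_rec, cand)) > threshold:
            cand["global_id"] = core_rec["global_id"]
            linked.append(cand)
    return linked
```

With real trained weights, the threshold trades precision for recall: raising it links fewer candidate records but produces fewer false merges.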

  • 2 Assignments