
Automatic document source identification systems

  • US 10,331,950 B1
  • Filed: 06/19/2018
  • Issued: 06/25/2019
  • Est. Priority Date: 06/19/2018
  • Status: Active Grant
First Claim

1. A document source identification system comprising:

  • one or more processors; and

    a memory in communication with the one or more processors and storing instructions that, when executed by the one or more processors, are configured to cause the system to:

    receive a first uploaded document having a first document type of a plurality of predetermined document types each associated with a document category;

    categorize the first uploaded document in a first document category based on the first document type;

    extract at least one first data entry from the first uploaded document based on the first document category, the at least one first data entry comprising a first sensitive data entry comprising personally identifiable information;

    identify a data entry type associated with the at least one first data entry at least in part based on one or more of a first proximate location of the at least one first data entry in the first uploaded document, a presence of one or more additional extractable data entries, and a second proximate location of the one or more additional extractable data entries;

    normalize the first sensitive data entry according to a business ruleset to produce a normalized first data entry;

    tokenize the normalized first sensitive data entry;

    execute a deterministic identification search, comprising:

    filtering a plurality of the user account data entries in a user account database based on one or more of the normalized first data entry or the data entry type associated with the normalized first data entry to identify a first subset of user account data entries, each user account data entry in the first subset of user account data entries being associated with an existing user account; and

    determining that the normalized first data entry matches zero, one, or more than one user account data entries in the first subset of user account data entries;

    execute, in response to determining that the normalized first data entry matches zero or more than one user account data entries in the first subset of user account data entries, a probabilistic identification search using a machine learning trained probabilistic model to identify a highest ranked user account data entry in the first subset of user account data entries;

    link the first uploaded document to a first existing user account associated with either the one matching user account data entry in the first subset of user account data entries or the highest ranked user account data entry in the first subset of user account data entries.
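
The matching flow recited in the claim (normalize, tokenize, deterministic search, probabilistic fallback, link) can be illustrated with a minimal sketch. This is not the patented implementation. Every name below (`AccountEntry`, `normalize`, `tokenize`, `link_document`, `SECRET_KEY`) is hypothetical, the normalization rules are invented for illustration, comparing stored tokens is an assumption about how a match is tested, and a simple string-similarity score stands in for the machine learning trained probabilistic model.

```python
import hashlib
import hmac
import re
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List, Optional, Tuple

SECRET_KEY = b"demo-key"  # hypothetical tokenization key, for illustration only


@dataclass(frozen=True)
class AccountEntry:
    """Hypothetical user account data entry."""
    account_id: str
    entry_type: str    # e.g. "ssn"
    token: str         # token of the normalized sensitive value
    display_name: str  # feature used by the stand-in ranking step


def normalize(value: str, entry_type: str) -> str:
    """Assumed ruleset: keep digits only for SSNs, else collapse whitespace and casefold."""
    if entry_type == "ssn":
        return re.sub(r"\D", "", value)
    return " ".join(value.split()).casefold()


def tokenize(normalized: str) -> str:
    """Replace the normalized sensitive value with a keyed hash (one possible tokenization)."""
    return hmac.new(SECRET_KEY, normalized.encode(), hashlib.sha256).hexdigest()


def deterministic_search(
    token: str, entry_type: str, accounts: List[AccountEntry]
) -> Tuple[List[AccountEntry], List[AccountEntry]]:
    """Filter accounts by entry type, then collect exact token matches in that subset."""
    subset = [a for a in accounts if a.entry_type == entry_type]
    matches = [a for a in subset if a.token == token]
    return subset, matches


def probabilistic_rank(doc_name: str, subset: List[AccountEntry]) -> Optional[AccountEntry]:
    """Stand-in for the trained model: rank the subset by name similarity."""
    def score(a: AccountEntry) -> float:
        return SequenceMatcher(None, doc_name.casefold(), a.display_name.casefold()).ratio()
    return max(subset, key=score, default=None)


def link_document(
    raw_value: str, entry_type: str, doc_name: str, accounts: List[AccountEntry]
) -> Optional[str]:
    """Return the account id to link the document to, or None if no candidate exists."""
    token = tokenize(normalize(raw_value, entry_type))
    subset, matches = deterministic_search(token, entry_type, accounts)
    if len(matches) == 1:
        return matches[0].account_id          # exactly one deterministic match
    best = probabilistic_rank(doc_name, subset)  # zero or several matches
    return best.account_id if best else None
```

In this sketch the probabilistic step runs only when the deterministic search finds zero or more than one match, mirroring the claim's branching. A production system would replace `probabilistic_rank` with a trained model and manage the tokenization key securely.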
