Automatic document source identification systems
First Claim
1. A document source identification system comprising:
one or more processors; and
a memory in communication with the one or more processors and storing instructions that, when executed by the one or more processors, are configured to cause the system to:
receive a first uploaded document having a first document type of a plurality of predetermined document types each associated with a document category;
categorize the first uploaded document in a first document category based on the first document type;
extract at least one first data entry from the first uploaded document based on the first document category, the at least one first data entry comprising a first sensitive data entry comprising personally identifiable information;
identify a data entry type associated with the at least one first data entry at least in part based on one or more of a first proximate location of the at least one first data entry in the first uploaded document, a presence of one or more additional extractable data entries, and a second proximate location of the one or more additional extractable data entries;
normalize the first sensitive data entry according to a business ruleset to produce a normalized first data entry;
tokenize the normalized first sensitive data entry;
execute a deterministic identification search, comprising:
filtering a plurality of the user account data entries in a user account database based on one or more of the normalized first data entry or the data entry type associated with the normalized first data entry to identify a first subset of user account data entries, each user account data entry in the first subset of user account data entries being associated with an existing user account; and
determining that the normalized first data entry matches zero, one, or more than one user account data entries in the first subset of user account data entries;
execute, in response to determining that the normalized first data entry matches zero or more than one user account data entries in the first subset of user account data entries, a probabilistic identification search using a machine learning trained probabilistic model to identify a highest ranked user account data entry in the first subset of user account data entries;
link the first uploaded document to a first existing user account associated with either the one matching user account data entry in the first subset of user account data entries or the highest ranked user account data entry in the first subset of user account data entries.
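The normalize, tokenize, and deterministic search steps recited above can be sketched as follows. This is a minimal illustration, not the claimed implementation: the digits-only rule stands in for the "business ruleset", keyed HMAC-SHA256 is one assumed way to tokenize, and `TOKEN_KEY`, `normalize_ssn`, `tokenize`, `deterministic_search`, and the account record layout are all hypothetical names.

```python
import hashlib
import hmac
import re

# Hypothetical secret; a real deployment would load this from a key store.
TOKEN_KEY = b"example-key-not-for-production"


def normalize_ssn(raw: str) -> str:
    """Assumed business rule: keep digits only."""
    return re.sub(r"\D", "", raw)


def tokenize(value: str) -> str:
    """Replace a sensitive value with a keyed, deterministic token."""
    return hmac.new(TOKEN_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()


def deterministic_search(token: str, entry_type: str, accounts: list) -> tuple:
    """Filter accounts by entry type, then collect ids whose token matches exactly."""
    subset = [a for a in accounts if entry_type in a["tokens"]]
    matches = [a["account_id"] for a in subset if a["tokens"][entry_type] == token]
    return subset, matches


# Hypothetical user account database holding tokens rather than raw values.
accounts = [
    {"account_id": "A1", "tokens": {"ssn": tokenize("123456789")}},
    {"account_id": "A2", "tokens": {"ssn": tokenize("987654321")}},
    {"account_id": "A3", "tokens": {"phone": tokenize("5551234567")}},
]

token = tokenize(normalize_ssn("123-45-6789"))
subset, matches = deterministic_search(token, "ssn", accounts)
```

Here the filter keeps the two accounts that hold an `ssn` entry, and exactly one of them matches, so the document would be linked to `A1` without a probabilistic search. An empty `matches` list, or one with several ids, would correspond to the zero-match and multiple-match branches.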
Abstract
A document source identification system includes one or more memory devices storing instructions, and one or more processors configured to execute the instructions to cause the system to receive uploaded document(s) having at least one extractable data entry. The system may categorize the document, and extract at least one data entry from the document. The system may normalize each extracted data entry and execute a deterministic ID search to determine that the normalized data entry matches zero, one, or more than one account data entries associated with user accounts. Responsive to an exact match, the system may link the uploaded document to a user account associated with the matching data entry. Responsive to zero or multiple matches, the system may execute a probabilistic ID search identifying a highest ranked user account data entry and link the document to a user account associated with the highest ranked user account data entry.
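The control flow in the abstract (exact match first, ranked fallback otherwise) might look like the sketch below. The string-similarity scorer is only a stand-in for the machine learning trained probabilistic model named in the claims, and `similarity`, `link_document`, and the flat account mapping are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Callable, Optional


def similarity(a: str, b: str) -> float:
    """Stand-in scorer (assumption): plain string similarity in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()


def link_document(
    entry: str,
    accounts: dict[str, str],
    score: Callable[[str, str], float] = similarity,
) -> tuple[Optional[str], str]:
    """Return (account_id, method) for one normalized data entry."""
    if not accounts:
        return None, "none"
    exact = [acct for acct, value in accounts.items() if value == entry]
    if len(exact) == 1:
        # Exactly one match: link deterministically.
        return exact[0], "deterministic"
    # Zero or multiple matches: rank every candidate and take the top one.
    best = max(accounts, key=lambda acct: score(entry, accounts[acct]))
    return best, "probabilistic"


# Hypothetical normalized name entries keyed by account id.
accounts = {"A1": "JOHN Q SMITH", "A2": "JANE SMYTHE"}
exact_result = link_document("JOHN Q SMITH", accounts)
fuzzy_result = link_document("JON Q SMITH", accounts)
```

With a single field, several exact matches would tie under this scorer; a trained model would presumably draw on additional extracted entries to separate them.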
20 Claims
1. A document source identification system comprising:
one or more processors; and
a memory in communication with the one or more processors and storing instructions that, when executed by the one or more processors, are configured to cause the system to:
receive a first uploaded document having a first document type of a plurality of predetermined document types each associated with a document category;
categorize the first uploaded document in a first document category based on the first document type;
extract at least one first data entry from the first uploaded document based on the first document category, the at least one first data entry comprising a first sensitive data entry comprising personally identifiable information;
identify a data entry type associated with the at least one first data entry at least in part based on one or more of a first proximate location of the at least one first data entry in the first uploaded document, a presence of one or more additional extractable data entries, and a second proximate location of the one or more additional extractable data entries;
normalize the first sensitive data entry according to a business ruleset to produce a normalized first data entry;
tokenize the normalized first sensitive data entry;
execute a deterministic identification search, comprising:
filtering a plurality of the user account data entries in a user account database based on one or more of the normalized first data entry or the data entry type associated with the normalized first data entry to identify a first subset of user account data entries, each user account data entry in the first subset of user account data entries being associated with an existing user account; and
determining that the normalized first data entry matches zero, one, or more than one user account data entries in the first subset of user account data entries;
execute, in response to determining that the normalized first data entry matches zero or more than one user account data entries in the first subset of user account data entries, a probabilistic identification search using a machine learning trained probabilistic model to identify a highest ranked user account data entry in the first subset of user account data entries;
link the first uploaded document to a first existing user account associated with either the one matching user account data entry in the first subset of user account data entries or the highest ranked user account data entry in the first subset of user account data entries.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. A document source identification system comprising:
one or more processors; and
a memory in communication with the one or more processors and storing instructions that, when executed by the one or more processors, are configured to cause the system to:
receive a first uploaded document having a first document type of a plurality of predetermined document types each associated with a document category;
categorize the first uploaded document in a first document category based on the first document type;
extract at least one first data entry from the first uploaded document based on the first document category;
identify a data entry type associated with the at least one first data entry at least in part based on one or more of a proximate location of the at least one first data entry in the first uploaded document, a presence of one or more additional extractable data entries, and a proximate location of the one or more additional extractable data entries;
normalize the at least one first data entry according to a business ruleset to produce a normalized first data entry;
filter, based on one or more of the normalized first data entry or the data entry type associated with the normalized first data entry, a user account database having a plurality of existing user accounts data entries each corresponding with an existing user account to identify a first subset of existing user account data entries;
determine whether the normalized first data entry matches a first existing user account data entry in the first subset of existing user account data entries;
when the normalized first data entry is determined to match the first existing user account data entry, link the first uploaded document to a first existing user account associated with the first user account data entry; and
when the normalized first data entry is determined to not match the first existing user account data entry:
score each existing user account data entry in the first subset of existing user account data entries using a machine learning trained probabilistic model;
identify, via the machine learning trained probabilistic model, a first probabilistic existing user account data entry having the highest score; and
link the first uploaded document to a second existing user account associated with the first probabilistic existing user account data entry.
Dependent claims: 12, 13, 14, 15, 16, 17
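The scoring branch of claim 11 (score every entry in the subset, then take the highest) can be illustrated with a small logistic scorer. The claim does not say what kind of model is used; the features, `WEIGHTS`, `BIAS`, and the `Candidate` type below are hypothetical, and in practice the weights would come from training rather than being set by hand.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    account_id: str
    name_similarity: float  # in [0, 1]
    address_match: bool
    dob_match: bool


# Hypothetical weights; a trained model would learn these from labeled links.
WEIGHTS = {"name_similarity": 4.0, "address_match": 1.5, "dob_match": 2.5}
BIAS = -4.0


def score(c: Candidate) -> float:
    """Logistic score in (0, 1) estimating that the candidate is the document's source."""
    z = (
        BIAS
        + WEIGHTS["name_similarity"] * c.name_similarity
        + WEIGHTS["address_match"] * float(c.address_match)
        + WEIGHTS["dob_match"] * float(c.dob_match)
    )
    return 1.0 / (1.0 + math.exp(-z))


def highest_scoring(candidates: list) -> Candidate:
    """Score each candidate in the subset and return the top-ranked one."""
    return max(candidates, key=score)


subset = [
    Candidate("A1", name_similarity=0.90, address_match=True, dob_match=False),
    Candidate("A2", name_similarity=0.60, address_match=False, dob_match=True),
    Candidate("A3", name_similarity=0.95, address_match=False, dob_match=False),
]
best = highest_scoring(subset)
```

With these illustrative weights, `A1` ranks first because a strong name match combined with an address match outweighs the other two candidates.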
18. A document source identification method comprising:
receiving a first uploaded document;
extracting at least one first data entry from the first uploaded document;
identifying a data entry type associated with the at least one first data entry at least in part based on one or more of a proximate location of the at least one first data entry in the first uploaded document, a presence of one or more additional extractable data entries, and a proximate location of the one or more additional extractable data entries;
normalizing the extracted at least one first data entry according to a business ruleset to produce a normalized first data entry;
executing a deterministic identification search, comprising:
filtering a plurality of the user account data entries in a user account database based on one or more of the normalized first data entry or the data entry type associated with the normalized first data entry to identify a first subset of user account data entries, each user account data entry in the first subset of user account data entries being associated with an existing user account; and
determining that the normalized first data entry matches zero, one, or more than one user account data entries in the first subset of user account data entries;
executing, in response to determining that the normalized first data entry matches zero or more than one user account data entries in the first subset of user account data entries, a probabilistic identification search using a machine learning trained probabilistic model to identify a highest ranked user account data entry in the first subset of user account data entries;
linking the first uploaded document to a first existing user account associated with either the one matching user account data entry in the first subset of user account data entries or the highest ranked user account data entry in the first subset of user account data entries.
Dependent claims: 19, 20
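The "identifying a data entry type" step relies on where an entry sits on the page and on what other extractable entries are nearby. One possible reading, sketched under assumptions: treat nearby text boxes as labels and take the type of the closest recognized label. The `TextBox` type, the `LABELS` keyword table, and the distance threshold are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class TextBox:
    text: str
    x: float
    y: float


# Hypothetical label keywords mapped to data entry types.
LABELS = {
    "ssn": "ssn",
    "social security": "ssn",
    "account": "account_number",
    "phone": "phone",
}


def identify_entry_type(entry: TextBox, others: list, max_dist: float = 150.0) -> str:
    """Return the type of the nearest recognized label within max_dist, else 'unknown'."""
    best_type, best_dist = "unknown", max_dist
    for box in others:
        lowered = box.text.lower()
        for keyword, entry_type in LABELS.items():
            if keyword in lowered:
                dist = math.hypot(box.x - entry.x, box.y - entry.y)
                if dist <= best_dist:
                    best_type, best_dist = entry_type, dist
    return best_type


# A nine-digit value is ambiguous alone; the label beside it resolves the type.
entry = TextBox("123-45-6789", x=200, y=100)
page = [TextBox("Social Security No.", x=60, y=100), TextBox("Account #", x=60, y=400)]
entry_type = identify_entry_type(entry, page)
```

In this example the "Social Security No." label is 140 units away and inside the threshold, while the "Account #" label is too far, so the entry is typed as `ssn`.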
Specification