Membership checking of digital text
First Claim
1. A system, comprising:
a data structure populated with document tokens selected from signatures of documents and member tokens selected from database members, the data structure including entries identifying co-occurrences of the document tokens and the member tokens;
a filter component configured to identify individual documents that cannot match individual database members using the co-occurrences identified in the data structure;
a verification component configured to receive a remainder of the documents and to verify whether sub-strings of the remainder of the documents match the individual database members; and
at least one computing device configured to execute one or more of the filter component or the verification component.
Abstract
The described implementations relate to data analysis, such as membership checking. One technique identifies candidate matches between document sub-strings and database members utilizing signatures. The technique further verifies that the candidate matches are true matches.
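The two stages named in the abstract (candidate identification, then verification) can be illustrated with a minimal sketch. It assumes a simple whitespace tokenizer and treats a member's token set as its signature; neither choice is taken from the patent, and the function names `build_index`, `candidate_matches`, and `verify` are hypothetical.

```python
from collections import defaultdict


def tokenize(text):
    # Assumption: lowercase whitespace tokens stand in for the real tokenizer.
    return text.lower().split()


def build_index(members):
    """Map each token to the set of database members that contain it."""
    index = defaultdict(set)
    for member in members:
        for tok in tokenize(member):
            index[tok].add(member)
    return index


def candidate_matches(document, index, max_len):
    """Yield (sub-string, member) pairs whose tokens all occur in the member."""
    words = tokenize(document)
    for start in range(len(words)):
        last = min(start + max_len, len(words))
        for end in range(start + 1, last + 1):
            window = words[start:end]
            # Members that contain every token of the window.
            shared = set.intersection(*(index.get(w, set()) for w in window))
            for member in shared:
                yield " ".join(window), member


def verify(candidates):
    """Keep only candidates whose sub-string equals the member token for token."""
    return sorted(
        {(sub, member) for sub, member in candidates
         if sub == " ".join(tokenize(member))}
    )
```

With members `["new york", "york university"]` and the document `"She moved to New York last year"`, the candidate stage proposes several pairs (for example `"york"` against both members), and verification keeps only `("new york", "new york")`.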
20 Claims
1. A system, comprising:
a data structure populated with document tokens selected from signatures of documents and member tokens selected from database members, the data structure including entries identifying co-occurrences of the document tokens and the member tokens;
a filter component configured to identify individual documents that cannot match individual database members using the co-occurrences identified in the data structure;
a verification component configured to receive a remainder of the documents and to verify whether sub-strings of the remainder of the documents match the individual database members; and
at least one computing device configured to execute one or more of the filter component or the verification component.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A method comprising:
populating a data structure with document tokens selected from signatures of documents and member tokens selected from database members, the data structure including entries identifying co-occurrences of the document tokens and the member tokens;
identifying individual documents that cannot match individual database members using the co-occurrences identified in the data structure;
receiving a remainder of the documents; and
verifying whether sub-strings of the remainder of the documents match the individual database members.
Dependent claims: 10, 11, 12, 13, 14
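The steps of the method claim (populate, filter, receive the remainder, verify) can be sketched at document level as follows. This is only one possible reading: it assumes a document's signature is its set of lowercase whitespace tokens and that a co-occurrence entry records the tokens a document shares with a member. The names `build_cooccurrence`, `filter_documents`, and `verify` are hypothetical and do not come from the claims.

```python
def tokens(text):
    # Assumption: a token set stands in for the signature described in the claim.
    return set(text.lower().split())


def build_cooccurrence(documents, members):
    """Record, per (document, member) pair, the tokens they have in common."""
    member_tokens = {m: tokens(m) for m in members}
    table = {}
    for doc_id, text in documents.items():
        doc_tokens = tokens(text)
        for member, mtoks in member_tokens.items():
            shared = doc_tokens & mtoks
            if shared:
                table[(doc_id, member)] = shared
    return table


def filter_documents(documents, members, table):
    """Drop pairs that cannot match: some member token is absent from the document."""
    remainder = []
    for doc_id in documents:
        for member in members:
            if table.get((doc_id, member)) == tokens(member):
                remainder.append((doc_id, member))
    return remainder


def verify(documents, remainder):
    """Confirm that the member appears as a contiguous sub-string of the document."""
    confirmed = []
    for doc_id, member in remainder:
        words = documents[doc_id].lower().split()
        target = member.lower().split()
        n = len(target)
        if any(words[i:i + n] == target for i in range(len(words) - n + 1)):
            confirmed.append((doc_id, member))
    return confirmed
```

In this sketch the filter is cheap and only rules documents out. A document that contains both "new" and "york" in separate places survives the filter for the member "new york" and is rejected only at the verification step.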
15. A computer-readable storage media having instructions stored thereon that when executed by a computing device cause the computing device to perform acts, comprising:
populating a data structure with document tokens selected from signatures of documents and member tokens selected from database members, the data structure including entries identifying co-occurrences of the document tokens and the member tokens;
identifying individual documents that cannot match individual database members using the co-occurrences identified in the data structure;
receiving a remainder of the documents; and
verifying whether sub-strings of the remainder of the documents match the individual database members.
Dependent claims: 16, 17, 18, 19, 20
Specification