Matching Engine With Signature Generation
Abstract
A system and a method generate at least one signature associated with a document. In one embodiment, a document comprised of text is received and parsed to generate a token set. The token set includes a plurality of tokens. Each token corresponds to text in the document that is separated by a predefined character characteristic. A score is calculated for each token in the token set based on the frequency and distribution of the text in the document. Each token is then ranked based on the calculated score. A subset of the ranked tokens is selected, and a signature is generated for each occurrence of the selected tokens. The selected list of signatures is then output.
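The pipeline summarized in the abstract (parse, score, rank, select, sign) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the patent: the function name `generate_signatures`, whitespace as the token delimiter, the scoring formula (occurrence count weighted by how far apart the first and last occurrences lie), the `top_k` and `context` parameters, and the use of SHA-1 over a short window of text as the signature.

```python
import hashlib
import re
from collections import defaultdict


def generate_signatures(text, top_k=3, context=10):
    """Return (token, offset, signature) triples for the top-ranked tokens."""
    # Parse: split on whitespace (an assumed delimiter) and record offsets.
    positions = defaultdict(list)
    for match in re.finditer(r"\S+", text):
        positions[match.group().lower()].append(match.start())

    # Score: occurrence count weighted by how widely the occurrences are
    # spread across the document (an assumed formula, not the patent's).
    length = max(len(text), 1)
    scores = {}
    for token, offsets in positions.items():
        spread = (offsets[-1] - offsets[0]) / length
        scores[token] = len(offsets) * (1.0 + spread)

    # Rank by score, breaking ties alphabetically so output is deterministic.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))

    # Select the top_k tokens and sign every occurrence, hashing the token
    # together with a few characters of following context.
    signatures = []
    for token in ranked[:top_k]:
        for offset in positions[token]:
            window = text[offset:offset + len(token) + context]
            digest = hashlib.sha1(window.encode("utf-8")).hexdigest()
            signatures.append((token, offset, digest))
    return signatures
```

Calling `generate_signatures("alpha beta alpha gamma alpha beta", top_k=1)` selects only the most frequent, most widely spread token and returns one signature per occurrence of it. Hashing a window around each occurrence, rather than the token alone, is one way to make signatures depend on local context; the abstract does not say how the signature is computed.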
25 Claims
1-6. (canceled)
7. A method for generating a plurality of signatures associated with a document, the method comprising:
receiving a document comprising a plurality of characters;
normalizing the document to remove non-informative characters from the plurality of characters;
calculating a score for each informative character of the plurality of characters based on an occurrence frequency and distribution in the document;
ranking each informative character of the plurality of characters based on the calculated score;
selecting, from the ranked informative characters, character occurrences; and
generating a signature for each selected character occurrence.
Dependent claims: 8, 9, 10, 11, 12, 25
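The steps recited in claim 7 can be sketched at the character level as follows. This is only an illustration under stated assumptions: "non-informative" is taken to mean any character that is not a letter or digit, the score favors characters that occur often and at evenly spaced positions, and a signature is a SHA-256 hash of the `width` normalized characters starting at each selected occurrence. The names `normalize`, `score_characters`, `character_signatures`, `top_chars`, and `width` are hypothetical and do not come from the claims.

```python
import hashlib
import statistics
from collections import defaultdict


def normalize(document):
    # Assumption: anything that is not a letter or digit is non-informative.
    return "".join(ch.lower() for ch in document if ch.isalnum())


def score_characters(normalized):
    # Record the offsets at which each character occurs.
    offsets = defaultdict(list)
    for index, ch in enumerate(normalized):
        offsets[ch].append(index)
    scores = {}
    for ch, where in offsets.items():
        gaps = [b - a for a, b in zip(where, where[1:])]
        # Assumed scoring: frequent, evenly spaced characters score higher.
        deviation = statistics.pstdev(gaps) if gaps else 0.0
        scores[ch] = len(where) / (1.0 + deviation)
    return scores, offsets


def character_signatures(document, top_chars=2, width=8):
    normalized = normalize(document)
    scores, offsets = score_characters(normalized)
    # Rank characters by score, breaking ties by character for determinism.
    ranked = sorted(scores, key=lambda ch: (-scores[ch], ch))
    result = []
    for ch in ranked[:top_chars]:
        for index in offsets[ch]:
            chunk = normalized[index:index + width]
            if len(chunk) < width:
                continue  # skip occurrences too close to the end of the text
            digest = hashlib.sha256(chunk.encode("utf-8")).hexdigest()
            result.append((index, digest))
    return sorted(result)
```

Because signatures are computed over the normalized text, `character_signatures("abc abc abc abc", 1, 3)` and `character_signatures("abcabcabcabc", 1, 3)` produce the same list, which shows why a normalization step can make matching insensitive to spacing and punctuation changes.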
13-18. (canceled)
19. A computer readable storage medium storing instructions executable by a processor, the instructions when executed causing a processor to:
receive a document comprising a plurality of characters;
normalize the document to remove non-informative characters from the plurality of characters;
calculate a score for each informative character of the plurality of characters based on an occurrence frequency and distribution in the document;
rank each informative character of the plurality of characters based on the calculated score;
select, from the ranked informative characters, character occurrences; and
generate a signature for each selected character occurrence.
Dependent claims: 20, 21, 22, 23, 24
Specification