METHOD FOR DOMAIN IDENTIFICATION OF DOCUMENTS IN A DOCUMENT DATABASE
Abstract
A method for processing documents in a document database includes determining vocabulary words for each document, and determining a respective relevancy for each vocabulary word based upon occurrences thereof in all of the documents. Similarities are determined between the documents based upon the vocabulary words and their respective relevancies. At least one domain identification is determined for the documents based upon the determined similarities.
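The four steps of the abstract can be illustrated with a minimal sketch. The text above does not say how relevancy, similarity, or domain identification are computed, so the choices here are assumptions made for illustration only: relevancy is taken to be an inverse-document-frequency weight, similarity is the cosine of relevancy-weighted word counts, and a domain is a group of documents linked by similarity at or above a hypothetical `threshold`. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def vocabulary(doc):
    """Vocabulary words of one document: lowercase alphabetic tokens with counts."""
    return Counter(re.findall(r"[a-z]+", doc.lower()))


def relevancies(vocabs):
    """ASSUMPTION: relevancy = log(N / number of documents containing the word)."""
    n = len(vocabs)
    doc_freq = Counter(word for vocab in vocabs for word in vocab)
    return {word: math.log(n / df) for word, df in doc_freq.items()}


def similarity(a, b, rel):
    """ASSUMPTION: cosine similarity over relevancy-weighted word counts."""
    dot = sum(a[w] * b[w] * rel[w] ** 2 for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum((count * rel[w]) ** 2 for w, count in a.items()))
    norm_b = math.sqrt(sum((count * rel[w]) ** 2 for w, count in b.items()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def domains(docs, threshold=0.2):
    """Give each document a domain id; documents linked by a similarity
    at or above the (hypothetical) threshold end up with the same id."""
    vocabs = [vocabulary(d) for d in docs]
    rel = relevancies(vocabs)
    labels = list(range(len(docs)))
    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if similarity(vocabs[i], vocabs[j], rel) >= threshold:
                old, new = labels[j], labels[i]
                # Merge the two groups by relabeling every member of the old one.
                labels = [new if label == old else label for label in labels]
    return labels


docs = [
    "cats purr and cats nap",
    "dogs bark and cats purr",
    "stocks rise and bonds fall",
    "bonds fall when stocks rise",
]
labels = domains(docs)
```

With these example documents, the two pet-related texts share one domain id and the two finance-related texts share another. A word that occurs in every document receives a relevancy of zero under this weighting and so contributes nothing to the similarity.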
42 Claims
1. A method for processing a plurality of documents in a document database comprising:
determining vocabulary words for each document of the plurality thereof;
determining a respective relevancy for each vocabulary word based upon occurrences thereof in the plurality of documents;
determining similarities between the plurality of documents based upon the vocabulary words and their respective relevancies; and
determining at least one domain identification for documents based upon the determined similarities.
Dependent claims: 2-18
19. A method for processing a plurality of documents in a document database comprising:
determining vocabulary words for each document of the plurality thereof;
determining a respective relevancy for each vocabulary word based upon occurrences thereof in the plurality of documents using a computer;
selecting a portion of the vocabulary words based on their respective relevancies for defining a superset of vocabulary words;
determining similarities between the documents associated with the vocabulary words in the superset of vocabulary words and their respective relevancies; and
determining at least one domain identification for documents based upon the determined similarities.
Dependent claims: 20-26
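The selecting step of claim 19 can be sketched as follows. The claim does not state how the portion of vocabulary words is chosen, so this sketch assumes the `top_n` most relevant words form the superset; the function names, the `top_n` parameter, and the example relevancy values are all hypothetical.

```python
def select_superset(relevancy, top_n):
    """ASSUMPTION: the superset is the top_n words ranked by relevancy
    (ties broken alphabetically so the result is deterministic)."""
    ranked = sorted(relevancy, key=lambda word: (-relevancy[word], word))
    return set(ranked[:top_n])


def restrict(vocab, superset):
    """Keep only the words of one document that belong to the superset."""
    return {word: count for word, count in vocab.items() if word in superset}


def associated_documents(vocabs, superset):
    """Indices of documents containing at least one superset word."""
    return [i for i, vocab in enumerate(vocabs) if superset & vocab.keys()]


# Illustrative relevancy values, not taken from the patent.
relevancy = {"the": 0.0, "and": 0.1, "bonds": 0.7, "cats": 0.7, "nap": 1.4}
superset = select_superset(relevancy, top_n=3)

vocabs = [
    {"cats": 2, "and": 1, "nap": 1},
    {"the": 3, "and": 2},
    {"bonds": 1, "the": 1},
]
kept = associated_documents(vocabs, superset)
reduced = [restrict(vocabs[i], superset) for i in kept]
```

Similarities would then be computed only between the documents in `kept`, using the reduced word counts, which shrinks the comparison when low-relevancy words dominate the vocabulary.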
27. A computer-readable medium having computer-executable instructions for causing a computer to perform steps comprising:
determining vocabulary words for each document of the plurality thereof;
determining a respective relevancy for each vocabulary word based upon occurrences thereof in the plurality of documents;
determining similarities between the plurality of documents based upon the vocabulary words and their respective relevancies; and
determining at least one domain identification for documents based upon the determined similarities.
Dependent claims: 28-34
35. A computer-implemented system for processing documents in a document database comprising:
a first module for determining vocabulary words for each document of the plurality thereof;
a second module for determining a respective relevancy for each vocabulary word based upon occurrences thereof in the plurality of documents;
a third module for determining similarities between the plurality of documents based upon the vocabulary words and their respective relevancies; and
a fourth module for determining at least one domain identification for documents based upon the determined similarities.
Dependent claims: 36-42
Specification