Method for domain identification of documents in a document database
First Claim
1. A method for processing a plurality of documents in a document database using a computer-implemented system comprising a processor and a display operatively coupled to the processor, the method comprising:
- operating the processor to perform the following without requiring pre-computations
a) determining vocabulary words for each document of the plurality thereof;
b) determining a respective relevancy for each vocabulary word based upon occurrences thereof in the plurality of documents;
c) determining similarities and differences between the plurality of documents based upon the vocabulary words and their respective relevancies;
d) defining supersets of vocabulary words based on the determined similarities and differences; and
e) determining domain identifications for the supersets of vocabulary words using results of a) through d) and only after a) through d); and
operating the display to display the defined supersets of vocabulary words and their domain identifications, including display of the vocabulary words as being relevant and irrelevant based on occurrences thereof in the plurality of documents.
Abstract
A method for processing documents in a document database includes determining vocabulary words for each document, and determining a respective relevancy for each vocabulary word based upon occurrences thereof in all of the documents. Similarities are determined between the documents based upon the vocabulary words and their respective relevancies. At least one domain identification is determined for the documents based upon the determined similarities.
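The flow summarized in the abstract (vocabulary words, per-word relevancy, document similarity, grouping into domains) can be illustrated with a short sketch. This is not the patented implementation: the source does not say how relevancy or similarity is computed, so the inverse-document-frequency weighting, the cosine similarity, the greedy grouping, the `threshold` value, and all function names below are assumptions chosen only for illustration.

```python
import math
import re
from collections import Counter


def vocabulary(doc):
    """Return the lowercased word tokens of one document."""
    return re.findall(r"[a-z]+", doc.lower())


def relevancies(docs):
    """Weight each word by its occurrences across all documents.

    Assumption: inverse document frequency stands in for "relevancy".
    A word that appears in every document gets weight 0 (irrelevant).
    """
    n = len(docs)
    doc_freq = Counter(w for doc in docs for w in set(vocabulary(doc)))
    return {w: math.log(n / df) for w, df in doc_freq.items()}


def similarity(doc_a, doc_b, weights):
    """Cosine similarity between relevancy-weighted word-count vectors."""
    va = {w: c * weights.get(w, 0.0) for w, c in Counter(vocabulary(doc_a)).items()}
    vb = {w: c * weights.get(w, 0.0) for w, c in Counter(vocabulary(doc_b)).items()}
    dot = sum(v * vb.get(w, 0.0) for w, v in va.items())
    norm_a = math.sqrt(sum(v * v for v in va.values()))
    norm_b = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def group_documents(docs, threshold=0.2):
    """Greedy grouping (an assumed stand-in for domain identification).

    A document joins the first group whose first member is at least
    `threshold` similar to it; otherwise it starts a new group.
    """
    weights = relevancies(docs)
    groups = []
    for i, doc in enumerate(docs):
        for group in groups:
            if similarity(docs[group[0]], doc, weights) >= threshold:
                group.append(i)
                break
        else:
            groups.append([i])
    return groups
```

With four short documents, two about a cat and a mouse and two about a bank and an interest rate, the word "the" occurs in all of them and receives zero relevancy, so the grouping is driven by the topical words only.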
37 Claims
1. A method for processing a plurality of documents in a document database using a computer-implemented system comprising a processor and a display operatively coupled to the processor, the method comprising:
- operating the processor to perform the following without requiring pre-computations
a) determining vocabulary words for each document of the plurality thereof;
b) determining a respective relevancy for each vocabulary word based upon occurrences thereof in the plurality of documents;
c) determining similarities and differences between the plurality of documents based upon the vocabulary words and their respective relevancies;
d) defining supersets of vocabulary words based on the determined similarities and differences; and
e) determining domain identifications for the supersets of vocabulary words using results of a) through d) and only after a) through d); and
operating the display to display the defined supersets of vocabulary words and their domain identifications, including display of the vocabulary words as being relevant and irrelevant based on occurrences thereof in the plurality of documents.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17
18. A method for processing a plurality of documents in a document database using a computer-implemented system comprising a processor and a display coupled to the processor, the method comprising:
- operating the processor to perform the following without requiring pre-computations
a) determining vocabulary words for each document of the plurality thereof;
b) determining a respective relevancy for each vocabulary word based upon occurrences thereof in the plurality of documents using a computer;
c) determining similarities and differences between the plurality of documents based upon the vocabulary words and their respective relevancies;
d) defining supersets of vocabulary words based on the determined similarities and differences;
e) determining domain identifications for the supersets of vocabulary words based upon the determined similarities and differences using results of a) through d) and only after a) through d); and
dividing the domain identifications into lower level domain identifications based upon selecting vocabulary words associated with each respective lower level domain identification, with relevancies of vocabulary words associated with each lower level domain identification changing so that similar documents are grouped together for each lower level domain identification; and
operating the display to display the defined supersets of vocabulary words and their domain identifications including the lower level domain identifications, including display of the vocabulary words as being relevant and irrelevant based on occurrences thereof in the plurality of documents.
Dependent claims: 19, 20, 21, 22, 23
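The division into lower level domain identifications recited in claim 18 can be pictured with a small sketch. The claim does not specify how documents are assigned or how relevancies change, so everything here is an assumption for illustration: the function names, the sub-domain labels, the rule of assigning a document to the sub-domain whose selected words it uses most, and the use of within-group document frequency as the changed relevancy.

```python
from collections import Counter


def split_domain(docs, subdomain_words):
    """Assign each document of one domain to a lower-level domain.

    docs: list of token lists that all belong to the same domain.
    subdomain_words: dict mapping a hypothetical sub-domain label to the
        vocabulary words selected for it.
    A document goes to the label whose selected words it contains most
    often; on a tie (including a score of zero) the first label wins.
    """
    assignment = {label: [] for label in subdomain_words}
    for index, tokens in enumerate(docs):
        counts = Counter(tokens)
        scores = {
            label: sum(counts[w] for w in words)
            for label, words in subdomain_words.items()
        }
        best = max(scores, key=scores.get)
        assignment[best].append(index)
    return assignment


def local_relevancy(docs, members):
    """Recompute word relevancy inside one lower-level domain.

    Assumption: relevancy is the fraction of member documents that
    contain the word, so it changes with the membership of the group.
    """
    doc_freq = Counter(w for i in members for w in set(docs[i]))
    return {w: df / len(members) for w, df in doc_freq.items()}
```

In this sketch a word such as "bank" can be highly relevant inside one lower-level group and absent or marginal in another, which is one way to read the claim's statement that relevancies change per lower level domain identification.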
24. A computer-readable storage medium having computer-executable instructions stored thereon for causing a computer-implemented system comprising a processor and a display coupled to the processor to perform steps comprising:
- operating the processor to perform the following without requiring pre-computations
a) determining vocabulary words for each document of the plurality thereof;
b) determining a respective relevancy for each vocabulary word based upon occurrences thereof in the plurality of documents;
c) determining similarities and differences between the plurality of documents based upon the vocabulary words and their respective relevancies;
d) defining supersets of vocabulary words based on the determined similarities and differences; and
e) determining domain identifications for the supersets of vocabulary words using results of a) through d) and only after a) through d); and
operating the display to display the defined supersets of vocabulary words and their domain identifications, including display of the vocabulary words as being relevant and irrelevant based on occurrences thereof in the plurality of documents.
Dependent claims: 25, 26, 27, 28, 29, 30
31. A computer-implemented system comprising:
- a processor for processing documents in a document database, said processor configured to perform the following without requiring pre-computations
a) determining vocabulary words for each document of the plurality thereof;
b) determining a respective relevancy for each vocabulary word based upon occurrences thereof in the plurality of documents;
c) determining similarities and differences between the plurality of documents based upon the vocabulary words and their respective relevancies;
d) defining supersets of vocabulary words based on the determined similarities and differences; and
e) determining domain identifications for the supersets of vocabulary words based upon the determined similarities and differences using results of a) through d) and only after a) through d); and
a display coupled to said processor for displaying the defined supersets of vocabulary words and their domain identifications, including display of the vocabulary words as being relevant and irrelevant based on occurrences thereof in the plurality of documents.
Dependent claims: 32, 33, 34, 35, 36, 37
Specification