PROCESS FOR IDENTIFYING WEIGHTED CONTEXTURAL RELATIONSHIPS BETWEEN UNRELATED DOCUMENTS
Abstract
A system that builds a network using a document collection wherein the documents are collected and represented as a plurality of nodes in a network matrix. The documents that are to be analyzed are bound to the network (corpus) at a discrete node corresponding to the document. The documents are then analyzed to determine term frequency within each document and the overall term frequency of the same term throughout the entire document grouping. This creates a weighting value that determines the relevancy of each document as compared to the entire network of documents. Finally, weighting values are normalized with relative weighting values so that the sum of the weights of all edges connected to a given node equals 1. User queries then proceed through the network from node to node using the algorithm of the present invention to locate documents relevant to the search.
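The normalization described at the end of the abstract can be illustrated with a minimal Python sketch. It assumes the network is stored as a directed adjacency mapping and that "edges connected to a given node" means that node's outgoing edges; the function name `normalize_edges` and the sample weights are hypothetical and do not come from the patent.

```python
from typing import Dict

def normalize_edges(raw: Dict[str, Dict[str, float]]) -> Dict[str, Dict[str, float]]:
    """Scale each node's outgoing edge weights so that they sum to 1."""
    normalized: Dict[str, Dict[str, float]] = {}
    for node, edges in raw.items():
        total = sum(edges.values())
        if total > 0:
            normalized[node] = {nbr: w / total for nbr, w in edges.items()}
        else:
            # A node with no weighted edges keeps an empty edge set.
            normalized[node] = {}
    return normalized

# Hypothetical raw weights between three document nodes.
raw_weights = {
    "doc_a": {"doc_b": 2.0, "doc_c": 6.0},
    "doc_b": {"doc_a": 1.0},
    "doc_c": {},
}
network = normalize_edges(raw_weights)
```

Treating the edges as directed is a design choice made for this sketch: with symmetric (undirected) weights it is generally not possible to make every node's weights sum to 1 at the same time.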
18 Claims
1. A computer based method for identifying interrelationships between documents within a grouping of a plurality of unrelated documents, comprising the steps of:
assembling a plurality of unrelated documents into a group for analysis;
identifying at least one quality of interest to be analyzed;
analyzing the group of documents to determine a first frequency of the at least one quality within the group;
analyzing the group of documents to determine a second set of frequencies corresponding to the frequency of the at least one quality within each individual document;
normalizing each of said second frequencies relative to said first frequency to generate a weighting factor for each of said documents; and
generating relationship links based on said normalized second frequencies corresponding to said at least one quality of interest, said relationship links extending between documents that are weighted relative to the at least one quality of interest.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
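One possible reading of the steps of claim 1 is sketched below in Python. The sketch assumes the "quality of interest" is a single term, that "normalizing relative to said first frequency" means dividing each per-document count by the group-wide count, and that a link joins any two documents with non-zero weight, scored by the smaller of the two weights. These choices, the helper names, and the sample documents are illustrative assumptions, not details stated in the claim.

```python
from collections import Counter
from itertools import combinations
import re

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def weighting_factors(documents, term):
    """Return a per-document weight for `term`, normalized by the group total."""
    # Second set of frequencies: occurrences of the term in each document.
    per_doc = {name: Counter(tokenize(text))[term] for name, text in documents.items()}
    # First frequency: occurrences of the term across the whole group.
    group_total = sum(per_doc.values())
    if group_total == 0:
        return {name: 0.0 for name in documents}
    return {name: count / group_total for name, count in per_doc.items()}

def relationship_links(weights):
    """Link every pair of documents that both carry a non-zero weight."""
    links = []
    for a, b in combinations(sorted(weights), 2):
        if weights[a] > 0 and weights[b] > 0:
            # Assumed link score: the smaller of the two document weights.
            links.append((a, b, min(weights[a], weights[b])))
    return links

# Hypothetical group of unrelated documents.
documents = {
    "memo": "The pump failed. Pump seals were worn.",
    "invoice": "Replacement pump shipped on Monday.",
    "recipe": "Whisk the eggs and fold in the flour.",
}
weights = weighting_factors(documents, "pump")
links = relationship_links(weights)
```

In this example only the memo and the invoice mention the term, so they are the only pair joined by a link; the recipe receives a weight of zero and stays unlinked.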
12. A computer based method for identifying interrelationships between documents within a grouping of a plurality of unstructured and unrelated documents, comprising the steps of:
assembling a plurality of unrelated documents for analysis;
performing an initial analysis of said plurality of documents to identify at least one quality of interest to be analyzed based on the overall content of said plurality of documents;
determining a first frequency corresponding to the frequency of said at least one quality of interest within said plurality of documents;
performing a second analysis of the plurality of documents to determine a second set of frequencies corresponding to the frequency of the at least one quality within each individual document;
normalizing each of said second frequencies relative to said first frequency to generate a weighting factor for each of said documents; and
generating structured data about the unstructured plurality of documents based on said weighting factor.
Dependent claims: 13, 14, 15, 16
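Claim 12 differs from claim 1 in that the quality of interest is derived from the documents themselves and the output is structured data rather than links. A minimal Python sketch of that variant follows. It assumes the initial analysis simply picks the most frequent non-stopword term across the whole collection and that the structured data is a list of records sorted by weight; the stopword list, function names, and record fields are hypothetical.

```python
from collections import Counter
import re

# Tiny illustrative stopword list (an assumption, not taken from the patent).
STOPWORDS = {"the", "a", "an", "and", "of", "in", "on", "to", "is", "was", "were"}

def tokenize(text):
    """Lowercase, split into alphanumeric tokens, and drop stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]

def pick_quality(documents):
    """Initial analysis: choose the most frequent term over all documents."""
    overall = Counter()
    for text in documents.values():
        overall.update(tokenize(text))
    if not overall:
        return None, 0
    # The count returned here serves as the first (collection-wide) frequency.
    return overall.most_common(1)[0]

def structured_records(documents):
    """Second analysis: emit one weighted record per document."""
    term, first_frequency = pick_quality(documents)
    if term is None:
        return []
    records = []
    for name, text in documents.items():
        second_frequency = tokenize(text).count(term)
        records.append({
            "document": name,
            "quality": term,
            "count": second_frequency,
            "weight": second_frequency / first_frequency,
        })
    return sorted(records, key=lambda r: r["weight"], reverse=True)

# Hypothetical unstructured collection.
documents = {
    "memo": "The pump failed. Pump seals were worn.",
    "invoice": "Replacement pump shipped on Monday.",
    "recipe": "Whisk the eggs and fold in the flour.",
}
records = structured_records(documents)
```

The resulting records could be loaded into a table or database, which is one way to read "structured data about the unstructured plurality of documents".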
17. A computer based apparatus for identifying interrelationships between documents within a grouping of a plurality of unrelated documents, comprising:
means for assembling a plurality of unrelated documents into a group for analysis; and
processor means for identifying at least one quality of interest to be analyzed, wherein said processor means first analyzes the group of documents to determine a first frequency of the at least one quality within the group, wherein said processor means then analyzes the group of documents to determine a second set of frequencies corresponding to the frequency of the at least one quality within each individual document, said processor normalizing each of said second frequencies relative to said first frequency to generate a weighting factor for each of said documents to generate relationship links based on said normalized second frequencies corresponding to said at least one quality of interest, said relationship links extending between documents that are weighted relative to the at least one quality of interest.
18. A computer based apparatus for identifying interrelationships between documents within a grouping of a plurality of unstructured and unrelated documents, comprising:
means for assembling a plurality of unrelated documents for analysis;
means for performing an initial analysis of said plurality of documents to identify at least one quality of interest to be analyzed based on the overall content of said plurality of documents;
means for determining a first frequency corresponding to the frequency of said at least one quality of interest within said plurality of documents;
means for performing a second analysis of the plurality of documents to determine a second set of frequencies corresponding to the frequency of the at least one quality within each individual document;
means for normalizing each of said second frequencies relative to said first frequency to generate a weighting factor for each of said documents; and
means for generating structured data about the unstructured plurality of documents based on said weighting factor.
Specification