PROCESS FOR IDENTIFYING WEIGHTED CONTEXTURAL RELATIONSHIPS BETWEEN UNRELATED DOCUMENTS

US 20060200461A1
Filed: 01/27/2006
Published: 09/07/2006
Est. Priority Date: 03/01/2005
Status: Abandoned Application

Abstract

A system builds a network from a document collection, in which the documents are collected and represented as a plurality of nodes in a network matrix. Each document to be analyzed is bound to the network (corpus) at a discrete node corresponding to that document. The documents are then analyzed to determine the frequency of a term within each document and the overall frequency of the same term throughout the entire document grouping. This yields a weighting value that indicates the relevancy of each document relative to the entire network of documents. Finally, the weighting values are normalized to relative weighting values so that the sum of the weights of all edges connected to a given node equals 1. User queries then proceed through the network from node to node, using the algorithm of the present invention, to locate documents relevant to the search.
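
The normalization step in the abstract can be illustrated with a short sketch. The Python below is not taken from the application: the adjacency-dictionary layout and the name `normalize_edges` are assumptions made for illustration. It also assumes a per-node (directed) view, so the weight from one document to another may differ from the weight in the reverse direction.

```python
from typing import Dict

# Hypothetical layout: node -> {neighbor: raw edge weight}
Graph = Dict[str, Dict[str, float]]


def normalize_edges(graph: Graph) -> Graph:
    """Rescale the weights on each node's edges so that they sum to 1."""
    normalized: Graph = {}
    for node, edges in graph.items():
        total = sum(edges.values())
        if total == 0:
            # Isolated node or all-zero weights: nothing to rescale.
            normalized[node] = dict(edges)
            continue
        normalized[node] = {nbr: w / total for nbr, w in edges.items()}
    return normalized


raw = {
    "doc_a": {"doc_b": 2.0, "doc_c": 6.0},
    "doc_b": {"doc_a": 2.0},
    "doc_c": {"doc_a": 6.0},
}
weights = normalize_edges(raw)
print(weights["doc_a"])  # {'doc_b': 0.25, 'doc_c': 0.75}
```

After this step every node with at least one weighted edge distributes a total weight of exactly 1 across its edges, which is the property the abstract states.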

18 Claims

1. A computer based method for identifying interrelationships between documents within a grouping of a plurality of unrelated documents, comprising the steps of:
- assembling a plurality of unrelated documents into a group for analysis;
  
  identifying at least one quality of interest to be analyzed;
  
  analyzing the group of documents to determine a first frequency of the at least one quality within the group;
  
  analyzing the group of documents to determine a second set of frequencies corresponding to the frequency of the at least one quality within each individual document;
  
  normalizing each of said second frequencies relative to said first frequency to generate a weighting factor for each of said documents; and
  
  generating relationship links based on said normalized second frequencies corresponding to said at least one quality of interest, said relationship links extending between documents that are weighted relative to the at least one quality of interest.
- Dependent claims 2 to 11:
  - 2. The method of claim 1, wherein said at least one quality of interest comprises a plurality of qualities of interest and said step of generating relationship links includes generating discrete sets of relationship links, each of said sets of links corresponding to each of said qualities of interest within said plurality of qualities of interest.
  - 3. The method of claim 1, further comprising the steps of:
    - reviewing the content of each of said plurality of documents to identify which of those documents contain sufficient textual content for analysis; and
      
      eliminating documents from said plurality of documents that do not contain sufficient textual content.
  - 4. The method of claim 1, wherein said quality of interest comprises a plurality of terms, said plurality of terms including a word, roots of said word, thesaurus equivalents of said word, and roots of said thesaurus equivalents of said word.
  - 5. The method of claim 1, further comprising the step of:
    - searching said plurality of documents using one of said qualities of interest using an entropic algorithm wherein said scope of said search is limited by dissipation of an initial activation value, said dissipation determined by subtracting the weighting value of each relationship link followed in the search from the initial activation value.
  - 6. The method of claim 1 wherein the documents comprise unstructured data.
  - 7. The method of claim 6 wherein the documents comprise free-form text.
  - 8. The method of claim 1 wherein the documents comprise images.
  - 9. The method of claim 2 wherein said plurality of qualities of interest is identified based on the relative frequency of said qualities of interest relative to all of the qualities contained within said plurality of documents.
  - 10. The method of claim 9 wherein said qualities of interest comprise single word entries.
  - 11. The method of claim 9 wherein said qualities of interest comprise a phrase.
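
The steps of claim 1 can be sketched in Python as follows. This is one possible reading, not the applicants' implementation: it assumes the "quality of interest" is a single term, that "frequency" means a raw count, that normalizing means dividing each per-document count by the group-wide count, and that a link joins every pair of documents containing the term, with the smaller of the two document weights as the link weight. The names `tokenize`, `document_weights`, and `relationship_links` are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple

PUNCTUATION = ".,;:!?()\"'"


def tokenize(text: str) -> List[str]:
    # Assumption: lowercase whitespace-separated tokens, edge punctuation stripped.
    stripped = (token.strip(PUNCTUATION).lower() for token in text.split())
    return [token for token in stripped if token]


def document_weights(docs: Dict[str, str], term: str) -> Dict[str, float]:
    """Divide each document's count of `term` by the count across the group."""
    # Second set of frequencies: one count per document.
    per_doc = {name: Counter(tokenize(text))[term] for name, text in docs.items()}
    # First frequency: the count within the whole group.
    group_total = sum(per_doc.values())
    if group_total == 0:
        return {name: 0.0 for name in docs}
    return {name: count / group_total for name, count in per_doc.items()}


def relationship_links(weights: Dict[str, float]) -> Dict[Tuple[str, str], float]:
    """Link every pair of documents that both contain the term (assumed rule)."""
    linked = sorted(name for name, weight in weights.items() if weight > 0)
    # Assumption: a link's weight is the smaller of the two document weights.
    return {(a, b): min(weights[a], weights[b]) for a, b in combinations(linked, 2)}


docs = {
    "memo": "Turbine blade cracks found. Blade replaced.",
    "invoice": "Replacement blade shipped.",
    "menu": "Soup and salad.",
}
weights = document_weights(docs, "blade")
links = relationship_links(weights)
print(weights)
print(links)
```

In this toy group the memo carries two thirds of the occurrences of "blade" and the invoice one third, so a single link joins them, while the menu receives a weight of zero and no link.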

12. A computer based method for identifying interrelationships between documents within a grouping of a plurality of unstructured and unrelated documents, comprising the steps of:
- assembling a plurality of unrelated documents for analysis;
  
  performing an initial analysis of said plurality of documents to identify at least one quality of interest to be analyzed based on the overall content of said plurality of documents;
  
  determining a first frequency corresponding to the frequency of said at least one quality of interest within said plurality of documents;
  
  performing a second analysis of the plurality of documents to determine a second set of frequencies corresponding to the frequency of the at least one quality within each individual document;
  
  normalizing each of said second frequencies relative to said first frequency to generate a weighting factor for each of said documents; and
  
  generating structured data about the unstructured plurality of documents based on said weighting factor.
- Dependent claims 13 to 16:
  - 13. The method of claim 12, wherein said at least one quality of interest comprises a plurality of qualities of interest and said step of generating structured data includes generating discrete sets of structured data corresponding to each of said qualities of interest within said plurality of qualities of interest.
  - 14. The method of claim 12 further comprising the steps of:
    - reviewing the content of each of said plurality of documents to identify which of those documents contain sufficient textual content for analysis; and
      
      eliminating documents from said plurality of documents that do not contain sufficient textual content.
  - 15. The method of claim 12, wherein said quality of interest comprises a plurality of terms, said plurality of terms including a word, roots of said word, thesaurus equivalents of said word, and roots of said thesaurus equivalents of said word.
  - 16. The method of claim 12, further comprising the step of:
    - searching said plurality of documents using one of said qualities of interest using an entropic algorithm wherein said scope of said search is limited by dissipation of an initial activation value by subtracting said weighting values from said initial activation value as said search passes through said structured data.
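
The search recited in claims 5 and 16 can be approximated by a traversal in which an activation value shrinks by the weight of every link followed and the search stops once nothing is left. The sketch below follows that wording literally. The graph layout, the name `activation_search`, and the choice to keep the largest remaining activation per document are assumptions, and link weights are assumed to be non-negative.

```python
import heapq
from typing import Dict, List, Tuple

# Hypothetical layout: document -> {linked document: link weight}
Graph = Dict[str, Dict[str, float]]


def activation_search(graph: Graph, start: str, activation: float) -> Dict[str, float]:
    """Return each reachable document with the largest activation left on arrival."""
    best: Dict[str, float] = {start: activation}
    heap: List[Tuple[float, str]] = [(-activation, start)]  # max-heap via negation
    while heap:
        neg_remaining, node = heapq.heappop(heap)
        remaining = -neg_remaining
        if remaining < best.get(node, float("-inf")):
            continue  # stale entry: a better path to this node was already expanded
        for neighbor, weight in graph.get(node, {}).items():
            left = remaining - weight  # dissipation: subtract the link's weight
            if left <= 0:
                continue  # activation exhausted, so the search stops on this path
            if left > best.get(neighbor, 0.0):
                best[neighbor] = left
                heapq.heappush(heap, (-left, neighbor))
    return best


graph = {
    "a": {"b": 0.25, "c": 0.75},
    "b": {"a": 0.5, "d": 0.5},
    "c": {"a": 1.0},
    "d": {"b": 0.5, "e": 0.5},
    "e": {"d": 1.0},
}
reached = activation_search(graph, "a", 1.0)
print(reached)  # {'a': 1.0, 'b': 0.75, 'c': 0.25, 'd': 0.25}
```

Document "e" is never returned: by the time the search arrives at "d" only 0.25 of the activation remains, which is less than the 0.5 weight of the link onward. This is how the initial activation value limits the scope of the search.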

17. A computer based apparatus for identifying interrelationships between documents within a grouping of a plurality of unrelated documents, comprising:
- means for assembling a plurality of unrelated documents into a group for analysis; and
  
  processor means for identifying at least one quality of interest to be analyzed, wherein said processor means first analyzes the group of documents to determine a first frequency of the at least one quality within the group, wherein said processor means then analyzes the group of documents to determine a second set of frequencies corresponding to the frequency of the at least one quality within each individual document, said processor normalizing each of said second frequencies relative to said first frequency to generate a weighting factor for each of said documents to generate relationship links based on said normalized second frequencies corresponding to said at least one quality of interest, said relationship links extending between documents that are weighted relative to the at least one quality of interest.

18. A computer based apparatus for identifying interrelationships between documents within a grouping of a plurality of unstructured and unrelated documents, comprising:
- means for assembling a plurality of unrelated documents for analysis;
  
  means for performing an initial analysis of said plurality of documents to identify at least one quality of interest to be analyzed based on the overall content of said plurality of documents;
  
  means for determining a first frequency corresponding to the frequency of said at least one quality of interest within said plurality of documents;
  
  means for performing a second analysis of the plurality of documents to determine a second set of frequencies corresponding to the frequency of the at least one quality within each individual document;
  
  means for normalizing each of said second frequencies relative to said first frequency to generate a weighting factor for each of said documents; and
  
  means for generating structured data about the unstructured plurality of documents based on said weighting factor.
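
Claims 12 and 18 add an initial analysis that picks the qualities of interest from the overall content, then emit structured data from the weighting factors. A minimal Python sketch under stated assumptions: qualities are single terms ranked by their share of all tokens in the group, the top `top_k` are kept, and the structured data is a flat list of records. The names `qualities_of_interest` and `structured_rows` and the record fields are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def tokens(text: str) -> List[str]:
    # Assumption: lowercase whitespace tokens. Real text would also need
    # punctuation handling and a stop-word list.
    return text.lower().split()


def qualities_of_interest(docs: Dict[str, str], top_k: int) -> List[Tuple[str, float]]:
    """Initial analysis: rank terms by their share of all tokens in the group."""
    counts = Counter(token for text in docs.values() for token in tokens(text))
    total = sum(counts.values())
    return [(term, count / total) for term, count in counts.most_common(top_k)]


def structured_rows(docs: Dict[str, str], terms: List[str]) -> List[Dict[str, object]]:
    """Second analysis: one record per (term, document) with a normalized weight."""
    rows: List[Dict[str, object]] = []
    for term in terms:
        per_doc = {name: tokens(text).count(term) for name, text in docs.items()}
        group_total = sum(per_doc.values())
        for name, count in per_doc.items():
            if count:  # skip documents that do not contain the term
                rows.append({"term": term, "document": name, "weight": count / group_total})
    return rows


docs = {
    "report": "pump failure pump seal",
    "email": "seal leak near pump",
    "note": "lunch order",
}
top = qualities_of_interest(docs, top_k=2)
rows = structured_rows(docs, [term for term, _ in top])
for row in rows:
    print(row)
```

Here "pump" and "seal" are selected as the two most frequent terms, and the output holds four records tying each term to the documents that contain it. The unrelated note produces no records.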

Current Assignee: Iquest Analytics, Inc.
Original Assignee: Iquest Analytics, Inc.
Inventors: Lucas, Marshall D.; Lucas, Don M.; Rosenthal, Joseph S.

Application Number: US11/275,771
Publication Number: US 20060200461A1

Field of Search
US Class Current: 1/1
CPC Class Codes:
G06F 16/334   Query execution G06F16/335 ...
G06F 16/355   Class or cluster creation o...
G06F 40/284   Lexical analysis, e.g. toke...
G06F 40/30   Semantic analysis
