Method and system for identifying relationships between text documents and structured variables pertaining to the text documents
Abstract
A method and system for identifying interesting relationships in text documents includes generating a dictionary of keywords in the text documents, forming categories of the text documents using the dictionary and an automated algorithm, counting occurrences of the structured variables, the categories, and the structured variable/category combinations in the text documents, and calculating probabilities of occurrences of the structured variable/category combinations.
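The counting and probability steps summarized above can be sketched as follows. This is a minimal illustration, not the patented implementation: the text here does not say which probability is calculated, so the sketch assumes a simple comparison between the observed joint frequency of each structured variable/category combination and the frequency expected if the two were independent. The function name `cooccurrence_stats` and the sample data are hypothetical.

```python
from collections import Counter


def cooccurrence_stats(records):
    """Count combinations and estimate their probabilities.

    `records` holds one (structured_value, category) pair per document.
    Returns {(value, category): (count, observed_p, expected_p)}, where
    expected_p assumes the value and the category are independent
    (an assumption of this sketch, not something stated in the source).
    """
    records = list(records)
    n = len(records)
    value_counts = Counter(value for value, _ in records)
    category_counts = Counter(category for _, category in records)
    pair_counts = Counter(records)

    stats = {}
    for (value, category), count in pair_counts.items():
        observed_p = count / n
        expected_p = (value_counts[value] / n) * (category_counts[category] / n)
        stats[(value, category)] = (count, observed_p, expected_p)
    return stats


# Hypothetical data: the structured variable is the day a document was
# received, and the category comes from a prior categorization step.
docs = [
    ("Mon", "printer"),
    ("Mon", "printer"),
    ("Tue", "password"),
    ("Tue", "printer"),
]
stats = cooccurrence_stats(docs)
for pair, (count, observed_p, expected_p) in sorted(stats.items()):
    print(pair, count, observed_p, expected_p)
```

A combination whose observed frequency differs markedly from the expected one is a candidate relationship. A production system would more likely apply a formal significance test than this raw comparison.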
23 Claims
1. A computer implemented method for identifying relationships between text documents and structured variables pertaining to said text documents, comprising:
providing a dictionary of keywords in said text documents;
forming categories of said text documents using said dictionary and an automated algorithm;
counting occurrences of said structured variables, said categories and combinations of said structured variables and said categories for said text documents;
calculating probabilities of occurrences of said combinations of structured variables and categories; and
identifying a relationship between a structured variable of said structured variables and text documents included in a category of said categories based on a probability of occurrence of a combination of said structured variable and said category.
Dependent claims: 2-16
17. A system for identifying relationships between text documents and structured variables pertaining to said text documents, comprising:
an input device for inputting text documents;
a processor for:
forming categories of said text documents;
counting occurrences of said structured variables, categories and combinations of structured variables and categories;
calculating probabilities of occurrence of said combinations of structured variables and categories; and
identifying a relationship between a structured variable of said structured variables and text documents included in a category of said categories based on a probability of occurrence of a combination of said structured variable and said category; and
a display, for displaying said probabilities.
Dependent claims: 18-22
23. A programmable storage medium tangibly embodying a program of machine-readable instructions executable by a digital processing apparatus to perform a method for identifying relationships between text documents and structured variables pertaining to said text documents, said method comprising:
providing a dictionary of keywords in said text documents;
forming categories of said text documents using said dictionary and an automated algorithm;
counting occurrences of said structured variables, said categories and said combinations of structured variables and categories in said text documents;
calculating probabilities of occurrences of said structured variable/category combinations; and
identifying a relationship between a structured variable of said structured variables and text documents included in a category of said categories based on a probability of occurrence of a combination of said structured variable and said category.
Specification