Text analysis techniques
First Claim
1. A method, comprising:
selecting a set of text documents;
selecting a number of terms included in the set;
establishing a multidimensional document space with a computer system as a function of the terms;
performing a bump hunting procedure with the computer system to identify a number of document space features, the features each corresponding to a composition of two or more concepts of the documents; and
deconvolving the features with the computer system to separately identify the concepts.
1 Assignment
0 Petitions
Abstract
One embodiment of the present invention includes means for determining a concept representation for a set of text documents based on partial order analysis and modifying this representation if it is determined to be unidentifiable. Furthermore, the embodiment includes means for labeling the representation, mapping documents to it to provide a corresponding document representation, generating a number of document signatures each of a different type, and performing several data processing applications each with a different one of the document signatures of differing types.
292 Citations
76 Claims
1. A method, comprising:
selecting a set of text documents;
selecting a number of terms included in the set;
establishing a multidimensional document space with a computer system as a function of the terms;
performing a bump hunting procedure with the computer system to identify a number of document space features, the features each corresponding to a composition of two or more concepts of the documents; and
deconvolving the features with the computer system to separately identify the concepts. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8)
9. A method, comprising:
performing a routine with a computer system, including;
extracting terminological features from a set of text documents;
establishing a representation of a number of concepts of the text documents as a function of the features, the representation corresponding to an arrangement of several levels to indicate different degrees of concept specificity; and
identifying a number of different related groups of the concepts, the groups each being mathematically determined as a function of a degree of separateness from the concept representation. - View Dependent Claims (10, 11, 12, 13)
14. A method, comprising:
performing a routine with a computer system, including;
extracting terminological features from a set of text documents;
establishing a representation of a number of concepts of the text documents as a function of the terminological features, the representation hierarchically indicating different degrees of specificity among related members of the concepts and corresponding to an acyclic graph organization;
determining the representation is nonidentifiable;
in response to said determining, constraining one or more processing parameters of the routine; and
providing a modified concept representation after said constraining, the modified concept representation being identifiable. - View Dependent Claims (15, 16, 17, 18)
19. A method, comprising:
performing a routine with a computer system, including;
extracting terminological features from a set of text documents;
establishing a representation of a number of concepts of the text documents as a function of the terminological features, the representation hierarchically indicating different degrees of specificity among related ones of the concepts in correspondence to different levels of an acyclic graph organization;
evaluating a selected document relative to the representation; and
generating a number of different document signatures for the selected document with the representation. - View Dependent Claims (20, 21, 22, 23, 24, 25)
26. A method, comprising:
selecting a set of text documents;
representing the documents with a number of terms;
identifying a number of multiterm features of the text documents as a function of frequency of each of the terms in each of the documents;
relating the multiterm features and the terms with one or more data structures corresponding to a sparse matrix;
performing a latent variable analysis as a function of the terms to determine a number of concepts of the text documents from the one or more data structures; and
providing a concept representation corresponding to a multilevel acyclic graph organization in which each node of the graph corresponds to one of the concepts. - View Dependent Claims (27, 28, 29, 30)
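Illustrative sketch only, not part of the claims: the step of relating terms and documents through data structures corresponding to a sparse matrix could look roughly like the Python below. The function name, the whitespace tokenization, and the dictionary-per-document layout are assumptions made for this example; the multiterm feature identification, the latent variable analysis, and the graph construction are not shown.

```python
from collections import Counter


def build_sparse_term_matrix(documents):
    """Return (vocabulary, rows); rows[i] maps a term index to its count in document i.

    Only nonzero counts are stored, which is what makes the layout sparse.
    """
    vocabulary = {}  # term -> column index (hypothetical layout)
    rows = []
    for text in documents:
        counts = Counter(text.lower().split())
        row = {}
        for term, count in counts.items():
            index = vocabulary.setdefault(term, len(vocabulary))
            row[index] = count
        rows.append(row)
    return vocabulary, rows


docs = ["solar power grid", "solar panel power power", "grid outage"]
vocabulary, rows = build_sparse_term_matrix(docs)
```

Here `rows[1]` holds an entry for "power" with count 2 and no entry at all for "grid", since that term does not occur in the second document.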
31. A method, comprising:
performing a routine with a computer system, including;
determining a number of multiterm features of a set of text documents as a function of a number of terms included in the set of text documents;
identifying one of a number of first level concepts of the text documents by determining each of the terms that is identified with one of the features;
establishing one of several second level concepts of the text documents by identifying one of the terms found in each member of a subset of the first level concepts; and
providing a concept representation of the text documents, the representation including the first level concepts and the second level concepts with the subset of the first level concepts being subordinate to the one of the second level concepts. - View Dependent Claims (32, 33, 34, 35, 36, 38, 39, 40)
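Illustrative sketch only, not part of the claims: one simple way to picture a second level concept being established from a term found in each member of a subset of first level concepts is a set intersection. The names `shared_terms` and `build_hierarchy`, and the representation of each first level concept as a set of terms, are assumptions for this example.

```python
def shared_terms(first_level_concepts, subset_names):
    """Return the terms found in every named first level concept."""
    term_sets = [first_level_concepts[name] for name in subset_names]
    return set.intersection(*term_sets)


def build_hierarchy(first_level_concepts, subset_names):
    """Map each shared term (a candidate second level concept) to its subordinates."""
    common = shared_terms(first_level_concepts, subset_names)
    return {term: sorted(subset_names) for term in sorted(common)}


# Hypothetical first level concepts, each given as the set of its terms.
first_level = {
    "c1": {"engine", "fuel", "car"},
    "c2": {"engine", "wing", "aircraft"},
    "c3": {"bank", "loan"},
}
hierarchy = build_hierarchy(first_level, ["c1", "c2"])
```

In this toy data the term "engine" appears in both `c1` and `c2`, so it becomes the second level concept with those two first level concepts beneath it.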
37. The method of claim 36, which includes labeling the one of the second level concepts with the one of the terms and the other of the terms.
41. A method, comprising:
identifying a number of events;
providing a visualization of the events with a computer system, the visualization including a number of visualization objects each representing a different one of the events;
positioning each of the visualization objects along a first axis to indicate timing of each of the events relative to one another with a corresponding initiation time and a corresponding termination time of each of the events being represented by an initiation point and termination point of each of the visualization objects along the first axis; and
dimensioning each of the visualization objects between the corresponding initiation point and the corresponding termination point along the first axis to indicate event duration and along a second axis to indicate relative strength of the different one of the events. - View Dependent Claims (42, 43, 44, 45, 46, 47, 48)
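Illustrative sketch only, not part of the claims: the positioning and dimensioning steps can be pictured as computing one rectangle per event, with the extent along the first axis running from initiation to termination and the extent along the second axis scaled by relative strength. The `Event` class, the `layout_events` function, and the normalization against the strongest event are assumptions for this example; no drawing library is used.

```python
from dataclasses import dataclass


@dataclass
class Event:
    name: str
    start: float     # initiation time
    end: float       # termination time
    strength: float  # any positive strength measure


def layout_events(events, max_height=1.0):
    """Return name -> (x, width, height) for each event.

    x and width lie on the first (time) axis; height lies on the second axis
    and is relative to the strongest event in the list.
    """
    strongest = max(event.strength for event in events)
    rectangles = {}
    for event in events:
        width = event.end - event.start  # event duration
        height = max_height * event.strength / strongest
        rectangles[event.name] = (event.start, width, height)
    return rectangles


events = [Event("a", 0.0, 4.0, 10.0), Event("b", 2.0, 3.0, 5.0)]
rectangles = layout_events(events)
```

The resulting tuples could be handed to any plotting tool that draws rectangles; event "b" comes out one quarter as wide and half as tall as event "a".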
49. A method, comprising:
providing a set of text documents;
evaluating time variation of a number of terms included in the documents;
generating a number of clusters corresponding to the documents with a computer system as a function of the terms; and
identifying a number of events as a function of a time variation of the clusters. - View Dependent Claims (50, 51, 52, 53, 54)
55. A method, comprising:
providing a number of textual documents arranged relative to a period of time;
identifying a feature with a time varying distribution among the documents;
evaluating presence of the feature for each of several different segments of the time period; and
detecting an event as a function of the one of the segments with a frequency of the feature greater than other of the segments and a quantity of the documents corresponding to the feature. - View Dependent Claims (56, 57, 58, 59, 60)
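Illustrative sketch only, not part of the claims: the segment-by-segment evaluation can be pictured as splitting the time period into equal segments, measuring how often the feature appears in each, and reporting the segment whose frequency exceeds all others, provided enough documents carry the feature. The function name, the equal-width segments, the single-word feature, and the `min_documents` threshold are assumptions for this example.

```python
def detect_event(documents, feature, period_start, period_end,
                 num_segments, min_documents):
    """documents is a list of (timestamp, text) pairs.

    Returns the index of the segment where the feature peaks, or None.
    Frequency here means the fraction of a segment's documents containing the feature.
    """
    width = (period_end - period_start) / num_segments
    totals = [0] * num_segments
    hits = [0] * num_segments
    for timestamp, text in documents:
        if not period_start <= timestamp < period_end:
            continue
        index = min(int((timestamp - period_start) / width), num_segments - 1)
        totals[index] += 1
        if feature in text.lower().split():
            hits[index] += 1
    frequencies = [h / t if t else 0.0 for h, t in zip(hits, totals)]
    peak = max(range(num_segments), key=lambda i: frequencies[i])
    is_unique_peak = frequencies.count(frequencies[peak]) == 1
    if is_unique_peak and hits[peak] >= min_documents:
        return peak
    return None


timeline = [
    (0, "rain today"), (1, "sunny"), (5, "storm warning"),
    (6, "storm damage"), (7, "storm cleanup"), (9, "calm"),
]
peak_segment = detect_event(timeline, "storm", 0, 10, 2, min_documents=2)
```

With this toy timeline the second segment contains three of the four "storm" documents' worth of activity, so it is reported; raising `min_documents` above three suppresses the detection.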
61. A method, comprising:
selecting a set of text documents;
designating several different dimensions of the documents;
characterizing each of the dimensions with a corresponding set of words;
for each of the dimensions, performing a clustering analysis of the documents based on the corresponding set of words; and
visualizing the clustering analysis for each of the dimensions. - View Dependent Claims (62, 63, 64)
65. A method, comprising:
in response to an input of one or more words in a computer system, providing a list of words with the computer system as a function of a number of context vectors for a set of text documents and the one or more words;
receiving another input responsive to the list;
reweighting a number of different entries corresponding to the context vectors with the computer system based on the second input;
providing an output of related words with the computer system based on said reweighting; and
repeating said receiving, said reweighting, and said providing with the computer system. - View Dependent Claims (66, 67, 68, 69, 70)
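Illustrative sketch only, not part of the claims: the interactive loop of listing words, receiving a response, and reweighting can be pictured with co-occurrence counts standing in for context vectors. All function names, the document-level co-occurrence measure, and the multiplicative `boost` are assumptions for this example; the source does not specify how context vectors are built or reweighted.

```python
from collections import defaultdict


def build_context_vectors(documents):
    """Map each word to counts of the other words sharing a document with it."""
    vectors = defaultdict(lambda: defaultdict(float))
    for text in documents:
        terms = set(text.lower().split())
        for term in terms:
            for other in terms:
                if other != term:
                    vectors[term][other] += 1.0
    return vectors


def related_words(vectors, query_words, weights, top_n=3):
    """Rank words by weighted co-occurrence with the query words."""
    scores = defaultdict(float)
    for word in query_words:
        for other, count in vectors.get(word, {}).items():
            if other not in query_words:
                scores[other] += count * weights.get(other, 1.0)
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_n]


def reweight(weights, selected_words, boost=3.0):
    """Return new weights with the user-selected entries boosted."""
    updated = dict(weights)
    for word in selected_words:
        updated[word] = updated.get(word, 1.0) * boost
    return updated


docs = ["jaguar car engine", "jaguar cat jungle", "jaguar car dealer"]
vectors = build_context_vectors(docs)
weights = {}
first_list = related_words(vectors, ["jaguar"], weights)
weights = reweight(weights, ["cat"])  # the user's response to the first list
second_list = related_words(vectors, ["jaguar"], weights)
```

Selecting "cat" from the first list moves it ahead of "car" in the second list; the receive, reweight, and provide steps can be repeated with each new response.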
71. An apparatus, comprising:
means for determining a concept representation for a set of text documents based on partial order analysis;
means for modifying the concept representation if it is determined to be unidentifiable;
means for labeling the concept representation;
means for mapping documents to the concept representation to provide a corresponding document representation;
means for generating several document signature types from the document representation; and
means for performing a number of data processing applications each with a document signature of a different one of the document signature types determined from the document representation. - View Dependent Claims (72)
73. An apparatus, comprising:
- a device carrying logic executable with a computer system to extract terminological features from a set of text documents;
establish a representation of a number of concepts of the text documents as a function of the terminological features, the representation hierarchically indicating different degrees of specificity among related ones of the concepts in correspondence to different levels of an acyclic graph organization;
evaluate a selected document relative to the representation; and
generate a number of different document signatures for the selected document with the representation. - View Dependent Claims (74, 75, 76)
Specification