Text analysis technique
Abstract
One embodiment of the present invention includes means for determining a concept representation for a set of text documents based on partial order analysis and for modifying this representation if it is determined to be unidentifiable. Furthermore, the embodiment includes means for labeling the representation, mapping documents to it to provide a corresponding document representation, generating a number of document signatures, each of a different type, and performing several data processing applications, each with a different one of the document signatures.
22 Claims
1. A method for text analysis, comprising:
selecting a set of text documents;
selecting a number of terms included in the set;
establishing a multidimensional document space with a computer system as a function of the terms;
performing a bump hunting procedure with the computer system to identify a number of document space features, the features each corresponding to a composition of two or more concepts of the documents; and
deconvolving the features with the computer system to separately identify the concepts, wherein the concepts are stored in memory of the computer system.
Dependent claims: 2, 3, 4, 5, 6, 7, 8.
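Claim 1 does not say how the bump hunting or the deconvolution is carried out. The sketch below is a minimal illustration under stated assumptions, not the patented method: the document space is a vector of raw term counts over the selected terms; "bump hunting" is approximated by keeping term pairs that co-occur in at least `min_support` documents (a dense region of the space); and "deconvolution" treats each feature as a union of concepts and splits it into the smallest pieces of that union, namely groups of terms that belong to exactly the same features. The function names and the `min_support` parameter are hypothetical.

```python
from itertools import combinations


def document_space(docs, terms):
    """Map each document to a vector of raw counts for the selected terms."""
    return [[doc.lower().split().count(t) for t in terms] for doc in docs]


def bump_hunt(space, terms, min_support=2):
    """Hypothetical stand-in for bump hunting: keep term pairs whose joint
    presence is shared by at least min_support documents (a dense region)."""
    features = []
    for i, j in combinations(range(len(terms)), 2):
        support = sum(1 for vec in space if vec[i] > 0 and vec[j] > 0)
        if support >= min_support:
            features.append(frozenset({terms[i], terms[j]}))
    return features


def deconvolve(features):
    """Treat each feature as a union of concepts and recover the smallest
    pieces: terms that appear in exactly the same features form one concept."""
    groups = {}
    for term in set().union(*features):
        membership = frozenset(k for k, f in enumerate(features) if term in f)
        groups.setdefault(membership, set()).add(term)
    return sorted((frozenset(g) for g in groups.values()), key=sorted)


if __name__ == "__main__":
    docs = ["gene expression", "gene expression data",
            "gene regulation", "gene regulation loop"]
    terms = ["gene", "expression", "regulation"]
    features = bump_hunt(document_space(docs, terms), terms)
    concepts = deconvolve(features)
    print([sorted(f) for f in features])  # [['expression', 'gene'], ['gene', 'regulation']]
    print([sorted(c) for c in concepts])  # [['expression'], ['gene'], ['regulation']]
```

In this toy run each of the two features is recovered as a composition of two concepts, with `gene` shared between them. The sketch only shows the shape of the data flow; a density-based bump hunter and a noise-tolerant deconvolution would look quite different.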
9. A method for text analysis, comprising:
performing a routine with a computer system, including:
extracting terminological features from a set of text documents by executing a bump hunting procedure;
establishing a representation of a number of concepts of the text documents as a function of the terminological features, the representation hierarchically indicating different degrees of specificity among related members of the concepts and corresponding to an acyclic graph organization;
determining the representation is nonidentifiable;
in response to said determining, constraining one or more processing parameters of the routine; and
providing a modified concept representation after said constraining, the modified concept representation being identifiable and stored in memory of the computer system, wherein the concepts are determined by executing a deconvolution procedure with respect to the features.
Dependent claims: 10, 11, 12.
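Claim 9 leaves both the identifiability test and the constrained parameter unspecified. The sketch below pictures the loop under hypothetical assumptions: candidate concepts are term sets of at most two terms that occur together in at least `min_support` documents; a representation counts as nonidentifiable when two distinct concepts cover exactly the same documents (the data cannot tell them apart); and "constraining" means raising `min_support` and rerunning. The hierarchy links a more general concept to a more specific one when its term set is a strict subset of the other's, so the result is an acyclic graph in which a concept may have several parents. All names are illustrative.

```python
from itertools import combinations


def extract_concepts(docs, min_support, max_terms=2):
    """Hypothetical routine: candidate concepts are term sets of up to
    max_terms terms found together in at least min_support documents.
    Returns {concept: frozenset of indices of the documents it covers}."""
    vocab = sorted(set().union(*docs))
    concepts = {}
    for size in range(1, max_terms + 1):
        for combo in combinations(vocab, size):
            concept = frozenset(combo)
            covered = frozenset(i for i, d in enumerate(docs) if concept <= d)
            if len(covered) >= min_support:
                concepts[concept] = covered
    return concepts


def is_identifiable(concepts):
    """Assumed criterion: two distinct concepts covering exactly the same
    documents cannot be told apart from the data."""
    coverages = list(concepts.values())
    return len(set(coverages)) == len(coverages)


def hierarchy(concepts):
    """General -> specific edges by strict subset, keeping only direct links.
    Strict inclusion cannot form a cycle, so the graph is acyclic."""
    cs = list(concepts)
    return {(g, s) for g in cs for s in cs
            if g < s and not any(g < m < s for m in cs)}


def identifiable_representation(docs, min_support=1):
    """Raise min_support until the representation is identifiable.
    Terminates: above len(docs) no concept survives, which is trivially
    identifiable."""
    while True:
        concepts = extract_concepts(docs, min_support)
        if is_identifiable(concepts):
            return min_support, concepts, hierarchy(concepts)
        min_support += 1


if __name__ == "__main__":
    texts = ["gene expression", "gene regulation", "expression regulation",
             "gene expression regulation", "gene loop"]
    docs = [set(t.split()) for t in texts]
    support, concepts, edges = identifiable_representation(docs)
    print(support, len(concepts), len(edges))  # 2 6 6
```

With these toy documents the first pass is rejected because `{loop}` and `{gene, loop}` cover the same single document. At `min_support = 2` the six surviving concepts have distinct coverage, and each two-term concept has two parents in the graph.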
13. A method for text analysis, comprising:
performing a routine with a computer system, including:
extracting terminological features from a set of text documents by executing a bump hunting procedure;
establishing a representation of a number of concepts of the text documents as a function of the terminological features, the representation hierarchically indicating different degrees of specificity among related ones of the concepts in correspondence to different levels of an acyclic graph organization;
evaluating a selected document relative to the representation; and
generating and storing in memory of the computer system a number of different document signatures for the selected document with the representation, wherein the concepts are determined by executing a deconvolution procedure with respect to the features.
Dependent claims: 14, 15, 16, 17, 18.
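Claim 13 does not list which signature types are generated. As a hypothetical illustration, the sketch below derives three differently typed signatures for one selected document against a fixed list of concepts (term sets): a binary presence vector, a fractional term-coverage vector, and the most specific concepts the document fully contains (matched concepts with no matched, more specific superset). The signature names and the function name are assumptions.

```python
def document_signatures(doc_terms, concepts):
    """Return three hypothetical signature types for one document.

    doc_terms: set of terms in the selected document.
    concepts:  list of non-empty frozensets of terms, in a fixed order.
    """
    # Type 1: 1 if every term of the concept occurs in the document, else 0.
    binary = [int(c <= doc_terms) for c in concepts]
    # Type 2: fraction of each concept's terms that occur in the document.
    coverage = [len(c & doc_terms) / len(c) for c in concepts]
    # Type 3: fully matched concepts with no more specific matched concept.
    matched = [c for c, hit in zip(concepts, binary) if hit]
    most_specific = [c for c in matched if not any(c < o for o in matched)]
    return {"binary": binary, "coverage": coverage,
            "most_specific": most_specific}


if __name__ == "__main__":
    concepts = [frozenset({"gene"}), frozenset({"expression"}),
                frozenset({"gene", "expression"}),
                frozenset({"gene", "regulation"})]
    sigs = document_signatures({"gene", "expression", "data"}, concepts)
    print(sigs["binary"])    # [1, 1, 1, 0]
    print(sigs["coverage"])  # [1.0, 1.0, 1.0, 0.5]
```

Because each signature has a different type (integer vector, real vector, set of concepts), each could feed a different downstream application, which is the pattern the abstract describes.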
19. A method for text analysis, comprising:
selecting a set of text documents;
representing the documents with a number of terms;
identifying a number of multiterm features of the text documents with a computer system as a function of frequency of each of the terms in each of the documents;
relating the multiterm features and the terms with one or more data structures corresponding to a sparse matrix with the computer system;
performing a latent variable analysis as a function of the terms to determine a number of concepts of the text documents from the one or more data structures with the computer system; and
providing and storing in memory of the computer system a concept representation corresponding to a multilevel acyclic graph organization in which each node of the graph corresponds to one of the concepts, wherein the identifying is via a bump hunting procedure; and wherein the latent variable analysis includes deconvolving the features to determine the concepts.
Dependent claims: 20, 21, 22.
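Claim 19 relates multiterm features and terms through data structures "corresponding to a sparse matrix" without fixing a format. The sketch below assumes two common choices: a dictionary-of-keys map for per-document term frequencies, and compressed sparse row (CSR) arrays (`indptr`, `indices`, `data`, the same three-array layout SciPy's `csr_matrix` accepts) for a binary feature-by-term incidence matrix. It covers only these structures; the latent variable analysis is not shown. Function names are hypothetical.

```python
from collections import Counter


def term_frequencies(docs, terms):
    """Sparse (dict-of-keys) document-term counts: {(doc, term_index): count}.
    Only non-zero counts for the selected terms are stored."""
    index = {t: j for j, t in enumerate(terms)}
    dok = {}
    for i, doc in enumerate(docs):
        for term, n in Counter(doc.lower().split()).items():
            if term in index:
                dok[(i, index[term])] = n
    return dok


def feature_term_csr(features, terms):
    """CSR arrays for a binary feature-by-term matrix.
    Row k holds the column indices of the terms in features[k];
    every feature term must appear in terms."""
    index = {t: j for j, t in enumerate(terms)}
    indptr, indices, data = [0], [], []
    for feature in features:
        cols = sorted(index[t] for t in feature)
        indices.extend(cols)
        data.extend([1] * len(cols))
        indptr.append(len(indices))
    return indptr, indices, data


if __name__ == "__main__":
    terms = ["gene", "expression", "regulation"]
    features = [("gene", "expression"), ("gene", "regulation")]
    indptr, indices, data = feature_term_csr(features, terms)
    print(indptr, indices)  # [0, 2, 4] [0, 1, 0, 2]
    # Row k's term columns are indices[indptr[k]:indptr[k + 1]].
    print([terms[j] for j in indices[indptr[1]:indptr[2]]])  # ['gene', 'regulation']
```

CSR keeps each feature's terms contiguous, so listing the terms of one feature is a single slice; the dictionary-of-keys form is simpler to build incrementally while counting.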
Specification