GRAPHICAL MODELS FOR REPRESENTING TEXT DOCUMENTS FOR COMPUTER ANALYSIS
Abstract
In a method for representing a text document with a graphical model, a document including a plurality of ordered words is received and a graph data structure for the document is created. The graph data structure includes a plurality of nodes and edges, with each node representing a distinct word in the document and each edge identifying a number of times two nodes occur within a predetermined distance from each other. The graph data structure is stored in an information repository.
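The graph construction described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation. The function name build_cooccurrence_graph, the parameter max_distance, whitespace tokenization, undirected edges, and the skipping of self-loops are all assumptions made for illustration; the source only states that nodes are distinct words and that each edge records how many times two nodes occur within a predetermined distance of each other.

```python
from collections import Counter


def build_cooccurrence_graph(words, max_distance=2):
    """Build a word co-occurrence graph from an ordered list of words.

    Returns (nodes, edges). nodes is the set of distinct words; edges maps
    an unordered word pair to the number of times the two words occur
    within max_distance positions of each other.
    All names here are hypothetical, chosen for this sketch.
    """
    nodes = set(words)
    edges = Counter()
    for i, first in enumerate(words):
        # Look ahead at most max_distance positions so each pair of
        # positions is counted exactly once.
        for j in range(i + 1, min(i + max_distance + 1, len(words))):
            second = words[j]
            if first == second:
                continue  # assumption: an edge joins two different nodes
            # Assumption: edges are undirected, so the pair is sorted.
            edges[tuple(sorted((first, second)))] += 1
    return nodes, edges


words = "the cat sat on the mat".split()  # assumption: whitespace tokens
nodes, edges = build_cooccurrence_graph(words, max_distance=2)
print(len(nodes))              # 5 distinct words
print(edges[("sat", "the")])   # 2
```

In this example "the" appears twice but yields a single node, and the pair ("sat", "the") receives a count of 2 because "sat" lies within two positions of both occurrences of "the".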
20 Claims
1. A method, comprising:
receiving a document including a plurality of ordered words;
creating a graph data structure for the document, wherein the graph data structure includes a plurality of nodes and edges, each node representing a distinct word in the document and each edge identifying a number of times two nodes occur within a predetermined distance from each other;
storing the graph data structure in an information repository;
receiving a request to perform text analysis on the document; and
performing text analysis on the graph data structure and providing a result that is responsive to the request.
Dependent claims: 2, 3, 4, 5
6. A method, comprising:
receiving a graph data structure of a document that includes a plurality of words, wherein the graph data structure includes a plurality of nodes and edges, each node representing a distinct word in the document, each edge identifying a number of times two nodes occur within a predetermined distance from each other;
constructing a vector-space representation of the document by assigning a unique token to each edge in the graph data structure, wherein for each edge, a frequency of the token is equivalent to the number of times the two nodes occur with the predetermined distance from each other, wherein the vector-space representation contains the tokens; and
outputting the vector-space representation.
Dependent claims: 7, 8, 9, 10
11. A computer program product, comprising:
a computer readable storage medium having computer readable program code embodied therewith, the computer readable program code comprising;
computer readable program code configured to receive a document including a plurality of ordered words; and
computer readable program code configured to create a graph data structure for the document, wherein the graph data structure includes a plurality of nodes and edges, each node representing a distinct word in the document and each edge identifying a number of times two nodes occur within a predetermined distance from each other.
Dependent claims: 12, 13, 14, 15
16. A computer program product, comprising:
a computer readable storage medium having computer readable program code embodied therewith, the computer readable program code comprising;
computer readable program code configured to receive a graph data structure of a document that includes a plurality of words, wherein the graph data structure includes a plurality of nodes and edges, each node representing a distinct word in the document, each edge identifying a number of times two nodes occur within a predetermined distance from each other; and
computer readable program code configured to construct a vector-space representation of the document by assigning a unique token to each edge in the graph data structure, wherein for each edge, a frequency of the token is equivalent to the number of times the two nodes occur with the predetermined distance from each other, wherein the vector-space representation contains the tokens.
Dependent claims: 17, 18, 19, 20
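The vector-space construction recited in claims 6 and 16 can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the function name graph_to_vector_space, the separator parameter, the dictionary-based sparse vector, and the representation of edges as a mapping from word pairs to counts are all hypothetical. The claims only require that each edge receive a unique token whose frequency equals the edge's co-occurrence count.

```python
def graph_to_vector_space(edges, separator="__"):
    """Turn edge counts into a sparse token-frequency vector.

    edges maps a (word, word) pair to its co-occurrence count.
    Each edge becomes one token; the token's frequency is the edge count.
    Assumption: separator never appears inside a word, so every edge
    maps to a unique token.
    """
    vector = {}
    for (first, second), count in edges.items():
        token = f"{first}{separator}{second}"
        vector[token] = count
    return vector


# Hypothetical edge counts for a small document graph.
edges = {("cat", "the"): 1, ("sat", "the"): 2}
vector = graph_to_vector_space(edges)
print(vector)  # {'cat__the': 1, 'sat__the': 2}
```

A sparse dictionary is used here because most word pairs never co-occur; a dense vector over a fixed token vocabulary would carry the same information.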
Specification