Extensible database framework for management of unstructured and semi-structured documents
Abstract
Method and system for querying a collection of unstructured or semi-structured documents to identify presence of, and provide context and/or content for, keywords and/or keyphrases. The documents are analyzed and assigned a node structure, including an ordered sequence of mutually exclusive node segments or strings. Each node has an associated set of at least four, five or six attributes with node information and can represent a format marker or text, with the last node in any node segment usually being a text node. A keyword (or keyphrase) is specified, and the last node in each node segment is searched for a match with the keyword. When a match is found at a query node, or at a node determined with reference to a query node, the system displays the context and/or the content of the query node.
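The node structure described in the abstract can be pictured with a minimal sketch. The class name `Node`, its field names, the data type values "format" and "text", and the helper `find_query_nodes` are all assumptions made for illustration; the source names only the six attributes, not any concrete representation. The sketch also simplifies by matching any node that carries text, rather than only the last node of each node segment.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical node record holding the six attributes listed in the claims."""
    node_id: int                         # first attribute: unique number
    label: str                           # second attribute: descriptive label
    data_type: str                       # third attribute: assumed values "format" or "text"
    text: Optional[str] = None           # fourth attribute: text data, if any
    parent_label: Optional[str] = None   # fifth attribute: label of a parent node, if any
    sibling_label: Optional[str] = None  # sixth attribute: label of a sibling node, if any


def find_query_nodes(nodes: List[Node], keyword: str) -> List[Node]:
    """Return nodes whose text attribute contains the keyword (case-insensitive)."""
    needle = keyword.lower()
    return [n for n in nodes if n.text and needle in n.text.lower()]


# An ordered sequence of nodes for one small, made-up document.
doc = [
    Node(1, "h1", "format"),
    Node(2, "h1-text", "text", "Budget", parent_label="h1"),
    Node(3, "p", "format", sibling_label="h1"),
    Node(4, "p-text", "text", "The budget for 2003 is fixed.", parent_label="p"),
]
hits = find_query_nodes(doc, "budget")
```

Here `hits` holds the two text nodes; format-marker nodes carry no text and are never matched.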
20 Claims
1. A computer-implemented method for querying a collection of unstructured documents, the method comprising:
(1) providing an unstructured collection including at least one document;
(2) associating with each document in the collection a connected node structure including an ordered sequence of document nodes, with each node being labeled by a document node indicium that provides information on at least four of the following attributes associated with the node and corresponding to at least one document;
(1) a first attribute that allows identification of a unique number associated with the node;
(2) a second attribute that specifies a descriptive label for the node;
(3) a third attribute that specifies data type for the node, from among at least two selected data types, and indicates processing requirements for the node;
(4) a fourth attribute that provides text data, if any, associated with the node;
(5) a fifth attribute that specifies a node label, if any, for a node, if any, that serves as a parent node for the node; and
(6) a sixth attribute that specifies a node label, if any, for a node, if any, that serves as a sibling node for the node, where information from the fourth attribute is included in the node indicium;
(3) receiving a query, including at least one query keyword, for the collection of documents, and specifying at least one of keyword context and keyword content;
(4) determining a set of query nodes in the node structure, each of which contains at least one occurrence of the keyword in the fourth attribute;
(5) providing information on at least one selected fourth attribute containing the keyword, for at least one query node in the query node set;
(6) determining if the query specifies context for the keyword;
(7) when the query specifies context for the keyword, determining if the query node provides context for the keyword;
(8) when the query node does not provide context for the keyword, replacing the query node by a left-adjacent node as a new query node, and returning to step (7) at least once;
(9) when the query node provides context for the keyword, adding the query node to a context list, and returning to step (5) at least once;
(10) determining if the query specifies content for the keyword;
(11) when the query specifies content for the keyword, determining if the query node provides content for the keyword;
(12) when the query node does not provide content for the keyword, replacing the query node by at least one of a right-adjacent node and a selected child node as a new query node, and returning to step (11) at least once; and
(13) when the query node provides content for the keyword, adding the query node to a content list, and returning to said step (5) at least once.
Dependent claims: 2, 3, 4
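Steps (4) through (13) of claim 1 can be sketched as a pair of walks over the ordered node sequence: leftward for context, rightward for content. This is an illustrative reading, not the patented implementation. The claim does not define what it means for a node to "provide" context or content, so the predicates `is_context` and `is_content` are caller-supplied assumptions, as are the names `Node`, `walk`, and `run_query`. The sketch treats the document as a flat list and therefore covers only the right-adjacent branch of step (12), not the selected-child branch.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical node carrying only the attributes this sketch needs."""
    node_id: int
    label: str
    text: Optional[str] = None


def walk(nodes: List[Node], start: int, step: int,
         accepts: Callable[[Node], bool]) -> Optional[Node]:
    """From index `start`, move by `step` (-1 left, +1 right) until a node is accepted."""
    i = start
    while 0 <= i < len(nodes):
        if accepts(nodes[i]):
            return nodes[i]
        i += step
    return None  # ran off the end of the sequence without finding a match


def run_query(nodes: List[Node], keyword: str, want_context: bool, want_content: bool,
              is_context: Callable[[Node], bool],
              is_content: Callable[[Node], bool]) -> Tuple[List[Node], List[Node]]:
    """Collect a context list and a content list for every node matching the keyword."""
    context_list: List[Node] = []
    content_list: List[Node] = []
    needle = keyword.lower()
    for i, node in enumerate(nodes):
        if not (node.text and needle in node.text.lower()):
            continue  # step (4): only nodes whose text contains the keyword
        if want_context:  # steps (6)-(9): test the node, else move left and retry
            found = walk(nodes, i, -1, is_context)
            if found is not None:
                context_list.append(found)
        if want_content:  # steps (10)-(13): test the node, else move right and retry
            found = walk(nodes, i, +1, is_content)
            if found is not None:
                content_list.append(found)
    return context_list, content_list


doc = [
    Node(1, "heading", "Budget"),
    Node(2, "paragraph", "Costs rose."),
    Node(3, "heading", "Schedule"),
    Node(4, "paragraph", "The budget review is in May."),
]
context, content = run_query(
    doc, "budget", want_context=True, want_content=True,
    is_context=lambda n: n.label == "heading",      # assumption: headings give context
    is_content=lambda n: n.label == "paragraph",    # assumption: paragraphs give content
)
```

With these assumed predicates, the match in the heading "Budget" takes itself as context and the following paragraph as content, while the match in the last paragraph takes the heading "Schedule" to its left as context and itself as content.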
5. A computer-implemented method for querying a collection of unstructured documents, the method comprising:
(1) providing an unstructured collection including at least one document;
(2) associating with each document in the collection a connected node structure including an ordered sequence of document nodes, with each node being labeled by a document node indicium that provides information on no more than four of the following attributes associated with the node;
(1) a first attribute that allows identification of a unique number associated with the node;
(2) a second attribute that specifies a descriptive label for the node;
(3) a third attribute that specifies data type for the node, from among at least two selected data types, and indicates processing requirements for the document node;
(4) a fourth attribute that provides text data, if any, associated with the node;
(5) a fifth attribute that specifies a node label, if any, for a node, if any, that serves as a parent node for the node; and
(6) a sixth attribute that specifies a node label, if any, for a node, if any, that serves as a sibling node for the node, where information from the fourth attribute is included in the node indicium;
(3) receiving a query, including at least one query keyword, for the collection of documents, and specifying at least one of context and content for the keyword;
(4) determining a set of query nodes in the node structure, each of which contains at least one occurrence of the keyword in the fourth attribute;
(5) providing information on at least one selected fourth attribute containing the keyword, for at least one query node in the query node set;
(6) determining if the query specifies context for the keyword;
(7) when the query specifies context for the keyword, determining if the query node provides context for the keyword;
(8) when the query node does not provide context for the keyword, replacing the query node by a left-adjacent node as a new query node, and returning to step (7) at least once;
(9) when the query node provides context for the keyword, adding the query node to a context list, and returning to step (5) at least once;
(10) determining if the query specifies content for the keyword;
(11) when the query specifies content for the keyword, determining if the query node provides content for the keyword;
(12) when the query node does not provide content for the keyword, replacing the query node by at least one of a right-adjacent node and a selected child node as a new query node, and returning to step (11) at least once; and
(13) when the query node provides content for the keyword, adding the query node to a content list, and returning to said step (5) at least once.
Dependent claims: 6, 7
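The independent claims differ mainly in how many of the six attributes the node indicium carries ("at least four", "no more than four", "no more than five"), with the fourth (text) attribute always included. A small sketch of building such a restricted indicium follows. The attribute names in `ATTRIBUTES`, the dictionary representation, and the function `make_indicium` are hypothetical; the source does not prescribe a format for the indicium.

```python
from typing import Dict, Iterable

# Assumed names for the six attributes, in the order the claims list them.
ATTRIBUTES = ("node_id", "label", "data_type", "text", "parent_label", "sibling_label")


def make_indicium(node: Dict[str, object], selected: Iterable[str],
                  max_attrs: int = 4) -> Dict[str, object]:
    """Build an indicium from at most `max_attrs` attributes, always including text."""
    chosen = set(selected)
    unknown = chosen - set(ATTRIBUTES)
    if unknown:
        raise ValueError(f"unknown attributes: {sorted(unknown)}")
    if "text" not in chosen:
        raise ValueError("the fourth (text) attribute must be included")
    if len(chosen) > max_attrs:
        raise ValueError(f"indicium limited to {max_attrs} attributes")
    # Keep the claim's attribute order so indicia are comparable across nodes.
    return {name: node.get(name) for name in ATTRIBUTES if name in chosen}


node = {
    "node_id": 7, "label": "p-text", "data_type": "text",
    "text": "The budget is fixed.", "parent_label": "p", "sibling_label": None,
}
indicium = make_indicium(node, ("node_id", "label", "data_type", "text"))
```

Passing `max_attrs=5` would correspond to the "no more than five" variant; the "at least four" variant would need a lower bound instead, which this sketch does not implement.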
8. A computer-implemented method for querying a collection of unstructured documents, the method comprising:
(1) providing an unstructured collection including at least one document;
(2) associating with each document in the collection a connected node structure including an ordered sequence of document nodes, with each node being labeled by a document node indicium that provides information on no more than five of the following attributes associated with the node;
(1) a first attribute that allows identification of a unique number associated with the node;
(2) a second attribute that specifies a descriptive label for the node;
(3) a third attribute that specifies data type for the node, from among at least two selected data types, and indicates processing requirements for the document node;
(4) a fourth attribute that provides text data, if any, associated with the node;
(5) a fifth attribute that specifies a node label, if any, for a node, if any, that serves as a parent node for the node; and
(6) a sixth attribute that specifies a node label, if any, for a node, if any, that serves as a sibling node for the node, where information from the fourth attribute is included in the node indicium;
(3) receiving a query, including at least one query keyword, for the collection of documents, and specifying at least one of context and content for the keyword;
(4) determining a set of query nodes in the node structure, each of which contains at least one occurrence of the keyword in the fourth attribute;
(5) providing information on at least one selected fourth attribute containing the keyword, for at least one query node in the query node set;
(6) determining if the query specifies context for the keyword;
(7) when the query specifies context for the keyword, determining if the query node provides context for the keyword;
(8) when the query node does not provide context for the keyword, replacing the query node by a left-adjacent node as a new query node, and returning to step (7) at least once;
(9) when the query node provides context for the keyword, adding the query node to a context list, and returning to step (5) at least once;
(10) determining if the query specifies content for the keyword;
(11) when the query specifies content for the keyword, determining if the query node provides content for the keyword;
(12) when the query node does not provide content for the keyword, replacing the query node by at least one of a right-adjacent node and a selected child node as a new query node, and returning to step (11) at least once; and
(13) when the query node provides content for the keyword, adding the query node to a content list, and returning to said step (5) at least once.
Dependent claims: 9, 10
11. A computer-implemented system for querying a collection of unstructured documents, the system comprising a computer that is programmed:
(1) to provide an unstructured collection including at least one document;
(2) to associate with each document in the collection a connected node structure including an ordered sequence of document nodes, with each node being labeled by a document node indicium that provides information on at least four of the following attributes associated with the node and corresponding to at least one document;
(1) a first attribute that allows identification of a unique number associated with the node;
(2) a second attribute that specifies a descriptive label for the node;
(3) a third attribute that specifies data type for the node, from among at least two selected data types, and indicates processing requirements for the node;
(4) a fourth attribute that provides text data, if any, associated with the node;
(5) a fifth attribute that specifies a node label, if any, for a node, if any, that serves as a parent node for the node; and
(6) a sixth attribute that specifies a node label, if any, for a node, if any, that serves as a sibling node for the node, where information from the fourth attribute is included in the node indicium;
(3) to receive a query, including at least one query keyword, for the collection of documents, and specifying at least one of keyword context and keyword content;
(4) to determine a set of query nodes in the node structure, each of which contains at least one occurrence of the keyword in the fourth attribute;
(5) to provide information on at least one selected fourth attribute containing the keyword, for at least one query node in the query node set;
(6) to determine if the query specifies context for the keyword;
(7) when the query specifies context for the keyword, to determine if the query node provides context for the keyword;
(8) when the query node does not provide context for the keyword, to replace the query node by a left-adjacent node as a new query node, and to return to step (7) at least once;
(9) when the query node provides context for the keyword, to add the query node to a context list, and to return to step (5) at least once;
(10) to determine if the query specifies content for the keyword;
(11) when the query specifies content for the keyword, to determine if the query node provides content for the keyword;
(12) when the query node does not provide content for the keyword, to replace the query node by at least one of a right-adjacent node and a selected child node as a new query node, and to return to step (11) at least once; and
(13) when the query node provides content for the keyword, to add the query node to a content list, and to return to said step (5) at least once.
Dependent claims: 12, 13, 14
15. A computer-implemented system for querying a collection of unstructured documents, the system comprising a computer that is programmed:
(1) to provide an unstructured collection including at least one document;
(2) to associate with each document in the collection a connected node structure including an ordered sequence of document nodes, with each node being labeled by a document node indicium that provides information on no more than four of the following attributes associated with the node and corresponding to at least one document;
(1) a first attribute that allows identification of a unique number associated with the node;
(2) a second attribute that specifies a descriptive label for the node;
(3) a third attribute that specifies data type for the node, from among at least two selected data types, and indicates processing requirements for the node;
(4) a fourth attribute that provides text data, if any, associated with the node;
(5) a fifth attribute that specifies a node label, if any, for a node, if any, that serves as a parent node for the node; and
(6) a sixth attribute that specifies a node label, if any, for a node, if any, that serves as a sibling node for the node, where information from the fourth attribute is included in the node indicium;
(3) to receive a query, including at least one query keyword, for the collection of documents, and specifying at least one of keyword context and keyword content;
(4) to determine a set of query nodes in the node structure, each of which contains at least one occurrence of the keyword in the fourth attribute;
(5) to provide information on at least one selected fourth attribute containing the keyword, for at least one query node in the query node set;
(6) to determine if the query specifies context for the keyword;
(7) when the query specifies context for the keyword, to determine if the query node provides context for the keyword;
(8) when the query node does not provide context for the keyword, to replace the query node by a left-adjacent node as a new query node, and to return to step (7) at least once;
(9) when the query node provides context for the keyword, to add the query node to a context list, and to return to step (5) at least once;
(10) to determine if the query specifies content for the keyword;
(11) when the query specifies content for the keyword, to determine if the query node provides content for the keyword;
(12) when the query node does not provide content for the keyword, to replace the query node by at least one of a right-adjacent node and a selected child node as a new query node, and to return to step (11) at least once; and
(13) when the query node provides content for the keyword, to add the query node to a content list, and to return to said step (5) at least once.
Dependent claims: 16, 17
18. A computer-implemented system for querying a collection of unstructured documents, the system comprising a computer that is programmed:
(1) to provide an unstructured collection including at least one document;
(2) to associate with each document in the collection a connected node structure including an ordered sequence of document nodes, with each node being labeled by a document node indicium that provides information on no more than five of the following attributes associated with the node and corresponding to at least one document;
(1) a first attribute that allows identification of a unique number associated with the node;
(2) a second attribute that specifies a descriptive label for the node;
(3) a third attribute that specifies data type for the node, from among at least two selected data types, and indicates processing requirements for the node;
(4) a fourth attribute that provides text data, if any, associated with the node;
(5) a fifth attribute that specifies a node label, if any, for a node, if any, that serves as a parent node for the node; and
(6) a sixth attribute that specifies a node label, if any, for a node, if any, that serves as a sibling node for the node, where information from the fourth attribute is included in the node indicium;
(3) to receive a query, including at least one query keyword, for the collection of documents, and specifying at least one of keyword context and keyword content;
(4) to determine a set of query nodes in the node structure, each of which contains at least one occurrence of the keyword in the fourth attribute;
(5) to provide information on at least one selected fourth attribute containing the keyword, for at least one query node in the query node set;
(6) to determine if the query specifies context for the keyword;
(7) when the query specifies context for the keyword, to determine if the query node provides context for the keyword;
(8) when the query node does not provide context for the keyword, to replace the query node by a left-adjacent node as a new query node, and to return to step (7) at least once;
(9) when the query node provides context for the keyword, to add the query node to a context list, and to return to step (5) at least once;
(10) to determine if the query specifies content for the keyword;
(11) when the query specifies content for the keyword, to determine if the query node provides content for the keyword;
(12) when the query node does not provide content for the keyword, to replace the query node by at least one of a right-adjacent node and a selected child node as a new query node, and to return to step (11) at least once; and
(13) when the query node provides content for the keyword, to add the query node to a content list, and to return to said step (5) at least once.
Dependent claims: 19, 20
Specification