MECHANISM FOR EFFICIENTLY SEARCHING XML DOCUMENT COLLECTIONS
Abstract
The techniques presented herein are directed towards providing a user-directed keyword-based search on a large collection of XML documents, and displaying a summary of results to the user. Prior to receiving search requests from a user, an offline analysis of a large collection of XML documents is performed to construct an inverted index of keywords. For each keyword, the index stores a set of location indicators that identify all the instances of the keyword found in the collection of documents. A location indicator may comprise a document identifier, an indication of the position of the node in the hierarchy of nodes within the XML document containing the keyword, and an indication of the pathname of the node containing the keyword. Once the index is constructed, keyword searching can be done efficiently by a keyword lookup in the index. Various display strategies enable the user to see the specific portion of a large XML document containing the keyword and/or path frequency information allowing the user to easily refine the search to specific paths within the collection of documents.
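The offline indexing step described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation. The dotted order-key format (for example "1.2.2"), the slash-separated path format, the `Location` type, the `build_index` function, and the sample documents are all assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import NamedTuple


class Location(NamedTuple):
    """One location indicator (field formats are assumptions)."""
    doc_id: str     # identifies the XML document containing the word
    order_key: str  # hierarchical position of the node, e.g. "1.2.2"
    path: str       # path of the node, e.g. "/book/chapter/para"


def build_index(docs):
    """Build an inverted index: word -> list of Location entries."""
    index = defaultdict(list)

    def add_words(text, loc):
        # Lower-case the text and record each word once per location.
        for word in re.findall(r"\w+", (text or "").lower()):
            if loc not in index[word]:
                index[word].append(loc)

    def visit(node, doc_id, order_key, path):
        here = Location(doc_id, order_key, path)
        add_words(node.text, here)
        for pos, child in enumerate(node, start=1):
            visit(child, doc_id, f"{order_key}.{pos}", f"{path}/{child.tag}")
            # Text following a child element belongs to the parent node.
            add_words(child.tail, here)

    for doc_id, xml_text in docs.items():
        root = ET.fromstring(xml_text)
        visit(root, doc_id, "1", f"/{root.tag}")
    return dict(index)


# Hypothetical two-document collection.
DOCS = {
    "doc1": "<book><title>XML Search</title><chapter><title>Index</title>"
            "<para>search the index</para></chapter></book>",
    "doc2": "<article><title>Keyword search</title></article>",
}

index = build_index(DOCS)
for loc in index["search"]:
    print(loc.doc_id, loc.order_key, loc.path)
```

With the index built ahead of time, a keyword search reduces to a single dictionary lookup such as `index["search"]`, which is the efficiency argument the abstract makes.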
11 Claims
1. A computer-implemented method:
storing an index, wherein the index contains index entries that index a collection of XML documents that contain words;
wherein each index entry of said index entries includes:
  a particular word of said words; and
  at least one location indicator, wherein a location indicator includes:
    a document identifier that identifies an XML document in the collection of XML documents that contains the particular word; and
    at least one of:
      (a) an order key that specifies a hierarchical position of a node within the XML document that contains the particular word, or
      (b) a path representation representing a path of a node within the XML document that contains the particular word.
Dependent claims: 2, 3, 4, 5
6. A computer-implemented method for generating a response to a search request comprising:
receiving a search request comprising one or more words;
for each word of the one or more words:
  retrieving from an index, a set of location indicators stored in association with said each word;
  wherein a location indicator of the set of location indicators includes:
    a document identifier stored in association with a path representation, wherein the path representation is stored in association with an order key;
    the document identifier identifying a particular XML document in a collection of XML documents containing said each word;
    the path representation representing a path of a node within the XML document that contains said each word; and
    the order key specifying the hierarchical position of the node within the XML document; and
  displaying a representation of the set of location indicators.
Dependent claims: 7, 8, 9, 10, 11
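The search method of claim 6 can be sketched as a lookup followed by a summary display. The sketch below is an illustration under stated assumptions, not the claimed implementation. The hand-built `INDEX`, the tuple layout `(doc_id, path, order_key)`, and the functions `lookup`, `path_frequencies`, and `format_summary` are hypothetical. The path-frequency summary follows the display strategy mentioned in the abstract.

```python
from collections import Counter

# Hypothetical pre-built index: word -> list of (doc_id, path, order_key).
INDEX = {
    "search": [
        ("doc1", "/book/title", "1.1"),
        ("doc1", "/book/chapter/para", "1.2.2"),
        ("doc2", "/article/title", "1.1"),
        ("doc3", "/book/title", "1.1"),
    ],
    "index": [
        ("doc1", "/book/chapter/title", "1.2.1"),
    ],
}


def lookup(index, request):
    """Retrieve the location indicators stored for each word of the request."""
    return {word: index.get(word, []) for word in request.lower().split()}


def path_frequencies(locations):
    """Count how many location indicators fall under each path."""
    return Counter(path for _doc_id, path, _order_key in locations)


def format_summary(results):
    """Render a per-word summary with path frequency information."""
    lines = []
    for word, locations in results.items():
        lines.append(f"{word}: {len(locations)} hit(s)")
        for path, count in path_frequencies(locations).most_common():
            lines.append(f"  {path}  x{count}")
    return "\n".join(lines)


results = lookup(INDEX, "Search missing")
print(format_summary(results))
```

Grouping hits by path lets a user narrow a follow-up search to one path, for example only titles under `/book/title`, without opening each document.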
Specification