Phrase matching in documents having nested-structure arbitrary (document-specific) markup
Abstract
A method of searching a document having nested-structure document-specific markup (such as Extensible Markup Language (XML)) involves (112) receiving a query that designates at least (A) a phrase to be matched in a phrase matching process, and (B) a selective designation of at least a tag or annotation that is to be ignored during the phrase matching process. The method further involves (114) deriving query-specific indices based on query-independent indices that were created specific to each document, and (116) carrying out the phrase matching process using the query-specific indices on the document having the nested-structure document-specific markup.
8 Claims
1. A method of searching a document having nested-structure document-specific markup, the method comprising:
receiving a query that designates at least (A) a phrase to be matched in a phrase matching process, and (B) a selective designation of at least a tag or annotation that is to be ignored during the phrase matching process;
deriving query-specific indices based on query-independent indices that were created specific to each document, wherein the step of deriving the query-specific indices includes forming at least one of a group including:
an index of each word in the phrase to be matched by the phrase matching process;
an index of context tags that may be found in the document; and
an index of at least a tag or annotation to be ignored during the phrase matching process,
wherein the query-independent indices were created by a method including:
a) labeling elements in the document with intervals, wherein:
a1) for markup tags, the intervals are defined in terms of a starting index number associated with an opening markup tag and an ending index number associated with a closing markup tag that corresponds to the opening markup tag, and
a2) for single words, the intervals are defined in terms of a single index number associated with the word; and
b) forming the query-independent indices so that they are configured to be used in the searching method by first receiving, for a word or tag in the document, a position in the document, and by then indicating that the word or tag is present or not present at that position,
wherein the step of deriving the query-specific indices involves deriving the query-specific indices from the query-independent indices without rebuilding any of the query-independent indices; and
carrying out the phrase matching process using the query-specific indices on the document having the nested-structure document-specific markup.

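For illustration only (this is not part of the claims), the interval labeling of steps a1) and a2), the position probe of step b), and the derivation of query-specific indices might be sketched as follows. The tokenizer, the function names (`label_intervals`, `word_at`, `tag_at`, `derive_query_indices`) and the dictionary layout are all assumptions made for this sketch; it also assumes well-formed markup with no self-closing tags.

```python
import re
from bisect import bisect_left
from collections import defaultdict

# Hypothetical tokenizer: an opening/closing tag, or a plain word.
TOKEN = re.compile(r"<(/?)(\w+)[^>]*>|(\w+)")

def label_intervals(doc):
    """Number every tag marker and word in document order.

    Returns (tags, words): tags maps a tag name to sorted (start, end)
    intervals; words maps a lowercased word to sorted single positions.
    """
    tags, words = defaultdict(list), defaultdict(list)
    open_tags = []          # (name, start) of tags not yet closed
    counter = 0
    for match in TOKEN.finditer(doc):
        closing, tag, word = match.groups()
        counter += 1
        if word is not None:
            words[word.lower()].append(counter)      # a2) single index number
        elif closing:
            name, start = open_tags.pop()
            assert name == tag, "sketch assumes well-formed nesting"
            tags[name].append((start, counter))      # a1) start/end numbers
        else:
            open_tags.append((tag, counter))
    for intervals in tags.values():
        intervals.sort()
    return tags, words

def word_at(words, word, pos):
    """Probe of step b): is `word` present at position `pos`?"""
    positions = words.get(word.lower(), [])
    i = bisect_left(positions, pos)
    return i < len(positions) and positions[i] == pos

def tag_at(tags, name, pos):
    """Probe of step b): does a marker of tag `name` sit at `pos`?"""
    return any(pos in interval for interval in tags.get(name, []))

def derive_query_indices(tags, words, phrase, context, ignored):
    """Select per-query views; the underlying lists are reused, not rebuilt."""
    return {
        "words": [words.get(w.lower(), []) for w in phrase.split()],
        "context": tags.get(context, []),
        "ignored": {name: tags.get(name, []) for name in ignored},
    }

DOC = "<s>The <b>quick</b> <note>sic</note> brown fox</s>"
tags, words = label_intervals(DOC)
query = derive_query_indices(tags, words, "quick brown fox", "s", ["b", "note"])
```

In this example the `<s>` element is labeled with the interval (1, 11) and the word "brown" with the single position 9. The query-specific view holds references to the same position lists as the query-independent index, which is one way to read "without rebuilding".
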
2. A method of searching a document having nested-structure document-specific markup, the method comprising:
receiving a query that designates at least (A) a phrase to be matched in a phrase matching process, and (B) a selective designation of at least a tag or annotation that is to be ignored during the phrase matching process, wherein the phrase matching process includes:
for each context interval, defined by a beginning index defining a position of beginning tag and a closing index defining a position of a closing tag, performing an index-nested loop by probing an index of each phrase word in order, and an index of each tag or annotation to be ignored, so as to construct at least one witness;
wherein each witness is a contiguous sequence of intervals contained within the context interval and includes each phrase word occurrence exactly once and in phrase order;
deriving query-specific indices based on query-independent indices that were created specific to each document, wherein the query-independent indices were created by a method including:
a) labeling elements in the document with intervals, wherein:
a1) for markup tags, the intervals are defined in terms of a starting index number associated with an opening markup tag and an ending index number associated with a closing markup tag that corresponds to the opening markup tag, and
a2) for single words, the intervals are defined in terms of a single index number associated with the word; and
b) forming the query-independent indices so that they are configured to be used in the searching method by first receiving, for a word or tag in the document, a position in the document, and by then indicating that the word or tag is present or not present at that position; and
carrying out the phrase matching process using the query-specific indices on the document having the nested-structure document-specific markup.

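For illustration only (not part of the claims), one possible reading of the index-nested loop is sketched below: for each context interval, every occurrence of the first phrase word is taken as a candidate, and the indices of the remaining words are probed in phrase order while ignorable material is skipped. The function name `find_witnesses` is hypothetical, and the split between ignored markup tags (markers skipped, content kept) and ignored annotations (the whole interval skipped) is an assumption of this sketch. The position numbers follow the interval labeling described in the claim.

```python
def find_witnesses(context_intervals, word_positions, phrase,
                   tag_intervals=(), annotation_intervals=()):
    """Return one list of word positions per witness found."""
    # Probe structures built from the (assumed) query-specific indices.
    marker_positions = {p for start, end in tag_intervals for p in (start, end)}
    annotation_end = {start: end for start, end in annotation_intervals}
    probes = [set(word_positions.get(word, ())) for word in phrase]

    witnesses = []
    for ctx_start, ctx_end in context_intervals:
        for first in sorted(probes[0]):
            if not ctx_start < first < ctx_end:
                continue
            witness, pos = [first], first + 1
            for probe in probes[1:]:
                # Skip ignored tag markers and whole ignored annotations.
                while pos < ctx_end:
                    if pos in marker_positions:
                        pos += 1
                    elif pos in annotation_end:
                        pos = annotation_end[pos] + 1
                    else:
                        break
                if pos < ctx_end and pos in probe:
                    witness.append(pos)
                    pos += 1
                else:
                    break               # phrase broken at this position
            else:
                witnesses.append(witness)   # every phrase word was matched
    return witnesses

# Positions for: <s>The <b>quick</b> <note>sic</note> brown fox</s>
WORDS = {"the": [2], "quick": [4], "sic": [7], "brown": [9], "fox": [10]}
PHRASE = ["quick", "brown", "fox"]

strict = find_witnesses([(1, 11)], WORDS, PHRASE)
relaxed = find_witnesses([(1, 11)], WORDS, PHRASE,
                         tag_intervals=[(3, 5)],
                         annotation_intervals=[(6, 8)])
```

With nothing ignored, `strict` is empty because the closing `</b>` marker and the `<note>` annotation separate "quick" from "brown". With `<b>` and `<note>` ignored, `relaxed` contains the single witness `[4, 9, 10]`.
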
4. A method of searching a document having nested-structure document-specific markup, the method comprising:
receiving a query that designates at least (A) a phrase to be matched in a phrase matching process, and (B) a selective designation of at least a tag or annotation that is to be ignored during the phrase matching process, wherein the phrase matching process includes:
scanning, in document order, a combined index of (A) phrase words and (B) tags or annotations to be ignored, while using a stack to keep track of nested context intervals and annotation intervals;
wherein:
the stack includes at least one entry corresponding to a current context interval in which witnesses are identified; and
the at least one entry maintains a set of (A) partial witnesses that are being identified and (B) complete witnesses that have been identified, within the current context interval;
deriving query-specific indices based on query-independent indices that were created specific to each document, wherein the query-independent indices were created by a method including:
a) labeling elements in the document with intervals, wherein:
a1) for markup tags, the intervals are defined in terms of a starting index number associated with an opening markup tag and an ending index number associated with a closing markup tag that corresponds to the opening markup tag, and
a2) for single words, the intervals are defined in terms of a single index number associated with the word; and
b) forming the query-independent indices so that they are configured to be used in the searching method by first receiving, for a word or tag in the document, a position in the document, and by then indicating that the word or tag is present or not present at that position; and
carrying out the phrase matching process using the query-specific indices on the document having the nested-structure document-specific markup.

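For illustration only (not part of the claims), the stack-based scan might be sketched as a single pass over a combined, position-ordered event list. The event encoding (`ctx_open`, `ctx_close`, `ann_open`, `ann_close`, `skip`, `word`) and the names `scan_witnesses` and `_advance` are assumptions of this sketch. Each context entry on the stack carries its partial and complete witnesses, and a partial witness survives only while the next position it expects is either a matching phrase word or ignorable material. The sketch assumes context intervals do not open inside an ignored annotation.

```python
def _advance(stack, expected, new_next):
    """Move partial witnesses waiting at `expected` past ignorable material."""
    for entry in stack:
        if entry["kind"] == "ctx":
            entry["partial"] = [(hits, new_next if nxt == expected else nxt)
                                for hits, nxt in entry["partial"]]

def scan_witnesses(events, phrase):
    """events: (position, kind, word) tuples; returns [((start, end), witnesses)]."""
    stack, results = [], []
    for pos, kind, word in sorted(events, key=lambda event: event[0]):
        if kind == "ctx_open":
            stack.append({"kind": "ctx", "start": pos,
                          "partial": [], "complete": []})
        elif kind == "ctx_close":
            entry = stack.pop()
            results.append(((entry["start"], pos), entry["complete"]))
        elif kind == "ann_open":
            stack.append({"kind": "ann", "start": pos})
        elif kind == "ann_close":
            annotation = stack.pop()
            # The whole annotation interval is treated as ignorable.
            _advance(stack, annotation["start"], pos + 1)
        elif any(entry["kind"] == "ann" for entry in stack):
            continue                      # content inside an ignored annotation
        elif kind == "skip":              # marker of an ignored markup tag
            _advance(stack, pos, pos + 1)
        elif kind == "word":
            for entry in stack:           # only context entries remain here
                extended = [(hits + [pos], pos + 1)
                            for hits, nxt in entry["partial"]
                            if nxt == pos and phrase[len(hits)] == word]
                if word == phrase[0]:
                    extended.append(([pos], pos + 1))
                entry["complete"] += [hits for hits, _ in extended
                                      if len(hits) == len(phrase)]
                entry["partial"] = [(hits, nxt) for hits, nxt in extended
                                    if len(hits) < len(phrase)]
    return results

# Combined index for: <s>The <b>quick</b> <note>sic</note> brown fox</s>
# Only phrase words, the context tag and ignored tags appear in it.
EVENTS = [
    (1, "ctx_open", None), (3, "skip", None), (4, "word", "quick"),
    (5, "skip", None), (6, "ann_open", None), (8, "ann_close", None),
    (9, "word", "brown"), (10, "word", "fox"), (11, "ctx_close", None),
]
found = scan_witnesses(EVENTS, ["quick", "brown", "fox"])
```

Here `found` is `[((1, 11), [[4, 9, 10]])]`: one complete witness inside the context interval (1, 11). Compared with probing each candidate separately, the scan reads every index entry once, at the cost of keeping partial witnesses on the stack.
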
5. A computer program product including computer executable code or computer executable instructions that, when executed, causes a computer to perform a method of searching a document having nested-structure document-specific markup, the method comprising:
receiving a query that designates at least (A) a phrase to be matched in a phrase matching process, and (B) a selective designation of at least a tag or annotation that is to be ignored during the phrase matching process;
deriving query-specific indices based on query-independent indices that were created specific to each document, wherein the query-independent indices were created by a method including:
a) labeling elements in the document with intervals, wherein:
a1) for markup tags, the intervals are defined in terms of a starting index number associated with an opening markup tag and an ending index number associated with a closing markup tag that corresponds to the opening markup tag, and
a2) for single words, the intervals are defined in terms of a single index number associated with the word; and
b) forming the query-independent indices so that they are configured to be used in the searching method by first receiving, for a word or tag in the document, a position in the document, and by then indicating whether or not the word or tag is present at that position,
wherein the phrase matching process includes:
for each context interval, defined by a beginning index defining a position of beginning tag and a closing index defining a position of a closing tag, performing an index-nested loop by probing an index of each phrase word in order, and an index of each tag or annotation to be ignored, so as to construct at least one witness;
wherein each witness is a contiguous sequence of intervals contained within the context interval and includes each phrase word occurrence exactly once and in phrase order; and
carrying out the phrase matching process using the query-specific indices on the document having the nested-structure document-specific markup.

6. A computer program product including computer executable code or computer executable instructions that, when executed, causes a computer to perform a method of searching a document having nested-structure document-specific markup, the method comprising:
receiving a query that designates at least (A) a phrase to be matched in a phrase matching process, and (B) a selective designation of at least a tag or annotation that is to be ignored during the phrase matching process;
deriving query-specific indices based on query-independent indices that were created specific to each document, wherein the query-independent indices were created by a method including:
a) labeling elements in the document with intervals, wherein:
a1) for markup tags, the intervals are defined in terms of a starting index number associated with an opening markup tag and an ending index number associated with a closing markup tag that corresponds to the opening markup tag, and
a2) for single words, the intervals are defined in terms of a single index number associated with the word; and
b) forming the query-independent indices so that they are configured to be used in the searching method by first receiving, for a word or tag in the document, a position in the document, and by then indicating whether or not the word or tag is present at that position,
wherein the phrase matching process includes:
scanning, in document order, a combined index of (A) phrase words and (B) tags or annotations to be ignored, while using a stack to keep track of nested context intervals and annotation intervals;
wherein:
the stack includes at least one entry corresponding to a current context interval in which witnesses are identified; and
the at least one entry maintains a set of (A) partial witnesses that are being identified and (B) complete witnesses that have been identified, within the current context interval; and
carrying out the phrase matching process using the query-specific indices on the document having the nested-structure document-specific markup.

7. A system for searching a document having nested-structure document-specific markup, the system comprising:
means for receiving a query that designates at least (A) a phrase to be matched in a phrase matching process, and (B) a selective designation of at least a tag or annotation that is to be ignored during the phrase matching process;
means for deriving query-specific indices based on query-independent indices that were created specific to each document, wherein the means for deriving query-independent indices comprises:
a) means for labeling elements in the document with intervals, wherein:
a1) for markup tags, the intervals are defined in terms of a starting index number associated with an opening markup tag and an ending index number associated with a closing markup tag that corresponds to the opening markup tag, and
a2) for single words, the intervals are defined in terms of a single index number associated with the word; and
b) means for forming the query-independent indices so that they are configured to be used in the searching method by first receiving, for a word or tag in the document, a position in the document, and by then indicating whether or not the word or tag is present at that position,
wherein the phrase matching process includes:
for each context interval, defined by a beginning index defining a position of beginning tag and a closing index defining a position of a closing tag, means for performing an index-nested loop by probing an index of each phrase word in order, and an index of each tag or annotation to be ignored, so as to construct at least one witness;
wherein each witness is a contiguous sequence of intervals contained within the context interval and includes each phrase word occurrence exactly once and in phrase order; and
means for carrying out the phrase matching process using the query-specific indices on the document having the nested-structure document-specific markup.

8. A system for searching a document having nested-structure document-specific markup, the system comprising:
means for receiving a query that designates at least (A) a phrase to be matched in a phrase matching process, and (B) a selective designation of at least a tag or annotation that is to be ignored during the phrase matching process;
means for deriving query-specific indices based on query-independent indices that were created specific to each document, wherein the means for deriving query-independent indices comprises:
a) means for labeling elements in the document with intervals, wherein:
a1) for markup tags, the intervals are defined in terms of a starting index number associated with an opening markup tag and an ending index number associated with a closing markup tag that corresponds to the opening markup tag, and
a2) for single words, the intervals are defined in terms of a single index number associated with the word; and
b) means for forming the query-independent indices so that they are configured to be used in the searching method by first receiving, for a word or tag in the document, a position in the document, and by then indicating whether or not the word or tag is present at that position,
wherein the phrase matching process includes:
means for scanning in document order, a combined index of (A) phrase words and (B) tags or annotations to be ignored, while using a stack to keep track of nested context intervals and annotation intervals;
wherein:
the stack includes at least one entry corresponding to a current context interval in which witnesses are identified; and
the at least one entry maintains a set of (A) partial witnesses that are being identified and (B) complete witnesses that have been identified, within the current context interval; and
means for carrying out the phrase matching process using the query-specific indices on the document having the nested-structure document-specific markup.

Specification