Method and system for indexing and searching contents of extensible mark-up language (XML) documents
Abstract
A method and a computer system for indexing and searching the data content of nested field records, such as those in Extensible Markup Language (XML). The system includes an indexing and searching engine that constructs an improved full-text search index on the input XML data and then performs searches using the index. The system supports exact matches and partial matches using a wildcard character. The method transforms the input XML data into a form that encodes the data structural information by suffixing each word with its corresponding field qualifiers or an equivalent numerical pattern thereof. The resulting encoded words are then stored in a full-text index structure. Various types of full-index search may be performed. One alternative embodiment is to combine string matching and numeric or integer pattern matching to identify a particular word in a particular field. The portion of the word without field qualifiers is matched against the words in the index, and the pattern of numerals representing the word's field qualifiers is matched against the numeral patterns of the words in the index that correspond to their respective field qualifiers. Therefore, evaluation of complex field criteria is reduced to simpler and faster numeric matching.
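The encoding step described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the separator "/", the lower-casing of words, the innermost-first reading of "nested order", and the function name encode_words are all assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET

SEP = "/"  # assumed separator between a word and its field qualifiers


def encode_words(xml_text):
    """Return every content word suffixed with its enclosing tag names.

    Assumption: "nested order" is read as innermost tag first.
    """
    encoded = []

    def add_words(text, suffix):
        for word in re.findall(r"\w+", text or ""):
            encoded.append(f"{word.lower()}{SEP}{suffix}")

    def walk(elem, path):
        path = path + [elem.tag]            # enclosing tags, outermost first
        suffix = SEP.join(reversed(path))   # innermost first for the suffix
        add_words(elem.text, suffix)
        for child in elem:
            walk(child, path)
            # Text after a child's closing tag still belongs to this element.
            add_words(child.tail, suffix)

    walk(ET.fromstring(xml_text), [])
    return encoded


doc = "<book><title>XML Search</title><author><last>Smith</last></author></book>"
print(encode_words(doc))
# ['xml/title/book', 'search/title/book', 'smith/last/author/book']
```

Each output token carries both the word and its structural position, so an ordinary full-text index over these tokens can answer field-restricted queries.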
13 Claims
1. A method of full text searching of the content of a document in Extensible Markup Language (XML) for a desired word, comprising:
modifying each word in the content of said XML document by suffixing to the word its field qualifiers in the nested order;
building a full-text index with said modified words; and
performing a search using said full-text index to identify said desired word.
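The last two steps of claim 1 (building the index and searching it) might look like the following sketch. It assumes the words have already been modified into strings of the hypothetical form word/inner/outer; the positional inverted index and the names build_index and search are illustrative choices, not taken from the patent.

```python
from collections import defaultdict


def build_index(modified_words):
    """Map each modified word to the positions at which it occurs."""
    index = defaultdict(list)
    for position, token in enumerate(modified_words):
        index[token].append(position)
    return dict(index)


def search(index, word, qualifiers):
    """Find a word in one field by rebuilding its modified form as the key."""
    key = "/".join([word.lower(), *qualifiers])
    return index.get(key, [])


# Hypothetical modified words, as an indexing step might emit them.
tokens = [
    "xml/title/book",
    "search/title/book",
    "smith/last/author/book",
    "xml/abstract/book",
]
index = build_index(tokens)
print(search(index, "XML", ["title", "book"]))   # [0]
print(search(index, "xml", ["author", "book"]))  # []
```

Because the field qualifiers are part of the indexed token, a field-restricted exact search is a single dictionary lookup in this sketch.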
5. A method of full text searching of the content of a document in Extensible Markup Language (XML) for a desired word, comprising:
assigning a numerical code for each field qualifier in said XML document;
creating a code pattern for each word in said XML document from the word's field qualifiers in the nested order using said numerical code;
modifying each word by suffixing to the word said code pattern;
building a full-text index with said modified words; and
performing a search using said full-text index to identify said desired word.
6. A method of full text searching of the content of an XML document for a desired word as set forth in claim 5, wherein performing a search further comprises:
(a) matching the portion of said desired word absent its field qualifiers against the words in the full-text index;
(b) transforming the field qualifiers of said desired word into a numerical pattern;
(c) matching said numerical pattern against the code patterns of the words in the full-text index; and
(d) combining the results of steps (a) and (c) to identify said desired word.
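Steps (a) through (d) above, together with the numerical coding of claim 5, can be sketched as follows. The code assignment scheme (alphabetical, starting at 1), the "#" and "." separators, and all function names are assumptions for illustration; the claims do not specify them.

```python
def assign_codes(field_names):
    """Assign a small integer code to every distinct field qualifier."""
    return {name: code for code, name in enumerate(sorted(set(field_names)), start=1)}


def modify(word, qualifiers, codes):
    """Suffix a word with the code pattern of its qualifiers (assumed format)."""
    pattern = ".".join(str(codes[q]) for q in qualifiers)
    return f"{word.lower()}#{pattern}"


def search(entries, word, qualifiers, codes):
    parsed = []
    for entry in entries:
        text, _, pattern = entry.partition("#")
        parsed.append((text, tuple(int(c) for c in pattern.split("."))))
    # (a) string match on the word without its field qualifiers
    by_word = {i for i, (text, _) in enumerate(parsed) if text == word.lower()}
    # (b) transform the query's field qualifiers into a numerical pattern
    wanted = tuple(codes[q] for q in qualifiers)
    # (c) integer match against the stored code patterns
    by_pattern = {i for i, (_, pattern) in enumerate(parsed) if pattern == wanted}
    # (d) combine the two result sets
    return sorted(by_word & by_pattern)


codes = assign_codes(["book", "title", "author", "last"])
entries = [
    modify("XML", ["title", "book"], codes),
    modify("Smith", ["last", "author", "book"], codes),
    modify("XML", ["last", "author", "book"], codes),
]
print(entries)                                           # ['xml#4.2', 'smith#3.1.2', 'xml#3.1.2']
print(search(entries, "xml", ["title", "book"], codes))  # [0]
```

The field test in step (c) compares tuples of integers rather than tag-name strings, which is one way to read the abstract's point that field criteria reduce to numeric matching.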
7. A method of full text searching of the content of an XML document for a desired word as set forth in claim 5, wherein performing a search further comprises a search using a wildcard character without field qualifiers to identify said desired word in all possible fields.
8. A method of full text searching of the content of an XML document for a desired word as set forth in claim 5, wherein performing a search further comprises a search using a wildcard character with field qualifiers to identify said desired word in a particular field or fields with designated field qualifiers.
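The two wildcard modes of claims 7 and 8 can be illustrated with a short sketch. It assumes "*" as the wildcard character and index keys of the hypothetical form word/inner/outer; the claims name neither. Matching uses Python's standard fnmatch module, and wildcard_search is an illustrative name.

```python
from fnmatch import fnmatchcase


def wildcard_search(index_keys, word_pattern, qualifiers=None):
    """Return index keys whose word part matches a wildcard pattern.

    With qualifiers=None the word is sought in all fields (claim 7 style);
    otherwise only in the field with the given qualifiers (claim 8 style).
    """
    hits = []
    for key in index_keys:
        word, _, fields = key.partition("/")
        if not fnmatchcase(word, word_pattern.lower()):
            continue
        if qualifiers is None or fields == "/".join(qualifiers):
            hits.append(key)
    return hits


keys = ["smith/last/author/book", "smart/title/book", "xml/title/book"]
print(wildcard_search(keys, "sm*"))
# ['smith/last/author/book', 'smart/title/book']
print(wildcard_search(keys, "sm*", ["title", "book"]))
# ['smart/title/book']
```

This sketch scans every key linearly for clarity; a real index would more likely use a sorted or prefix structure so that partial matches do not require a full scan.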
9. A computerized system for full text searching of the content of a document in Extensible Markup Language (XML) for a desired word, comprising:
an indexing engine that transforms the content of the XML document into a text of words including their corresponding field qualifiers; and
builds a full-text index with said words; and
a search engine that performs a full-text index search to identify said desired word, wherein the indexing engine further modifies each word in the content of said XML document by suffixing to the word its field qualifiers in the nested order, and builds said full-text index using said modified words.
12. A computerized system for full text searching of the content of a document in Extensible Markup Language (XML) for a desired word, comprising:
an indexing engine that transforms the content of the XML document into a text of words including their corresponding field qualifiers; and
builds a full-text index with said words; and
a search engine that performs a full-text index search to identify said desired word, wherein the indexing engine further modifies each word in the content of said XML document by suffixing to the word a numerical code pattern representing its field qualifiers in the nested order, and builds said full-text index using said modified words.
Specification