Method and system for indexing and searching contents of extensible markup language (XML) documents
First Claim
1. A method of indexing the content of a document in Extensible Markup Language (XML), comprising:
modifying each word in the content of said XML document by suffixing to the word any field qualifiers associated with the word in the nested order; and
building a full-text index with said modified words;
wherein a field qualifier indicates information about the usage of the word in the document.
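The two steps of the claim can be illustrated with a minimal Python sketch. It is not the patented implementation. The `@` separator, the lowercasing of words, the use of element names as field qualifiers, and the reading of "nested order" as outermost element first are all assumptions made for illustration, as are the names `encode_words` and `build_index`.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

SEP = "@"  # assumed separator; the claim does not specify one


def encode_words(xml_text):
    """Yield each content word suffixed with its enclosing element names."""
    root = ET.fromstring(xml_text)

    def walk(elem, path):
        path = path + [elem.tag]
        # Assumption: "nested order" is read as outermost element first.
        suffix = SEP.join(path)
        for word in (elem.text or "").split():
            yield word.lower() + SEP + suffix
        for child in elem:
            yield from walk(child, path)
            # Text after a child's closing tag belongs to the current element.
            for word in (child.tail or "").split():
                yield word.lower() + SEP + suffix

    yield from walk(root, [])


def build_index(docs):
    """Map each encoded word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, xml_text in docs.items():
        for term in encode_words(xml_text):
            index[term].add(doc_id)
    return index


docs = {
    "d1": "<book><title>XML indexing</title><author>Smith</author></book>",
    "d2": "<book><title>Smith charts</title></book>",
}
index = build_index(docs)
print(sorted(index["smith@book@author"]))  # documents with Smith as an author
print(sorted(index["smith@book@title"]))   # documents with Smith in a title
```

Because the qualifiers become part of the indexed term, the same word in two different fields yields two different index entries, so a field-restricted lookup is an ordinary term lookup.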
Abstract
A method and a computer system for indexing and searching the data content of nested field records, such as those in Extensible Markup Language (XML). The system includes an indexing and searching engine that constructs an improved full-text search index on the input XML data and then performs searches using the index. The system supports exact matches and partial matches using a wildcard character. The method transforms the input XML data into a form that encodes the data structural information by suffixing each word with its corresponding field qualifiers or an equivalent numerical pattern thereof. The resulting encoded words are then stored in a full-text index structure. Various types of full-index search may be performed. One alternative embodiment is to combine string matching and numeric or integer pattern matching to identify a particular word in a particular field. The portion of the word without field qualifiers is matched against the words in the index, and the pattern of numerals representing the word's field qualifiers is matched against the numeral patterns of the words in the index that correspond to their respective field qualifiers. Therefore, evaluation of complex field criteria is reduced to simpler and faster numeric matching.
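The exact and wildcard searches mentioned in the abstract might look like the following Python sketch over an index of encoded words. The index contents, the `@` separator, the `*` wildcard syntax (borrowed from Python's `fnmatch` module), and the `search` function are hypothetical. The linear scan over all terms is a simplification; a real full-text index would use a sorted or tree-based structure.

```python
from fnmatch import fnmatchcase

SEP = "@"  # assumed separator between a word and its field qualifiers

# Hypothetical index: encoded term -> ids of documents containing it
index = {
    "smith@book@author": {"d1"},
    "smithson@book@author": {"d3"},
    "smith@book@title": {"d2"},
    "charts@book@title": {"d2"},
}


def search(index, word_pattern, qualifiers):
    """Return ids of documents with a word matching word_pattern in the given field."""
    hits = set()
    for term, doc_ids in index.items():
        # Split at the first separator: the word part, then the qualifier suffix.
        word, _, suffix = term.partition(SEP)
        if suffix == qualifiers and fnmatchcase(word, word_pattern):
            hits |= doc_ids
    return hits


print(sorted(search(index, "smith", "book@author")))   # exact match
print(sorted(search(index, "smith*", "book@author")))  # partial match with wildcard
```

The word part and the qualifier part are matched separately so that a wildcard in the word cannot spill over into the qualifier suffix. This assumes the separator never occurs inside a word.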
32 Citations
4 Claims
1. A method of indexing the content of a document in Extensible Markup Language (XML), comprising:
modifying each word in the content of said XML document by suffixing to the word any field qualifiers associated with the word in the nested order; and
building a full-text index with said modified words;
wherein a field qualifier indicates information about the usage of the word in the document.
(Dependent claim: 2)
3. A method of indexing content of a document in Extensible Markup Language (XML), comprising:
assigning a numerical code to each field qualifier in said XML document;
creating a code pattern for each word in said XML document from the word's field qualifiers in the nested order using said numerical code;
modifying each word by suffixing to the word said code pattern; and
building a full-text index with said modified words;
wherein a field qualifier indicates information about the usage of the word in the document.
(Dependent claim: 4)
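The numerical variant in claim 3 can be sketched in Python as follows. The claim does not specify how codes are assigned or how the code pattern is written, so the first-seen integer numbering, the `#` and `.` delimiters, and the helper names `assign_codes`, `encode`, and `decode` are assumptions made for illustration.

```python
SEP = "#"  # assumed separator between a word and its code pattern


def assign_codes(qualifier_paths):
    """Give each distinct field qualifier a small integer code, in first-seen order."""
    codes = {}
    for path in qualifier_paths:
        for name in path:
            codes.setdefault(name, len(codes) + 1)
    return codes


def encode(word, path, codes):
    """Suffix a word with the codes of its qualifiers, kept in nested order."""
    pattern = ".".join(str(codes[name]) for name in path)
    return f"{word}{SEP}{pattern}"


def decode(term):
    """Split an encoded term into (word, tuple of integer codes)."""
    word, _, pattern = term.partition(SEP)
    return word, tuple(int(c) for c in pattern.split("."))


# (word, qualifier path) pairs as they might come out of an XML parser
occurrences = [
    ("smith", ("book", "author")),
    ("smith", ("book", "title")),
    ("charts", ("book", "title")),
]
codes = assign_codes(path for _, path in occurrences)
terms = [encode(w, p, codes) for w, p in occurrences]
print(codes)
print(terms)

# Query: the word "smith" inside book/author, with the field compared as integers.
wanted = tuple(codes[name] for name in ("book", "author"))
matches = [t for t in terms if decode(t) == ("smith", wanted)]
print(matches)
```

Here the word part is compared as a string and the field part as a tuple of integers, which mirrors the abstract's point that field criteria reduce to numeric matching.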
Specification