Method and apparatus for creating an index for a structured document based on a stylesheet
Abstract
One embodiment of the present invention provides a system that generates an index to facilitate searching through text within a document based upon an index stylesheet associated with the document. The system operates by receiving a document to be indexed and then parsing the document to produce a parsed document. The system also retrieves instructions for creating the index for the document from an index stylesheet associated with the document. The system creates the index for the document by transforming the parsed document in a manner that is specified by the instructions retrieved from the index stylesheet. In one embodiment of the present invention, retrieving the index stylesheet involves retrieving the index stylesheet across a network from a remote address.
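As an illustration only, the receive, parse, retrieve, and transform steps can be sketched in Python. Nothing here is taken from the specification: the stylesheet is modeled as a plain dictionary rather than a real stylesheet language, and the names `INDEX_STYLESHEET`, `build_index`, `tokenize_words`, and `tokenize_whole`, as well as the sample element names, are hypothetical. The sketch assumes an XML document and shows a stylesheet that both skips a section and assigns a different tokenizing procedure to one portion of the document.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict


# Hypothetical tokenizing procedures; the source does not define any.
def tokenize_words(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tokenize_whole(text):
    """Treat the whole trimmed string as one token (e.g. for identifiers)."""
    stripped = text.strip()
    return [stripped] if stripped else []


# Hypothetical index stylesheet, modeled as a dictionary for illustration.
INDEX_STYLESHEET = {
    "skip": {"comment"},                   # sections left out of the index
    "tokenizers": {"id": tokenize_whole},  # per-element tokenizing procedure
    "default_tokenizer": tokenize_words,
}


def build_index(document_xml, stylesheet):
    """Parse the document, then build a token -> element-tags index."""
    root = ET.fromstring(document_xml)  # parsing step
    index = defaultdict(set)

    def visit(element):
        # Skip whole subtrees that the stylesheet excludes.
        if element.tag in stylesheet["skip"]:
            return
        tokenize = stylesheet["tokenizers"].get(
            element.tag, stylesheet["default_tokenizer"]
        )
        # Only the element's leading text is indexed; tail text is ignored
        # to keep the sketch short.
        for token in tokenize(element.text or ""):
            index[token].add(element.tag)
        for child in element:
            visit(child)

    visit(root)
    return {token: sorted(tags) for token, tags in index.items()}


DOCUMENT = """<doc>
  <id>AB-123</id>
  <title>Index Stylesheets</title>
  <comment>Internal draft note</comment>
  <body>Stylesheets drive the index</body>
</doc>"""

index = build_index(DOCUMENT, INDEX_STYLESHEET)
```

In this sketch the `id` element is indexed as a single token, the `comment` element contributes nothing, and words in `title` and `body` are split and lowercased. Retrieving the stylesheet from a remote address is not modeled; the stylesheet is simply passed in as an argument.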
27 Claims
1. A method for generating an index to facilitate searching through text within a document based upon an index stylesheet associated with the document, the method comprising:
receiving the document to be indexed;
parsing the document to produce a parsed document;
retrieving instructions for creating the index for the document from the index stylesheet associated with the document, wherein the index stylesheet specifies sections of the document to skip in creating the index for the document, wherein the index stylesheet specifies a plurality of tokenizing procedures, and wherein different tokenizing procedures are used to tokenize different portions of the document which require different tokenizing instructions;
and creating the index for the document by transforming the parsed document in a manner that is specified by the instructions retrieved from the index stylesheet.
10. A computer-readable storage medium storing instructions that when executed by a computer cause the computer to perform a method for generating an index to facilitate searching through text within a document based upon an index stylesheet associated with the document, the method comprising:
receiving the document to be indexed;
parsing the document to produce a parsed document;
retrieving instructions for creating the index for the document from the index stylesheet associated with the document, wherein the index stylesheet specifies sections of the document to skip in creating the index for the document, wherein the index stylesheet specifies a plurality of tokenizing procedures, and wherein different tokenizing procedures are used to tokenize different portions of the document which require different tokenizing instructions;
and creating the index for the document by transforming the parsed document in a manner that is specified by the instructions retrieved from the index stylesheet.
19. An apparatus for generating an index to facilitate searching through text within a document based upon an index stylesheet associated with the document, the apparatus comprising:
a receiving mechanism that is configured to receive the document to be indexed;
a parser that is configured to parse the document to produce a parsed document;
a stylesheet retrieving mechanism that is configured to retrieve instructions for creating the index for the document from the index stylesheet associated with the document, wherein the index stylesheet specifies sections of the document to skip in creating the index for the document, wherein the index stylesheet specifies a plurality of tokenizing procedures, and wherein different tokenizing procedures are used to tokenize different portions of the document which require different tokenizing instructions;
and an index creation mechanism that is configured to create the index for the document by transforming the parsed document in a manner that is specified by the instructions retrieved from the index stylesheet.
Specification