System and method for indexing type-annotated web documents
Abstract
Methods and apparatus generate an index for use in a document retrieval system, where the index is organized by type and keyword. Redundancy in the index is reduced by organizing type entries in a hierarchy of internal and leaf nodes. Whether to generate an inverted list for a type is determined by the position of the type in the hierarchy; generally, inverted lists are generated only for types corresponding to leaf nodes. Redundancy is further reduced by re-using inverted lists generated for keywords as the lists for types when keywords and types overlap. Search performance using the document retrieval index is improved by adding entries corresponding to combinations of keywords and types. The intersections of the inverted lists associated with the keywords and types making up the combinations are determined and added to the index for use in search operations. Whether to add an entry for a keyword-type combination is decided by a cost-benefit analysis that depends, at least in part, on the proximity of the keyword to the type in documents containing the combination.
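The keyword-type combination entries described in the abstract could be sketched roughly as follows. This is an illustrative sketch only, not the patented procedure: the abstract does not give the cost-benefit formula, so the rule used here (count the documents in which the keyword and the type occur within a fixed token window, and add the entry when that count reaches a threshold) is an assumption, as are the names `intersect`, `min_distance`, `maybe_add_combination`, `window`, and `min_close_docs`.

```python
def intersect(a, b):
    """Intersect two sorted inverted lists of document ids."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def min_distance(pos_a, pos_b):
    """Smallest token distance between any two occurrences."""
    return min(abs(x - y) for x in pos_a for y in pos_b)


def maybe_add_combination(index, positions, keyword, type_name,
                          window=5, min_close_docs=2):
    """Add a (keyword, type) entry if enough documents hold both close together.

    index maps an entry to a sorted list of document ids.
    positions maps (entry, doc_id) to a list of token offsets.
    The proximity rule below is a hypothetical stand-in for the
    cost-benefit analysis, which the source does not spell out.
    """
    shared = intersect(index[keyword], index[type_name])
    close = [
        doc for doc in shared
        if min_distance(positions[(keyword, doc)],
                        positions[(type_name, doc)]) <= window
    ]
    if len(close) >= min_close_docs:
        # Store the precomputed intersection so a search can skip it.
        index[(keyword, type_name)] = shared
        return True
    return False
```

A query for the keyword together with the type can then read the stored combination entry directly instead of intersecting the two lists at search time; the index grows by one entry per accepted combination, which is the cost side of the trade-off.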
20 Claims
1. A method comprising:
establishing a document retrieval index for use in a document retrieval system, wherein the document retrieval index is organized by type and keyword entries;
organizing type entries by a type hierarchy comprising internal and leaf nodes;
determining whether to generate an inverted list for particular types in the type hierarchy mapping the types to documents including the types in dependence on the position of the types in the type hierarchy; and
generating an inverted list for at least some of the types in the type hierarchy as a result of the determination.
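One way to picture the position-dependent determination in claim 1 is the leaf-only rule mentioned in the abstract. The sketch below is a minimal illustration under stated assumptions, not the claimed implementation: the hierarchy contents, the function names (`is_leaf`, `build_type_index`, `docs_for_type`), and the assumption that documents are annotated with leaf types only are all hypothetical.

```python
from collections import defaultdict

# Hypothetical type hierarchy (parent -> children); names are illustrative.
HIERARCHY = {
    "entity": ["person", "location"],
    "person": ["scientist", "athlete"],
    "location": [],
    "scientist": [],
    "athlete": [],
}


def is_leaf(type_name):
    """A type is a leaf when it has no children in the hierarchy."""
    return not HIERARCHY.get(type_name)


def build_type_index(annotated_docs):
    """Generate inverted lists for leaf types only.

    annotated_docs maps a document id to the set of types annotated in it.
    Annotations that are not leaf types are skipped in this sketch.
    """
    index = defaultdict(set)
    for doc_id, types in annotated_docs.items():
        for t in types:
            if is_leaf(t):
                index[t].add(doc_id)
    return {t: sorted(ids) for t, ids in index.items()}


def docs_for_type(index, type_name):
    """Look up a type; internal nodes are answered from their descendants."""
    if is_leaf(type_name):
        return list(index.get(type_name, []))
    result = set()
    for child in HIERARCHY[type_name]:
        result.update(docs_for_type(index, child))
    return sorted(result)
```

Under this rule no list is stored for `person` or `entity`; a query on an internal type is answered by merging the lists of the leaf types beneath it, trading some query-time work for a smaller index.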
9. A computer program product tangibly embodying a computer program in a computer readable memory medium, the computer program configured to perform operations involving a document retrieval index when executed by digital processing apparatus, the operations comprising:
establishing the document retrieval index, where the document retrieval index is organized by type and keyword entries;
organizing type entries by a type hierarchy comprised of internal and leaf nodes;
determining whether to generate an inverted list for particular types in the type hierarchy in dependence on the position of the types in the type hierarchy, wherein the inverted list maps the types to documents including the types; and
generating an inverted list for at least some of the types in the type hierarchy as a result of the determination.
17. A system comprising:
at least one computer memory, the at least one computer memory storing a computer program and a document retrieval index, the computer program configured to perform operations involving the document retrieval index when executed; and
processing apparatus coupled to the at least one computer memory, the processing apparatus configured to execute the computer program, wherein when the computer program is executed by the processing apparatus the system is configured
to organize the document retrieval index by type and keyword entries;
to organize the type entries by a type hierarchy comprising internal and leaf nodes;
to determine whether to generate an inverted list for particular types depending on the position of the types in the type hierarchy; and
to generate an inverted list for at least some of the types in the type hierarchy as a result of the determination.
Specification