Object oriented information retrieval framework mechanism
Abstract
A reusable object oriented (OO) framework for use with object oriented programming systems provides an information retrieval (IR) shell that permits a framework user to define an index class that includes word index objects. The framework provides an extensible information retrieval system that evaluates a user query by comparing information contained in the user query with information contained in the word index objects, which relates to stored documents. The information in the word index objects is produced by preprocessing operations on the documents, so that the documents relevant to the user query are identified and a query result is provided. A user of the information retrieval system can load documents into computer system storage, index the documents so that their information can be searched by a query, and request query evaluation to identify and retrieve the documents most closely related to the subject matter of a user query.
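The load, index, and query flow described in the abstract can be illustrated with a minimal sketch. It is not the patented framework: the class name `WordIndex`, the function `evaluate_query`, and the whitespace-based preprocessing are assumptions chosen only for illustration.

```python
from collections import defaultdict


class WordIndex:
    """Hypothetical word index: maps each word to the ids of documents containing it."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add_document(self, doc_id, text):
        # Assumed preprocessing: lowercase the text and split on whitespace.
        for word in text.lower().split():
            self._postings[word].add(doc_id)

    def lookup(self, word):
        # Return a copy so callers cannot mutate the index.
        return set(self._postings.get(word.lower(), set()))


def evaluate_query(index, query):
    """Return the ids of documents that contain at least one query word."""
    result = set()
    for word in query.lower().split():
        result |= index.lookup(word)
    return result


index = WordIndex()
index.add_document("d1", "object oriented framework")
index.add_document("d2", "information retrieval framework")
print(sorted(evaluate_query(index, "retrieval framework")))  # ['d1', 'd2']
```

Here a document matches when it shares any word with the query; the source does not specify the matching or ranking rule, so this choice is an assumption.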
61 Claims
1. A computer system comprising:
a central processing unit;
a user interface; and
a main memory having an operating system that supports an object oriented programming environment containing a framework that provides an extensible information retrieval system that evaluates a user query by comparing information contained in the user query with information contained in one or more documents stored in the computer system, such that the documents relevant to the user query will be identified, thereby providing a query result, wherein the framework includes;
at least one object oriented index class object that includes at least one word index object that maps each word contained in a document to the document containing the word by performing a preprocessing operation on the document to generate the at least one word index object; and
at least one object oriented query index object that processes a user query so as to produce a query result from comparison of the user query and the word index objects in response to a user query.
Dependent claims: 2-11
12. An object oriented framework for use in a computer system having an operating system that supports an object oriented programming environment that defines an index class that includes word index objects and provides an extensible information retrieval system that evaluates a user query by comparing information contained in the user query with information contained in the word index objects relating to documents stored in the computer system, wherein the information in each word index object is produced by a preprocessing operation on a corresponding document, such that the documents relevant to the user query will be identified, thereby providing a query result, wherein the framework includes:
at least one object oriented index class object that includes at least one word index object that maps each word contained in a document to the document containing the word by performing a preprocessing operation on the document to generate the at least one word index object; and
at least one object oriented query index object that processes a user query to produce a query result from comparison of the user query and the word index objects in response to a user query.
Dependent claims: 13-22
23. A program product for use in a computer system having an operating system that supports an object-oriented programming environment, the program product comprising:
a signal bearing media; and
a framework recorded on the signal bearing media, the framework providing an extensible information retrieval system that evaluates a user query by comparing information contained in the user query with information contained in object oriented index class objects that relate to one or more documents stored in the computer system, such that the documents relevant to the user query will be identified, thereby providing a query result, wherein the framework includes;
at least one object oriented index class object that includes at least one word index object that maps each word contained in a document to the document containing the word by performing a preprocessing operation on the document to generate the at least one word index object; and
at least one object oriented query index object that processes a user query to produce a query result from comparison of the user query and the word index objects in response to a user query.
Dependent claims: 24-35
36. A method of executing an application program in a computer system having a central processing unit that controls processing in the computer system, a user interface, and a main memory having an operating system that supports an object oriented programming environment, the method comprising the steps of:
providing an object oriented framework that provides an extensible information retrieval system, the framework including at least one core function that cannot be modified by a user and at least one extensible function defined by a user to customize the framework and thereby define a desired information retrieval system, the framework including;
at least one object oriented index class object that includes at least one word index object that maps each word contained in a document to the document containing the word by performing a preprocessing operation on the document to generate the at least one word index object; and
at least one object oriented query index object that processes a user query to produce a query result from comparison of the user query and the word index objects in response to a user query; and
evaluating a user query by comparing information contained in the user query with information contained in the at least one index class object that relates to one or more documents stored in the computer system, such that the documents relevant to the user query will be identified, thereby providing a query result.
Dependent claims: 37-46
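The distinction in claim 36 between a core function that the user cannot modify and an extensible function that the user defines resembles the template method pattern. The sketch below is one possible reading, not the patent's implementation: `RetrievalFramework`, `run_query`, `preprocess`, and `LowercaseRetrieval` are hypothetical names, and Python only expresses the "cannot be modified" constraint by convention.

```python
from abc import ABC, abstractmethod


class RetrievalFramework(ABC):
    """Hypothetical framework base class."""

    def run_query(self, documents, query):
        # Core function: fixed control flow that subclasses are not meant to override.
        terms = self.preprocess(query)
        matches = []
        for doc_id, text in documents.items():
            words = set(self.preprocess(text))
            if any(term in words for term in terms):
                matches.append(doc_id)
        return matches

    @abstractmethod
    def preprocess(self, text):
        """Extensible function: the framework user supplies this behaviour."""


class LowercaseRetrieval(RetrievalFramework):
    """User-defined customization: lowercase and split on whitespace."""

    def preprocess(self, text):
        return text.lower().split()


docs = {"a": "Word Index", "b": "Query result"}
print(LowercaseRetrieval().run_query(docs, "INDEX"))  # ['a']
```

A different subclass could swap in another `preprocess` without touching `run_query`, which is the kind of customization the claim describes.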
47. An apparatus comprising:
at least one processor;
a memory coupled to the at least one processor;
a user-extensible object oriented framework residing in the memory, the framework including at least one core function that cannot be modified by a user and at least one extensible function defined by a user to customize the framework and thereby define a desired information retrieval system, the framework including;
a load document processor for loading and preprocessing a plurality of documents, the load document processor comprising;
a parser for separating text words in at least one of the plurality of documents from other text characters to generate a parsed word list;
a stoplist processor for deleting from the parsed word list any words that are defined in a stoplist collection to be too common to be useful in searching; and
a stemmer processor for mapping a plurality of words with a common stem to the common stem;
an index processor for creating at least one word index corresponding to the plurality of documents; and
a query processor for receiving a query and determining if any of the plurality of documents match the query by processing the query and comparing the processed query to the plurality of words in the at least one word index, thereby providing a query result.
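The parser, stoplist processor, and stemmer processor recited in claim 47 can be sketched as three small functions applied in sequence. The stoplist contents, the regular expression, and the suffix-stripping rule are assumptions made for illustration; in particular, `stem` is a toy rule and not a real stemming algorithm.

```python
import re

# Assumed stoplist collection; the source does not list its contents.
STOPLIST = {"the", "a", "of", "and", "to"}


def parse(text):
    # Parser: separate text words from other text characters.
    return re.findall(r"[a-z]+", text.lower())


def remove_stopwords(words, stoplist=STOPLIST):
    # Stoplist processor: drop words considered too common to be useful.
    return [w for w in words if w not in stoplist]


def stem(word):
    # Toy stemmer: strip one common suffix if at least three letters remain.
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    return [stem(w) for w in remove_stopwords(parse(text))]


print(preprocess("The parser indexed the documents, indexing words."))
# ['parser', 'index', 'document', 'index', 'word']
```

Note that "indexed" and "indexing" both map to "index", which is the effect the claim describes as mapping words with a common stem to that stem.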
48. An apparatus comprising:
at least one processor;
a memory coupled to the at least one processor;
a user-extensible object oriented framework residing in the memory, the framework including at least one core function that cannot be modified by a user and at least one extensible function defined by a user to customize the framework and thereby define a desired information retrieval system, the framework including;
a load document processor for loading and preprocessing a plurality of documents;
an index processor for creating at least one word index corresponding to the plurality of documents, the index processor comprising;
a word index mechanism for loading a word index with words from the plurality of documents;
a posting list mechanism that contains a list of the plurality of documents and corresponding counts for words and word stems in each of the plurality of documents; and
a query processor for receiving a query and determining if any of the plurality of documents match the query by processing the query and comparing the processed query to the plurality of words in the at least one word index, thereby providing a query result.
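One way to picture the posting list mechanism of claim 48 is a mapping from each word (or stem) to the documents that contain it, together with a per-document count. The layout below is an assumption; the source does not fix a data structure, and `build_posting_lists` is a hypothetical helper that expects already preprocessed word lists.

```python
from collections import Counter, defaultdict


def build_posting_lists(documents):
    """Map each word to a {doc_id: count} dictionary."""
    postings = defaultdict(dict)
    for doc_id, words in documents.items():
        # Counter tallies how often each word occurs in this document.
        for word, count in Counter(words).items():
            postings[word][doc_id] = count
    return dict(postings)


docs = {
    "d1": ["index", "query", "index"],
    "d2": ["query"],
}
postings = build_posting_lists(docs)
print(postings["index"])  # {'d1': 2}
print(postings["query"])  # {'d1': 1, 'd2': 1}
```

Keeping counts alongside document ids would let a query processor weigh documents by how often a term occurs, although the claim itself does not say how the counts are used.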
49. An apparatus comprising:
at least one processor;
a memory coupled to the at least one processor;
a user-extensible object oriented framework residing in the memory, the framework including at least one core function that cannot be modified by a user and at least one extensible function defined by a user to customize the framework and thereby define a desired information retrieval system, the framework including;
a load document processor for loading and preprocessing a plurality of documents;
an index processor for creating at least one word index corresponding to the plurality of documents; and
a query processor for receiving a query and determining if any of the plurality of documents match the query by processing the query and comparing the processed query to the plurality of words in the at least one word index, thereby providing a query result, the query processor comprising;
a parser for separating text words in the query from other text characters in the query;
a stoplist processor for deleting from the text words in the parsed query any words that are defined in a stoplist collection to be too common to be useful in searching; and
a stemmer processor for mapping a plurality of the text words in the query with a common stem to the common stem.
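Claim 49 applies the same three preprocessing steps to the query before it is compared with the word index. The sketch below assumes a table-driven stemmer and a dictionary-based word index; `STOPLIST`, `STEMS`, `process_query`, and `match` are hypothetical names, and the "any shared term" matching rule is an assumption.

```python
import re

STOPLIST = {"the", "for", "of"}  # assumed stoplist collection
# Assumed stem table: words sharing a stem map to that stem.
STEMS = {"retrieving": "retriev", "retrieval": "retriev", "retrieved": "retriev"}


def process_query(query):
    words = re.findall(r"[a-z]+", query.lower())      # parser
    words = [w for w in words if w not in STOPLIST]   # stoplist processor
    return [STEMS.get(w, w) for w in words]           # stemmer processor


def match(query, word_index):
    """Return sorted ids of documents that share a term with the processed query."""
    hits = set()
    for term in process_query(query):
        hits |= word_index.get(term, set())
    return sorted(hits)


# The index is assumed to have been built with the same preprocessing.
word_index = {"retriev": {"d1", "d3"}, "framework": {"d2"}}
print(match("Retrieving the framework?", word_index))  # ['d1', 'd2', 'd3']
```

The query and the documents must pass through the same stemming for the comparison to work; otherwise "retrieving" in a query would never meet "retriev" in the index.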
50. A program product comprising:
(A) a user-extensible object oriented framework mechanism comprising;
(1) a load document processor for loading and preprocessing a plurality of documents, the load document processor comprising;
(a) a parser for separating text words in at least one of the plurality of documents from other text characters to generate a parsed word list;
(b) a stoplist processor for deleting from the parsed word list any words that are defined in a stoplist collection to be too common to be useful in searching; and
(c) a stemmer processor for mapping a plurality of words with a common stem to the common stem;
(2) an index processor for creating at least one word index corresponding to the plurality of documents; and
(3) a query processor for receiving a query and determining if any of the plurality of documents match the query by processing the query and comparing the processed query to the plurality of words in the at least one word index, thereby providing a query result; and
(B) signal bearing media bearing the framework mechanism.
Dependent claims: 51, 52
53. A program product comprising:
(A) a user-extensible object oriented framework mechanism comprising;
(1) a load document processor for loading and preprocessing a plurality of documents;
(2) an index processor for creating at least one word index corresponding to the plurality of documents, the index processor comprising;
(a) a word index mechanism for loading a word index with words from the plurality of documents;
(b) a posting list mechanism that contains a list of the plurality of documents and corresponding counts for words and word stems in each of the plurality of documents; and
(3) a query processor for receiving a query and determining if any of the plurality of documents match the query by processing the query and comparing the processed query to the plurality of words in the at least one word index, thereby providing a query result; and
(B) signal bearing media bearing the framework mechanism.
Dependent claims: 54, 55
56. A program product comprising:
(A) a user-extensible object oriented framework mechanism comprising;
(1) a load document processor for loading and preprocessing a plurality of documents;
(2) an index processor for creating at least one word index corresponding to the plurality of documents; and
(3) a query processor for receiving a query and determining if any of the plurality of documents match the query by processing the query and comparing the processed query to the plurality of words in the at least one word index, thereby providing a query result, the query processor comprising;
(a) a parser for separating text words in the query from other text characters in the query;
(b) a stoplist processor for deleting from the text words in the parsed query any words that are defined in a stoplist collection to be too common to be useful in searching; and
(c) a stemmer processor for mapping a plurality of the text words in the query with a common stem to the common stem; and
(B) signal bearing media bearing the framework mechanism.
Dependent claims: 57, 58
59. A method of retrieving information from a plurality of documents comprising the step of:
providing a user-extensible object oriented framework mechanism, the framework mechanism performing the steps of;
(A) loading and preprocessing a plurality of documents, the step of loading and preprocessing the plurality of documents includes the steps of;
separating text words in at least one of the plurality of documents from other text characters to generate a parsed word list;
deleting from the parsed word list any words that are defined in a stoplist collection to be too common to be useful in searching; and
mapping a plurality of words with a common stem to the common stem;
(B) creating at least one word index corresponding to the plurality of documents; and
(C) receiving a query and determining if any of the plurality of documents match the query by processing the query and comparing the processed query to the plurality of words in the at least one word index, thereby providing a query result.
60. A method of retrieving information from a plurality of documents comprising the step of:
providing a user-extensible object oriented framework mechanism, the framework mechanism performing the steps of;
(A) loading and preprocessing a plurality of documents;
(B) creating at least one word index corresponding to the plurality of documents, the step of creating the at least one word index corresponding to the plurality of documents includes the steps of;
loading a word index with words from the plurality of documents;
generating a posting list mechanism that contains a list of the plurality of documents and corresponding counts for words and word stems in each of the plurality of documents; and
(C) receiving a query and determining if any of the plurality of documents match the query by processing the query and comparing the processed query to the plurality of words in the at least one word index, thereby providing a query result.
61. A method of retrieving information from a plurality of documents comprising the step of:
providing a user-extensible object oriented framework mechanism, the framework mechanism performing the steps of;
(A) loading and preprocessing a plurality of documents;
(B) creating at least one word index corresponding to the plurality of documents; and
(C) receiving a query and determining if any of the plurality of documents match the query by processing the query and comparing the processed query to the plurality of words in the at least one word index, thereby providing a query result, the step of receiving a query and determining if any of the plurality of documents match the query includes the steps of;
separating text words in the query from other text characters in the query;
deleting from the text words in the parsed query any words that are defined in a stoplist collection to be too common to be useful in searching; and
mapping a plurality of the text words in the query with a common stem to the common stem.
Specification