Method and apparatus for retrieving documents based on information other than document content
Abstract
A method and apparatus are provided for retrieving documents from a collection of documents based on information other than the contents of a desired document. The collection of documents, which may be a hypertext system or documents available via the World Wide Web, is indexed. In one embodiment, an indexing process of a search engine receives one or more specifications that identify documents, or document locations, and non-content information such as a tag word or code word. The indexing process searches the index to identify all documents in the index that match one or more of the specifications. If a match is found, the tag word is added to the index, and information about the matching document is stored in the index in association with the tag word. A search query is submitted to the search engine. The search query is automatically modified to add a reference to the tag word, such as a query term that will exclude any index entry for a document associated with the tag word. The search is executed against the index, and a set of search results is generated. Accordingly, the search results automatically exclude all documents associated with the tag word. These techniques may be used, for example, to implement a Web search service that produces more accurate search results or that prevents certain documents, such as pornographic materials, from appearing in search results.
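The tagging and query-rewriting steps described above can be illustrated with a minimal in-memory sketch. It makes several assumptions that do not come from the patent. The index is a toy inverted index from term to a set of URLs, and the URL stands in for the document's location identifier. Specifications are shell-style wildcard patterns matched with Python's `fnmatch`. The tag word is a hypothetical reserved token, `__tag_blocked__`, which the tokenizer cannot produce from document text. The class and function names (`InvertedIndex`, `apply_tag_specifications`, `modify_query`) are illustrative only.

```python
import re
from dataclasses import dataclass, field
from fnmatch import fnmatchcase

# Hypothetical reserved tag word. Content tokens are [a-z0-9]+ only, so a
# token containing underscores can never be produced from document text.
BLOCKED_TAG = "__tag_blocked__"


def tokenize(text: str) -> list[str]:
    """Split document or query text into lowercase alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


@dataclass
class InvertedIndex:
    # term -> set of document URLs (the URL serves as the location identifier)
    postings: dict[str, set[str]] = field(default_factory=dict)
    urls: set[str] = field(default_factory=set)

    def add_document(self, url: str, text: str) -> None:
        self.urls.add(url)
        for term in tokenize(text):
            self.postings.setdefault(term, set()).add(url)

    def apply_tag_specifications(self, specs: dict[str, str]) -> None:
        """specs maps a wildcard pattern over URLs to a tag word."""
        for url in self.urls:                    # each indexed location identifier
            for pattern, tag in specs.items():   # tested against each specification
                if fnmatchcase(url, pattern):
                    self.postings.setdefault(tag, set()).add(url)

    def search(self, required: list[str], excluded: list[str]) -> set[str]:
        """AND the required terms, then drop documents carrying any excluded term."""
        if not required:
            return set()
        results = set.intersection(*(self.postings.get(t, set()) for t in required))
        for term in excluded:
            results -= self.postings.get(term, set())
        return results


def modify_query(query: str) -> tuple[list[str], list[str]]:
    """Add a search term that references the tag word (here, an exclusion)."""
    return tokenize(query), [BLOCKED_TAG]


index = InvertedIndex()
index.add_document("http://example.com/garden/roses.html", "Growing roses in spring")
index.add_document("http://adult.example.net/roses.html", "Roses and more roses")
index.apply_tag_specifications({"http://adult.example.net/*": BLOCKED_TAG})

required, excluded = modify_query("roses")
print(sorted(index.search(required, excluded)))
# ['http://example.com/garden/roses.html']
```

Matching runs against the URL because the abstract describes specifications that identify documents or document locations. The set difference in `search` plays the role of the added query term that excludes every index entry associated with the tag word. The user never types the tag word, so filtering happens transparently.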
18 Claims
-
1. A method of selecting electronic documents from among a plurality of electronic documents, the method comprising the steps of:
-
storing a tag word in an index in association with information identifying an electronic document, in which the tag word comprises data that does not appear in a content of the electronic document;
receiving a search query;
modifying the search query to create a modified search query by adding to the search query a search term that references the tag word; and
creating a set of search results by searching the index based on the modified search query;
wherein the step of storing includes the steps of:
receiving data that indicates one or more tag words and criteria to be used to determine which of the plurality of documents should be associated with each of the one or more tag words, and in which at least a portion of the data is expressed in a wildcard format;
retrieving a location identifier of each of the documents that are indexed in the index;
matching each location identifier to each of the criteria; and
when one location identifier matches one of the criteria, storing, in the index, information associating such location identifier with one or more of the tag words.
-
-
2. A method of selecting electronic documents from among a plurality of electronic documents, the method comprising the steps of:
-
storing a tag word in an index in association with information identifying an electronic document, in which the tag word comprises data that does not appear in a content of the electronic document;
receiving a search query;
modifying the search query to create a modified search query by adding to the search query a search term that references the tag word; and
creating a set of search results by searching the index based on the modified search query;
wherein the step of storing includes the steps of:
receiving specifications of one or more of the documents that are indexed in the index, in which each of the specifications is associated with one or more tag words, and in which one of the specifications is expressed in a wildcard format;
retrieving a location identifier of each of the documents that are indexed in the index;
matching each location identifier to each of the specifications by interpreting the one of the specifications that is in the wildcard format according to one or more wildcard format rules; and
when one location identifier matches one of the specifications, storing, in the index, information associating such location identifier with one or more of the tag words.
-
-
3. A method of processing queries that select an electronic document from among a plurality of documents, the method comprising the steps of:
-
storing a tag word in an index in association with information identifying the electronic document, in which the tag word indicates that access to the electronic document is restricted;
receiving a search query that requests the electronic document;
modifying the search query to create a modified search query by adding a search term that references the tag word; and
creating a set of search results by searching the index based on the modified search query;
wherein the step of storing further includes the steps of:
receiving specifications of one or more of the documents that are indexed in the index, in which each of the specifications is associated with the tag word, and in which each of the specifications is expressed in a wildcard format;
retrieving a location identifier of each of the documents that are indexed in the index;
matching each location identifier to each of the specifications; and
when one location identifier matches one of the specifications, storing, in the index, information associating such location identifier with the tag word.
-
-
4. A method of processing queries that select an electronic document from among a plurality of documents, the method comprising the steps of:
-
storing a tag word in an index in association with information identifying the electronic document, in which the tag word indicates that access to the electronic document is restricted;
receiving a search query that requests the electronic document;
modifying the search query to create a modified search query by adding a search term that references the tag word; and
creating a set of search results by searching the index based on the modified search query;
wherein the step of storing includes the steps of:
receiving specifications of one or more of the documents that are indexed in the index, in which each of the specifications is associated with the tag word, and in which one of the specifications is expressed in a wildcard format;
retrieving a location identifier of each of the documents that are indexed in the index;
matching each location identifier to each of the specifications by interpreting the one of the specifications that is in the wildcard format according to one or more wildcard format rules; and
when one location identifier matches one of the specifications, storing, in the index, information associating such location identifier with the tag word.
-
-
5. A method of processing queries that select an electronic document from among a plurality of documents, the method comprising the steps of:
-
storing a tag word in an index in association with information identifying the electronic document, in which the tag word indicates that access to the electronic document is restricted;
receiving a search query that requests the electronic document;
modifying the search query to create a modified search query by adding a search term that references the tag word; and
creating a set of search results by searching the index based on the modified search query;
wherein the step of storing includes the steps of:
receiving data that indicates one or more tag words and criteria to be used to determine which of the plurality of documents should be associated with each of the one or more tag words, and in which at least a portion of the data is expressed in a wildcard format;
retrieving a location identifier of each of the documents that are indexed in the index;
matching each location identifier to each of the criteria; and
when one location identifier matches one of the criteria, storing, in the index, information associating such location identifier with one or more of the tag words.
-
-
6. A method of constructing an index of a plurality of electronic documents for use in selecting electronic documents from among the plurality of electronic documents, the method comprising the steps of:
-
receiving data that indicates one or more tag words and criteria to be used to determine which of the plurality of documents should be associated with each of the one or more tag words, wherein the tag words do not appear in a content of the electronic documents;
storing a list of words that are within one document of the plurality of documents;
storing, in the index, information associating each of the one or more tag words with the one document when the one document satisfies the criteria associated with the tag words;
wherein the step of receiving data includes the steps of receiving data that indicates one or more tag words and criteria to be used to determine which of the plurality of documents should be associated with each of the one or more tag words, and in which at least a portion of the data is expressed in a wildcard format; and
wherein the step of storing information comprises the steps of retrieving a location identifier of each of the documents;
matching each location identifier to each of the criteria; and
when one location identifier matches one of the criteria, storing, in the index, information associating such location identifier with one or more of the tag words.
-
-
7. A method of constructing an index of a plurality of electronic documents for use in selecting electronic documents from among the plurality of electronic documents, the method comprising the steps of:
-
receiving data that indicates one or more tag words and criteria to be used to determine which of the plurality of documents should be associated with each of the one or more tag words, wherein the tag words do not appear in a content of the electronic documents;
storing a list of words that are within one document of the plurality of documents;
storing, in the index, information associating each of the one or more tag words with the one document when the one document satisfies the criteria associated with the tag words;
wherein the step of receiving data includes the steps of receiving specifications of one or more of the documents that are indexed in the index, in which each of the specifications is associated with one or more tag words, and in which one of the specifications is expressed in a wildcard format;
and wherein the step of storing information comprises the steps of:
retrieving a location identifier of each of the documents that are indexed in the index;
matching each location identifier to each of the specifications by interpreting the one of the specifications that is in the wildcard format according to one or more wildcard format rules; and
when one location identifier matches one of the specifications, storing, in the index, information associating such location identifier with one or more of the tag words.
-
-
8. A method of constructing an index of a plurality of electronic documents for use in selecting electronic documents from among the plurality of electronic documents, the method comprising the steps of:
-
receiving data that indicates one or more document property values and criteria to be used to determine which of the plurality of documents should be associated with each of the one or more document property values, wherein the document property values do not appear in a content of the electronic documents;
storing a list of words that are within one document of the plurality of documents; and
storing, in the index, information associating each of the one or more document property values with the one document when the one document satisfies the criteria associated with the document property values.
-
-
9. A method of selecting electronic documents from among a plurality of electronic documents, the method comprising the steps of:
-
storing a document property value in an index in association with information identifying an electronic document, in which the document property value comprises data that does not appear in a content of the electronic document;
receiving a search query;
modifying the search query to create a modified search query by adding to the search query a search term that references the document property value; and
creating a set of search results by searching the index based on the modified search query.
-
-
10. A computer-readable medium carrying instructions for selecting electronic documents from among a plurality of electronic documents, the computer-readable medium comprising instructions for performing the steps of:
-
storing a tag word in an index in association with information identifying an electronic document, in which the tag word comprises data that does not appear in a content of the electronic document;
receiving a search query;
modifying the search query to create a modified search query by adding to the search query a search term that references the tag word; and
creating a set of search results by searching the index based on the modified search query;
wherein the step of storing includes the steps of:
receiving data that indicates one or more tag words and criteria to be used to determine which of the plurality of documents should be associated with each of the one or more tag words, and in which at least a portion of the data is expressed in a wildcard format;
retrieving a location identifier of each of the documents that are indexed in the index;
matching each location identifier to each of the criteria; and
when one location identifier matches one of the criteria, storing, in the index, information associating such location identifier with one or more of the tag words.
-
-
11. A computer-readable medium carrying instructions for selecting electronic documents from among a plurality of electronic documents, the computer-readable medium comprising instructions for performing the steps of:
-
storing a tag word in an index in association with information identifying an electronic document, in which the tag word comprises data that does not appear in a content of the electronic document;
receiving a search query;
modifying the search query to create a modified search query by adding to the search query a search term that references the tag word; and
creating a set of search results by searching the index based on the modified search query;
wherein the step of storing includes the steps of receiving specifications of one or more of the documents that are indexed in the index, in which each of the specifications is associated with one or more tag words, and in which one of the specifications is expressed in a wildcard format;
retrieving a location identifier of each of the documents that are indexed in the index;
matching each location identifier to each of the specifications by interpreting the one of the specifications that is in the wildcard format according to one or more wildcard format rules; and
when one location identifier matches one of the specifications, storing, in the index, information associating such location identifier with one or more of the tag words.
-
-
12. A computer-readable medium carrying instructions for processing queries that select an electronic document from among a plurality of documents, the computer-readable medium carrying instructions for performing the steps of:
-
storing a tag word in an index in association with information identifying the electronic document, in which the tag word indicates that access to the electronic document is restricted;
receiving a search query that requests the electronic document;
modifying the search query to create a modified search query by adding a search term that references the tag word; and
creating a set of search results by searching the index based on the modified search query;
wherein the step of storing further includes the steps of:
receiving specifications of one or more of the documents that are indexed in the index, in which each of the specifications is associated with the tag word, and in which each of the specifications is expressed in a wildcard format;
retrieving a location identifier of each of the documents that are indexed in the index;
matching each location identifier to each of the specifications; and
when one location identifier matches one of the specifications, storing, in the index, information associating such location identifier with the tag word.
-
-
13. A computer-readable medium carrying instructions for processing queries that select an electronic document from among a plurality of documents, the computer-readable medium comprising instructions for performing the steps of:
-
storing a tag word in an index in association with information identifying the electronic document, in which the tag word indicates that access to the electronic document is restricted;
receiving a search query that requests the electronic document;
modifying the search query to create a modified search query by adding a search term that references the tag word; and
creating a set of search results by searching the index based on the modified search query;
wherein the step of storing includes the steps of:
receiving specifications of one or more of the documents that are indexed in the index, in which each of the specifications is associated with the tag word, and in which one of the specifications is expressed in a wildcard format;
retrieving a location identifier of each of the documents that are indexed in the index;
matching each location identifier to each of the specifications by interpreting the one of the specifications that is in the wildcard format according to one or more wildcard format rules; and
when one location identifier matches one of the specifications, storing, in the index, information associating such location identifier with the tag word.
-
-
14. A computer-readable medium carrying instructions for processing queries that select an electronic document from among a plurality of documents, the computer-readable medium comprising instructions for performing the steps of:
-
storing a tag word in an index in association with information identifying the electronic document, in which the tag word indicates that access to the electronic document is restricted;
receiving a search query that requests the electronic document;
modifying the search query to create a modified search query by adding a search term that references the tag word; and
creating a set of search results by searching the index based on the modified search query;
wherein the step of storing includes the steps of:
receiving data that indicates one or more tag words and criteria to be used to determine which of the plurality of documents should be associated with each of the one or more tag words, and in which at least a portion of the data is expressed in a wildcard format;
retrieving a location identifier of each of the documents that are indexed in the index;
matching each location identifier to each of the criteria; and
when one location identifier matches one of the criteria, storing, in the index, information associating such location identifier with one or more of the tag words.
-
-
15. A computer-readable medium carrying instructions for constructing an index of a plurality of electronic documents for use in selecting electronic documents from among the plurality of electronic documents, the computer-readable medium comprising instructions for performing the steps of:
-
receiving data that indicates one or more tag words and criteria to be used to determine which of the plurality of documents should be associated with each of the one or more tag words, wherein the tag words do not appear in a content of the electronic documents;
storing a list of words that are within one document of the plurality of documents;
storing, in the index, information associating each of the one or more tag words with the one document when the one document satisfies the criteria associated with the tag words;
wherein the step of receiving data includes the steps of receiving data that indicates one or more tag words and criteria to be used to determine which of the plurality of documents should be associated with each of the one or more tag words, and in which at least a portion of the data is expressed in a wildcard format;
and wherein the step of storing information comprises the steps of retrieving a location identifier of each of the documents;
matching each location identifier to each of the criteria; and
when one location identifier matches one of the criteria, storing, in the index, information associating such location identifier with one or more of the tag words.
-
-
16. A computer-readable medium carrying instructions for constructing an index of a plurality of electronic documents for use in selecting electronic documents from among the plurality of electronic documents, the computer-readable medium carrying instructions for performing the steps of:
-
receiving data that indicates one or more tag words and criteria to be used to determine which of the plurality of documents should be associated with each of the one or more tag words, wherein the tag words do not appear in a content of the electronic documents;
storing a list of words that are within one document of the plurality of documents;
storing, in the index, information associating each of the one or more tag words with the one document when the one document satisfies the criteria associated with the tag words;
wherein the step of receiving data includes the steps of receiving specifications of one or more of the documents that are indexed in the index, in which each of the specifications is associated with one or more tag words, and in which one of the specifications is expressed in a wildcard format;
and wherein the step of storing information comprises the steps of:
retrieving a location identifier of each of the documents that are indexed in the index;
matching each location identifier to each of the specifications by interpreting the one of the specifications that is in the wildcard format according to one or more wildcard format rules; and
when one location identifier matches one of the specifications, storing, in the index, information associating such location identifier with one or more of the tag words.
-
-
17. A computer-readable medium carrying instructions for constructing an index of a plurality of electronic documents for use in selecting electronic documents from among the plurality of electronic documents, the computer-readable medium comprising instructions for performing the steps of:
-
receiving data that indicates one or more document property values and criteria to be used to determine which of the plurality of documents should be associated with each of the one or more document property values, wherein the document property values do not appear in a content of the electronic documents;
storing a list of words that are within one document of the plurality of documents; and
storing, in the index, information associating each of the one or more document property values with the one document when the one document satisfies the criteria associated with the document property values.
-
-
18. A computer-readable medium carrying instructions for selecting electronic documents from among a plurality of electronic documents, the computer-readable medium comprising instructions for performing the steps of:
-
storing a document property value in an index in association with information identifying an electronic document, in which the document property value comprises data that does not appear in a content of the electronic document;
receiving a search query;
modifying the search query to create a modified search query by adding to the search query a search term that references the document property value; and
creating a set of search results by searching the index based on the modified search query.
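Two themes in the claims can be illustrated together. Claims 2, 4, 7, 11, 13 and 16 interpret a wildcard specification "according to one or more wildcard format rules". Claims 8, 9, 17 and 18 generalize the tag word to a document property value. The following sketch combines the two for the restricted-access case of claims 3 to 5. All of the following are hypothetical choices, not definitions from the claims:

* The wildcard rules: `*` matches any run of characters, `?` matches one character, and everything else is literal.
* The `name=value` encoding of a property value as an index term.
* The helper names `wildcard_to_regex`, `property_term` and `tag_by_location`.
* The index itself, a plain dictionary from term to a set of URLs.

```python
import re


# Hypothetical wildcard format rules (an illustrative assumption):
#   '*' matches any run of characters, including none
#   '?' matches exactly one character
#   every other character matches itself literally
def wildcard_to_regex(spec: str) -> re.Pattern:
    out = []
    for ch in spec:
        if ch == "*":
            out.append(".*")
        elif ch == "?":
            out.append(".")
        else:
            out.append(re.escape(ch))
    return re.compile("".join(out))


def property_term(name: str, value: str) -> str:
    """Encode a document property value as an index term, e.g. 'access=restricted'.

    Assumes content terms are plain words, so '=' keeps the two vocabularies apart.
    """
    return f"{name}={value}"


def tag_by_location(index: dict[str, set[str]], urls: list[str],
                    specs: list[tuple[str, str]]) -> None:
    """Associate a property term with every indexed URL that matches a wildcard spec."""
    compiled = [(wildcard_to_regex(spec), term) for spec, term in specs]
    for url in urls:                      # each location identifier
        for pattern, term in compiled:    # against each specification
            if pattern.fullmatch(url):    # the whole identifier must match
                index.setdefault(term, set()).add(url)


RESTRICTED = property_term("access", "restricted")

# Toy index: content term -> URLs of documents containing it.
index = {"report": {"http://intra.example.com/hr/q1.html",
                    "http://www.example.com/news/q1.html"}}
urls = sorted(set().union(*index.values()))

tag_by_location(index, urls, [("http://intra.example.com/*", RESTRICTED)])

# Modified query: the user's term "report" plus a term excluding RESTRICTED.
hits = index["report"] - index.get(RESTRICTED, set())
print(sorted(hits))  # ['http://www.example.com/news/q1.html']
```

Translating each wildcard character explicitly, rather than relying on a library glob, makes the "wildcard format rules" visible as data that could be swapped out. Using `fullmatch` ensures a specification must cover the entire location identifier, not merely a prefix of it.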
-
Specification