Generating search results based on user feedback

US 8,150,843 B2
Filed: 07/02/2009
Issued: 04/03/2012
Est. Priority Date: 07/02/2009
Status: Expired due to Fees


1 Assignment

0 Petitions

Abstract

Systems, methods and articles of manufacture are disclosed for generating search results based on user feedback. A request may be received to generate search results retrieved using a search string. The request may include user feedback for one or more selected documents of the search results. Improved search results may be generated based on the search results and the feedback for one or more selected documents of the search results. The improved search results may be output to a graphical display device.

24 Claims

1. A computer-implemented method for processing search results, comprising:
- receiving a search request against a corpus of documents, wherein the search request specifies one or more search terms;
  
  generating an initial set of search results, wherein the initial set of search results identifies a plurality of documents responsive to the search request, ranked in an initial ordering, and wherein each of the plurality of documents contains text;
  
  receiving user indication of: (i) at least one document relevant to the one or more search terms and (ii) at least one document irrelevant to the one or more search terms;
  
  by operation of one or more computer processors, training a new statistical classifier using each relevant document as a positive training example to form a first category of documents recognized by the statistical classifier and using each irrelevant document as a negative training example to form a second category of documents recognized by the statistical classifier, wherein the at least one relevant document and the at least one irrelevant document form a training set for the new statistical classifier;
  
  supplying each document in the initial set of search results and not in the training set, to the trained statistical classifier to obtain a measure of similarity between the respective document and at least one of the categories recognized by the trained statistical classifier;
  
  supplying one or more documents from the corpus and not included in the set of initial search results, to the trained statistical classifier to obtain a measure of similarity between each of the one or more documents and at least one of the categories recognized by the trained statistical classifier;
  
  re-ranking the initial set of search results based on the measures of similarity obtained from the trained statistical classifier, comprising ranking each document having a measure of similarity to the first category of documents that exceeds a first user-configurable threshold, ahead of each document having a measure of similarity to the second category of documents that exceeds a second user-configurable threshold; and
  
  outputting the re-ranked search results for display to a user.

2. The computer-implemented method of claim 1, wherein the selection of one or more documents from the plurality of documents is received in response to monitoring what documents referenced in the initial search results are accessed by the user and prompting the user to indicate whether each accessed document is relevant to what the user is searching for.

3. The computer-implemented method of claim 1, wherein the statistical classifier is one of a naïve Bayes classifier, a linear classifier, a latent semantic indexing classifier and an artificial neural network.

4. The computer-implemented method of claim 1, wherein the documents with similarity scores within a user-configurable percentile are re-ranked at the top of the search results.

5. The computer-implemented method of claim 4, wherein the number of documents that are re-ranked in the search results is limited by a user-configurable maximum count.

6. The computer-implemented method of claim 5, wherein each document in the re-ranked search results is tagged with the one or more search terms, and wherein the tags are stored for use in initially ordering search results responsive to subsequent search requests.

7. The computer-implemented method of claim 6, wherein the user indication of each document comprises a user-specified numerical score indicating an extent to which the user finds the respective document to be relevant to the one or more search terms, and wherein the new statistical classifier is trained using the user-specified numerical scores.

8. The computer-implemented method of claim 7, further comprising:
- receiving user indication of, from the re-ranked search results: (i) one or more documents relevant to the one or more search terms and (ii) one or more documents irrelevant to the one or more search terms;
  
  using the one or more relevant documents and the one or more irrelevant documents to further train the new statistical classifier to recognize the first category of documents and the second category of documents, respectively, wherein the training set, the one or more relevant documents, and the one or more irrelevant documents together form an augmented training set;
  
  supplying each document in the re-ranked search results and not in the augmented training set, to the further-trained statistical classifier to obtain a measure of similarity between the respective document and at least one of the categories recognized by the trained statistical classifier;
  
  supplying one or more documents from the corpus and not included in the set of re-ranked search results, to the further-trained statistical classifier to obtain a measure of similarity between each of the one or more documents and at least one of the categories recognized by the trained statistical classifier;
  
  ranking yet again the re-ranked search results based on the measures of similarity obtained from the further-trained statistical classifier, comprising ranking each document having a measure of similarity to the first category of documents that exceeds the first user-configurable threshold, ahead of each document having a measure of similarity to the second category of documents that exceeds the second user-configurable threshold; and
  
  outputting the yet-again-ranked search results for display to the user.
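The core steps of claim 1 can be sketched in a few dozen lines. This is an illustrative sketch, not the patented implementation, and it rests on assumptions the claims do not state: it uses a two-category multinomial naive Bayes model (one of the classifier types listed in claim 3) written from scratch, treats the posterior probability of each category as the "measure of similarity", and puts the documents the user marked relevant or irrelevant at the top or bottom respectively, because the claim does not say where training documents land. The names `NaiveBayes`, `rerank`, `t_rel` and `t_irr` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case alphanumeric tokens; a deliberately simple tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayes:
    """Two-category multinomial naive Bayes with add-alpha smoothing.

    Category 1 is "relevant" (positive examples), category 0 "irrelevant".
    Claim 1 requires at least one example of each, which keeps the log
    priors below defined.
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, positives, negatives):
        self.counts = {1: Counter(), 0: Counter()}
        self.n_docs = {1: len(positives), 0: len(negatives)}
        for label, texts in ((1, positives), (0, negatives)):
            for text in texts:
                self.counts[label].update(tokenize(text))
        self.vocab = set(self.counts[1]) | set(self.counts[0])
        self.totals = {c: sum(self.counts[c].values()) for c in (0, 1)}
        return self

    def _log_joint(self, tokens, label):
        """log P(label) plus log P(token | label) summed over known tokens."""
        score = math.log(self.n_docs[label] / (self.n_docs[0] + self.n_docs[1]))
        denom = self.totals[label] + self.alpha * len(self.vocab)
        for tok in tokens:
            if tok in self.vocab:  # words never seen in training are ignored
                score += math.log((self.counts[label][tok] + self.alpha) / denom)
        return score

    def prob_relevant(self, text):
        """Posterior P(relevant | text); P(irrelevant | text) is 1 minus this."""
        tokens = tokenize(text)
        lp, ln = self._log_joint(tokens, 1), self._log_joint(tokens, 0)
        m = max(lp, ln)  # subtract the max before exponentiating, for stability
        return math.exp(lp - m) / (math.exp(lp - m) + math.exp(ln - m))


def rerank(initial, docs, relevant, irrelevant, t_rel=0.6, t_irr=0.6):
    """Re-rank `initial` (doc ids in initial order); `docs` maps id -> text."""
    relevant, irrelevant = set(relevant), set(irrelevant)
    clf = NaiveBayes().fit([docs[d] for d in relevant],
                           [docs[d] for d in irrelevant])

    def tier(doc_id):
        if doc_id in relevant:       # assumption: user-marked relevant docs lead
            return 0
        if doc_id in irrelevant:     # assumption: user-marked irrelevant docs trail
            return 2
        p = clf.prob_relevant(docs[doc_id])
        if p > t_rel:                # exceeds the first-category threshold
            return 0
        if 1 - p > t_irr:            # exceeds the second-category threshold
            return 2
        return 1                     # neither threshold exceeded

    reranked = sorted(initial, key=tier)  # stable: ties keep the initial order
    in_results = set(initial)
    # Corpus documents outside the initial results are scored too.
    outside = {d: clf.prob_relevant(text)
               for d, text in docs.items() if d not in in_results}
    return reranked, outside


if __name__ == "__main__":
    docs = {
        "a": "python snake reptile scales",
        "b": "python programming language code",
        "c": "snake venom reptile",
        "d": "programming code compiler",
        "g": "python news",
        "e": "reptile zoo snake",
        "f": "language code tutorial",
    }
    ranked, outside = rerank(["b", "g", "d", "a", "c"], docs, ["a"], ["b"])
    print(ranked)  # ['a', 'c', 'g', 'b', 'd']
```

Sorting on a three-valued tier with Python's stable `sorted` keeps the initial ordering within each tier, so documents the classifier is unsure about stay where the original search placed them. The sketch only returns the scores of corpus documents outside the initial results, since claim 1 says they are scored but not how those scores are used.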

9. A computer program product, the computer program product comprising a computer usable storage medium having computer usable program code for processing search results, the code being configured for:
- receiving a search request against a corpus of documents, wherein the search request specifies one or more search terms;
  
  generating an initial set of search results, wherein the initial set of search results identifies a plurality of documents responsive to the search request, ranked in an initial ordering, and wherein each of the plurality of documents contains text;
  
  receiving user indication of: (i) at least one document relevant to the one or more search terms and (ii) at least one document irrelevant to the one or more search terms;
  
  by operation of one or more computer processors when executing the computer usable program code, training a new statistical classifier using each relevant document as a positive training example to form a first category of documents recognized by the statistical classifier and using each irrelevant document as a negative training example to form a second category of documents recognized by the statistical classifier, wherein the at least one relevant document and the at least one irrelevant document form a training set for the new statistical classifier;
  
  supplying each document in the initial set of search results and not in the training set, to the trained statistical classifier to obtain a measure of similarity between the respective document and at least one of the categories recognized by the trained statistical classifier;
  
  supplying one or more documents from the corpus and not included in the set of initial search results, to the trained statistical classifier to obtain a measure of similarity between each of the one or more documents and at least one of the categories recognized by the trained statistical classifier;
  
  re-ranking the initial set of search results based on the measures of similarity obtained from the trained statistical classifier, comprising ranking each document having a measure of similarity to the first category of documents that exceeds a first user-configurable threshold, ahead of each document having a measure of similarity to the second category of documents that exceeds a second user-configurable threshold; and
  
  outputting the re-ranked search results for display to a user.

10. The computer program product of claim 9, wherein the selection of one or more documents from the plurality of documents is received in response to monitoring what documents referenced in the initial search results are accessed by the user and prompting the user to indicate whether each accessed document is relevant to what the user is searching for.

11. The computer program product of claim 9, wherein the statistical classifier is one of a naïve Bayes classifier, a linear classifier, a latent semantic indexing classifier and an artificial neural network.

12. The computer program product of claim 9, wherein the documents with similarity scores within a user-configurable percentile are re-ranked at the top of the search results.

13. The computer program product of claim 12, wherein the number of documents that are re-ranked in the search results is limited by a user-configurable maximum count.

14. The computer program product of claim 13, wherein each document in the re-ranked search results is tagged with the one or more search terms, and wherein the tags are stored for use in initially ordering search results responsive to subsequent search requests.

15. The computer program product of claim 14, wherein the user indication of each document comprises a user-specified numerical score indicating an extent to which the user finds the respective document to be relevant to the one or more search terms, and wherein the new statistical classifier is trained using the user-specified numerical scores.

16. The computer program product of claim 15, wherein the code is further configured for:
- receiving user indication of, from the re-ranked search results: (i) one or more documents relevant to the one or more search terms and (ii) one or more documents irrelevant to the one or more search terms;
  
  using the one or more relevant documents and the one or more irrelevant documents to further train the new statistical classifier to recognize the first category of documents and the second category of documents, respectively, wherein the training set, the one or more relevant documents, and the one or more irrelevant documents together form an augmented training set;
  
  supplying each document in the re-ranked search results and not in the augmented training set, to the further-trained statistical classifier to obtain a measure of similarity between the respective document and at least one of the categories recognized by the trained statistical classifier;
  
  supplying one or more documents from the corpus and not included in the set of re-ranked search results, to the further-trained statistical classifier to obtain a measure of similarity between each of the one or more documents and at least one of the categories recognized by the trained statistical classifier;
  
  ranking yet again the re-ranked search results based on the measures of similarity obtained from the further-trained statistical classifier, comprising ranking each document having a measure of similarity to the first category of documents that exceeds the first user-configurable threshold, ahead of each document having a measure of similarity to the second category of documents that exceeds the second user-configurable threshold; and
  
  outputting the yet-again-ranked search results for display to the user.
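Claims 15 and 16 (mirrored by claims 7-8 and 23-24) replace yes/no judgments with numerical scores and feed later judgments back into an augmented training set. A minimal sketch, assuming scores are normalized to [0, 1] and that a document rated s acts as a fractional example, contributing weight s to the relevant category and 1 - s to the irrelevant one. That weighting scheme, the add-one smoothed priors, and the names `train`, `prob_relevant` and `feedback_round` are assumptions made for illustration, not details taken from the patent.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case alphanumeric tokens; a deliberately simple tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def train(docs, ratings, alpha=1.0):
    """Fit a weighted two-category multinomial naive Bayes model.

    `ratings` maps doc id -> user score in [0, 1] (assumed normalization).
    A document rated s adds weight s to the relevant category and 1 - s
    to the irrelevant one.
    """
    counts = {True: Counter(), False: Counter()}
    mass = {True: 0.0, False: 0.0}  # total example weight per category
    for doc_id, s in ratings.items():
        tokens = tokenize(docs[doc_id])
        for label, w in ((True, s), (False, 1.0 - s)):
            mass[label] += w
            for tok in tokens:
                counts[label][tok] += w
    vocab = set(counts[True]) | set(counts[False])
    return counts, mass, vocab, alpha


def prob_relevant(model, text):
    """Posterior probability that `text` belongs to the relevant category."""
    counts, mass, vocab, alpha = model
    n = mass[True] + mass[False]
    logp = {}
    for label in (True, False):
        denom = sum(counts[label].values()) + alpha * len(vocab)
        lp = math.log((mass[label] + 1.0) / (n + 2.0))  # add-one smoothed prior
        for tok in tokenize(text):
            if tok in vocab:
                lp += math.log((counts[label][tok] + alpha) / denom)
        logp[label] = lp
    m = max(logp.values())
    rel, irr = math.exp(logp[True] - m), math.exp(logp[False] - m)
    return rel / (rel + irr)


def feedback_round(docs, results, ratings):
    """Retrain on every rating so far, then rank the still-unrated results."""
    model = train(docs, ratings)
    scored = [(d, prob_relevant(model, docs[d])) for d in results if d not in ratings]
    return sorted(scored, key=lambda pair: -pair[1])  # stable for ties


if __name__ == "__main__":
    docs = {
        "a": "snake reptile", "b": "code compiler", "c": "snake scales",
        "d": "compiler code bugs", "e": "reptile zoo", "f": "zoo tickets",
    }
    results = ["a", "b", "c", "d", "e", "f"]
    ratings = {"a": 1.0, "b": 0.0}       # first round of feedback
    print(feedback_round(docs, results, ratings))
    ratings["e"] = 0.0                   # second round augments the training set
    print(feedback_round(docs, results, ratings))
```

Each round retrains from the full ratings dictionary, so keeping the earlier ratings alongside the new ones is what makes the training set "augmented"; documents that are already rated are excluded from re-scoring, matching the "not in the augmented training set" condition in the claims.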

17. A system, comprising:
- one or more computer processors; and
  
  a memory containing an application program configured for processing search results, which, when executed by the one or more computer processors is configured to perform an operation comprising:
  
  receiving a search request against a corpus of documents, wherein the search request specifies one or more search terms;
  
  generating an initial set of search results, wherein the initial set of search results identifies a plurality of documents responsive to the search request, ranked in an initial ordering, and wherein each of the plurality of documents contains text;
  
  receiving user indication of: (i) at least one document relevant to the one or more search terms and (ii) at least one document irrelevant to the one or more search terms;
  
  training a new statistical classifier using each relevant document as a positive training example to form a first category of documents recognized by the statistical classifier and using each irrelevant document as a negative training example to form a second category of documents recognized by the statistical classifier, wherein the at least one relevant document and the at least one irrelevant document form a training set for the new statistical classifier;
  
  supplying each document in the initial set of search results and not in the training set, to the trained statistical classifier to obtain a measure of similarity between the respective document and at least one of the categories recognized by the trained statistical classifier;
  
  supplying one or more documents from the corpus and not included in the set of initial search results, to the trained statistical classifier to obtain a measure of similarity between each of the one or more documents and at least one of the categories recognized by the trained statistical classifier;
  
  re-ranking the initial set of search results based on the measures of similarity obtained from the trained statistical classifier, comprising ranking each document having a measure of similarity to the first category of documents that exceeds a first user-configurable threshold, ahead of each document having a measure of similarity to the second category of documents that exceeds a second user-configurable threshold; and
  
  outputting the re-ranked search results for display to a user.

18. The system of claim 17, wherein the selection of one or more documents from the plurality of documents is received in response to monitoring what documents referenced in the initial search results are accessed by the user and prompting the user to indicate whether each accessed document is relevant to what the user is searching for.

19. The system of claim 17, wherein the statistical classifier is one of a naïve Bayes classifier, a linear classifier, a latent semantic indexing classifier and an artificial neural network.

20. The system of claim 17, wherein the documents with similarity scores within a user-configurable percentile are re-ranked at the top of the search results.

21. The system of claim 20, wherein the number of documents that are re-ranked in the search results is limited by a user-configurable maximum count.

22. The system of claim 21, wherein each document in the re-ranked search results is tagged with the one or more search terms, and wherein the tags are stored for use in initially ordering search results responsive to subsequent search requests.

23. The system of claim 22, wherein the user indication of each document comprises a user-specified numerical score indicating an extent to which the user finds the respective document to be relevant to the one or more search terms, and wherein the new statistical classifier is trained using the user-specified numerical scores.

24. The system of claim 23, wherein the operation further comprises:
- receiving user indication of, from the re-ranked search results: (i) one or more documents relevant to the one or more search terms and (ii) one or more documents irrelevant to the one or more search terms;
  
  using the one or more relevant documents and the one or more irrelevant documents to further train the new statistical classifier to recognize the first category of documents and the second category of documents, respectively, wherein the training set, the one or more relevant documents, and the one or more irrelevant documents together form an augmented training set;
  
  supplying each document in the re-ranked search results and not in the augmented training set, to the further-trained statistical classifier to obtain a measure of similarity between the respective document and at least one of the categories recognized by the trained statistical classifier;
  
  supplying one or more documents from the corpus and not included in the set of re-ranked search results, to the further-trained statistical classifier to obtain a measure of similarity between each of the one or more documents and at least one of the categories recognized by the trained statistical classifier;
  
  ranking yet again the re-ranked search results based on the measures of similarity obtained from the further-trained statistical classifier, comprising ranking each document having a measure of similarity to the first category of documents that exceeds the first user-configurable threshold, ahead of each document having a measure of similarity to the second category of documents that exceeds the second user-configurable threshold; and
  
  outputting the yet-again-ranked search results for display to the user.
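Claims 20 to 22 (mirrored by claims 4-6 and 12-14) add a percentile cut-off, a maximum count, and stored tags that influence later searches. The sketch below assumes a nearest-rank percentile over the current result scores, orders promoted documents by descending score, and tags only the promoted documents with the query terms. `promote`, `TagStore`, and the in-memory storage are hypothetical stand-ins, not the patented system's components.

```python
import math
from collections import defaultdict


def promote(ranked, scores, percentile=90.0, max_count=None):
    """Move documents scoring at or above the `percentile`-th percentile to the top.

    Assumptions: nearest-rank percentile over the scores of `ranked`;
    promoted documents are ordered by descending score and capped at
    `max_count`; every other document keeps its existing relative order.
    """
    if not ranked:
        return []
    values = sorted(scores[d] for d in ranked)
    k = math.ceil(percentile * len(values) / 100)
    k = min(len(values), max(1, k))  # clamp to a valid 1-based rank
    cutoff = values[k - 1]
    promoted = sorted((d for d in ranked if scores[d] >= cutoff),
                      key=lambda d: -scores[d])
    if max_count is not None:
        promoted = promoted[:max_count]
    chosen = set(promoted)
    return promoted + [d for d in ranked if d not in chosen]


class TagStore:
    """In-memory map from document id to the search terms it was tagged with."""

    def __init__(self):
        self.tags = defaultdict(set)

    def tag(self, doc_ids, terms):
        for d in doc_ids:
            self.tags[d].update(t.lower() for t in terms)

    def initial_order(self, doc_ids, terms):
        """Stable sort: documents sharing more stored tags with the query lead."""
        query = {t.lower() for t in terms}
        return sorted(doc_ids, key=lambda d: -len(self.tags.get(d, set()) & query))


if __name__ == "__main__":
    scores = {"d1": 0.1, "d2": 0.9, "d3": 0.5, "d4": 0.95, "d5": 0.3}
    top = promote(["d1", "d2", "d3", "d4", "d5"], scores, percentile=60, max_count=2)
    print(top)  # ['d4', 'd2', 'd1', 'd3', 'd5']
    store = TagStore()
    store.tag(top[:2], ["snake"])  # tag the promoted documents
    print(store.initial_order(["d1", "d2", "d3", "d4"], ["snake"]))  # ['d2', 'd4', 'd1', 'd3']
```

A deployed system would presumably persist the tags (for example alongside the search index) rather than in a process-local `defaultdict`; the in-memory store here only illustrates how stored tags could bias the initial ordering of a later search.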

Current Assignee
International Business Machines Corporation
Original Assignee
International Business Machines Corporation
Inventors
Holt, Alexander Wolcott, Moran, Michael E., Chitiveli, Srinivas Varma, Emanuel, Barton Wayne
Primary Examiner(s)
Truong, Cam-Y
Assistant Examiner(s)
VO, CECILE H

Application Number
US12/497,463
Publication Number
US 20110004609A1
Time in Patent Office
1,006 Days
Field of Search
707/5, 707/706, 707/711, 707/722-723, 707/728, 707/748, 707/759, 707/765, 707/769
US Class Current
707/723
CPC Class Codes
G06F 16/3326   using relevance feedback fr...
G06F 16/353   into predefined classes
G06F 16/9535   Search customisation based ...
G06Q 10/10   Office automation; Time man...
