Utilizing information redundancy to improve text searches

US 20060116996A1
Filed: 01/20/2006
Published: 06/01/2006
Est. Priority Date: 06/18/2003
Status: Active Grant

2 Assignments

0 Petitions

Abstract

Architecture for improving text searches using information redundancy. A search component is coupled with an analysis component to rerank documents returned in a search according to redundancy values. Each returned document is used to develop a corresponding word probability distribution, which is then used to rerank the returned documents according to the associated redundancy values. In another aspect thereof, a query component is coupled with a projection component to project answer redundancy from one document search to another. This includes obtaining the benefit of considerable answer redundancy from a second data source by projecting the success of the search of the second data source against a first data source.
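The abstract states that each returned document is turned into a word probability distribution, but it does not say how that distribution is estimated. A minimal sketch, assuming a maximum-likelihood unigram model over lowercase alphanumeric tokens with no smoothing; the tokenizer and the name `word_distribution` are illustrative assumptions, not taken from the patent:

```python
import re
from collections import Counter


def word_distribution(text: str) -> dict:
    """Maximum-likelihood unigram model: P(w) = count(w) / total tokens.

    Assumption: the patent does not specify a tokenizer or smoothing;
    this sketch lowercases, keeps alphanumeric runs, and applies no smoothing.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: n / total for word, n in counts.items()}


dist = word_distribution("The cat sat on the mat")
print(dist["the"])  # "the" is 2 of the 6 tokens
```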

20 Claims

1. A system that facilitates data retrieval, comprising:
   - a query component that executes a query to a first dataset; and
   - a projection component that executes the query across a second dataset, and analyzes properties of results of the query on the first dataset and results of the second dataset to generate a ranked result set of the query to the first dataset.
2. The system of claim 1, the projection component executes the query across the second dataset in response to determining that projection is required on the first dataset.
3. The system of claim 1, the projection component automatically executes the query across the second dataset substantially simultaneously with execution of the query across the first dataset.
4. The system of claim 1, the second dataset having higher redundancy than the first dataset.
5. The system of claim 1, the projection component analyzes the properties of the results by generating word probability distributions for each result.
6. The system of claim 1, the projection component analyzes the properties of the results by determining a similarity measure for each result.
7. The system of claim 6, the similarity property is a cosine distance.
8. The system of claim 1, the projection component evaluates the results of the second dataset for pairwise information redundancy with the results of the first dataset.
9. The system of claim 1, the projection component determines the average pairwise redundancy of the results of the second dataset with the results of the first dataset.
10. The system of claim 1, the properties of the results related to at least one of textual content, image content, and audio content.
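Claims 1 to 9 describe the projection component: the query also runs against a second, more redundant dataset, each result becomes a word probability distribution, and first-dataset results are ranked by their average pairwise similarity to the second dataset's results. A minimal sketch under stated assumptions: results are plain strings, the measure is cosine similarity (claim 7 names a cosine distance; here higher similarity is read as higher redundancy), and the helper names `word_distribution`, `cosine`, and `project_and_rank` are hypothetical.

```python
import math
import re
from collections import Counter


def word_distribution(text):
    """Assumed unigram model: relative frequency of lowercase alphanumeric tokens."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    total = sum(counts.values())
    return {w: n / total for w, n in counts.items()} if total else {}


def cosine(p, q):
    """Cosine similarity between two sparse distributions (word -> probability)."""
    dot = sum(v * q.get(w, 0.0) for w, v in p.items())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


def project_and_rank(first_results, second_results):
    """Rerank first-dataset results by average similarity to second-dataset results."""
    second_dists = [word_distribution(t) for t in second_results]
    scored = []
    for doc in first_results:
        d = word_distribution(doc)
        sims = [cosine(d, s) for s in second_dists]
        avg = sum(sims) / len(sims) if sims else 0.0
        scored.append((avg, doc))
    # Highest average redundancy first; Python's sort stays stable with reverse=True.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored]


first = ["stock prices fell today", "everest height is 8849 meters"]
second = ["the height of everest is 8849 meters", "everest is 8849 meters tall"]
print(project_and_rank(first, second))
# ['everest height is 8849 meters', 'stock prices fell today']
```

The Everest result moves up because it shares vocabulary with the answers the redundant dataset agrees on; the unrelated result has zero similarity to every second-dataset result.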

11. A method of facilitating data retrieval, comprising:
    - receiving a query for processing by a search engine against a first dataset;
    - executing the query against the first dataset and a second dataset;
    - analyzing properties of results of the second dataset query against results of the first dataset query to determine information redundancy;
    - reranking the results of the first dataset query according to the information redundancy.
12. The method of claim 11, the first dataset having lower data redundancy than the second dataset.
13. The method of claim 11, the query executed against the first and second dataset substantially simultaneously.
14. The method of claim 11, the query executed against the second dataset only in response to the execution of the query against the first dataset returning a minimum number of results.
15. The method of claim 11, the properties analyzed by:
    - generating word probability distributions for each of the results; and
    - determining an average pairwise information redundancy value of the results of the second dataset with the results of the first dataset using a similarity measure.
16. The method of claim 15, the similarity measure being at least one of a cosine distance measure, a Jaccard coefficient measure, a weighted Jaccard coefficient measure, and a weighted mutual information measure.
17. The method of claim 11, employing a subset of the results of the second dataset query against the results of the first dataset query to determine information redundancy.
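Claim 16 lists candidate similarity measures for the average pairwise redundancy of claim 15. The sketch below covers three of them over word probability distributions stored as dicts. The weighted Jaccard shown is the common sum-of-minima over sum-of-maxima form, which is an assumption about what the claim intends; the weighted mutual information measure is not sketched. All function names are hypothetical.

```python
import math


def cosine(p, q):
    """Cosine similarity of two sparse distributions (word -> probability)."""
    dot = sum(v * q.get(w, 0.0) for w, v in p.items())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


def jaccard(p, q):
    """Jaccard coefficient of the two vocabularies: |A & B| / |A | B|."""
    a, b = set(p), set(q)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def weighted_jaccard(p, q):
    """Weighted Jaccard (assumed form): sum of per-word minima over sum of maxima."""
    words = set(p) | set(q)
    num = sum(min(p.get(w, 0.0), q.get(w, 0.0)) for w in words)
    den = sum(max(p.get(w, 0.0), q.get(w, 0.0)) for w in words)
    return num / den if den else 0.0


def average_redundancy(dist, others, measure=cosine):
    """Average pairwise similarity between one distribution and a list of others."""
    if not others:
        return 0.0
    return sum(measure(dist, o) for o in others) / len(others)


p = {"a": 0.5, "b": 0.5}
q = {"b": 0.5, "c": 0.5}
print(cosine(p, q), jaccard(p, q), weighted_jaccard(p, q))  # roughly 0.5, 1/3, 1/3
```

Passing the measure as a parameter keeps the averaging step of claim 15 independent of which measure from claim 16 is chosen.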

18. The method of claim 17, the subset of the results of the second dataset query is determined based upon at least one of selecting the first one hundred results, selecting results based upon the success of the returned document including more than one of the search terms, selecting results based upon the inclusion of at least two key search terms of multiple search terms, selecting results based upon including a string of search terms in the required sequence, selecting results based upon including the search terms within a required spatial parameter, selecting results based upon properties of at least one of image content and audio contained therein, and selecting results based upon at least one of the number and type of hyperlinks to other websites.
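Claim 18 lists ways to choose the subset of second-dataset results used for the redundancy comparison. The sketch below implements four of the text-based criteria: the first one hundred results, more than one search term, the terms in the required sequence, and the terms within a window of words. It assumes single-word search terms, and both the helper names and the word-window reading of "spatial parameter" are assumptions; the image, audio, and hyperlink criteria are not sketched.

```python
import re


def tokenize(text):
    """Lowercase alphanumeric tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def first_n(results, n=100):
    """Keep only the first n results (claim 18 names one hundred)."""
    return results[:n]


def has_multiple_terms(doc, terms):
    """True if the document contains more than one of the search terms."""
    words = set(tokenize(doc))
    return sum(1 for t in terms if t.lower() in words) > 1


def has_phrase(doc, terms):
    """True if the terms appear consecutively, in the given order."""
    tokens = tokenize(doc)
    seq = [t.lower() for t in terms]
    k = len(seq)
    return any(tokens[i:i + k] == seq for i in range(len(tokens) - k + 1))


def within_window(doc, terms, window):
    """True if all terms occur inside some run of `window` consecutive tokens."""
    tokens = tokenize(doc)
    wanted = {t.lower() for t in terms}
    return any(wanted <= set(tokens[i:i + window]) for i in range(len(tokens)))


def select_subset(results, terms, n=100):
    """One possible combination: the first n results containing 2+ search terms."""
    return [doc for doc in first_n(results, n) if has_multiple_terms(doc, terms)]


results = ["everest height in meters", "everest base camp", "height of everest"]
terms = ["everest", "height"]
print(select_subset(results, terms))
# ['everest height in meters', 'height of everest']
```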

19. A method of facilitating data retrieval, comprising:
    - processing a query against a plurality of documents;
    - measuring information redundancy of a returned document of a return set by determining an average pairwise information redundancy value between the returned document and the remaining documents of the return set; and
    - providing a ranked output of documents according to corresponding pairwise information redundancy values.
20. The method of claim 19, further comprising selecting the documents associated with the higher average pairwise information redundancy values for the ranked output.
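Claims 19 and 20 need no second dataset: each document in a single return set is scored by its average pairwise redundancy with the remaining documents, and the set is ranked by that score. A minimal sketch, assuming unigram word distributions and cosine similarity as the redundancy measure (claim 19 fixes neither), with a hypothetical `top_k` cut-off standing in for the selection of higher-scoring documents in claim 20:

```python
import math
import re
from collections import Counter


def word_distribution(text):
    """Maximum-likelihood unigram distribution over lowercase alphanumeric tokens."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    total = sum(counts.values())
    return {w: n / total for w, n in counts.items()} if total else {}


def cosine(p, q):
    """Cosine similarity of two sparse distributions."""
    dot = sum(v * q.get(w, 0.0) for w, v in p.items())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


def rank_by_redundancy(docs, top_k=None):
    """Rank docs by average pairwise similarity to the rest of the return set."""
    dists = [word_distribution(d) for d in docs]
    scores = []
    for i, d in enumerate(dists):
        sims = [cosine(d, o) for j, o in enumerate(dists) if j != i]
        scores.append(sum(sims) / len(sims) if sims else 0.0)
    order = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    ranked = [(docs[i], scores[i]) for i in order]
    # Claim 20 (assumed reading): keep only the documents with the higher averages.
    return ranked if top_k is None else ranked[:top_k]


docs = ["everest is 8849 meters", "everest is 8849 meters tall", "buy cheap shoes"]
for doc, score in rank_by_redundancy(docs):
    print(f"{score:.3f}  {doc}")
```

The two mutually consistent answers support each other and rise to the top, while the off-topic result, which shares no words with the rest of the set, scores zero and falls to the bottom.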

Current Assignee
Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee
Microsoft Corporation
Inventors
Dumais, Susan T., Brill, Eric D.

Granted Patent

US 7,152,057 B2
US Class Current

1/1
CPC Class Codes

G06F 16/3347   using vector based model

Y10S 707/99932   Access augmentation or opti...

Y10S 707/99935   Query augmenting and refini...