Utilizing information redundancy to improve text searches
Abstract
Architecture for improving text searches using information redundancy. A search component is coupled with an analysis component to rerank documents returned in a search according to redundancy values. Each returned document is used to develop a corresponding word probability distribution that is further used to rerank the returned documents according to the associated redundancy values. In another aspect thereof, the query component is coupled with a projection component to project answer redundancy from one document search to another. This includes obtaining the benefit of considerable answer redundancy from a second data source by projecting the success of the search of the second data source against a first data source.
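The reranking step described in the abstract can be pictured with a minimal sketch. It assumes a unigram word probability distribution per document and uses cosine similarity between distributions as the redundancy measure; the function names (`word_distribution`, `cosine_similarity`, `rerank_by_redundancy`), the whitespace tokenization, and the choice to rank higher-redundancy documents first are illustrative assumptions, not details taken from the patent.

```python
import math
from collections import Counter


def word_distribution(text):
    """Unigram distribution (assumed model): word -> relative frequency."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def cosine_similarity(p, q):
    """Cosine of the angle between two sparse word vectors."""
    dot = sum(v * q.get(w, 0.0) for w, v in p.items())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    if norm_p == 0.0 or norm_q == 0.0:
        return 0.0
    return dot / (norm_p * norm_q)


def rerank_by_redundancy(documents):
    """Score each document by its average pairwise similarity to the
    other returned documents, then sort with the highest score first."""
    dists = [word_distribution(d) for d in documents]
    scored = []
    for i, p in enumerate(dists):
        sims = [cosine_similarity(p, q) for j, q in enumerate(dists) if j != i]
        avg = sum(sims) / len(sims) if sims else 0.0
        scored.append((avg, i))
    # Ties keep the original retrieval order.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [(documents[i], avg) for avg, i in scored]


if __name__ == "__main__":
    returned = [
        "paris is the capital of france",
        "the capital of france is paris",
        "bananas are yellow",
    ]
    for doc, score in rerank_by_redundancy(returned):
        print(f"{score:.2f}  {doc}")
```

In this toy return set the two documents that restate the same answer reinforce each other and move ahead of the unrelated one, which is the intuition behind ranking by redundancy.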
61 Claims
1. A machine implemented system that facilitates data retrieval, comprising:
- a query component that receives a query to a first dataset, and
- a projection component that executes the query across a second dataset, and analyzes properties of results of the query on the first dataset and results of the second dataset to generate a refined version of the query to run on the first dataset to facilitate responding to the query across the first dataset, the projection component analyzes the properties of the results by determining a similarity measure that is a cosine distance for each result.
Dependent claims: 2-19

20. A machine implemented system that facilitates data retrieval, comprising:
- a query component that receives a query to a first dataset; and
- a projection component that executes the query across a second dataset, and generates a result set that is employed in connection with the first dataset to facilitate responding to the query, the projection component analyzes the properties of the result set by determining a similarity measure that is a cosine distance for each result.
Dependent claims: 21-36

37. A machine implemented system that facilitates data retrieval, comprising:
- a search component that executes a query and returns a dataset; and
- an analysis component that determines relevance of a subset of the returned dataset as a function of similarity properties thereof with respect to the entire returned dataset, the similarity properties determined according to a similarity measure that is a cosine distance measure.
Dependent claims: 38-43

44. A machine implemented method of facilitating data retrieval, comprising:
- receiving a query for processing by a search engine against a first dataset;
- executing the query against the first dataset and a second dataset;
- analyzing properties of results of the first dataset query and of results of the second dataset query to determine a refined version of the query by determining a similarity measure that is a cosine distance for each result;
- transmitting the refined version of the query to the search engine; and
- reranking the results of the first dataset query according to the refined version of the query.
Dependent claims: 45-53

54. A machine implemented method of facilitating data retrieval, comprising:
- receiving a query for processing by a search engine against a first dataset;
- executing the query against the first dataset and a second dataset, the query against the second dataset used to characterize likely properties of a good answer to the query;
- generating a result set from the query of the second dataset by determining the average pairwise information redundancy between results of the first dataset query and the second dataset query, the average pairwise information redundancy is based at least upon a cosine distance measurement for each result;
- applying the result set in a subsequent query of the first dataset; and
- providing a ranked output according to the result set query.
Dependent claims: 55-57

58. A machine implemented method of facilitating data retrieval, comprising:
- processing a query against a plurality of documents;
- measuring information redundancy of a returned document of a return set by determining an average pairwise information redundancy value between the returned document and the remaining documents of the return set, the average pairwise information redundancy is based at least upon a cosine distance measurement for each document; and
- providing a ranked output of documents according to corresponding pairwise information redundancy values.
Dependent claims: 59

60. A machine implemented system that facilitates data retrieval, comprising:
- means for processing a query against a plurality of documents;
- means for measuring information redundancy of a returned document of a return set by determining an average pairwise information redundancy value between the returned document and the remaining documents of the return set, the average pairwise information redundancy is based at least upon a cosine distance measurement for each document; and
- means for providing a ranked output of documents according to corresponding pairwise information redundancy values.

61. A machine implemented system that facilitates data retrieval, comprising:
- means for receiving a query for processing by a search engine against a first dataset;
- means for executing the query against the first dataset and a second dataset, the query against the second dataset used to characterize likely properties of a good answer to the query;
- means for generating a result set from the query of the second dataset by determining the average pairwise information redundancy between results of the first dataset query and the second dataset query, the average pairwise information redundancy is based at least upon a cosine distance measurement for each result;
- means for applying the result set in a subsequent query of the first dataset; and
- means for providing a ranked output according to the result set query.

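The projection idea recited in claims 54 and 61 (using results from a second dataset to characterize a good answer, then ranking first-dataset results accordingly) can be illustrated with a rough sketch. It assumes raw term-frequency vectors, cosine similarity as the redundancy measure, and results that are already available as plain strings. The helper names (`term_vector`, `project_and_rerank`) are hypothetical, and the sketch only reranks first-dataset results; it does not attempt the query refinement or subsequent-query steps of the claims.

```python
import math
from collections import Counter


def term_vector(text):
    """Raw term-frequency vector (assumed representation)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two term-frequency vectors."""
    # Counter returns 0 for missing terms without inserting them.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def project_and_rerank(first_results, second_results):
    """Score each first-dataset result by its average similarity to the
    second-dataset results, then sort with the highest score first."""
    second_vecs = [term_vector(t) for t in second_results]
    scored = []
    for text in first_results:
        vec = term_vector(text)
        sims = [cosine_similarity(vec, s) for s in second_vecs]
        score = sum(sims) / len(sims) if sims else 0.0
        scored.append((text, score))
    # sorted() is stable, so ties keep the original retrieval order.
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    first = [
        "stock prices fell on monday",
        "the moon orbits the earth",
    ]
    second = [
        "the moon orbits earth every month",
        "earth has one moon",
    ]
    for text, score in project_and_rerank(first, second):
        print(f"{score:.2f}  {text}")
```

Here the second dataset stands in for a larger, more redundant source: the first-dataset result that overlaps most with what the second source keeps repeating rises to the top.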
Specification