Content propagation for enhanced document retrieval
First Claim
1. A method providing computer-implemented content propagation for enhanced document retrieval, the method comprising:
identifying reference information directed to one or more documents, wherein the reference information identified from one or more sources of data, is independent from a data source comprising the one or more documents;
extracting metadata that is proximally located to the reference information, which is surrounding the reference information and is semantically or contextually related to the reference information;
calculating relevance between respective features of the metadata to content of associated ones of the one or more documents;
indexing associated portions of the metadata with the relevance of features from the respective portions along with relevance scores, into original content of the document, for each document of the one or more documents, wherein the indexing generates one or more enhanced documents;
analyzing the one or more enhanced documents to locate relevance information based on a search query;
ranking the one or more enhanced documents based on relevance scores;
communicating ranked results and snippet descriptions for the one or more enhanced documents, based on the search query;
wherein the one or more sources of data comprise a search query log, and wherein calculating relevance further comprises:
identifying search queries from the search query log, wherein the search queries have a relatively high frequency of occurrence (FOO) to search the data source;
determining article(s) selected by an end-user from search query results, the article(s) being from the data source; and
determining missing end-user selection(s), where a missing end-user selection is an article in the search query results that was not selected;
wherein determining missing end-user selection(s) further comprises clustering heterogeneous objects using inter-layer links to determine importance measurements for features of the heterogeneous objects, the heterogeneous object comprising a first cluster of similar queries and a second cluster of related documents, the similar queries having been identified in the search query log, the similar queries being associated search result(s) comprising the one or more documents, the related documents being identified in the search result(s) independent of whether individual ones of the related documents were selected by an end-user from the search results.
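The query-log steps recited in the claim (frequent queries, selected articles, missing selections) can be illustrated with a minimal sketch. It is not the patented implementation: the log layout (tuples of query, shown articles, clicked articles), the function name `mine_query_log`, and the `min_count` threshold standing in for "relatively high frequency of occurrence" are all assumptions made for illustration. The clustering of heterogeneous objects is not modeled here.

```python
from collections import Counter, defaultdict


def mine_query_log(log, min_count=2):
    """Summarize a search query log (hypothetical layout).

    log: iterable of (query, shown_articles, clicked_articles) tuples.
    min_count: assumed cutoff for a "relatively high" frequency of occurrence.
    Returns {query: {"selected": set, "missing": set}} for frequent queries.
    """
    log = list(log)
    # Frequency of occurrence (FOO) of each query string.
    foo = Counter(query for query, _, _ in log)
    frequent = {q for q, n in foo.items() if n >= min_count}

    shown = defaultdict(set)
    selected = defaultdict(set)
    for query, shown_articles, clicked_articles in log:
        if query in frequent:
            shown[query].update(shown_articles)
            selected[query].update(clicked_articles)

    # A "missing selection" here is an article that appeared in the results
    # of a frequent query but was never clicked (an assumed reading).
    return {
        q: {"selected": selected[q], "missing": shown[q] - selected[q]}
        for q in frequent
    }


example_log = [
    ("printer offline", ["KB1", "KB2", "KB3"], ["KB1"]),
    ("printer offline", ["KB1", "KB2", "KB3"], ["KB2"]),
    ("rare query", ["KB9"], ["KB9"]),
]
summary = mine_query_log(example_log, min_count=2)
```

With this example log, only "printer offline" passes the assumed threshold; KB1 and KB2 count as selected and KB3 as a missing selection.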
Abstract
Systems and methods providing computer-implemented content propagation for enhanced document retrieval are described. In one aspect, reference information directed to one or more documents is identified. The reference information is identified from one or more sources of data that are independent of a data source that includes the one or more documents. Metadata that is proximally located to the reference information is extracted from the one or more sources of data. Relevance between respective features of the metadata to content of associated ones of the one or more documents is calculated. For each document of the one or more documents, associated portions of the metadata are indexed with the relevance of features from the respective portions into original content of the document. The indexing generates one or more enhanced documents.
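The flow summarized in the abstract can be sketched in a few functions. This is a simplified illustration under stated assumptions, not the method of the patent: references are assumed to be literal document identifiers (such as "KB123") appearing in external text, "proximally located" metadata is approximated by a fixed window of surrounding words, and feature relevance is approximated by the relative frequency of each context term in the document. All function names and the `window` parameter are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def extract_context(source_text, doc_id, window=8):
    """Return the words surrounding each mention of doc_id in source_text."""
    words = source_text.split()
    contexts = []
    for i, word in enumerate(words):
        if doc_id in word:
            before = words[max(0, i - window):i]
            after = words[i + 1:i + 1 + window]
            contexts.append(" ".join(before + after))
    return contexts


def feature_relevance(context, doc_text):
    """Score each context term by its relative frequency in the document."""
    doc_counts = Counter(tokenize(doc_text))
    total = sum(doc_counts.values()) or 1
    return {term: doc_counts[term] / total for term in set(tokenize(context))}


def build_enhanced_document(doc_id, doc_text, sources, window=8):
    """Attach propagated metadata and per-term relevance to the original content."""
    propagated = []
    for source in sources:
        for context in extract_context(source, doc_id, window):
            propagated.append(
                {"text": context, "relevance": feature_relevance(context, doc_text)}
            )
    return {"id": doc_id, "content": doc_text, "propagated": propagated}


doc = "Resolve printer spooler errors by restarting the spooler service"
sources = [
    "I fixed my offline printer after reading KB123 about the spooler",
    "unrelated post",
]
enhanced = build_enhanced_document("KB123", doc, sources, window=3)
```

In this sketch the enhanced document keeps the original content untouched and stores the propagated context next to it, so a search index could weight terms such as "offline" that occur only in the external source. How the relevance scores are actually combined at indexing time is left open.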
45 Claims
1. A method providing computer-implemented content propagation for enhanced document retrieval, the method comprising:
identifying reference information directed to one or more documents, wherein the reference information identified from one or more sources of data, is independent from a data source comprising the one or more documents;
extracting metadata that is proximally located to the reference information, which is surrounding the reference information and is semantically or contextually related to the reference information;
calculating relevance between respective features of the metadata to content of associated ones of the one or more documents;
indexing associated portions of the metadata with the relevance of features from the respective portions along with relevance scores, into original content of the document, for each document of the one or more documents, wherein the indexing generates one or more enhanced documents;
analyzing the one or more enhanced documents to locate relevance information based on a search query;
ranking the one or more enhanced documents based on relevance scores;
communicating ranked results and snippet descriptions for the one or more enhanced documents, based on the search query;
wherein the one or more sources of data comprise a search query log, and wherein calculating relevance further comprises:
identifying search queries from the search query log, wherein the search queries have a relatively high frequency of occurrence (FOO) to search the data source;
determining article(s) selected by an end-user from search query results, the article(s) being from the data source; and
determining missing end-user selection(s), where a missing end-user selection is an article in the search query results that was not selected;
wherein determining missing end-user selection(s) further comprises clustering heterogeneous objects using inter-layer links to determine importance measurements for features of the heterogeneous objects, the heterogeneous object comprising a first cluster of similar queries and a second cluster of related documents, the similar queries having been identified in the search query log, the similar queries being associated search result(s) comprising the one or more documents, the related documents being identified in the search result(s) independent of whether individual ones of the related documents were selected by an end-user from the search results.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
13. A computer-readable storage medium comprising computer-executable instructions providing content propagation for enhanced document retrieval, the computer-executable instructions comprising instructions for:
identifying reference information directed to one or more documents, wherein the reference information identified from one or more sources of data, is independent from a data source comprising the one or more documents;
extracting metadata that is proximally located to the reference information, which is surrounding the reference information and is semantically or contextually related to the reference information;
calculating relevance between respective features of the metadata to content of associated ones of the one or more documents;
indexing associated portions of the metadata with the relevance of features from the respective portions along with relevance scores, into original content of the document, for each document of the one or more documents, wherein the indexing generates one or more enhanced documents;
analyzing one or more enhanced documents to locate relevance information based on a search query;
ranking one or more enhanced document retrieval based on relevance scores;
communicating ranked results and snippet descriptions for enhanced document, retrieval, based on the search query;
wherein the one or more sources of data comprise a search query log, and wherein the instructions for calculating relevance further comprise instructions for:
identifying search queries from the search query log, wherein the search queries have a relatively high frequency of occurrence (FOO) to search the data source;
determining article(s) selected by an end-user from search query results, the article(s) being from the data source; and
determining missing end-user selection(s), where a missing end-user selection is an article in the search query results that was not selected;
wherein the instructions for determining missing end-user selection(s) further comprise instructions for clustering heterogeneous objects using inter-layer links to determine importance measurements for features of the heterogeneous objects, the heterogeneous object comprising a first cluster of similar queries and a second cluster of related documents, the similar queries having been identified in the search query log, the similar queries being associated search result(s) comprising the one or more documents, the related documents being identified in the search result(s) independent of whether individual ones of the related documents were selected by an end-user from the search results.
Dependent claims: 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24
25. A computing device providing content propagation for enhanced document retrieval, the computing device comprising:
a processor; and
a memory coupled to the processor, the memory comprising computer-program instructions executable by the processor for:
identifying reference information directed to one or more documents, wherein the reference information identified from one or more sources of data, is independent from a data source comprising the one or more documents;
extracting metadata that is proximally located to the reference information, which is surrounding the reference information and is semantically or contextually related to the reference information;
calculating relevance between respective features of the metadata to content of associated ones of the one or more documents;
indexing associated portions of the metadata with the relevance of features from the respective portions along with relevance scores, into original content of the document, for each document of the one or more documents, wherein the indexing generates one or more enhanced documents;
analyzing one or more enhanced documents to locate relevance information based on a search query;
ranking one or more enhanced document retrieval based on relevance scores;
communicating ranked results and snippet descriptions for enhanced document, retrieval, based on the search query;
wherein the one or more sources of data comprise a search query log, and wherein the instructions for calculating relevance further comprise instructions for:
identifying search queries from the search query log, wherein the search queries have a relatively high frequency of occurrence (FOO) to search the data source;
determining article(s) selected by an end-user from search query results, the article(s) being from the data source; and
determining missing end-user selection(s), where a missing end-user selection is an article in the search query results that was not selected;
wherein the instructions for determining missing end-user selection(s) further comprise instructions for clustering heterogeneous objects using inter-layer links to determine importance measurements for features of the heterogeneous objects, the heterogeneous object comprising a first cluster of similar queries and a second cluster of related documents, the similar queries having been identified in the search query log, the similar queries being associated search result(s) comprising the one or more documents, the related documents being identified in the search result(s) independent of whether individual ones of the related documents were selected by an end-user from the search results.
Dependent claims: 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36
37. A computing device providing content propagation for enhanced document retrieval, the computing device comprising:
a processor; and
a memory coupled to the processor, the memory comprising computer-program instructions executable by the processor for:
identifying means to identify reference information directed to one or more documents, wherein the reference information identified from one or more sources of data, is independent from a data source comprising the one or more documents;
extracting metadata that is proximally located to the reference information, which is surrounding the reference information and is semantically or contextually related to the reference information;
calculating relevance between respective features of the metadata to content of associated ones of the one or more documents;
indexing associated portions of the metadata with the relevance of features from the respective portions along with relevance scores, into original content of the document, for each document of the one or more documents;
wherein the indexing generates one or more enhanced documents;
analyzing means to analyze one or more enhanced documents to locate relevance information based on a search query;
ranking means to rank one or more enhanced document retrieval based on relevance scores; and
communicating means to communicate ranked results and snippet descriptions for enhanced document, retrieval, based on the search query;
wherein the calculating means further comprise clustering means to cluster heterogeneous objects using inter-layer links to determine importance measurements for features of the heterogeneous objects, the heterogeneous object comprising a first cluster of similar queries and a second cluster of related documents, the similar queries having been identified in the search query log, the similar queries being associated search result(s) comprising the one or more documents, the related documents being identified in the search result(s) independent of whether individual ones of the related documents were selected by an end-user from the search results.
Dependent claims: 38, 39, 40, 41, 42, 43, 44, 45
Specification