Content propagation for enhanced document retrieval

US 7,305,389 B2
Filed: 04/15/2004
Issued: 12/04/2007
Est. Priority Date: 04/15/2004
Status: Expired due to Fees

Abstract

Systems and methods providing computer-implemented content propagation for enhanced document retrieval are described. In one aspect, reference information directed to one or more documents is identified. The reference information is identified from one or more sources of data that are independent of a data source that includes the one or more documents. Metadata that is proximally located to the reference information is extracted from the one or more sources of data. Relevance between respective features of the metadata to content of associated ones of the one or more documents is calculated. For each document of the one or more documents, associated portions of the metadata are indexed with the relevance of features from the respective portions into original content of the document. The indexing generates one or more enhanced documents.
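The propagation step described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the `KB<digits>` reference pattern, the fixed word window used as "proximally located" metadata, and the function names `extract_metadata` and `build_enhanced_documents` are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# Hypothetical reference pattern: document IDs such as "KB100" (an assumption).
REF_PATTERN = re.compile(r"\bKB\d+\b")

def extract_metadata(source_text, window=5):
    """Return {doc_id: [context strings]} for each reference found in source_text."""
    tokens = source_text.split()
    found = defaultdict(list)
    for i, tok in enumerate(tokens):
        match = REF_PATTERN.search(tok)
        if match:
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            # Keep the words surrounding the reference, excluding the reference itself.
            context = tokens[lo:i] + tokens[i + 1:hi]
            found[match.group()].append(" ".join(context))
    return dict(found)

def build_enhanced_documents(documents, sources, window=5):
    """Attach propagated metadata to each document; original content is left unchanged."""
    enhanced = {doc_id: {"content": text, "propagated": []}
                for doc_id, text in documents.items()}
    for source_text in sources:
        for doc_id, snippets in extract_metadata(source_text, window).items():
            if doc_id in enhanced:
                enhanced[doc_id]["propagated"].extend(snippets)
    return enhanced

# Example: a newsgroup-style posting that refers to a knowledge base article.
documents = {"KB100": "How to reset the print spooler service."}
sources = ["printer queue stuck after update see KB100 for the fix"]
enhanced = build_enhanced_documents(documents, sources, window=3)
```

In this sketch the enhanced document keeps its original content and gains vocabulary ("queue", "stuck", "update") that end users wrote when referring to it, which is the effect the abstract attributes to indexing metadata into the document.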

128 Citations

45 Claims

1. A method providing computer-implemented content propagation for enhanced document retrieval, the method comprising:
- identifying reference information directed to one or more documents, wherein the reference information identified from one or more sources of data, is independent from a data source comprising the one or more documents;
  
  extracting metadata that is proximally located to the reference information, which is surrounding the reference information and is semantically or contextually related to the reference information;
  
  calculating relevance between respective features of the metadata to content of associated ones of the one or more documents;
  
  indexing associated portions of the metadata with the relevance of features from the respective portions along with relevance scores, into original content of the document, for each document of the one or more documents, wherein the indexing generates one or more enhanced documents;
  
  analyzing the one or more enhanced documents to locate relevance information based on a search query;
  
  ranking the one or more enhanced documents based on relevance scores;
  
  communicating ranked results and snippet descriptions for the one or more enhanced documents, based on the search query;
  
  wherein the one or more sources of data comprise a search query log, and wherein calculating relevance further comprises;
  
  identifying search queries from the search query log, wherein the search queries have a relatively high frequency of occurrence (FOO) to search the data source;
  
  determining article(s) selected by an end-user from search query results, the article(s) being from the data source; and
  
  determining missing end-user selection(s), where a missing end-user selection is an article in the search query results that was not selected;
  
  wherein determining missing end-user selection(s) further comprises clustering heterogeneous objects using inter-layer links to determine importance measurements for features of the heterogeneous objects, the heterogeneous object comprising a first cluster of similar queries and a second cluster of related documents, the similar queries having been identified in the search query log, the similar queries being associated search result(s) comprising the one or more documents, the related documents being identified in the search result(s) independent of whether individual ones of the related documents were selected by an end-user from the search results.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12):
  - 2. A method as recited in claim 1, wherein the reference information comprises at least one of a link or substantially unique document ID associated with a document of the one or more documents.
  - 3. A method as recited in claim 1, wherein the one or more documents comprise at least one of knowledge base article(s), product help, task, or developer data.
  - 4. A method as recited in claim 1, wherein the one or more sources of data comprise at least one of service request(s), newsgroup posting(s), or search query log(s).
  - 5. A method as recited in claim 1, wherein the metadata comprises at least one of semantically or contextually related to associated ones of the one or more documents.
  - 6. A method as recited in claim 1, wherein the metadata comprises at least one of a title of a document, product problem context, or product problem resolution information.
  - 7. A method as recited in claim 1, wherein for each enhanced document of the one or more enhanced documents, there is a corresponding original document from which the enhanced document was generated.
  - 8. A method as recited in claim 1, wherein calculating the relevance is based on how many times a particular document of the one or more documents is identified within its context in the metadata.
  - 9. A method as recited in claim 1, wherein the metadata comprises at least one of article title(s), product problem context, or product problem resolution information, and wherein calculating relevance further comprises weighting the article title(s) or product problem context to indicate a greater relevance than any product problem resolution information.
  - 10. A method as recited in claim 1, wherein calculating relevance further comprises assigning greater relevance to feature(s) of the metadata that occur in content of the data source with greater frequency as compared to the frequency of occurrence of other metadata features in the content.
  - 11. A method as recited in claim 1, wherein calculating relevance further comprises assigning greater weight to feature(s) of the metadata found in a document of the one or more documents as a function of an age of the document.
  - 12. A method as recited in claim 1, wherein the features are represented with respective nodes in the first and second clusters, and wherein the importance measurement(s) for each of the nodes is based on a similarity function that measures a distance between objects in the first and second clusters.
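
The weighting factors named in claims 9 through 11 (metadata field, frequency of occurrence, and document age) could be combined as in the sketch below. The claims do not give numeric weights, a combining formula, or the direction of the age effect, so the values in `FIELD_WEIGHTS`, the logarithmic damping, and the half-life decay that favors newer documents are all assumptions.

```python
import math

# Assumed field weights: title and problem context count more than resolution text.
FIELD_WEIGHTS = {"title": 2.0, "context": 2.0, "resolution": 1.0}

def feature_relevance(field, frequency, doc_age_days, half_life_days=365.0):
    """Combine field weight, frequency of occurrence, and document age into one score."""
    field_weight = FIELD_WEIGHTS.get(field, 1.0)
    # Log damping keeps very frequent features from dominating (an assumption).
    frequency_weight = 1.0 + math.log(1 + frequency)
    # Assumed direction: newer documents weigh more, halving every half_life_days.
    age_weight = 0.5 ** (doc_age_days / half_life_days)
    return field_weight * frequency_weight * age_weight
```

Multiplying the three factors is only one possible design; an additive or learned combination would fit the claim language equally well.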

13. A computer-readable storage medium comprising computer-executable instructions providing content propagation for enhanced document retrieval, the computer-executable instructions comprising instructions for:
- identifying reference information directed to one or more documents, wherein the reference information identified from one or more sources of data, is independent from a data source comprising the one or more documents;
  
  extracting metadata that is proximally located to the reference information, which is surrounding the reference information and is semantically or contextually related to the reference information;
  
  calculating relevance between respective features of the metadata to content of associated ones of the one or more documents;
  
  indexing associated portions of the metadata with the relevance of features from the respective portions along with relevance scores, into original content of the document, for each document of the one or more documents, wherein the indexing generates one or more enhanced documents;
  
  analyzing one or more enhanced documents to locate relevance information based on a search query;
  
  ranking one or more enhanced document retrieval based on relevance scores;
  
  communicating ranked results and snippet descriptions for enhanced document, retrieval, based on the search query;
  
  wherein the one or more sources of data comprise a search query log, and wherein the instructions for calculating relevance further comprise instructions for;
  
  identifying search queries from the search query log, wherein the search queries have a relatively high frequency of occurrence (FOO) to search the data source;
  
  determining article(s) selected by an end-user from search query results, the article(s) being from the data source; and
  
  determining missing end-user selection(s), where a missing end-user selection is an article in the search query results that was not selected;
  
  wherein the instructions for determining missing end-user selection(s) further comprise instructions for clustering heterogeneous objects using inter-layer links to determine importance measurements for features of the heterogeneous objects, the heterogeneous object comprising a first cluster of similar queries and a second cluster of related documents, the similar queries having been identified in the search query log, the similar queries being associated search result(s) comprising the one or more documents, the related documents being identified in the search result(s) independent of whether individual ones of the related documents were selected by an end-user from the search results.
- Dependent claims (14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24):
  - 14. The computer-readable storage medium of claim 13, wherein the reference information comprises at least one of a link or substantially unique document ID associated with a document of the one or more documents.
  - 15. The computer-readable storage medium of claim 13, wherein the one or more documents comprises at least one of knowledge base article(s), product help, task, or developer data.
  - 16. The computer-readable storage medium of claim 13, wherein the one or more sources of data comprise at least one of service request(s), newsgroup posting(s), or search query log(s).
  - 17. The computer-readable storage medium of claim 13, wherein the metadata is semantically or contextually related to associated ones of the one or more documents.
  - 18. The computer-readable storage medium of claim 13, wherein the metadata comprises at least one of a title of a document, product problem context, or product problem resolution information.
  - 19. The computer-readable storage medium of claim 13, wherein for each enhanced document of the one or more enhanced documents, there is a corresponding original document from which the enhanced document was generated.
  - 20. The computer-readable storage medium of claim 13, wherein calculating the relevance is based on how many times a particular document of the one or more documents is identified within its context in the metadata.
  - 21. The computer-readable storage medium of claim 13, wherein the metadata comprises at least one of article title(s), product problem context, or product problem resolution information, and wherein the instructions for calculating relevance further comprise instructions for weighting the article title(s) or product problem context to indicate a greater relevance than any product problem resolution information.
  - 22. The computer-readable storage medium of claim 13, wherein the instructions for calculating relevance further comprise instructions for assigning greater relevance to feature(s) of the metadata that occur in content of the data source with greater frequency as compared to the frequency of occurrence of other metadata features in the content.
  - 23. The computer-readable storage medium of claim 13, wherein the instructions for calculating relevance further comprise instructions for assigning greater weight to feature(s) of the metadata found in a document of the one or more documents as a function of an age of the document.
  - 24. The computer-readable storage medium of claim 13, wherein the features are represented with respective nodes in the first and second clusters, and wherein the importance measurement(s) for each of the nodes is based on a similarity function that measures a distance between objects in the first and second clusters.
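
The analyzing, ranking, and communicating steps shared by the independent claims can be sketched as a simple scored search over enhanced documents. This is an illustrative assumption rather than the claimed implementation: the term-overlap scoring, the `(text, relevance)` layout of propagated metadata, and the use of the leading characters of the original content as a snippet are all hypothetical.

```python
import re

def tokenize(text):
    """Lowercase word tokens with punctuation stripped."""
    return re.findall(r"\w+", text.lower())

def score(enhanced_doc, query_terms):
    """Term matches in original content plus relevance-weighted matches in propagated metadata."""
    content_terms = set(tokenize(enhanced_doc["content"]))
    total = float(sum(1 for t in query_terms if t in content_terms))
    for text, relevance in enhanced_doc["propagated"]:
        meta_terms = set(tokenize(text))
        total += relevance * sum(1 for t in query_terms if t in meta_terms)
    return total

def search(enhanced_docs, query, snippet_len=60):
    """Return (doc_id, score, snippet) tuples, highest score first; zero scores are dropped."""
    query_terms = tokenize(query)
    results = []
    for doc_id, doc in enhanced_docs.items():
        s = score(doc, query_terms)
        if s > 0:
            results.append((doc_id, s, doc["content"][:snippet_len]))
    return sorted(results, key=lambda r: r[1], reverse=True)

enhanced_docs = {
    "KB100": {"content": "How to reset the print spooler service.",
              "propagated": [("printer queue stuck after update", 1.5)]},
    "KB200": {"content": "Configuring a network printer.", "propagated": []},
}
results = search(enhanced_docs, "printer queue stuck")
```

Here KB100 ranks first even though none of the query terms occur in its original content, because the propagated metadata supplies them.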

25. A computing device providing content propagation for enhanced document retrieval, the computing device comprising:
- a processor; and
  
  a memory coupled to the processor, the memory comprising computer-program instructions executable by the processor for;
  
  identifying reference information directed to one or more documents, wherein the reference information identified from one or more sources of data, is independent from a data source comprising the one or more documents;
  
  extracting metadata that is proximally located to the reference information, which is surrounding the reference information and is semantically or contextually related to the reference information;
  
  calculating relevance between respective features of the metadata to content of associated ones of the one or more documents;
  
  indexing associated portions of the metadata with the relevance of features from the respective portions along with relevance scores, into original content of the document, for each document of the one or more documents, wherein the indexing generates one or more enhanced documents;
  
  analyzing one or more enhanced documents to locate relevance information based on a search query;
  
  ranking one or more enhanced document retrieval based on relevance scores;
  
  communicating ranked results and snippet descriptions for enhanced document, retrieval, based on the search query;
  
  wherein the one or more sources of data comprise a search query log, and wherein the instructions for calculating relevance further comprise instructions for;
  
  identifying search queries from the search query log, wherein the search queries have a relatively high frequency of occurrence (FOO) to search the data source;
  
  determining article(s) selected by an end-user from search query results, the article(s) being from the data source; and
  
  determining missing end-user selection(s), where a missing end-user selection is an article in the search query results that was not selected;
  
  wherein the instructions for determining missing end-user selection(s) further comprise instructions for clustering heterogeneous objects using inter-layer links to determine importance measurements for features of the heterogeneous objects, the heterogeneous object comprising a first cluster of similar queries and a second cluster of related documents, the similar queries having been identified in the search query log, the similar queries being associated search result(s) comprising the one or more documents, the related documents being identified in the search result(s) independent of whether individual ones of the related documents were selected by an end-user from the search results.
- Dependent claims (26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36):
  - 26. The computing device of claim 25, wherein the reference information comprises at least one of a link or substantially unique document ID associated with a document of the one or more documents.
  - 27. The computing device of claim 25, wherein the one or more documents comprise at least one of knowledge base article(s), product help, task, or developer data.
  - 28. The computing device of claim 25, wherein the one or more sources of data comprise at least one of service request(s), newsgroup posting(s), or search query log(s).
  - 29. The computing device of claim 25, wherein the metadata is at least one of semantically or contextually related to associated ones of the one or more documents.
  - 30. The computing device of claim 25, wherein the metadata comprises at least one of a title of a document, product problem context, or product problem resolution information.
  - 31. The computing device of claim 25, wherein for each enhanced document of the one or more enhanced documents, there is a corresponding original document from which the enhanced document was generated.
  - 32. The computing device of claim 25, wherein calculating the relevance is based on how many times a particular document of the one or more documents is identified within its context in the metadata.
  - 33. The computing device of claim 25, wherein the metadata comprises at least one of article title(s), product problem context, or product problem resolution information, and wherein the instructions for calculating relevance further comprise instructions for weighting the article title(s) or product problem context to indicate a greater relevance than any product problem resolution information.
  - 34. The computing device of claim 25, wherein the instructions for calculating relevance further comprise instructions for assigning greater relevance to feature(s) of the metadata that occur in content of the data source with greater frequency as compared to the frequency of occurrence of other metadata features in the content.
  - 35. The computing device of claim 25, wherein the instructions for calculating relevance further comprise instructions for assigning greater weight to feature(s) of the metadata found in a document of the one or more documents as a function of an age of the document.
  - 36. The computing device of claim 25, wherein the features are represented with respective nodes in the first and second clusters, and wherein the importance measurement(s) for each of the nodes is based on a similarity function that measures a distance between objects in the first and second clusters.
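
Claims 12, 24, and 36 base each node's importance measurement on a similarity function between objects in the query cluster and the document cluster, without naming the function. The sketch below swaps in cosine similarity over sparse feature vectors as one plausible choice; the averaging in `node_importance` and both function names are assumptions, not the patent's method.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two sparse feature vectors given as {feature: weight} dicts."""
    dot = sum(w * b.get(f, 0.0) for f, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def node_importance(node, other_cluster):
    """Average similarity of one node to every object in the other cluster (assumed measure)."""
    if not other_cluster:
        return 0.0
    return sum(cosine_similarity(node, obj) for obj in other_cluster) / len(other_cluster)

# A query node compared against a two-document cluster.
query_node = {"printer": 1.0}
document_cluster = [{"printer": 1.0}, {"network": 1.0}]
importance = node_importance(query_node, document_cluster)
```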

37. A computing device providing content propagation for enhanced document retrieval, the computing device comprising:
- a processor; and
  
  a memory coupled to the processor, the memory comprising computer-program instructions executable by the processor for;
  
  identifying means to identify reference information directed to one or more documents, wherein the reference information identified from one or more sources of data, is independent from a data source comprising the one or more documents;
  
  extracting metadata that is proximally located to the reference information, which is surrounding the reference information and is semantically or contextually related to the reference information;
  
  calculating relevance between respective features of the metadata to content of associated ones of the one or more documents;
  
  indexing associated portions of the metadata with the relevance of features from the respective portions along with relevance scores, into original content of the document, for each document of the one or more documents;
  
  wherein the indexing generates one or more enhanced documents;
  
  analyzing means to analyze one or more enhanced documents to locate relevance information based on a search query;
  
  ranking means to rank one or more enhanced document retrieval based on relevance scores; and
  
  communicating means to communicate ranked results and snippet descriptions for enhanced document, retrieval, based on the search query;
  
  wherein the calculating means further comprise clustering means to cluster heterogeneous objects using inter-layer links to determine importance measurements for features of the heterogeneous objects, the heterogeneous object comprising a first cluster of similar queries and a second cluster of related documents, the similar queries having been identified in the search query log, the similar queries being associated search result(s) comprising the one or more documents, the related documents being identified in the search result(s) independent of whether individual ones of the related documents were selected by an end-user from the search results.
- Dependent claims (38, 39, 40, 41, 42, 43, 44, 45):
  - 38. The computing device of claim 37, wherein the reference information comprises at least one of a link or substantially unique document ID associated with a document of the one or more documents.
  - 39. The computing device of claim 37, wherein the one or more documents comprise at least one of knowledge base article(s), product help, task, or developer data.
  - 40. The computing device of claim 37, wherein the one or more sources of data comprise at least one of service request(s), newsgroup posting(s), or search query log(s).
  - 41. The computing device of claim 37, wherein the metadata is semantically or contextually related to associated ones of the one or more documents.
  - 42. The computing device of claim 37, wherein the metadata comprises at least one of article title(s), product problem context, or product problem resolution information, and wherein the calculating means to calculate relevance further comprise weighting means to weight the article title(s) or product problem context to indicate a greater relevance than any product problem resolution information.
  - 43. The computing device of claim 37, wherein the calculating means to calculate relevance further comprise assigning means to assign greater relevance to feature(s) of the metadata that occur in content of the data source with greater frequency as compared to the frequency of occurrence of other metadata features in the content.
  - 44. The computing device of claim 37, wherein the calculating means to calculate relevance further comprise assigning means to assign greater weight to feature(s) of the metadata found in a document of the one or more documents as a function of an age of the document.
  - 45. The computing device of claim 37, wherein the one or more sources of data comprise a search query log, and wherein the calculating means to calculate relevance further comprise:
    - identifying means to identify search queries from the search query log, wherein the search queries have a relatively high frequency of occurrence (FOO) to search the data source;
      
      determining means to determine article(s) selected by an end-user from search query results, the article(s) being from the data source; and
      
      calculating means to calculate missing end-user selection(s), where a missing end-user selection is an article in the search query results that was not selected.
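
The query-log steps recited in claim 45 (and in the independent claims 1, 13, and 25) can be sketched as follows. The log record layout `(query, results_shown, results_clicked)`, the `min_count` threshold standing in for "relatively high frequency of occurrence", and the function name `missing_selections` are assumptions; the clustering step is not modeled here.

```python
from collections import Counter

def missing_selections(query_log, min_count=2):
    """For each frequent query, split the shown articles into selected and missing selections.

    query_log is a list of (query, results_shown, results_clicked) records.
    """
    counts = Counter(query for query, _, _ in query_log)
    shown, clicked = {}, {}
    for query, results, clicks in query_log:
        # Only queries that occur at least min_count times are considered frequent.
        if counts[query] >= min_count:
            shown.setdefault(query, set()).update(results)
            clicked.setdefault(query, set()).update(clicks)
    return {q: {"selected": clicked[q], "missing": shown[q] - clicked[q]} for q in shown}

query_log = [
    ("printer offline", ["KB100", "KB200"], ["KB100"]),
    ("printer offline", ["KB100", "KB300"], []),
    ("rare query", ["KB400"], []),
]
report = missing_selections(query_log)
```

In this example KB200 and KB300 were returned for a frequent query but never selected, so they are the missing end-user selections in the sense of the claim.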

Current Assignee
Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee
Microsoft Corporation
Inventors
Ma, Wei-Ying; Chen, Zheng; Zhang, Benyu; Zeng, Hua-Jun; Cook, Daniel B.; Hon, Hsiao-Wuen; Hirschler, Gabor; Samuelson, Kurt; Fries, Karen
Primary Examiner(s)
FLEURANTIN, JEAN B

Application Number

US10/826,161
Publication Number

US 20050234952A1
Time in Patent Office

1,328 Days
Field of Search

707/1-10, 707/100-104.1, 707/200-206, 715/500, 715/514-516, 715/811-819, 715/854, 709/217, 709/219
US Class Current

707/721
CPC Class Codes

G06F 16/328   Management therefor

G06F 16/38   Retrieval characterised by ...

G06F 16/951   Indexing; Web crawling tech...

G06F 16/9538   Presentation of query results

Y10S 707/99935   Query augmenting and refini...

Y10S 707/99942   Manipulating data structure...
