Enhanced document retrieval
Abstract
Systems and methods for enhanced document retrieval are described. In one aspect, a search query from an end-user is received. Responsive to receiving the search query, search results are retrieved. The search results include an enhanced document and a set of non-enhanced documents. The enhanced document and the non-enhanced documents include term(s) of the search query. The enhanced document is derived from a base document. The base document was modified with metadata mined from one or more different documents. The metadata is associated with one or more respective references to the base document. The one or more different documents are independent of the base document.
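As an illustration of the abstract only, and not the patented implementation, the following Python sketch shows one way a base document could become an enhanced document by attaching metadata mined from independent documents that reference it. Every name here (`Document`, `enhance`, `matches`, the `references` and `mined_metadata` fields) is hypothetical, and only document titles are mined.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    doc_id: str
    text: str
    title: str = ""
    # Ids of other documents that this document refers to (hypothetical field).
    references: list[str] = field(default_factory=list)
    # Metadata propagated from documents that reference this one (hypothetical field).
    mined_metadata: list[str] = field(default_factory=list)


def enhance(base: Document, others: list[Document]) -> Document:
    """Return a copy of `base` extended with titles of documents that reference it."""
    mined = [d.title for d in others
             if d.doc_id != base.doc_id and base.doc_id in d.references and d.title]
    return Document(base.doc_id, base.text, base.title,
                    list(base.references), base.mined_metadata + mined)


def matches(doc: Document, query: str) -> bool:
    """True if any query term appears in the document text or its mined metadata."""
    haystack = " ".join([doc.text, *doc.mined_metadata]).lower().split()
    return any(term in haystack for term in query.lower().split())
```

In this sketch a query term that appears only in a referencing document (for example, a support case title) can still retrieve the enhanced document, while the unmodified base document would not match.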
33 Claims
1. A method implemented by a computing device for enhanced document retrieval, the method comprising:
receiving a search query from an end-user;
responsive to receiving the search query, retrieving search results, the search results comprising an enhanced document and a set of non-enhanced documents, the enhanced document and the non-enhanced documents including term(s) of the search query;
wherein the enhanced document is derived from a base document, the base document having been modified with metadata mined from one or more different documents, the metadata being associated with one or more respective references to the base document, the metadata including one or more of a title of a document, product problem context, and a product problem resolution information, the one or more different documents being independent of the base document;
calculating term proximity to determine relevance of the enhanced document as follows:
wherein α, β are parameters configured to control relative weight of each part of the search query, Hit represents a percentage of the terms in a document in a database over all terms, the database comprising the one or more documents, and EditDistance represents a misorder between the search query and the document; and
returning ranked search results for presentation to the end-user, the ranked search results being ranked as a function of the relevance.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13.
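The term-proximity formula of claim 1 is not reproduced in this text, so the Python sketch below is only an assumed form: a weighted sum of a hit ratio and an order-agreement term, α·Hit + β·(1 − normalized EditDistance). Treating Hit as the fraction of distinct query terms found in the document, and EditDistance as a Levenshtein distance over term sequences, are both assumptions, as are the function names and default weights.

```python
def edit_distance(a: list[str], b: list[str]) -> int:
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete x
                            curr[j - 1] + 1,           # insert y
                            prev[j - 1] + (x != y)))   # substitute x with y
        prev = curr
    return prev[-1]


def term_proximity(query: str, document: str,
                   alpha: float = 0.5, beta: float = 0.5) -> float:
    """Assumed score: alpha * Hit + beta * (1 - normalized misorder)."""
    q = query.lower().split()
    d = document.lower().split()
    if not q:
        return 0.0
    q_set = set(q)
    # Assumption: Hit is the fraction of distinct query terms present in the document.
    hit = sum(t in d for t in q_set) / len(q_set)
    # Query terms in the order they first appear in the document.
    seen: list[str] = []
    for t in d:
        if t in q_set and t not in seen:
            seen.append(t)
    # Assumption: misorder is the edit distance between the query and that sequence.
    misorder = edit_distance(q, seen) / len(q)
    return alpha * hit + beta * (1.0 - misorder)
```

Under these assumptions, a document containing the query terms in query order scores highest, the same terms in a different order score lower, and a document with no query terms scores zero.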
wherein α, β are parameters configured to control relative weight of each part of the search query, Iref represents an importance from frequency of reference, and Iage represents an age of a document from a database associated with the base document, the document comprising at least a subset of the terms and/or keywords.
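The popularity formula and the definitions of Iref and Iage are likewise not reproduced in this text. Purely as a hedged illustration, the Python sketch below assumes popularity is a weighted sum α·Iref + β·Iage, with Iref taken as a normalized reference count and Iage as a freshness score that halves every `half_life_days`. These forms, the function name, and the default values are assumptions and are not taken from the claims.

```python
def popularity(ref_count: int, max_ref_count: int, age_days: float,
               alpha: float = 0.7, beta: float = 0.3,
               half_life_days: float = 365.0) -> float:
    """Assumed score: alpha * Iref + beta * Iage."""
    # Assumption: importance from reference frequency, scaled to [0, 1].
    i_ref = ref_count / max_ref_count if max_ref_count > 0 else 0.0
    # Assumption: age expressed as freshness, decaying by half every half-life.
    i_age = 0.5 ** (age_days / half_life_days)
    return alpha * i_ref + beta * i_age
```

With this choice, frequently referenced and recently created documents rank above rarely referenced or older ones; α and β shift the balance between the two signals.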
8. The method of claim 7, wherein Iref and Iage are determined as follows:
9. The method of claim 1, wherein after determining the relevance and before returning the ranked results, the method further comprises:
creating a respective snippet description for each result of the top-ranked results, the snippet description indicating significance of the result in view of term(s) of the search query; and
wherein the ranked search results comprise the respective snippet description for each result of the top-ranked results.
10. The method of claim 9, wherein creating further comprises:
locating one or more blocks from a retrieved document in the top-ranked search results; and
highlighting term(s) of the search query in the one or more blocks.
11. The method of claim 10, wherein locating further comprises:
identifying the one or more blocks with a sliding window of configurable size that is applied to portions of the retrieved document;
measuring an amount of query-related information carried by text delineated by the sliding window, the measure being based on quantitative criteria such as word frequency, word proximity to a query term, and/or word position; and
combining the quantitative criteria with a trained classifier to identify a substantially most informative block for the snippet description.
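The block-location and highlighting steps of claims 10 and 11 can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the window is measured in words, the only quantitative criterion used is query-term frequency inside the window, the trained-classifier step is omitted, and `**` markers stand in for highlighting. The names `best_block` and `highlight` are hypothetical.

```python
import re


def _norm(word: str) -> str:
    """Lowercase a word and strip punctuation for matching."""
    return re.sub(r"\W+", "", word).lower()


def best_block(text: str, query: str, window: int = 8) -> list[str]:
    """Return the first `window`-word block with the most query-term occurrences."""
    words = text.split()
    terms = {t.lower() for t in query.split()}
    normalized = [_norm(w) for w in words]
    best_start, best_score = 0, -1
    # Slide the window one word at a time over the document.
    for start in range(max(1, len(words) - window + 1)):
        score = sum(w in terms for w in normalized[start:start + window])
        if score > best_score:
            best_start, best_score = start, score
    return words[best_start:best_start + window]


def highlight(block: list[str], query: str) -> str:
    """Wrap query terms in the block with ** markers."""
    terms = {t.lower() for t in query.split()}
    return " ".join(f"**{w}**" if _norm(w) in terms else w for w in block)
```

The `window` argument corresponds to the configurable window size; a caller could derive it from the display space available for the snippet.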
12. The method of claim 11, wherein the configurable size is a function of client computing device user interface space available for display of the snippet description.
13. The method of claim 11, wherein the trained classifier is trained with linear regression as a function of:
wherein x is a vector, y is a value of a straight line to fit value(s) associated with the quantitative criteria, “residual” e is a random variable with mean zero, coefficients bj are determined by a condition that a sum of a square residual is small, variables xj are inputs such as log or polynomial of inputs.
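The regression formula of claim 13 is not reproduced in this text. The symbols it defines are consistent with the standard linear model y = b0 + Σ bj·xj + e, and the Python sketch below assumes that form. It fits the coefficients by ordinary least squares using the normal equations; the function names and the use of Gauss-Jordan elimination are illustrative choices, not details from the claims.

```python
def fit_linear(xs: list[list[float]], ys: list[float]) -> list[float]:
    """Least-squares coefficients [b0, b1, ..., bp] for y = b0 + sum(bj * xj) + e."""
    rows = [[1.0, *x] for x in xs]  # prepend a constant column for the intercept b0
    n = len(rows[0])
    # Normal equations (X^T X) b = X^T y, stored as an augmented matrix.
    a = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
         + [sum(r[i] * y for r, y in zip(rows, ys))]
         for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        if abs(a[col][col]) < 1e-12:
            raise ValueError("features are linearly dependent")
        for r in range(n):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return [a[i][n] / a[i][i] for i in range(n)]


def predict(coeffs: list[float], x: list[float]) -> float:
    """Evaluate the fitted line at feature vector x."""
    return coeffs[0] + sum(b * v for b, v in zip(coeffs[1:], x))
```

Here each feature vector could hold per-window criteria such as term frequency and proximity, possibly after a log or polynomial transform, and the predicted value could be used to rank candidate blocks.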
14. A tangible computer-readable medium comprising computer-program instructions executable by a processor to provide content propagation for enhanced document retrieval, the computer-program instructions, when executed by a processor, performing operations comprising:
receiving a search query from an end-user;
responsive to receiving the search query, retrieving search results, the search results comprising an enhanced document and a set of non-enhanced documents, the enhanced document and the non-enhanced documents including term(s) of the search query, the enhanced document being derived from a base document, the base document having been modified with metadata mined from one or more different documents, the metadata being associated with one or more respective references to the base document, the one or more different documents being independent of the base document;
calculating term proximity as follows:
wherein α, β are parameters configured to control relative weight of each part of the search query, Hit represents a percentage of the terms in a document in a database over all terms, the database comprising the one or more documents, and EditDistance represents a misorder between the search query and the document;
determining relevance of the enhanced document and the set of non-enhanced documents in view of the term proximity and search query popularity criteria;
returning ranked search results for presentation to the end-user, the ranked search results being ranked as a function of the relevance.
Dependent claims: 15, 16, 17, 18, 19.
20. A tangible computer-readable medium comprising computer-program instructions executable by a processor to provide content propagation for enhanced document retrieval, the computer-program instructions, when executed by a processor, performing operations comprising:
receiving a search query from an end-user;
responsive to receiving the search query, retrieving search results, the search results comprising an enhanced document and a set of non-enhanced documents, the enhanced document and the non-enhanced documents including term(s) of the search query, the enhanced document being derived from a base document, the base document having been modified with metadata mined from one or more different documents, the metadata being associated with one or more respective references to the base document, the one or more different documents being independent of the base document;
calculating popularity as follows:
wherein α, β are parameters configured to control relative weight of each part of the search query, Iref represents an importance from frequency of reference, and Iage represents an age of a document from a database associated with the base document, the document comprising at least a subset of the terms and/or keywords;
determining relevance of the enhanced document and the set of non-enhanced documents in view of search query term proximity criteria and the popularity;
returning ranked search results for presentation to the end-user, the ranked search results being ranked as a function of the relevance.
Dependent claims: 21, 22, 23, 24.
25. A computing device for enhanced document retrieval, the computing device comprising:
a processor; and
a memory coupled to the processor, the memory comprising computer-program instructions executable by the processor for:
receiving a search query from an end-user;
responsive to receiving the search query, retrieving search results, the search results comprising an enhanced document and a set of non-enhanced documents, the enhanced document and the non-enhanced documents including term(s) of the search query, the enhanced document being derived from a base document, the base document having been modified with metadata mined from one or more different documents, the metadata being associated with one or more respective references to the base document, the one or more different documents being independent of the base document;
calculating term proximity as follows:
wherein α, β are parameters configured to control relative weight of each part of the search query, Hit represents a percentage of the terms in a document in a database over all terms, the database comprising the one or more documents, and EditDistance represents a misorder between the search query and the document;
determining relevance of the enhanced document and the set of non-enhanced documents in view of the term proximity and search query popularity criteria; and
returning ranked search results for presentation to the end-user, the ranked search results being ranked as a function of the relevance.
Dependent claims: 26, 27, 28.
29. A computing device for enhanced document retrieval, the computing device comprising:
a processor; and
a memory coupled to the processor, the memory comprising computer-program instructions executable by the processor for:
receiving a search query from an end-user;
responsive to receiving the search query, retrieving search results, the search results comprising an enhanced document and a set of non-enhanced documents, the enhanced document and the non-enhanced documents including term(s) of the search query, the enhanced document being derived from a base document, the base document having been modified with metadata mined from one or more different documents, the metadata being associated with one or more respective references to the base document, the one or more different documents being independent of the base document;
calculating popularity as follows:
wherein α, β are parameters configured to control relative weight of each part of the search query, Iref represents an importance from frequency of reference, and Iage represents an age of a document from a database associated with the base document, the document comprising at least a subset of the terms and/or keywords;
determining relevance of the enhanced document and the set of non-enhanced documents in view of the search query term proximity criteria and the popularity; and
returning ranked search results for presentation to the end-user, the ranked search results being ranked as a function of the relevance.
Dependent claims: 30, 31, 32, 33.
Specification