Retrieval of structured documents
Abstract
This disclosure relates to performing a query of a database containing a plurality of structured documents for a search term. Structured documents that do not include the search term are ferreted, or filtered, out during an initial search. Matched structured documents, which are those structured documents that do contain the search term, are evaluated by ranking their individual elements based on how well each element matches the search term and indicating the ranking of the individual elements to the user, wherein the individual elements can be accessed by the user.
58 Claims
1. A method, comprising:
performing a query of a database containing a plurality of structured documents for a search term;
ferreting out those structured documents that do not include the search term;
evaluating elements of matched structured documents, which are those structured documents that do contain the search term, by:
ranking the individual elements based on how well each individual element matches the search term, and indicating to the user the ranking of the individual elements, wherein the individual elements can be accessed by the user.
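The filter-then-rank flow of claim 1 can be sketched in Python as follows. This is a minimal illustration under assumptions that the claim does not state: documents are XML strings parsed with `xml.etree.ElementTree`, tokens are lowercase alphanumeric runs, and the hypothetical `match_score` rates an element by the fraction of its tokens that equal the search term.

```python
import re
import xml.etree.ElementTree as ET

def tokens(text):
    """Lowercase alphanumeric tokens (a simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def element_paths(elem, path=None):
    """Yield (path, element) pairs such as '/doc/sec[1]/p[1]' (1-based child positions)."""
    path = path or f"/{elem.tag}"
    yield path, elem
    for i, child in enumerate(elem, start=1):
        yield from element_paths(child, f"{path}/{child.tag}[{i}]")

def match_score(elem, term):
    """Hypothetical measure: fraction of the element's tokens that equal the term."""
    toks = tokens(" ".join(elem.itertext()))
    return toks.count(term) / len(toks) if toks else 0.0

def search(docs, term):
    """docs: {doc_id: xml_string}. Returns (score, doc_id, path), best match first."""
    term = term.lower()
    ranked = []
    for doc_id, xml_text in docs.items():
        root = ET.fromstring(xml_text)
        # Filter out documents that do not contain the search term at all.
        if term not in tokens(" ".join(root.itertext())):
            continue
        # Evaluate every element of a matched document individually.
        for path, elem in element_paths(root):
            score = match_score(elem, term)
            if score > 0:
                ranked.append((score, doc_id, path))
    # Rank the individual elements by how well each one matches the term.
    ranked.sort(key=lambda r: r[0], reverse=True)
    return ranked

docs = {
    "d1": "<doc><sec><p>xml retrieval</p><p>ranking elements</p></sec></doc>",
    "d2": "<doc><sec><p>plain text</p></sec></doc>",
}
results = search(docs, "retrieval")
```

In this example `d2` is filtered out, and the paragraph `/doc/sec[1]/p[1]` ranks first because half of its tokens are the search term. Because ancestors are scored on their full text, the enclosing section and document also match, but with lower scores.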
12. A method, comprising:
ranking structured elements within a structured document, the structured document including a document element (root element), at least one section element, and at least one paragraph element, the ranking including:
for each paragraph element, calculating the terms' weights according to the calculation, in which $\mathrm{Weight}(t_i, P_j)$ stands for the weight of the term $t_i$ in the paragraph $P_j$, $\mathrm{tf}(t_i, P_j)$ is the term frequency of $t_i$ in this paragraph, $N$ denotes the number of documents in the corpus, and $n_i$ represents the number of documents containing the term $t_i$;
for any section element $E_j$ at the upper levels, proceeding in a bottom-up fashion, calculating term weights using the calculation $\mathrm{Weight}(t_i, E_j) = \ln\bigl(1 + \mathrm{tf}(t_i, E_j)\bigr) \times I(t_i, E_j)$, in which $I(t_i, E_j)$ is the entropy measure of the term $t_i$ in element $E_j$, wherein if $\mathrm{Weight}(t_i, E_j) \geq \mathrm{average}(E_j) + \mathrm{std\_dev}(E_j)$, the term $t_i$ is selected as an index term of the element $E_j$ and all sub-elements of $E_j$ eliminate $t_i$ from their index term lists, where $\mathrm{average}(E_j)$ denotes the arithmetic average of all terms' weights in the element $E_j$ and $\mathrm{std\_dev}(E_j)$ denotes the standard deviation of these weights;
repeating the calculation of the term weights $\mathrm{Weight}(t_i, E_j) = \ln\bigl(1 + \mathrm{tf}(t_i, E_j)\bigr) \times I(t_i, E_j)$ until the root element (i.e., the document element) is reached;
obtaining paths for all evaluated candidate elements, and assigning the query terms' weights for the elements to the respective paths;
ranking the paths using the calculation, in which the inverse document frequency (IDF) value of query term $t_i$ represents the query term's weight and $Q$ is the number of query terms in a query;
and returning elements corresponding to the ranked paths in descending order.
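The bottom-up index-term selection of claim 12 can be sketched as follows. The claim text omits the paragraph-level formula and does not define the entropy measure I(t_i, E_j). This sketch therefore makes four assumptions: (1) a paragraph weight of ln(1 + tf) × ln(N / n_i), a common TF-IDF form; (2) a caller-supplied `entropy` function standing in for I; (3) a population standard deviation; and (4) a section's term frequencies equal to the sum of its children's. The `Element` class and the function names are hypothetical.

```python
import math
import statistics
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class Element:
    tag: str
    text: str = ""                                  # only paragraph (leaf) elements carry text
    children: list = field(default_factory=list)
    tf: Counter = field(default_factory=Counter)    # term frequencies within the element
    index: dict = field(default_factory=dict)       # index term -> weight

def descendants(elem):
    """Yield every sub-element of elem, depth first."""
    for child in elem.children:
        yield child
        yield from descendants(child)

def weight_paragraph(p, N, df):
    # ASSUMPTION: Weight(t, P) = ln(1 + tf(t, P)) * ln(N / n_t); every term starts as an index term.
    p.tf = Counter(p.text.lower().split())
    p.index = {t: math.log(1 + f) * math.log(N / df[t]) for t, f in p.tf.items()}

def weight_section(e, N, df, entropy):
    # Bottom-up: weight every child before the element itself.
    for child in e.children:
        if child.children:
            weight_section(child, N, df, entropy)
        else:
            weight_paragraph(child, N, df)
    e.tf = sum((c.tf for c in e.children), Counter())  # ASSUMPTION: tf(t, E) sums the children
    # Weight(t, E) = ln(1 + tf(t, E)) * I(t, E), with `entropy` standing in for I.
    weights = {t: math.log(1 + f) * entropy(t, e) for t, f in e.tf.items()}
    if not weights:
        return
    values = list(weights.values())
    cutoff = statistics.mean(values) + statistics.pstdev(values)
    for t, w in weights.items():
        if w >= cutoff:
            e.index[t] = w                  # t becomes an index term of e ...
            for d in descendants(e):
                d.index.pop(t, None)        # ... and every sub-element drops it

p1 = Element("p", "xml xml retrieval")
p2 = Element("p", "xml ranking")
sec = Element("section", children=[p1, p2])
doc = Element("doc", children=[sec])
weight_section(doc, N=10, df={"xml": 2, "retrieval": 1, "ranking": 1},
               entropy=lambda t, e: 1.0)   # hypothetical constant entropy
```

With a constant entropy of 1.0, `xml` clears the mean-plus-one-standard-deviation cutoff at the section level and again at the document level. It therefore ends up indexed only at the root and is removed from the section and from both paragraphs.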
20. A method, comprising:
obtaining paths for all elements, and assigning the query terms' weights to the paths; and
ranking the paths using the calculation, in which the inverse document frequency (IDF) value of query term $t_i$ represents the query term's weight, and $Q$ is the number of query terms in a query.
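Claim 20 has two steps: obtaining a path for every element, then ranking the paths by IDF-weighted query-term weights. Both can be sketched as below. The claimed ranking formula is not reproduced in the text, so `rank_paths` assumes score(path) = (1/Q) · Σ idf(t_i) · w(t_i, path). The per-path weights (raw counts of each query term in an element's own text) and the IDF values are also hypothetical inputs.

```python
import xml.etree.ElementTree as ET

def element_paths(elem, path=None):
    """Yield (path, element) for every element; positions are 1-based among all
    children, so '/doc/sec[1]/p[2]' is not an XPath per-tag position."""
    path = path or f"/{elem.tag}"
    yield path, elem
    for i, child in enumerate(elem, start=1):
        yield from element_paths(child, f"{path}/{child.tag}[{i}]")

def rank_paths(path_weights, idf, query):
    """ASSUMED score: (1/Q) * sum over query terms of idf(t) * w(t, path)."""
    if not query:
        return []
    q = len(query)
    scores = {path: sum(idf.get(t, 0.0) * w.get(t, 0.0) for t in query) / q
              for path, w in path_weights.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

root = ET.fromstring("<doc><sec><p>xml retrieval</p><p>xml xml ranking</p></sec></doc>")
query = ["xml", "ranking"]
idf = {"xml": 0.5, "ranking": 2.0}                  # hypothetical IDF values
# Hypothetical per-path weights: counts of each query term in the element's own text.
path_weights = {path: {t: (elem.text or "").split().count(t) for t in query}
                for path, elem in element_paths(root)}
ranking = rank_paths(path_weights, idf, query)
```

The second paragraph scores (0.5·2 + 2.0·1) / 2 = 1.5 and ranks first. Elements with no text of their own score 0.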
30. A computer readable medium having computer executable instructions that, when executed by a general purpose computer, are capable of performing a method, the method comprising:
performing a query of a database containing a plurality of structured documents for a search term;
filtering out those structured documents that do not include the search term; and
evaluating matched structured documents, which are those structured documents that do contain the search term, by ranking the individual elements based on how well each individual element matches the search term.
40. A method for retrieving a portion of at least one document from a plurality of documents, comprising:
performing a query for a search term on a plurality of the documents;
setting a threshold on the weight of each element to be retrieved;
determining which of the thresholded documents exceed the threshold; and
ranking the individual elements that exceed the threshold.
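The thresholding step of claim 40 might look like the following sketch. It assumes element weights have already been computed and are supplied as a hypothetical `{doc_id: {path: weight}}` mapping. The function reports which documents have at least one element above the threshold, then ranks those elements across all documents.

```python
def threshold_and_rank(doc_elements, threshold):
    """doc_elements: {doc_id: {element_path: weight}} (hypothetical layout).

    Returns (documents with at least one element above threshold,
             elements above threshold ranked by weight, heaviest first).
    """
    hits = [(doc_id, path, w)
            for doc_id, elements in doc_elements.items()
            for path, w in elements.items()
            if w > threshold]
    docs_exceeding = sorted({doc_id for doc_id, _, _ in hits})
    ranked = sorted(hits, key=lambda h: h[2], reverse=True)
    return docs_exceeding, ranked

weights = {
    "d1": {"/doc/sec[1]": 0.4, "/doc/sec[1]/p[1]": 0.9},
    "d2": {"/doc/p[1]": 0.2},
    "d3": {"/doc/p[2]": 0.7},
}
docs, ranked = threshold_and_rank(weights, threshold=0.5)
# docs   == ["d1", "d3"]
# ranked == [("d1", "/doc/sec[1]/p[1]", 0.9), ("d3", "/doc/p[2]", 0.7)]
```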
42. A method of displaying to a user the relevancy of an element of a document, comprising:
querying the document for a search term;
ranking the relevancy of the element for the search term;
returning to the user a relevant portion of the document; and
indicating to the user the relevance of the returned portion of the document to the entire document.
49. A method of displaying to a user the relevancy of an element of a document, comprising:
querying the document for a search term;
weighting the relevancy of the element for the search term; and
displaying a path structure of the document, the path structure indicating the relevance of the element to other elements in the document.
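One plausible way to display a relevance-annotated path structure, as in claim 49, is an indented outline that shows each element's weight next to its tag. The claim does not specify a display format. This sketch assumes weights keyed by path strings such as `/doc/sec[1]/p[2]` (a hypothetical input), and elements without an entry are shown as 0.00.

```python
import xml.etree.ElementTree as ET

def render_path_structure(elem, weights, path=None, depth=0):
    """Return indented outline lines 'tag  weight'; weights maps path strings to
    relevance weights (hypothetical input), and missing paths show 0.00."""
    path = path or f"/{elem.tag}"
    lines = [f"{'  ' * depth}{elem.tag}  {weights.get(path, 0.0):.2f}"]
    for i, child in enumerate(elem, start=1):
        lines += render_path_structure(child, weights, f"{path}/{child.tag}[{i}]", depth + 1)
    return lines

root = ET.fromstring("<doc><sec><p/><p/></sec></doc>")
weights = {"/doc": 0.3, "/doc/sec[1]": 0.5, "/doc/sec[1]/p[2]": 0.9}
print("\n".join(render_path_structure(root, weights)))
# doc  0.30
#   sec  0.50
#     p  0.00
#     p  0.90
```

Indentation mirrors the element hierarchy, so a reader can compare a paragraph's weight with those of its siblings and ancestors at a glance.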
50. A computer readable medium having computer executable instructions that, when executed by a general purpose computer, are capable of performing a method, the method comprising:
querying, for a search term, a document including a plurality of paths, each one of the plurality of paths indicating certain elements that are being queried;
weighting the relevancy of different ones of the paths for the search term;
ranking different ones of the paths in response to the weighted different ones of the paths; and
displaying a path structure of the plurality of elements within the document, the path structure indicating the relevance of different ones of the paths within the document.
51. A method, comprising:
querying a document having a plurality of elements for a search term;
ranking different ones of the elements in response to the weighted different ones of the elements; and
scaling the elements of the structured document.
55. A method to be used prior to ranking different paths in a hierarchical index, comprising:
querying a document having a plurality of elements for a search term, different ones of the plurality of elements being arranged in the different paths; and
weighting the relevancy of different ones of the paths based on the search term.
Specification