Retrieval of Structured Documents
First Claim
1. A method, comprising:
- ranking structured elements within a structured document, the structured document including a document element or root element, at least one section element, and at least one paragraph element, the ranking including:
for each paragraph element, calculating the terms' weight according to the calculation [equation not reproduced in this text], in which Weight(ti,Pj) stands for the weight of the term ti in the paragraph Pj, "tf(ti,Pj)" is the term frequency of ti in this paragraph, N denotes the number of documents in the corpus, and ni represents the number of documents containing the term ti;
for any section element Ej at the upper levels, following a bottom-up fashion, calculating term weights using the calculation Weight(ti,Ej) = ln(1 + tf(ti,Ej)) × I(ti,Ej), in which "I(ti,Ej)" is the entropy measure of the term ti in element Ej, wherein if Weight(ti,Ej) ≥ average(Ej) + std_dev(Ej), the term ti is selected as an index term of the element Ej and all sub-elements of Ej would eliminate ti from their index term list, where "average(Ej)" denotes the arithmetic average of all terms' weights in the element Ej, and std_dev(Ej) denotes the standard deviation of these weights;
repeating the calculating of the term weights using the calculation Weight(ti,Ej) = ln(1 + tf(ti,Ej)) × I(ti,Ej) until the root element (i.e., the document element) is reached;
obtaining paths for all evaluated candidate elements, and assigning query terms' weights for elements to paths respectively;
ranking paths using the calculation [equation not reproduced in this text], in which [symbol not reproduced] is the inverse document frequency (IDF) value of query term ti, which represents the query term's weight, and Q is the number of query terms in a query;
returning elements corresponding to the ranked paths in descending order.
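The bottom-up index-term selection recited in claim 1 can be illustrated with a minimal Python sketch. It covers only the steps whose formulas survive in this text: Weight(ti,Ej) = ln(1 + tf(ti,Ej)) × I(ti,Ej) and the average(Ej) + std_dev(Ej) threshold. The `Element` class and the function names are hypothetical. The entropy measure I(ti,Ej) is passed in as a caller-supplied function because its definition is not given here. Paragraph-level index term lists are assumed to be populated already, since the paragraph formula is not reproduced. The use of the population standard deviation is also an assumption; the claim does not say which variant is meant.

```python
import math
from dataclasses import dataclass, field
from statistics import mean, pstdev
from typing import Callable, Dict, List, Set


@dataclass
class Element:
    """Hypothetical node of a structured document (section or paragraph)."""
    name: str
    term_freq: Dict[str, int]  # tf(ti, Ej) for each term ti in this element
    children: List["Element"] = field(default_factory=list)
    index_terms: Set[str] = field(default_factory=set)


EntropyFn = Callable[[str, Element], float]  # stand-in for I(ti, Ej)


def element_weights(el: Element, entropy: EntropyFn) -> Dict[str, float]:
    """Weight(ti, Ej) = ln(1 + tf(ti, Ej)) * I(ti, Ej), as stated in the claim."""
    return {t: math.log(1 + tf) * entropy(t, el) for t, tf in el.term_freq.items()}


def remove_from_descendants(el: Element, term: str) -> None:
    """Eliminate a promoted term from the index lists of all sub-elements."""
    for child in el.children:
        child.index_terms.discard(term)
        remove_from_descendants(child, term)


def select_index_terms(el: Element, entropy: EntropyFn) -> None:
    """Process children first (bottom-up), then apply the threshold at this level."""
    for child in el.children:
        select_index_terms(child, entropy)
    if not el.children or not el.term_freq:
        return  # paragraphs keep their pre-computed index terms (assumption)
    weights = element_weights(el, entropy)
    # Assumption: population standard deviation of the element's term weights.
    threshold = mean(weights.values()) + pstdev(weights.values())
    for term, weight in weights.items():
        if weight >= threshold:
            el.index_terms.add(term)
            remove_from_descendants(el, term)
```

Calling `select_index_terms(root, entropy)` on the document element repeats the section step at every level until the root is reached. A term that clears the threshold at a section is indexed there and dropped from the paragraphs beneath it.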
Abstract
This disclosure relates to querying a database containing a plurality of structured documents for a search term. Those structured documents that do not include the search term are ferreted or filtered out during an initial search. Matched structured documents, which are those structured documents that do contain the search term, are evaluated by ranking the individual elements based on how well each individual element matches the search term, and indicating to the user the ranking of the individual elements, wherein the individual elements can be accessed by the user.
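The two-stage flow in the abstract (filter out documents without the search term, then rank the elements of the remaining documents) can be sketched as follows. This is an illustrative sketch only. It assumes the structured documents are XML strings, and it uses a raw occurrence count as a stand-in for "how well each individual element matches"; the abstract does not fix either choice. The function names `count_term` and `search` are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple


def count_term(el: ET.Element, term: str) -> int:
    """Count whole-word occurrences of term in an element, including its descendants."""
    words = re.findall(r"\w+", " ".join(el.itertext()).lower())
    return words.count(term.lower())


def search(docs: Dict[str, str], term: str) -> List[Tuple[str, str, int]]:
    """Return (document id, element tag, count) tuples, best match first."""
    hits: List[Tuple[str, str, int]] = []
    for doc_id, xml_text in docs.items():
        root = ET.fromstring(xml_text)
        if count_term(root, term) == 0:
            continue  # initial search: drop documents that lack the term
        for el in root.iter():
            n = count_term(el, term)
            if n:
                hits.append((doc_id, el.tag, n))
    # Stable sort: elements with equal counts keep document order.
    hits.sort(key=lambda hit: hit[2], reverse=True)
    return hits
```

Because the count includes descendant text, an ancestor never scores below its children under this stand-in measure. That is one reason a real ranking would need a finer weighting than raw counts.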
10 Claims
1. A method, comprising: the ranking steps recited in full under First Claim above.
Dependent claims: 2, 3, 4, 5, 6, 7, 8.
9. A method, comprising:
- displaying ranked elements of a structured document containing a search term by:
ranking individual elements of the document based on a number of appearances of the search term in each of the individual elements;
displaying a hierarchical structure of the structured document by providing a hierarchical tree that displays a structure of the individual elements of the structured document; and
indicating to the user the ranking of the individual elements, wherein the individual elements can be accessed by the user.
Dependent claims: 10.
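The display steps of claim 9 can be illustrated with a small text-only sketch. It ranks elements by the number of appearances of the search term and prints the document structure as an indented tree annotated with each element's rank. The XML input format, the dense ranking scheme (equal counts share a rank), and the function names are assumptions made for illustration; the claim does not specify them.

```python
import re
import xml.etree.ElementTree as ET
from typing import List


def appearances(el: ET.Element, term: str) -> int:
    """Number of whole-word appearances of term in an element and its descendants."""
    words = re.findall(r"\w+", " ".join(el.itertext()).lower())
    return words.count(term.lower())


def render_ranked_tree(xml_text: str, term: str) -> List[str]:
    """Return one indented line per element, labeled with its rank and hit count."""
    root = ET.fromstring(xml_text)
    # Dense ranking: the highest non-zero count gets rank 1.
    distinct = sorted({appearances(el, term) for el in root.iter()} - {0}, reverse=True)
    rank_of = {count: i + 1 for i, count in enumerate(distinct)}
    lines: List[str] = []

    def walk(el: ET.Element, depth: int) -> None:
        n = appearances(el, term)
        label = f"rank {rank_of[n]}, {n} hit(s)" if n else "no match"
        lines.append(f"{'  ' * depth}{el.tag} [{label}]")
        for child in el:
            walk(child, depth + 1)

    walk(root, 0)
    return lines
```

In an interactive viewer each line would link to the element it describes, so that the user can access it; the sketch only produces the labels.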
Specification