System and method of ranking and retrieving documents based on authority scores of schemas and documents
Abstract
The ranking manager and associated method of the present invention rank the authority of XML documents and their corresponding schemas using an iterative process over a set of hyperlinked XML documents and their schemas. The ranking manager introduces the notion of authoritative schemas and document structure, and maintains an authority score for each document in the set, a hub score for each document in the set, and an authority score for each schema that is used by one or more documents in the set. The ranking manager initializes these scores according to predefined criteria, and then recomputes in successive iterations: (1) the authority scores for each schema based on the hub scores of the documents that use the schema and the authority scores of the documents that use the schema, (2) the authority scores for each document based on the authority score for the schema that it uses and the hub scores of the documents that point to it, and/or (3) the hub scores for each document in the set based on the authority score for the schema that it uses and the authority scores of the documents that it points to. The ranking manager performs these computations until convergence or a threshold value of difference is reached.
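The abstract says only that each score is "based on" other scores and does not fix a formula, so the following is a minimal Python sketch under stated assumptions rather than the patented computation: every document uses exactly one schema, contributions are combined by plain addition, all scores start at 1.0, and each score set is rescaled to unit length after it is recomputed (as in the classic hubs-and-authorities scheme). The names `rank`, `rank_round`, `normalize`, `links`, and `schema_of` are hypothetical, and a fixed number of rounds stands in for the convergence test.

```python
from collections import defaultdict
from math import sqrt


def normalize(scores):
    """Rescale a score dict to unit Euclidean length (an assumed normalization)."""
    norm = sqrt(sum(v * v for v in scores.values()))
    return {k: v / norm for k, v in scores.items()} if norm else dict(scores)


def rank_round(links, schema_of, hub, auth):
    """One round of the three recomputations, combined by simple sums (assumed)."""
    docs = list(schema_of)

    # (1) Schema authority from the hub and authority scores of its documents.
    schema_auth = defaultdict(float)
    for d in docs:
        schema_auth[schema_of[d]] += hub[d] + auth[d]
    schema_auth = normalize(schema_auth)

    # (2) Document authority from its schema and the hubs that point to it.
    new_auth = {d: schema_auth[schema_of[d]] for d in docs}
    for src, targets in links.items():
        for dst in targets:
            new_auth[dst] += hub[src]
    new_auth = normalize(new_auth)

    # (3) Document hub score from its schema and the authorities it points to.
    new_hub = {
        d: schema_auth[schema_of[d]] + sum(new_auth[t] for t in links.get(d, ()))
        for d in docs
    }
    return normalize(new_hub), new_auth, schema_auth


def rank(links, schema_of, rounds=20):
    """Initialize all scores to 1.0 (assumed criterion) and run a fixed number of rounds."""
    hub = {d: 1.0 for d in schema_of}
    auth = {d: 1.0 for d in schema_of}
    schema_auth = {s: 1.0 for s in set(schema_of.values())}
    for _ in range(rounds):
        hub, auth, schema_auth = rank_round(links, schema_of, hub, auth)
    return hub, auth, schema_auth


# Hypothetical toy input: documents "a" and "b" both link to "c".
links = {"a": ["c"], "b": ["c"], "c": []}
schema_of = {"a": "s1", "b": "s1", "c": "s2"}
hub, auth, schema_auth = rank(links, schema_of)
```

In this toy graph, "c" is pointed to by both other documents and so ends up with the largest authority score, while "a" and "b" end up with larger hub scores than "c".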
278 Citations
20 Claims
1. A method for retrieving documents and associated schemas, comprising:
retrieving a seed set of documents with a hyperlinked structure, and associated schemas;
maintaining a hub score, h(d), and an authority score, a(d), for each document of the seed set, and an authority score a(s) for each schema used by one or more documents in the set;
initializing the hub score, h(d), the authority score, a(d), and the authority score a(s) to respective predefined criteria; and
iteratively recomputing any one or more of:
the authority score a(s) for each schema based on the hub scores h(d) of the documents that use the schema and the authority scores a(d) of the documents that use the schema;
the authority scores a(d) for said each document in the set based on the authority score a(s) for the schema used by said each document and the hub scores h(d) of the documents that point to said each document;
or the hub scores h(d) for said each document in the set based on the authority score a(s) for the schema used by said each document and the authority scores a(d) of the documents that said each document points to;
ordering the documents according to any one or more of the authority score a(d) or the hub score h(d) of the documents, ordering the schemas according to the authority score a(s) of the schemas, and returning an ordered set of documents from the seed set of documents.
Dependent claims: 2, 3, 4, 5, 6, 7
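The ordering step at the end of claim 1 can be illustrated with a short Python sketch. The function name `order_by_score`, the tie-breaking rule (alphabetical by key), and the sample scores are assumptions made for illustration; the claim itself states only that documents are ordered by a(d) or h(d) and schemas by a(s).

```python
def order_by_score(scores):
    """Return keys sorted by descending score; ties are broken by key (assumed rule)."""
    return sorted(scores, key=lambda k: (-scores[k], k))


# Hypothetical scores for three documents and two schemas.
doc_authority = {"a": 0.40, "b": 0.40, "c": 0.83}
schema_authority = {"s1": 0.91, "s2": 0.42}

ordered_docs = order_by_score(doc_authority)
ordered_schemas = order_by_score(schema_authority)
```

The same helper could be applied to hub scores h(d) when a hub-oriented ordering is wanted.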
8. A method for ranking a set of documents, d, with a hyperlinked structure, and associated schemas, comprising:
maintaining a hub score, h(d), and an authority score, a(d), for each document, d, and an authority score a(s) for each schema used by one or more documents in the set;
initializing the hub score, h(d), the authority score, a(d), and the authority score a(s) to respective predefined criteria; and
iteratively recomputing any one or more of:
the authority score a(s) for each schema based on the hub scores h(d) of the documents that use the schema and the authority scores a(d) of the documents that use the schema;
the authority scores a(d) for said each document in the set based on the authority score a(s) for the schema used by said each document and the hub scores h(d) of the documents that point to said each document;
the hub scores h(d) for said each document in the set based on the authority score a(s) for the schema used by said each document and the authority scores a(d) of the documents that said each document points to; and
ranking the documents in the set and the associated schemas according to the recomputed authority score a(s), the authority scores a(d), and the hub scores h(d).
Dependent claims: 9, 10, 11, 12
12. The method of claim 9, wherein iteratively recomputing includes recomputing the authority score a(s), the authority scores a(d), and the hub scores h(d) until a threshold value of difference is reached; and
ranking the documents in the set and the associated schemas according to the recomputed authority score a(s), the authority scores a(d), and the hub scores h(d).
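The stopping rule in claim 12 ("until a threshold value of difference is reached") could be realized in several ways. The Python sketch below assumes one of them: iteration stops once the largest absolute change in any score between successive rounds falls below a threshold. The names `max_change` and `iterate_until_stable`, the default threshold, and the round cap are all hypothetical, and `step` stands for any function that recomputes a score dict.

```python
def max_change(old, new):
    """Largest absolute per-key difference between two score dicts."""
    return max((abs(new[k] - old[k]) for k in new), default=0.0)


def iterate_until_stable(step, scores, threshold=1e-6, max_rounds=1000):
    """Apply `step` repeatedly until scores change by less than `threshold`.

    Returns the final scores and the number of rounds performed. The round
    cap is a safety assumption so the loop always terminates.
    """
    for rounds in range(1, max_rounds + 1):
        updated = step(scores)
        if max_change(scores, updated) < threshold:
            return updated, rounds
        scores = updated
    return scores, max_rounds


# Toy recomputation that moves every score halfway toward 1.0 each round.
def halfway_to_one(scores):
    return {k: (v + 1.0) / 2.0 for k, v in scores.items()}


final_scores, rounds_used = iterate_until_stable(halfway_to_one, {"x": 0.0})
```

Other difference measures (for example, a sum of squared changes) would fit the claim language equally well; the maximum is used here only because it is simple to reason about.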
13. A system for retrieving a set of documents, d, with a hyperlinked structure, and associated schemas, comprising:
a ranking module for maintaining a hub score, h(d), and an authority score, a(d), for each document, d, and an authority score a(s) for each schema used by one or more documents in the set;
the ranking manager initializing the hub score, h(d), the authority score, a(d), and the authority score a(s) to respective predefined criteria; and
the ranking module iteratively recomputing any one or more of:
the authority score a(s) for each schema based on the hub scores h(d) of the documents that use the schema and the authority scores a(d) of the documents that use the schema;
the authority scores a(d) for said each document in the set based on the authority score a(s) for the schema used by said each document and the hub scores h(d) of the documents that point to said each document;
the hub scores h(d) for said each document in the set based on the authority score a(s) for the schema used by said each document and the authority scores a(d) of the documents that said each document points to; and
the ranking manager ranks the documents in the set and the associated schemas according to the recomputed authority score a(s), the authority scores a(d), and the hub scores h(d).
Dependent claims: 14, 15, 16, 17
17. The method of claim 14, wherein the ranking manager iteratively recomputes the authority score a(s), the authority scores a(d), and the hub scores h(d) until a threshold value of difference is reached; and
the ranking manager ranks the documents in the set and the associated schemas according to the recomputed authority score a(s), the authority scores a(d), and the hub scores h(d).
18. A computer software product for ranking a set of documents, d, with a hyperlinked structure, and associated schemas, comprising:
a ranking module for maintaining a hub score, h(d), and an authority score, a(d), for each document, d, and an authority score a(s) for each schema used by one or more documents in the set;
the ranking manager initializing the hub score, h(d), the authority score, a(d), and the authority score a(s) to respective predefined criteria; and
the ranking module iteratively recomputing any one or more of:
the authority score a(s) for each schema based on the hub scores h(d) of the documents that use the schema and the authority scores a(d) of the documents that use the schema;
the authority scores a(d) for said each document in the set based on the authority score a(s) for the schema used by said each document and the hub scores h(d) of the documents that point to said each document;
the hub scores h(d) for said each document in the set based on the authority score a(s) for the schema used by said each document and the authority scores a(d) of the documents that said each document points to;
wherein the ranking manager iteratively recomputes the authority score a(s), the authority scores a(d), and the hub scores h(d) until any one or more of the following conditions is satisfied:
convergence is reached or a threshold value of difference is reached; and
the ranking manager ranks the documents in the set and the associated schemas according to the recomputed authority score a(s), the authority scores a(d), and the hub scores h(d).
Dependent claims: 19, 20
Specification