Method and system of ranking and clustering for document indexing and retrieval
Abstract
A relevancy ranking and clustering method and system that determines the relevance of a document relative to a user's query using a similarity comparison process. Input queries are parsed into one or more query predicate structures using an ontological parser. The ontological parser parses a set of known documents to generate one or more document predicate structures. A comparison of each query predicate structure with each document predicate structure is performed to determine a matching degree, represented by a real number. A multilevel modifier strategy is implemented to assign different relevance values to the different parts of each predicate structure match to calculate the predicate structure's matching degree. The relevance of a document to a user's query is determined by calculating a similarity coefficient, based on the structures of each pair of query predicates and document predicates. Documents are autonomously clustered using a self-organizing neural network that provides a coordinate system that makes judgments in a non-subjective fashion.
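The ranking steps summarized above can be illustrated with a small sketch. It is not the patented formula: the `PredicateStructure` fields, the `WEIGHTS` values, the `overlap` helper, and the averaging rule in `similarity_coefficient` are all assumptions chosen only to show how different parts of a predicate structure match might receive different relevance values and how pairwise matching degrees might be combined into one score per document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PredicateStructure:
    """Hypothetical predicate structure: a predicate plus arguments and modifiers."""
    predicate: str
    arguments: tuple = ()
    modifiers: tuple = ()


# Assumed relevance values for each part of a match (illustrative only).
WEIGHTS = {"predicate": 0.5, "arguments": 0.3, "modifiers": 0.2}


def overlap(query_items, doc_items):
    """Fraction of query items found in the document items (1.0 if the query has none)."""
    if not query_items:
        return 1.0
    return sum(1 for item in query_items if item in doc_items) / len(query_items)


def matching_degree(q, d):
    """Real-valued matching degree for one query/document predicate pair."""
    if q.predicate != d.predicate:
        return 0.0  # assumption: no credit when the predicates differ
    return (WEIGHTS["predicate"]
            + WEIGHTS["arguments"] * overlap(q.arguments, d.arguments)
            + WEIGHTS["modifiers"] * overlap(q.modifiers, d.modifiers))


def similarity_coefficient(query_structs, doc_structs):
    """Assumed rule: average, over query structures, of the best pairwise matching degree."""
    if not query_structs or not doc_structs:
        return 0.0
    best = [max(matching_degree(q, d) for d in doc_structs) for q in query_structs]
    return sum(best) / len(best)
```

Under these assumptions, documents would be ranked by sorting on `similarity_coefficient(query_structs, doc_structs)` in descending order. A document whose predicate and arguments match but whose modifiers do not scores lower than a full match and higher than a document with no matching predicate.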
222 Citations
69 Claims
1. A relevancy ranking method comprising the steps of:
parsing an input query into at least one query predicate structure;
parsing a set of documents to generate at least one document predicate structure;
comparing each of said at least one query predicate structure with each of said at least one document predicate structure;
calculating a matching degree using a multilevel modifier strategy to assign different relevance values to different parts of each of said at least one query predicate structure and said at least one document predicate structure match; and
calculating a similarity coefficient based on pairs of said at least one query predicate structure and each of said at least one document predicate structure to determine relevance of each one of said set of documents to said input query.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 30, 31, 32, 33, 34, 35, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 50, 51, 52, 53
21. A clustering method comprising the steps of:
parsing an input query into at least one query predicate structure;
vectorizing said input query;
identifying each of said query predicate structures by a predicate key that is an integer, and constructing multi-dimensional vectors, for each of said query predicate structures, using said integers;
parsing a plurality of documents into at least one document predicate structure for each of said plurality of documents;
vectorizing said set of documents;
identifying said at least one document predicate structure by a predicate key that is an integer, wherein conceptual nearness of two of said document predicate structures is estimated by subtracting corresponding ones of said predicate keys;
comparing said at least one query predicate structure with said plurality of document predicate structures for a said plurality of documents;
clustering similar documents, within said plurality of documents, where said at least one document vector representation matches said at least one query predicate structure.
Dependent claims: 22, 23, 24, 25, 26, 27, 28, 29, 36, 49
54. A method of vectorizing a set of document predicate structures, comprising the steps of:
identifying each set of predicates and arguments in said set of predicate structures by predicate keys that are integer representations, wherein conceptual nearness of two of said document predicate structures is estimated by subtracting corresponding one of said predicate keys.
Dependent claims: 55, 56, 57, 58, 60, 65, 66, 69
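The vectorization recited in claims 21 and 54 can be sketched as follows. The `PREDICATE_KEYS` table, its integer values, the zero padding, and the function names are hypothetical. The sketch only assumes that an ontology assigns nearby integers to conceptually related terms, so that subtracting corresponding keys gives a rough estimate of nearness.

```python
# Hypothetical ontology: related concepts are assumed to receive nearby integer keys.
PREDICATE_KEYS = {
    "give": 1000, "donate": 1002, "sell": 1040,
    "teacher": 5000, "professor": 5003,
    "book": 7000,
}


def vectorize(structure, dims=3):
    """Map (predicate, arg1, arg2, ...) to a fixed-length tuple of integer keys.

    Shorter structures are padded with 0, which is an assumption of this sketch.
    """
    keys = [PREDICATE_KEYS[term] for term in structure][:dims]
    return tuple(keys + [0] * (dims - len(keys)))


def key_differences(u, v):
    """Subtract corresponding predicate keys of two vectors."""
    return tuple(a - b for a, b in zip(u, v))


def nearness(u, v):
    """Assumed summary score: sum of absolute key differences (smaller means closer)."""
    return sum(abs(diff) for diff in key_differences(u, v))
```

For example, `vectorize(("give", "teacher", "book"))` and `vectorize(("donate", "professor", "book"))` differ by only a few units per component under this table, while a structure built on `"sell"` lies farther away.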
59. A relevancy ranking system comprising:
at least one ontological parser to parse an input query into at least one query predicate structure, and a set of documents each into at least one document predicate structure;
an input query predicate storage unit that stores said at least one input query predicate structure;
a document predicate storage unit that stores said at least one document predicate structure for each of said documents in said set;
a query vectorization unit that converts said at least one query predicate structure into multidimensional numerical query vectors;
a document vectorization unit that converts said at least one document predicate structures into multidimensional numerical document vectors; and
a relevancy ranking unit that compares each of said at least one input query predicate structure with each of said at least one document predicate structure, calculates a matching degree to assign different relevance values to different parts of each of said at least one query predicate structure and said at least one document predicate structure match, and calculates a similarity coefficient based on pairs of said at least one query predicate structure and each of said at least one document predicate structure to determine relevance of each one of said set of documents to said input query.
61. A relevancy ranking system comprising:
at least one ontological parser to parse an input query into at least one query predicate structure, and a set of documents each into at least one document predicate structure;
an input query predicate storage unit that stores said at least one input query predicate structure;
a document predicate storage unit that stores said at least one document predicate structure for each of said documents in said set;
a document vectorization unit that converts said at least one document predicate structure into multidimensional numerical vectors;
a query vectorization unit that converts said at least one query predicate structures into multidimensional numerical vectors;
a relevancy ranking unit that compares each of said at least one input query predicate structure with each of said at least one document predicate structure, calculates a matching degree to assign different relevance values to different parts of each of said at least one query predicate structure and said at least one document predicate structure match, and calculates a similarity coefficient based on pairs of said at least one query predicate structure and each of said at least one document predicate structure to determine relevance of each one of said set of documents to said input query; and
a neural network for providing clusters of matching ones of said set of documents that match said input query.
Dependent claims: 62, 63, 64
67. A clustering system comprising:
at least one ontological parser to parse an input query into at least one query predicate structure, and a set of documents each into at least one document predicate structure;
an input query predicate storage unit that stores said at least one input query predicate structure;
a document predicate storage unit that stores said at least one document predicate structure for each of said documents in said set;
a document vectorization unit that converts said at least one document predicate structure into multidimensional numerical vector representations;
a query vectorization unit that converts said at least one query predicate structure into multidimensional numerical vector representations; and
a neural network for providing clusters of matching ones of said set of documents that match said input query.
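The abstract describes the clustering network as self-organizing, but this section does not specify its architecture. As one possible illustration, the sketch below trains a minimal one-dimensional self-organizing map on document vectors and groups documents by their best matching node. The node count, learning-rate schedule, neighborhood rule, and all function names are assumptions of the sketch, not details taken from the patent.

```python
import random


def best_matching_unit(nodes, vector):
    """Index of the node with the smallest squared Euclidean distance to the vector."""
    return min(range(len(nodes)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(nodes[i], vector)))


def train_som(vectors, n_nodes=4, epochs=50, lr=0.5, seed=0):
    """Train a 1-D self-organizing map; returns the list of node weight vectors."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    # Initialise each node from a randomly chosen training vector.
    nodes = [list(rng.choice(vectors)) for _ in range(n_nodes)]
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs        # decays from 1 toward 0
        rate = lr * frac                   # shrinking learning rate
        radius = (n_nodes / 2) * frac      # shrinking neighbourhood radius
        for vector in vectors:
            bmu = best_matching_unit(nodes, vector)
            for i, node in enumerate(nodes):
                dist = abs(i - bmu)
                if dist <= radius:
                    # Nodes closer to the winner move more strongly toward the input.
                    influence = rate / (1 + dist)
                    for k in range(dim):
                        node[k] += influence * (vector[k] - node[k])
    return nodes


def cluster(nodes, vectors):
    """Group vector indices by the index of their best matching node."""
    groups = {}
    for idx, vector in enumerate(vectors):
        groups.setdefault(best_matching_unit(nodes, vector), []).append(idx)
    return groups
```

In this sketch a query vector would be mapped with `best_matching_unit`, and the documents that share its node would form the cluster returned to the user. Because the grouping is computed from the vectors alone, no manual labeling is involved.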
68. A question and answering system comprising:
at least one ontological parser to parse an input query into at least one query predicate structure, and a set of documents each into at least one document predicate structure for each of a plurality of documents;
a query vectorization unit that converts said at least one query predicate structure into multidimensional numerical vector representations, wherein each of said query predicate structures are identified by a predicate key that is an integer, and multi-dimensional vectors for each of said query predicate structures are constructed using said integers;
a document vectorization unit that converts said at least one document predicate structure for each of a plurality of documents into multidimensional numerical vector representations, wherein said at least one document predicate structure is identified by a predicate key that is an integer, wherein conceptual nearness of two of said document predicate structures is estimated by subtracting corresponding ones of said predicate keys;
clustering unit that groups similar documents, within said plurality of documents, where said at least one document vector representation matches said at least one query predicate structure; and
a relevancy ranking unit that compares said at least one query predicate structure with said plurality of document predicate structures for each of said plurality of documents.
Specification