Method and system of ranking and clustering for document indexing and retrieval

US 6,766,316 B2
Filed: 01/18/2001
Issued: 07/20/2004
Est. Priority Date: 01/18/2001
Status: Active Grant

Abstract

A relevancy ranking and clustering method and system that determines the relevance of a document relative to a user's query using a similarity comparison process. Input queries are parsed into one or more query predicate structures using an ontological parser. The ontological parser parses a set of known documents to generate one or more document predicate structures. A comparison of each query predicate structure with each document predicate structure is performed to determine a matching degree, represented by a real number. A multilevel modifier strategy is implemented to assign different relevance values to the different parts of each predicate structure match to calculate the predicate structure's matching degree. The relevance of a document to a user's query is determined by calculating a similarity coefficient, based on the structures of each pair of query predicates and document predicates. Documents are autonomously clustered using a self-organizing neural network that provides a coordinate system that makes judgments in a non-subjective fashion.
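The comparison and scoring steps described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The `PredicateStructure` class, the numeric values in `WEIGHTS`, and the best-match averaging used in `similarity_coefficient` are assumptions introduced here for illustration; the abstract states only that different parts of a predicate structure match receive different relevance values and that a similarity coefficient is computed over pairs of query and document predicates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PredicateStructure:
    """Hypothetical stand-in for a parsed predicate structure."""
    predicate: str
    arguments: frozenset


# Assumed weights: the more abstract the kind of match, the smaller the weight.
WEIGHTS = {
    "predicate_only": 0.2,
    "argument_only": 0.4,
    "predicate_and_argument": 1.0,
}


def matching_degree(query: PredicateStructure, doc: PredicateStructure) -> float:
    """Return a real-valued matching degree for one query/document pair."""
    predicate_matches = query.predicate == doc.predicate
    argument_matches = bool(query.arguments & doc.arguments)
    if predicate_matches and argument_matches:
        return WEIGHTS["predicate_and_argument"]
    if argument_matches:
        return WEIGHTS["argument_only"]
    if predicate_matches:
        return WEIGHTS["predicate_only"]
    return 0.0


def similarity_coefficient(query_preds, doc_preds) -> float:
    """Assumed aggregation: average, over the query predicates, of the best
    matching degree found among the document predicates."""
    if not query_preds or not doc_preds:
        return 0.0
    best = [max(matching_degree(q, d) for d in doc_preds) for q in query_preds]
    return sum(best) / len(best)


query = PredicateStructure("buy", frozenset({"car"}))
full = PredicateStructure("buy", frozenset({"car", "dealer"}))
pred_only = PredicateStructure("buy", frozenset({"house"}))
arg_only = PredicateStructure("sell", frozenset({"car"}))
```

With these assumed weights, a document containing only `pred_only` and `arg_only` scores lower against `query` than a document containing `full`, which mirrors the idea that a complete predicate-and-argument match is the least abstract and therefore the most relevant.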

333 Citations

63 Claims

1. A relevancy ranking method comprising the steps of:
- parsing an input query into at least one query predicate structure;
  
  parsing a set of documents to generate at least one document predicate structure;
  
  comparing said at least one query predicate structure with said at least one document predicate structure;
  
  calculating a matching degree using a multilevel modifier strategy to assign different relevance values to different parts of said at least one query predicate structure and said at least one document predicate structure match; and
  
  calculating a similarity coefficient based on pairs of said at least one query predicate structure and said at least one document predicate structure to determine relevance of each one of said set of documents to said input query.
2. A relevancy ranking method as recited in claim 1, wherein said step of parsing an input query into at least one predicate structure is performed using an ontological parser.
3. A relevancy ranking method as recited in claim 1, wherein said step of parsing a set of documents to generate at least one document predicate structure is performed using an ontological parser.
4. A relevancy ranking method as recited in claim 1, wherein said matching degree is a real number.
5. A relevancy ranking method as recited in claim 1, wherein said calculating a matching degree step comprises the steps of:
6. A relevancy ranking method as recited in claim 1, wherein said step of calculating said matching degree using a multilevel modifier strategy determines said relevance values based upon an abstraction level of said at least one query predicate structure and said at least one document predicate structure, wherein said match is assigned a small weight when said match is relatively abstract.
7. A relevancy ranking method as recited in claim 6, wherein said abstraction level of said at least one query predicate structure and said at least one document predicate structure comprises predicate only matches, argument only matches, and predicate and argument matches, wherein said predicate only matches are more abstract than said argument only matches, and said argument only matches are more abstract than said predicate and argument matches.
8. A relevancy ranking method as recited in claim 1, wherein said step of calculating said matching degree using a multilevel modifier strategy determines said relevance values based upon concept proximity representing an ontological relationship between two concepts.
9. A relevancy ranking method as recited in claim 8, wherein said ontological relationship between two concepts is closer when a difference between said two concepts is smaller, and said matching degree is assigned a higher relevancy bonus.
10. A relevancy ranking method as recited in claim 1, wherein said step of calculating said matching degree using a multilevel modifier strategy determines said relevance values based upon a location of a predicate in one of said documents in said set of documents.
11. A relevancy ranking method as recited in claim 10, wherein when said location is disposed in the beginning of said one of said documents, said one of said documents is assigned a higher relevancy number.
12. A relevancy ranking method as recited in claim 1, wherein said step of calculating said matching degree using a multilevel modifier strategy determines said relevance values based upon a degree of proper noun matching.
13. A relevancy ranking method as recited in claim 1, wherein said step of calculating said matching degree using a multilevel modifier strategy determines said relevance values based upon a matching degree of words having a same word stem.
14. A relevancy ranking method as recited in claim 1, further comprising the step of identifying said at least one document predicate structure by a predicate key that is an integer representation, wherein conceptual nearness of two of said document predicate structures is estimated by subtracting corresponding ones of said predicate keys.
15. A relevancy ranking method as recited in claim 14, comprising the further step of constructing multi-dimensional vectors using said integer representation.
16. A relevancy ranking method as recited in claim 15, comprising the further step of normalizing said multi-dimensional vectors.
17. A relevancy ranking method as recited in claim 1, further comprising the step of identifying said at least one query predicate structure by a predicate key that is an integer representation, and constructing multi-dimensional vectors, for said at least one query predicate structure, using said integer representation.
18. A relevancy ranking method as recited in claim 16, further comprising the step of identifying said at least one query predicate structure by a second predicate key that is a second integer representation, and constructing second multi-dimensional vectors, for said at least one query predicate structure, using said second integer representation.
19. A relevancy ranking method as recited in claim 18, further comprising the steps of:
- performing a dot-product operation between multi-dimensional vectors for said at least one query predicate structure and said second multi-dimensional vectors for said at least one document predicate structure;
  
  ranking each of said documents in said document set from largest dot-product result to smallest dot-product result; and
  
  returning said rankings.
20. A relevancy ranking method as recited in claim 1, wherein said step of calculating said matching degree using a multilevel modifier strategy determines said relevance values based upon a size of each of said documents in said set of documents.
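
Claims 14 through 19 describe integer predicate keys, nearness estimated by subtracting keys, multi-dimensional vectors built from those keys, normalization, and ranking by dot product. The following Python sketch shows one way those steps could fit together. It is illustrative only: the claims do not say how keys are laid out in a vector, so the fixed-length, zero-padded layout in `to_vector`, the use of an absolute difference in `conceptual_distance`, and all function names are assumptions.

```python
import math


def conceptual_distance(key_a: int, key_b: int) -> int:
    """Estimate nearness of two predicate keys by subtraction.
    Taking the absolute value is an assumption made here."""
    return abs(key_a - key_b)


def to_vector(keys, dims):
    """Assumed layout: one key per dimension, truncated or zero-padded."""
    vec = [float(k) for k in keys[:dims]]
    return vec + [0.0] * (dims - len(vec))


def normalize(vec):
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def rank(query_keys, docs, dims=4):
    """Return (document name, score) pairs, largest dot product first."""
    q = normalize(to_vector(query_keys, dims))
    scored = [
        (name, dot(q, normalize(to_vector(keys, dims))))
        for name, keys in docs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical predicate keys for three documents and one query.
documents = {"a": [10, 20], "b": [20, 10], "c": [10, 21]}
ranking = rank([10, 20], documents)
```

Because the vectors are normalized before the dot product, the score behaves like a cosine similarity, so a document whose keys are close to the query's keys in every dimension ranks ahead of one with the same keys in a different order.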

21. A clustering method comprising the steps of:
- parsing an input query into at least one query predicate structure;
  
  vectorizing said input query;
  
  identifying said at least one query predicate structure by a first predicate key that is a first integer, and constructing multi-dimensional vectors, for said at least one query predicate structure, using said integer;
  
  parsing a plurality of documents into at least one document predicate structure for each of said plurality of documents;
  
  vectorizing said plurality of documents;
  
  identifying said at least one document predicate structure by a second predicate key that is a second integer, wherein conceptual nearness of two of said document predicate structures is estimated by subtracting corresponding ones of said second predicate keys;
  
  comparing said at least one query predicate structure with said at least one document predicate structure for said plurality of documents;
  
  clustering similar documents, within said plurality of documents, wherein at least one document vector representation matches said at least one query predicate structure.
22. A clustering method as recited in claim 21, wherein said clustering is performed based on patterns of predicate pairs of said matching ones of said plurality of documents.
23. A clustering method as recited in claim 22, wherein said clustering step further comprises comparing said at least one query predicate structure of said input query to a map of said clustered matches.
24. A clustering method as recited in claim 23, wherein said clustering step further comprises identifying clusters most likely to fit said input query.
25. A clustering method as recited in claim 23, wherein said clustering step further comprises providing a feedback mechanism so that users can determine if a cluster is a good fit.
26. A clustering method as recited in claim 23, wherein said clustering step comprises the steps of:
27. A clustering method as recited in claim 21, wherein said clustering is performed using a neural network, said clustering step performs steps of:
- self-organizing matching ones of said plurality of documents that match said input query; and
  
  retrieving clusters of said matching ones of said plurality of documents that match said input query.
28. A clustering method as recited in claim 27, wherein said neural network comprises a plurality of neurodes.
29. A clustering method as recited in claim 28, wherein said step of self-organizing said matching ones of said plurality of documents that match said input query comprises the steps of:
- developing a map from said plurality of neurodes; and
  
  determining clusters of said plurality of neurodes that represent ones of said plurality of documents conceptually near one another.
30. A clustering method as recited in claim 29, comprising the further step of clustering matching ones of said plurality of documents that match said input query.
31. A clustering method as recited in claim 30, wherein said clustering is performed based on patterns of predicate pairs of said matching ones of said plurality of documents.
32. A clustering method as recited in claim 31, wherein said clustering step further comprises comparing said at least one query predicate structure of said input query to a map of said clustered matches.
33. A clustering method as recited in claim 32, wherein said clustering step further comprises identifying clusters most likely to fit said input query.
34. A clustering method as recited in claim 32, wherein said clustering step further comprises providing a feedback mechanism so that users can determine if a cluster is a good fit.
35. A clustering method as recited in claim 32, wherein said clustering step comprises the steps of:
- self-organizing to adapt a collection of said plurality of documents matching said input query; and
  
  identifying and returning at least one appropriate cluster of said collection of said plurality of documents.
36. A clustering method as recited in claim 21, further comprising the steps of:
- calculating a matching degree using a multilevel modifier strategy to assign different relevance values to different parts of said at least one query predicate structure and said at least one document predicate structure match; and
  
  calculating a similarity coefficient based on pairs of said at least one query predicate structure and said at least one document predicate structure to determine relevance of each one of said plurality of documents to said input query.
37. A clustering method as recited in claim 36, wherein said step of parsing an input query into at least one predicate structure is performed using an ontological parser.
38. A clustering method as recited in claim 36, wherein said step of parsing a plurality of documents to generate at least one document predicate structure is performed using an ontological parser.
39. A clustering method as recited in claim 36, wherein said matching degree is a real number.
40. A clustering method as recited in claim 36, wherein said step of calculating a matching degree comprises the steps of:
- dynamically comparing an overall pattern of document predicate structures for each one of said plurality of documents to said at least one query predicate structure and returning a ranking based on a predicate vector similarity measure;
  
  comparing said at least one query predicate structure and said at least one document predicate structure and returning a predicate structure similarity measure;
  
  comparing similarity between predicate parts of said at least one query predicate structure and said at least one document predicate structure and returning a predicate matching similarity measure;
  
  comparing argument parts of said at least one query predicate structure and said at least one document predicate structure and returning an argument similarity measure;
  
  comparing concepts of said at least one query predicate structure and said at least one document predicate structure and returning a concept similarity measure; and
  
  comparing proper nouns of said at least one query predicate structure and said at least one document predicate structure and returning a proper noun similarity measure.
41. A clustering method as recited in claim 36, wherein said step of calculating said matching degree using a multilevel modifier strategy determines said relevance values based upon an abstraction level of said at least one query predicate structure and said at least one document predicate structure, wherein said match is assigned a small weight when said match is relatively abstract.
42. A clustering method as recited in claim 41, wherein said abstraction level of said at least one query predicate structure and said at least one document predicate structure comprises predicate only matches, argument only matches, and predicate and argument matches, wherein said predicate only matches are more abstract than said argument only matches, and said argument only matches are more abstract than said predicate and argument matches.
43. A clustering method as recited in claim 36, wherein said step of calculating said matching degree using a multilevel modifier strategy determines said relevance values based upon concept proximity representing an ontological relationship between two concepts.
44. A clustering method as recited in claim 43, wherein said ontological relationship between two concepts is closer when a difference between said two concepts is smaller, and said matching degree is assigned a higher relevancy bonus.
45. A clustering method as recited in claim 36, wherein said step of calculating said matching degree using a multilevel modifier strategy determines said relevance values based upon a location of a predicate in one of said documents in said plurality of documents.
46. A clustering method as recited in claim 45, wherein when said location is disposed in the beginning of said one of said plurality of documents, said one of said plurality of documents is assigned a higher relevancy number.
47. A clustering method as recited in claim 36, wherein said step of calculating said matching degree using a multilevel modifier strategy determines said relevance values based upon a degree of proper noun matching.
48. A clustering method as recited in claim 36, wherein said step of calculating said matching degree using a multilevel modifier strategy determines said relevance values based upon a matching degree of words having a same word stem.
49. A clustering method as recited in claim 21, wherein said second integer is an integer representation.
50. A clustering method as recited in claim 21, comprising the further step of constructing second multi-dimensional vectors using said second integer.
51. A clustering method as recited in claim 50, comprising the further step of normalizing said second multi-dimensional vectors.
52. A clustering method as recited in claim 49, wherein said first integer is a second integer representation.
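
Claims 27 through 29 describe a neural network of neurodes that self-organizes into a map whose clusters group conceptually near documents. A very small one-dimensional self-organizing map in Python gives a feel for that idea. This is a sketch under stated assumptions, not the network described in the patent: the evenly spaced initialization, the linearly decaying learning rate, the neighborhood radius, and the names `train_som`, `nearest`, and `clusters` are all hypothetical choices, and the document vectors are made-up values.

```python
import math


def nearest(neurodes, vec):
    """Index of the neurode whose weight vector is closest to vec."""
    return min(range(len(neurodes)), key=lambda i: math.dist(neurodes[i], vec))


def train_som(vectors, n_neurodes=4, epochs=50, lr=0.5, radius=1):
    """Train a 1-D map; n_neurodes must be at least 2."""
    dims = len(vectors[0])
    # Deterministic start: neurodes spread evenly along the diagonal of [0, 1]^dims.
    neurodes = [[i / (n_neurodes - 1)] * dims for i in range(n_neurodes)]
    for epoch in range(epochs):
        rate = lr * (1 - epoch / epochs)  # learning rate decays linearly
        for vec in vectors:
            winner = nearest(neurodes, vec)
            for i in range(len(neurodes)):
                offset = abs(i - winner)
                if offset <= radius:
                    # The winner moves most; map neighbors move less.
                    influence = rate / (1 + offset)
                    neurodes[i] = [
                        w + influence * (x - w) for w, x in zip(neurodes[i], vec)
                    ]
    return neurodes


def clusters(neurodes, docs):
    """Group document names by the neurode each document maps to."""
    groups = {}
    for name, vec in docs.items():
        groups.setdefault(nearest(neurodes, vec), []).append(name)
    return groups


# Hypothetical document vectors forming two well-separated groups.
som_docs = {
    "d1": [0.10, 0.10],
    "d2": [0.12, 0.09],
    "d3": [0.90, 0.90],
    "d4": [0.88, 0.92],
}
som = train_som(list(som_docs.values()))
som_groups = clusters(som, som_docs)
```

Each document is assigned to its best-matching neurode, so documents whose vectors lie near one another end up on the same or adjacent positions of the map, which is the property the claims rely on when they refer to clusters of neurodes.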

53. A relevancy ranking system comprising:
- at least one ontological parser to parse an input query into at least one query predicate structure and to parse each document of a set of documents into at least one document predicate structure;
  
  an input query predicate storage unit that stores said at least one input query predicate structure;
  
  a document predicate storage unit that stores said at least one document predicate structure for each of said documents in said set;
  
  a query vectorization unit that converts said at least one query predicate structure into multi-dimensional numerical query vectors;
  
  a document vectorization unit that converts said at least one document predicate structure into second multi-dimensional numerical document vectors; and
  
  a relevancy ranking unit that compares each of said at least one input query predicate structure with said at least one document predicate structure, calculates a matching degree to assign different relevance values to different parts of said at least one query predicate structure and said at least one document predicate structure match, and calculates a similarity coefficient based on pairs of said at least one query predicate structure and said at least one document predicate structure to determine relevance of each one of said set of documents to said input query.
54. A relevancy ranking system as recited in claim 53, wherein said matching degree is a real number.
55. A relevancy ranking system as recited in claim 53, further comprising a feedback mechanism so that users can determine if a cluster is a good match for said input query.
56. A relevancy ranking system as recited in claim 53, wherein a neural network self-organizes and retrieves clusters of said matching ones of said set of documents that match said input query.

57. A relevancy ranking system comprising:
- at least one ontological parser to parse an input query into at least one query predicate structure and to parse each of a set of documents into at least one document predicate structure;
  
  an input query predicate storage unit that stores said at least one input query predicate structure;
  
  a document predicate storage unit that stores said at least one document predicate structure for each of said documents in said set;
  
  a document vectorization unit that converts said at least one document predicate structure into multi-dimensional numerical vectors;
  
  a query vectorization unit that converts said at least one query predicate structures into second multi-dimensional numerical vectors;
  
  a relevancy ranking unit that compares each of said at least one input query predicate structure with each of said at least one document predicate structure, calculates a matching degree to assign different relevance values to different parts of said at least one query predicate structure and said at least one document predicate structure match, and calculates a similarity coefficient based on pairs of said at least one query predicate structure and said at least one document predicate structure to determine relevance of each one of said set of documents to said input query; and
  
  a neural network for providing clusters of matching ones of said set of documents that match said input query.
58. A relevancy ranking system as recited in claim 57, further comprising a feedback mechanism so that users can determine if a provided cluster is a good match for said input query.
59. A relevancy ranking system as recited in claim 57, wherein said neural network self-organizes and retrieves clusters of said matching ones of said set of documents that match said input query.
60. A relevancy ranking system as recited in claim 57, wherein said neural network comprises a plurality of neurodes.

61. A clustering system comprising:
- at least one ontological parser to parse an input query into at least one query predicate structure and to parse each of a set of documents into at least one document predicate structure;
  
  an input query predicate storage unit that stores said at least one input query predicate structure;
  
  a document predicate storage unit that stores said at least one document predicate structure for each of said documents in said set;
  
  a document vectorization unit that converts said at least one document predicate structure into multi-dimensional numerical vector representations;
  
  a query vectorization unit that converts said at least one query predicate structure into second multi-dimensional numerical vector representations; and
  
  a neural network for providing clusters of matching ones of said set of documents that match said input query.

62. A question and answering system comprising:
- at least one ontological parser to parse an input query into at least one query predicate structure and to parse each of a set of documents into at least one document predicate structure;
  
  a query vectorization unit that converts said at least one query predicate structure into multi-dimensional numerical vector representations, wherein said at least one query predicate structure is identified by a first predicate key that is a first integer, and multi-dimensional vectors for said at least one query predicate structure is constructed using said first integer;
  
  a document vectorization unit that converts said at least one document predicate structure for each of said set of documents into multi-dimensional numerical vector representations, wherein said at least one document predicate structure is identified by a second predicate key that is a second integer, wherein conceptual nearness of two of said document predicate structures is estimated by subtracting corresponding ones of said second predicate keys;
  
  a clustering unit that groups similar documents, within said set of documents, wherein said at least one multi-dimensional numerical vector representation matches said at least one query predicate structure; and
  
  a relevancy ranking unit that compares said at least one query predicate structure with said at least one document predicate structure for each of said set of documents.
63. A question and answering system as recited in claim 62, further comprising an answer formulation unit that provides a natural language response to said input query.

Current Assignee: Leidos, Inc. (Leidos Holdings, Inc.)
Original Assignee: Science Applications International Corporation
Inventors: Wang, Lei; Tseng, Jason Chun-Ming; Caudill, Maureen
Primary Examiner(s): KINDRED, ALFORD W
Application Number: US09/761,188
Publication Number: US 20020129015A1
Time in Patent Office: 1,279 Days
Field of Search: 707/1-6, 707/7, 707/100-102, 707/200
US Class Current: 1/1
CPC Class Codes:
- G06F 16/3347   using vector based model
- G06F 16/353   into predefined classes
- Y10S 707/99933   Query processing, i.e. sear...
- Y10S 707/99935   Query augmenting and refini...
