Method and apparatus for identifying clauses having predetermined characteristics indicative of usefulness in determining relationships between different texts
Abstract
A system determines a relationship between first and second textual inputs. It identifies clauses in the first textual input whose clause types have predetermined characteristics indicative of usefulness in determining the relationship, and then determines the relationship based on the identified clauses. The identified clauses can be eliminated from the first textual input, weighted within it, or simply annotated.
27 Claims
1. A method of determining a relationship between first and second textual inputs, the method comprising:
identifying clauses in the first textual input based on clause type, the identified clauses being of clause types that have predetermined characteristics indicative of usefulness in determining the relationship; and
determining the relationship based on the clauses identified.
2. … determining the relationship based on terms in the second textual input and terms in the first textual input other than those in the identified clauses.
3. The method of claim 2 wherein the first textual input comprises a document and further comprising:
providing an index having an index entry corresponding to the document, the index entry including terms in the document found outside the identified clauses.
4. The method of claim 3 wherein determining the relationship comprises:
determining similarity between the first and second textual inputs based on the terms in the second textual input and terms in the index.
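Claims 2 to 4 can be illustrated with a minimal Python sketch that is not drawn from the patent itself. An index entry keeps only the terms that fall outside the identified clauses, and a query is then compared against that entry using cosine similarity. The clause spans, the clause-type labels in `IDENTIFIED_TYPES`, and the choice of cosine similarity are all assumptions made for illustration. The claims do not specify a parser, a clause inventory, or a similarity measure.

```python
import math
from collections import Counter

# Hypothetical clause-type labels; the claims do not enumerate specific types.
IDENTIFIED_TYPES = {"relative", "parenthetical"}


def index_entry(tokens, clauses):
    """Build a term-count index entry from tokens outside identified clauses.

    `clauses` is a list of (start, end, clause_type) token spans with `end`
    exclusive, assumed to be produced by an upstream parser.
    """
    excluded = set()
    for start, end, clause_type in clauses:
        if clause_type in IDENTIFIED_TYPES:
            excluded.update(range(start, end))
    return Counter(tok.lower() for i, tok in enumerate(tokens) if i not in excluded)


def cosine(a, b):
    """Cosine similarity between two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


tokens = "The printer which was bought last year jams often".split()
clauses = [(2, 7, "relative")]  # "which was bought last year"
entry = index_entry(tokens, clauses)
print(sorted(entry))  # ['jams', 'often', 'printer', 'the']
print(round(cosine(Counter(["printer", "jams"]), entry), 3))  # 0.707
```

Because the relative clause was dropped before indexing, "bought" and "year" cannot contribute to a match. Removing them also keeps them from diluting the document vector.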
5. The method of claim 1 and further comprising:
annotating the identified clauses with weighting values; and
in an index, providing an index entry corresponding to the first textual input, the index entry including the identified clauses and the weighting values.
6. The method of claim 5 wherein determining the relationship comprises:
determining a similarity between the first and second textual inputs based on terms in the second textual input and corresponding terms in the index, and based on the weighting values.
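Claims 5 and 6 keep the identified clauses in the index but weight them instead of dropping them. The following is a hedged Python sketch of that idea. Each term occurrence inside an identified clause is stored with a reduced weight, and similarity is computed as a weighted term overlap with the query. The weight table `CLAUSE_WEIGHTS`, its values, and the overlap score are illustrative assumptions. The patent does not give concrete weights or a scoring formula.

```python
from collections import defaultdict

# Hypothetical weighting values per clause type; unlisted types keep weight 1.0.
CLAUSE_WEIGHTS = {"relative": 0.3, "parenthetical": 0.1}


def weighted_entry(tokens, clauses):
    """Index entry mapping each term to the sum of its occurrence weights.

    Occurrences inside an identified clause carry that clause type's weight;
    all other occurrences carry 1.0. `clauses` holds (start, end, type) spans.
    """
    weights = [1.0] * len(tokens)
    for start, end, clause_type in clauses:
        if clause_type in CLAUSE_WEIGHTS:
            for i in range(start, end):
                weights[i] = CLAUSE_WEIGHTS[clause_type]
    entry = defaultdict(float)
    for tok, w in zip(tokens, weights):
        entry[tok.lower()] += w
    return dict(entry)


def weighted_overlap(query_terms, entry):
    """Similarity as the summed index weight of each query term found in the entry."""
    return sum(entry.get(t.lower(), 0.0) for t in query_terms)


tokens = "The printer which was bought last year jams often".split()
entry = weighted_entry(tokens, [(2, 7, "relative")])
print(weighted_overlap(["printer", "bought"], entry))  # 1.3
```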
7. The method of claim 1 wherein the first textual input comprises a document returned from an information retrieval system in response to a query which comprises the second textual input, and wherein determining the relationship comprises:
determining a similarity between the query and the document.
8. The method of claim 7 wherein the steps of identifying clauses and determining the relationship are repeated for each of a plurality of documents returned by the information retrieval system, and further comprising:
ranking the plurality of documents based on the relationship determined for each of the plurality of documents.
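Claims 7 and 8 apply the comparison to documents returned by an information retrieval system. The Python sketch below is an assumption rather than the patent's procedure. It scores every returned document against the query with a plain term-count dot product and sorts the results best first. The `results` mapping is assumed to hold term counts that were computed after the identified clauses had been removed or down-weighted.

```python
from collections import Counter


def overlap(query_terms, doc_terms):
    """Dot product of two term-count vectors: a deliberately simple similarity."""
    return sum(query_terms[t] * doc_terms[t] for t in query_terms.keys() & doc_terms.keys())


def rank_results(query, results):
    """Return document ids ordered by descending similarity to the query.

    Ties are broken by document id so the ranking is deterministic.
    """
    q = Counter(query.lower().split())
    return sorted(results, key=lambda doc_id: (-overlap(q, results[doc_id]), doc_id))


results = {
    "d1": Counter({"printer": 1, "jams": 1}),
    "d2": Counter({"printer": 3}),
    "d3": Counter({"toner": 1}),
}
print(rank_results("printer jams", results))  # ['d2', 'd1', 'd3']
```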
9. The method of claim 1 wherein identifying clauses comprises:
identifying syntactic dependencies in the first textual input; and
determining whether the syntactic dependencies correspond to clauses having the predetermined characteristics.
10. The method of claim 9 wherein identifying syntactic dependencies comprises:
performing grammatical analysis on the first textual input to break the first textual input at sentence boundaries and to perform a grammatical analysis associated with each sentence.
11. The method of claim 10 wherein the grammatical analysis results in a syntax tree and further comprising:
pruning the syntax tree to eliminate branches corresponding to identified clauses, and wherein determining the relationship comprises determining the relationship based on the pruned syntax tree.
12. The method of claim 10 wherein the grammatical analysis results in a syntax tree and further comprising:
annotating branches of the syntax tree corresponding to identified clauses, and wherein determining the relationship comprises determining the relationship based on the annotated syntax tree.
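Claims 11 and 12 operate on the syntax tree produced by the grammatical analysis of claim 10. The following minimal Python sketch uses a hand-built tree. `prune` returns a copy with the branches rooted at an identified clause type removed. `annotate` keeps every branch and flags each node that lies inside an identified clause. The `Node` class and the `RELCL` label are hypothetical; a real parser would supply its own tree format and labels.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional

# Hypothetical label a parser might assign to relative clauses.
IDENTIFIED_TYPES = {"RELCL"}


@dataclass
class Node:
    label: str                      # constituent label, e.g. "S", "NP", "RELCL"
    word: Optional[str] = None      # surface word for leaves, None for internal nodes
    children: list[Node] = field(default_factory=list)
    identified: bool = False        # annotation flag set by annotate()


def prune(node):
    """Claim 11: copy the tree, dropping branches rooted at identified clause types."""
    kept = [prune(c) for c in node.children if c.label not in IDENTIFIED_TYPES]
    return Node(node.label, node.word, kept, node.identified)


def annotate(node, inside=False):
    """Claim 12: flag, in place, every node within a branch rooted at an identified type."""
    node.identified = inside or node.label in IDENTIFIED_TYPES
    for c in node.children:
        annotate(c, node.identified)
    return node


def leaves(node):
    """Leaf words from left to right."""
    if node.word is not None:
        return [node.word]
    return [w for c in node.children for w in leaves(c)]


# "The printer that I bought jams"
tree = Node("S", children=[
    Node("NP", children=[
        Node("DT", "The"), Node("NN", "printer"),
        Node("RELCL", children=[Node("WDT", "that"), Node("PRP", "I"), Node("VBD", "bought")]),
    ]),
    Node("VP", children=[Node("VBZ", "jams")]),
])
print(leaves(prune(tree)))  # ['The', 'printer', 'jams']
annotate(tree)
print([n.word for n in tree.children[0].children[2].children if n.identified])  # ['that', 'I', 'bought']
```

Pruning discards the clause words outright. Annotation preserves them so that a later scoring step can choose how much weight to give them.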
13. The method of claim 1 wherein the first textual input comprises a portion of a document and the second textual input comprises a portion of a text corpus.
14. The method of claim 1 wherein the first and second textual inputs each comprise documents and wherein determining the relationship comprises determining a similarity in meaning between the documents.
15. The method of claim 14 wherein determining the relationship further comprises determining whether the first and second textual inputs are to be clustered in a logical cluster based on the similarity between the first and second documents.
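Claims 14 and 15 use the similarity to decide whether two documents belong in the same logical cluster. As a hedged illustration, the Python sketch below computes Jaccard similarity over term sets. It then makes a greedy pass in which each document joins the first cluster whose seed document is similar enough. Both the similarity measure and the 0.5 threshold are assumptions, and the term sets are presumed to have already been stripped of identified clauses.

```python
def jaccard(a, b):
    """Jaccard similarity of two term sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def same_cluster(a, b, threshold=0.5):
    """Claim 15's decision: cluster the two documents together if similar enough."""
    return jaccard(a, b) >= threshold


def cluster(docs, threshold=0.5):
    """Greedy clustering; the first document placed in a cluster acts as its seed."""
    clusters = []
    for doc_id, terms in docs.items():
        for members in clusters:
            if same_cluster(terms, docs[members[0]], threshold):
                members.append(doc_id)
                break
        else:
            clusters.append([doc_id])
    return clusters


docs = {
    "a": {"printer", "jams", "paper"},
    "b": {"printer", "jams"},
    "c": {"toner", "cartridge"},
}
print(cluster(docs))  # [['a', 'b'], ['c']]
```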
16. A computer readable medium storing an index of textual material used for determining a relationship between first and second textual inputs, the index comprising a data structure including:
a plurality of terms from the textual material, the plurality of terms having words contained in clauses in the textual material removed therefrom, the clauses being of clause types having predetermined characteristics indicative of usefulness in determining the relationship.
17. … a tree structure corresponding to each of the plurality of sentences, each tree structure being indicative of syntactic dependencies representing clauses in the corresponding sentences.
18. A computer readable medium storing an index of textual material used for determining a relationship between first and second textual inputs, the index comprising a data structure including:
a plurality of terms from the textual material, the plurality of terms having words contained in predetermined types of clauses in the textual material having predetermined characteristics indicative of usefulness in determining the relationship being annotated therein.
19. … a tree structure corresponding to each of the plurality of sentences, each tree structure being indicative of syntactic dependencies representing clauses in the corresponding sentences, and wherein the syntactic dependencies representing the predetermined clause types are annotated in the tree structure.
20. The computer readable medium of claim 19 wherein the predetermined clause types are annotated with a binary type annotation.
21. The computer readable medium of claim 19 wherein the predetermined clause types are annotated with a weight value indicative of the usefulness of the predetermined clause types in determining the relationship.
22. A method of generating an index corresponding to a textual corpus, the index for use in determining a relationship between portions of the textual corpus and a textual input, the method comprising:
identifying clauses in the textual corpus as being of clause types having predetermined characteristics indicative of usefulness in determining the relationship; and
generating the index based on the clauses identified.
23. A method of identifying clause types corresponding to clauses in a textual input, wherein the clauses have predetermined characteristics indicative of usefulness in determining a meaning of the textual input, the method comprising:
(a) selecting a clause type;
(b) removing all clauses of the selected clause type from the textual input;
(c) providing an index corresponding to the textual input with the clauses removed;
(d) performing information retrieval operations on the index;
(e) determining a reduction in a size of the index achieved by removing the clauses;
(f) determining performance of the information retrieval operations; and
(g) if performance is adequate, given the reduction in the size of the index, identifying the selected clause as a clause having the predetermined characteristics of usefulness.
24. … adding the selected clause to a list of clause types having the predetermined characteristics.
25. The method of claim 23 and further comprising:
repeating steps (a)-(g) until all desired clause types have been selected.
26. The method of claim 23 and further comprising:
if performance is inadequate, conducting failure analysis to determine whether the selected clause type should be conditionally identified as a clause having the predetermined characteristics.
27. The method of claim 23 and further comprising performing steps (a)-(g) in the order listed.
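The selection procedure of claims 23 to 27 can be sketched in Python as shown below. Everything concrete in this sketch is an assumption:

- The corpus format is a list of tokens, each tagged with its enclosing clause type.
- Index size is measured as the number of postings.
- A caller-supplied `evaluate` function stands in for the information retrieval runs.
- The adequacy test in step (g) accepts a clause type when removing it shrinks the index and costs at most `max_loss` in retrieval quality.

The failure analysis of claim 26 is not modeled.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

# Each document is a list of (token, enclosing clause type or None) pairs.
Doc = List[Tuple[str, Optional[str]]]
Index = Dict[str, Set[int]]


def build_index(corpus: List[Doc], removed: Set[str]) -> Index:
    """Steps (b)-(c): inverted index with clauses of the `removed` types left out."""
    index: Index = {}
    for doc_id, doc in enumerate(corpus):
        for token, clause_type in doc:
            if clause_type not in removed:
                index.setdefault(token, set()).add(doc_id)
    return index


def index_size(index: Index) -> int:
    """Index size measured as the total number of postings."""
    return sum(len(postings) for postings in index.values())


def select_clause_types(corpus: List[Doc], candidates: List[str],
                        evaluate: Callable[[Index], float], max_loss: float = 0.05) -> List[str]:
    full = build_index(corpus, set())
    base_size, base_perf = index_size(full), evaluate(full)
    if base_size == 0:
        return []
    selected = []
    for clause_type in candidates:                      # (a), repeated as in claim 25
        index = build_index(corpus, {clause_type})      # (b)-(c)
        perf = evaluate(index)                          # (d) and (f)
        reduction = 1 - index_size(index) / base_size   # (e)
        # (g): hypothetical adequacy rule, some size saved for a small quality loss
        if reduction > 0 and base_perf - perf <= max_loss:
            selected.append(clause_type)                # claim 24: add to the list
    return selected


corpus = [
    [("printer", None), ("jams", None), ("bought", "relative"), ("year", "relative"),
     ("see", "parenthetical"), ("manual", "parenthetical")],
    [("toner", "conditional"), ("low", "conditional"), ("printer", "relative")],
]
queries = {"printer", "jams", "toner"}


def evaluate(index: Index) -> float:
    """Toy quality score: fraction of query terms still present in the index."""
    return len(queries & set(index)) / len(queries)


print(select_clause_types(corpus, ["relative", "parenthetical", "conditional"], evaluate))
# ['relative', 'parenthetical']
```

In this toy corpus, removing "conditional" clauses loses the only occurrence of "toner". Retrieval quality therefore drops by a third, and that clause type is rejected.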
Specification