Concept matching system
Abstract
A system for retrieving documents related to a concept from a text corpus includes a set of stored semantic classes which are combinable to express the concept, each class including a set of keywords, and each set of keywords including at least one keyword. Syntactic rules are applied to identified text portions which include one or more of the keywords. Each rule identifies a first and a second semantic class and is satisfied when keywords from the first and second semantic classes are in any one of a plurality of syntactic relationships. A concept matching module identifies text portions within the text corpus which include one or more of the keywords, applies the syntactic rules to those text portions, and identifies the text portions which satisfy at least one of the rules. The retrieved documents may be those which include at least one of the identified text portions.
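The flow summarized in the abstract can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the patented implementation: each "text portion" is taken to be a sentence that has already been parsed into dependency edges of the form (head lemma, relation, dependent lemma), and the "plurality of syntactic relationships" is modeled as a set of allowed relation labels. The class names, keywords, relation labels, and the functions `keyword_classes`, `satisfies_rule`, and `retrieve` are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

# A dependency edge (head lemma, relation label, dependent lemma).
# Assumption: produced by an external parser; hand-written here.
Edge = Tuple[str, str, str]


def keyword_classes(word: str, classes: Dict[str, Set[str]]) -> Set[str]:
    """Names of every semantic class that lists `word` as a keyword."""
    return {name for name, keywords in classes.items() if word in keywords}


def satisfies_rule(edges: List[Edge], rule: Tuple[str, str],
                   classes: Dict[str, Set[str]], relations: Set[str]) -> bool:
    """True if any keyword of the rule's first class and any keyword of its
    second class are linked by one of the allowed relations."""
    first, second = rule
    for head, rel, dep in edges:
        if rel not in relations:
            continue
        head_cls = keyword_classes(head, classes)
        dep_cls = keyword_classes(dep, classes)
        # Either class may appear as the head of the relation.
        if ((first in head_cls and second in dep_cls)
                or (second in head_cls and first in dep_cls)):
            return True
    return False


def retrieve(corpus: Dict[str, List[List[Edge]]], classes: Dict[str, Set[str]],
             rules: List[Tuple[str, str]], relations: Set[str]) -> List[str]:
    """IDs of documents with at least one sentence that contains a keyword
    and satisfies at least one rule."""
    hits = []
    for doc_id, sentences in corpus.items():
        for edges in sentences:
            words = {w for head, _, dep in edges for w in (head, dep)}
            if not any(keyword_classes(w, classes) for w in words):
                continue  # no keyword: not an identified text portion
            if any(satisfies_rule(edges, r, classes, relations) for r in rules):
                hits.append(doc_id)
                break  # one matching portion is enough to retrieve the document
    return hits


# Hypothetical concept: "a company acquires another business".
CLASSES = {
    "ACTOR": {"company", "firm"},
    "ACTION": {"acquire", "buy", "purchase"},
    "TARGET": {"startup", "rival"},
}
RULES = [("ACTOR", "ACTION"), ("ACTION", "TARGET")]
RELATIONS = {"nsubj", "obj"}  # example Universal Dependencies labels
CORPUS = {
    "d1": [[("buy", "nsubj", "firm"), ("buy", "obj", "startup")]],
    "d2": [[("rain", "expl", "it")]],                               # no keywords
    "d3": [[("say", "nsubj", "firm"), ("say", "obj", "startup")]],  # keywords, no rule
}

print(retrieve(CORPUS, CLASSES, RULES, RELATIONS))  # ['d1']
```

Document d3 contains keywords from two classes, but no rule-satisfying syntactic relationship links them, so keyword co-occurrence alone does not retrieve it. That is the distinction the syntactic rules draw.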
20 Claims
1. A system for retrieving documents related to a concept from a text corpus comprising:
a computer comprising non-transitory storage media which stores:
a set of at least four semantic classes, each including at least five keywords, which classes are combinable in different combinations thereof according to predefined syntactic rules to express the concept,
a set of user-selected keywords for each of the semantic classes to be used in searching documents in the text corpus, each of a plurality of the sets of user-selected keywords including a plurality of user-selected keywords, at least some of the semantic classes including keywords which are used in relevant expressions in retrieved text when the constituent notion is being conveyed and including keywords having different meanings from other keywords of the same semantic class and which are not synonymous with the other keywords of the same semantic class, and
a plurality of the syntactic rules to be applied to identified text portions which include one or more of the user-selected keywords, each of the syntactic rules identifying a pair of semantic classes comprising a respective first of the semantic classes and a respective second of the semantic classes, whereby different rules identify different pairs of semantic classes, the rule being satisfied when any first keyword from the first of the pair of semantic classes and any second keyword from the second of the pair of semantic classes are in any one of a plurality of syntactic relationships; and
a concept matching module, which accesses the memory, and which identifies text portions within the text corpus which include one or more of the keywords and which applies each of the syntactic rules to the text portions and identifies those text portions which each satisfy at least one of the syntactic rules, and retrieves documents which include at least one of the identified text portions.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13.
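Claim 1 places countable constraints on what the storage media hold: at least four semantic classes, at least five keywords in each, and rules that each name a pair of classes, with different rules naming different pairs. The Python sketch below checks a candidate configuration against those constraints. The function name `check_store` and the example classes are hypothetical, and treating (A, B) and (B, A) as the same pair is an assumption. The claim's requirement that some keywords in a class be non-synonymous is semantic and is not checked here.

```python
from typing import Dict, List, Set, Tuple


def check_store(classes: Dict[str, Set[str]], rules: List[Tuple[str, str]],
                min_classes: int = 4, min_keywords: int = 5) -> List[str]:
    """Return human-readable problems; an empty list means the stored
    classes and rules meet the checked constraints."""
    problems = []
    if len(classes) < min_classes:
        problems.append(f"{len(classes)} classes; at least {min_classes} required")
    for name, keywords in classes.items():
        if len(keywords) < min_keywords:
            problems.append(f"class {name!r} has {len(keywords)} keywords; "
                            f"at least {min_keywords} required")
    seen = set()
    for first, second in rules:
        for cls in (first, second):
            if cls not in classes:
                problems.append(f"rule ({first}, {second}) names unknown class {cls!r}")
        # Assumption: (A, B) and (B, A) count as the same pair of classes.
        pair = frozenset((first, second))
        if pair in seen:
            problems.append(f"rule ({first}, {second}) repeats an earlier pair")
        seen.add(pair)
    return problems


# Hypothetical configuration: four classes of five keywords each.
CLASSES = {
    "ACTOR": {"company", "firm", "corporation", "group", "business"},
    "ACTION": {"acquire", "buy", "purchase", "absorb", "take"},
    "TARGET": {"startup", "rival", "competitor", "subsidiary", "unit"},
    "PRICE": {"price", "sum", "amount", "deal", "value"},
}
RULES = [("ACTOR", "ACTION"), ("ACTION", "TARGET"), ("ACTION", "PRICE")]

print(check_store(CLASSES, RULES))                            # []
print(check_store(CLASSES, RULES + [("TARGET", "ACTION")]))   # one repeated-pair problem
```

Returning a list of problems rather than raising on the first one lets a user editing the classes see every unmet constraint at once.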
14. A system for retrieving documents related to a concept from a text corpus comprising:
a computer comprising non-transitory storage media which stores:
a set of semantic classes which are combinable to express the concept and wherein there are from three to ten semantic classes which are combinable according to syntactic rules in different combinations thereof,
a set of user-selected keywords for each of the semantic classes to be used in searching documents in the text corpus, each of a plurality of the sets of user-selected keywords including a plurality of user-selected keywords, at least some of the semantic classes including keywords which are used in relevant expressions in retrieved text when the constituent notion is being conveyed and including keywords having different meanings from other keywords of the same semantic class and which are not synonymous with the other keywords of the semantic class, and
the syntactic rules to be applied to identified text portions which include one or more of the user-selected keywords, each of the syntactic rules identifying a pair of semantic classes including a respective first of the semantic classes and a respective second of the semantic classes, the rule being satisfied when any first keyword from the first of the semantic classes and any second keyword from the second of the semantic classes are in any one of a plurality of syntactic relationships; and
a concept matching module, which accesses memory, and which identifies text portions within the text corpus which include one or more of the keywords and which applies each of the syntactic rules to the text portions and identifies those text portions which each satisfy at least one of the syntactic rules, and retrieves documents which include at least one of the identified text portions.
15. A computer system for retrieving documents related to a concept from a text corpus comprising:
a computer comprising non-transitory storage media which stores:
a set of semantic classes which are combinable in different pairs thereof according to predefined syntactic rules to express the concept, and, for each of the semantic classes, a set of predefined user-selected keywords to be used in searching documents in the text corpus, each keyword expressing a constituent notion represented by the semantic class and which has been identified as being used in relevant expressions where the constituent notion is being conveyed, at least some of the keywords having different meanings from other keywords of the same semantic class and which are not synonymous with the other keywords in the semantic class, and
a plurality of the syntactic rules to be applied to identified text portions which include one or more of the predefined keywords, each of the syntactic rules identifying a respective first of the semantic classes and a respective second of the semantic classes, the rule being satisfied when any keyword from the first of the semantic classes and any keyword from the second of the semantic classes are in any one of a plurality of syntactic relationships, whereby different rules identify different combinations of semantic classes; and
a concept matching module, which accesses the memory, comprising:
a software component which labels selected keywords in the text corpus;
a software component which associates the labeled keywords with a semantic class, each of the semantic classes including a plurality of the keywords;
a software component which labels pairs of keywords of selected semantic classes which are in any one of a plurality of syntactic relationships; and
a software component which identifies documents which include a labeled pair of keywords; and
a display which allows a user to interact with the concept matching module and input the keywords.
Dependent claims: 16.
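Claim 15 divides the concept matching module into four software components. The Python sketch below gives each component its own function so that the annotation stages are visible: keyword positions are labeled, associated with semantic classes, paired, and finally used to select documents. It assumes pre-parsed, lemmatized sentences with dependency arcs over token indices; the class names, relation labels, and all function and type names are hypothetical, and the user-input display is omitted.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

# Hypothetical input format: lemmatized tokens plus dependency arcs given as
# (head index, relation label, dependent index), e.g. from an external parser.
Arc = Tuple[int, str, int]


@dataclass
class Sentence:
    tokens: List[str]
    arcs: List[Arc]


@dataclass(frozen=True)
class PairLabel:
    first: int                # token index of the keyword from the first class
    second: int               # token index of the keyword from the second class
    classes: Tuple[str, str]  # the selected (first, second) class pair
    relation: str             # relation label linking the two tokens


def label_keywords(sent: Sentence, vocabulary: Set[str]) -> List[int]:
    """Component 1: positions of tokens that are stored keywords."""
    return [i for i, tok in enumerate(sent.tokens) if tok in vocabulary]


def associate_classes(sent: Sentence, positions: List[int],
                      classes: Dict[str, Set[str]]) -> Dict[int, Set[str]]:
    """Component 2: map each labeled position to the classes listing its keyword."""
    return {i: {c for c, kws in classes.items() if sent.tokens[i] in kws}
            for i in positions}


def label_pairs(sent: Sentence, labels: Dict[int, Set[str]],
                selected: List[Tuple[str, str]], relations: Set[str]) -> List[PairLabel]:
    """Component 3: label keyword pairs of selected classes joined by an allowed relation."""
    found = []
    for head, rel, dep in sent.arcs:
        if rel not in relations or head not in labels or dep not in labels:
            continue
        for a, b in selected:
            if a in labels[head] and b in labels[dep]:
                found.append(PairLabel(head, dep, (a, b), rel))
            elif b in labels[head] and a in labels[dep]:
                found.append(PairLabel(dep, head, (a, b), rel))
    return found


def identify_documents(corpus: Dict[str, List[Sentence]], classes: Dict[str, Set[str]],
                       selected: List[Tuple[str, str]], relations: Set[str]) -> List[str]:
    """Component 4: IDs of documents containing at least one labeled pair."""
    vocabulary = set().union(*classes.values())
    result = []
    for doc_id, sentences in corpus.items():
        for sent in sentences:
            labels = associate_classes(sent, label_keywords(sent, vocabulary), classes)
            if label_pairs(sent, labels, selected, relations):
                result.append(doc_id)
                break
    return result


CLASSES = {"SYMPTOM": {"pain", "ache"}, "LOCATION": {"chest", "back"}}
SELECTED = [("SYMPTOM", "LOCATION")]
RELATIONS = {"compound", "nmod"}
CORPUS = {
    "a": [Sentence(["chest", "pain", "be", "severe"], [(1, "compound", 0), (3, "nsubj", 1)])],
    "b": [Sentence(["pain", "go", "away"], [(1, "nsubj", 0), (1, "advmod", 2)])],
}
print(identify_documents(CORPUS, CLASSES, SELECTED, RELATIONS))  # ['a']
```

Keeping the intermediate labels (`PairLabel` records) rather than a bare yes/no makes it possible to show a user which keyword pair caused a document to be retrieved.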
17. A method for retrieving documents related to a concept from a text corpus of stored documents comprising:
for each of a set of at least four predefined semantic classes, which classes are combinable according to predefined syntactic rules in different combinations thereof to express the concept, storing a set of user-selected keywords in computer readable form on computer storage media as being associated with that class, each of at least four of the predefined semantic classes including at least five keywords, at least some of the semantic classes including keywords which are used in relevant expressions in retrieved text when the constituent notion is being conveyed and including keywords having different meanings from other keywords of the same semantic class and which are not synonymous with the other keywords of the semantic class;
labeling keywords in the text corpus which belong to at least one of the predefined semantic classes;
labeling pairs of labeled keywords, using a computer, which are in any one of a plurality of syntactic relationships and which meet one of the predefined syntactic rules, each of the syntactic rules identifying a pair of semantic classes comprising a respective first of the semantic classes and a respective second of the semantic classes, whereby different rules identify different pairs of semantic classes, the rule being satisfied when any first keyword from the first of the pair of semantic classes and any second keyword from the second of the pair of semantic classes are in any one of a plurality of syntactic relationships, wherein at least some of the keyword pairs are assigned a weight indicative of retrieval performance; and
labeling documents in the corpus which include at least one labeled pair; and
retrieving at least a portion of the documents from the corpus which include at least one labeled pair based on a ranking which takes into account the weight accorded to the keyword pairs.
Dependent claims: 18, 19, 20.
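Claim 17 assigns weights indicative of retrieval performance to keyword pairs and ranks the retrieved documents taking those weights into account. The claim does not fix a scoring formula, so the Python sketch below assumes one simple choice: a document's score is the sum of the weights of its labeled pairs, with a hypothetical default weight for pairs that have not been assigned one. The function `rank_documents` and the weight values are illustrative only.

```python
from typing import Dict, FrozenSet, List, Tuple

Pair = Tuple[str, str]


def rank_documents(doc_pairs: Dict[str, List[Pair]],
                   weights: Dict[FrozenSet[str], float],
                   default_weight: float = 1.0) -> List[Tuple[str, float]]:
    """Rank documents that contain at least one labeled keyword pair.

    Assumption: a document's score is the sum of its pair weights, and pair
    order is ignored when looking up a weight.
    """
    scored = []
    for doc_id, pairs in doc_pairs.items():
        if not pairs:
            continue  # no labeled pair: the document is not retrieved
        score = sum(weights.get(frozenset(p), default_weight) for p in pairs)
        scored.append((doc_id, score))
    # Highest score first; ties broken by document ID for a stable order.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored


DOC_PAIRS = {
    "d1": [("buy", "startup")],
    "d2": [("firm", "buy"), ("buy", "rival")],
    "d3": [],
}
WEIGHTS = {frozenset({"buy", "startup"}): 2.5, frozenset({"firm", "buy"}): 0.5}
print(rank_documents(DOC_PAIRS, WEIGHTS))  # [('d1', 2.5), ('d2', 1.5)]
```

Summing rewards documents with many matching pairs; taking the maximum pair weight instead would favor a single strong match. Which aggregate suits a given corpus is an empirical choice that the claim leaves open.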
Specification