METHOD AND SYSTEM FOR EXTENDING KEYWORD SEARCHING TO SYNTACTICALLY AND SEMANTICALLY ANNOTATED DATA
Abstract
Methods and systems for extending keyword searching techniques to syntactically and semantically annotated data are provided. Example embodiments provide a Syntactic Query Engine (“SQE”) that parses, indexes, and stores a data set as an enhanced document index containing document terms as well as information pertaining to the grammatical roles of the terms and ontological and other semantic information. In one embodiment, the enhanced document index is a form of term-clause index that indexes terms and syntactic and semantic annotations at the clause level. The enhanced document index permits the use of a traditional keyword search engine to process relationship queries as well as standard document-level keyword searches. In one embodiment, the SQE comprises a Query Processor, a Data Set Preprocessor, a Keyword Search Engine, a Data Set Indexer, an Enhanced Natural Language Parser (“ENLP”), a data set repository, and, in some embodiments, a user interface or an application programming interface.
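The idea of a term-clause index can be illustrated with a minimal sketch. It is not the patented implementation: the clause identifiers, the hand-written annotations, the key format (`role:term`, `role:type=...`), and the function name `build_term_clause_index` are all assumptions made for illustration, and a real system would obtain the annotations from a parser such as the ENLP. The sketch only shows how grammatical roles and entity types can be stored as ordinary index keys, so that a keyword-style lookup can answer both plain and annotated queries.

```python
from collections import defaultdict

# Hypothetical hand-annotated clauses: (clause_id, [(term, role, entity_type)]).
# A real system would produce these annotations with a natural language parser.
CLAUSES = [
    ("doc1:s1:c1", [("bush", "subject", "person"),
                    ("visit", "verb", None),
                    ("china", "object", "location")]),
    ("doc2:s1:c1", [("china", "subject", "location"),
                    ("export", "verb", None),
                    ("steel", "object", "thing")]),
]


def build_term_clause_index(clauses):
    """Map plain and annotated keywords to the set of clauses containing them."""
    index = defaultdict(set)
    for clause_id, terms in clauses:
        for term, role, entity_type in terms:
            index[term].add(clause_id)               # plain keyword entry
            index[f"{role}:{term}"].add(clause_id)   # term in a grammatical role
            if entity_type is not None:
                # entity type of whatever fills this grammatical role
                index[f"{role}:type={entity_type}"].add(clause_id)
    return index


index = build_term_clause_index(CLAUSES)
print(sorted(index["china"]))          # clauses mentioning "china" in any role
print(sorted(index["subject:china"]))  # clauses where "china" is the subject
```

Because every annotation is just another key, the same lookup serves a standard keyword search (`"china"`) and a role-restricted one (`"subject:china"`).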
106 Citations
38 Claims
1. A method in a computer system for performing a relationship search of a corpus of documents, each document having at least one sentence, comprising:
receiving a relationship search query that designates a desired grammatical relationship between a first entity and at least one of a second entity or an action;
transforming the search query into a Boolean expression;
under control of the computer system, automatically determining a set of data objects that match the Boolean expression using a keyword-style search of a data structure that indexes terms of the documents in a memory of the computer system by including, for at least one of a plurality of terms, grammatical relationship information that specifies that the term is a subject, object, or modifier of another term, and including for at least one of the plurality of terms, semantic information that specifies an entity type that identifies the term as a type of person, location, or thing; and
returning an indication of a plurality of matching objects in the corpus that encompass the desired grammatical relationship.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17
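The receive, transform, match, and return steps of claim 1 can be sketched roughly as follows. This is an illustrative reading, not the claimed method itself: the `RelationshipQuery` class, the `role:term` key format, the toy `INDEX`, and the choice of a pure AND expression are all assumptions. The Boolean expression is represented as a list of annotated keywords that are implicitly ANDed, and the keyword-style search is a set intersection over postings.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical toy index: annotated keyword -> ids of clauses containing it.
INDEX = {
    "subject:bush": {"c1", "c3"},
    "verb:visit": {"c1", "c2"},
    "object:china": {"c1"},
    "subject:china": {"c2"},
}


@dataclass
class RelationshipQuery:
    """A first entity related to a second entity and/or an action."""
    subject: Optional[str] = None
    action: Optional[str] = None
    obj: Optional[str] = None


def to_boolean_terms(query):
    """Transform the query into annotated keywords that are implicitly ANDed."""
    terms = []
    if query.subject:
        terms.append(f"subject:{query.subject}")
    if query.action:
        terms.append(f"verb:{query.action}")
    if query.obj:
        terms.append(f"object:{query.obj}")
    return terms


def render(terms):
    """Render the conjunction as a readable Boolean expression string."""
    return " AND ".join(terms)


def keyword_search(index, terms):
    """Return ids present in the postings of every term (Boolean AND)."""
    if not terms:
        return set()
    return set.intersection(*(index.get(term, set()) for term in terms))


query = RelationshipQuery(subject="bush", action="visit")
terms = to_boolean_terms(query)
matches = keyword_search(INDEX, terms)
print(render(terms), "->", sorted(matches))
```

A fuller sketch would also support OR and NOT; the conjunction alone is enough to show that the relationship constraint reduces to ordinary keyword lookups.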
18. A computer-readable memory medium containing instructions that control a computer processor to search a corpus of documents, each document having at least one sentence, by performing a method comprising:
receiving a relationship search query that designates a desired grammatical relationship between a first entity and at least one of a second entity or an action;
transforming the search query into a Boolean expression;
determining a set of data objects that match the Boolean expression using a keyword-style search of a data structure that indexes terms of the documents by including, for at least one of a plurality of the terms, grammatical relationship information that specifies that the term is a subject, object, or modifier of another term, and including for at least one of the plurality of terms, semantic information that specifies an entity type that identifies the term as a type of person, location, or thing; and
returning an indication of a plurality of matching objects in the corpus that encompass the desired relationship.
Dependent claims: 19, 20, 21, 22, 23, 24, 25, 26, 27, 28
29. A relationship search engine that searches a corpus of documents, each document having at least one sentence, comprising:
a memory;
a data structure that is configured to index and store in the memory terms of the documents along with annotations that include relationship information, each annotation associated with at least one term, wherein the relationship information stored with at least a corresponding one of the terms specifies an entity type that identifies the corresponding term as a type of person, place, or thing;
a keyword search engine that is configured, when executed on a computer processor, to perform pattern matches of an input string against the data structure and return an indication of a plurality of matching objects of the corpus; and
a query processor that is configured, when executed on a computer processor, to receive a relationship search query that is indicative of at least one syntactically or semantically annotated term; transform the relationship search query into at least one Boolean expression; and invoke the keyword search engine to determine and return indications to objects that match the at least one Boolean expression by pattern matching the at least one annotated term indicated by the search query to the data structure, such that each matching object encompasses the relationship specified by the relationship search.
Dependent claims: 30, 31, 32, 33, 34, 35, 36, 37, 38
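The division of labor in claim 29 between a keyword search engine that pattern-matches strings and a query processor that invokes it can be sketched as two small classes. Everything specific here is an assumption for illustration: the `role:term/type` key format, the use of shell-style wildcards via Python's `fnmatch.fnmatchcase`, the class and method names, and the sample data. The claim does not prescribe any of these.

```python
from fnmatch import fnmatchcase

# Hypothetical annotated index; keys use an assumed "role:term/type" format.
INDEX = {
    "subject:bush/person": {"c1"},        # c1: Bush visited China
    "verb:visit/none": {"c1", "c3"},
    "object:china/location": {"c1"},
    "subject:beijing/location": {"c2"},   # c2: Beijing hosted Bush
    "verb:host/none": {"c2"},
    "object:bush/person": {"c2"},
    "subject:putin/person": {"c3"},       # c3: Putin visited Paris
    "object:paris/location": {"c3"},
}


class KeywordSearchEngine:
    """Pattern-matches an input string against the annotated index keys."""

    def __init__(self, index):
        self.index = index

    def match(self, pattern):
        hits = set()
        for key, ids in self.index.items():
            if fnmatchcase(key, pattern):  # shell-style wildcards, case-sensitive
                hits |= ids
        return hits


class QueryProcessor:
    """Turns role constraints into ANDed patterns and invokes the engine."""

    def __init__(self, engine):
        self.engine = engine

    def search(self, **roles):
        patterns = [f"{role}:{value}" for role, value in roles.items()]
        results = [self.engine.match(pattern) for pattern in patterns]
        return set.intersection(*results) if results else set()


processor = QueryProcessor(KeywordSearchEngine(INDEX))
# "Any person visited anything": subject restricted by entity type only.
print(sorted(processor.search(subject="*/person", verb="visit/*")))
```

Keeping the engine unaware of grammar is the point of the design: it only sees strings and patterns, while the query processor carries the knowledge of roles and entity types.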
Specification