Method and system for extending keyword searching to syntactically and semantically annotated data
5 Assignments
Abstract
Methods and systems for extending keyword searching techniques to syntactically and semantically annotated data are provided. Example embodiments provide a Syntactic Query Engine (“SQE”) that parses, indexes, and stores a data set as an enhanced document index with document terms as well as information pertaining to the grammatical roles of the terms and ontological and other semantic information. In one embodiment, the enhanced document index is a form of term-clause index that indexes terms and syntactic and semantic annotations at the clause level. The enhanced document index permits the use of a traditional keyword search engine to process relationship queries as well as to process standard document level keyword searches. In one embodiment, the SQE comprises a Query Processor, a Data Set Preprocessor, a Keyword Search Engine, a Data Set Indexer, an Enhanced Natural Language Parser (“ENLP”), a data set repository, and, in some embodiments, a user interface or an application programming interface.
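The central idea of the enhanced document index, indexing role-tagged terms as extra terms next to the ordinary words so that one keyword lookup serves both kinds of query, can be illustrated with a minimal Python sketch. The annotations are hand-written stand-ins for ENLP output, and the `role__term` token format and the names `build_index` and `search` are hypothetical, not the SQE's actual encoding or API.

```python
from collections import defaultdict

# Hand-written stand-ins for parser output (hypothetical). Each clause has an id,
# its plain terms, and its tagged terms encoded as "role__term" tokens.
CLAUSES = [
    ("d1.c1", ["acme", "acquired", "widgetco"],
     ["subj__acme", "verb__acquire", "obj__widgetco"]),
    ("d2.c1", ["widgetco", "acquired", "acme"],
     ["subj__widgetco", "verb__acquire", "obj__acme"]),
]


def build_index(clauses):
    """Map every plain term and every tagged term to the clauses containing it."""
    index = defaultdict(set)
    for clause_id, terms, tagged in clauses:
        for token in terms + tagged:  # tagged terms are indexed as extra terms
            index[token].add(clause_id)
    return index


def search(index, *tokens):
    """Conjunctive keyword search: ids of clauses containing every token."""
    if not tokens:
        return set()
    return set.intersection(*(index.get(t, set()) for t in tokens))


index = build_index(CLAUSES)
print(sorted(search(index, "acme", "widgetco")))            # ['d1.c1', 'd2.c1']
print(sorted(search(index, "subj__acme", "obj__widgetco")))  # ['d1.c1']
```

The plain keyword query cannot tell who acquired whom, while the relationship query, answered by exactly the same intersection logic, matches only the clause in which Acme is the subject.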
243 Citations
115 Claims
1. A method in a computer system for preparing a corpus of documents for performing electronic searches, each document having at least one sentence, each sentence having a plurality of terms, comprising:
for each sentence of each document, parsing the sentence under the control of the computer system to generate a parse structure having a plurality of syntactic elements that correspond to the terms of the sentence;
determining from the structure of the parse structure and the plurality of syntactic elements a corresponding grammatical role for each of a plurality of the terms of the sentence, each grammatical role being at least one of a subject, an object, a governing verb, a modifier, or a part of a prepositional phrase;
normalizing the plurality of terms of the sentence having corresponding grammatical roles to a plurality of tagged terms, each tagged term indicating an association between the term of the sentence that corresponds to the grammatical role and an associated tag type that specifies the corresponding grammatical role, wherein at least one of the tagged terms has an associated tag type that specifies that the associated term of the sentence is a subject or an object of the sentence, wherein at least one of the tagged terms has an associated tag type that specifies that the associated term of the sentence is a modifier of another term of the sentence that has an associated tag type that specifies that the another term is a subject, object, or verb of the sentence, and wherein at least one of the tagged terms has an associated tag type that additionally specifies semantic information that refers to an entity type that identifies the associated term of the sentence as a type of person, location, or thing; and
transforming each sentence to an enhanced data structure of terms stored as one or more inverted indexes of terms annotated with relationship information, wherein the plurality of the tagged terms are stored therein and indexed as additional terms of the sentence, each additional term including the term of the sentence and the associated tag type, thereby enabling a search engine to perform relationship searches by determining from the enhanced data structure whether a designated search term having an associated tag type that specifies a grammatical role and/or an entity type is present in the sentence in a same role, in a manner similar to the manner the search engine uses to determine whether a designated term is present in the sentence, at least one of the relationship searches capable of returning a plurality of relationships between at least two entities as a result of a single search specification.
Dependent claims: 2-44.
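The normalizing step of claim 1 can be pictured with a small Python sketch that turns a dependency-style parse into tagged terms, including a modifier tag that records the role of the term it modifies and an entity-type tag. The parse is written by hand, and the label set (`nsubj`, `dobj`, `amod`, `prep`, `pobj`, `ROOT`), the tag spellings, and the function `normalize` are illustrative assumptions; the claim does not prescribe a particular parser or encoding.

```python
from typing import NamedTuple, Optional


class Token(NamedTuple):
    word: str              # lemma of the token (assumed supplied by the parser)
    head: int              # index of the governing token, -1 for the root
    dep: str               # dependency label (assumed label set)
    entity: Optional[str]  # entity type such as "person" or "location"


# Hand-written parse of "The young engineer visited Paris in spring."
PARSE = [
    Token("the", 2, "det", None),
    Token("young", 2, "amod", None),
    Token("engineer", 3, "nsubj", "person"),
    Token("visit", -1, "ROOT", None),
    Token("paris", 3, "dobj", "location"),
    Token("in", 3, "prep", None),
    Token("spring", 5, "pobj", None),
]

# Assumed mapping from dependency labels to the roles named in the claim.
ROLE_OF = {"nsubj": "subj", "dobj": "obj", "ROOT": "verb",
           "prep": "prep", "pobj": "prep"}
MODIFIER_LABELS = {"amod", "advmod"}


def normalize(parse):
    """Turn a parse into (term, tag) pairs; tokens with no role are dropped."""
    roles = {i: ROLE_OF[t.dep] for i, t in enumerate(parse) if t.dep in ROLE_OF}
    tagged = []
    for i, tok in enumerate(parse):
        if i in roles:
            tag = roles[i]
            if tok.entity:  # add semantic information to the tag
                tag += "/" + tok.entity
            tagged.append((tok.word, tag))
        elif tok.dep in MODIFIER_LABELS and roles.get(tok.head) in ("subj", "obj", "verb"):
            # A modifier's tag records the role of the term it modifies.
            tagged.append((tok.word, roles[tok.head] + "_mod"))
    return tagged


for term, tag in normalize(PARSE):
    print(f"{tag:14} {term}")
```

Here "young" is tagged `subj_mod` because it modifies the subject, "engineer" carries both its role and its entity type, and the determiner "the" produces no tagged term.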
45. A computer-readable memory medium containing contents that, when executed, cause a computing system to index a corpus of documents for electronic searching, each document having at least one sentence, each sentence having a plurality of terms, by performing a method comprising:
for each sentence of each document, parsing the sentence to generate a parse structure having a plurality of syntactic elements that correspond to the terms of the sentence;
determining from the structure of the parse structure and the plurality of syntactic elements a corresponding grammatical role for each of a plurality of the terms of the sentence, each grammatical role being at least one of a subject, an object, a governing verb, a modifier, or a part of a prepositional phrase;
normalizing the plurality of terms of the sentence having corresponding grammatical roles to a plurality of tagged terms, each tagged term indicating an association between the term of the sentence that corresponds to the grammatical role and an associated tag type that specifies the corresponding grammatical role, wherein at least one of the tagged terms has an associated tag type that specifies that the associated term of the sentence is a subject or an object of the sentence, wherein at least one of the tagged terms has an associated tag type that specifies that the associated term of the sentence is a modifier of another term of the sentence that has an associated tag type that specifies that the another term is a subject, object, or verb of the sentence, and wherein at least one of the tagged terms has an associated tag type that additionally specifies semantic information that refers to an entity type that identifies the associated term of the sentence as a person, location, or thing; and
transforming each sentence to an enhanced data structure of terms stored as one or more inverted indexes of terms annotated with relationship information, wherein the plurality of the tagged terms are stored therein and indexed as additional terms of the sentence, each additional term including the term of the sentence and the associated tag type, thereby enabling a search engine to perform relationship searches by determining from the enhanced data structure whether a designated search term having an associated tag type that specifies a grammatical role and/or an entity type is present in the sentence in a same role, in a manner similar to the manner the search engine uses to determine whether a designated term is present in the sentence, at least one of the relationship searches capable of returning a plurality of relationships between at least two entities as a result of a single search specification.
Dependent claims: 46-78.
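The closing clause of claims 1 and 45, in which a single search specification can return a plurality of relationships between entities, can be sketched in Python as follows. The tagged terms are assumed to be already extracted as `(term, tag)` pairs with tags such as `subj/person`; this format, the data, and the function `relationship_search` are hypothetical illustrations. Because the specification fixes only tag types and leaves the terms open, one query yields every matching subject-verb-object triple.

```python
# Tagged terms per sentence (hypothetical annotator output): (term, tag) pairs,
# where a tag is a grammatical role optionally followed by "/entity_type".
SENTENCES = {
    "s1": [("engineer", "subj/person"), ("visit", "verb"), ("paris", "obj/location")],
    "s2": [("ceo", "subj/person"), ("leave", "verb"), ("berlin", "obj/location")],
    "s3": [("storm", "subj/thing"), ("hit", "verb"), ("lisbon", "obj/location")],
}


def relationship_search(sentences, subj_tag, obj_tag, verb=None):
    """Return (sentence_id, subject, verb, object) for every combination in a
    sentence that satisfies the specification; verb=None matches any verb."""
    results = []
    for sid, tagged in sentences.items():
        subjects = [t for t, tag in tagged if tag == subj_tag]
        verbs = [t for t, tag in tagged
                 if tag == "verb" and (verb is None or t == verb)]
        objects = [t for t, tag in tagged if tag == obj_tag]
        for s in subjects:
            for v in verbs:
                for o in objects:
                    results.append((sid, s, v, o))
    return results


# One specification, "a person did something to a location", two relationships.
for row in relationship_search(SENTENCES, "subj/person", "obj/location"):
    print(row)
```

For brevity the sketch scans every sentence; an index-backed version would first use the inverted index to narrow the candidate sentences and only then read out the matching terms.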
79. A computing system that is configured to index a corpus of documents for electronic searching, each document having at least one sentence, each sentence having a plurality of terms, comprising:
a parser that is configured, when executed, to parse each sentence of each document to generate a dependency structure that specifies a plurality of syntactic elements that correspond to a plurality of the terms of the sentence and their grammatical relationship to each other;
a post processing module that is configured, when executed, to normalize the dependency structure to a plurality of tagged terms, each tagged term indicating an association between the term that corresponds to the syntactic element and an associated tag type, the associated tag type specifying a grammatical role of the corresponding term as used in the sentence, the grammatical role designating at least one of a subject, an object, a governing verb, a modifier, or a part of a prepositional phrase, wherein at least one of the tagged terms has an associated tag type that specifies that the corresponding term is a subject or an object of the sentence, wherein at least one of the tagged terms has an associated tag type that specifies that the associated term of the sentence is a modifier of another term of the sentence that has an associated tag type that specifies that the another term is a subject, object, or verb of the sentence, and wherein at least one of the tagged terms has an associated tag type that additionally refers to an entity type that identifies the corresponding term as a type of person, place, or thing; and
a sentence transformation module that is configured, when executed, to transform the plurality of tagged terms to an enhanced data structure that stores and treats each tagged term as an encoded additional term of the sentence in one or more inverted indexes of terms annotated with relationship information, thereby enabling a search engine to perform relationship searches by determining from the enhanced data structure whether a designated term having an associated tag type that specifies a desired grammatical role and/or a desired entity type is present in the sentence in a same role, in a manner similar to the manner the search engine uses to determine whether a designated term is present in the sentence, at least one of the relationship searches capable of returning a plurality of relationships between at least two entities as a result of a single search specification.
Dependent claims: 80-103.
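Claim 79 has the sentence transformation module store each tagged term as an encoded additional term. One practical constraint on any such encoding, sketched below under stated assumptions, is that it must survive the keyword engine's own tokenizer unchanged. The `\w+` tokenizer and the `role[_entity]__term` scheme, along with the helper names `tokenize`, `encode`, and `decode`, are hypothetical choices for illustration, not the patent's format.

```python
import re


def tokenize(text):
    """A typical keyword-engine tokenizer: lowercased runs of word characters."""
    return re.findall(r"\w+", text.lower())


def encode(term, role, entity=None):
    """Encode a tagged term as a single token of word characters only, so the
    tokenizer keeps it whole (hypothetical scheme: role[_entity]__term)."""
    prefix = role if entity is None else f"{role}_{entity}"
    return f"{prefix}__{term}".lower()


def decode(token):
    """Return (prefix, term) for an encoded token, or None for a plain term."""
    prefix, sep, term = token.partition("__")
    return (prefix, term) if sep else None


sentence = "The engineer visited Paris"
extra = [encode("engineer", "subj", "person"),
         encode("visit", "verb"),
         encode("paris", "obj", "location")]

# Plain text and encoded terms go through the same, unmodified tokenizer.
tokens = tokenize(sentence + " " + " ".join(extra))
print(tokens)
# A naive encoding with punctuation is split apart and loses its meaning.
print(tokenize("subj/person:engineer"))  # ['subj', 'person', 'engineer']
```

Because an encoded token never equals a plain word, a query for `engineer` still matches only the ordinary term, while `subj_person__engineer` matches only the tagged one.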
104. A computer-readable memory medium that contains a reverse index for storing a corpus of documents according to terms present in the documents, the index configured to be accessed by a computer processor that is controlled by a search engine to match a relationship query against the corpus of documents using pattern or string matching, the index comprising:
a plurality of terms, each term of the plurality of terms indicating at least one sentence in which the term occurs; and
a plurality of tagged terms, each tagged term specifying a grammatical role that indicates a grammatical relationship of an associated term in the at least one sentence to other terms in the at least one sentence, each tagged term indicating the at least one sentence in which the associated term occurs, at least one of the tagged terms specifying a grammatical role that indicates that the associated term is a subject or an object, at least one of the tagged terms having an associated tag type that specifies that the associated term of the sentence is a modifier of another term of the sentence that has an associated tag type that specifies that the another term is a subject, object, or verb of the sentence, and at least one of the tagged terms additionally specifying a semantic tag that specifies that the associated term is a type of person, location, or thing;
such that the search engine can determine, by pattern matching query terms against the terms and tagged terms of the reverse index, a set of sentences that match a relationship indicated by the query.
Dependent claims: 105-115.
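Claim 104 describes a reverse index that the search engine matches against by pattern or string matching. A minimal Python sketch, assuming a hypothetical `role[_entity]__term` token scheme and shell-style wildcards via the standard library's `fnmatch.fnmatchcase`, shows how putting a wildcard in the term position turns a tagged term into a relationship pattern. The sentence data and the helpers `match` and `query` are illustrative only.

```python
from collections import defaultdict
from fnmatch import fnmatchcase

# Encoded tokens per sentence (hypothetical "role[_entity]__term" scheme).
SENTENCES = {
    "s1": ["engineer", "visited", "paris",
           "subj_person__engineer", "verb__visit", "obj_location__paris"],
    "s2": ["storm", "hit", "lisbon",
           "subj_thing__storm", "verb__hit", "obj_location__lisbon"],
    "s3": ["nurse", "visited", "clinic",
           "subj_person__nurse", "verb__visit", "obj_thing__clinic"],
}

# Reverse index: token -> set of ids of the sentences in which it occurs.
reverse_index = defaultdict(set)
for sid, tokens in SENTENCES.items():
    for token in tokens:
        reverse_index[token].add(sid)


def match(pattern):
    """Sentences containing any indexed token that matches a shell-style pattern."""
    hits = set()
    for token, sids in reverse_index.items():
        if fnmatchcase(token, pattern):
            hits |= sids
    return hits


def query(*patterns):
    """Sentences matching every pattern (conjunction); needs at least one pattern."""
    return set.intersection(*(match(p) for p in patterns))


# "Some person visited something": s1 and s3, not s2.
print(sorted(query("subj_person__*", "verb__visit")))
# "Someone did something to a location": s1 and s2.
print(sorted(query("subj_*", "obj_location__*")))
```

Scanning every index key for each pattern is linear in the vocabulary size; a production index would more likely keep a sorted term dictionary so that a prefix such as `subj_person__` becomes a range lookup.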
Specification