Method and system for enhanced data searching
Abstract
Methods and systems for syntactically indexing and searching data sets to achieve more accurate search results are provided. Example embodiments provide a Syntactic Query Engine (“SQE”) that parses, indexes, and stores a data set, as well as processes natural language queries subsequently submitted against the data set. The SQE comprises a Query Preprocessor, a Data Set Preprocessor, a Query Builder, a Data Set Indexer, an Enhanced Natural Language Parser (“ENLP”), a data set repository, and, in some embodiments, a user interface. After preprocessing the data set, the SQE parses the data set and determines the syntactic and grammatical roles of each term to generate enhanced data representations for each object in the data set. The SQE indexes and stores these enhanced data representations in the data set repository. Upon subsequently receiving a query, the SQE parses the query similarly and searches the indexed stored data set to locate data that contains similar terms used in similar grammatical roles. In this manner, the SQE is able to achieve contextually accurate search results more frequently than traditional search engines do.
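The idea of matching terms by grammatical role, rather than by keyword alone, can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the role labels, the hand-annotated `DOCS` data (standing in for parser output), and the `build_index` and `search` helpers are hypothetical and are not taken from the patent.

```python
from collections import defaultdict

# Hypothetical, hand-annotated stand-in for parser output:
# each document is a list of (term, grammatical role) pairs.
DOCS = {
    # "Argentina imports gas."
    "d1": [("argentina", "subject"), ("import", "verb"), ("gas", "object")],
    # "Chile imports gas from Argentina."
    "d2": [("chile", "subject"), ("import", "verb"), ("gas", "object"),
           ("argentina", "prepositional")],
}

def build_index(docs):
    """Map each (term, role) pair to the set of documents that contain it."""
    index = defaultdict(set)
    for doc_id, pairs in docs.items():
        for term, role in pairs:
            index[(term, role)].add(doc_id)
    return index

def search(index, query_pairs):
    """Return the documents that use every query term in the same role."""
    results = None
    for pair in query_pairs:
        hits = index.get(pair, set())
        results = hits if results is None else results & hits
    return results or set()

index = build_index(DOCS)
# Query "What does Argentina import?": Argentina as subject, import as verb.
matches = search(index, [("argentina", "subject"), ("import", "verb")])
```

A plain keyword match on "argentina" and "import" would return both documents; requiring the same roles leaves only `d1`, where Argentina is the subject of the verb.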
89 Claims
1. A method in a computer system for transforming at least one sentence of a document or a query into a canonical representation, each sentence having a plurality of terms, comprising:
for each sentence, parsing the sentence to generate a parse structure having a plurality of syntactic elements;
determining a set of meaningful terms of the sentence from the syntactic elements;
determining from the structure of the parse structure and the syntactic elements a grammatical role for each meaningful term in the set of meaningful terms, wherein the grammatical role is at least one of a subject, object, verb, part of a prepositional phrase, noun modifier, or verb modifier;
determining an additional grammatical role for at least one of the meaningful terms, such that the at least one meaningful term is associated with at least two different grammatical roles, wherein the additional grammatical role is not determined from the parse structure and the additional grammatical role indicates that the at least one of the meaningful terms is a subject or an object in addition to the grammatical role determined from the parse structure; and
storing in an enhanced data representation data structure a representation of each association between a meaningful term and its determined grammatical roles, in a manner that indicates a grammatical relationship between a plurality of the meaningful terms and such that the at least one meaningful term is associated with a plurality of grammatical relationships.
Dependent claims: 2-32
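As a rough illustration of a term carrying two grammatical roles, the following Python sketch assigns each term the role reported by a parser and then adds a second role by a post-processing rule. The heuristic used (a noun modifier inherits the subject or object role of its head noun), the `TermEntry` type, the `assign_roles` function, and the input format are all hypothetical; the claim only requires that the additional role not come from the parse structure, and does not prescribe this rule.

```python
from dataclasses import dataclass, field

@dataclass
class TermEntry:
    term: str
    roles: set = field(default_factory=set)

def assign_roles(parsed):
    """parsed: list of (term, role, head) triples from a hypothetical parser."""
    entries = {}
    # Step 1: record the role that the parse structure gives each term.
    for term, role, _head in parsed:
        entries.setdefault(term, TermEntry(term)).roles.add(role)
    # Step 2 (assumed heuristic): a noun modifier also takes on the
    # subject/object role of the noun it modifies.
    head_roles = {t: r for t, r, _ in parsed if r in ("subject", "object")}
    for term, role, head in parsed:
        if role == "noun_modifier" and head in head_roles:
            entries[term].roles.add(head_roles[head])
    return entries

# "The Argentinian government exported beef."
parsed = [
    ("argentinian", "noun_modifier", "government"),
    ("government", "subject", "export"),
    ("export", "verb", None),
    ("beef", "object", "export"),
]
entries = assign_roles(parsed)
```

In this sketch, `entries["argentinian"].roles` ends up holding both `noun_modifier` and `subject`, so a query that asks about Argentina as the exporting party could still match the sentence.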
33. A computer-readable memory medium containing instructions for controlling a computer processor to transform at least one sentence of a document or a query into a canonical representation, each sentence having a plurality of terms, by performing a method comprising:
for each sentence, parsing the sentence to generate a parse structure having a plurality of syntactic elements;
determining a set of meaningful terms of the sentence from the syntactic elements;
determining from the structure of the parse structure and the syntactic elements a grammatical role for each meaningful term in the set of meaningful terms, wherein the grammatical role is at least one of a subject, object, verb, part of a prepositional phrase, noun modifier, or verb modifier;
determining an additional grammatical role for at least one of the meaningful terms, such that the at least one meaningful term is associated with at least two different grammatical roles, wherein the additional grammatical role is not determined from the parse structure and the additional grammatical role indicates that the at least one of the meaningful terms is a subject or an object in addition to the grammatical role determined from the parse structure; and
storing in an enhanced data representation data structure a representation of each association between a meaningful term and its determined grammatical roles, in a manner that indicates a grammatical relationship between a plurality of the meaningful terms and such that the at least one meaningful term is associated with a plurality of grammatical relationships.
Dependent claims: 34-52
53. A syntactic query engine for transforming at least one sentence of a document or a query into a canonical representation, each sentence having a plurality of terms, comprising:
a parser residing in a memory medium that is configured to, when executed, decompose each sentence to generate a parse structure for the sentence having a plurality of syntactic elements; and
a postprocessor residing in a memory medium that is configured to, when executed,
receive from the parser the parse structure of the sentence;
determine a set of meaningful terms of the sentence from the syntactic elements;
determine from the structure of the parse structure and the syntactic elements a grammatical role for each meaningful term in the set of meaningful terms, wherein the grammatical role is at least one of a subject, object, verb, part of a prepositional phrase, noun modifier, or verb modifier;
determine an additional grammatical role for at least one of the meaningful terms, such that the at least one meaningful term is associated with at least two different grammatical roles, wherein the additional grammatical role is not determined from the parse structure and the additional grammatical role indicates that the at least one of the meaningful terms is a subject or an object in addition to the grammatical role determined from the parse structure; and
store, in an enhanced data representation data structure, a representation of each association between a meaningful term and its determined grammatical roles, in a manner that indicates a grammatical relationship between a plurality of the meaningful terms and such that the at least one meaningful term is associated with a plurality of grammatical relationships.
Dependent claims: 54-81
82. A method in a computer system for storing a normalized data structure representing at least one sentence of a document or a query, each sentence having a plurality of terms, comprising:
determining a set of meaningful terms of each sentence and at least one grammatical role for each meaningful term, wherein the grammatical role is at least one of a subject, object, verb, part of a prepositional phrase, noun modifier, or verb modifier; and
storing sets of grammatical relationships between a plurality of meaningful terms based upon the determined grammatical role of each meaningful term relative to a meaningful term that is being used as a governing verb, wherein, for each meaningful term that is being used as a governing verb, the normalized data structure contains a subject table having a set of meaningful term pairs that are subjects relative to the governing verb, an object table having a set of meaningful term pairs that are objects relative to the governing verb, a subject-object table representing an association between the subject table and the object table, a preposition table having a set of meaningful terms that are verb modifiers of prepositional phrases that relate to the governing verb, and a noun modifier table having a set of meaningful term pairs that are noun modifiers of noun phrases that relate to the governing verb.
Dependent claims: 83, 84
85. A data processing system comprising a computer processor and a memory, the memory containing structured data that stores a normalized representation of sentence data, the structured data being manipulated by the computer processor under the control of program code and stored in the memory as:
a subject table having a set of meaningful term pairs, each pair having a meaningful term that is associated with a grammatical role of a verb and a meaningful term that is associated with a grammatical role of a subject relative to the verb;
an object table having a set of meaningful term pairs, each pair having a meaningful term that is associated with a grammatical role of a verb and a meaningful term that is associated with a grammatical role of an object relative to the verb;
a representation of associations between the subject table and the object table, the representation indicating, for each meaningful term associated with the grammatical role of the verb, the meaningful terms that are associated with the grammatical role of subject relative to the verb and the meaningful terms that are associated with the grammatical role of object relative to the verb;
a preposition table having a set of meaningful term groups, each group having a meaningful term that is associated with a grammatical role of a verb, a meaningful term that is associated with a grammatical role of a preposition relative to the verb, and a meaningful term that is associated with a grammatical role of a verb modifier relative to the verb; and
a noun modifier table having a set of meaningful term pairs, each pair having a meaningful term that is associated with a grammatical role of a noun and a meaningful term that is associated with a grammatical role of a noun modifier relative to the noun.
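One way to picture the table layout recited in claim 85 is as a small relational schema. The sketch below uses Python's standard `sqlite3` module with an in-memory database; the table and column names, the use of integer row ids to link the subject and object tables, and the example sentence are all assumptions made for illustration, not details given in the claims.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one table per structure named in the claim.
conn.executescript("""
CREATE TABLE subjects (id INTEGER PRIMARY KEY, verb TEXT, subj TEXT);
CREATE TABLE objects (id INTEGER PRIMARY KEY, verb TEXT, obj TEXT);
CREATE TABLE subject_object (subject_id INTEGER, object_id INTEGER);
CREATE TABLE prepositions (verb TEXT, preposition TEXT, verb_modifier TEXT);
CREATE TABLE noun_modifiers (noun TEXT, modifier TEXT);
""")

# Store "Argentina exports frozen beef to Chile."
cur = conn.cursor()
cur.execute("INSERT INTO subjects (verb, subj) VALUES (?, ?)",
            ("export", "argentina"))
subject_id = cur.lastrowid
cur.execute("INSERT INTO objects (verb, obj) VALUES (?, ?)",
            ("export", "beef"))
object_id = cur.lastrowid
# Link the subject row and the object row that share the verb.
cur.execute("INSERT INTO subject_object VALUES (?, ?)",
            (subject_id, object_id))
cur.execute("INSERT INTO prepositions VALUES (?, ?, ?)",
            ("export", "to", "chile"))
cur.execute("INSERT INTO noun_modifiers VALUES (?, ?)", ("beef", "frozen"))

# Recover subject-verb-object triples for a given governing verb.
rows = cur.execute("""
    SELECT s.subj, s.verb, o.obj
    FROM subject_object so
    JOIN subjects s ON s.id = so.subject_id
    JOIN objects o ON o.id = so.object_id
    WHERE s.verb = ?
""", ("export",)).fetchall()
```

Keeping subjects and objects in separate tables keyed by the verb, with a linking table between them, lets a query match on the subject alone, the object alone, or the full triple.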
86. A computer-readable memory medium containing instructions for controlling a computer processor to store in a data repository a normalized data structure representing at least one sentence of a document or a query, each sentence having a plurality of terms, by:
determining a set of meaningful terms of each sentence and at least one grammatical role for each meaningful term; and
storing sets of grammatical relationships between a plurality of meaningful terms based upon the determined grammatical role of each meaningful term relative to a meaningful term that is being used as a governing verb, wherein, for each meaningful term that is being used as a governing verb, the normalized data structure contains a subject table having a set of meaningful term pairs that are subjects relative to the governing verb, an object table having a set of meaningful term pairs that are objects relative to the governing verb, a subject-object table representing an association between the subject table and the object table, a preposition table having a set of meaningful terms that are verb modifiers of prepositional phrases that contain the governing verb, and a noun modifier table having a set of meaningful terms that are noun modifiers of noun phrases that relate to the governing verb.
87. A computer system for storing a normalized data structure representing at least one sentence of a document or a query, each sentence having a plurality of terms, comprising:
an enhanced parsing mechanism that determines a set of meaningful terms for each sentence and at least one grammatical role for each meaningful term;
a data repository; and
a storage mechanism structured to store in the data repository sets of grammatical relationships between a plurality of the determined meaningful terms based upon the determined grammatical role of each meaningful term relative to a meaningful term that is being used as a governing verb, wherein, for each meaningful term that is being used as a governing verb, the normalized data structure contains entries in a subject table having a set of meaningful term pairs that are subjects relative to the governing verb, an object table having a set of meaningful term pairs that are objects relative to the governing verb, a subject-object table representing a set of associations between the subject table and the object table, a preposition table having a set of meaningful term groups, each group having verb modifiers of prepositional phrases that contain the governing verb, and a noun modifier table having a set of meaningful term pairs that are noun modifiers of noun phrases that relate to the governing verb.
Dependent claims: 88, 89
Specification