Method and system for enhanced data searching
Abstract
Methods and systems for syntactically indexing and searching data sets to achieve more accurate search results and for indexing and searching data sets using entity tags alone or in combination therewith are provided. Example embodiments provide a Syntactic Query Engine (“SQE”) that parses, indexes, and stores a data set, as well as processes natural language queries subsequently submitted against the data set. The SQE comprises a Query Preprocessor, a Data Set Preprocessor, a Query Builder, a Data Set Indexer, an Enhanced Natural Language Parser (“ENLP”), a data set repository, and, in some embodiments, a user interface. After preprocessing the data set, the SQE parses the data set according to a variety of levels of parsing and determines as appropriate the entity tags and syntactic and grammatical roles of each term to generate enhanced data representations for each object in the data set. The SQE indexes and stores these enhanced data representations in the data set repository. Upon subsequently receiving a query, the SQE parses the query also using a variety of parsing levels and searches the indexed stored data set to locate data that contains similar terms used in similar grammatical roles and/or with similar entity tag types as indicated by the query. In this manner, the SQE is able to achieve more contextually accurate search results more frequently than using traditional search engines.
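The matching behavior described in the abstract (similar terms used in similar grammatical roles and/or with similar entity tag types) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the specification: the `Term` record, the hand-annotated `INDEX` that stands in for parser output, and the `matches` and `search` helpers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Term:
    text: str                          # normalized term, e.g. "china"
    role: str                          # grammatical role: "subject", "verb", "object"
    entity_type: Optional[str] = None  # e.g. "person", "country"; None if untagged


# Hypothetical, hand-annotated stand-ins for what a parser would produce.
INDEX = {
    "doc1": [Term("clinton", "subject", "person"),   # "Clinton visited China."
             Term("visit", "verb"),
             Term("china", "object", "country")],
    "doc2": [Term("china", "subject", "country"),    # "China hosted Clinton."
             Term("host", "verb"),
             Term("clinton", "object", "person")],
}


def matches(doc_terms, constraints):
    """True if every query constraint is met by the term holding the same role."""
    by_role = {t.role: t for t in doc_terms}
    for role, (text, entity_type) in constraints.items():
        term = by_role.get(role)
        if term is None:
            return False
        if text is not None and term.text != text:
            return False
        if entity_type is not None and term.entity_type != entity_type:
            return False
    return True


def search(index, constraints):
    """Return the ids of documents whose terms satisfy all constraints."""
    return sorted(doc_id for doc_id, terms in index.items()
                  if matches(terms, constraints))


# "Which person visited China?": the subject is constrained by entity type only,
# the verb and object by their terms.
query = {"subject": (None, "person"),
         "verb": ("visit", None),
         "object": ("china", None)}
results = search(INDEX, query)

# A plain keyword match on "china" ignores roles and entity types.
keyword_hits = sorted(doc_id for doc_id, terms in INDEX.items()
                      if "china" in {t.text for t in terms})
```

In this toy data the role-aware query returns only `doc1`, while the keyword match on "china" returns both documents, which is the kind of contextual difference the abstract points to.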
94 Claims
1. A method in a computer system for transforming at least one sentence of a document or a query into a canonical representation using entity tags, comprising:
receiving a designation of a plurality of entity tags, each entity tag having a type and a value, wherein the type of each entity tag is a possible attribute of a sentence that does not represent a part-of-speech and does not represent a grammatical role; and
for each sentence, parsing the sentence to generate a parse structure having a plurality of syntactic elements that correspond to parts-of-speech;
determining from the syntactic elements of the parse structure a set of terms that correspond to one or more of the designated entity tags; and
for each term of one or more terms of the determined set of terms, storing in an enhanced data representation data structure a representation of an association between the term and a corresponding entity tag, the representation including the term and an indication of the type of the corresponding entity tag, wherein the term is the value of the corresponding entity tag, such that the sentence is represented in the data structure by at least one entity tag.
Dependent claims: 2-24
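The steps recited in claim 1 (receive entity tag designations, parse, determine matching terms, store term and type pairs) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `ENTITY_TAGS` gazetteer, the `POS_LEXICON` that stands in for a real natural language parser, and the function names `parse`, `determine_entity_terms`, and `transform` are all hypothetical.

```python
# Designation of entity tags: tag type -> known values (toy, hypothetical data).
ENTITY_TAGS = {
    "country": {"china", "france"},
    "person": {"clinton"},
    "date": {"1998"},
}

# Toy part-of-speech lexicon standing in for a real natural language parser.
POS_LEXICON = {
    "clinton": "NOUN", "visited": "VERB", "china": "NOUN",
    "in": "PREP", "1998": "NUM",
}


def parse(sentence):
    """Return a flat parse structure: a list of (token, part_of_speech) elements."""
    tokens = sentence.lower().rstrip(".?!").split()
    return [(tok, POS_LEXICON.get(tok, "UNK")) for tok in tokens]


def determine_entity_terms(parse_structure, entity_tags):
    """Select the terms that correspond to one of the designated entity tags."""
    found = []
    for token, pos in parse_structure:
        if pos not in {"NOUN", "NUM"}:  # assumption: only these can name an entity
            continue
        for tag_type, values in entity_tags.items():
            if token in values:
                found.append((token, tag_type))
    return found


def transform(sentence, entity_tags, store):
    """Store (term, entity type) associations for one sentence in `store`."""
    pairs = determine_entity_terms(parse(sentence), entity_tags)
    store.setdefault(sentence, []).extend(
        {"term": term, "entity_type": tag_type} for term, tag_type in pairs
    )
    return store


enhanced = transform("Clinton visited China in 1998.", ENTITY_TAGS, {})
```

Here each stored record carries the term itself (the tag's value) together with the tag's type, so the sentence ends up represented by three entity tags: a person, a country, and a date.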
25. A computer-readable memory medium containing content that, when executed, cause a computing system to transform at least one sentence of a document or a query into a canonical representation using entity tags by performing a method comprising:
receiving a designation of a plurality of entity tags, each entity tag having a type and a value, and wherein the type of each entity tag is a possible attribute of a sentence that does not represent a part-of-speech and does not represent a grammatical role; and
for each sentence, parsing the sentence to generate a parse structure having a plurality of syntactic elements that represent syntactical attributes of the sentence;
determining from the syntactic elements of the parse structure a set of meaningful terms that correspond to one or more of the designated entity tags; and
for each of one or more of the meaningful terms, storing in an enhanced data representation data structure a representation of an association between the term and a corresponding entity tag, the representation including the term and an indication of the type of the corresponding entity tag, wherein the term is the value of the corresponding entity tag, such that the sentence is represented in the data structure by at least one entity tag.
Dependent claims: 26-49
50. A syntactic query engine for transforming at least one sentence of a document or a query into a canonical representation using entity tags, comprising:
a memory medium containing a parser that is configured to, when executed on a computer processor,
receive a designation of a plurality of entity tags, each entity tag having a type and a value, and wherein the type of each entity tag is a possible attribute of a sentence that does not represent a part-of-speech and does not represent a grammatical role; and
decompose the at least one sentence to generate a parse structure for the sentence having a plurality of syntactic elements that correspond to parts-of-speech;
determine from the syntactic elements of the parse structure a set of meaningful terms that correspond to one or more of the designated entity tags; and
for each of one or more of the meaningful terms, store, in an enhanced data representation data structure, a representation of an association between the term and the corresponding entity tag, the representation including the term and an indication of the type of the corresponding entity tag, wherein the term is the value of the corresponding entity tag, such that the at least one sentence is represented in the data structure by at least one entity tag.
Dependent claims: 51-73
74. A method in a computer system for transforming at least one sentence of a document or a query into a canonical representation using entity tags, comprising:
receiving a designation of a plurality of entity tags and a designation of at least one grammatical role, each entity tag having a type and a value, and wherein the type of each entity tag is a possible attribute of a sentence that does not represent a part-of-speech and does not represent a grammatical role; and
for each sentence, parsing the sentence to generate a parse structure having a plurality of syntactic elements;
determining a set of meaningful terms of the sentence from the syntactic elements of the parse structure;
determining from the structure of the parse structure and the syntactic elements a grammatical role for each meaningful term;
determining from the set of meaningful terms a first set of terms that corresponds to one or more of the designated entity tags and a second set of terms that corresponds to the designated grammatical role; and
storing in an enhanced data representation data structure a representation of an association between a term of the first set, a designated entity tag that corresponds to the term of the first set, and a term of the second set that corresponds to the designated grammatical role, the representation including the term of the first set, an indication of the type of the corresponding entity tag, wherein the term of the first set is the value of the corresponding entity tag, and the term of the second set, such that the sentence is represented by at least one entity tag and one meaningful term having a grammatical role.
Dependent claims: 75-86
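Claim 74 adds a designated grammatical role to the stored association. A minimal sketch of that idea follows, assuming a deliberately naive role assignment (nouns before the first verb are subjects, nouns after it are objects). The `Association` record, the `POS` lexicon, the `ENTITY_TAGS` data, and the helper names are hypothetical and chosen only for illustration.

```python
from typing import NamedTuple


class Association(NamedTuple):
    entity_term: str   # term of the first set (the entity tag's value)
    entity_type: str   # type of the corresponding entity tag
    role_term: str     # term of the second set
    role: str          # the designated grammatical role of role_term


# Toy, hypothetical data standing in for designated tags and a real parser.
ENTITY_TAGS = {"person": {"clinton"}, "country": {"china"}}
POS = {"president": "NOUN", "clinton": "NOUN", "visited": "VERB", "china": "NOUN"}


def assign_roles(tokens):
    """Keep meaningful terms (nouns, verbs) and give each a naive role."""
    roles, seen_verb = [], False
    for tok in tokens:
        pos = POS.get(tok)
        if pos == "VERB":
            roles.append((tok, "verb"))
            seen_verb = True
        elif pos == "NOUN":
            roles.append((tok, "object" if seen_verb else "subject"))
    return roles


def associate(sentence, designated_role):
    """Pair each entity-tagged term with each term in the designated role."""
    tokens = sentence.lower().rstrip(".?!").split()
    meaningful = assign_roles(tokens)
    first_set = [(term, tag_type)
                 for term, _ in meaningful
                 for tag_type, values in ENTITY_TAGS.items()
                 if term in values]
    second_set = [term for term, role in meaningful if role == designated_role]
    return [Association(term, tag_type, role_term, designated_role)
            for term, tag_type in first_set
            for role_term in second_set]


rows = associate("President Clinton visited China.", "verb")
```

With "verb" as the designated role, the sketch stores two associations for the sentence, one linking the person "clinton" to "visited" and one linking the country "china" to "visited".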
87. A computer-readable medium containing content that, when executed, controls a computing system to transform at least one sentence of a document or a query into a canonical representation using entity tags, by performing a method comprising:
receiving a designation of a plurality of entity tags and a designation of at least one grammatical role, each entity tag having a type and a value, and wherein the type of each entity tag is a possible attribute of a sentence that does not represent a part-of-speech and does not represent a grammatical role; and
for each sentence, parsing the sentence to generate a parse structure having a plurality of syntactic elements;
determining a set of meaningful terms of the sentence from these syntactic elements;
determining from the structure of the parse structure and the syntactic elements a grammatical role for each meaningful term;
determining from the set of meaningful terms a first set of terms that corresponds to one or more of the designated entity tags and a second set of terms that corresponds to the designated grammatical role; and
storing in an enhanced data representation data structure a representation of an association between a term of the first set, a designated entity tag that corresponds to the term of the first set, and a term of the second set that corresponds to the designated grammatical role, the representation including the term of the first set, an indication of the type of the corresponding entity tag, wherein the term of the first set is the value of the corresponding entity tag, and the term of the second set, such that the sentence is represented by at least one entity tag and one meaningful term having a grammatical role.
Dependent claims: 88
89. A syntactic query engine for transforming at least one sentence of a document or query into a canonical representation using entity tags, comprising:
a memory containing a parser that is configured to, when executed,
receive a designation of a plurality of entity tags and a designation of at least one grammatical role, each entity tag having a type and a value, and wherein the type of each entity tag is a possible attribute of the at least one sentence that does not represent a part-of-speech and does not represent a grammatical role;
decompose the at least one sentence to generate a parse structure for the sentence having a plurality of syntactic elements that correspond to parts-of-speech;
determine a set of meaningful terms of the at least one sentence from the syntactic elements;
determine from the structure of the parse structure and the syntactic elements a grammatical role for each meaningful term;
determine from the set of meaningful terms a first set of terms that corresponds to one or more of the designated entity tags and a second set of terms that corresponds to the designated grammatical role; and
store, in an enhanced data representation data structure a representation of an association between a term of the first set, a designated entity tag that corresponds to the term of the first set, and a term of the second set that corresponds to the designated grammatical role, the representation including the term of the first set, an indication of the type of the corresponding entity tag, wherein the term of the first set is the value of the corresponding entity tag, and the term of the second set, such that the sentence is represented by at least one entity tag and one meaningful term having a grammatical role.
Dependent claims: 90
91. A data processing system comprising a computer processor and a memory, the memory containing structured data that stores a normalized representation of sentence data, the structured data being manipulated by the computer processor under the control of program code and stored in the memory as:
an entity table having a set of entity tag pairs, each pair having a term of a sentence that is also a value of a corresponding entity tag and an indication of an entity tag type of the corresponding entity tag, wherein the entity tag type is a possible attribute of the sentence that does not represent a part-of-speech and does not represent a grammatical role.
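The entity table of claim 91 can be pictured as a relational table of (term, entity tag type) pairs. The sketch below uses Python's standard `sqlite3` module with an in-memory database. The table name `entity`, the column names, and in particular the `sentence_id` column are assumptions added for illustration; the claim itself only recites the term and type pair.

```python
import sqlite3

# In-memory database holding a hypothetical entity table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entity (sentence_id INTEGER, term TEXT, entity_type TEXT)"
)

# Each row pairs a sentence term (the tag's value) with its entity tag type.
pairs = [
    (1, "clinton", "person"),
    (1, "china", "country"),
    (2, "france", "country"),
]
conn.executemany("INSERT INTO entity VALUES (?, ?, ?)", pairs)


def sentences_with(entity_type):
    """Return ids of sentences that contain a term of the given entity type."""
    rows = conn.execute(
        "SELECT DISTINCT sentence_id FROM entity "
        "WHERE entity_type = ? ORDER BY sentence_id",
        (entity_type,),
    )
    return [row[0] for row in rows]


country_sentences = sentences_with("country")
person_terms = [
    row[0]
    for row in conn.execute("SELECT term FROM entity WHERE entity_type = 'person'")
]
```

Storing the type alongside the term lets a query ask for "any country" without naming one, which a table of bare terms could not answer.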
92. A computer-readable memory medium containing instructions that, when executed, control a computing system to store a normalized data structure representing at least one sentence of a document of a data set or a query, by performing a method comprising:
for each sentence, determining a set of terms of the sentence that correspond to a designated set of entity tags, each entity tag having a type and a value, and wherein the type of each entity tag is a possible attribute of the sentence that does not represent a part-of-speech and does not represent a grammatical role; and
storing sets of relationships between each determined term and its corresponding entity tag type in the normalized data structure, wherein each determined term is the value of its corresponding entity tag, so as to represent the entire sentence as entity tags.
93. A computing system for storing a normalized data structure representing a document of a data set or a query having at least one sentence with a plurality of terms, comprising:
enhanced parsing mechanism that determines a set of terms of the sentence that correspond to a designated set of entity tags, each entity tag having a type and a value, and wherein the type of each entity tag is a possible attribute of the sentence that does not represent a part-of-speech and does not represent a grammatical role; and
storage mechanism structured to store sets of relationships between each determined term and its corresponding entity tag type in the normalized data structure, wherein each determined term is the value of its corresponding entity tag, so as to represent the entire at least one sentence as entity tags.
Dependent claims: 94
Specification