METHOD AND SYSTEM FOR ENHANCED DATA SEARCHING
Abstract
Methods and systems for syntactically indexing and searching data sets to achieve more accurate search results are provided. Example embodiments provide a Syntactic Query Engine ("SQE") that parses, indexes, and stores a data set, as well as processes natural language queries subsequently submitted against the data set. The SQE comprises a Query Preprocessor, a Data Set Preprocessor, a Query Builder, a Data Set Indexer, an Enhanced Natural Language Parser ("ENLP"), a data set repository, and, in some embodiments, a user interface. After preprocessing the data set, the SQE parses the data set and determines the syntactic and grammatical roles of each term to generate enhanced data representations for each object in the data set. The SQE indexes and stores these enhanced data representations in the data set repository. Upon subsequently receiving a query, the SQE parses the query similarly and searches the indexed stored data set to locate data that contains similar terms used in similar grammatical roles. In this manner, the SQE is able to achieve contextually more accurate search results more frequently than traditional search engines do.
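The idea in the abstract of matching "similar terms used in similar grammatical roles" can be illustrated with a minimal sketch. This is not the patented implementation: the sentences, the role labels (subject, verb, object), and the function names build_index, search, and keyword_search are all hypothetical, and the grammatical roles are tagged by hand rather than produced by a parser such as the ENLP.

```python
from collections import defaultdict

# Hypothetical, hand-tagged input: sentence id -> list of (term, role) pairs.
# A real system would derive the roles with a natural language parser.
SENTENCES = {
    0: [("argentina", "subject"), ("export", "verb"), ("beef", "object")],
    1: [("brazil", "subject"), ("import", "verb"), ("beef", "object")],
    2: [("beef", "subject"), ("export", "verb"), ("growth", "object")],
}


def build_index(sentences):
    """Map each (term, role) pair to the set of sentence ids containing it."""
    index = defaultdict(set)
    for sentence_id, pairs in sentences.items():
        for term, role in pairs:
            index[(term, role)].add(sentence_id)
    return index


def search(index, query_pairs):
    """Return sorted ids of sentences that contain every (term, role) pair."""
    result = None
    for pair in query_pairs:
        ids = index.get(pair, set())
        result = set(ids) if result is None else result & ids
    return sorted(result or [])


def keyword_search(sentences, terms):
    """Baseline for contrast: match on terms only, ignoring grammatical roles."""
    wanted = set(terms)
    return sorted(
        sentence_id
        for sentence_id, pairs in sentences.items()
        if wanted <= {term for term, _ in pairs}
    )


index = build_index(SENTENCES)
role_matches = search(index, [("export", "verb"), ("beef", "object")])
term_matches = keyword_search(SENTENCES, ["export", "beef"])
print(role_matches)  # [0]
print(term_matches)  # [0, 2]
```

In this toy data, the role-aware lookup returns only sentence 0, where "beef" is the object of "export", while the term-only baseline also returns sentence 2, where "beef" is the subject. That difference is the kind of contextual filtering the abstract describes.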
93 Claims
- 2. (New) A method in a computer system for transforming a document of a data set into a canonical representation, the document having a plurality of sentences, each sentence having a plurality of terms, comprising:
  - 3. (New) The method of claim 2 wherein heuristics are used to determine the additional grammatical role for the at least one of the meaningful terms.
  - 4. (New) The method of claim 3 wherein a meaningful term is associated with a verb modifier as the determined grammatical role and is associated with an object as the additional grammatical role.
  - 18. (New) The method of claim 17 wherein results are returned that satisfy the query when an object in the corpus contains similar terms associated with similar grammatical roles to the terms and their associated roles as stored in the enhanced data representation.
  - 19. (New) The method of claim 18 wherein the objects in the corpus are sentences and indications of sentences that satisfy the query are returned.
- 24. (New) The method of claim 23 wherein at least one of entailed verbs and related verbs are used to add additional grammatical relationships.
- 26. (New) A computer-readable memory medium containing instructions for controlling a computer processor to transform a document of a data set into a canonical representation, the document having a plurality of sentences, each sentence having a plurality of terms, by:
- 27. (New) A syntactic query engine for transforming a document of a data set into a canonical representation, the document having a plurality of sentences, each sentence having a plurality of terms, comprising:
  - 28. (New) The query engine of claim 27 wherein the postprocessor uses heuristics to determine the additional grammatical role for the at least one of the meaningful terms.
  - 29. (New) The query engine of claim 28 wherein the postprocessor associates a meaningful term with a verb modifier as the determined grammatical role and with an object as the additional grammatical role.
  - 43. (New) The query engine of claim 42 wherein the query processor returns results that satisfy the query when an object in the corpus contains similar terms associated with similar grammatical roles to the terms and their associated roles as stored in the enhanced data representation.
  - 44. (New) The query engine of claim 43 wherein the objects in the corpus are sentences and the query processor returns indications of sentences that satisfy the query.
- 50. (New) A method in a computer system for transforming a document of a data set into a canonical representation, the document having a plurality of sentences, each sentence having a plurality of terms, comprising:
  - 51. (New) The method of claim 50, further comprising storing the full grammar of the sentence.
  - 60. (New) The method of claim 59 wherein results are returned that satisfy the query when an object in the corpus contains similar terms associated with similar grammatical roles to the terms and their associated roles as stored in the enhanced data representation.
  - 61. (New) The method of claim 60 wherein the objects in the corpus are sentences and indications of sentences that satisfy the query are returned.
- 66. (New) The method of claim 65 wherein at least one of entailed verbs and related verbs are used to add additional grammatical relationships.
- 68. (New) A computer-readable memory medium containing instructions for controlling a computer processor to transform a document of a data set into a canonical representation, the document having a plurality of sentences, each sentence having a plurality of terms, by:
- 69. (New) A syntactic query engine for transforming a document of a data set into a canonical representation, the document having a plurality of sentences, each sentence having a plurality of terms, comprising:
  - 70. (New) The query engine of claim 69 wherein the postprocessor stores the full grammar of the sentence.
  - 77. (New) The query engine of claim 76 wherein the objects in the corpus are sentences and the query processor returns indications of sentences that satisfy the query.
- 81. (New) A method in a computer system for storing a normalized data structure representing a document of a data set, the document having a plurality of sentences, each sentence having a plurality of terms, comprising:
- 82. (New) The method of claim 81, further comprising storing meaningful terms that correspond to a designated attribute.
- 83. (New) The method of claim 82 wherein the designated attribute is at least one of country name, date, money, amount, number, location, person, corporate name, and organization.
- 84. (New) A data processing system comprising a computer processor and a memory, the memory containing structured data that stores a normalized representation of sentence data, the structured data being manipulated by the computer processor under the control of program code and stored in the memory as:
- 85. (New) A computer-readable memory medium containing instructions for controlling a computer processor to store a normalized data structure representing a document of a data set, the document having a plurality of sentences, each sentence having a plurality of terms, comprising:
- 86. (New) A computer system for storing a normalized data structure representing a document of a data set, the document having a plurality of sentences, each sentence having a plurality of terms, comprising:
- 87. (New) The system of claim 86, the storage mechanism further structured to store meaningful terms that correspond to a designated attribute.
- 88. (New) The system of claim 87 wherein the designated attribute is at least one of country name, date, money, amount, number, location, person, corporate name, and organization.
- 89. (New) A method in a computer system for transforming an object of a data set into a canonical representation for use in indexing the objects of the data set and in querying the data set, the object being other than a text-only document and having a plurality of units that are specified according to an object-specific grammar, comprising:
  - 90. (New) The method of claim 89 wherein the objects are audio data and the units of objects are portions of audio data.
- 93. (New) A computer-readable memory medium containing instructions for controlling a computer processor to transform an object of a data set into a canonical representation for use in indexing the objects of the data set and in querying the data set, the object being other than a text-only document and having a plurality of units that are specified according to an object-specific grammar, by:
- 94. (New) A query engine in a computer system for transforming an object of a data set into a canonical representation for use in indexing the objects of the data set and in querying the data set, the object being other than a text-only document and having a plurality of units that are specified according to an object-specific grammar, comprising:
Specification