Complex queries for corpus indexing and search
First Claim
1. A computer implemented method, comprising:
receiving in a memory a complex-query pattern, wherein the complex-query pattern identifies a relationship between a plurality of words using a query language;
receiving a corpus;
transforming with a processor the complex-query pattern into a region matching transducer, wherein the transducer is a form of a finite state network;
determining whether a corpus index exists;
in response to determining the corpus index does not exist;
combining a corpus-level transducer and the region matching transducer; and
applying the combined transducer to the corpus to identify strings therein that satisfy patterns defined in the corpus-level transducer, including the complex-query pattern, with each identified pattern being recorded in the corpus index;
in response to determining the corpus index exists;
applying the region matching transducer to the corpus to identify strings therein that satisfy the complex-query pattern, with each identified pattern being recorded in an augmented index; and
merging the corpus index with the augmented index specifying locations in the corpus satisfying the complex-query pattern;
storing in the memory the corpus index that records a query tag for indexing locations in the corpus satisfying the complex-query pattern.
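The control flow recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: Python's `re` module stands in for finite-state transducers, and the names `compile_pattern`, `apply_patterns`, `index_corpus`, and the tags `CITY` and `MET` are hypothetical, introduced only for illustration. A true combined transducer would be a single network applied in one pass; here the combined patterns are simply applied one after another.

```python
import re
from typing import Dict, List, Optional, Tuple

Span = Tuple[int, int]          # (start, end) character offsets in the corpus
Index = Dict[str, List[Span]]   # tag -> locations where the tagged pattern matched


def compile_pattern(pattern: str) -> "re.Pattern[str]":
    """Stand-in (assumption) for transforming a pattern into a region matching transducer."""
    return re.compile(pattern)


def apply_patterns(matchers: Dict[str, "re.Pattern[str]"], corpus: str) -> Index:
    """Apply every matcher to the corpus and record each match location under its tag."""
    return {tag: [m.span() for m in matcher.finditer(corpus)]
            for tag, matcher in matchers.items()}


def index_corpus(corpus: str,
                 query_tag: str,
                 query_pattern: str,
                 corpus_patterns: Dict[str, str],
                 corpus_index: Optional[Index] = None) -> Index:
    region_matcher = compile_pattern(query_pattern)

    if corpus_index is None:
        # No corpus index exists: combine the corpus-level patterns with the
        # query pattern and record all matches in a new corpus index.
        combined = {tag: compile_pattern(p) for tag, p in corpus_patterns.items()}
        combined[query_tag] = region_matcher
        return apply_patterns(combined, corpus)

    # A corpus index exists: apply only the query pattern to build an
    # augmented index, then merge it with the existing corpus index.
    augmented = apply_patterns({query_tag: region_matcher}, corpus)
    merged: Index = {tag: list(spans) for tag, spans in corpus_index.items()}
    for tag, spans in augmented.items():
        merged[tag] = sorted(set(merged.get(tag, [])) | set(spans))
    return merged


corpus = "Alice met Bob in Paris. Bob met Carol in Rome."
corpus_patterns = {"CITY": r"Paris|Rome"}   # hypothetical corpus-level pattern

# Branch 1: no index yet, so everything is indexed together.
fresh = index_corpus(corpus, "MET", r"\w+ met \w+", corpus_patterns)

# Branch 2: an index with only CITY exists, so MET is added by merging.
existing = apply_patterns({"CITY": compile_pattern(r"Paris|Rome")}, corpus)
merged = index_corpus(corpus, "MET", r"\w+ met \w+", corpus_patterns, existing)
```

In this sketch both branches end with the same index contents; the difference is only whether the corpus-level patterns are re-applied or reused from the stored index.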
Abstract
Computer methods, apparatus and articles of manufacture therefor are disclosed for developing a complex-query pattern that is transformed into a region matching transducer. A corpus-level transducer and the region matching transducer are combined. The combined transducer is applied to a corpus to identify strings therein that satisfy patterns defined in the corpus-level transducer, including the complex-query pattern, with each identified pattern being recorded in a corpus index. The corpus and the corpus index are made available for receiving a query with the query tag for querying the corpus and applying the query using the corpus index to identify locations in the corpus that satisfy the query.
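The query step described at the end of the abstract might look like the following sketch, assuming the corpus index is a plain mapping from a query tag to character offsets. The function `query_by_tag`, the tags `CITY` and `MET`, and the prebuilt index are hypothetical and serve only as an illustration; the patent does not prescribe this data layout.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]          # (start, end) character offsets in the corpus
Index = Dict[str, List[Span]]   # query tag -> indexed locations


def query_by_tag(corpus: str, corpus_index: Index, query_tag: str) -> List[Tuple[Span, str]]:
    """Return each location recorded under query_tag, with the text it covers.

    The corpus is not rescanned: the answer comes from the index alone.
    """
    return [((start, end), corpus[start:end])
            for start, end in corpus_index.get(query_tag, [])]


corpus = "Alice met Bob in Paris. Bob met Carol in Rome."
# Hypothetical index, as it might have been stored by an earlier indexing run.
corpus_index: Index = {
    "CITY": [(17, 22), (41, 45)],
    "MET": [(0, 13), (24, 37)],
}

hits = query_by_tag(corpus, corpus_index, "MET")
for span, text in hits:
    print(span, text)
```

Because the pattern was matched at indexing time, answering the query reduces to a dictionary lookup, which is the practical benefit of recording a query tag in the index.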
33 Citations
19 Claims
1. A computer implemented method, comprising:
receiving in a memory a complex-query pattern, wherein the complex-query pattern identifies a relationship between a plurality of words using a query language;
receiving a corpus;
transforming with a processor the complex-query pattern into a region matching transducer, wherein the transducer is a form of a finite state network;
determining whether a corpus index exists;
in response to determining the corpus index does not exist;
combining a corpus-level transducer and the region matching transducer; and
applying the combined transducer to the corpus to identify strings therein that satisfy patterns defined in the corpus-level transducer, including the complex-query pattern, with each identified pattern being recorded in the corpus index;
in response to determining the corpus index exists;
applying the region matching transducer to the corpus to identify strings therein that satisfy the complex-query pattern, with each identified pattern being recorded in an augmented index; and
merging the corpus index with the augmented index specifying locations in the corpus satisfying the complex-query pattern;
storing in the memory the corpus index that records a query tag for indexing locations in the corpus satisfying the complex-query pattern.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A computer apparatus, comprising:
a memory for storing processing instructions of the apparatus; and
a processor coupled to the memory for executing the processing instructions of the apparatus;
the processor in executing the processing instructions;
receiving in the memory a complex-query pattern, wherein the complex-query pattern identifies a relationship between a plurality of words using a query language;
receiving a corpus;
transforming the complex-query pattern into a region matching transducer, wherein the transducer is a form of a finite state network;
determining whether a corpus index exists;
in response to determining the corpus index does not exist;
combining a corpus-level transducer and the region matching transducer; and
applying the combined transducer to the corpus to identify strings therein that satisfy patterns defined in the corpus-level transducer, including the complex-query pattern, with each identified pattern being recorded in the corpus index;
in response to determining the corpus index exists;
applying the region matching transducer to the corpus to identify strings therein that satisfy the complex-query pattern, with each identified pattern being recorded in an augmented index; and
merging the corpus index with the augmented index specifying locations in the corpus satisfying the complex-query pattern;
storing in the memory the corpus index that records a query tag for indexing locations in the corpus satisfying the complex-query pattern.
Dependent claims: 9, 10, 11
12. A computer apparatus, comprising:
a means for storing in a memory a complex-query pattern, wherein the complex-query pattern identifies a relationship between a plurality of words using a query language;
means for receiving a corpus;
means for transforming the complex-query pattern into a region matching transducer, wherein the transducer is a form of a finite state network;
means for determining whether a corpus index exists;
in response to determining the corpus index does not exist;
a means for combining a corpus-level transducer and the region matching transducer; and
a means for applying the combined transducer to a corpus to identify strings therein that satisfy patterns defined in the corpus-level transducer, including the complex-query pattern, with each identified pattern being recorded in a corpus index;
in response to determining the corpus index exists;
a means for applying the region matching transducer to the corpus to identify strings therein that satisfy the complex-query pattern, with each identified pattern being recorded in an augmented index; and
a means for merging the corpus index with the augmented index specifying locations in the corpus satisfying the complex-query pattern;
a means for storing in the memory the corpus index that records a query tag for indexing locations in the corpus satisfying the complex-query pattern.
Dependent claims: 13, 14, 15
16. An article of manufacture comprising a non-transitory media including computer readable instructions embedded therein that causes a computer to perform a method, wherein the method comprises:
receiving in a memory a complex-query pattern, wherein the complex-query pattern identifies a relationship between a plurality of words using a query language;
transforming the complex-query pattern into a region matching transducer, wherein the transducer is a form of a finite state network;
combining a corpus-level transducer and the region matching transducer;
applying the combined transducer to a corpus to identify strings therein that satisfy patterns defined in the corpus-level transducer, including the complex-query pattern, with each identified pattern being recorded in a corpus index;
applying the region matching transducer to the corpus to identify strings therein that satisfy the complex-query pattern, with each identified pattern being recorded in an augmented index;
merging the corpus index with the augmented index specifying locations in the corpus satisfying the complex-query pattern; and
storing in the memory the corpus index that records a query tag for indexing locations in the corpus satisfying the complex-query pattern.
Dependent claims: 17, 18, 19
Specification