Method and system for document indexing and data querying
Abstract
Generating a document index comprises: obtaining a document to be indexed; and determining whether each monadic partition obtained from the document is a filter character. If so, a polynary partition is formed from the monadic partition and at least one adjacent monadic partition, and the polynary partition is indexed; otherwise, the monadic partition itself is indexed. Querying data comprises: receiving a data query; determining whether each monadic partition obtained from the data query is a filter character and, if so, forming a polynary partition from the monadic partition and at least one adjacent monadic partition and using the polynary partition to obtain search results, or otherwise using the monadic partition to obtain search results; and combining the search results to form a final query search result.
12 Claims
1. A method for generating a document index, comprising:
generating a preset filter character list, wherein the generating includes:
determining monadic partitions from a sample set of documents, wherein the monadic partitions comprise character text;
determining an appearance frequency for each of at least a subset of the monadic partitions among the sample set of documents; and
including a subset of the monadic partitions in the preset filter character list based at least in part on the appearance frequencies corresponding to respective ones of the monadic partitions;
obtaining a document to be indexed;
performing a monadic partition operation on the document to obtain a plurality of monadic partitions associated with the document;
for a first monadic partition in the plurality of monadic partitions associated with the document:
determining that the first monadic partition is a first filter character monadic partition based at least in part on matching the first monadic partition with the first filter character monadic partition of the preset filter character list; and
in response to the determination that the first monadic partition is the first filter character monadic partition:
not adding a first entry in the document index corresponding to the first filter character monadic partition;
forming a polynary partition by combining the first filter character monadic partition with at least one other monadic partition in the plurality of monadic partitions associated with the document, wherein the polynary partition comprises a binary partition, and wherein the at least one other monadic partition is adjacent to the first filter character monadic partition in the document; and
adding the first entry in the document index corresponding to the polynary partition; and
for a second monadic partition in the plurality of monadic partitions associated with the document:
determining that the second monadic partition is not a second filter character monadic partition based at least in part on not matching the second monadic partition with the second filter character monadic partition of the preset filter character list; and
in response to the determination that the second monadic partition is not the second filter character monadic partition, adding a second entry in the document index corresponding to the second monadic partition.
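The indexing method of claim 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: a monadic partition is taken to be one non-whitespace character; "appearance frequency" is taken to be the fraction of sample documents that contain the character; a filter character is paired with the character that follows it (or the one before it, at the end of the text). The claim fixes neither a threshold nor a choice of neighbour, so `min_doc_ratio` and all function names here are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set


def monadic_partitions(text: str) -> List[str]:
    """Assumption: each non-whitespace character is one monadic partition."""
    return [ch for ch in text if not ch.isspace()]


def build_filter_list(sample_docs: List[str], min_doc_ratio: float = 0.5) -> Set[str]:
    """Hypothetical rule: a character becomes a filter character when it
    appears in at least `min_doc_ratio` of the sample documents."""
    doc_freq: Counter = Counter()
    for doc in sample_docs:
        # Count each character at most once per document (document frequency).
        doc_freq.update(set(monadic_partitions(doc)))
    n = len(sample_docs)
    return {ch for ch, df in doc_freq.items() if df / n >= min_doc_ratio}


def index_terms(text: str, filters: Set[str]) -> List[str]:
    """Return the terms that would be added to the index for one document."""
    parts = monadic_partitions(text)
    terms: List[str] = []
    for i, ch in enumerate(parts):
        if ch in filters:
            # Never index the filter character alone; index a binary partition
            # with the following neighbour, or the preceding one at the end.
            if i + 1 < len(parts):
                terms.append(ch + parts[i + 1])
            elif i > 0:
                terms.append(parts[i - 1] + ch)
        else:
            terms.append(ch)
    return terms


def build_index(docs: Dict[str, str], filters: Set[str]) -> Dict[str, Set[str]]:
    """Inverted index mapping each term to the ids of documents containing it."""
    index: Dict[str, Set[str]] = {}
    for doc_id, text in docs.items():
        for term in index_terms(text, filters):
            index.setdefault(term, set()).add(doc_id)
    return index
```

For example, with the sample documents 我的书, 你的笔 and 他的猫 and the default threshold, only 的 occurs in enough documents to become a filter character, so 我的书 is indexed as the terms 我, 的书 and 书, and 的 never appears as an index key on its own.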
5. A method for querying data, comprising:
generating a preset filter character list, wherein the generating includes:
determining monadic partitions from a sample set of documents, wherein the monadic partitions comprise character text;
determining an appearance frequency for each of at least a subset of the monadic partitions among the sample set of documents; and
including a subset of the monadic partitions in the preset filter character list based at least in part on the appearance frequencies corresponding to respective ones of the monadic partitions;
receiving a data query;
performing a monadic partition operation on the data query to obtain a first plurality of monadic partitions associated with the data query;
for a first monadic partition in the first plurality of monadic partitions associated with the data query:
determining that the first monadic partition is a first filter character monadic partition based at least in part on matching the first monadic partition with the first filter character monadic partition of the preset filter character list; and
in response to the determination that the first monadic partition is the first filter character monadic partition:
not searching a preset index using the first filter character monadic partition;
forming a polynary partition by combining the first filter character monadic partition with at least one other monadic partition in the first plurality of monadic partitions associated with the data query, wherein the polynary partition comprises a binary partition, and wherein the at least one other monadic partition is adjacent to the first filter character monadic partition in the data query; and
searching the preset index using the polynary partition to obtain a search result corresponding to the polynary partition;
for a second monadic partition in the first plurality of monadic partitions associated with the data query:
determining that the second monadic partition is not a second filter character monadic partition based at least in part on not matching the second monadic partition with the second filter character monadic partition of the preset filter character list; and
in response to the determination that the second monadic partition is not the second filter character monadic partition, searching the preset index using the second monadic partition to obtain a search result corresponding to the second monadic partition; and
combining the search results to form a final query search result.
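The query side of claim 5 can be sketched the same way. This minimal Python illustration assumes the same single-character partitioning and the same "pair with the following neighbour" rule used at indexing time, so that query terms line up with index keys. The claim does not say how per-term results are combined; this sketch assumes intersection (AND semantics). The function names and the in-memory `index` dictionary are hypothetical.

```python
from typing import Dict, List, Set


def monadic_partitions(text: str) -> List[str]:
    """Assumption: each non-whitespace character is one monadic partition."""
    return [ch for ch in text if not ch.isspace()]


def query_terms(query: str, filters: Set[str]) -> List[str]:
    """Apply the indexing-time rule to the query so its terms match index keys."""
    parts = monadic_partitions(query)
    terms: List[str] = []
    for i, ch in enumerate(parts):
        if ch in filters:
            # Do not search with the filter character alone; use a binary partition.
            if i + 1 < len(parts):
                terms.append(ch + parts[i + 1])
            elif i > 0:
                terms.append(parts[i - 1] + ch)
        else:
            terms.append(ch)
    return terms


def search(query: str, filters: Set[str], index: Dict[str, Set[str]]) -> Set[str]:
    """Look up each query term and combine the per-term results."""
    terms = query_terms(query, filters)
    if not terms:
        return set()
    # A term missing from the index matches no documents.
    per_term = [index.get(term, set()) for term in terms]
    # Assumption: the final result is the intersection of per-term results.
    return set.intersection(*per_term)
```

With 的 as the only filter character, the query 的书 is split into the terms 的书 and 书; against an index in which 的书 maps to {"d1"} and 书 maps to {"d1", "d3"}, the combined result is {"d1"}. A union or a ranked merge would be equally compatible with the claim wording.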
8. A document indexing system, comprising:
one or more processors coupled to an interface, configured to:
generate a preset filter character list, wherein, to generate the preset filter character list, the one or more processors are configured to:
determine monadic partitions from a sample set of documents, wherein the monadic partitions comprise character text;
determine an appearance frequency for each of at least a subset of the monadic partitions among the sample set of documents; and
include a subset of the monadic partitions in the preset filter character list based at least in part on the appearance frequencies corresponding to respective ones of the monadic partitions;
obtain a document to be indexed;
perform a monadic partition operation on the document to obtain a plurality of monadic partitions associated with the document;
for a first monadic partition in the plurality of monadic partitions associated with the document:
determine that the first monadic partition is a first filter character monadic partition based at least in part on matching the first monadic partition with the first filter character monadic partition of the preset filter character list; and
in response to the determination that the first monadic partition is the first filter character monadic partition:
not add a first entry in a document index corresponding to the first filter character monadic partition;
form a polynary partition by combining the first filter character monadic partition with at least one other monadic partition in the plurality of monadic partitions associated with the document, wherein the polynary partition comprises a binary partition, and wherein the at least one other monadic partition is adjacent to the first filter character monadic partition in the document; and
add the first entry in the document index corresponding to the polynary partition; and
for a second monadic partition in the plurality of monadic partitions associated with the document:
determine that the second monadic partition is not a second filter character monadic partition based at least in part on not matching the second monadic partition with the second filter character monadic partition of the preset filter character list; and
in response to the determination that the second monadic partition is not the second filter character monadic partition, add a second entry in the document index corresponding to the second monadic partition; and
one or more memories coupled to the one or more processors, configured to provide the processors with instructions.
10. A data querying system, comprising:
one or more processors coupled to an interface, configured to:
generate a preset filter character list, wherein, to generate the preset filter character list, the one or more processors are configured to:
determine monadic partitions from a sample set of documents, wherein the monadic partitions comprise character text;
determine an appearance frequency for each of at least a subset of the monadic partitions among the sample set of documents; and
include a subset of the monadic partitions in the preset filter character list based at least in part on the appearance frequencies corresponding to respective ones of the monadic partitions;
receive a data query;
perform a monadic partition operation on the data query to obtain a first plurality of monadic partitions associated with the data query;
for a first monadic partition in the first plurality of monadic partitions associated with the data query:
determine that the first monadic partition is a first filter character monadic partition based at least in part on matching the first monadic partition with the first filter character monadic partition of the preset filter character list; and
in response to the determination that the first monadic partition is the first filter character monadic partition:
not search a preset index using the first filter character monadic partition;
form a polynary partition by combining the first filter character monadic partition with at least one other monadic partition in the first plurality of monadic partitions associated with the data query, wherein the polynary partition comprises a binary partition, and wherein the at least one other monadic partition is adjacent to the first filter character monadic partition in the data query; and
search the preset index using the polynary partition to obtain a search result corresponding to the polynary partition; and
for a second monadic partition in the first plurality of monadic partitions associated with the data query:
determine that the second monadic partition is not a second filter character monadic partition based at least in part on not matching the second monadic partition with the second filter character monadic partition of the preset filter character list; and
in response to the determination that the second monadic partition is not the second filter character monadic partition, search the preset index using the second monadic partition to obtain a search result corresponding to the second monadic partition; and
combine the search results to form a final query search result; and
one or more memories coupled to the one or more processors, configured to provide the processors with instructions.
Specification