Method and system for document indexing and data querying
First Claim
1. A system, comprising:
a processor; and
a memory coupled with the processor, wherein the memory is configured to provide the processor with instructions which when executed cause the processor to:
generate a preset filter characters list based at least in part on a sample set of documents and appearance frequencies of monadic partitions that are present in the sample set of documents, wherein the monadic partitions comprise character text;
obtain a document to be indexed;
perform a monadic partition operation on the document to obtain a plurality of monadic partitions associated with the document;
determine whether a first monadic partition of the plurality of monadic partitions associated with the document should be indexed directly or indexed with at least one other monadic partition from the plurality of monadic partitions as at least one polynary partition, wherein the determination comprises to:
determine that the first monadic partition matches a filter character monadic partition included in the preset filter characters list;
in response to the determination that the first monadic partition matches the filter character monadic partition, index the first monadic partition as the at least one polynary partition, including to:
determine whether the first monadic partition precedes a second monadic partition in the plurality of monadic partitions associated with the document, wherein the second monadic partition is adjacent to the first monadic partition in the document;
in response to a first determination that the first monadic partition precedes the second monadic partition, form a first binary partition by combining the first monadic partition with the second monadic partition;
determine whether the first monadic partition succeeds a third monadic partition in the plurality of monadic partitions associated with the document, wherein the third monadic partition is adjacent to the first monadic partition in the document;
in response to a second determination that the first monadic partition succeeds the third monadic partition, form a second binary partition by combining the first monadic partition with the third monadic partition; and
add a first entry in a document index corresponding to the first binary partition and a second entry in the document index corresponding to the second binary partition, without directly indexing the first monadic partition in the document index.
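The indexing steps recited in claim 1 can be illustrated with a minimal sketch. It assumes, for illustration only, that a monadic partition is a single non-whitespace character, that the filter characters list is already available as a set, and that the document index is a mapping from term to document IDs. The names `monadic_partitions`, `index_terms`, and `add_to_index` are hypothetical and do not come from the patent.

```python
from collections import defaultdict


def monadic_partitions(text):
    # Assumption: one monadic partition per non-whitespace character.
    return [ch for ch in text if not ch.isspace()]


def index_terms(parts, filter_chars):
    """Return the terms to index for one document's monadic partitions."""
    terms = []
    for i, part in enumerate(parts):
        if part not in filter_chars:
            terms.append(part)  # not a filter character: index directly
            continue
        if i + 1 < len(parts):  # precedes an adjacent partition
            terms.append(part + parts[i + 1])  # first binary partition
        if i > 0:  # succeeds an adjacent partition
            terms.append(parts[i - 1] + part)  # second binary partition
    return terms


def add_to_index(index, doc_id, text, filter_chars):
    # The index maps each term to the set of documents containing it.
    for term in index_terms(monadic_partitions(text), filter_chars):
        index[term].add(doc_id)
    return index


index = add_to_index(defaultdict(set), 1, "abc", {"b"})
print(sorted(index))  # ['a', 'ab', 'bc', 'c']
```

In this sketch the filter character "b" never receives an entry of its own; it appears only inside the binary partitions "ab" and "bc", which is the behavior the final step of the claim describes.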
Abstract
Generating a document index comprises: obtaining a document to be indexed; determining whether each monadic partition obtained from the document is a filter character and if so, forming a polynary partition with the monadic partition and at least one adjacent monadic partition and indexing the polynary partition, otherwise, indexing the monadic partition. Querying data comprises: receiving a data query; determining whether each monadic partition obtained from the data query is a filter character and if so, forming a polynary partition with the monadic partition and at least one adjacent monadic partition and using the polynary partition to obtain search results, otherwise, using the monadic partition to obtain search results; and combining search results to form a final query search result.
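The query side of the abstract can be sketched in the same way. The abstract does not say how the per-term search results are combined, so this sketch assumes a set intersection of document IDs; it also assumes single-character monadic partitions and an index built with the same expansion rule. All function names here are hypothetical.

```python
def expand(text, filter_chars):
    """Split text into single characters; pair filter characters with neighbors."""
    parts = [ch for ch in text if not ch.isspace()]
    terms = []
    for i, part in enumerate(parts):
        if part not in filter_chars:
            terms.append(part)  # searched or indexed directly
            continue
        if i + 1 < len(parts):
            terms.append(part + parts[i + 1])  # filter character + next
        if i > 0:
            terms.append(parts[i - 1] + part)  # previous + filter character
    return terms


def build_index(docs, filter_chars):
    index = {}
    for doc_id, text in docs.items():
        for term in expand(text, filter_chars):
            index.setdefault(term, set()).add(doc_id)
    return index


def search(index, query, filter_chars):
    terms = expand(query, filter_chars)
    if not terms:
        return set()
    # Assumption: the final result is the intersection of per-term results.
    per_term = [index.get(term, set()) for term in terms]
    return set.intersection(*per_term)


docs = {1: "abc", 2: "xbz"}
index = build_index(docs, {"b"})
print(search(index, "ab", {"b"}))  # {1}
```

Because the query is expanded with the same rule used at indexing time, a filter character is only ever looked up as part of a binary partition, so its presumably long posting list is never needed.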
12 Claims
5. A method, comprising:
generating a preset filter characters list based at least in part on a sample set of documents and appearance frequencies of monadic partitions that are present in the sample set of documents, wherein the monadic partitions comprise character text;
obtaining a document to be indexed;
performing a monadic partition operation on the document to obtain a plurality of monadic partitions associated with the document;
determining whether a first monadic partition of the plurality of monadic partitions associated with the document should be indexed directly or indexed with at least one other monadic partition from the plurality of monadic partitions as at least one polynary partition, wherein the determination comprises:
determining that the first monadic partition matches a filter character monadic partition included in the preset filter characters list;
in response to the determination that the first monadic partition matches the filter character monadic partition, indexing the first monadic partition as the at least one polynary partition, including:
determining whether the first monadic partition precedes a second monadic partition in the plurality of monadic partitions associated with the document, wherein the second monadic partition is adjacent to the first monadic partition in the document;
in response to a first determination that the first monadic partition precedes the second monadic partition, forming a first binary partition by combining the first monadic partition with the second monadic partition;
determining whether the first monadic partition succeeds a third monadic partition in the plurality of monadic partitions associated with the document, wherein the third monadic partition is adjacent to the first monadic partition in the document;
in response to a second determination that the first monadic partition succeeds the third monadic partition, forming a second binary partition by combining the first monadic partition with the third monadic partition; and
adding a first entry in a document index corresponding to the first binary partition and a second entry in the document index corresponding to the second binary partition, without directly indexing the first monadic partition.
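The first step of the method, generating the preset filter characters list from appearance frequencies in a sample set, might look like the following sketch. The claim does not fix how frequency is measured or what cutoff is used, so the document-frequency measure, the `min_doc_fraction` parameter, and the function name `build_filter_list` are all assumptions made for illustration.

```python
from collections import Counter


def build_filter_list(sample_docs, min_doc_fraction=0.5):
    """Return characters that appear in at least min_doc_fraction of the sample."""
    doc_counts = Counter()
    for text in sample_docs:
        # Count each character at most once per document (document frequency).
        doc_counts.update({ch for ch in text if not ch.isspace()})
    threshold = min_doc_fraction * len(sample_docs)
    return {part for part, count in doc_counts.items() if count >= threshold}


sample = ["the cat", "the dog", "a hen"]
print(sorted(build_filter_list(sample, min_doc_fraction=1.0)))  # ['e', 'h']
```

A higher cutoff yields a shorter filter list, so fewer characters are folded into binary partitions; choosing the cutoff is a tuning decision the claim leaves open.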
9. A system, comprising:
a processor; and
a memory coupled with the processor, wherein the memory is configured to provide the processor with instructions which when executed cause the processor to:
generate a preset filter characters list based at least in part on a sample set of documents and appearance frequencies of monadic partitions that are present in the sample set of documents, wherein the monadic partitions comprise character text;
receive a data query;
perform a monadic partition operation on the data query to obtain a plurality of monadic partitions associated with the data query;
determine whether a first monadic partition of the plurality of monadic partitions associated with the data query should be searched directly or searched with at least one other monadic partition from the plurality of monadic partitions as at least one polynary partition, wherein the determination comprises to:
determine that the first monadic partition matches a filter character monadic partition included in the preset filter characters list;
in response to the determination that the first monadic partition matches the filter character monadic partition, search the first monadic partition as the at least one polynary partition, including to:
determine whether the first monadic partition precedes a second monadic partition in the plurality of monadic partitions associated with the data query, wherein the second monadic partition is adjacent to the first monadic partition in the data query;
in response to a first determination that the first monadic partition precedes the second monadic partition, form a first binary partition by combining the first monadic partition with the second monadic partition;
determine whether the first monadic partition succeeds a third monadic partition in the plurality of monadic partitions associated with the data query, wherein the third monadic partition is adjacent to the first monadic partition in the data query;
in response to a second determination that the first monadic partition succeeds the third monadic partition, form a second binary partition by combining the first monadic partition with the third monadic partition; and
search a preset index using the first binary partition and the second binary partition, without searching the preset index directly using the first monadic partition.
Specification