System and method for word indexing in a capture system and querying thereof
11 Assignments
0 Petitions
Abstract
Searching of objects captured by a capture system can be improved by eliminating irrelevant objects from a query. In one embodiment, the present invention includes receiving such a query for objects captured by a capture system, the query including at least one search term. This search term is then hashed to a term bit position using a hash function. Then objects can be eliminated if, in a word index associated with the object, the term bit position is not set.
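The elimination step described in the abstract can be sketched as follows. This is only an illustration under stated assumptions, not the patented implementation: the 1024-bit vector size, the use of SHA-256 as the hash function, whitespace tokenization, and every function and variable name below are hypothetical choices that the source does not specify.

```python
import hashlib

INDEX_BITS = 1024  # assumed size of each per-object bit vector


def term_bit_position(term: str, bits: int = INDEX_BITS) -> int:
    """Map a term to one bit position (SHA-256 is an illustrative choice)."""
    digest = hashlib.sha256(term.lower().encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % bits


def build_word_index(text: str, bits: int = INDEX_BITS) -> int:
    """Set one bit per whitespace-separated token; a Python int serves as the bit vector."""
    vector = 0
    for token in text.split():
        vector |= 1 << term_bit_position(token, bits)
    return vector


def may_contain(word_index: int, term: str, bits: int = INDEX_BITS) -> bool:
    """False: the object can be eliminated. True: the object still needs a full search."""
    return ((word_index >> term_bit_position(term, bits)) & 1) == 1


# Hypothetical captured objects and their word indexes.
objects = {
    "obj-1": "quarterly revenue forecast attached",
    "obj-2": "lunch menu for friday",
}
indexes = {name: build_word_index(text) for name, text in objects.items()}

# Keep only the objects whose word index has the term's bit set.
candidates = [name for name, idx in indexes.items() if may_contain(idx, "revenue")]
```

Because distinct terms can hash to the same bit position, a set bit only means the object may contain the term. An unset bit is conclusive, which is why the index is used to eliminate objects rather than to confirm matches.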
381 Citations
14 Claims
1. A method, comprising:
receiving a query to search a plurality of objects captured by a capture system, the query including a search term;
generating a search token from the search term using a context-aware parser, wherein the context-aware parser uses a list of patterns associated with a content type indicated by the query to generate sub-tokenized search tokens from the search token;
hashing the sub-tokenized search tokens to one or more term bit positions using a hash function;
searching a first word index associated with a first object;
eliminating the first object from the query if a bit is not set in each of the one or more term bit positions of a first bit vector of the first word index, wherein each bit that is set in the first bit vector represents at least one token generated from the first object, and wherein the hashing of the sub-tokenized search tokens includes truncating several of the sub-tokenized search tokens such that they stem to a same token.
Dependent claims: 2, 3, 4, 5, 6
7. One or more non-transitory computer readable media containing logic encoded therein for performing operations when the logic is executed by one or more processors, the operations comprising:
receiving a query to search a plurality of objects captured by a capture system, the query including a search term;
generating a search token from the search term using a context-aware parser, wherein the context-aware parser uses a list of patterns associated with a content type indicated by the query to generate sub-tokenized search tokens from the search token;
hashing the sub-tokenized search tokens to one or more term bit positions using a hash function;
searching a first word index associated with a first object;
eliminating the first object from the query if a bit is not set in each of the one or more term bit positions of a first bit vector of the first word index, wherein each bit that is set in the first bit vector represents at least one token generated from the first object, and wherein the hashing of the sub-tokenized search tokens includes truncating several of the sub-tokenized search tokens such that they stem to a same token.
Dependent claims: 8, 9, 10
11. An apparatus, comprising:
a query module; and
one or more processors configured to execute instructions associated with the query module such that the apparatus is configured for;
receiving a query to search a plurality of objects captured by a capture system, the query including a search term;
generating a search token from the search term using a context-aware parser, wherein the context-aware parser uses a list of patterns associated with a content type indicated by the query to generate sub-tokenized search tokens from the search token;
hashing the sub-tokenized search tokens to one or more term bit positions using a hash function;
searching a first word index associated with a first object;
eliminating the first object from the query if a bit is not set in each of the one or more term bit positions of a first bit vector of the first word index, wherein each bit that is set in the first bit vector represents at least one token generated from the first object, and wherein the hashing of the sub-tokenized search tokens includes truncating several of the sub-tokenized search tokens such that they stem to a same token.
Dependent claims: 12, 13, 14
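The independent claims add two steps to the basic bit test: a context-aware parser that splits a search token into sub-tokens using patterns chosen by content type, and truncation so that several sub-tokens stem to the same token before hashing. A minimal sketch of how these steps might fit together is given below. The pattern lists, the content type names, the six-character truncation length, the SHA-256 hash, the 1024-bit vector, and all identifiers are assumptions made for illustration; the claims do not specify any of them.

```python
import hashlib
import re

INDEX_BITS = 1024  # assumed bit vector size
STEM_LENGTH = 6    # assumed truncation length used as a crude stem

# Hypothetical pattern lists keyed by content type.
PATTERNS = {
    "email": [re.compile(r"[^@.\s]+")],     # split addresses on "@" and "."
    "text": [re.compile(r"[A-Za-z0-9]+")],  # plain alphanumeric words
}


def sub_tokenize(search_token: str, content_type: str) -> list[str]:
    """Apply the pattern list for the content type to produce sub-tokens."""
    patterns = PATTERNS.get(content_type, PATTERNS["text"])
    sub_tokens = []
    for pattern in patterns:
        sub_tokens.extend(pattern.findall(search_token))
    return sub_tokens


def bit_position(sub_token: str) -> int:
    """Truncate the sub-token, then hash the truncated form to a bit position."""
    stem = sub_token.lower()[:STEM_LENGTH]
    digest = hashlib.sha256(stem.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % INDEX_BITS


def term_bit_positions(search_term: str, content_type: str) -> set[int]:
    """Bit positions for every sub-token derived from the search term."""
    return {bit_position(s) for s in sub_tokenize(search_term, content_type)}


def build_word_index(tokens: list[str], content_type: str) -> int:
    """Build an object's bit vector from its tokens, using the same parser and hash."""
    vector = 0
    for token in tokens:
        for position in term_bit_positions(token, content_type):
            vector |= 1 << position
    return vector


def eliminate(word_index: int, positions: set[int]) -> bool:
    """True if at least one required bit is unset, so the object cannot match."""
    return any(((word_index >> p) & 1) == 0 for p in positions)


index = build_word_index(["alice@example.com"], "email")
positions = term_bit_positions("alice@example.com", "email")
keep = not eliminate(index, positions)
```

In this sketch the object is kept only when every term bit position is set, mirroring the "each of the one or more term bit positions" condition in the claims. The index and the query must use the same parser, truncation, and hash, otherwise matching objects would be wrongly eliminated.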
Specification