Method and apparatus for concept searching using a Boolean or keyword search engine
Abstract
Concept searching using a Boolean or keyword search engine. Documents are preprocessed before being passed to a search engine by identifying, on a word-by-word basis, the “word tokens” contained in the document. Once the word tokens have been extracted, each word token is referenced in a concept database that maps word tokens to concept identifiers. The concept identifiers associated with the word tokens are converted into unique non-word concept tokens and arranged into a list. The list is then inserted into the document as invisible but searchable text. The document is then transferred to the server monitored by the search engine. Search queries are preprocessed before being passed to the search engine in the same manner. The query is first broken into word tokens and the word tokens are then referenced in the concept database. All associated concept identifiers are retrieved and converted to unique concept tokens. The concept tokens are then combined into a string and sent to the search engine as an ordinary query.
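As a rough illustration of the document-side preprocessing described in the abstract, the following Python sketch looks up each word token in a small concept table, converts the concept identifiers into non-word tokens, and appends the token list to the document text. The table contents, the `ZQC` token prefix, and the function names are hypothetical. The abstract does not specify a token format or how the invisible text is encoded, so the appended line here simply stands in for it.

```python
import re

# Hypothetical concept database: word token -> concept identifiers.
CONCEPT_DB = {
    "car": [101, 205],
    "automobile": [101],
    "engine": [205, 310],
}


def concept_token(concept_id):
    # Assumed format: a prefix unlikely to occur in ordinary text, plus the id.
    return f"ZQC{concept_id:05d}"


def preprocess_document(text):
    """Append a list of concept tokens for every word token found in the database."""
    tokens = []
    # Walk the document word by word (simple lowercase alphabetic tokenizer).
    for word in re.findall(r"[a-z]+", text.lower()):
        for cid in CONCEPT_DB.get(word, []):
            tok = concept_token(cid)
            if tok not in tokens:  # keep each concept token once, in first-seen order
                tokens.append(tok)
    if not tokens:
        return text
    # The appended line stands in for the "invisible but searchable" text.
    return text + "\n" + " ".join(tokens)


doc = preprocess_document("The car has a new engine.")
```

Because the appended tokens are ordinary strings, a keyword engine can index them like any other term without knowing anything about concepts.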
277 Citations
16 Claims
1. A computer-readable medium on which is stored a computer program for preprocessing a document comprising one or more word tokens, the computer program comprising instructions which, when executed by a computer, perform the steps of:
determining whether one of the word tokens in the document is contained in a concept database;
in response to determining that one of the word tokens is contained in the concept database, reading a plurality of concept identifiers associated with the word token from the concept database; and
in response to reading the concept identifier, assigning the concept identifiers to unique non-word concept tokens, and embedding the concept tokens in the document for use by a search engine not otherwise capable of concept searching.
Dependent claim 2:
determining whether the document contains additional word tokens; and
in response to determining that the document contains additional word tokens, incrementing to the next word token contained in said document and repeating from the first determining step.
3. A computer-readable medium on which is stored a computer program for preprocessing a document comprising one or more word tokens, the computer program comprising instructions which, when executed by a computer, perform the steps of:
determining whether one of the word tokens is contained in a concept database;
in response to determining that the word token is contained in the concept database, reading a plurality of concept identifiers associated with the word token from the concept database, and reading a numerical weight associated with the word token from the concept database;
in response to reading the concept identifiers and weights, adding the numerical weights to the sum of any numerical weights for previous word tokens associated with the concept identifiers to create a sum of word token weights for each of the plurality of concept identifiers;
in response to adding the weights, determining whether the document contains additional word tokens;
in response to determining that the document contains additional word tokens, incrementing to the next word token contained in said document and repeating from the first determining step; and
in response to determining that the document does not contain additional word tokens, normalizing the sums of word token weights for each of the plurality of concept identifiers, arranging each of the plurality of concept identifiers according to the value of said normalized sums of word token weights, converting each of the plurality of concept identifiers to unique concept tokens, and embedding the concept tokens in the document.
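The weighted variant recited in claim 3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the database contents, the `ZQC` token format, normalization by dividing each sum by the total, and descending order are all hypothetical choices, since the claim does not fix a normalization method or sort direction.

```python
from collections import defaultdict

# Hypothetical concept database: word token -> (concept identifiers, numerical weight).
WEIGHTED_DB = {
    "car": ([101, 205], 2.0),
    "automobile": ([101], 3.0),
    "engine": ([205, 310], 1.0),
}


def rank_concepts(word_tokens):
    """Sum word-token weights per concept, normalize, and order by weight."""
    sums = defaultdict(float)
    for word in word_tokens:
        entry = WEIGHTED_DB.get(word)
        if entry is None:
            continue  # word token not in the concept database
        concept_ids, weight = entry
        for cid in concept_ids:
            sums[cid] += weight  # add to the running sum for this concept
    total = sum(sums.values())
    if total == 0:
        return []
    # Assumed normalization: each sum divided by the total, so weights add to 1.
    normalized = {cid: s / total for cid, s in sums.items()}
    # Assumed arrangement: highest normalized weight first.
    return sorted(normalized.items(), key=lambda kv: kv[1], reverse=True)


ranked = rank_concepts(["car", "automobile", "engine", "wheel"])
# Convert the ordered identifiers to (hypothetical) concept tokens for embedding.
tokens = [f"ZQC{cid:05d}" for cid, _ in ranked]
```

Ordering the tokens by normalized weight keeps the strongest concepts at the front of the embedded list.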
4. A computer-readable medium on which is stored a computer program for preprocessing a query comprising one or more word tokens, the computer program comprising instructions which, when executed by a computer, perform the steps of:
determining whether one of the word tokens in the query is contained in a concept database;
in response to determining that the word token is contained in the concept database, reading concept identifiers associated with the word token from the concept database; and
in response to reading concept identifiers, assigning the concept identifiers to unique non-word concept tokens and passing the concept identifiers to a search engine not otherwise capable of concept searching as search parameters.
Dependent claim 5:
determining whether the query contains additional word tokens; and
in response to determining that the query contains additional word tokens, selecting the next word token contained in the query and repeating from the first determining step.
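The query-side steps can be sketched in the same way. In this hypothetical example the concept tokens are joined with `OR` to form an ordinary Boolean query, and `keyword_match` is a toy stand-in for a keyword engine. The operator, the token format, and the sample data are assumptions made for illustration only.

```python
# Hypothetical concept database: word token -> concept identifiers.
CONCEPT_DB = {"car": [101], "automobile": [101], "engine": [205]}


def concept_token(cid):
    # Assumed non-word token format.
    return f"ZQC{cid:05d}"


def preprocess_query(query):
    """Replace the query's word tokens with concept tokens joined by OR."""
    tokens = []
    for word in query.lower().split():
        for cid in CONCEPT_DB.get(word, []):
            tok = concept_token(cid)
            if tok not in tokens:
                tokens.append(tok)
    return " OR ".join(tokens)


def keyword_match(boolean_query, document_text):
    # Toy keyword engine: true if any OR'd term appears as a whole word.
    doc_terms = set(document_text.split())
    return any(term in doc_terms for term in boolean_query.split(" OR "))


# A document that was preprocessed earlier and carries an embedded concept token.
indexed_doc = "The car would not start. ZQC00101"
query = preprocess_query("automobile")
found = keyword_match(query, indexed_doc)
```

Here the query word "automobile" never appears in the document, yet the match succeeds because both map to the same concept token.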
6. A computer-readable medium on which is stored a computer program for preprocessing a query comprising one or more word tokens, the computer program comprising instructions which, when executed by a computer, perform the steps of:
determining whether one of the word tokens in the query is contained in a concept database;
in response to determining that the word token is contained in the concept database, reading concept identifiers associated with the word token from the concept database;
in response to reading concept identifiers, assigning the concept identifiers to unique concept tokens, and determining whether the query contains additional word tokens;
in response to determining that the query contains additional word tokens, selecting the next word token contained in the query and repeating from the first determining step; and
in response to determining that the query does not contain additional word tokens, assigning each concept token a normalized weight based upon the number of occurrences of each of the concept tokens, arranging each of the concept tokens according to the value of the normalized weights associated with said concept tokens, and passing the concept tokens and normalized weights to the search engine.
Dependent claim: 7
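The occurrence-based weighting of query concept tokens in claim 6 might look like the sketch below. The database, the `ZQC` token format, the choice to normalize by the total occurrence count, and the descending order are assumptions; the claim only requires a normalized weight based on the number of occurrences.

```python
from collections import Counter

# Hypothetical concept database: word token -> concept identifiers.
CONCEPT_DB = {"car": [101, 205], "automobile": [101], "engine": [205, 310]}


def weight_query_concepts(query):
    """Count how often each concept token occurs and normalize the counts."""
    counts = Counter()
    for word in query.lower().split():
        for cid in CONCEPT_DB.get(word, []):
            counts[f"ZQC{cid:05d}"] += 1  # one occurrence per matching word token
    total = sum(counts.values())
    if total == 0:
        return []
    # Assumed normalization: occurrences divided by the total number of occurrences.
    weighted = [(tok, n / total) for tok, n in counts.items()]
    # Assumed arrangement: highest weight first; ties keep first-seen order.
    return sorted(weighted, key=lambda tw: tw[1], reverse=True)


weighted = weight_query_concepts("car automobile engine")
```

A search engine that accepts term weights could receive these pairs directly; one that does not could still use the order alone.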
8. A method for preprocessing a document comprising one or more word tokens, the method comprising the steps of:
determining whether one of the word tokens in the document is contained in a concept database; and
in response to determining that the word token is contained in the concept database, reading concept identifiers associated with the word token from the concept database, converting the concept identifiers to unique non-word concept tokens, and embedding the concept tokens in the document for use by a search engine not otherwise capable of concept searching.
Dependent claim 9:
determining whether the document contains additional word tokens; and
in response to determining that the document contains additional word tokens, selecting the next word token in the document and repeating from the first determining step.
10. A method for preprocessing a document comprising one or more word tokens, the method comprising the steps of:
determining whether one of the word tokens in the document is contained in a concept database;
in response to determining one of the word tokens is contained in the concept database, reading concept identifiers associated with the word token from the concept database, and reading a numerical weight associated with the word token from the concept database;
in response to reading concept identifiers and a numerical weight, adding the numerical weight to the sum of any numerical weights for any previous word tokens associated with the plurality of concept identifiers to create a sum of word token weights for each of said plurality of concept identifiers and determining whether said document contains additional word tokens;
in response to determining that the document contains additional word tokens, selecting the next word token contained in the document and repeating from the determining step; and
in response to determining that the document does not contain additional word tokens, normalizing the sums of word token weights for each of the concept identifiers, arranging each of the concept identifiers according to the value of the normalized sums of word token weights, converting each of the concept identifiers to unique concept tokens, and embedding the concept tokens in the document.
11. A method for preprocessing a query comprising one or more word tokens, the method comprising the steps of:
determining whether one of the word tokens in the query is contained in a concept database;
in response to determining that the word token is contained in the concept database, reading concept identifiers associated with said word token from said concept database; and
in response to reading concept identifiers, assigning the concept identifiers to unique non-word concept tokens and passing the concept identifiers to the search engine for use by a search engine not otherwise capable of concept searching.
Dependent claim 12:
determining whether the query contains additional word tokens; and
in response to determining that the query contains additional word tokens, selecting the next word token in the query and repeating from the first determining step.
13. A method for preprocessing a query comprising one or more word tokens, the method comprising the steps of:
determining whether one of the word tokens in the query is contained in a concept database;
in response to determining that the word token is contained in the concept database, reading a plurality of concept identifiers associated with the word token from the concept database, assigning each of the concept identifiers to concept tokens, and determining whether the query contains additional word tokens;
in response to determining that the query contains additional word tokens, selecting the next word token in the query and repeating from the first determining step; and
in response to determining that the query does not contain additional word tokens, assigning each concept token a normalized weight based upon the number of occurrences of each of the concept tokens, arranging each of the concept tokens according to the value of the normalized weights associated with the concept tokens, and passing the concept tokens and normalized weights to the search engine.
Dependent claim: 14
15. A computer apparatus for preprocessing a document comprising one or more word tokens, the computer apparatus comprising:
a processor;
a storage unit coupled to the processor, the storage unit maintaining the document and a concept database comprising a plurality of word tokens associated with a plurality of concept identifiers;
a memory coupled to the processor;
the processor being operative to read one of the word tokens from the document;
determine whether the word token is contained in the concept database;
in response to determining that the word token is contained in the concept database, said processor operative to read concept identifiers associated with the word token from the concept database, to read a numerical weight associated with the word token from said concept database, to add the numerical weight to the sum of any numerical weights for any previous word tokens associated with said plurality of concept identifiers to create a sum of word token weights for each of said plurality of concept identifiers, and to determine whether the document contains additional word tokens;
in response to determining that the document contains additional word tokens, said processor operative to read the next word token from said document and repeat from the first determining step; and
in response to determining that the document does not contain additional word tokens, said processor operative to normalize the sums of word token weights for each of the plurality of concept identifiers, to arrange each of said plurality of concept identifiers according to the value of said normalized sums of word token weights, to convert each of said plurality of concept identifiers to unique concept tokens, and to embed the concept tokens in the document.
16. A computer apparatus for preprocessing a query comprising one or more word tokens, the computer apparatus comprising:
a processor;
a storage unit coupled to the processor, the storage unit maintaining the query and a concept database comprising a plurality of word tokens associated with a plurality of concept identifiers;
a memory coupled to the processor;
the processor being operative to read one of the plurality of word tokens from the query;
determine whether the word token is contained in the concept database;
in response to determining that the word token is contained in the concept database, said processor operative to read concept identifiers associated with the word token from the concept database, to assign each of the concept identifiers to unique concept tokens, and to determine whether the query contains additional word tokens;
in response to determining that the query contains additional word tokens, said processor operative to read the next word token contained in said query and repeat from the first determining step; and
in response to determining that the query does not contain additional word tokens, said processor operative to assign each of the concept tokens a normalized weight based upon the number of occurrences of each of the concept tokens, to arrange each of the concept tokens according to the value of the normalized weights associated with the concept tokens, and to transmit the concept tokens and the normalized weights to the search engine.
Specification