Method and apparatus for concept searching using a Boolean or keyword search engine

US 6,363,373 B1
Filed: 10/01/1998
Issued: 03/26/2002
Est. Priority Date: 10/01/1998
Status: Expired due to Term

Abstract

Concept searching using a Boolean or keyword search engine. Documents are preprocessed before being passed to a search engine by identifying, on a word-by-word basis, the “word tokens” contained in the document. Once the word tokens have been extracted, each word token is referenced in a concept database that maps word tokens to concept identifiers. The concept identifiers associated with the word tokens are converted into unique non-word concept tokens and arranged into a list. The list is then inserted into the document as invisible but searchable text. The document is then transferred to the server monitored by the search engine. Search queries are preprocessed before being passed to the search engine in the same manner. The query is first broken into word tokens and the word tokens are then referenced in the concept database. All associated concept identifiers are retrieved and converted to unique concept tokens. The concept tokens are then combined into a string and sent to the search engine as an ordinary query.
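
The document and query preprocessing described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the in-memory `CONCEPT_DB` dictionary, the `cpt......x` token format, the regex tokenizer, and appending the token list as a trailing line (rather than as invisible text) are all hypothetical choices made for the example.

```python
import re

# Hypothetical concept database: word token -> list of concept identifiers.
CONCEPT_DB = {
    "car": [101, 205],
    "automobile": [101],
    "engine": [205, 310],
}

def word_tokens(text):
    """Split text into lowercase word tokens (a simple assumed tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())

def concept_token(concept_id):
    """Map a concept identifier to a unique non-word token (assumed format)."""
    return f"cpt{concept_id:06d}x"

def concept_tokens_for(text):
    """Look up each word token and collect its concept tokens, without duplicates."""
    seen = []
    for word in word_tokens(text):
        for concept_id in CONCEPT_DB.get(word, []):
            token = concept_token(concept_id)
            if token not in seen:
                seen.append(token)
    return seen

def preprocess_document(text):
    """Append the concept-token list to the document as extra searchable text."""
    tokens = concept_tokens_for(text)
    return text + "\n" + " ".join(tokens) if tokens else text

def preprocess_query(query):
    """Turn a word query into a string of concept tokens for a keyword engine."""
    return " ".join(concept_tokens_for(query))
```

Because "car" and "automobile" share concept 101 in this toy database, a query for "automobile" produces a token that an ordinary keyword engine can match against a document that only mentions "car".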

277 Citations

16 Claims

1. A computer-readable medium on which is stored a computer program for preprocessing a document comprising one or more word tokens, the computer program comprising instructions which, when executed by a computer, perform the steps of:
- determining whether one of the word tokens in the document is contained in a concept database;
  
  in response to determining that one of the word tokens is contained in the concept database, reading a plurality of concept identifiers associated with the word token from the concept database; and
  
  in response to reading the concept identifier, assigning the concept identifiers to unique non-word concept tokens, and embedding the concept tokens in the document for use by a search engine not otherwise capable of concept searching.

2. The computer-readable medium of claim 1, further comprising the following steps after the assigning step:

3. A computer-readable medium on which is stored a computer program for preprocessing a document comprising one or more word tokens, the computer program comprising instructions which, when executed by a computer, perform the steps of:
- determining whether one of the word tokens is contained in a concept database;
  
  in response to determining that the word token is contained in the concept database, reading a plurality of concept identifiers associated with the word token from the concept database, and reading a numerical weight associated with the word token from the concept database;
  
  in response to reading the concept identifiers and weights, adding the numerical weights to the sum of any numerical weights for previous word tokens associated with the concept identifiers to create a sum of word token weights for each of the plurality of concept identifiers;
  
  in response to adding the weights, determining whether the document contains additional word tokens;
  
  in response to determining that the document contains additional word tokens, incrementing to the next word token contained in said document and repeating from the first determining step; and
  
  in response to determining that the document does not contain additional word tokens, normalizing the sums of word token weights for each of the plurality of concept identifiers, arranging each of the plurality of concept identifiers according to the value of said normalized sums of word token weights, converting each of the plurality of concept identifiers to unique concept tokens, and embedding the concept tokens in the document.
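
The weighted variant in claim 3 can be sketched as follows. This is only an illustration: the `WEIGHTED_DB` contents, the token format, and the choice to normalize by dividing each sum by the total of all sums are assumptions, since the claim does not say how normalization is performed.

```python
import re
from collections import defaultdict

# Hypothetical database: word token -> (concept identifiers, numerical weight).
WEIGHTED_DB = {
    "car": ([101, 205], 2.0),
    "automobile": ([101], 1.0),
    "engine": ([205, 310], 1.0),
}

def rank_concepts(text):
    """Return (concept_id, normalized_weight) pairs, highest weight first."""
    sums = defaultdict(float)
    for word in re.findall(r"[a-z]+", text.lower()):
        if word in WEIGHTED_DB:
            concept_ids, weight = WEIGHTED_DB[word]
            for concept_id in concept_ids:
                sums[concept_id] += weight  # running sum per concept identifier
    total = sum(sums.values())
    if total == 0:
        return []
    # Assumed normalization: divide each sum by the total so weights sum to 1.
    ranked = sorted(sums.items(), key=lambda item: (-item[1], item[0]))
    return [(concept_id, weight_sum / total) for concept_id, weight_sum in ranked]

def embed_ranked_tokens(text):
    """Append concept tokens to the document in ranked order."""
    tokens = [f"cpt{concept_id:06d}x" for concept_id, _ in rank_concepts(text)]
    return text + "\n" + " ".join(tokens) if tokens else text
```

Ties are broken by concept identifier here so the ordering is deterministic; that tie-break is also an assumption of the sketch.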

4. A computer-readable medium on which is stored a computer program for preprocessing a query comprising one or more word tokens, the computer program comprising instructions which, when executed by a computer, perform the steps of:
- determining whether one of the word tokens in the query is contained in a concept database;
  
  in response to determining that the word token is contained in the concept database, reading concept identifiers associated with the word token from the concept database; and
  
  in response to reading concept identifiers, assigning the concept identifiers to unique non-word concept tokens and passing the concept identifiers to a search engine not otherwise capable of concept searching as search parameters.

5. The computer-readable medium of claim 4, further comprising the following steps after the reading step and before the assigning step:

6. A computer-readable medium on which is stored a computer program for preprocessing a query comprising one or more word tokens, the computer program comprising instructions which, when executed by a computer, perform the steps of:
- determining whether one of the word tokens in the query is contained in a concept database;
  
  in response to determining that the word token is contained in the concept database, reading concept identifiers associated with the word token from the concept database;
  
  in response to reading concept identifiers, assigning the concept identifiers to unique concept tokens, and determining whether the query contains additional word tokens;
  
  in response to determining that the query contains additional word tokens, selecting the next word token contained in the query and repeating from the first determining step; and
  
  in response to determining that the query does not contain additional word tokens, assigning each concept token a normalized weight based upon the number of occurrences of each of the concept tokens, arranging each of the concept tokens according to the value of the normalized weights associated with said concept tokens, and passing the concept tokens and normalized weights to the search engine.

7. The computer-readable medium of claim 6, wherein the arranging step further comprises removing concept tokens whose normalized weights are less than a threshold value.
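
The query-side weighting of claims 6 and 7 might look like the sketch below. The `QUERY_DB` contents, the token format, and normalizing by the highest occurrence count are assumptions made for illustration; how the weights are then expressed in a particular search engine's query syntax is engine-specific and not shown.

```python
import re
from collections import Counter

# Hypothetical concept database: word token -> concept identifiers.
QUERY_DB = {
    "car": [101, 205],
    "automobile": [101],
    "engine": [205, 310],
    "race": [101, 400],
}

def weighted_query(query, threshold=0.0):
    """Return (concept_token, normalized_weight) pairs for a query."""
    counts = Counter()
    for word in re.findall(r"[a-z]+", query.lower()):
        for concept_id in QUERY_DB.get(word, []):
            counts[f"cpt{concept_id:06d}x"] += 1  # occurrences per concept token
    if not counts:
        return []
    # Assumed normalization: divide by the highest occurrence count.
    top = max(counts.values())
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    # Drop concept tokens whose normalized weight falls below the threshold.
    return [(token, n / top) for token, n in ranked if n / top >= threshold]
```

Calling `weighted_query("car automobile race", threshold=0.5)` keeps only the concept shared by all three words in this toy database, which shows how the threshold prunes weakly supported concepts before the query reaches the search engine.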

8. A method for preprocessing a document comprising one or more word tokens, the method comprising the steps of:
- determining whether one of the word tokens in the document is contained in a concept database; and
  
  in response to determining that the word token is contained in the concept database, reading concept identifiers associated with the word token from the concept database, converting the concept identifiers to unique non-word concept tokens, and embedding the concept tokens in the document for use by a search engine not otherwise capable of concept searching.

9. The method of claim 8, further comprising the following steps after the embedding step:

10. A method for preprocessing a document comprising one or more word tokens, the method comprising the steps of:
- determining whether one of the word tokens in the document is contained in a concept database;
  
  in response to determining one of the word tokens is contained in the concept database, reading concept identifiers associated with the word token from the concept database, and reading a numerical weight associated with the word token from the concept database;
  
  in response to reading concept identifiers and a numerical weight, adding the numerical weight to the sum of any numerical weights for any previous word tokens associated with the plurality of concept identifiers to create a sum of word token weights for each of said plurality of concept identifiers and determining whether said document contains additional word tokens;
  
  in response to determining that the document contains additional word tokens, selecting the next word token contained in the document and repeating from the determining step; and
  
  in response to determining that the document does not contain additional word tokens, normalizing the sums of word token weights for each of the concept identifiers, arranging each of the concept identifiers according to the value of the normalized sums of word token weights, converting each of the concept identifiers to unique concept tokens, and embedding the concept tokens in the document.

11. A method for preprocessing a query comprising one or more word tokens, the method comprising the steps of:
- determining whether one of the word tokens in the query is contained in a concept database;
  
  in response to determining that the word token is contained in the concept database, reading concept identifiers associated with said word token from said concept database; and
  
  in response to reading concept identifiers, assigning the concept identifiers to unique non-word concept tokens and passing the concept identifiers to the search engine for use by a search engine not otherwise capable of concept searching.

12. The method of claim 11, further comprising the following steps after the reading step:

13. A method for preprocessing a query comprising one or more word tokens, the method comprising the steps of:
- determining whether one of the word tokens in the query is contained in a concept database;
  
  in response to determining that the word token is contained in the concept database, reading a plurality of concept identifiers associated with the word token from the concept database, assigning each of the concept identifiers to concept tokens, and determining whether the query contains additional word tokens;
  
  in response to determining that the query contains additional word tokens, selecting the next word token in the query and repeating from the first determining step; and
  
  in response to determining that the query does not contain additional word tokens, assigning each concept token a normalized weight based upon the number of occurrences of each of the concept tokens, arranging each of the concept tokens according to the value of the normalized weights associated with the concept tokens, and passing the concept tokens and normalized weights to the search engine.

14. The method of claim 13, wherein the arranging step further comprises removing concept tokens whose normalized weights are less than a threshold value.

15. A computer apparatus for preprocessing a document comprising one or more word tokens, the computer apparatus comprising:
- a processor;
  
  a storage unit coupled to the processor, the storage unit maintaining the document and a concept database comprising a plurality of word tokens associated with a plurality of concept identifiers;
  
  a memory coupled to the processor;
  
  the processor being operative to read one of the word tokens from the document;
  
  determine whether the word token is contained in the concept database;
  
  in response to determining that the word token is contained in the concept database, said processor operative to read concept identifiers associated with the word token from the concept database, to read a numerical weight associated with the word token from said concept database, to add the numerical weight to the sum of any numerical weights for any previous word tokens associated with said plurality of concept identifiers to create a sum of word token weights for each of said plurality of concept identifiers, and to determine whether the document contains additional word tokens;
  
  in response to determining that the document contains additional word tokens, said processor operative to read the next word token from said document and repeat from the first determining step; and
  
  in response to determining that the document does not contain additional word tokens, said processor operative to normalize the sums of word token weights for each of the plurality of concept identifiers, to arrange each of said plurality of concept identifiers according to the value of said normalized sums of word token weights, to convert each of said plurality of concept identifiers to unique concept tokens, and to embed the concept tokens in the document.

16. A computer apparatus for preprocessing a query comprising one or more word tokens, the computer apparatus comprising:
- a processor;
  
  a storage unit coupled to the processor, the storage unit maintaining the query and a concept database comprising a plurality of word tokens associated with a plurality of concept identifiers;
  
  a memory coupled to the processor;
  
  the processor being operative to read one of the plurality of word tokens from the query;
  
  determine whether the word token is contained in the concept database;
  
  in response to determining that the word token is contained in the concept database, said processor operative to read concept identifiers associated with the word token from the concept database, to assign each of the concept identifiers to unique concept tokens, and to determine whether the query contains additional word tokens;
  
  in response to determining that the query contains additional word tokens, said processor operative to read the next word token contained in said query and repeat from the first determining step; and
  
  in response to determining that the query does not contain additional word tokens, said processor operative to assign each of the concept tokens a normalized weight based upon the number of occurrences of each of the concept tokens, to arrange each of the concept tokens according to the value of the normalized weights associated with the concept tokens, and to transmit the concept tokens and the normalized weights to the search engine.

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Corporation
Inventors: Steinkraus, David W.
Primary Examiner(s): Alam, Hosain T.
Assistant Examiner(s): Kindred, Alford W.
Application Number: US09/164,284
Time in Patent Office: 1,272 Days
Field of Search: 707/3, 707/4-7, 707/500, 707/526, 707/530, 704/4, 704/9, 704/8
US Class Current: 1/1
CPC Class Codes:
G06F 16/313   Selection or weighting of t...
G06F 16/3334   Selection or weighting of t...
G06F 16/38   Retrieval characterised by ...
Y10S 707/99933   Query processing, i.e. sear...
Y10S 707/99934   Query formulation, input pr...
Y10S 707/99935   Query augmenting and refini...