Methods and/or systems for selecting data sets

US 6,353,827 B1
Filed: 09/22/1998
Issued: 03/05/2002
Est. Priority Date: 09/04/1997
Status: Expired due to Term

Abstract

Methods and apparatus for identifying associated key words in a data set. Associated key words are identified by a parser which firstly operates to extract key words from a data set. These key words are then analyzed by the parser to identify which key words, if any, have an association as determined by a predefined set of rules. These rules are grammatical and include, for example, two key words both being nouns that occur one after the other without intervening low value words. A similar rule applies to nouns followed by verbs but does not extend to verbs followed by nouns. These rules allow terms and phrases such as “information technology” and “wide area network” to be identified as associated key words rather than as individual and unrelated key words.

6 Claims

1. Apparatus for determining a measure of similarity between at least a first and a second data set, said apparatus comprising:
- i) input means for receiving at least said first and second data sets;
  
  ii) processing means for identifying a set of keywords in at least the first of the data sets, the processing means having access to at least one rule set and identifying the set of keywords by use of said at least one rule set, the processing means further determining said measure of similarity; and
  
  iii) output means to output said measure of similarity;
  
  wherein said rule set includes a rule concerning relative location of data items in a respective data set, and wherein said processing means determines the measure of similarity by comparing at least one set of key words, identified by said processing means in the first data set, with a set of keywords comprising or derived from said second data set;
  
  said relative location of data items in a respective data set comprises adjacent location of at least two potential key words with respect to each other in the data set, the processing means identifying such adjacent potential key words as together providing a single key word in an identified set of key words; and
  
  said at least one rule set comprises at least one of the following criteria;
  
  1) a noun followed by a noun or a predetermined set of indicia;
  
  2) a verb followed by a noun or a predetermined set of indicia;
  
  3) an adjective followed by a noun or a predetermined set of indicia; and
  
  4) a predetermined set of indicia followed by a noun or a verb or a further predetermined set of indicia;
  
  the processing means identifying adjacent potential key words as together providing a single key word in an identified set of key words only when they meet said at least one criterion.
  - 2. Apparatus as claimed in claim 1 further comprising information retrieval means and a data store, said first data set comprising data retrieved from an information base by said information retrieval means and said second data set comprising a set of key words stored in said data store.
  - 3. Apparatus as claimed in claim 2, wherein said second data set defines a target data set for use in data retrieval, by said information retrieval means, from said information base whereby said first data set is identified by said processing means as containing said target data set when said measure of similarity exceeds a predetermined threshold.
  - 4. Apparatus as claimed in claim 2, wherein said data store comprises a plurality of keyword sets identified, by said processing means, from a plurality of data sets retrieved, by said information retrieval means, from said information base, wherein said processing means defines a plurality of relationships between said data sets dependent on the measure of similarity calculated for each pair of data sets.
  - 5. Apparatus as claimed in claim 1 further comprising information retrieval means, said first and second data sets comprising data retrieved from an information base by said information retrieval means, the processing means identifying a set of keywords in each of said first and second data sets and determining the measure of similarity by comparing the respective sets of key words.
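
The claims above leave the measure of similarity unspecified. As a minimal sketch, assuming a Jaccard overlap between keyword sets (an illustrative choice, not something the claims state), the threshold test of claim 3 and the pairwise relationships of claim 4 might look like this. The names `similarity`, `matches_target`, and `pairwise_relationships`, and the default threshold of 0.5, are hypothetical.

```python
from itertools import combinations


def similarity(keywords_a, keywords_b):
    """Jaccard overlap of two keyword sets (assumed measure, not from the claims)."""
    a, b = set(keywords_a), set(keywords_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def matches_target(document_keywords, target_keywords, threshold=0.5):
    """Claim 3 style test: the similarity must exceed a predetermined threshold."""
    return similarity(document_keywords, target_keywords) > threshold


def pairwise_relationships(keyword_sets):
    """Claim 4 style: one similarity score for each pair of stored keyword sets."""
    return {
        (i, j): similarity(keyword_sets[i], keyword_sets[j])
        for i, j in combinations(range(len(keyword_sets)), 2)
    }
```

Because merged terms such as "wide area network" are compared as single key words, a document mentioning only "network" would not count as sharing that term under this sketch.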

6. A method of determining a level of similarity between first and second data sets, wherein said method comprises the steps of:
- i) applying identifying tags to selected data items in at least the first of the data sets in accordance with at least a first rule;
  
  ii) identifying a set of potential key words by reference to either the presence or the absence of said identifying tags;
  
  iii) selecting sets of two or more potential keywords which are adjacent by applying at least a second rule;
  
  iv) classifying each selected set of potential keywords as a single keyword;
  
  v) generating a set of keywords which comprises each classified set of potential keywords as a single keyword, together with the remaining keywords from the identified set of potential keywords;
  
  vi) comparing the generated set of keywords with a set of keywords either comprising or derived from the second data set; and
  
  said first rule relates at least in part to the grammatical category of the data items;
  
  said at least a second rule comprises one or more rules from the following set;
  
  1) a noun followed by a noun or a predetermined set of indicia;
  
  2) a verb followed by a noun or a predetermined set of indicia;
  
  3) an adjective followed by a noun or a predetermined set of indicia; and
  
  4) a predetermined set of indicia followed by a noun or a verb or a further predetermined set of indicia.
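
Steps i) to v) of the method can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented implementation: `LEXICON` is a hypothetical hand-made stand-in for the first rule (a real system would use a part-of-speech tagger), the tag names are invented, and the "predetermined set of indicia" is represented only by an `indicia` tag that this toy lexicon never assigns. Untagged words are treated as low-value items that break adjacency.

```python
# Hypothetical lexicon standing in for the "first rule" (grammatical category).
LEXICON = {
    "wide": "adj", "area": "noun", "network": "noun",
    "information": "noun", "technology": "noun",
    "supports": "verb",
}

# The four adjacency criteria of the claim: tag of first item -> allowed next tags.
SECOND_RULE = {
    "noun": {"noun", "indicia"},
    "verb": {"noun", "indicia"},
    "adj": {"noun", "indicia"},
    "indicia": {"noun", "verb", "indicia"},
}


def tag_items(words):
    """Step i: tag items whose grammatical category is known; others get None."""
    return [(w.lower(), LEXICON.get(w.lower())) for w in words]


def keyword_set(words):
    """Steps ii to v: tagged items are potential keywords; adjacent ones that
    satisfy SECOND_RULE are merged into a single keyword."""
    keywords, group, prev_tag = set(), [], None
    for word, tag in tag_items(words):
        if tag is not None and group and tag in SECOND_RULE[prev_tag]:
            group.append(word)  # steps iii and iv: extend the single keyword
        else:
            if group:
                keywords.add(" ".join(group))
            # An untagged word is not a potential keyword and ends the group.
            group = [word] if tag is not None else []
        prev_tag = tag
    if group:
        keywords.add(" ".join(group))
    return keywords
```

For the sentence "The wide area network is information technology", this sketch yields the two keywords "wide area network" and "information technology". Step vi) would then compare that set with the keywords of the second data set, for example by set intersection.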

Current Assignee: British Telecommunications PLC (BT Group PLC)
Original Assignee: British Telecommunications PLC (BT Group PLC)
Inventors: Davies, Nicholas John; Weeks, Richard
Primary Examiner(s): Black, Thomas
Assistant Examiner(s): Wang, Mary
Application Number: US09/155,172
Time in Patent Office: 1,260 Days
Field of Search: 707/1, 707/4, 707/6
US Class Current: 707/769
CPC Class Codes:
G06F 16/313   Selection or weighting of t...
Y10S 707/918   Location
Y10S 707/99936   Pattern matching access
