Method of and system for disambiguating syntactic word multiples

US 6,260,008 B1
Filed: 01/08/1998
Issued: 07/10/2001
Est. Priority Date: 01/08/1998
Status: Expired due to Fees

Abstract

A method and system are provided for disambiguating multiples of syntactically related words automatically using the notion of semantic similarity between words. Based on syntactically related words derived from a sample text, a set is formed containing each associating word and the words associated in the syntactic relationship with it. The associating words are expanded to all word senses. Pairwise intersections of the resulting sets are formed so as to form pairs of semantically compatible word clusters which may be stored as pairs of co-occurrence restriction codes.
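The first two steps described in the abstract (grouping words by syntactic relationship, then expanding the associating word to its senses) can be sketched as below. This is only an illustration under stated assumptions: the verb-object pairs, the sense labels, and the function names `association_sets` and `expand_to_senses` are hypothetical and do not come from the patent, which would rely on a parser and a thesaurus instead of hand-written tables.

```python
from collections import defaultdict

# Hypothetical verb-object pairs, standing in for parser output on a sample text.
PAIRS = [
    ("fire", "employee"), ("fire", "clerk"),
    ("fire", "gun"), ("fire", "rifle"),
    ("dismiss", "employee"),
]

# Hypothetical sense inventory; a real system would consult a thesaurus.
SENSES = {
    "fire": ["fire#dismiss", "fire#shoot"],
    "dismiss": ["dismiss#sack", "dismiss#disregard"],
}


def association_sets(pairs):
    """Group each associating word (here the verb) with its associated words."""
    sets = defaultdict(set)
    for associating, associated in pairs:
        sets[associating].add(associated)
    return dict(sets)


def expand_to_senses(word, senses):
    """Return every listed sense of a word, or the word itself if none is listed."""
    return list(senses.get(word, [word]))


if __name__ == "__main__":
    for verb, objects in sorted(association_sets(PAIRS).items()):
        print(expand_to_senses(verb, SENSES), sorted(objects))
```

Running the same grouping with the pair order swapped would give the sets keyed by object instead of by verb, which is the other half needed before any pairwise intersection.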

9 Claims

1. A method of disambiguating first and second words occurring in a first predetermined syntactic relationship, which comprises collocation of the first and second words, the method comprising the sequential steps:
- (a) forming a plurality of first sets, each of which comprises:
  - a first subset containing a plurality of senses of the first word; and
  - a second subset containing a plurality of word senses which are capable of being in the first predetermined syntactic relationship with the first word and which are semantically similar senses;
- (b) forming a plurality of second sets, each of which comprises:
  - a third subset containing a plurality of word senses which are capable of being in the first predetermined syntactic relationship with the second word and which are semantically similar senses; and
  - a fourth subset containing a plurality of senses of the second word; and
- (c) selecting an output set comprising each sense of the first word and each sense of the second word in which the senses of the first and second words occur together in at least one of the first sets comprising the first and second subsets and in at least one of the second sets comprising the third and fourth subsets once all pairwise combinations of associated words in the sets and associated subsets have been used,
- wherein:
  - the first and second words occur in the first predetermined syntactic relationship which comprises collocation of the first and second words in a sample of text;
  - at least one of the words whose senses are contained in the second subset occur in the first predetermined syntactic relationship which comprises collocation of the first and second words to the first word in the sample of text; and
  - at least one of the words whose senses are contained in the third subset occur in the first predetermined syntactic relationship which comprises collocation of the first and second words to the second word in the sample of text.
2. A method as claimed in claim 1, wherein each of the sequential steps (a) to (c) are repeated for all first and second words occurring in the first predetermined syntactic relationship which comprises collocation of the first and second words in the sample of text.
3. A method as claimed in claim 1, wherein each of the sequential steps (a) to (c) are repeated for at least one further predetermined syntactic relationship which comprises collocation of the first and second words.
4. A method as claimed in claim 1, further comprising:
5. A method as claimed in claim 1, further comprising at least one of steps of:
- (i) removing statistically inconspicuous word senses from said first subset before carrying out step (c); and
- (ii) removing statistically inconspicuous word senses from said third subset before carrying out step (c).
6. A method as claimed in claim 5, wherein at least one of said statistically inconspicuous word senses is added to said output set and is placed with one or more words in said output set with which it is semantically similar.
7. A method as claimed in claim 1, wherein, in cases where said output set contains more than one of said first word senses, a preference is given to the first word sense having the greatest semantic similarity with other first word senses.
8. A storage medium containing a program for controlling a programmable data processor to perform a method as claimed in claim 1.
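
One way to read steps (a) to (c) of claim 1 is as a pairwise intersection over sense pairs. The sketch below assumes each "set" is a pair of subsets and that a sense pair "occurs together" in a set when each sense belongs to the corresponding subset. The sense labels, the sample clusters, and the function name `select_output` are hypothetical illustrations, not taken from the patent.

```python
from itertools import product

# Hypothetical senses of the first word ("fire") and the second word ("gun").
FIRST_WORD_SENSES = {"fire#dismiss", "fire#shoot"}
SECOND_WORD_SENSES = {"gun#weapon", "gun#gangster"}

# Step (a): first sets = (senses of the first word, cluster of similar senses
# of words seen in the relationship with the first word).
FIRST_SETS = [
    (FIRST_WORD_SENSES, {"gun#weapon", "rifle#weapon"}),
    (FIRST_WORD_SENSES, {"employee#worker", "clerk#worker"}),
]

# Step (b): second sets = (cluster of similar senses of words seen in the
# relationship with the second word, senses of the second word).
SECOND_SETS = [
    ({"fire#shoot", "shoot#discharge"}, SECOND_WORD_SENSES),
]


def select_output(first_senses, second_senses, first_sets, second_sets):
    """Step (c): keep sense pairs licensed by some first set and some second set."""
    in_first = {p for left, right in first_sets for p in product(left, right)}
    in_second = {p for left, right in second_sets for p in product(left, right)}
    candidates = set(product(first_senses, second_senses))
    return candidates & in_first & in_second


if __name__ == "__main__":
    print(select_output(FIRST_WORD_SENSES, SECOND_WORD_SENSES,
                        FIRST_SETS, SECOND_SETS))
```

With this toy data the first sets rule out `gun#gangster` (it is in no cluster of similar object senses) and the second sets rule out `fire#dismiss`, so only the pair `("fire#shoot", "gun#weapon")` survives.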

9. A system for disambiguating first and second words occurring in a first predetermined syntactic relationship which comprises collocation of the first and second words, the system comprising a data processor programmed to perform the steps of:
- (a) forming a plurality of first sets, each of which comprises:
  - a first subset containing a plurality of senses of the first word; and
  - a second subset containing a plurality of word senses which are capable of being in the first predetermined syntactic relationship with the first word and which are semantically similar senses;
- (b) forming a plurality of second sets, each of which comprises:
  - a third subset containing a plurality of word senses which are capable of being in the first predetermined syntactic relationship with the second word and which are semantically similar senses; and
  - a fourth subset containing a plurality of senses of the second word; and
- (c) selecting an output set comprising each sense of the first word and each sense of the second word in which the senses of the first and second words occur together in at least one of the first sets comprising the first and second subsets and in at least one of the second sets comprising the third and fourth subsets once all pairwise combinations of associated words in the sets and associated subsets have been used,
- wherein:
  - the first and second words occur in the first predetermined syntactic relationship which comprises collocation of the first and second words in a sample of text;
  - at least one of the words whose senses are contained in the second subset occur in the first predetermined syntactic relationship which comprises collocation of the first and second words to the first word in the sample of text; and
  - at least one of the words whose senses are contained in the third subset occur in the first predetermined syntactic relationship which comprises collocation of the first and second words to the second word in the sample of text.
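
The optional refinements in claims 5 and 7 can be illustrated with two small helpers. The claims do not define "statistically inconspicuous" or name a similarity measure, so this sketch assumes a simple count threshold and a hand-written symmetric similarity table. The names `prune_inconspicuous` and `preferred_sense`, the threshold, and all numbers are hypothetical.

```python
# Hypothetical observation counts for senses of the first word.
COUNTS = {"fire#shoot": 5, "fire#dismiss": 3, "fire#bake": 1}

# Hypothetical symmetric similarity scores between pairs of senses.
SIMILARITY = {
    frozenset(("fire#shoot", "fire#launch")): 0.8,
    frozenset(("fire#shoot", "fire#dismiss")): 0.1,
    frozenset(("fire#launch", "fire#dismiss")): 0.3,
}


def prune_inconspicuous(senses, counts, min_count=2):
    """Claim 5, assumed criterion: drop senses observed fewer than min_count times."""
    return {s for s in senses if counts.get(s, 0) >= min_count}


def preferred_sense(candidates, similarity):
    """Claim 7, assumed reading: pick the sense most similar to the other candidates."""
    def total(sense):
        return sum(similarity.get(frozenset((sense, other)), 0.0)
                   for other in candidates if other != sense)
    # Sorting first makes tie-breaking deterministic (alphabetical order).
    return max(sorted(candidates), key=total)


if __name__ == "__main__":
    print(prune_inconspicuous(set(COUNTS), COUNTS))
    print(preferred_sense({"fire#shoot", "fire#launch", "fire#dismiss"}, SIMILARITY))
```

Pruning would run on the first and third subsets before step (c), while the preference rule applies afterwards, only when the output set still holds more than one sense of the first word.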

Current Assignee: Sharp Kabushiki Kaisha (Hon Hai Precision Industry Co., Ltd.)
Original Assignee: Sharp Kabushiki Kaisha (Hon Hai Precision Industry Co., Ltd.)
Inventors: Sanfilippo, Antonio Pietro
Primary Examiner(s): Isen, Forester W.
Assistant Examiner(s): Edouard, Patrick Nestor
Application Number: US09/004,480
Time in Patent Office: 1,279 Days
Field of Search: 704/1, 704/9, 704/10, 707/530, 707/531
US Class Current: 704/9
CPC Class Codes: G06F 40/253 Grammatical analysis; Style...
