Bootstrapping sense characterizations of occurrences of polysemous words in dictionary representations of a lexical knowledge base in computer memory
Abstract
The present invention is directed to characterizing the sense of an occurrence of a polysemous word in a representation of a dictionary. In a preferred embodiment, the representation of the dictionary is made up of a plurality of text segments containing word occurrences having a word sense characterization and word occurrences not having a word sense characterization. The embodiment first selects a plurality of the dictionary text segments that each contain a first word. The embodiment then identifies from among the selected text segments a first and a second occurrence of a second word. The identified second occurrence of the second word has a word sense characterization. The embodiment then attributes to the first occurrence of the second word the word sense characterization of the second occurrence of the second word.
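The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The names `Occurrence`, `propagate_sense`, and the sense labels such as `plant:n1` are hypothetical. The sketch also makes two simplifying assumptions that the abstract does not state: it copies the sense to every uncharacterized occurrence of the second word in the selected segments, and it uses the first characterized occurrence it finds as the source.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Occurrence:
    """One occurrence of a word in a dictionary text segment (hypothetical type)."""
    word: str
    sense: Optional[str] = None  # None means no sense characterization yet


def propagate_sense(segments: List[List[Occurrence]],
                    first_word: str,
                    second_word: str) -> int:
    """Copy a sense characterization between occurrences of second_word
    that appear in segments sharing first_word. Returns the number of copies."""
    # Select the text segments that each contain the first word.
    selected = [seg for seg in segments
                if any(occ.word == first_word for occ in seg)]

    # Identify occurrences of the second word within the selected segments.
    candidates = [occ for seg in selected for occ in seg
                  if occ.word == second_word]

    # Assumption: take the first occurrence that already has a sense as the source.
    source = next((occ for occ in candidates if occ.sense is not None), None)
    if source is None:
        return 0

    # Attribute the source's sense to the uncharacterized occurrences.
    copied = 0
    for occ in candidates:
        if occ.sense is None:
            occ.sense = source.sense
            copied += 1
    return copied


# Usage: "plant" is characterized in one segment containing "flower"
# and uncharacterized in another segment that also contains "flower".
segments = [
    [Occurrence("flower"), Occurrence("plant", "plant:n1")],
    [Occurrence("flower"), Occurrence("plant")],
    [Occurrence("building"), Occurrence("plant", "plant:n2")],
]
copied_count = propagate_sense(segments, "flower", "plant")
```

The third segment does not contain the first word, so its occurrence of "plant" is neither used as a source nor modified.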
6 Claims
1. A computer memory containing a dictionary representation representing a fully sense disambiguated lexical knowledge base containing a multiplicity of nodes each representing an occurrence of a word in a dictionary from which the lexical knowledge base was derived, the dictionary representation containing for each node a sense characterization, and being used for natural language processing, the dictionary representation comprising:
a sense characterization for each of a first plurality of the multiplicity of nodes, representing a first plurality of words, the sense characterization being based upon dictionary-provided sense characterizations;
a sense characterization for each of a second plurality of the multiplicity of nodes representing a second plurality of words, which occur in dictionary text segments, the sense characterization being copied from a third plurality of the multiplicity of nodes representing occurrences of the second plurality of words found in the dictionary text segments which include a common word also found in the dictionary text segments associated with the second plurality of nodes; and
a sense characterization for each of a fourth plurality of the multiplicity of nodes, not among the first, second or third plurality of nodes, assigned in accordance with default senses identified for each word, the sense characterizations providing a computer readable indication of word sense.
2. A computer memory having stored thereon a lexical knowledge base, the lexical knowledge base containing a multiplicity of nodes each representing an occurrence of a word in a dictionary from which the lexical knowledge base was derived, the lexical knowledge base containing for each node a sense characterization, the lexical knowledge base comprising:
a sense characterization for each of a first plurality of the multiplicity of nodes based upon dictionary-provided sense characterizations; and
a sense characterization for each of a second plurality of the multiplicity of nodes copied from a third plurality of the multiplicity of nodes representing a different occurrence of the same word as that represented by the second plurality of nodes and wherein the words represented by the second and third plurality of nodes occur in dictionary text segments which contain a common word, the sense characterizations providing a computer recognizable indication of word sense.
4. A computer readable memory which, when read, provides a data signal conveying data representing a fully sense disambiguated lexical knowledge base containing a multiplicity of nodes each representing an occurrence of a word in a dictionary from which the lexical knowledge base was derived, the lexical knowledge base containing for each node a sense characterization, the lexical knowledge base comprising:
a sense characterization for each of a first plurality of the multiplicity of nodes based upon dictionary-provided sense characterizations;
a sense characterization for each of a second plurality of the multiplicity of nodes copied from a third plurality of nodes representing the same word as those represented by the second plurality of nodes, wherein occurrences of words represented by the second and third plurality of nodes occur in different dictionary text segments which include a common word; and
a sense characterization for each of a fourth plurality of the multiplicity of nodes, not among the first, second or third plurality of nodes, assigned in accordance with default senses identified for each word, the sense characterizations providing a computer recognizable indication of word sense.
5. A computer readable medium which, when read, provides a data signal conveying a lexical knowledge base, the lexical knowledge base containing a multiplicity of nodes each representing an occurrence of a word in a dictionary from which the lexical knowledge base was derived, the lexical knowledge base containing for each node a sense characterization, and further comprising:
a sense characterization for each of a first plurality of the multiplicity of nodes representing a first plurality of words, the sense characterization being based upon dictionary-provided sense characterizations; and
a sense characterization for each of a second plurality of the multiplicity of nodes representing a second plurality of words, which occur in dictionary text segments, the sense characterization being copied from a third plurality of the multiplicity of nodes, the third plurality of nodes representing occurrences of the second plurality of words found in dictionary text segments which include a common word also found in the dictionary text segments associated with the second plurality of nodes, the sense characterizations providing a computer recognizable indication of word sense.
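As an informal illustration of the data structure the claims describe, the following Python sketch models a node that records both its sense characterization and how that characterization was obtained. It covers the default-sense fallback recited in claims 1 and 4 for nodes that received neither a dictionary-provided nor a copied sense. The names `Node`, `Source`, `assign_defaults`, and `is_fully_disambiguated` are hypothetical, and recording provenance on each node is an assumption made for readability, not something the claims require.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class Source(Enum):
    """How a node obtained its sense characterization (hypothetical labels)."""
    DICTIONARY = "dictionary-provided"
    COPIED = "copied"
    DEFAULT = "default"


@dataclass
class Node:
    """One occurrence of a word in the dictionary the knowledge base derives from."""
    word: str
    sense: Optional[str] = None
    source: Optional[Source] = None


def assign_defaults(nodes: List[Node], default_senses: Dict[str, str]) -> int:
    """Give every still-uncharacterized node the default sense of its word.
    Returns the number of nodes that received a default sense."""
    assigned = 0
    for node in nodes:
        if node.sense is None and node.word in default_senses:
            node.sense = default_senses[node.word]
            node.source = Source.DEFAULT
            assigned += 1
    return assigned


def is_fully_disambiguated(nodes: List[Node]) -> bool:
    """True when every node carries a sense characterization."""
    return all(node.sense is not None for node in nodes)


# Usage: two nodes already characterized, one left for the default pass.
nodes = [
    Node("plant", "plant:n1", Source.DICTIONARY),
    Node("plant", "plant:n1", Source.COPIED),
    Node("bank"),
]
defaults_assigned = assign_defaults(nodes, {"bank": "bank:n1"})
```

After the default pass, every node in this small example has a sense, which corresponds to the "fully sense disambiguated" condition in claims 1 and 4.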
Specification