Bootstrapping sense characterizations of occurrences of polysemous words
Abstract
The present invention is directed to characterizing the sense of an occurrence of a polysemous word in a representation of a dictionary. In a preferred embodiment, the representation of the dictionary is made up of a plurality of text segments containing word occurrences that have a word sense characterization and word occurrences that do not. The embodiment first selects a plurality of the dictionary text segments that each contain a first word. It then identifies, among the selected text segments, a first and a second occurrence of a second word, where the second occurrence has a word sense characterization. Finally, the embodiment attributes to the first occurrence of the second word the word sense characterization of the second occurrence of the second word.
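The three steps summarized in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes each text segment is a list of `(word, sense)` pairs, with `None` marking an occurrence that has no word sense characterization. It also takes the first characterized occurrence it finds as the source and labels every uncharacterized occurrence of the second word in the selected segments. The helper names and the sense label `bank_1` are hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: a token is (word, sense); sense is None when
# the occurrence has no word sense characterization.
Token = Tuple[str, Optional[str]]
Segment = List[Token]


def contains(segment: Segment, word: str) -> bool:
    """True if the text segment contains an occurrence of `word`."""
    return any(w == word for w, _ in segment)


def attribute_senses(segments: List[Segment], first_word: str, second_word: str) -> List[Segment]:
    """Return copies of `segments` in which uncharacterized occurrences of
    `second_word`, inside segments containing `first_word`, receive the sense
    of a characterized occurrence of `second_word` from those same segments."""
    # Step 1: select the text segments that contain the first word.
    selected = [seg for seg in segments if contains(seg, first_word)]
    # Step 2: find an occurrence of the second word that already has a sense.
    known = next(
        (s for seg in selected for w, s in seg if w == second_word and s is not None),
        None,
    )
    if known is None:
        return [list(seg) for seg in segments]  # nothing to bootstrap from
    # Step 3: attribute that sense to the uncharacterized occurrences.
    return [
        [(w, known if w == second_word and s is None else s) for w, s in seg]
        if contains(seg, first_word) else list(seg)
        for seg in segments
    ]
```

For example, with `first_word="river"` and `second_word="bank"`, an unlabeled "bank" in a segment that also contains "river" receives the sense of a labeled "bank" from another "river" segment, while a "bank" in a segment about money is left untouched.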
41 Claims
1. A method in a computer system, the method performed in a lexical knowledge base derived from one or more corpora, the lexical knowledge base comprising a network of nodes each representing a word occurrence in the corpora, the lexical knowledge base having word subgraphs each corresponding to one word and containing text segment subgraphs derived from text segments containing the word, the method characterizing the sense of an occurrence of a polysemous word represented as a node of the lexical knowledge base and comprising the steps of:
selecting a word subgraph of the lexical knowledge base corresponding to a first word;
identifying within the selected word subgraph a first node representing a first occurrence of a second word, the first node having no word sense characterization;
identifying within the selected word subgraph a second node representing a second occurrence of the second word, the second node having a word sense characterization; and
copying the word sense characterization of the second node to the first node.
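The node-level steps of claim 1 can be sketched in Python, assuming a deliberately simplified knowledge base. In this sketch, which is an illustration rather than the claimed system, a hypothetical `Node` carries a word and an optional sense, and each word subgraph is reduced to a list of text segment subgraphs, each a flat list of nodes. The function name and the sense label `bank_1` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Node:
    word: str
    sense: Optional[str] = None  # None = no word sense characterization


# Hypothetical simplification: word -> its text segment subgraphs, each a list of nodes.
WordSubgraphs = Dict[str, List[List[Node]]]


def copy_sense_in_subgraph(kb: WordSubgraphs, first_word: str, second_word: str) -> int:
    """Within the word subgraph for `first_word`, copy the sense of a
    characterized node for `second_word` to its uncharacterized nodes.
    Returns the number of nodes that received a sense."""
    # Select the word subgraph and collect the nodes representing the second word.
    nodes = [n for segment in kb.get(first_word, []) for n in segment if n.word == second_word]
    # Identify a second node that already has a word sense characterization.
    source = next((n for n in nodes if n.sense is not None), None)
    if source is None:
        return 0
    # Copy its characterization to each first node that lacks one.
    copied = 0
    for node in nodes:
        if node.sense is None:
            node.sense = source.sense
            copied += 1
    return copied
```

Because nodes are mutated in place, a sense copied here is visible anywhere else the same `Node` object is referenced.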
18. A computer-readable medium whose contents cause a computer system to characterize the sense of an occurrence of a polysemous word in a lexical knowledge base derived from one or more corpora each comprising a plurality of text segments, by performing the steps of:
selecting a plurality of text segments each containing a first word;
identifying among the selected text segments a first and second occurrence of a second word, the second occurrence of the second word having a word sense characterization; and
attributing to the first occurrence of the second word the word sense characterization of the second occurrence of the second word.
23. A method in a computer system, the method performed in a lexical knowledge base derived from one or more dictionaries, the lexical knowledge base comprising a network of nodes each representing a word occurrence in the dictionaries, the lexical knowledge base containing text segment subgraphs each comprising a plurality of nodes and derived from dictionary text segments, the method characterizing the sense of an occurrence of a polysemous word represented as a node of the lexical knowledge base and comprising the steps of:
(a) selecting a pair of words having a high level of semantic coherency;
(b) identifying in the lexical knowledge base a plurality of text segment subgraphs between the words of the pair;
(c) identifying within the identified plurality of text segment subgraphs a first node having no word sense characterization and representing a first occurrence of a first word;
(d) identifying within the identified plurality of text segment subgraphs a second node having a word sense characterization and representing a second occurrence of the second word; and
(e) copying the word sense characterization of the second node to the first node.
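Steps (a) through (e) of claim 23 can be sketched in Python under several loudly stated assumptions. The claim does not say how semantic coherency is measured, so the sketch takes precomputed coherency scores and a cutoff `threshold` as hypothetical inputs. It also assumes the text segment subgraphs linking each pair have already been found and are stored as lists of nodes. Following claims 1 and 18, it treats the first and second nodes as two occurrences of the same word. All names are hypothetical.

```python
from dataclasses import dataclass
from itertools import chain
from typing import Dict, FrozenSet, List, Optional


@dataclass
class Node:
    word: str
    sense: Optional[str] = None  # None = no word sense characterization


Pair = FrozenSet[str]


def bootstrap_coherent_pairs(
    coherency: Dict[Pair, float],         # hypothetical precomputed coherency scores
    paths: Dict[Pair, List[List[Node]]],  # hypothetical: text segment subgraphs between each pair
    threshold: float,                     # hypothetical cutoff for "high" coherency
) -> int:
    """Copy known senses to uncharacterized nodes of the same word inside the
    text segment subgraphs linking highly coherent word pairs. Returns the
    number of nodes that received a sense."""
    copied = 0
    for pair, score in coherency.items():
        # (a) keep only pairs whose coherency meets the cutoff.
        if score < threshold:
            continue
        # (b) gather the nodes of the text segment subgraphs between the pair.
        nodes = list(chain.from_iterable(paths.get(pair, [])))
        # (d) remember one characterized node's sense per word.
        known: Dict[str, str] = {}
        for n in nodes:
            if n.sense is not None:
                known.setdefault(n.word, n.sense)
        # (c) and (e) copy that sense to uncharacterized nodes of the same word.
        for n in nodes:
            if n.sense is None and n.word in known:
                n.sense = known[n.word]
                copied += 1
    return copied
```

Restricting the search to highly coherent pairs is what narrows the context: a "bank" found between "river" and "water" is more plausibly the same sense as another "bank" in that context than one found between "river" and "money".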
36. A method in a computer system for bootstrapping the sense characterization of some nodes of a lexical knowledge base to additional nodes of the lexical knowledge base, the lexical knowledge base comprising a network of nodes each representing a word occurrence in the dictionaries, the lexical knowledge base having word subgraphs each corresponding to one word and containing text segment subgraphs derived from dictionary text segments containing the word, the method comprising the steps of:
(a) for each of the word subgraphs:
(1) selecting a proper subset of the text segment subgraphs of the word subgraph having the highest weights;
(2) selecting within the selected subset of text segment subgraphs each node, other than nodes representing the word to which the word subgraph corresponds, not having a sense characterization;
(3) for each selected node:
(A) identifying within the selected subset of text segment subgraphs each node that represents the same word as the selected node and has a sense characterization;
(B) rejecting any identified nodes having distinguishing features;
(C) choosing one node from the unrejected identified nodes; and
(D) copying to the selected node the sense characterization of the chosen node; and
(b) for each selected node:
(1) copying the new sense characterization of the selected node to a node corresponding to the selected node in each reoccurrence within the lexical knowledge base of the text segment subgraph containing the selected node.
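The two-phase bootstrapping of claim 36 can be sketched in Python. This is a minimal illustration, not the claimed system, and every detail the claim leaves open is an assumption:

- Segment weights are given as inputs.
- `top_k` (hypothetical) sets the subset size, capped so the subset stays proper.
- "Distinguishing features" are modeled as a hypothetical `features` set, and a candidate is rejected when its features differ from the selected node's.
- The "choose one node" rule takes the candidate from the highest-weighted segment.
- Reoccurrences of a text segment subgraph share a hypothetical `segment_id` and list their nodes in the same order.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional, Tuple


@dataclass
class Node:
    word: str
    sense: Optional[str] = None
    # Hypothetical stand-in for "distinguishing features" (e.g. number, part of speech).
    features: FrozenSet[str] = frozenset()


@dataclass
class Segment:
    segment_id: str  # hypothetical id of the dictionary text segment it derives from
    weight: float
    nodes: List[Node]


def bootstrap(kb: Dict[str, List[Segment]], top_k: int) -> Tuple[int, int]:
    """Return (nodes labeled in step (a), nodes labeled by propagation in step (b))."""
    newly_labeled: List[Tuple[str, int, str]] = []  # (segment_id, position, sense)

    for head_word, segments in kb.items():
        # (a)(1): keep at most len - 1 segments so the subset stays proper.
        k = max(0, min(top_k, len(segments) - 1))
        subset = sorted(segments, key=lambda s: s.weight, reverse=True)[:k]
        for seg in subset:
            for pos, node in enumerate(seg.nodes):
                # (a)(2): skip the head word and nodes that already have a sense.
                if node.word == head_word or node.sense is not None:
                    continue
                # (a)(3)(A): same-word nodes with a sense, in descending segment weight.
                candidates = [c for s in subset for c in s.nodes
                              if c.word == node.word and c.sense is not None]
                # (a)(3)(B): assumed test, reject candidates whose features differ.
                candidates = [c for c in candidates if c.features == node.features]
                if not candidates:
                    continue
                # (a)(3)(C)-(D): assumed rule, take the highest-weighted survivor.
                node.sense = candidates[0].sense
                newly_labeled.append((seg.segment_id, pos, node.sense))

    # (b): copy each new sense to the corresponding node of every reoccurrence
    # of the same text segment (assumes identical node order in reoccurrences).
    propagated = 0
    for seg_id, pos, sense in newly_labeled:
        for segments in kb.values():
            for seg in segments:
                if seg.segment_id == seg_id and seg.nodes[pos].sense is None:
                    seg.nodes[pos].sense = sense
                    propagated += 1
    return len(newly_labeled), propagated
```

Step (b) is what lets one local decision spread: a sense settled inside the "river" word subgraph also reaches the copy of the same dictionary text segment stored under another word's subgraph.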
Specification