System and method for semantic categorization
First Claim
1. A method comprising:
a) creating a set of text descriptions, wherein the set of text descriptions comprises, for each category in a category set, a corresponding text description for the category;
b) accepting the set of text descriptions;
c) identifying each word from a lexical data source which is related to a word in the set of text descriptions by less than a threshold number of semantic relations;
d) creating a build time set of word pairs, each word pair from the build time set of word pairs comprising a word from the identified words from the lexical data source and a word from the set of text descriptions;
e) using, without human intervention, a processor to assign lexical chaining confidence scores to each word pair from the build time set of word pairs;
f) accepting a text statement from an input source;
g) creating a run time set of word pairs, each word pair from the run time set of word pairs comprising a word from the accepted text statement, and a word from the set of text descriptions; and
h) determining at least one category corresponding to the accepted text statement based, at least in part, on said assigned lexical chaining confidence scores for word pairs from the build time set of word pairs corresponding to word pairs from the run time set of word pairs.
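Steps c) through h) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration, not taken from the claim: the toy `LEXICON` graph stands in for the lexical data source, the `DESCRIPTIONS` are invented, the score formula `1 / (1 + distance)` is one arbitrary choice, and summing pair scores per category is only one way to base the decision "at least in part" on the build time scores.

```python
from collections import deque

# Hypothetical lexical data source: each word maps to directly related words.
LEXICON = {
    "bill": ["invoice", "payment"],
    "invoice": ["bill"],
    "payment": ["bill", "pay"],
    "pay": ["payment"],
    "repair": ["fix", "service"],
    "fix": ["repair", "broken"],
    "broken": ["fix"],
    "service": ["repair"],
}

# Hypothetical category set with one text description per category.
DESCRIPTIONS = {
    "billing": "questions about a bill or payment",
    "repair": "request to repair a service",
}


def related_words(word, threshold):
    """Step c): words reachable from `word` in fewer than `threshold` relations."""
    distances = {word: 0}
    queue = deque([word])
    while queue:
        current = queue.popleft()
        if distances[current] + 1 >= threshold:
            continue  # one more relation would reach the threshold
        for neighbor in LEXICON.get(current, []):
            if neighbor not in distances:
                distances[neighbor] = distances[current] + 1
                queue.append(neighbor)
    return distances


def build_scores(descriptions, threshold=3):
    """Steps d) and e): score every (lexicon word, description word) pair."""
    scores = {}
    for text in descriptions.values():
        for desc_word in text.split():
            for word, dist in related_words(desc_word, threshold).items():
                scores[(word, desc_word)] = 1.0 / (1 + dist)  # assumed formula
    return scores


def categorize(statement, descriptions, scores):
    """Steps f) to h): sum build time scores over the run time word pairs."""
    totals = {}
    for category, text in descriptions.items():
        totals[category] = sum(
            scores.get((word, desc_word), 0.0)
            for word in statement.lower().split()
            for desc_word in text.split()
        )
    return max(totals, key=totals.get), totals
```

Calling `build_scores(DESCRIPTIONS)` once and then `categorize("my invoice is wrong", DESCRIPTIONS, scores)` selects the `billing` category in this toy setup, because `invoice` lies within two relations of both `bill` and `payment`.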
Abstract
A system and method for automatically performing semantic categorization are disclosed. In one embodiment, at least one text description pertaining to a category set is accepted, along with words that a user is anticipated to utter pertaining to that category set. A lexical chaining confidence score is attached to each word pair matched between the anticipated words and the accepted text description. These confidence scores are subsequently used by a categorization circuit that accepts a text phrase utterance from an input source, along with a category set pertaining to the accepted utterance. In one embodiment, the categorization circuit creates word pairs matched between the accepted text phrase utterance and the accepted category set. From these word pairs, the category pertaining to the utterance is determined based, at least in part, on the previously assigned lexical chaining confidence scores.
16 Claims
7. A system comprising:
a) a configurator configured to:
   i) accept at least one text description pertaining to a category set, wherein the at least one text descriptions comprises, for each category in the category set, a corresponding text description for the category, wherein the corresponding text description comprises one or more sentences describing the category;
   ii) accept words that are anticipated to be provided as input pertaining to said category set;
   iii) receive lexical data from a lexical data source, said lexical data pertaining to received ones of said accepted words and received ones of words in said at least one text description; and
b) a database for storing lexical confidence scores for use by said configurator based upon word pairs created by said configurator between said anticipated words and said words from said at least one text description, said lexical confidence scores based, at least in part, on lexical data received from said lexical data source;
c) a categorizer configured to:
   i) accept a text statement from an input source;
   ii) accept at least one category set pertaining to said accepted statement, said category set having a plurality of possible categories;
   iii) creating a run time set of word pairs by pairing words in the accepted text statement with words in the corresponding text description for a category from category set; and
   iv) calculate a category confidence score for the category from the category set by summing up lexical confidence scores stored in said database for the word pairs from the run time set of word pairs.
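The summation in element c) iv) can be sketched against a small database. This is only an illustration under stated assumptions: the in-memory SQLite store, the `pair_score` table, its column names, and the helper functions are all hypothetical, and the claim does not specify any particular database or schema.

```python
import sqlite3


def make_score_db(pair_scores):
    """Store (input word, description word) -> lexical confidence score rows."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE pair_score ("
        "input_word TEXT, description_word TEXT, score REAL, "
        "PRIMARY KEY (input_word, description_word))"
    )
    db.executemany(
        "INSERT INTO pair_score VALUES (?, ?, ?)",
        [(a, b, s) for (a, b), s in pair_scores.items()],
    )
    return db


def category_confidence(db, statement, description):
    """Sum the stored scores over every run time word pair for one category."""
    total = 0.0
    for input_word in statement.lower().split():
        for description_word in description.lower().split():
            row = db.execute(
                "SELECT score FROM pair_score "
                "WHERE input_word = ? AND description_word = ?",
                (input_word, description_word),
            ).fetchone()
            if row is not None:  # pairs absent from the database add nothing
                total += row[0]
    return total
```

In this sketch a word pair that was never stored at configuration time simply contributes zero, which is one possible reading of how unseen pairs could be treated.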
10. A system comprising:
a) a categorizer configured to:
   i) accept a text statement from an input source;
   ii) accept at least one category set pertaining to said accepted statement, said category set having a plurality of possible categories;
   iii) use a means for determining at least one category corresponding to said accepted statement based, at least in part, on assigned lexical chaining confidence scores between a set of created word pairs as obtained from a database, said confidence scores derived under control of a configurator;
b) the configurator, wherein the configurator is configured to:
   i) accept:
      at least one text description pertaining to a category in a category set; and
      words that are anticipated to be provided as input pertaining to said category set;
   ii) receive lexical data from a lexical data source, said lexical data pertaining to received ones of said accepted words and received ones of words in said text description; and
   iii) use a means for assigning said lexical chaining confidence scores based upon word pairs between said anticipated words and said words from said text description, said scores based, at least in part, on lexical data received from said lexical data source;
c) the database for storing lexical confidence scores for use by said configurator based upon said word pairs between said anticipated words and said words from said text description, said lexical confidence scores based, at least in part, on lexical data received from said lexical data source.
11. A system comprising:
a) a database;
b) one or more processors; and
c) a configurator, wherein the configurator is programmed to, when run using the one or more processors, perform a plurality of tasks comprising:
   i) accepting one or more of text descriptions, wherein each text description from the one or more text descriptions pertains to a category from a category set, and wherein each text description from the one or more text descriptions comprises one or more sentences describing the category that the text description pertains to;
   ii) using a lexical data source to derive a plurality of related word pairs and a plurality of confidence scores, wherein:
      1) each related word pair from the plurality of related word pairs is associated with a confidence score from the plurality of confidence scores;
      2) the lexical data source comprises a corpus of words, which corpus of words comprises each word from each text description from the one or more text descriptions;
      3) the lexical data source maintains data identifying semantic relations between words from the corpus of words;
      4) using the lexical data source comprises, for each word from the one or more text descriptions:
         identifying each word from the corpus of words which the data identifying semantic relations indicates is separated from the word from the one or more text descriptions by no more than a predetermined number of semantic relations;
         for each word from the corpus of words separated from the word from the one or more of text descriptions by no more than the predetermined number of semantic relations, creating a related word pair comprising the word from the one or more text descriptions and the word from the corpus; and
         associating a confidence score based at least in part on the number of semantic relations separating the word from the corpus and the word from the one or more text descriptions with the related word pair;
   iii) storing the plurality of related word pairs and the plurality of confidence scores in the database;
d) a categorizer, wherein the categorizer is programmed to, when run using the one or more processors, perform a plurality of run time tasks comprising:
   i) receiving a text representing an input from an external source, wherein the input is associated with a plurality of categories from the category set, and wherein the text comprises one or more input words;
   ii) creating one or more input word pairings, wherein each input word pairing comprises an input word from the one or more input words, and a word from a textual description pertaining to one of the plurality of categories associated with the input;
   iii) using the plurality of related word pairs stored in the database to determine an input confidence score for each input word pairing;
   iv) based at least in part on the input confidence scores, determining a plurality of category confidence scores, wherein each category confidence score corresponds to a category associated with the input; and
   v) identifying the input from the external source as matching one of the categories associated with the input, or as being a no match based on the plurality of category confidence scores.
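Run time task v) of the categorizer, the match or no match decision, might look like the following sketch. The claim does not say how the category confidence scores are turned into a decision, so the cutoff `min_confidence`, the function name, and the use of `None` to signal a no match are all assumptions.

```python
def identify_category(category_scores, min_confidence=0.5):
    """Return the best-scoring category, or None to signal a no match.

    `category_scores` maps each candidate category to its category
    confidence score; `min_confidence` is a hypothetical cutoff.
    """
    if not category_scores:
        return None
    best = max(category_scores, key=category_scores.get)
    if category_scores[best] < min_confidence:
        return None  # no category is confident enough
    return best
```

For example, `identify_category({"billing": 0.8, "repair": 0.1})` returns `"billing"`, while `identify_category({"billing": 0.2, "repair": 0.1})` returns `None`.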
14. A system comprising:
a) a database;
b) one or more processors;
c) a categorizer, wherein the categorizer is programmed to, when run using the one or more processors, perform a plurality of tasks comprising:
   i) receive one or more text versions of an input from an external source, wherein the input is associated with a plurality of categories, and wherein each of the text versions of the input from the external source comprises one or more input words;
   ii) creating one or more input word pairings, wherein each input word pairing comprises an input word from one of the one or more text versions of the input from the external source and a word from a textual description pertaining to one of the plurality of categories associated with the input from the external source, wherein the textual description comprises one or more sentences describing the category which it pertains to;
   iii) for each of the one or more input word pairings, determining a maximum lexical chain confidence score based at least in part on a semantic distance between the words from the word pairing as indicated in a lexical data source; wherein the maximum lexical confidence score is determined by finding a best lexical chain from plural lexical chains obtained from the lexical data source;
   iv) based at least in part on the maximum lexical chain confidence scores, determining a plurality of category confidence scores, wherein each category confidence score corresponds to a category associated with the input from the external source; and
   v) identifying the input from the external source as matching one or more of the associated categories, or as being a no match based at least in part on the plurality of category confidence scores.
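Task iii), finding the best lexical chain among several chains that connect the two words of a pairing, can be sketched as follows. The weighted `RELATIONS` graph, the rule that a chain's score is the product of its relation weights, and the `max_relations` limit are assumptions chosen for illustration; the claim only requires that the score depend at least in part on semantic distance.

```python
# Hypothetical lexical data source: weighted semantic relations between words.
RELATIONS = {
    "car": {"vehicle": 0.9, "wheel": 0.6},
    "vehicle": {"car": 0.9, "truck": 0.9},
    "wheel": {"car": 0.6, "truck": 0.6},
    "truck": {"vehicle": 0.9, "wheel": 0.6},
}


def lexical_chains(source, target, max_relations=3, _path=None):
    """Yield every loop-free chain from source to target within the limit."""
    path = (_path or []) + [source]
    if source == target:
        yield path
        return
    if len(path) > max_relations:
        return  # the chain already uses max_relations relations
    for neighbor in RELATIONS.get(source, {}):
        if neighbor not in path:
            yield from lexical_chains(neighbor, target, max_relations, path)


def chain_score(chain):
    """Assumed scoring rule: multiply the weights of the relations in the chain."""
    score = 1.0
    for a, b in zip(chain, chain[1:]):
        score *= RELATIONS[a][b]
    return score


def max_chain_confidence(source, target, max_relations=3):
    """Score of the best chain, or 0.0 when no chain connects the two words."""
    chains = lexical_chains(source, target, max_relations)
    return max((chain_score(c) for c in chains), default=0.0)
```

With this toy graph, `car` and `truck` are connected by two chains, one through `vehicle` and one through `wheel`, and the chain through `vehicle` wins because its relation weights are higher.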
Specification