System and method for semantic categorization

US 8,380,511 B2
Filed: 02/20/2007
Issued: 02/19/2013
Est. Priority Date: 02/20/2007
Status: Active Grant



Abstract

There is disclosed a system and method for automatically performing semantic categorization. In one embodiment, at least one text description pertaining to a category set is accepted, along with words that are anticipated to be uttered by a user pertaining to that category set; a lexical chaining confidence score is attached to each pair matched between the anticipated words and the accepted text description. These confidence scores are subsequently used by a categorization circuit that accepts a text phrase utterance from an input source along with a category set pertaining to the accepted utterance. The categorization circuit, in one embodiment, creates word pairs matched between the accepted text phrase utterance and the accepted category set. From these word scores, the category pertaining to the utterance is determined based, at least in part, on the assigned lexical chaining confidence scores as previously determined.

16 Claims

1. A method comprising:
- a) creating a set of text descriptions, wherein the set of text descriptions comprises, for each category in a category set, a corresponding text description for the category;
  
  b) accepting the set of text descriptions;
  
  c) identifying each word from a lexical data source which is related to a word in the set of text descriptions by less than a threshold number of semantic relations;
  
  d) creating a build time set of word pairs, each word pair from the build time set of word pairs comprising a word from the identified words from the lexical data source and a word from the set of text descriptions;
  
  e) using, without human intervention, a processor to assign lexical chaining confidence scores to each word pair from the build time set of word pairs;
  
  f) accepting a text statement from an input source;
  
  g) creating a run time set of word pairs, each word pair from the run time set of word pairs comprising a word from the accepted text statement, and a word from the set of text descriptions; and
  
  h) determining at least one category corresponding to the accepted text statement based, at least in part, on said assigned lexical chaining confidence scores for word pairs from the build time set of word pairs corresponding to word pairs from the run time set of word pairs.
- - 2. The method of claim 1 wherein said assigned confidence scores are stored in a natural language processing (NLP) database.
  - 3. The method of claim 2 wherein said determining comprises:
    - accessing said NLP database for said confidence scores.
  - 4. The method of claim 1 wherein:
    - a) creating the run time set of word pairs comprises, for each category in the category set, pairing each word from the accepted text statement with each word from the text description corresponding to the category set;
      
      b) determining at least one category corresponding to the accepted text statement comprises;
      
      i) for each category in the category set, creating a categorization confidence score by combining the lexical chain confidence scores for word pairs from the build time set of word pairs which correspond to word pairs from the run time set of word pairs created by pairing a word from the accepted text statement with a word from the text description corresponding to the category; and
      
      ii) identifying the category from the category set with the highest categorization confidence score.
  - 5. The method of claim 1 wherein using the processor to assign lexical chaining confidence scores comprises:
    - accessing a lexical database.
  - 6. The method of claim 1 wherein said text statement is derived from an audio response using automatically generated statistical language models (SLMs).
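
The run-time part of claims 1 and 4 can be pictured with a minimal Python sketch. It is not the patentees' implementation: the category names, descriptions, word pairs, and score values below are made up for illustration, and whitespace tokenization is an assumption. The build-time scores of steps (d) and (e) are taken as already computed and stored in a dictionary.

```python
# Text descriptions, one per category (step a); all wording is illustrative.
DESCRIPTIONS = {
    "billing": "question about a bill or payment",
    "repair": "report a broken device",
}

# Build-time word pairs with lexical chaining confidence scores (steps d-e).
# Keys are (input-side word, description word); the values are made-up scores.
BUILD_TIME_SCORES = {
    ("invoice", "bill"): 0.9,
    ("pay", "payment"): 0.8,
    ("cracked", "broken"): 0.7,
    ("phone", "device"): 0.6,
}


def categorize(statement, descriptions=DESCRIPTIONS, scores=BUILD_TIME_SCORES):
    """Return (best_category, per-category totals) for a text statement."""
    statement_words = statement.lower().split()
    totals = {}
    for category, description in descriptions.items():
        total = 0.0
        # Run-time pairs: every statement word with every description word (step g).
        for s_word in statement_words:
            for d_word in description.lower().split():
                # Only pairs that were scored at build time contribute (step h).
                total += scores.get((s_word, d_word), 0.0)
        # Claim 4(b)(i): combine the pair scores; summation is one possible choice.
        totals[category] = total
    # Claim 4(b)(ii): pick the category with the highest combined score.
    best = max(totals, key=totals.get)
    return best, totals
```

With these toy values, `categorize("pay my invoice")` selects `"billing"` and `categorize("my phone is cracked")` selects `"repair"`. Summation is used here because claim 7 recites it; claim 4 only requires that the scores be combined.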

7. A system comprising:
- a) a configurator configured to;
  
  i) accept at least one text description pertaining to a category set, wherein the at least one text descriptions comprises, for each category in the category set, a corresponding text description for the category, wherein the corresponding text description comprises one or more sentences describing the category;
  
  ii) accept words that are anticipated to be provided as input pertaining to said category set;
  
  iii) receive lexical data from a lexical data source, said lexical data pertaining to received ones of said accepted words and received ones of words in said at least one text description; and
  
  b) a database for storing lexical confidence scores for use by said configurator based upon word pairs created by said configurator between said anticipated words and said words from said at least one text description, said lexical confidence scores based, at least in part, on lexical data received from said lexical data source;
  
  c) a categorizer configured to;
  
  i) accept a text statement from an input source;
  
  ii) accept at least one category set pertaining to said accepted statement, said category set having a plurality of possible categories;
  
  iii) creating a run time set of word pairs by pairing words in the accepted text statement with words in the corresponding text description for a category from category set; and
  
  iv) calculate a category confidence score for the category from the category set by summing up lexical confidence scores stored in said database for the word pairs from the run time set of word pairs.
- - 8. The system of claim 7 wherein the categorizer is configured to use a means for determining a category corresponding to said accepted statement based, at least in part, on said assigned lexical chaining confidence scores between said created word pairs as obtained from said database.
  - 9. The system of claim 8 further comprising the means for determining a category corresponding to said accepted statement based, at least in part, on said assigned lexical chaining confidence scores between said created word pairs as obtained from said database.

10. A system comprising:
- a) a categorizer configured to;
  
  i) accept a text statement from an input source;
  
  ii) accept at least one category set pertaining to said accepted statement, said category set having a plurality of possible categories;
  
  iii) use a means for determining at least one category corresponding to said accepted statement based, at least in part, on assigned lexical chaining confidence scores between a set of created word pairs as obtained from a database, said confidence scores derived under control of a configurator;
  
  b) the configurator, wherein the configurator is configured to;
  
  i) accept;
  
  at least one text description pertaining to a category in a category set; and
  
  words that are anticipated to be provided as input pertaining to said category set;
  
  ii) receive lexical data from a lexical data source, said lexical data pertaining to received ones of said accepted words and received ones of words in said text description; and
  
  iii) use a means for assigning said lexical chaining confidence scores based upon word pairs between said anticipated words and said words from said text description, said scores based, at least in part, on lexical data received from said lexical data source;
  
  c) the database for storing lexical confidence scores for use by said configurator based upon said word pairs between said anticipated words and said words from said text description, said lexical confidence scores based, at least in part, on lexical data received from said lexical data source.

11. A system comprising:
- a) a database;
  
  b) one or more processors; and
  
  c) a configurator, wherein the configurator is programmed to, when run using the one or more processors, perform a plurality of tasks comprising;
  
  i) accepting one or more of text descriptions, wherein each text description from the one or more text descriptions pertains to a category from a category set, and wherein each text description from the one or more text descriptions comprises one or more sentences describing the category that the text description pertains to;
  
  ii) using a lexical data source to derive a plurality of related word pairs and a plurality of confidence scores, wherein;
  
  1) each related word pair from the plurality of related word pairs is associated with a confidence score from the plurality of confidence scores;
  
  2) the lexical data source comprises a corpus of words, which corpus of words comprises each word from each text description from the one or more text descriptions;
  
  3) the lexical data source maintains data identifying semantic relations between words from the corpus of words;
  
  4) using the lexical data source comprises, for each word from the one or more text descriptions;
  
  identifying each word from the corpus of words which the data identifying semantic relations indicates is separated from the word from the one or more text descriptions by no more than a predetermined number of semantic relations;
  
  for each word from the corpus of words separated from the word from the one or more of text descriptions by no more than the predetermined number of semantic relations, creating a related word pair comprising the word from the one or more text descriptions and the word from the corpus; and
  
  associating a confidence score based at least in part on the number of semantic relations separating the word from the corpus and the word from the one or more text descriptions with the related word pair;
  
  iii) storing the plurality of related word pairs and the plurality of confidence scores in the database;
  
  d) a categorizer, wherein the categorizer is programmed to, when run using the one or more processors, perform a plurality of run time tasks comprising;
  
  i) receiving a text representing an input from an external source, wherein the input is associated with a plurality of categories from the category set, and wherein the text comprises one or more input words;
  
  ii) creating one or more input word pairings, wherein each input word pairing comprises an input word from the one or more input words, and a word from a textual description pertaining to one of the plurality of categories associated with the input;
  
  iii) using the plurality of related word pairs stored in the database to determine an input confidence score for each input word pairing;
  
  iv) based at least in part on the input confidence scores, determining a plurality of category confidence scores, wherein each category confidence score corresponds to a category associated with the input; and
  
  v) identifying the input from the external source as matching one of the categories associated with the input, or as being a no match based on the plurality of category confidence scores.
- - 12. The system of claim 11, wherein the categorizer identifies the input from external source as matching one of the categories associated with the input, or as being a no match using a means for categorizing input using the database.
  - 13. The system of claim 11, wherein:
    - (a) the data identifying semantic relations maintained by the lexical data source comprises semantic relations which can be used to link any two words from the corpus of words into a lexical chain;
      
      (b) identifying each word from the corpus of words which the data identifying semantic relations indicates is separated from the word from the one or more text descriptions by no more than a predetermined number of semantic relations comprises identifying each word from the corpus of words which the data identifying semantic relations indicates can be linked to the word from the one or more text descriptions by a lexical chain having a length of no more than the predetermined number of semantic relations; and
      
      (c) the predetermined number of semantic relations is three.
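
The configurator step of claims 11 and 13 (finding every word within a predetermined number of semantic relations, here three, and scoring each pair by chain length) can be sketched as a breadth-first search. This is an illustrative sketch only: the adjacency-dictionary form of the lexical data source, the toy word graph, the `1 / (1 + distance)` score, and the no-match threshold are all assumptions, since the claims only require a score based at least in part on the number of relations.

```python
from collections import deque

# Hypothetical lexical data source: each word maps to its directly related words.
RELATIONS = {
    "bill": ["invoice"],
    "invoice": ["bill", "receipt"],
    "receipt": ["invoice", "purchase"],
    "purchase": ["receipt", "store"],
    "store": ["purchase"],
}


def related_words(relations, start, max_relations=3):
    """Map each word within max_relations links of start to its chain length."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        word = queue.popleft()
        if distances[word] == max_relations:
            continue  # do not extend chains past the limit
        for neighbor in relations.get(word, ()):
            if neighbor not in distances:
                distances[neighbor] = distances[word] + 1
                queue.append(neighbor)
    return distances


def build_pairs(relations, description_words, max_relations=3):
    """Create related word pairs; the decaying score is an assumed formula."""
    pairs = {}
    for d_word in description_words:
        for word, dist in related_words(relations, d_word, max_relations).items():
            pairs[(word, d_word)] = 1.0 / (1 + dist)
    return pairs


def match_or_no_match(category_scores, threshold=0.5):
    """Return the best category, or None for a no match (assumed threshold)."""
    best = max(category_scores, key=category_scores.get)
    return best if category_scores[best] >= threshold else None
```

In this toy graph, "purchase" is three relations from "bill" and is paired with it, while "store" is four relations away and is left out.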

14. A system comprising:
- a) a database;
  
  b) one or more processors;
  
  c) a categorizer, wherein the categorizer is programmed to, when run using the one or more processors, perform a plurality of tasks comprising;
  
  i) receive one or more text versions of an input from an external source, wherein the input is associated with a plurality of categories, and wherein each of the text versions of the input from the external source comprises one or more input words;
  
  ii) creating one or more input word pairings, wherein each input word pairing comprises an input word from one of the one or more text versions of the input from the external source and a word from a textual description pertaining to one of the plurality of categories associated with the input from the external source, wherein the textual description comprises one or more sentences describing the category which it pertains to;
  
  iii) for each of the one or more input word pairings, determining a maximum lexical chain confidence score based at least in part on a semantic distance between the words from the word pairing as indicated in a lexical data source;
  
  wherein the maximum lexical confidence score is determined by finding a best lexical chain from plural lexical chains obtained from the lexical data source;
  
  iv) based at least in part on the maximum lexical chain confidence scores, determining a plurality of category confidence scores, wherein each category confidence score corresponds to a category associated with the input from the external source; and
  
  v) identifying the input from the external source as matching one or more of the associated categories, or as being a no match based at least in part on the plurality of category confidence scores.
- - 15. The system of claim 14, wherein:
    - a) receiving one or more text versions of the input from the external source comprises receiving a plurality of text versions of a user utterance;
      
      b) the plurality of text versions of the user utterance comprise different potential transcriptions of the user utterance;
      
      c) determining the plurality of category confidence scores comprises, for each category from the plurality of categories;
      
      i) determining a lexical chain confidence score corresponding to that category for each of the one or more text versions of the user utterance; and
      
      ii) using a majority voting algorithm and the lexical chain confidence scores for each of the one or more text versions of the user utterance to determine the category confidence score.
  - 16. The system of claim 14, wherein the categorizer is configured to use a means for categorizing an input without use of a natural language processing database.
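
Two ideas in claims 14 and 15 lend themselves to a short illustration: taking the best of several lexical chains for a word pair, and voting across alternative transcriptions of one utterance. The sketch below is one possible reading, not the claimed implementation. The `1 / (1 + length)` score, the rule that each transcription votes for its top category, and the use of vote share as the category confidence score are all assumptions.

```python
from collections import Counter


def best_chain_score(chain_lengths):
    """Score a word pair by its best (shortest) chain; the formula is assumed."""
    if not chain_lengths:
        return 0.0  # no chain links the two words
    return max(1.0 / (1 + length) for length in chain_lengths)


def vote_category_scores(per_transcription_scores):
    """Turn per-transcription category scores into vote shares per category.

    per_transcription_scores is a list with one dict per text version of the
    utterance, each mapping category -> lexical chain confidence score.
    """
    # Each transcription casts one vote for its highest-scoring category.
    votes = Counter(max(s, key=s.get) for s in per_transcription_scores)
    n = len(per_transcription_scores)
    categories = per_transcription_scores[0].keys()
    return {c: votes[c] / n for c in categories}
```

For example, if two of three candidate transcriptions score highest for a hypothetical "billing" category, that category receives a vote share of two thirds and would be chosen over the alternatives.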


Current Assignee: Intervoice Limited Partnership (a Nevada limited partnership composed of, as its sole partner, Intervoice GP Incorporated); Lymba Corp.
Original Assignee: Intervoice Limited Partnership; Lymba Corp.
Inventors: Cave, Ellis K.; Balakrishna, Mithun; Mo, Vincent
Primary Examiner(s): He, Jialong
Application Number: US11/676,704
Publication Number: US 20080201133A1
Time in Patent Office: 2,191 Days
Field of Search: 704/270, 704/270.1, 704/275, 704/9, 704/238, 704/240, 704/251
US Class Current: 704/270
CPC Class Codes: G10L 15/1815 Semantic context, e.g. disa...
