Statistical language model trained with semantic variants

US 7,912,702 B2
Filed: 10/31/2007
Issued: 03/22/2011
Est. Priority Date: 11/12/1999
Status: Expired due to Fees

Abstract

An intelligent query system for processing voiced-based queries is disclosed, which uses a combination of both statistical and semantic based processing to identify the question posed by the user by understanding the meaning of the user's utterance. Based on identifying the meaning of the utterance, the system selects a single answer that best matches the user's query. The answer that is paired to this single question is then retrieved and presented to the user. The system, as implemented, accepts environmental variables selected by the user and is scalable to provide answers to a variety and quantity of user-initiated queries.

18 Claims

1. A method of generating a statistical language model (SLM) grammar for a task domain which includes semantically variant words and phrases, the method comprising the steps of:
- (a) providing a set of content words which can be associated with user questions in the task domain; and
  
  using a computer system;
  
  (b) determining semantic variants for each word in said set of content words;
  
  wherein said semantic variants include at least synonyms;
  
  (c) forming a semantic set of questions related to said user questions based on said synonyms;
  
  (d) performing semantic decoding on said semantic set of questions, to identify a disambiguated set of questions; and
  
  (e) configuring n-gram probabilities for words and phrases in said SLM grammar based on said set of disambiguated questions;
  
  wherein said SLM grammar is configured to recognize semantic variants of questions posed to a natural language speech recognition engine.
- Dependent claims (2-13):
  - 2. The method of claim 1, wherein said set of content words are extracted automatically from files containing text transcriptions of user utterances.
  - 3. The method of claim 1, further including a step:
    - adjusting n-gram probabilities of the SLM grammar for the task domain based on observation count data from a second larger SLM grammar with a different set of words and phrases.
  - 4. The method of claim 1, wherein said n-gram probabilities are based on bi-grams.
  - 5. The method of claim 1, wherein said semantic variants also include at least hyponyms and/or hypernyms.
  - 6. The method of claim 1, wherein said semantic variants are determined from WORDNET.
  - 7. The method of claim 1, wherein phrasal synonyms are also determined and used for said set of disambiguated questions.
  - 8. The method of claim 1, wherein said semantic decoding considers term frequency, coverage and semantic similarity to compute a semantic distance.
  - 9. The method of claim 1, wherein said topic questions are derived automatically by analyzing said content of said task domain.
  - 10. The method of claim 1, further including a step:
    - embedding steps (a) through (c) as routines into a software data preparation tool.
  - 11. The method of claim 1, further including a step:
    - processing a new user question and correlating the same to one or more of said set of disambiguated questions.
  - 12. The method of claim 1, further including a step:
    - providing one or more answers to said new user question.
  - 13. The method of claim 1, wherein the semantic variants are determined using a lexical dictionary.
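
Steps (b), (c) and (e) of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the synonym table `SYNONYMS`, the helper names `expand_question` and `bigram_probabilities`, and the use of unsmoothed maximum-likelihood bigram estimates (claim 4 mentions bi-grams) are all assumptions made for illustration. A real system would presumably draw variants from a lexical dictionary such as WordNet (claims 6 and 13) and would apply the semantic decoding of step (d) before counting.

```python
from collections import Counter
from itertools import product

# Hypothetical synonym table standing in for a lexical dictionary.
SYNONYMS = {
    "price": ["price", "cost"],
    "book": ["book", "volume"],
}

def expand_question(question):
    """Return every variant of `question` obtained by swapping in synonyms."""
    choices = [SYNONYMS.get(word, [word]) for word in question.split()]
    return [" ".join(combo) for combo in product(*choices)]

def bigram_probabilities(questions):
    """Maximum-likelihood bigram estimates P(w2 | w1), with sentence markers."""
    bigrams, contexts = Counter(), Counter()
    for q in questions:
        tokens = ["<s>"] + q.split() + ["</s>"]
        for w1, w2 in zip(tokens, tokens[1:]):
            bigrams[(w1, w2)] += 1
            contexts[w1] += 1
    return {pair: n / contexts[pair[0]] for pair, n in bigrams.items()}

# One seed question expands into four semantic variants.
variants = expand_question("what is the price of the book")
probs = bigram_probabilities(variants)
```

In this toy run the four variants split the probability mass that follows "the" evenly, so `probs[("the", "cost")]` is 0.25 even though only "price" appeared in the seed question.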

14. A speech processing system that implements a statistical language model (SLM) grammar for a task domain which includes semantically variant words and phrases, comprising:
- a computing system; and
  
  one or more data repositories associated with the computing system, the one or more data repositories storing the SLM grammar, wherein the SLM grammar includes
  
  (a) a set of content words which can be associated with user questions in the task domain;
  
  (b) a set of semantic variants for each word in said set of content words;
  
  wherein said semantic variants include at least synonyms; and
  
  (c) a disambiguated set of questions which are based on a semantic set of questions related to said user questions based on said synonyms;
  
  wherein the SLM grammar includes n-gram probabilities for words and phrases which are configured based on said set of disambiguated questions;
  
  and further wherein said SLM grammar is configured to recognize semantic variants of questions posed to a natural language speech recognition engine.

15. A method of generating a statistical language model (SLM) grammar for a task domain which includes semantically variant words and phrases, comprising:
- using a computing system;
  
  determining semantic variants for each word in a set of content words associated with user questions in a task domain using a lexical dictionary, the semantic variants including at least synonyms;
  
  forming a semantic set of questions related to the user questions based on the semantic variants;
  
  performing semantic decoding on the semantic set of questions to identify a disambiguated set of questions; and
  
  configuring n-gram probabilities for words and phrases in the SLM grammar based on the set of disambiguated questions.
- Dependent claims (16-18):
  - 16. The method of claim 15, wherein the semantic variants further include one or both of hyponyms or hypernyms.
  - 17. The method of claim 15, further comprising determining semantic variants for one or more phrases in the set of content words.
  - 18. The method of claim 15, wherein the lexical dictionary is WORDNET.
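
Claims 1 and 15 both recite semantic decoding, and claim 8 says it considers term frequency, coverage and semantic similarity to compute a semantic distance. The claims give no formula, so the Python sketch below is only one possible reading. The synonym groups in `SYNSETS`, the helper names, and the distance itself (one minus the frequency-weighted share of reference words covered by the candidate) are assumptions, not the patent's method.

```python
from collections import Counter

# Hypothetical synonym groups; a real system might consult a lexical dictionary.
SYNSETS = [{"price", "cost"}, {"book", "volume"}]

def similar(a, b):
    """Treat two words as similar if they are equal or share a synonym group."""
    return a == b or any(a in s and b in s for s in SYNSETS)

def semantic_distance(candidate, reference):
    """Toy distance in [0, 1]; 0 means every reference word is covered."""
    cand = candidate.split()
    ref = reference.split()
    ref_counts = Counter(ref)  # term frequency of each reference word
    covered = sum(
        count
        for word, count in ref_counts.items()
        if any(similar(word, c) for c in cand)
    )
    return 1.0 - covered / len(ref)

def decode(candidate, references):
    """Return the reference question closest to the candidate."""
    return min(references, key=lambda r: semantic_distance(candidate, r))

references = ["what is the price of the book", "who wrote the book"]
best = decode("what is the cost of the volume", references)
```

Here `best` is the first reference question, because the synonym groups let "cost" and "volume" cover "price" and "book", giving a distance of 0.0 against 0.5 for the second question.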

Current Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee: Phoenix Solutions Incorporated (CDC Corporation)
Inventors: Bennett, Ian
Primary Examiner(s): Lerner, Martin
Application Number: US11/932,773
Publication Number: US 20080052078A1
Time in Patent Office: 1,238 Days
Field of Search: 704/9, 704/10, 704/243, 704/255, 704/257, 704/275, 707/706, 707/759, 707/769
US Class Current: 704/9
CPC Class Codes:
G06F 16/24522   Translation of natural lang...
G06F 16/3329   Natural language query form...
G06F 16/3344   using natural language anal...
G06F 40/216   using statistical methods
G06F 40/237   Lexical tools
G06F 40/289   Phrasal analysis, e.g. fini...
G06F 40/30   Semantic analysis
G06F 40/42   Data-driven translation
G09B 19/06   Foreign languages with audi...
G09B 5/04   with audible presentation o...
G09B 7/00   Electrically-operated teach...
G09B 7/02   of the type wherein the stu...
G10L 15/142   Hidden Markov Models [HMMs]
G10L 15/18   using natural language mode...
G10L 15/1815   Semantic context, e.g. disa...
G10L 15/183   using context dependencies,...
G10L 15/22   Procedures used during a sp...
G10L 15/285   Memory allocation or algori...
G10L 15/30   Distributed recognition, e....
G10L 2015/223   Execution procedure of a sp...
H04M 2250/74   with voice recognition means
