Method and system for semantic speech recognition

US 6,937,983 B2
Filed: 10/15/2001
Issued: 08/30/2005
Est. Priority Date: 12/20/2000
Status: Expired due to Term

Abstract

The present invention discloses a computer-implemented method to understand queries or commands spoken by users when they use natural language utterances similar to those that people use spontaneously to communicate. More precisely, the invention discloses a method in which the speech recognition system itself identifies user queries or commands from the general information contained in spoken utterances, rather than a post-process as is conventionally used. In a preparation phase, a vocabulary of items representing data and semantic identifiers is created, as well as a syntax module containing valid combinations of items. When the system is in use, a user utterance is first discretized into a plurality of basic speech units, which are compared to the items in the vocabulary, and a combination of items is selected according to the evaluation from the syntax module in order to generate the most likely sequence of items representative of the user utterance. Finally, the semantic identifiers and the data extracted from the user utterance are used to call the appropriate function that processes the user request.
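
The flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the vocabulary, the phone-like unit symbols, the item kinds, and the handler table are all hypothetical, and exact matching of unit sequences stands in for the probabilistic acoustic matching a real recognizer would use.

```python
# Toy sketch only: every name and data value below is a hypothetical illustration.

# Vocabulary of items. Each item is either a concept code (a semantic
# identifier) or tagged data, paired with pronunciations written as
# sequences of basic speech units (phone-like symbols here).
VOCABULARY = {
    ("CONCEPT", "CALL"): [("k", "ao", "l"), ("d", "ay", "ax", "l")],
    ("NAME", "john"): [("jh", "aa", "n")],
    ("NAME", "mary"): [("m", "eh", "r", "iy")],
}

# Syntax module: valid combinations, written as sequences of item kinds.
VALID_COMBINATIONS = {("CONCEPT", "NAME")}

# Functions to call, keyed by concept code.
HANDLERS = {"CALL": lambda name: f"calling {name}"}


def segmentations(units):
    """Yield every way to cover `units` exactly with vocabulary pronunciations."""
    if not units:
        yield ()
        return
    for item, pronunciations in VOCABULARY.items():
        for pron in pronunciations:
            if units[:len(pron)] == pron:
                for rest in segmentations(units[len(pron):]):
                    yield (item,) + rest


def understand(units):
    """Return the first item combination accepted by the syntax module, or None."""
    for combo in segmentations(tuple(units)):
        kinds = tuple(kind for kind, _ in combo)
        if kinds in VALID_COMBINATIONS:
            return combo
    return None


def dispatch(combo):
    """Call the handler named by the concept code, passing the extracted data."""
    concept = next(value for kind, value in combo if kind == "CONCEPT")
    data = [value for kind, value in combo if kind != "CONCEPT"]
    return HANDLERS[concept](*data)
```

In this sketch, `understand(["d", "ay", "ax", "l", "m", "eh", "r", "iy"])` maps the units straight to a concept code plus tagged data, with no intermediate word transcript, and `dispatch` then selects the function from the concept code.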

36 Claims

1. A computer implemented speech recognition method for performing Natural Language Understanding (NLU) functions, comprising the steps of:
- (a) converting a user utterance directly into a plurality of basic speech units without converting the utterance into a sequence of textually represented words, said user utterance being a sequence of words expressing a query or a command;
  
  (b) matching said plurality of basic speech units against a plurality of combinations of items, wherein each item is tagged data or is a concept code; and
  
  (c) generating a combination of items likely to be representative of said user utterance.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16)
- - 2. The method of claim 1, said step (b) further comprising:
    - (d) a first step of matching said plurality of basic speech units against a vocabulary of items to generate a first list of items likely to be representative of said user utterance.
  - 3. The method of claim 2, wherein said step (d) is performed using Hidden Markov Models.
  - 4. The method of claim 2, said step (b) further comprising:
    - (e) a second step of matching said first list of items against said plurality of combinations of items to generate said combination of items likely to be representative of said user utterance in said step (c).
  - 5. The method of claim 4, wherein said step (e) is processed using a conceptual language model.
  - 6. The method of claim 5, wherein said conceptual language model is an n-gram conceptual language model.
  - 7. The method of claim 6, further comprising an initial step of training said conceptual language model.
  - 8. The method of claim 4, wherein said step (c) is processed using a conceptual grammar.
  - 9. The method of claim 2, further comprising:
    - a training step defining said vocabulary of items of said step (d).
  - 10. The method of claim 1, further comprising:
    - defining said plurality of combinations of items of said step (c) in a training step.
  - 11. The method of claim 9, further comprising:
    - defining said plurality of combinations of items of said step (c) in a training step.
  - 12. The method of claim 1, further comprising:
    - storing a set of prototype acoustic models obtained from a training phase, wherein each said acoustic model represents one or more possible basic speech units of an utterance of a word.
  - 13. The method of claim 12, further comprising:
    - assigning one of said acoustic models to each said basic speech unit.
  - 14. The method of claim 1, wherein said user utterance is in the form of isolated data.
  - 15. The method of claim 1, wherein said tagged data includes a plurality of segmentable data elements.
  - 16. The method of claim 1, further comprising:
    - sending said most likely combination of items to a function identification module to perform said user query or command.
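
Claims 5 to 7 refer to a trained n-gram conceptual language model used to match candidate items against valid combinations. The claims do not specify the model further, so the following is only a sketch under stated assumptions: a bigram model with add-one smoothing, trained on hypothetical sequences of item labels (`CALL`, `NAME`, and so on are invented for illustration).

```python
import math
from collections import Counter


def train_bigram(sequences):
    """Count bigrams over item sequences (the 'initial training step')."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for seq in sequences:
        padded = ["<s>"] + list(seq) + ["</s>"]
        vocab.update(padded)
        for prev, cur in zip(padded, padded[1:]):
            contexts[prev] += 1
            bigrams[(prev, cur)] += 1
    return contexts, bigrams, len(vocab)


def log_prob(seq, model):
    """Log-probability of an item sequence under add-one smoothing."""
    contexts, bigrams, vocab_size = model
    padded = ["<s>"] + list(seq) + ["</s>"]
    return sum(
        math.log((bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size))
        for prev, cur in zip(padded, padded[1:])
    )


def best_combination(candidates, model):
    """Pick the candidate combination of items the model scores highest."""
    return max(candidates, key=lambda seq: log_prob(seq, model))


# Hypothetical training data: sequences of item labels, not words.
TRAINING = [
    ["CALL", "NAME"],
    ["CALL", "NAME"],
    ["CALL", "NUMBER"],
    ["SEND", "NAME"],
]
MODEL = train_bigram(TRAINING)
```

With this toy data, `best_combination([["NAME", "CALL"], ["CALL", "NAME"]], MODEL)` prefers the ordering seen in training. The point of the sketch is that the model scores sequences of items (concept codes and tagged data), not sequences of words.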

17. A speech recognition system for performing Natural Language Understanding, said system comprising:
- (a) a converter, said converter directly converting a user utterance into a plurality of basic speech units without converting the utterance into a sequence of textually represented words, said user utterance being a sequence of words expressing a query or a command;
  
  (b) a processor, said processor matching said plurality of basic speech units against a plurality of combinations of items, wherein each item is tagged data or is a concept code; and
  
  (c) a generator, said generator generating a combination of items likely to be representative of said user utterance.

18. A speech recognition system for performing Natural Language Understanding, said system comprising:
- an acoustic processor, said acoustic processor for receiving a user spoken utterance and directly determining a string of labels identifying a corresponding sound of said user spoken utterance without converting the utterance into a sequence of textually represented words;
  
  a decoder communicatively linked to said acoustic processor, said decoder determining a likely sequence of items corresponding to said determined string of labels;
  
  a conceptual pronunciation dictionary providing said decoder with a pronunciation of said items;
  
  a conceptual syntax module providing said decoder with a set of allowable combined items; and
  
  a target function identification module communicatively linked to said decoder, said target function identification module executing a function corresponding to said likely sequence of items.
- View Dependent Claims (19, 20)
- - 19. The system of claim 18, wherein said decoder comprises a fast acoustic match and a detailed acoustic match.
  - 20. The system of claim 18, wherein said conceptual syntax module comprises a conceptual language model or a conceptual grammar.
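
Claim 19 recites a decoder with a fast acoustic match and a detailed acoustic match but does not say how either is computed. A common reading is that a cheap first pass shortlists candidates and a costlier second pass rescores them. The sketch below assumes that reading and uses stand-in scoring: label-set overlap for the fast pass and edit distance over label strings for the detailed pass. The dictionary entries and label symbols are hypothetical.

```python
# Hypothetical conceptual pronunciation dictionary: item -> label string.
DICTIONARY = {
    "CALL": ("k", "ao", "l"),
    "CANCEL": ("k", "ae", "n", "s", "ax", "l"),
    "MAIL": ("m", "ey", "l"),
    "REDIAL": ("r", "iy", "d", "ay", "ax", "l"),
}


def edit_distance(a, b):
    """Levenshtein distance between two label sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


def fast_match(labels, dictionary, shortlist_size=2):
    """Cheap pass: rank items by shared labels, ties broken alphabetically."""
    observed = set(labels)
    ranked = sorted(
        dictionary,
        key=lambda item: (-len(observed & set(dictionary[item])), item),
    )
    return ranked[:shortlist_size]


def detailed_match(labels, candidates, dictionary):
    """Costlier pass: pick the shortlisted item closest in edit distance."""
    return min(candidates, key=lambda item: edit_distance(labels, dictionary[item]))


def decode_item(labels):
    """Run the fast pass, then the detailed pass, on one label string."""
    shortlist = fast_match(labels, DICTIONARY)
    return detailed_match(labels, shortlist, DICTIONARY)
```

The two-pass split keeps the expensive comparison off most of the dictionary; in a real decoder both passes would score acoustic likelihoods rather than symbol overlap.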

21. A machine-readable storage, having stored thereon a computer program having a plurality of code sections executable by a machine for causing the machine to perform the steps of:
- (a) converting a user utterance directly into a plurality of basic speech units without converting the utterance into a sequence of textually represented words, said user utterance being a sequence of words expressing a query or a command;
  
  (b) matching said plurality of basic speech units against a plurality of combinations of items, wherein each item is tagged data or is a concept code; and
  
  (c) generating a combination of items likely to be representative of said user utterance.
- View Dependent Claims (22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36)
- - 22. The machine-readable storage of claim 21, said step (b) further comprising:
    - (d) a first step of matching said plurality of basic speech units against a vocabulary of terms to generate a first list of items likely to be representative of said user utterance.
  - 23. The machine-readable storage of claim 22, wherein said step (d) is performed using Hidden Markov Models.
  - 24. The machine-readable storage of claim 22, said step (b) further comprising:
    - (e) a second step of matching said first list of items against said plurality of combinations of items to generate said combination of items likely to be representative of said user utterance in said step (c).
  - 25. The machine-readable storage of claim 24, wherein said step (e) is processed using a conceptual language model.
  - 26. The machine-readable storage of claim 25, wherein said conceptual language model is an n-gram conceptual language model.
  - 27. The machine-readable storage of claim 26, further comprising an initial step of training said conceptual language model.
  - 28. The machine-readable storage of claim 24, wherein said step (c) is processed using conceptual grammar.
  - 29. The machine-readable storage of claim 22, further comprising:
    - a training step defining said vocabulary of items of said step (d).
  - 30. The machine-readable storage of claim 21, further comprising:
    - defining said plurality of combinations of items of said step (c) in a training step.
  - 31. The machine-readable storage of claim 29, further comprising:
    - defining said plurality of combinations of items of said step (c) in a training step.
  - 32. The machine-readable storage of claim 21, further comprising:
    - storing a set of prototype acoustic models obtained from a training phase, wherein each said acoustic model represents one or more possible basic speech units of an utterance of a word.
  - 33. The machine-readable storage of claim 32, further comprising:
    - assigning one of said acoustic models to each said basic speech unit.
  - 34. The machine-readable storage of claim 21, wherein said user utterance is in the form of isolated data.
  - 35. The machine-readable storage of claim 21, wherein said tagged data includes a plurality of segmentable data elements.
  - 36. The machine-readable storage of claim 21, further comprising:
    - sending said most likely combination of items to a function identification module to perform said user query or command.

Current Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee: International Business Machines Corporation
Inventors: Romero, Juan Rojas
Primary Examiner(s): MCFADDEN, SUSAN IRIS
Application Number: US09/977,665
Publication Number: US 20020111803A1
Time in Patent Office: 1,415 Days
Field of Search: 704/252, 704/254, 704/256, 704/257
US Class Current: 704/257
CPC Class Codes: G10L 15/1822 Parsing for meaning underst...