Method and apparatus for performing relational speech recognition

US 6,996,519 B2
Filed: 09/28/2001
Issued: 02/07/2006
Est. Priority Date: 09/28/2001
Status: Expired due to Term

Abstract

A method and apparatus for performing speech recognition using observable relationships between words. Results from a speech recognition pass can be combined with information about the observable word relationships to constrain or simplify subsequent recognition passes. This iterative process greatly reduces the search space required for each recognition pass, making the speech recognition process more efficient, faster and more accurate.

17 Claims

1. A method for recognizing an utterance that pertains to a sparse domain, the sparse domain having a linguistic structure and a plurality of components, objects or concepts, the method comprising the steps of:
- acquiring a speech signal that represents an utterance;
  
  performing a first recognition pass by applying a first language model to the speech signal;
  
  selecting or generating a second language model based at least in part on results from the first recognition pass, on information regarding a linguistic structure of a domain within the speech signal, and on information regarding relationships among the domain components, objects or concepts within the speech signal; and
  
  performing a second recognition pass by applying the second language model to at least a portion of the speech signal to recognize the utterance containing the speech signal.
- Dependent claims (2, 3, 4):
  - 2. The method of claim 1 wherein the first and second language models are probabilistic finite state grammars.
  - 3. The method of claim 1 wherein the first and second language models are statistical language models.
  - 4. The method of claim 1 further comprising the step of selecting or generating acoustic models based at least in part on results from the first recognition pass, on information regarding the linguistic structure of the domain, and on information regarding relationships among the domain components, objects or concepts.
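The two-pass flow of claim 1 can be illustrated with a toy sketch. This is not the patented implementation: the "speech signal" is replaced by hypothetical per-segment acoustic scores, a "language model" is reduced to a set of allowed words, and the names `recognition_pass`, `select_second_model`, and `STREETS_BY_CITY` are assumptions made up for this illustration.

```python
from typing import Dict, Set, Tuple

# Toy stand-in for a speech signal: for each segment of the utterance,
# hypothetical acoustic scores for candidate words (higher is better).
Signal = Dict[str, Dict[str, float]]

# Hypothetical relationship data for the domain: which streets exist in which city.
STREETS_BY_CITY: Dict[str, Set[str]] = {
    "menlo park": {"ravenswood avenue", "middlefield road"},
    "palo alto": {"university avenue", "middlefield road"},
}


def recognition_pass(signal: Signal, segment: str, language_model: Set[str]) -> str:
    """Return the best-scoring word for a segment among the words the model allows."""
    allowed = {w: s for w, s in signal[segment].items() if w in language_model}
    if not allowed:
        raise ValueError(f"no hypothesis for {segment!r} is allowed by the language model")
    return max(allowed, key=allowed.get)


def select_second_model(first_result: str) -> Set[str]:
    """Pick the second language model from the first-pass result and the relationship data."""
    return STREETS_BY_CITY[first_result]


def recognize(signal: Signal) -> Tuple[str, str]:
    # First pass: the first language model only knows city names.
    city = recognition_pass(signal, "city", set(STREETS_BY_CITY))
    # Second pass: the language model is limited to streets related to that city.
    street = recognition_pass(signal, "street", select_second_model(city))
    return street, city


signal: Signal = {
    "city": {"menlo park": 0.9, "palo alto": 0.4},
    "street": {"university avenue": 0.8, "ravenswood avenue": 0.7},
}
result = recognize(signal)
```

In this made-up input, "university avenue" has the higher acoustic score, but the second language model excludes it because the first pass chose "menlo park", so the second pass settles on "ravenswood avenue".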

5. A method for recognizing an utterance pertaining to an address or location, each address or location having a plurality of components, the method comprising the steps of:
- acquiring a speech signal that represents an utterance;
  
  performing a first recognition pass by applying a first language model to the speech signal;
  
  selecting or generating a second language model based at least in part on results from the first recognition pass and on information regarding relationships among the address or location components; and
  
  performing a second recognition pass by applying the second language model to at least a portion of the speech signal to recognize the utterance contained in the speech signal.
- Dependent claims (6):
  - 6. The method of claim 5 further comprising the step of selecting or generating acoustic models, the selection or generation based at least in part on results from the first recognition pass and on information regarding relationships among the address or location components.

7. In a speech recognition system, a method for recognizing an utterance comprising the steps of:
- acquiring a speech signal that represents the utterance; and
  
  performing a series of recognition passes, a second and subsequent recognition passes processing at least a portion of the speech signal using a language model that is constrained by a result of a previous recognition pass.
- Dependent claims (8):
  - 8. The method of claim 7 wherein the second and subsequent recognition passes use acoustic models that are constrained by a result of a previous recognition pass.

9. A method for generating language models between speech recognition passes, the language models based on a domain having a linguistic structure and a plurality of components, objects or concepts, the method comprising the steps of:
- generating or acquiring a database containing information regarding the linguistic structure of the domain and information regarding relationships among the domain components, objects or concepts;
  
  acquiring a result from a speech recognition pass, the result including a domain component, object or concept; and
  
  generating a language model that includes a subset of the domain by using the result from the speech recognition pass to select information from the database.

10. In a speech recognition system, a method for generating language models based on a domain having a plurality of components, objects or concepts, the method comprising the steps of:
- acquiring a result from a speech recognition pass, the result including a domain component, object or concept;
  
  using the result from the speech recognition pass to perform a search on a database that contains information regarding relationships among the domain components, objects or concepts; and
  
  generating a language model using a result from the database search.
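Claims 9 and 10 describe generating a language model from a database lookup keyed on a recognition result. The sketch below is one possible reading, under stated assumptions: the database is an in-memory SQLite table named `located_in` (a hypothetical schema), the rows are invented sample data, and the generated "language model" is simply a uniform probability over the matching subset of the domain. The claims do not specify any of these choices.

```python
import sqlite3
from typing import Dict


def build_database() -> sqlite3.Connection:
    """Create a hypothetical relationship database: component -> parent component."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE located_in (component TEXT, parent TEXT)")
    conn.executemany(
        "INSERT INTO located_in VALUES (?, ?)",
        [
            ("ravenswood avenue", "menlo park"),
            ("middlefield road", "menlo park"),
            ("university avenue", "palo alto"),
            ("middlefield road", "palo alto"),
        ],
    )
    return conn


def generate_language_model(conn: sqlite3.Connection, pass_result: str) -> Dict[str, float]:
    """Search the database with a recognition result and build a model over the matches.

    Assumption: the model is a uniform distribution over the selected subset.
    """
    rows = conn.execute(
        "SELECT DISTINCT component FROM located_in WHERE parent = ? ORDER BY component",
        (pass_result,),
    ).fetchall()
    words = [row[0] for row in rows]
    if not words:
        raise LookupError(f"no related components found for {pass_result!r}")
    probability = 1.0 / len(words)
    return {word: probability for word in words}


conn = build_database()
model = generate_language_model(conn, "menlo park")
```

Here `model` covers only the two streets related to "menlo park", which is the "subset of the domain" idea in claim 9; a real system would presumably assign non-uniform probabilities.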

11. A method for recognizing an address or location expressed as a single utterance, the method comprising the steps of:
- acquiring a speech signal that represents the single utterance; and
  
  performing a series of recognition passes, a second and subsequent recognition passes processing at least a portion of the speech signal using a language model that is constrained by a result of a previous recognition pass.
- Dependent claims (12, 13, 14, 15, 16, 17):
  - 12. The method of claim 11 wherein each address or location has a plurality of components.
  - 13. The method of claim 12 wherein the first recognition pass processes the speech signal using a first language model.
  - 14. The method of claim 13 wherein the first language model may be used to recognize only a subset of the address or location components.
  - 15. The method of claim 14 wherein the language models used in the second and subsequent recognition passes may be used to recognize only a subset of the address or location components.
  - 16. The method of claim 15 wherein the second and subsequent language models are selected or generated by using the result from a previous recognition pass to perform a search on a database that contains information regarding relationships among the address or location components.
  - 17. The method of claim 11 wherein the second and subsequent recognition passes uses acoustic models that are constrained by a result of a previous recognition pass.
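The series of passes in claims 11 through 16 can be sketched as a loop in which each pass handles one address component and its language model is looked up from what earlier passes recognized. Everything here is hypothetical: `RELATIONS` is invented sample data, the per-component score dictionaries stand in for portions of a single speech signal, and a set of allowed words stands in for a language model.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical relationship data: each key is the tuple of components recognized
# so far; each value is the set of candidates allowed for the next component.
RELATIONS: Dict[Tuple[str, ...], Set[str]] = {
    (): {"menlo park", "palo alto"},
    ("menlo park",): {"ravenswood avenue", "middlefield road"},
    ("palo alto",): {"university avenue", "middlefield road"},
    ("menlo park", "ravenswood avenue"): {"333", "700"},
    ("palo alto", "university avenue"): {"250"},
}


def recognize_address(portions: List[Dict[str, float]]) -> Tuple[str, ...]:
    """Run one pass per component; each pass is constrained by the previous results."""
    recognized: Tuple[str, ...] = ()
    for scores in portions:
        # The language model for this pass covers only one component and is
        # selected by searching the relationship data with the earlier results.
        language_model = RELATIONS.get(recognized, set())
        allowed = {w: s for w, s in scores.items() if w in language_model}
        if not allowed:
            raise ValueError(f"no allowed hypothesis after {recognized!r}")
        recognized += (max(allowed, key=allowed.get),)
    return recognized


# Made-up acoustic scores for the city, street, and number portions of one utterance.
portions = [
    {"menlo park": 0.8, "palo alto": 0.6},
    {"university avenue": 0.9, "ravenswood avenue": 0.7},
    {"333": 0.5, "250": 0.9},
]
address = recognize_address(portions)
```

With these scores, the street and number that score highest in isolation ("university avenue", "250") are both ruled out by earlier passes, so the loop returns the mutually consistent combination.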

Current Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee: SRI International, Inc.
Inventors: Franco, Horacio E.; Israel, David J.; Myers, Gregory K.
Primary Examiner(s): MCFADDEN, SUSAN IRIS
Application Number: US09/967,228
Publication Number: US 20030065511A1
Time in Patent Office: 1,593 Days
Field of Search: 704/251, 704/255, 704/256, 704/258, 704/240, 704/236, 704/231, 704/270
US Class Current: 704/9
CPC Class Codes: G10L 15/065 Adaptation; G10L 15/183 using context dependencies,...
