Speech recognition system employing multiple grammar networks

US 5,991,720 A
Filed: 04/16/1997
Issued: 11/23/1999
Est. Priority Date: 05/06/1996
Status: Expired due to Fees

Abstract

The input speech is segmented using plural grammar networks, including a network that includes a filler model designed to represent noise or extraneous speech. Recognition processing results in plural lists of candidates, each list containing the N-best candidates generated. The lists are then separately aligned with the dictionary of valid names to generate two lists of valid names. The final recognition pass combines these two lists of names into a dynamic grammar and this dynamic grammar may be used to find the best candidate name using Viterbi recognition. A telephone call routing application based on the recognition system selects the best candidate name corresponding to the name spelled by the user, whether the user pronounces the name prior to spelling, or not.
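
The final step described above, combining the two lists of valid names into a dynamic grammar and picking the best candidate, can be sketched as follows. This is a minimal illustration and not the patented implementation: the function names `build_dynamic_grammar` and `final_pass`, the sample names, and the toy scores are all assumptions, and the `score` callback merely stands in for Viterbi decoding of the acoustic data constrained to the dynamic grammar.

```python
from typing import Callable, Iterable, List


def build_dynamic_grammar(first: Iterable[str], second: Iterable[str]) -> List[str]:
    """Merge two name-candidate lists into one duplicate-free list.

    Order of first appearance is preserved. (Hypothetical helper; the
    abstract only says the two lists are combined into a dynamic grammar.)
    """
    merged: List[str] = []
    seen = set()
    for name in list(first) + list(second):
        if name not in seen:
            seen.add(name)
            merged.append(name)
    return merged


def final_pass(grammar: List[str], score: Callable[[str], float]) -> str:
    """Return the highest-scoring name in the dynamic grammar.

    `score` is a stand-in for re-decoding the acoustic data against each
    name; here it is any function mapping a name to a number.
    """
    return max(grammar, key=score)


# Toy data: two lists of valid names, one per grammar network (assumed values).
first_names = ["HANSON", "JANSON"]
second_names = ["HANSON", "HANSEN"]

grammar = build_dynamic_grammar(first_names, second_names)

# Hypothetical log-likelihood-style scores; higher is better.
toy_scores = {"HANSON": -10.0, "JANSON": -14.5, "HANSEN": -12.0}
best = final_pass(grammar, lambda name: toy_scores[name])

print(grammar)
print(best)
```

Because the dynamic grammar holds only the few names that survived alignment, the last recognition pass searches a much smaller space than the full dictionary would require.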

21 Claims

1. A single pass method of processing acoustic speech data for word recognition, comprising:
- processing said acoustic speech data using a recognizer based on a first grammar model to segment said acoustic speech data in a first way and thereby extract a first plurality of recognition candidates;
  
  processing said acoustic speech data using a recognizer based on a second grammar model different than said first grammar model to segment said acoustic speech data in a second way different than said first way and thereby extract a second plurality of recognition candidates;
  
  aligning said first plurality of recognition candidates with a dictionary of predetermined words to generate a first list of word candidates;
  
  aligning said second plurality of recognition candidates with said dictionary of predetermined words to generate a second list of word candidates;
  
  building a dynamic grammar model from said first and second lists of word candidates; and
  
  processing said acoustic speech data using a recognizer based on said dynamic grammar model to extract the recognized word.
- Dependent claims:
  - 2. The method of claim 1 wherein said first and second grammar models are network models comprising a plurality of interconnected letter models.
  - 3. The method of claim 2 wherein said letter models are represented by Hidden Markov Models.
  - 4. The method of claim 2 wherein said first and second grammar models are network models comprising a plurality of interconnected letter models and said second grammar model includes at least one filler model to represent utterances not defined by said letter models.
  - 5. The method of claim 2 wherein said first and second grammar models are network models comprising a plurality of interconnected letter models and said second grammar model includes at least one filler model to represent utterances not defined by said letter models and one silence model to represent a pause in said acoustic speech data.
  - 6. The method of claim 1 wherein said second grammar model defines a letter spotting grammar.
  - 7. The method of claim 1 wherein said first and second grammar models comprise a plurality of different nodes and wherein said first and second plurality of recognition candidates are extracted by a recognition process that scores said nodes according to how closely said acoustic speech data matches said nodes;
    - and wherein said first and second plurality of recognition candidates are extracted by selecting those nodes scored as having the closest match to said acoustic data.
  - 8. The method of claim 1 wherein said second grammar model comprises at least one node for representing noise.
  - 9. The method of claim 1 further comprising using said recognized word to route a telephone call.
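
The aligning steps of claim 1 map recognition candidates onto a dictionary of predetermined words. The claims do not state how the alignment is scored, so the sketch below assumes a plain Levenshtein edit distance between letter strings; `edit_distance`, `align_to_dictionary`, the `n_best` parameter, and the sample data are hypothetical names chosen for illustration.

```python
from typing import List, Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed by dynamic programming, row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def align_to_dictionary(candidates: Sequence[str],
                        dictionary: Sequence[str],
                        n_best: int = 2) -> List[str]:
    """Rank dictionary words by their closest distance to any candidate.

    Ties are broken alphabetically so the result is deterministic.
    """
    closest = {word: min(edit_distance(c, word) for c in candidates)
               for word in dictionary}
    ranked = sorted(dictionary, key=lambda w: (closest[w], w))
    return ranked[:n_best]


# Assumed toy data: noisy letter strings from one grammar, plus a small dictionary.
dictionary = ["HANSON", "HANSEN", "JOHNSON", "MILLER"]
letter_hypotheses = ["HANSOM", "HAMSON"]

names = align_to_dictionary(letter_hypotheses, dictionary)
print(names)
```

Running the same function once per candidate list yields the first and second lists of word candidates from which the dynamic grammar model is built.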

10. A single pass method of processing acoustic speech data for spelled name recognition, comprising:
- processing said acoustic speech data using a recognizer based on a first grammar model to segment said acoustic speech data in a first way and thereby extract a first plurality of letter candidates;
  
  processing said acoustic speech data using a recognizer based on a second grammar model to segment said acoustic speech data in a second way different than said first way and thereby extract a second plurality of letter candidates;
  
  aligning said first plurality of letter candidates with a dictionary of predetermined names to generate a first list of name candidates;
  
  aligning said second plurality of recognition candidates with said dictionary of predetermined words to generate a second list of name candidates; and
  
  processing said acoustic speech data using a recognizer based on said dynamic grammar model to extract the recognized name.
- Dependent claims:
  - 11. The method of claim 10 wherein said first and second grammar models are different.
  - 12. The method of claim 10 wherein said first and second grammar models are network models comprising a plurality of interconnected letter models.
  - 13. The method of claim 12 wherein said letter models are represented by Hidden Markov Models.
  - 14. The method of claim 10 wherein said first and second grammar models are network models comprising a plurality of interconnected letter models and said second grammar model includes at least one filler model to represent utterances not defined by said letter models.
  - 15. The method of claim 10 wherein said first and second grammar models are network models comprising a plurality of interconnected letter models and said second grammar model includes at least one filler model to represent utterances not defined by said letter models and one silence model to represent a pause in said acoustic speech data.
  - 16. The method of claim 10 wherein said second grammar model defines a letter spotting grammar.
  - 17. The method of claim 10 wherein said first and second grammar models comprise a plurality of different nodes and wherein said first and second plurality of recognition candidates are extracted by a recognition process that scores said nodes according to how closely said acoustic speech data matches said nodes;
    - and wherein said first and second plurality of recognition candidates are extracted by selecting those nodes scored as having the closest match to said acoustic data.
  - 18. The method of claim 17 wherein said second grammar model comprises at least one node for representing noise.
  - 19. The method of claim 10 further comprising using said recognized word to route a telephone call.

20. A single pass method of processing acoustic speech data for recognition, comprising:
- processing said acoustic speech data using a recognizer based on a first grammar network to segment said acoustic speech data in a first way and thereby extract a first plurality of recognition candidates according to a first speech input criteria;
  
  processing said acoustic speech data using a recognizer based on a second grammar network to segment said acoustic speech data in a second way different than said first way and thereby extract a second plurality of recognition candidates according to a second speech input criteria;
  
  transforming said first and second plurality of recognition candidates into transformed candidates based on at least one set of a priori constraints on said acoustic speech data;
  
  making a recognition decision based on said transformed candidates.

21. A single pass method of processing acoustic speech data for recognition, comprising:
- separately processing said acoustic speech data using different first and second grammar networks that result in different segmentation of said acoustic speech data to extract speech that has utility from speech that does not;
  
  generating a first plurality of recognition candidates using said first grammar network and a second plurality of recognition candidates using said second grammar network;
  
  transforming said first and second plurality of recognition candidates based on at least one set of a priori constraints about the speech that has utility to generate transformed recognition candidates; and
  
  making a recognition decision based on said transformed recognition candidates.

Current Assignee: Matsushita Electric Industrial Company Limited (Panasonic Holdings Corporation)
Original Assignee: Matsushita Electric Industrial Company Limited (Panasonic Holdings Corporation)
Inventors: Junqua, Jean-Claude; Galler, Michael
Primary Examiner(s): Hudspeth, David R.
Assistant Examiner(s): Sax, Robert L
Application Number: US08/834,358
Time in Patent Office: 951 Days
Field of Search: 704/256, 704/251, 704/275
US Class Current: 704/256.5
CPC Class Codes:
G10L 15/08   Speech classification or se...
G10L 15/19   Grammatical context, e.g. d...
G10L 15/30   Distributed recognition, e....
H04M 2201/40   using speech recognition
H04M 3/42204   Arrangements at the exchang...
H04M 3/42314   in private branch exchanges
H04M 3/51   Centralised call answering ...
H04M 3/527   Centralised call answering ...
