Apparatus for reducing spurious insertions in speech recognition

US 7,783,484 B2
Filed: 07/16/2008
Issued: 08/24/2010
Est. Priority Date: 04/04/2003
Status: Expired due to Fees

First Claim

1. Apparatus for automatically generating a phonetic baseform from a spoken utterance, the apparatus comprising:

at least one storage medium; and

at least one processor coupled to the at least one storage medium and operative to (i) obtain a stream of acoustic observations representing the spoken utterance;

(ii) generate a sequence of subphone units representing candidate subphone units substantially maximizing a likelihood associated with the stream of acoustic observations; and

(iii) convert the sequence of subphone units into a phonetic baseform;

wherein the generating operation comprises merging candidate subphone units that relate to a same speech event and that overlap in time into a single candidate subphone unit, and associating with the single candidate subphone unit a score equal to a sum of scores associated with the merged candidate subphone units such that a score associated with a longer-lasting speech event is enhanced as compared with a score associated with a shorter-lasting speech event.
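
Step (iii) of the claim, converting the subphone sequence into a phonetic baseform, can be illustrated with a minimal Python sketch. The claim does not specify a labeling scheme or a conversion rule, so this sketch assumes hypothetical subphone labels of the form `PHONE_k`, where `k` is the state index within a phone (e.g. `AE_1`, `AE_2`, `AE_3`). The helper names `parse_subphone` and `subphones_to_baseform` are illustrative only.

```python
from typing import List, Optional, Tuple

def parse_subphone(label: str) -> Tuple[str, int]:
    """Split a hypothetical 'PHONE_k' label into (phone, state index)."""
    phone, _, state = label.rpartition("_")
    return phone, int(state)

def subphones_to_baseform(subphones: List[str]) -> List[str]:
    """Collapse a subphone sequence into a phone sequence (a baseform).

    Assumed rule: a new phone starts when the phone label changes or when the
    state index goes backwards (AA_3 followed by AA_1 is read as two AA
    phones); a repeated label continues the current phone.
    """
    baseform: List[str] = []
    prev_phone: Optional[str] = None
    prev_state = 0
    for label in subphones:
        phone, state = parse_subphone(label)
        if phone != prev_phone or state < prev_state:
            baseform.append(phone)
        prev_phone, prev_state = phone, state
    return baseform

print(" ".join(subphones_to_baseform(["K_1", "K_2", "AE_1", "AE_2", "AE_3", "T_1", "T_3"])))
# K AE T
```

Using the state index, not just the phone label, keeps two genuinely consecutive identical phones from collapsing into one.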

1 Assignment

0 Petitions

Abstract

Techniques for improving an automatic baseform generation system. More particularly, the invention provides techniques for reducing insertion of spurious speech events in a word or phone sequence generated by an automatic baseform generation system. Such automatic baseform generation techniques may be accomplished by enhancing the scores of long-lasting speech events with respect to the scores of short-lasting events. For example, this may be achieved by merging competing candidates that relate to the same speech event (e.g., phone or word) and that overlap in time into a single candidate, the score of which may be equal to the sum of the scores of the merged candidates.
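
The merging idea in the abstract can be sketched in a few lines of Python. This is an illustrative sketch, not the patented implementation: candidates are assumed to be `(label, start, end, score)` records over half-open frame intervals, the `Candidate` type and `merge_overlapping` function are hypothetical names, and chains of overlapping same-label candidates are merged transitively, which is one plausible reading of "overlap in time".

```python
from dataclasses import dataclass
from itertools import groupby
from typing import List, Optional

@dataclass(frozen=True)
class Candidate:
    label: str    # speech event, e.g. a phone or a word
    start: int    # first frame (inclusive)
    end: int      # end frame (exclusive)
    score: float  # e.g. a posterior probability

def merge_overlapping(cands: List[Candidate]) -> List[Candidate]:
    """Merge same-label candidates that overlap in time, summing their scores."""
    merged: List[Candidate] = []
    ordered = sorted(cands, key=lambda c: (c.label, c.start, c.end))
    for label, group in groupby(ordered, key=lambda c: c.label):
        cur: Optional[Candidate] = None
        for c in group:
            if cur is not None and c.start < cur.end:
                # Overlap: widen the span and accumulate the score.
                cur = Candidate(label, cur.start, max(cur.end, c.end), cur.score + c.score)
            else:
                if cur is not None:
                    merged.append(cur)
                cur = c
        merged.append(cur)
    return sorted(merged, key=lambda c: (c.start, c.label))

# A long "AA" split into three overlapping candidates, plus a short spurious "T".
cands = [
    Candidate("AA", 0, 10, 0.25),
    Candidate("AA", 5, 15, 0.25),
    Candidate("AA", 10, 20, 0.25),
    Candidate("T", 8, 11, 0.5),
]
for c in merge_overlapping(cands):
    print(c.label, c.start, c.end, c.score)
# AA 0 20 0.75
# T 8 11 0.5
```

In this toy input no single `AA` candidate (0.25) outscores the short `T` candidate (0.5), but the merged `AA` event (0.75) does, which is the effect the abstract describes: the longer-lasting event gains score relative to the shorter one.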

13 Citations

11 Claims

1. Apparatus for automatically generating a phonetic baseform from a spoken utterance, the apparatus comprising:
- at least one storage medium; and
  
  at least one processor coupled to the at least one storage medium and operative to (i) obtain a stream of acoustic observations representing the spoken utterance;
  
  (ii) generate a sequence of subphone units representing candidate subphone units substantially maximizing a likelihood associated with the stream of acoustic observations; and
  
  (iii) convert the sequence of subphone units into a phonetic baseform;
  
  wherein the generating operation comprises merging candidate subphone units that relate to a same speech event and that overlap in time into a single candidate subphone unit, and associating with the single candidate subphone unit a score equal to a sum of scores associated with the merged candidate subphone units such that a score associated with a longer-lasting speech event is enhanced as compared with a score associated with a shorter-lasting speech event.

2. The apparatus of claim 1, wherein the generating operation further comprises building a lattice from the stream of acoustic observations using acoustic models and a phone graph.

3. The apparatus of claim 2, wherein the lattice is a subphone graph specifying a starting time, an ending time and an acoustic score associated to each subphone in a candidate sequence of subphones.

4. The apparatus of claim 2, wherein the generating operation further comprises transforming the lattice to produce the generated sequence of subphone units.

5. The apparatus of claim 4, wherein the transforming operation further comprises rescoring the lattice by using a transition model between the subphone units.

6. The apparatus of claim 5, wherein the lattice comprises arcs and the transforming operation further comprises computing a posterior probability for each arc in the lattice as a sum of posterior probabilities of paths which go through that particular arc.

7. The apparatus of claim 6, wherein the transforming operation further comprises modifying a topology of the lattice by merging arcs that bear a same subphone label and that overlap in time, while maintaining an arc order of the original lattice.

8. The apparatus of claim 7, wherein the transforming operation further comprises assigning a new score to each new arc resulting from the merging of overlapping arcs by summing posterior probabilities of the merged arcs.

9. The apparatus of claim 8, wherein the transforming operation further comprises identifying the generated sequence of subphone units as the sequence with the highest cumulative score in the transformed lattice.
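
Claims 6 through 9 can be approximated by the following Python sketch, under several stated assumptions. Each arc's `log_score` is taken to already combine the acoustic score and the transition-model score of claim 5. Node times are assumed to increase along every arc, so sorting nodes by time gives a topological order. In place of the patent's topology-preserving lattice transformation (claim 7), the merged arcs are treated as scored time intervals, and the highest-scoring chain of non-overlapping intervals is found by weighted interval scheduling, a swapped-in technique. All names and lattice values are hypothetical.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass(frozen=True)
class Arc:
    src: int          # source node id
    dst: int          # destination node id
    label: str        # subphone label
    log_score: float  # assumed: acoustic + transition-model log score (claim 5)

Segment = Tuple[str, int, int, float]  # (label, start time, end time, score)

def logsumexp(xs: List[float]) -> float:
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def arc_posteriors(arcs: List[Arc], node_time: Dict[int, int],
                   start: int, final: int) -> List[float]:
    """Claim 6: posterior of an arc = mass of all paths through it / total mass."""
    order = sorted(node_time, key=lambda n: node_time[n])  # topological if times increase
    ins: Dict[int, List[Arc]] = defaultdict(list)
    outs: Dict[int, List[Arc]] = defaultdict(list)
    for a in arcs:
        ins[a.dst].append(a)
        outs[a.src].append(a)
    alpha = {start: 0.0}  # forward log scores
    for n in order:
        terms = [alpha[a.src] + a.log_score for a in ins[n] if a.src in alpha]
        if n != start and terms:
            alpha[n] = logsumexp(terms)
    beta = {final: 0.0}  # backward log scores
    for n in reversed(order):
        terms = [a.log_score + beta[a.dst] for a in outs[n] if a.dst in beta]
        if n != final and terms:
            beta[n] = logsumexp(terms)
    total = alpha[final]
    return [math.exp(alpha[a.src] + a.log_score + beta[a.dst] - total)
            if a.src in alpha and a.dst in beta else 0.0 for a in arcs]

def merge_arcs(segs: List[Segment]) -> List[Segment]:
    """Claims 7-8 (approximated): merge same-label, time-overlapping arcs; sum posteriors."""
    by_label: Dict[str, List[Segment]] = defaultdict(list)
    for s in segs:
        by_label[s[0]].append(s)
    merged: List[Segment] = []
    for label, group in by_label.items():
        group.sort(key=lambda s: (s[1], s[2]))
        cur = group[0]
        for s in group[1:]:
            if s[1] < cur[2]:  # overlaps the current merged span
                cur = (label, cur[1], max(cur[2], s[2]), cur[3] + s[3])
            else:
                merged.append(cur)
                cur = s
        merged.append(cur)
    return sorted(merged, key=lambda s: (s[1], s[0]))

def best_sequence(segs: List[Segment]) -> List[str]:
    """Claim 9 (approximated): best chain of non-overlapping segments."""
    segs = sorted(segs, key=lambda s: s[2])  # weighted interval scheduling: sort by end
    best: List[Tuple[float, List[str]]] = [(0.0, [])]  # best[k]: optimum over first k segs
    for i, (label, start, _end, score) in enumerate(segs):
        j = i
        while j > 0 and segs[j - 1][2] > start:  # skip segments ending after this start
            j -= 1
        take = (best[j][0] + score, best[j][1] + [label])
        best.append(max(best[i], take, key=lambda t: t[0]))
    return best[-1][1]

# Toy lattice: path probabilities 0.4 (AA T AA), 0.3 (AA), 0.3 (AA SIL).
node_time = {0: 0, 1: 4, 2: 6, 3: 9, 4: 10}
arcs = [
    Arc(0, 1, "AA", math.log(0.4)), Arc(1, 2, "T", 0.0), Arc(2, 4, "AA", 0.0),
    Arc(0, 4, "AA", math.log(0.3)),
    Arc(0, 3, "AA", math.log(0.3)), Arc(3, 4, "SIL", 0.0),
]
post = arc_posteriors(arcs, node_time, start=0, final=4)
segs = [(a.label, node_time[a.src], node_time[a.dst], p) for a, p in zip(arcs, post)]
merged = merge_arcs(segs)
print(best_sequence(merged))  # ['AA']
```

In this toy lattice the single most likely path (probability 0.4) is `AA T AA`, which contains a spurious `T`. After the four overlapping `AA` arcs are merged, their summed posterior (about 1.4) outweighs the `T` plus `SIL` chain (0.7), so the sketch returns just `AA`.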

10. An article of manufacture comprising at least one computer-readable storage medium encoded with one or more computer-executable programs which, when executed by at least one computer system, implement steps of:
- obtaining a stream of acoustic observations representing a spoken utterance;
  
  generating a sequence of subphone units representing candidate subphone units substantially maximizing a likelihood associated with the stream of acoustic observations; and
  
  converting the sequence of subphone units into a phonetic baseform;
  
  wherein the generating step comprises merging candidate subphone units that relate to a same speech event and that overlap in time into a single candidate subphone unit, and associating with the single candidate subphone unit a score equal to a sum of scores associated with the merged candidate subphone units such that a score associated with a longer-lasting speech event is enhanced as compared with a score associated with a shorter-lasting speech event.

11. A speech recognition system, comprising:
- a speech recognition engine; and
  
  a recognition lexicon associated with the speech recognition engine, the recognition lexicon including at least one phonetic baseform automatically generated by:
  
  (i) obtaining a stream of acoustic observations representing a spoken utterance;
  
  (ii) generating a sequence of subphone units representing candidate subphone units substantially maximizing a likelihood associated with the stream of acoustic observations, wherein the generating comprises merging candidate subphone units that relate to a same speech event and that overlap in time into a single candidate subphone unit, and associating with the single candidate subphone unit a score equal to a sum of scores associated with the merged candidate subphone units such that a score associated with a longer-lasting speech event is enhanced as compared with a score associated with a shorter-lasting speech event; and
  
  (iii) converting the sequence of subphone units into the at least one phonetic baseform.

Current Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Inventors: Deligne, Sabine V.; Mangu, Lidia L.
Primary Examiner(s): Smits, Talivaldis Ivars
Assistant Examiner(s): Kadoorie, Isabel Yuan
Application Number: US12/174,344
Publication Number: US 20080281593A1
Time in Patent Office: 769 Days
Field of Search: 704/249, 704/254, 704/231, 704/240, 704/241
US Class Current: 704/254
CPC Class Codes:
G10L 15/063 Training
G10L 2015/025 Phonemes, fenemes or fenone...