Method for recognizing speech using linguistically-motivated hidden Markov models

US 5,268,990 A
Filed: 01/31/1991
Issued: 12/07/1993
Est. Priority Date: 01/31/1991
Status: Expired due to Term

1 Assignment

0 Petitions

Abstract

An automatic speech recognition methodology exploits linguistic constraints: words are modeled as probabilistic networks of phonetic segments (herein, phones), and each phone is represented as a context-independent hidden Markov phone model mixed with a number of context-dependent phone models. Recognition relies on methods for designing phonological rule sets, guided by measures of pronunciation coverage and overgeneration, that achieve high coverage of pronunciations with compact representations. A further method estimates the probabilities of the different possible pronunciations of each word, and another models cross-word coarticulatory effects. In one embodiment, a method determines the single most-likely pronunciation of each word; in further embodiments, methods generate speaker-dependent pronunciation networks.
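The abstract says that pronunciation probabilities are estimated but does not describe how in this excerpt. The sketch below assumes one common approach, relative-frequency estimation of each arc's probability from forced-alignment counts, and should not be read as the patent's own method. The "tomato" network, its phone symbols, and the aligned training paths are invented for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical pronunciation network for "tomato": (from_node, to_node, phone).
# Parallel arcs between the same pair of nodes are alternative pronunciations.
ARCS = [
    (0, 1, "t"),
    (1, 2, "ax"), (1, 2, "ow"),
    (2, 3, "m"),
    (3, 4, "ey"), (3, 4, "aa"),
    (4, 5, "t"), (4, 5, "dx"),
    (5, 6, "ow"),
]


def estimate_arc_probs(arcs, aligned_paths):
    """Relative-frequency estimate of P(arc | its from-node).

    aligned_paths holds one list of arc indices per training token, assumed to
    come from forced alignment of the training speech against the network.
    """
    counts = Counter(i for path in aligned_paths for i in path)
    out_total = defaultdict(int)   # traversals leaving each node
    out_degree = defaultdict(int)  # number of arcs leaving each node
    for i, (src, _dst, _phone) in enumerate(arcs):
        out_total[src] += counts[i]
        out_degree[src] += 1
    probs = []
    for i, (src, _dst, _phone) in enumerate(arcs):
        if out_total[src]:
            probs.append(counts[i] / out_total[src])
        else:
            probs.append(1.0 / out_degree[src])  # no data at this node: uniform
    return probs


def pronunciation_prob(path, probs):
    """Probability of one pronunciation = product of its arc probabilities."""
    p = 1.0
    for i in path:
        p *= probs[i]
    return p


aligned = [
    [0, 1, 3, 4, 7, 8],  # t ax m ey dx ow
    [0, 2, 3, 4, 6, 8],  # t ow m ey t ow
    [0, 1, 3, 5, 6, 8],  # t ax m aa t ow
    [0, 1, 3, 4, 7, 8],  # t ax m ey dx ow
]
probs = estimate_arc_probs(ARCS, aligned)
p = pronunciation_prob([0, 1, 3, 4, 7, 8], probs)  # 0.75 * 0.75 * 0.5 = 0.28125
```

Because each node's outgoing probabilities sum to one, the probabilities of all complete paths through the network also sum to one, so the network defines a distribution over pronunciations.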

302 Citations

6 Claims

1. A method for modeling word-initial acoustic cross-word effects for use in generating word pronunciation networks from a set of pronunciation rules and a training database on a data processing apparatus, wherein said training database comprises representations of speech signals of words being pronounced in a continuous speech manner, said speech signal representations being linked to textual representations of said words, wherein word pronunciation networks include textual representations of words and corresponding phonetic networks, said modeling method comprising the steps, for every word network w, of:
- for every initial arc a_i, setting a variable p_i equal to a phone label on said arc a_i; and

  for every phone label p_j in a phonetic inventory, a) counting a number of occurrences c in said training database of the word w preceded by any word ending with said phone p_j;

  thereafter b) if c is greater than a preselected threshold, i) adding an initial arc a_k to said word network w with phone label p_i which connects to a common "to-node" as said arc a_i;

  thereafter ii) constraining said arc a_k to only connect to arcs with label p_j; and

  iii) constraining said arc a_i not to connect to arcs having phone label p_j.

2. The method of claim 1, wherein said word networks comprise phones modeled by Hidden Markov Models (HMMs), further comprising the step of thereafter training said HMMs of said word networks with said training database.

3. The method of claim 2, wherein said preselected threshold is selected according to a minimum number of training samples required for probabilities of said added arcs to be estimated.
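Claim 1 can be read as a graph transformation on one word's pronunciation network. The sketch below is a minimal, non-authoritative Python rendering for a single word w, under assumptions of our own: the `Arc` and `WordNetwork` classes, the `allowed_left`/`blocked_left` fields used to express "only connect to" and "not connect to", and the `final_phone` lexicon (one final phone per word, whereas real networks may end in several alternative arcs) are all hypothetical. The threshold test is strict (`c > threshold`), as in the claim; following claim 3, the threshold would be the minimum number of samples needed to estimate the added arcs' probabilities.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Arc:
    from_node: int
    to_node: int
    phone: str
    # None = no restriction; otherwise the arc may only follow these phones
    allowed_left: Optional[set] = None
    # phones this arc may NOT follow
    blocked_left: set = field(default_factory=set)


@dataclass
class WordNetwork:
    start: int
    arcs: list


def can_follow(arc, left_phone):
    """True if `arc` may be entered after a preceding word ending in `left_phone`."""
    if arc.allowed_left is not None and left_phone not in arc.allowed_left:
        return False
    return left_phone not in arc.blocked_left


def add_word_initial_context_arcs(word, net, utterances, final_phone, inventory, threshold):
    # a) c for each p_j: how often `word` follows a word ending in phone p_j
    left_counts = Counter(
        final_phone[prev]
        for utt in utterances
        for prev, cur in zip(utt, utt[1:])
        if cur == word
    )
    # Snapshot the initial arcs so newly added arcs are not revisited.
    initial_arcs = [a for a in net.arcs if a.from_node == net.start]
    for a_i in initial_arcs:
        p_i = a_i.phone
        for p_j in inventory:
            if left_counts[p_j] > threshold:  # b) enough evidence for context p_j
                # i) new initial arc, same label p_i, same to-node as a_i;
                # ii) it may only follow phone p_j
                a_k = Arc(net.start, a_i.to_node, p_i, allowed_left={p_j})
                net.arcs.append(a_k)
                # iii) a_i may no longer follow phone p_j
                a_i.blocked_left.add(p_j)
    return net


net = WordNetwork(start=0, arcs=[Arc(0, 1, "t"), Arc(1, 2, "uw")])
utterances = [["what", "two"]] * 3 + [["see", "two"]]
final_phone = {"what": "t", "see": "iy"}
add_word_initial_context_arcs("two", net, utterances, final_phone,
                              inventory=["t", "iy", "uw"], threshold=2)
```

In the toy data, "two" follows a /t/-final word three times, which exceeds the threshold of 2, so the initial /t/ arc gains a parallel copy that may only follow /t/, while the original arc now covers every other left context. Because `a_k` is a separate arc rather than a relabelled `a_i`, an HMM training pass as in claim 2 could, under this reading, give it its own parameters. Calling the routine once per word network covers the claim's outer loop over every word network w.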

4. A method for modeling word-final acoustic cross-word effects for use in generating word pronunciation networks from a set of pronunciation rules and a training database on a data processing apparatus, wherein said training database comprises representations of speech signals of words being pronounced in a continuous speech manner, said speech signal representations being linked to textual representations of said words, wherein word pronunciation networks include textual representations of words and corresponding phonetic networks, said modeling method comprising the steps, for every word network w, of:
- for every final arc a_i, setting a variable p_i equal to a phone label on said arc a_i; and

  for every phone label p_j in a phonetic inventory, a) counting a number of occurrences c in said training database of the word w followed by any word beginning with said phone p_j;

  thereafter b) if c is greater than a preselected threshold, i) adding a final arc a_k to said word network w with phone label p_i which connects to a common "to-node" as said arc a_i;

  thereafter ii) constraining said arc a_k to only connect to arcs with label p_j; and

  iii) constraining said arc a_i not to connect to arcs having phone label p_j.

5. The method of claim 4, wherein said word networks comprise phones modeled by Hidden Markov Models (HMMs), further comprising the step of thereafter training said HMMs of said word networks with said training database.

6. The method of claim 5, wherein said preselected threshold is selected according to a minimum number of training samples required for probabilities of said added arcs to be estimated.
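Claim 4 mirrors claim 1: it works on final arcs instead of initial arcs, and on the first phone of the following word instead of the last phone of the preceding word. The self-contained sketch below rests on the same kind of assumptions: the `Arc` and `WordNetwork` classes, the `allowed_right`/`blocked_right` constraint fields, and the `initial_phone` lexicon (one initial phone per word) are hypothetical. The claim only requires that `a_k` share `a_i`'s to-node; giving it the same from-node as well, so that it sits parallel to `a_i`, is our assumption.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Arc:
    from_node: int
    to_node: int
    phone: str
    allowed_right: Optional[set] = None  # None = may precede any phone
    blocked_right: set = field(default_factory=set)  # phones it may NOT precede


@dataclass
class WordNetwork:
    start: int
    end: int
    arcs: list


def can_precede(arc, right_phone):
    """True if `arc` may be followed by a next word starting with `right_phone`."""
    if arc.allowed_right is not None and right_phone not in arc.allowed_right:
        return False
    return right_phone not in arc.blocked_right


def add_word_final_context_arcs(word, net, utterances, initial_phone, inventory, threshold):
    # a) c for each p_j: how often `word` is followed by a word starting with p_j
    right_counts = Counter(
        initial_phone[nxt]
        for utt in utterances
        for cur, nxt in zip(utt, utt[1:])
        if cur == word
    )
    # Snapshot the final arcs so newly added arcs are not revisited.
    final_arcs = [a for a in net.arcs if a.to_node == net.end]
    for a_i in final_arcs:
        for p_j in inventory:
            if right_counts[p_j] > threshold:  # b) enough evidence for context p_j
                # i) + ii) parallel copy of a_i that may only precede phone p_j
                net.arcs.append(Arc(a_i.from_node, net.end, a_i.phone, allowed_right={p_j}))
                # iii) a_i may no longer precede phone p_j
                a_i.blocked_right.add(p_j)
    return net


net = WordNetwork(start=0, end=3, arcs=[Arc(0, 1, "w"), Arc(1, 2, "ah"), Arc(2, 3, "t")])
utterances = [["what", "two"]] * 3 + [["what", "is"]]
initial_phone = {"two": "t", "is": "ih"}
add_word_final_context_arcs("what", net, utterances, initial_phone,
                            inventory=["t", "ih", "w"], threshold=2)
```

Here the final /t/ of "what" gets a separate copy that may only precede /t/-initial words such as "two", the kind of context in which a separately trained model could capture a coarticulated variant; the /ih/ context occurs only once and stays with the original arc.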

Current Assignee: SRI International, Inc.
Original Assignee: SRI International, Inc.
Inventors: Bernstein, Jared C.; Cohen, Michael H.; Weintraub, Mitchel; Price, Patti J.; Murveit, Hy
Primary Examiner(s): Knepper, David D.
Application Number: US07/648,097
Time in Patent Office: 1,041 Days
Field of Search: 395/2, 381/41-43
US Class Current: 704/200
CPC Class Codes:
G09B 19/04   Speaking with audible prese...
G10L 15/142   Hidden Markov Models [HMMs]
G10L 15/187   Phonemic context, e.g. pron...
