Computer system and computer-implemented process for phonology-based automatic speech recognition

US 5,623,609 A
Filed: 09/02/1994
Issued: 04/22/1997
Est. Priority Date: 06/14/1993
Status: Expired due to Fees

Abstract

The present invention is based on the use of linguistic, especially phonological, knowledge to guide the speech recognition process. A speech signal containing an utterance is received and linguistic cues in the speech signal are detected. From these detected linguistic cues, a symbolic representation of the contents of the speech signal is generated. This symbolic representation comprises at least one word division, wherein each word division consists of an onset-rhyme pair and associated phonological elements. These phonological elements are univalent, may appear in all languages and are distinguishable from each other and directly interpretable in the speech signal. A lexicon of predetermined symbolic representations is provided for words in a particular language. A best match to the generated symbolic representation is found in the lexicon, thereby recognizing the spoken word.
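The symbolic representation and lexicon lookup described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration rather than the patented implementation: the `Unit` and `OnsetRhyme` classes, the `unit` helper, the distance function with its `PAIR_PENALTY` cost, and the toy lexicon entries (`word_a`, `word_b`, `word_c`), which are not real phonological analyses of any word. Only the element labels (taken from claim 5) and the limit of two positions per structural unit (from claim 1) come from the document.

```python
from dataclasses import dataclass

# Element labels as listed in claim 5.
ELEMENTS = frozenset({"A", "I", "U", "N", "H", "L", "?"})


@dataclass(frozen=True)
class Unit:
    """One structural unit (an onset or a rhyme) with at most two positions."""
    positions: tuple  # tuple of frozensets of element labels

    def __post_init__(self):
        if len(self.positions) > 2:
            raise ValueError("a structural unit has at most two positions")
        for pos in self.positions:
            if not pos <= ELEMENTS:
                raise ValueError(f"unknown element(s): {set(pos) - ELEMENTS}")


@dataclass(frozen=True)
class OnsetRhyme:
    """A sub-word unit: one onset paired with one rhyme."""
    onset: Unit
    rhyme: Unit


def unit(*positions):
    """Hypothetical helper: unit("H?", "A") has two positions, {H, ?} and {A}."""
    return Unit(tuple(frozenset(p) for p in positions))


def unit_distance(a, b):
    """Count elements present in one unit but not the other, position by position."""
    empty = frozenset()
    total = 0
    for i in range(max(len(a.positions), len(b.positions))):
        pa = a.positions[i] if i < len(a.positions) else empty
        pb = b.positions[i] if i < len(b.positions) else empty
        total += len(pa ^ pb)  # symmetric difference: elements are present or absent
    return total


PAIR_PENALTY = 4  # assumed cost of one missing or extra onset-rhyme pair


def word_distance(w1, w2):
    """Assumed mismatch score between two lists of onset-rhyme pairs."""
    total = PAIR_PENALTY * abs(len(w1) - len(w2))
    for p, q in zip(w1, w2):
        total += unit_distance(p.onset, q.onset) + unit_distance(p.rhyme, q.rhyme)
    return total


def best_match(representation, lexicon):
    """Return the lexicon key whose stored representation scores lowest."""
    return min(lexicon, key=lambda w: word_distance(representation, lexicon[w]))


# Toy lexicon with made-up entries, for illustration only.
lexicon = {
    "word_a": [OnsetRhyme(unit("H?"), unit("A"))],
    "word_b": [OnsetRhyme(unit("H?"), unit("I"))],
    "word_c": [OnsetRhyme(unit("L"), unit("A", "U")), OnsetRhyme(unit("N"), unit("I"))],
}

# An observed representation in which the detector missed element H in the onset.
observed = [OnsetRhyme(unit("?"), unit("A"))]
print(best_match(observed, lexicon))
```

Because the abstract describes the elements as univalent, each position is modeled here as a plain set of labels, so a missed element costs one unit of distance instead of replacing a whole segment. The claims do not specify a matching score; the set-difference count is simply one straightforward choice.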

16 Claims

1. A machine-implemented process for recognition of a spoken word in an utterance, comprising the steps of:
- receiving a speech signal containing the utterance;
  
  detecting acoustic cues in the speech signal;
  
  detecting, using the detected acoustic cues, the presence of one or more of a set of a small number of phonological elements, wherein a phonological element is a language independent atomic unit derived from one or more acoustic cues;
  
  identifying, using the detected acoustic cues, a location of one or more sub-word units in the speech signal, each sub-word unit consisting of a pair of structural units, each structural unit having at most two positions to which a combination of one or more phonological elements may be associated;
  
  associating each of the detected phonological elements with a position in one of the identified sub-word units and generating a representation of the spoken word in the speech signal indicating the sub-word units and the combination of phonological elements associated with each position in the sub-word units; and
  
  comparing the representation to a lexicon of predetermined representations of words to identify a best match, thereby recognizing the spoken word.
- Dependent claims (2, 3, 4, 5, 6, 7):
  - 2. The process of claim 1, wherein the step of detecting acoustic cues comprises the steps of:
    - detecting voiced and voiceless regions in the speech signal;
      
      detecting sonorant and nucleus regions in the voiced regions of the speech signal; and
      
      wherein the step of detecting presence of phonological elements includes detecting phonological elements within each of any detected silence, fricative, sonorant and nucleus regions, and identifying each of the regions as defining either an onset or a rhyme.
  - 3. The process of claim 1, further comprising the step of:
    - determining how the phonological elements associated with a position are combined to form a phonological expression according to language dependent constraints.
  - 4. The process of claim 1, wherein the set of phonological elements consists of less than ten elements.
  - 5. The process of claim 1, wherein the set of phonological elements includes at least A, I, U, N, H, L and ?.
  - 6. The process of claim 1, wherein the pair of structural units is an onset-rhyme pair as defined according to the theory of government phonology.
  - 7. The process of claim 1, wherein the structural units are defined by language dependent parameters.
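Claim 2 recites detecting voiced and voiceless regions in the speech signal. The claims do not say how this is done, so the sketch below swaps in a common textbook heuristic: short-time energy separates silence from speech, and zero-crossing rate separates voiced from voiceless frames. The function names, the frame length, and both thresholds are assumptions chosen for the synthetic test signal, not values from the patent. Detecting sonorant and nucleus regions inside the voiced regions is not attempted here.

```python
import math


def frame_features(samples, frame_len):
    """Yield (energy, zero_crossing_rate) for consecutive non-overlapping frames."""
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
        yield energy, crossings / (frame_len - 1)


def label_frames(samples, frame_len=80, silence_energy=1e-4, voiced_zcr=0.25):
    """Label each frame 'silence', 'voiced' or 'voiceless' (thresholds are assumptions)."""
    labels = []
    for energy, zcr in frame_features(samples, frame_len):
        if energy < silence_energy:
            labels.append("silence")
        elif zcr < voiced_zcr:
            labels.append("voiced")      # low zero-crossing rate: periodic, low-frequency energy
        else:
            labels.append("voiceless")   # high zero-crossing rate: noise-like energy
    return labels


def merge_regions(labels):
    """Collapse runs of equal labels into (label, first_frame, end_frame_exclusive)."""
    regions = []
    for i, label in enumerate(labels):
        if regions and regions[-1][0] == label:
            regions[-1] = (label, regions[-1][1], i + 1)
        else:
            regions.append((label, i, i + 1))
    return regions


# Synthetic signal at an assumed 8 kHz sampling rate:
# silence, then a 100 Hz sine (vowel-like), then a rapidly alternating signal (fricative-like).
rate = 8000
silence = [0.0] * 160
vowel_like = [0.5 * math.sin(2 * math.pi * 100 * n / rate) for n in range(160)]
noise_like = [0.3 if n % 2 == 0 else -0.3 for n in range(160)]

regions = merge_regions(label_frames(silence + vowel_like + noise_like))
print(regions)
```

Region boundaries are expressed in frame indices. A real detector would need overlapping windows and thresholds tuned to the recording conditions.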

8. An apparatus for recognition of a spoken word in an utterance, comprising:
- means for receiving a speech signal containing the utterance;
  
  means for detecting acoustic cues in the speech signal;
  
  means for detecting, using the detected acoustic cues, the presence of one or more of a set of a small number of phonological elements, wherein a phonological element is a language independent atomic unit derived from one or more acoustic cues; and
  
  means for identifying, using the detected acoustic cues, a location of one or more sub-word units in the speech signal, each sub-word unit consisting of a pair of structural units, each structural unit having at most two positions to which a combination of one or more phonological elements may be associated;
  
  means for associating each of the detected phonological elements with a position in one of the identified sub-word units and generating a representation of the spoken word in the speech signal indicating the sub-word units and the combination of phonological elements associated with each position in the sub-word units; and
  
  means for comparing the representation to a lexicon of predetermined representations of words, to identify a best match thereby recognizing the spoken word.
- Dependent claims (9, 10, 11, 12, 13, 14):
  - 9. The apparatus of claim 8, wherein the means for detecting acoustic cues comprises:
    - means for detecting voiced and voiceless regions in the speech signal;
      
      means for detecting sonorant and nucleus regions in the voiced regions of the speech signal; and
      
      wherein the means for detecting presence of phonological elements detects phonological elements within each of any detected silence, fricative, sonorant and nucleus regions, and identifies each of the detected regions as defining either an onset or a rhyme.
  - 10. The process of claim 8, further comprising the step of:
    - determining how the phonological elements associated with a position are combined to form a phonological expression according to language dependent constraints.
  - 11. The process of claim 8, wherein the set of phonological elements consists of less than ten elements.
  - 12. The process of claim 8, wherein the set of phonological elements includes at least A, I, U, N, H, L and ?.
  - 13. The process of claim 8, wherein the pair of structural units is an onset-rhyme pair as defined according to the theory of government phonology.
  - 14. The process of claim 8, wherein the structural units are defined by language dependent parameters.

15. An apparatus for recognition of a spoken word in an utterance, comprising:
- a phonological element and structure detector having an input for receiving a speech signal containing the utterance and an output providing a representation of the spoken word detected in the speech signal, wherein the representation comprises indications of at least one sub-word unit, each sub-word unit consisting of a pair of structural units, each structural unit having at most two positions to which a combination of phonological elements may be associated, and indications of the combination of only phonological elements present in and associated with each position, wherein a phonological element is a language independent atomic unit derived from one or more acoustic cues and is either present or not at a point in time in the speech signal;
  
  a lexicon of predetermined representations of words, and
  
  a lexical matching system having a first input for receiving the representation from the output of the phonological element and structure detector, a second input for receiving predetermined representations from the lexicon and an output providing an indication of the predetermined representation which best matches the representation output by the phonological element and structure detector.
- Dependent claims (16):
  - 16. The apparatus of claim 15, wherein the phonological element and structure detector comprises:
    - a phonetic classifier and segmenter having an input for receiving acoustic cues detected in the speech signal and an output for providing a string of tokens indicative of presence of phonological elements in the speech signal according to the detected acoustic cues; and
      
      a word parser having an input for receiving the string of tokens from the phonetic classifier and segmenter and which associates each detected phonological element with one of the structural units of a sub-word unit so as to provide the representation of the spoken word in the speech signal.
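The data flow in claim 16, from a string of tokens to a word parser, can be sketched as follows. The token format (a `(kind, elements)` tuple with kind `"onset"` or `"rhyme"`), the `parse_tokens` function, and the rule of filling a missing onset or rhyme with an empty set are all assumptions made for illustration; the claim states only that the parser associates each detected element with a structural unit of a sub-word unit. For brevity each structural unit holds a single position here, although the claims allow up to two.

```python
def parse_tokens(tokens):
    """Group (kind, elements) tokens into (onset, rhyme) pairs of element sets."""
    pairs = []
    pending_onset = None
    for kind, elements in tokens:
        if kind == "onset":
            if pending_onset is not None:
                # Two onsets in a row: close the first with an empty rhyme (assumption).
                pairs.append((pending_onset, frozenset()))
            pending_onset = frozenset(elements)
        elif kind == "rhyme":
            # A rhyme with no preceding onset gets an empty onset (assumption).
            onset = pending_onset if pending_onset is not None else frozenset()
            pairs.append((onset, frozenset(elements)))
            pending_onset = None
        else:
            raise ValueError(f"unknown token kind: {kind!r}")
    if pending_onset is not None:
        # A trailing onset is closed with an empty rhyme (assumption).
        pairs.append((pending_onset, frozenset()))
    return pairs


# Hypothetical token string, as a classifier and segmenter might emit it.
tokens = [("onset", "H?"), ("rhyme", "A"), ("rhyme", "I"), ("onset", "N")]
parsed = parse_tokens(tokens)
for onset, rhyme in parsed:
    print(sorted(onset), sorted(rhyme))
```

Every output item is a complete onset-rhyme pair, which matches the claim language that each sub-word unit consists of a pair of structural units.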

Current Assignee: HAL Trust
Original Assignee: HAL Trust
Inventors: Kaye, Jonathan; Williams, Geoffrey
Primary Examiner(s): MacDonald, Allen R.
Assistant Examiner(s): Dorvil, Richemond
Application Number: US08/300,929
Time in Patent Office: 963 Days
Field of Search: 395/2, 395/2.63, 395/2.71, 395/2.28, 395/2.14, 395/2.58, 395/2.59, 395/2.64
US Class Current: 704/1
CPC Class Codes:
G10L 15/04   Segmentation; Word boundary...
G10L 15/187   Phonemic context, e.g. pron...
G10L 2015/025   Phonemes, fenemes or fenone...
G10L 25/09   the extracted parameters be...
G10L 25/15   the extracted parameters be...
G10L 25/93   Discriminating between voic...
H04N 21/8405   represented by keywords
