Statistical acoustic processing method and apparatus for speech recognition using a toned phoneme system

US 5,751,905 A
Filed: 03/15/1995
Issued: 05/12/1998
Est. Priority Date: 03/15/1995
Status: Expired due to Fees


Abstract

A method and apparatus for acoustic signal processing in speech recognition. The method comprises the following components:
1) Each syllable is decomposed into two phonemes of comparable length and complexity, the first being a preme and the second a toneme.
2) Each toneme is assigned a tone value such as high, rising, low, falling, or untoned.
3) No tone value is assigned to premes.
4) Pitch is detected continuously and treated the same way as energy and cepstral features in a hidden Markov model to predict the tone of a toneme.
5) The tone of a syllable is defined as the tone of its component toneme.
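The components listed in the abstract can be pictured as a small data model. The following is only an illustrative sketch: the class names, field names, and the example labels `"m"` and `"a"` are assumptions made here and do not come from the patent text.

```python
from dataclasses import dataclass
from enum import Enum


class Tone(Enum):
    """Tone values named in the abstract."""
    HIGH = "high"
    RISING = "rising"
    LOW = "low"
    FALLING = "falling"
    UNTONED = "untoned"


@dataclass(frozen=True)
class Preme:
    # First part of the syllable; deliberately has no tone field (component 3).
    label: str


@dataclass(frozen=True)
class Toneme:
    # Second part of the syllable; the only unit that carries a tone (component 2).
    label: str
    tone: Tone


@dataclass(frozen=True)
class Syllable:
    preme: Preme
    toneme: Toneme

    @property
    def tone(self) -> Tone:
        # Component 5: the syllable's tone is defined as its toneme's tone.
        return self.toneme.tone


# Hypothetical example labels, chosen only for illustration.
syllable = Syllable(Preme("m"), Toneme("a", Tone.RISING))
```

Keeping the tone on the toneme alone means a recognizer built on this model never has to classify the tone of the preme.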


12 Claims

1. A method for recognizing words of speech comprising at least one syllable having tonal content, the method comprising the steps of:
- decomposing said at least one syllable into a preme and a toneme, the toneme having a tone value; and
  
  recognizing the words of speech based on the preme and toneme of said at least one syllable including the steps of;
  
  continuously detecting a pitch value for the toneme of said at least one syllable;
  
  creating at least one pitch contour based on the detected pitch value;
  
  determining whether a discontinuity representing an un-toned portion of said at least one syllable exists between adjacent pitch contours and if so producing at least one simulated tone value to mask the discontinuity;
  
  obtaining parameters from the pitch value for the toneme and from a derivative of the at least one pitch contour; and
  
  determining the tone value of the toneme of said at least one syllable using the parameters.
  - 2. The method of claim 1, wherein the preme and toneme for said at least one syllable are of approximately equal duration.
  - 3. The method of claim 1, wherein the preme of a syllable is a phoneme representing a first portion of each syllable.
  - 4. The method of claim 1, wherein the toneme of said at least one syllable is a phoneme representing an end portion of said at least one syllable.
  - 5. The method of claim 1, wherein the tone value of said at least one syllable is defined as the tone value of said at least one syllable's toneme.
  - 6. (Amended) The method of claim 1, wherein any tonal content of the preme of said at least one syllable is ignored for purposes of determining the tone value of the toneme of said at least one syllable.
  - 7. The method of claim 4, wherein values of the tone of a toneme include high, rising, low, falling, untoned, and neutral.
  - 8. The method of claim 1, wherein the step of producing said at least one simulated tone value to mask the discontinuity includes the steps of:
    - deriving the logarithm of the pitch value for the toneme of said at least one syllable; and
      
      extrapolating the at least one pitch contour at the discontinuity with an exponential decaying towards a running average combined with a random signal.
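
The gap-filling step of claim 8 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name, the decay factor, the averaging rate, the noise level, and the initial pitch are all hypothetical choices, and `None` is used here to mark frames where no pitch was detected.

```python
import math
import random
from typing import List, Optional, Sequence


def fill_unvoiced(
    pitch_hz: Sequence[Optional[float]],
    decay: float = 0.8,        # assumed per-frame decay factor
    avg_rate: float = 0.05,    # assumed update rate of the running average
    noise_std: float = 0.01,   # assumed spread of the random signal
    initial_hz: float = 150.0, # assumed starting value for the running average
    seed: int = 0,
) -> List[float]:
    """Return a log-pitch track with gaps (None) replaced by simulated values."""
    rng = random.Random(seed)
    running_avg = math.log(initial_hz)
    previous = running_avg
    filled: List[float] = []
    for value in pitch_hz:
        if value is not None:
            # Detected frame: take the logarithm and update the running average.
            current = math.log(value)
            running_avg += avg_rate * (current - running_avg)
        else:
            # Gap: decay exponentially from the previous value towards the
            # running average, combined with a small random signal.
            current = (
                running_avg
                + decay * (previous - running_avg)
                + rng.gauss(0.0, noise_std)
            )
        filled.append(current)
        previous = current
    return filled


track = fill_unvoiced([200.0, 200.0, None, None, None], noise_std=0.0)
```

With the noise switched off, as in the usage line, each simulated value moves a fixed fraction of the remaining distance towards the running average, so the contour stays continuous across the gap.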

9. A system for recognizing words of speech comprising at least one syllable having tonal content, comprising:
- means for decomposing said at least one syllable into a preme and a toneme, the toneme having a tone value; and
  
  means for recognizing the words of speech based on the preme and toneme of said at least one syllable comprising;
  
  means for converting the words of speech into an electrical signal;
  
  pitch extraction means for extracting a pitch value for the toneme of said at least one syllable if the signal energy is above a threshold;
  
  means for extrapolating the signal wherever the signal energy is below the threshold or the extracted pitch value is not within a pre-determined range to generate an extended pitch signal;
  
  storage means for storing data including the extended pitch signal and at least one derivative of the extended pitch signal; and
  
  means for determining the tone value of the toneme of said at least one syllable using the stored data.
  - 10. The system of claim 9, wherein the means for decomposing said at least one syllable into a preme and a toneme comprises:
    - an A/D converter for receiving the signal of the converting means;
      
      means for detecting a beginning and an end of said at least one syllable; and
      
      means for designating a first portion of said at least one syllable as the preme, and for designating a second portion of said at least one syllable as the toneme, the preme and toneme of said at least one syllable being of comparable duration.
  - 11. The system of claim 10, wherein a hidden Markov model is used to represent the preme and toneme of said at least one syllable.
  - 12. The system of claim 9, further comprising a low pass filter for passing low frequencies of the extended pitch signal.
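
The signal path described in claims 9 and 12 can be sketched as a chain of small functions. This is an assumption-laden illustration only: the claims do not specify the threshold, the pitch range, the extrapolation method, or the filter, so the values below, the hold-last-value extrapolation, and the one-pole low-pass filter are all stand-ins chosen for brevity, and every function name is hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def gate_pitch(
    pitch_hz: Sequence[float],
    energy: Sequence[float],
    energy_threshold: float = 0.01,                  # assumed threshold
    pitch_range: Tuple[float, float] = (50.0, 500.0) # assumed valid range in Hz
) -> List[Optional[float]]:
    """Keep a pitch value only if energy is above the threshold and pitch is in range."""
    low, high = pitch_range
    return [
        p if e > energy_threshold and low <= p <= high else None
        for p, e in zip(pitch_hz, energy)
    ]


def extend(gated: Sequence[Optional[float]], fallback_hz: float = 150.0) -> List[float]:
    """Fill rejected frames by holding the last accepted value (a stand-in extrapolation)."""
    extended: List[float] = []
    last = fallback_hz
    for p in gated:
        if p is not None:
            last = p
        extended.append(last)
    return extended


def low_pass(signal: Sequence[float], alpha: float = 0.5) -> List[float]:
    """One-pole low-pass filter; alpha is an assumed smoothing constant."""
    if not signal:
        return []
    state = signal[0]
    out: List[float] = []
    for x in signal:
        state += alpha * (x - state)
        out.append(state)
    return out


def first_difference(signal: Sequence[float]) -> List[float]:
    """Frame-to-frame difference, used here as a simple derivative estimate."""
    if not signal:
        return []
    return [0.0] + [b - a for a, b in zip(signal, signal[1:])]


pitch = [120.0, 125.0, 900.0, 130.0, 128.0]
energy = [0.5, 0.6, 0.4, 0.001, 0.7]
gated = gate_pitch(pitch, energy)
extended = extend(gated)
smoothed = low_pass(extended)
stored = list(zip(smoothed, first_difference(smoothed)))
```

In the usage lines, the third frame is rejected because 900 Hz falls outside the assumed range and the fourth because its energy is below the assumed threshold. `stored` then pairs each extended pitch value with its derivative, mirroring the stored data that claim 9 uses to determine the tone value.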


Current Assignee: International Business Machines Corporation
Original Assignee: International Business Machines Corporation
Inventors: Picheny, Michael Alan; Monkowski, Michael Daniel; Gopinath, Ramesh Ambat; Chen, Chengjun Julian
Primary Examiner(s): Zele, Krista M.
Assistant Examiner(s): Weaver, Scott Louis

Application Number: US08/404,786
Time in Patent Office: 1,154 Days
Field of Search: 395/2, 395/2.6, 395/2.63, 395/2.64, 395/2.65, 395/2.66, 395/2.16, 381/39, 381/41, 381/43, 381/44, 381/45, 381/49, 381/50
US Class Current: 704/254
CPC Class Codes:
G10L 15/142   Hidden Markov Models [HMMs]
G10L 25/06   the extracted parameters be...
G10L 25/15   the extracted parameters be...
G10L 25/90   Pitch determination of spee...
