Speech analysis syllabic segmenter

US 4,665,548 A
Filed: 10/07/1983
Issued: 05/12/1987
Est. Priority Date: 10/07/1983
Status: Expired due to Term

Abstract

A speech pattern is partitioned into its syllabic subunits by generating signals representative of the speech energy and autocorrelation features of the time frame portions thereof. The peak energy time frames are identified from the frame energy signals and the minimum energy time frames between each pair of successive peak energy frames of the speech pattern are determined from the time frame energy and autocorrelation feature signals. Candidate syllabic subunits are formed responsive to the peak and minimum energy frame characteristics and the autocorrelation feature signals. Signals corresponding to the duration and the energy of each candidate syllabic subunit peak energy frame relative to the energy of the other peak energy frames and the maximum peak energy frame of the speech pattern are formed and these signals are combined to produce a figure of merit for each candidate syllabic subunit. The sequence of syllabic subunits for the speech pattern is selected from the candidates by comparing the figure of merit signals of the candidate subunits.
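
The first stage described in the abstract, per-frame autocorrelation and energy signals, can be illustrated with a minimal sketch. The function names, the non-overlapping framing, and the use of the raw lag-0 sum as the energy value are assumptions made for illustration; the text above does not specify frame length, windowing, or scaling.

```python
def frame_autocorrelation(samples, frame_len, max_lag=1):
    """Split samples into non-overlapping frames (an assumption) and return,
    for each frame, the autocorrelation values for lags 0..max_lag."""
    frames = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        lags = []
        for lag in range(max_lag + 1):
            # R(lag) = sum over n of x[n] * x[n + lag] within the frame
            lags.append(sum(frame[n] * frame[n + lag]
                            for n in range(frame_len - lag)))
        frames.append(lags)
    return frames


def frame_energy(autocorr_frames):
    """Zeroth-order autocorrelation is the sum of squared samples,
    so it serves directly as a per-frame energy value."""
    return [lags[0] for lags in autocorr_frames]


frames = frame_autocorrelation([1, -1, 1, -1, 2, 2, 2, 2], frame_len=4)
energy = frame_energy(frames)
```

In this toy input the first frame alternates in sign, so its first-order autocorrelation is negative, while the second frame is constant and has a large positive first-order value.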

23 Claims

1. Apparatus for partitioning a speech pattern into syllabic subunits comprising:
- means for generating a frame sequence of autocorrelation signals corresponding to said speech pattern;
  
  means responsive to said autocorrelation signal sequence for forming a sequence of signals representative of speech energy in the successive frames of the speech pattern;
  
  means responsive to said speech pattern energy signals for generating a sequence of speech pattern peak energy frame signals;
  
  means responsive to said speech energy signals sequence and said peak frame signal sequence for generating a signal representative of the minimum speech energy frame between each pair of successive peak energy frames;
  
  means responsive to said peak and minimum energy frame signals and said autocorrelation signals for producing a sequence of candidate peak and minimum energy signals;
  
  means responsive to said candidate peak and minimum energy frame signal sequences for forming a set of candidate syllabic subunit characteristic signals; and
  
  means responsive to said candidate syllabic subunit characteristic signals for selecting a set of speech pattern syllabic subunits.
- - 10. Apparatus for partitioning a speech pattern into syllabic intervals according to claim 1 wherein:
    - said means for forming a frame sequence of autocorrelation signals corresponding to the speech pattern comprises means for forming a frame sequence of zeroth order autocorrelation signals and a frame sequence of first order autocorrelation signals;
      
      said means for generating a frame sequence of speech energy signals comprises means responsive to said zeroth order autocorrelation signals for generating speech energy signals;
      
      said candidate peak and minimum energy signal producing means comprises means responsive to said peak energy frame signal sequence and said first order autocorrelation signal sequence for producing a sequence of candidate peak and minimum signals.
  - 14. Apparatus for partitioning a speech pattern into syllabic subunits according to claim 10 wherein said peak energy frame signal sequence generating means comprises means for low pass filtering said frame sequence of speech energy signals, means for determining the peak filtered energy frame signals, and means for selecting speech energy frames corresponding to said determined low pass filtered peak frames.
  - 23. Apparatus for partitioning a speech pattern into syllabic subunits according to claim 10 wherein said candidate peak and minimum energy frame sequence producing means comprises means for generating a predetermined threshold signal and means for comparing the first order autocorrelation signal corresponding to each successive peak energy frame to said predetermined threshold signal.
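
Claim 14 recites low pass filtering the energy sequence, finding peaks of the filtered sequence, and then selecting the corresponding frames of the unfiltered energy. A minimal sketch of that idea follows. The moving-average filter, the `half_width` parameter, and the rule of taking the largest raw energy frame near each filtered peak are assumptions chosen for illustration, not details taken from the claims.

```python
def moving_average(values, half_width=1):
    """Simple low pass filter (assumed): mean over a window centred on each frame."""
    out = []
    for i in range(len(values)):
        lo = max(0, i - half_width)
        hi = min(len(values), i + half_width + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def local_maxima(values):
    """Interior indices that rise from the left and do not rise to the right."""
    return [i for i in range(1, len(values) - 1)
            if values[i] > values[i - 1] and values[i] >= values[i + 1]]


def peak_energy_frames(energy, half_width=1):
    """Find peaks in the smoothed energy, then map each one back to the
    highest raw energy frame within half_width frames of it."""
    smoothed = moving_average(energy, half_width)
    peaks = []
    for p in local_maxima(smoothed):
        lo = max(0, p - half_width)
        hi = min(len(energy), p + half_width + 1)
        best = max(range(lo, hi), key=lambda i: energy[i])
        if best not in peaks:
            peaks.append(best)
    return peaks


energy = [0, 1, 5, 1, 0, 1, 6, 2, 0]
peaks = peak_energy_frames(energy)
```

Smoothing first keeps small frame-to-frame ripples from being reported as separate peaks; mapping back to the raw sequence keeps the reported peak on an actual energy frame.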

2. A method for partitioning a speech pattern into syllabic subunits comprising the steps of:
- generating a frame sequence of autocorrelation signals responsive to said speech pattern;
  
  forming a sequence of signals representative of the speech energy in successive frames of the speech pattern responsive to said frame sequence of autocorrelation signals;
  
  generating a sequence of signals representative of the speech pattern peak energy frames responsive to said speech pattern energy signals;
  
  generating a signal representative of the minimum speech energy frame between each pair of successive peak energy frames responsive to said speech energy signal sequence and said peak energy frame signal sequence;
  
  producing a sequence of candidate syllabic subunit signals responsive to said peak and minimum energy frame signals and said autocorrelation signals;
  
  forming a first signal representative of the speech energy of each candidate syllabic subunit peak energy frame relative to the speech energy of the adjacent candidate syllabic subunit peak energy frames responsive to the said peak and minimum energy frame signals;
  
  forming a second signal representative of the energy of each candidate syllabic subunit peak energy frame relative to the energy of the maximum speech energy frame responsive to the said peak and minimum energy frame signals;
  
  forming a third signal representative of the duration of each candidate syllabic subunit responsive to the said peak and minimum energy frame signals;
  
  combining said first, second and third signals of each candidate syllabic subunit to form a signal corresponding to a figure of merit for said syllabic subunit; and
  
  selecting a sequence of speech pattern syllabic subunits responsive to said candidate syllabic subunit figure of merit signals.
- - 3. A method for partitioning a speech pattern into syllabic subunits according to claim 2 wherein:
    - said autocorrelation signal sequence generating step comprises forming a frame sequence of zeroth order autocorrelation signals;
      
      said speech energy signal sequence formation comprises generating a sequence of speech energy representative signals responsive to said zeroth order autocorrelation signals; and
      
      said peak energy frame signal sequence generating step comprises low pass filtering said frame sequence of speech energy signals, determining peak low pass filtered speech energy signals, and selecting speech energy signal frames corresponding to said determined peak low pass filtered speech energy signals jointly responsive to said peak low pass filtered energy signals and said energy signal sequence.
  - 4. A method for partitioning a speech pattern into syllabic subunits according to claim 3 wherein said step of generating said frame sequence of autocorrelation signals comprises forming a sequence of first order autocorrelation signals responsive to said speech pattern.
  - 5. A method for partitioning a speech pattern into syllabic subunits according to claim 4 wherein:
    - said candidate syllabic unit signal producing step comprises selecting candidate peak and minimum energy frames jointly responsive to said peak energy signals, said minimum energy signals and said first order autocorrelation signals.
  - 6. A method for partitioning a speech pattern into syllabic subunits according to claim 5 wherein:
    - said first signal forming step comprises generating for each candidate syllabic subunit a signal representative of the difference between the speech energy of each candidate peak energy frame and the average speech energy of the preceding and succeeding candidate peak energy frames responsive to the said peak and minimum energy frame signals;
      
      said second signal forming step comprises generating a signal representative of the difference between the energy of each candidate peak energy frame and the energy of the maximum speech energy frame responsive to the said peak and minimum energy frame signals; and
      
      said third signal forming step comprises generating a signal representative of the duration of each syllabic subunit responsive to the candidate syllabic subunit peak energy frame and the adjacent minimum energy frames.
  - 7. A method for partitioning a speech pattern into syllabic subunits according to claim 6 wherein combining said first, second and third signals to form a figure of merit signal for each candidate syllabic subunit comprises summing signals proportional to said first, second and third signals.
  - 8. A method for partitioning a speech pattern into syllabic subunits according to claims 2, 3, 4, 5, 6, or 7 wherein said syllabic subunits are syllables.
  - 9. A method for partitioning a speech pattern into syllabic subunits according to claims 2, 3, 4, 5, 6, or 7 wherein said syllabic subunits are demisyllables.
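
Claims 6 and 7 describe three per-candidate signals and their combination by summing proportional signals. The sketch below is one way to read them. The `weights` tuple, the `bounds` layout, and the handling of the first and last candidates (which have only one neighbouring peak) are assumptions; the claims do not give proportionality constants or edge behaviour.

```python
def figures_of_merit(energy, peaks, bounds, weights=(1.0, 1.0, 1.0)):
    """peaks[k] is the peak energy frame of candidate k.
    bounds[k] and bounds[k + 1] are the minimum energy frames on either
    side of it, so len(bounds) == len(peaks) + 1 (assumed layout)."""
    max_energy = max(energy)
    merits = []
    for k, p in enumerate(peaks):
        # First signal: peak energy minus the average energy of the
        # preceding and succeeding candidate peaks (one neighbour at the ends).
        neighbours = [energy[peaks[j]] for j in (k - 1, k + 1)
                      if 0 <= j < len(peaks)]
        average = sum(neighbours) / len(neighbours) if neighbours else energy[p]
        first = energy[p] - average
        # Second signal: peak energy minus the maximum frame energy.
        second = energy[p] - max_energy
        # Third signal: duration in frames between the adjacent minima.
        third = bounds[k + 1] - bounds[k]
        # Claim 7: sum of signals proportional to the three signals.
        merits.append(weights[0] * first + weights[1] * second + weights[2] * third)
    return merits


energy = [0, 1, 5, 1, 0, 1, 6, 2, 0]
merits = figures_of_merit(energy, peaks=[2, 6], bounds=[0, 4, 8])
```

With equal weights, the second candidate scores higher here because its peak is both the utterance maximum and stronger than its neighbour.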

11. A method for partitioning a speech pattern into syllabic subunits comprising the steps of:
- generating a frame sequence of zeroth order autocorrelation signals and a frame sequence of first order autocorrelation signals corresponding to said speech pattern;
  
  forming a sequence of signals representative of speech energy in the successive frames of the speech pattern responsive to said zeroth order autocorrelation signal sequence;
  
  generating a sequence of speech pattern peak energy frame signals responsive to said speech pattern energy signals;
  
  generating a signal representative of the minimum speech energy frame between each pair of successive peak energy frames responsive to said speech energy signals sequence and said peak energy frame signal sequence;
  
  producing a sequence of candidate peak and minimum energy signals responsive to said peak energy frame signal sequence, minimum energy frame signal sequence and said first order autocorrelation signal sequence;
  
  forming a set of candidate syllabic subunit characteristic signals including forming a first signal representative of the speech energy of each candidate syllabic subunit peak energy frame relative to the speech energy of the adjacent candidate syllabic subunit peak energy frames responsive to the said peak and minimum energy frame signals, forming a second signal representative of the energy of each candidate syllabic subunit peak energy frame relative to the energy of the maximum speech energy frame responsive to the said peak and minimum energy frame signals, and forming a third signal representative of the duration of each candidate syllabic subunit responsive to the said peak and minimum energy frame signals;
  
  combining said first, second and third signals of each candidate syllabic subunit to form a signal corresponding to a figure of merit for said candidate syllabic subunit; and
  
  selecting a sequence of speech pattern syllabic subunits responsive to said candidate syllabic subunit figure of merit signals.
- - 12. A method for partitioning a speech pattern into syllabic subunits according to claim 11 wherein said peak energy frame signal sequence generating step comprises low pass filtering said frame sequence of speech energy signals, determining the peak filtered energy frame signals, and selecting speech energy frames corresponding to said determined peak low pass filtered frames.
  - 13. A method for partitioning a speech pattern into syllabic subunits according to claim 11 wherein said candidate peak and minimum energy frame sequence producing step comprises generating a predetermined threshold signal and comparing the first order autocorrelation signal corresponding to each successive peak energy frame to said predetermined threshold signal.
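
Claim 11 locates the minimum energy frame between each pair of successive peaks, and claim 13 compares the first order autocorrelation at each peak frame with a predetermined threshold. The sketch below illustrates both steps. The direction of the comparison (keeping peaks at or above the threshold), the normalization of the first order value by the zeroth order value, and the threshold of 0.5 are assumptions; the claims state only that a comparison is made.

```python
def minima_between_peaks(energy, peaks):
    """For each pair of successive peaks, return the lowest energy frame
    strictly between them (falls back to the left peak if none exists)."""
    minima = []
    for left, right in zip(peaks, peaks[1:]):
        minima.append(min(range(left + 1, right),
                          key=lambda i: energy[i], default=left))
    return minima


def candidate_peaks(peaks, r0, r1, threshold=0.5):
    """Keep peaks whose normalized first order autocorrelation r1/r0
    reaches the threshold (assumed rule and assumed threshold value)."""
    kept = []
    for p in peaks:
        normalized = r1[p] / r0[p] if r0[p] > 0 else 0.0
        if normalized >= threshold:
            kept.append(p)
    return kept


r0 = [0, 1, 5, 1, 0, 1, 6, 2, 0]             # zeroth order values (energy)
r1 = [0, 0.5, 4.5, 0.5, 0, 0.5, 1.0, 1.0, 0]  # hypothetical first order values
minima = minima_between_peaks(r0, [2, 6])
kept = candidate_peaks([2, 6], r0, r1)
```

A low normalized first order value indicates a frame dominated by high-frequency energy, which is why a threshold on it can plausibly separate vowel-like peaks from noise-like ones; whether the patent applies it in this direction is not stated in the claims above.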

15. Apparatus for partitioning a speech pattern into syllabic subunits comprising:
- means responsive to said speech pattern for generating a frame sequence of autocorrelation signals;
  
  means for forming a sequence of signals representative of the speech energy in successive frames of the speech pattern responsive to said frame sequence of autocorrelation signals;
  
  means responsive to said speech pattern energy signals for generating a sequence of signals representative of the speech pattern peak energy frames;
  
  means responsive to said speech energy signals sequence and said peak energy frame signal sequence for generating a signal representative of the minimum speech energy frame between each pair of successive peak energy frames;
  
  means responsive to said peak and minimum energy frame signals and said autocorrelation signals for producing a sequence of candidate syllabic subunit signals;
  
  means responsive to the said peak and minimum energy frame signals for forming a first signal representative of the speech energy for each candidate syllabic subunit energy frame relative to the speech energy of the adjacent candidate syllabic subunit peak energy frames;
  
  means responsive to the said peak and minimum energy frame signals for forming a second signal representative of the energy of each candidate syllabic subunit peak energy frame relative to the energy of the maximum speech energy frame;
  
  means responsive to the peak and minimum energy frame signals for forming a third signal representative of the duration of each candidate syllabic subunit;
  
  means for combining said first, second and third signals of each candidate syllabic subunit to form a signal corresponding to a figure of merit for said candidate syllabic subunit; and
  
  means responsive to said candidate syllabic subunit figure of merit signals for selecting a sequence of speech pattern syllabic subunits.
- - 16. Apparatus for partitioning a speech pattern into syllabic subunits according to claim 15 wherein:
    - said autocorrelation signal sequence generating means comprises means for forming a frame sequence of zeroth order autocorrelation signals; and
      
      said speech energy signal sequence forming means comprises means responsive to said zeroth order autocorrelation signals for generating a sequence of speech energy representative signals; and
      
      said peak energy frame signal sequence generating means comprises means for low pass filtering said frame sequence of speech energy signals, means for determining peak low pass filtered speech energy signals, and means jointly responsive to said peak low pass filtered energy signals and said energy signal sequence for selecting speech energy signal frames corresponding to said determined peak low pass filtered speech energy signals.
  - 17. Apparatus for partitioning a speech pattern into syllabic subunits according to claim 16 wherein said means for generating said frame sequence of autocorrelation signals comprises means responsive to said speech pattern for forming a sequence of first order autocorrelation signals.
  - 18. Apparatus for partitioning a speech pattern into syllabic subunits according to claim 17 wherein:
    - said candidate syllabic unit signal producing means comprises means jointly responsive to said peak energy signals, said minimum energy signals and said first order autocorrelation signals for selecting candidate peak and minimum energy frames.
  - 19. Apparatus for partitioning a speech pattern into syllabic subunits according to claim 18 wherein:
    - said first signal forming means comprises means responsive to said candidate peak and minimum energy frame signals for generating for each candidate syllabic subunit a signal representative of the difference between the speech energy of each candidate peak energy frame and the average speech energy of the preceding and succeeding candidate peak energy frames;
      
      said second signal forming means comprises means responsive to candidate peak and minimum energy frame signals for generating for each candidate syllabic subunit a signal representative of the difference between the energy of each candidate peak energy frame and the energy of the maximum speech energy frame; and
      
      said third signal forming means comprises means responsive to the candidate syllabic subunit peak energy frame and the adjacent minimum energy frames for generating a signal representative of the duration of each candidate syllabic subunit.
  - 20. Apparatus for partitioning a speech pattern into syllabic subunits according to claim 19 wherein said means for combining said first, second and third signals to form said figure of merit signal for each candidate syllabic subunit comprises summing signals proportional to said first, second and third signals.
  - 21. Apparatus for partitioning a speech pattern into syllabic subunits according to claims 15, 16, 17, 18, 19, or 20 wherein said syllabic subunits are syllables.
  - 22. Apparatus for partitioning a speech pattern into syllabic subunits according to claims 15, 16, 17, 18, 19, or 20 wherein said syllabic subunits are demisyllables.
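
The final element of claim 15 selects the syllabic subunits from the candidates using their figure of merit signals. The claims do not say how the figures of merit are compared, so the rule below (keep every candidate whose merit lies within a fixed `margin` of the best one) is purely a hypothetical stand-in, as are the function name and the interval representation.

```python
def select_subunits(bounds, merits, margin=3.0):
    """Return (start_frame, end_frame) for each candidate whose figure of
    merit is within `margin` of the best candidate (assumed selection rule).
    bounds[k] and bounds[k + 1] delimit candidate k."""
    best = max(merits)
    return [(bounds[k], bounds[k + 1])
            for k, merit in enumerate(merits)
            if merit >= best - margin]


loose = select_subunits([0, 4, 8], [2.0, 5.0], margin=3.0)
strict = select_subunits([0, 4, 8], [2.0, 5.0], margin=1.0)
```

A relative rule like this compares candidates against each other rather than against an absolute level, which matches the wording of the abstract only loosely; a tighter margin discards the weaker candidate in this example.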

Current Assignee
Bell Telephone Laboratories, Inc. (Nokia Corporation)
Original Assignee
American Telephone & Telegraph Company (AT&T, Inc.)
Inventors
Kahn, Daniel
Primary Examiner(s)
KEMENY, EMANUEL

Application Number

US06/539,792
Time in Patent Office

1,313 Days
Field of Search

381/29-53, 364/513.5, 364/513, 364/419
US Class Current

704/237
CPC Class Codes

G10L 15/04 Segmentation; Word boundary...

G10L 25/00 Speech or voice analysis te...
