Word hypothesizer for continuous speech decoding using stressed-vowel centered bidirectional tree searches

US 5,349,645 A
Filed: 12/31/1991
Issued: 09/20/1994
Est. Priority Date: 12/31/1991
Status: Expired due to Term


Abstract

A word hypothesis module for speech decoding consists of four submodules: vowel center detection, bidirectional tree searches around each vowel center, forward-backward pruning, and additional short-word hypotheses. Once the strong-energy vowel centers are detected, a vowel-centered lexicon tree is placed at each vowel center and searched in both the left and right directions, using only simple phone models for a fast acoustic match. A stage-wise forward-backward technique computes the word-beginning and word-ending likelihood scores over the generated half-word lattice for further pruning of the lattice. To avoid missing short words with weak-energy vowel centers, a lexicon tree is compiled for these words and tree searches are performed between each pair of adjacent vowel centers. Integrating the word hypothesizer with a top-down Viterbi beam search in continuous speech decoding provides two-pass decoding, which significantly reduces computation time.
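
The abstract states that the short-word tree searches run between each pair of adjacent vowel centers. A minimal sketch of how those search spans could be enumerated follows. The function name `short_word_spans` and the use of integer frame indices are assumptions made for illustration and do not come from the patent.

```python
from typing import List, Tuple


def short_word_spans(vowel_centers: List[int]) -> List[Tuple[int, int]]:
    """Return (start_frame, end_frame) spans between adjacent vowel centers.

    Each span is a candidate region for the short-word tree search.
    Frame indices are assumed to be integers; duplicates are dropped.
    """
    centers = sorted(set(vowel_centers))
    # Pair every center with its right-hand neighbour.
    return list(zip(centers, centers[1:]))


# Three detected vowel centers give two adjacent-pair spans.
spans = short_word_spans([12, 40, 71])
print(spans)
```

With fewer than two vowel centers the function returns an empty list, so no short-word search would be scheduled in this sketch.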

300 Citations

30 Claims

1. A high-speed continuous speech-decoding system decoding a speech sentence by utilizing two-pass decoding, said system comprising:
- means for converting a speech utterance into a sequence of feature vectors;
  
  means for detecting stressed vowel centers in said sequence of feature vectors;
  
  means for generating a word lattice based on the detected stressed vowel centers; and
  
  means for performing a time-synchronous Viterbi beam search using the word lattice from said generating means to constrain said Viterbi beam search.
- - 2. The continuous speech-decoding system of claim 1 wherein said means for generating a word lattice comprises:
    - means for performing a time-synchronous bidirectional tree search around each detected stressed vowel center producing word hypotheses; and
      
      means for pruning the word hypotheses resulting from said bidirectional tree search in forward and backward directions.
  - 3. The continuous speech-decoding system of claim 2 wherein said means for generating a word lattice further comprises:
    - means for performing a tree search for short words.
  - 4. The continuous speech-decoding system of claim 3 wherein said means for performing a short word tree search, searches between pairs of adjacent vowel centers using a short word dictionary tree.
  - 5. The continuous speech-decoding system of claim 2 wherein said bidirectional tree search means comprises means for calculating the likelihood scores of a match between the feature vectors around the stressed vowel centers and a predetermined set of phone models;
    - and said bidirectional tree search also comprises means for modeling each stressed vowel by a mixture density of a vowel phone model and a liquid-vowel syllable model.
  - 6. The continuous speech-decoding system of claim 5 further comprising means utilizing two threshold conditions to discard low likelihood scores.
  - 7. The continuous speech-decoding system of claim 6 wherein the first threshold condition of said two threshold discarding means is based on a maximum likelihood value from a current tree search at the current time frame.
  - 8. The continuous speech-decoding system of claim 7 wherein the second threshold condition of said two threshold discarding means is based on the accumulated maximum likelihood score without tree path constraint of each time frame.
  - 9. The continuous speech-decoding system of claim 8 wherein a new node word hypothesis is accepted if said first and second conditions are satisfied.
  - 10. The continuous speech-decoding system of claim 9 wherein said new node word hypothesis comprises a node occurrence time and associated likelihood scores for either a left or a right half-word hypothesis.
  - 11. The continuous speech-decoding system of claim 10 comprising means for comparing the node word hypothesis of each time frame from all vowel centers and keeping only the node word hypotheses with high likelihood scores.
  - 12. The continuous speech-decoding system of claim 11 further comprising a forward-backward pruning means which operates from the first vowel center in said sequence of feature vectors to the last vowel center in said sequence of feature vectors for the forward direction and from the last vowel center to the first vowel center for the backward direction.
  - 13. The continuous speech-decoding system of claim 12 wherein said forward-backward pruning means operates in the forward direction according to the steps:
    - (a) calculate the maximum likelihood scores of all possible paths that end with a stressed vowel for a certain word hypothesis;
      
      (b) calculate the maximum likelihood scores of all paths that end with said certain word hypothesis; and
      
      (c) sort all the maximum likelihood scores from step (b) into descending order and keep only the paths having the maximum likelihood scores that exceed a predetermined threshold.
  - 14. The continuous speech-decoding system of claim 1 wherein said time-synchronous Viterbi beam search comprises a top-down Viterbi beam search.
  - 15. The continuous speech-decoding system of claim 1 wherein said vowel center detecting means comprises means for selecting only those energy peaks represented by the feature vectors which are above a predetermined energy threshold or the feature vectors associated with the energy peaks that are acoustically matched to vowels and passing indices of the time locations of the selected energy peaks to said bidirectional tree search means.
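
Claim 15 describes vowel center detection as selecting energy peaks that are either above a predetermined energy threshold or acoustically matched to vowels, and passing their time indices on to the bidirectional tree search. The sketch below illustrates that selection rule only. The per-frame energy list, the local-maximum peak test, and the optional `matches_vowel` callback are assumptions; the patent claim does not specify how peaks are located or how the vowel match is computed.

```python
from typing import Callable, List, Optional, Sequence


def detect_vowel_centers(
    energy: Sequence[float],
    energy_threshold: float,
    matches_vowel: Optional[Callable[[int], bool]] = None,
) -> List[int]:
    """Return frame indices of energy peaks kept as stressed vowel centers.

    A peak is kept if its energy exceeds `energy_threshold` OR the
    hypothetical `matches_vowel(frame_index)` callback reports a vowel match.
    """
    centers = []
    for t in range(1, len(energy) - 1):
        # Local maximum: not lower than the left neighbour, higher than the right.
        is_peak = energy[t - 1] <= energy[t] > energy[t + 1]
        if not is_peak:
            continue
        strong = energy[t] > energy_threshold
        vowel_like = matches_vowel(t) if matches_vowel is not None else False
        if strong or vowel_like:
            centers.append(t)
    return centers


# Toy per-frame energies with peaks at frames 1, 4 and 7.
energy = [0.1, 0.9, 0.2, 0.3, 0.4, 0.3, 0.1, 1.2, 0.5]
centers = detect_vowel_centers(energy, energy_threshold=0.8)
print(centers)
```

The weak peak at frame 4 is dropped by the energy test alone; it would be kept only if the acoustic match callback accepted it.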

16. In a high-speed continuous speech-decoding system for decoding speech sentences into decoded word strings, said system including means for converting a speech utterance into a sequence of feature vectors, the improvement therein comprising:
- means for detecting stressed vowel centers in said sequence of feature vectors; and
  
  means for generating a word lattice based on the detected stressed vowel centers.
- - 17. The improvement of claim 16 wherein said means for generating a word lattice comprises:
    - means for performing a time-synchronous bidirectional tree search around each detected stressed vowel center producing word hypotheses; and
      
      means for pruning the word hypotheses resulting from said bidirectional tree search in forward and backward directions.
  - 18. The improvement of claim 17 wherein said means for generating a word lattice further comprises:
    - means for performing a tree search for short words.
  - 19. The improvement of claim 18 wherein said short word tree search means searches only between each pairs of adjacent vowel centers using a short word dictionary tree.
  - 20. The improvement of claim 17 wherein said bidirectional tree search means comprises:
    - means for calculating the likelihood scores of a match between the feature vectors around the stressed vowel centers and a predetermined set of phone models; and
      
      said bidirectional tree search also comprises means for modeling each stressed vowel by a mixture density of a vowel phone model and a liquid-vowel syllable model.
  - 21. The improvement of claim 20 further comprising means utilizing threshold conditions to discard low likelihood scores.
  - 22. The improvement of claim 21 wherein a first threshold condition of said discarding means is based on a maximum likelihood value from a current tree search at the current time frame.
  - 23. The improvement of claim 22 wherein a second threshold condition of said discarding means is based on an accumulated maximum likelihood score of each time frame without tree path constraint.
  - 24. The improvement of claim 23 wherein a new node word hypothesis is accepted if said first and second conditions are satisfied.
  - 25. The improvement of claim 24 wherein said new node word hypothesis comprises a node occurrence time and an associated likelihood score for a left and a right half-word hypothesis.
  - 26. The improvement of claim 17 wherein said forward-backward pruning means comprises means for comparing the word hypothesis of each time frame from all vowel centers and keeping only the word hypotheses with high likelihood scores.
  - 27. The improvement of claim 26 wherein said forward-backward pruning means operates from the first vowel center in said sequence of feature vectors to the last vowel center in said sequence of feature vectors for the forward direction and from the last vowel center to the first vowel center for the backward direction.
  - 28. The improvement of claim 27 wherein said forward-backward pruning means operates in the forward direction including the steps:
    - (a) calculate the maximum likelihood scores of all possible paths that end with stressed vowel for a certain word hypothesis;
      
      (b) calculate the maximum likelihood scores of all paths that end with said certain word hypothesis; and
      
      (c) sort all the path ending likelihood scores from step (b) into descending order and keep only the path ending likelihood scores that exceed a predetermined threshold.
  - 29. The improvement of claim 16 wherein said stressed vowel center detecting means comprises means for selecting only those energy peaks represented by said feature vectors which are above a predetermined energy threshold or the feature vectors associated with the energy peaks that are acoustically matched to vowels and passing the indices of the time location of the selected energy peaks to said bidirectional tree search means.
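
Claims 6 to 9 and 21 to 24 accept a new node word hypothesis only when two threshold conditions hold: one tied to the maximum likelihood in the current tree search at the current frame, and one tied to the accumulated maximum likelihood of each frame without the tree path constraint. Step (c) of claims 13 and 28 then sorts path scores in descending order and keeps those above a predetermined threshold. The following is a rough sketch of both rules under stated assumptions: scores are treated as log-likelihoods, and each threshold is modeled as a margin below the corresponding best score. The claims do not give the exact form of the thresholds, and all names here are hypothetical.

```python
from typing import Dict, List, Tuple


def accept_node(score: float, tree_best: float, frame_best: float,
                tree_margin: float, frame_margin: float) -> bool:
    """Accept a hypothesis only if both threshold conditions are satisfied.

    tree_best:  best score in the current tree search at this frame.
    frame_best: best accumulated score at this frame with no tree constraint.
    The margin-below-best form of each test is an assumption of this sketch.
    """
    first_ok = score >= tree_best - tree_margin
    second_ok = score >= frame_best - frame_margin
    return first_ok and second_ok


def prune_paths(path_scores: Dict[str, float],
                threshold: float) -> List[Tuple[str, float]]:
    """Sort path scores in descending order and keep those above the threshold."""
    ranked = sorted(path_scores.items(), key=lambda item: item[1], reverse=True)
    return [(word, s) for word, s in ranked if s > threshold]


# Toy log-likelihood scores for three word hypotheses.
kept = prune_paths({"seven": -3.0, "set": -9.0, "ten": -1.0}, threshold=-5.0)
print(kept)
```

Requiring both conditions means a hypothesis that looks good within its own tree can still be rejected if it falls too far below the best unconstrained score for that frame.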

30. An improved high-speed continuous speech-decoding system for decoding speech sentences into decoded word strings, said system including means for converting a speech utterance into feature vectors, a word hypothesizer for generating a word lattice, and means for performing a Viterbi beam search, the improvement therein comprising:
- said word hypothesizer including, means for representing a lexicon having a plurality of lexicon entries containing primary stressed vowels for said speech-decoding system in a vowel-centered tree which is rooted in the primary stressed-vowels of said lexicon entries.
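
Claim 30 describes a lexicon represented as a vowel-centered tree rooted in the primary stressed vowels of its entries. One way to picture such a structure is a pair of tries per stressed vowel: one over the phones to the left of the vowel, read outward in reverse, and one over the phones to the right. The sketch below builds that structure. The dictionary layout, the phone symbols, and the toy lexicon are illustrative assumptions and are not taken from the patent specification.

```python
from typing import Dict, List, Tuple


def new_node() -> dict:
    """Create an empty trie node: child phones plus words ending here."""
    return {"children": {}, "words": []}


def build_vowel_centered_tree(
    lexicon: Dict[str, Tuple[List[str], int]]
) -> Dict[str, dict]:
    """Build left and right half-word tries rooted at each stressed vowel.

    `lexicon` maps a word to (phone sequence, index of primary stressed vowel).
    """
    roots: Dict[str, dict] = {}
    for word, (phones, stress_idx) in lexicon.items():
        vowel = phones[stress_idx]
        root = roots.setdefault(vowel, {"left": new_node(), "right": new_node()})
        # The left half is walked outward from the vowel, so it is reversed.
        left_path = list(reversed(phones[:stress_idx]))
        right_path = phones[stress_idx + 1:]
        for side, path in (("left", left_path), ("right", right_path)):
            node = root[side]
            for phone in path:
                node = node["children"].setdefault(phone, new_node())
            node["words"].append(word)
    return roots


# Hypothetical three-word lexicon sharing the stressed vowel "eh".
lexicon = {
    "seven": (["s", "eh", "v", "ax", "n"], 1),
    "set": (["s", "eh", "t"], 1),
    "ten": (["t", "eh", "n"], 1),
}
tree = build_vowel_centered_tree(lexicon)
print(tree["eh"]["left"]["children"]["s"]["words"])
```

In this layout, "seven" and "set" share the same left half-word path, so a leftward search from a vowel center would score that path once for both words.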

Current Assignee: Matsushita Electric Industrial Company Limited (Panasonic Holdings Corporation)
Original Assignee: Matsushita Electric Industrial Company Limited (Panasonic Holdings Corporation)
Inventors: Zhao, Yunxin
Primary Examiner(s): MacDonald, Allen R.
Assistant Examiner(s): Doerrler, Michelle
Application Number: US07/807,255
Time in Patent Office: 994 Days
Field of Search: 381/29-51, 395/2.53, 395/2.64
US Class Current: 704/243
CPC Class Codes: G10L 15/142 Hidden Markov Models [HMMs]
