Word hypothesizer based on reliably detected phoneme similarity regions

US 5,825,977 A
Filed: 09/08/1995
Issued: 10/20/1998
Est. Priority Date: 09/08/1995
Status: Expired due to Fees

2 Assignments

0 Petitions

Abstract

The word hypothesizer reduces the search space for more computationally expensive word recognizers. Each periodic interval of input speech is represented as a vector of phoneme similarity values from which the high similarity regions are selected and parameterized. The hypothesizer computes alignment parameters for each of a plurality of previously stored word prototypes, vis-a-vis the high similarity regions of the input speech utterance. Those word prototypes having the highest recognition scores are selected as word candidates for the fine match recognizer.
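
The second sentence of the abstract, in which each frame becomes a vector of phoneme similarity values and the high similarity regions are selected and parameterized, can be illustrated with a short Python sketch. It assumes the similarity data arrives as a frames × phonemes list of floats and uses a single fixed threshold. Each region is described by the features named in dependent claims 7 to 10 (center height, center frame, left and right boundary frames). `SimilarityRegion` and `extract_regions` are hypothetical names, and the patent's actual selection rule may differ.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class SimilarityRegion:
    phoneme: str
    left_frame: int       # first frame above the threshold
    right_frame: int      # last frame above the threshold (inclusive)
    center_frame: int     # frame of maximum similarity inside the region
    center_height: float  # similarity value at center_frame


def extract_regions(similarity: Sequence[Sequence[float]],
                    phonemes: Sequence[str],
                    threshold: float) -> List[SimilarityRegion]:
    """similarity[t][k] is the similarity of frame t to phoneme k.

    Returns one region per maximal run of frames whose similarity to a
    phoneme exceeds `threshold`, ordered by left boundary.
    """
    regions = []
    n_frames = len(similarity)
    for k, phoneme in enumerate(phonemes):
        t = 0
        while t < n_frames:
            if similarity[t][k] > threshold:
                start = t
                # Extend the run while the next frame stays above the threshold.
                while t + 1 < n_frames and similarity[t + 1][k] > threshold:
                    t += 1
                center = max(range(start, t + 1), key=lambda i: similarity[i][k])
                regions.append(SimilarityRegion(phoneme, start, t, center,
                                                similarity[center][k]))
            t += 1
    return sorted(regions, key=lambda r: (r.left_frame, r.phoneme))


# Toy input: 7 frames x 2 phonemes (hypothetical values).
phonemes = ["s", "ih"]
similarity = [
    [0.2, 0.1],
    [0.7, 0.1],
    [0.9, 0.2],
    [0.6, 0.4],
    [0.1, 0.8],
    [0.1, 0.9],
    [0.3, 0.2],
]
for region in extract_regions(similarity, phonemes, threshold=0.5):
    print(region)
```

Only the few regions that survive the threshold are passed on, which is what lets a hypothesizer of this kind compare prototypes against a handful of parameters instead of every frame.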

65 Citations

23 Claims

1. A word hypothesizer for processing an input speech utterance in a speech recognition system comprising:
- a phoneme model database for storing phoneme model speech data corresponding to a plurality of phonemes;
- a phoneme similarity module coupled to said phoneme model database and receptive of said input speech utterance for producing phoneme similarity data indicative of the correlation between said input speech utterance and said phoneme model speech data with respect to time;
- a word prototype database for storing word prototype data corresponding to a plurality of predetermined words, the word prototype data representing said predetermined words as a plurality of targets each target corresponding to a different phoneme, wherein each of said plurality of targets represents the occurrence of at least one phoneme similarity peak as compared with a predefined speech database;
- a prototype comparator coupled to said word prototype database and to said phoneme similarity module for correlating said phoneme similarity data and said word prototype data to select at least one of said predetermined words as a word hypothesis for said input speech utterance.
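
The prototype comparator of claim 1 can be sketched as follows, purely as an illustration. None of these assumptions come from the patent:

- each word prototype is a list of `(phoneme, relative_position)` targets, one per phoneme;
- each detected region carries a phoneme label, a center frame and a peak height;
- a target is credited with the highest same-phoneme region whose relative position lies within a hypothetical `tolerance`;
- the word score is the mean credit over its targets.

`Region`, `score_prototype` and `hypothesize` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

Target = Tuple[str, float]  # (phoneme, relative position in [0, 1]); hypothetical format


@dataclass(frozen=True)
class Region:
    phoneme: str
    center_frame: int
    height: float  # peak phoneme similarity inside the region


def score_prototype(targets: Sequence[Target], regions: Sequence[Region],
                    n_frames: int, tolerance: float = 0.15) -> float:
    """Mean, over targets, of the highest same-phoneme region height found
    within `tolerance` of the target's relative position (0 if none)."""
    if not targets:
        return 0.0
    total = 0.0
    for phoneme, rel_pos in targets:
        credits = [r.height for r in regions
                   if r.phoneme == phoneme
                   and abs(r.center_frame / n_frames - rel_pos) <= tolerance]
        total += max(credits, default=0.0)
    return total / len(targets)


def hypothesize(prototypes: Dict[str, List[Target]], regions: Sequence[Region],
                n_frames: int, n_best: int = 2) -> List[str]:
    """Rank all prototypes by score and keep the n_best words as hypotheses."""
    scored = [(score_prototype(t, regions, n_frames), w) for w, t in prototypes.items()]
    scored.sort(key=lambda sw: (-sw[0], sw[1]))  # best score first, ties by word
    return [w for _, w in scored[:n_best]]


regions = [Region("s", 10, 0.9), Region("ih", 30, 0.8), Region("k", 55, 0.7)]
prototypes = {
    "six": [("s", 0.1), ("ih", 0.3), ("k", 0.55)],
    "seven": [("s", 0.1), ("eh", 0.35), ("v", 0.5), ("n", 0.85)],
    "one": [("w", 0.1), ("ah", 0.4), ("n", 0.8)],
}
print(hypothesize(prototypes, regions, n_frames=100))  # ['six', 'seven']
```

Keeping only the `n_best` words mirrors the abstract's point that the hypothesizer shortlists candidates for a more expensive fine-match recognizer.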

2. A method for hypothesizing word candidates based on an input speech utterance for use in a speech recognition system comprising:
- (a) providing a phoneme template representing a database of calibration speech;
- (b) comparing said input speech utterance with said phoneme template to produce speaker phoneme similarity data as a function of time;
- (c) processing said speaker phoneme similarity data to extract speech regions that exceed a predetermined similarity threshold, thereby defining extracted speaker features;
- (d) storing word prototype data corresponding to a plurality of predetermined words, the word prototype data representing said predetermined words as a plurality of targets each target corresponding to a different phoneme, wherein each of said plurality of targets represents the occurrence of at least one phoneme similarity peak as compared with a predefined speech database;
- (e) aligning the extracted speaker features and word prototype data and selecting at least one word from said word prototype data which achieves a predetermined degree of correlation between the extracted speaker features and said word prototype data.

3. The method of claim 2 wherein the step of aligning the extracted speaker features and word prototype data, further including:
- selecting the word prototype data that corresponds to said extracted speaker features;
- building an alignment structure that defines a set of aligned high similarity regions between said extracted speaker features and said selected word prototype data and defines data indicative of degree of correlation for each of said aligned regions.

4. The method of claim 3 further including:
- normalizing said selected word prototype data to same time scale as said corresponding extracted speaker features.

5. The method of claim 3 further including:
- performing linear regression between said selected word prototype data and said corresponding extracted speaker features.

6. The method of claim 3 further including:
- normalizing said selected word prototype data to same time scale as said corresponding extracted speaker features; and
- performing linear regression between said selected word prototype data and said corresponding extracted speaker features.

7. The method of claim 2 further comprising the step of:
- determining center height of speaker features.

8. The method of claim 2 further comprising the step of:
- determining center frame of speaker features.

9. The method of claim 2 further comprising the step of:
- determining left boundary frame of speaker features.

10. The method of claim 2 further comprising the step of:
- determining right boundary frame of speaker features.
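
Claims 4 to 6 pair time normalization with linear regression. One plausible reading is sketched below under stated assumptions. Prototype targets are stored as relative positions in [0, 1], and normalization rescales them onto the speaker's frame axis. An ordinary least-squares line `speaker_frame ≈ slope * prototype_frame + intercept` is then fitted over the aligned pairs, and the root-mean-square residual serves as a hypothetical goodness-of-alignment measure. The function names are illustrative, not taken from the patent.

```python
import math
from typing import List, Sequence, Tuple


def normalize_to_frames(rel_positions: Sequence[float], n_frames: int) -> List[float]:
    """Rescale prototype target positions (assumed in [0, 1]) onto the
    speaker's frame axis so both sequences share one time scale."""
    return [p * n_frames for p in rel_positions]


def linear_regression(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Ordinary least squares y ~ slope * x + intercept.
    Returns (slope, intercept, rms_residual)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx if sxx else 0.0  # degenerate case: all x equal
    intercept = mean_y - slope * mean_x
    sse = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    return slope, intercept, math.sqrt(sse / n)


# Aligned pairs: prototype targets vs. center frames of the matching speaker features.
proto_positions = [0.1, 0.3, 0.55]
speaker_centers = [14, 34, 59]
proto_frames = normalize_to_frames(proto_positions, n_frames=100)
slope, intercept, rms = linear_regression(proto_frames, speaker_centers)
print(f"slope={slope:.2f} intercept={intercept:.2f} rms={rms:.2f}")
# slope=1.00 intercept=4.00 rms=0.00
```

A slope near 1 with a small residual means the prototype's target spacing matches the speaker's timing up to a constant shift. A large residual would flag a poor alignment.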

11. A method for hypothesizing word candidates based on an input speech utterance for use in a speech recognition system comprising:
- providing a phoneme template representing a database of calibration speech;
- comparing said input speech utterance of said speaker with said phoneme template to produce speaker phoneme similarity data as a function of time;
- processing said speaker phoneme similarity data to extract speaker features that exceed a predetermined similarity threshold;
- storing word prototype data corresponding to a plurality of predetermined words, the word prototype data representing said predetermined words as a plurality of targets each corresponding to a different phoneme, wherein each of said targets represents the occurrence of at least one phoneme similarity region as compared with a predefined speech database;
- iteratively performing the steps:
  - (a) determining speech match characteristics;
  - (b) selecting said extracted speaker features which satisfy said speech match characteristics;
  - (c) aligning the selected speaker features and word prototype data and selecting at least one from said word prototype data which achieves a predetermined degree of correlation between said selected speaker features and said word prototype data; and
- storing said selected word prototype data as hypothesized word candidates of said input speech utterance.

12. The method of claim 11 wherein the step of aligning the selected speaker features and word prototype data, further including:
- selecting the word prototype data that corresponds to said selected speaker features;
- building an alignment structure that defines a set of aligned high similarity regions between said extracted speaker features and said selected word prototype data and defines data indicative of degree of correlation for each of said aligned regions.

13. The method of claim 12 further including:
- normalizing said selected word prototype data to same time scale as said corresponding selected speaker features.

14. The method of claim 12 further including:
- performing linear regression between said selected word prototype data and said corresponding selected speaker features.

15. The method of claim 12 further including:
- normalizing said selected word prototype data to same time scale as said corresponding extracted speaker features; and
- performing linear regression between said selected word prototype data and said corresponding extracted speaker features.

16. The method of claim 11 further comprising the step of:
- determining center height of speaker features.

17. The method of claim 11 further comprising the step of:
- determining center frame of speaker features.

18. The method of claim 11 further comprising the step of:
- determining left boundary frame of speaker features.

19. The method of claim 11 further comprising the step of:
- determining right boundary frame of speaker features.

20. The method of claim 11 further including:
- normalizing said selected word prototype data to same time scale as said corresponding extracted speaker features; and
- performing linear regression between said selected word prototype data and said corresponding extracted speaker features.
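
Claim 11 leaves the "speech match characteristics" open. The sketch below assumes, purely for illustration, that they are a schedule of decreasing region-height thresholds. Each pass keeps the features at or above the current threshold and scores every prototype by the fraction of its target phonemes found among them; timing is ignored to keep the sketch short. The loop stops at the first pass where some word reaches a hypothetical acceptance level. `thresholds`, `accept`, `coverage` and `iterative_hypotheses` are assumptions, not patent terms.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Region = Tuple[str, float]  # (phoneme, peak height); timing omitted for brevity


def coverage(targets: Sequence[str], regions: Sequence[Region]) -> float:
    """Fraction of a prototype's target phonemes present among the regions."""
    if not targets:
        return 0.0
    found = {phoneme for phoneme, _ in regions}
    return sum(ph in found for ph in targets) / len(targets)


def iterative_hypotheses(regions: Sequence[Region],
                         prototypes: Dict[str, List[str]],
                         thresholds: Sequence[float] = (0.8, 0.6, 0.4),
                         accept: float = 0.75) -> Tuple[Optional[float], List[str]]:
    for threshold in thresholds:
        # (a) hypothetical match characteristic: a minimum region height
        # (b) keep only the features that satisfy it
        selected = [r for r in regions if r[1] >= threshold]
        # (c) score every prototype against the selected features
        scores = {w: coverage(t, selected) for w, t in prototypes.items()}
        hits = sorted((w for w, s in scores.items() if s >= accept),
                      key=lambda w: (-scores[w], w))
        if hits:
            return threshold, hits  # stored as hypothesized word candidates
    return None, []


regions = [("s", 0.9), ("ih", 0.7), ("k", 0.5), ("n", 0.45)]
prototypes = {"six": ["s", "ih", "k"], "sin": ["s", "ih", "n"], "one": ["w", "ah", "n"]}
print(iterative_hypotheses(regions, prototypes))  # (0.4, ['sin', 'six'])
```

Starting strict and relaxing the threshold only when nothing qualifies keeps the candidate list short for clean input, while still producing hypotheses for weaker input.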

21. A method for aligning a first speech utterance and second speech utterance to determine a degree of correlation between said first and second speech utterance comprising:
- providing a phoneme template representing a database of calibration speech;
- for said first speech utterance, comparing the first speech utterance with said phoneme template to produce first speech utterance similarity data as a function of time;
- for said second speech utterance, comparing the second speech utterance with said phoneme template to produce a second speech utterance similarity data as a function of time;
- aligning regions of the first speech utterance phoneme similarity data and the second speech utterance phoneme similarity data that achieve a predetermined degree of correlation; and
- building an alignment structure that defines a set of aligned high similarity regions between said first speech utterance and said second speech utterance, and defines data indicative of degree of correlation for each of said aligned regions.

22. The method of claim 21 further including:
- normalizing said first speech utterance phoneme similarity data to same time scale as said second speech utterance phoneme similarity data.

23. The method of claim 21 further including:
- performing linear regression between said first speech utterance phoneme similarity data and said second speech utterance phoneme similarity data.
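
Claims 21 and 22 describe aligning the phoneme similarity regions of two utterances on a common time scale and recording a per-region degree of correlation. The following is a minimal sketch that assumes each region is summarized as `(phoneme, center_frame, center_height)`:

- Both utterances are normalized to [0, 1] by their lengths.
- Each region of the first utterance is greedily paired with the nearest unused same-phoneme region of the second, within a hypothetical `max_gap`.
- Each pair's score combines height agreement with time proximity.

The scoring formula and all names (`AlignedPair`, `build_alignment`) are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Region = Tuple[str, int, float]  # (phoneme, center_frame, center_height)


@dataclass(frozen=True)
class AlignedPair:
    phoneme: str
    pos_a: float  # normalized center position in the first utterance
    pos_b: float  # normalized center position in the second utterance
    score: float  # per-pair degree of correlation (hypothetical measure)


def build_alignment(regions_a: Sequence[Region], len_a: int,
                    regions_b: Sequence[Region], len_b: int,
                    max_gap: float = 0.1) -> List[AlignedPair]:
    # Put both utterances on a common [0, 1] time scale (cf. claim 22).
    a = [(ph, frame / len_a, h) for ph, frame, h in regions_a]
    b = [(ph, frame / len_b, h) for ph, frame, h in regions_b]
    used = set()
    pairs = []
    for ph, pos_a, h_a in a:
        # Nearest unused region of the same phoneme within max_gap, if any.
        candidates = [(abs(pos_a - pos_b), j) for j, (ph_b, pos_b, _) in enumerate(b)
                      if j not in used and ph_b == ph and abs(pos_a - pos_b) <= max_gap]
        if not candidates:
            continue
        gap, j = min(candidates)
        used.add(j)
        _, pos_b, h_b = b[j]
        # Height agreement (assumes positive heights) times time proximity.
        score = (min(h_a, h_b) / max(h_a, h_b)) * (1.0 - gap / max_gap)
        pairs.append(AlignedPair(ph, pos_a, pos_b, score))
    return pairs


first = [("s", 10, 0.9), ("ih", 30, 0.8), ("k", 55, 0.7)]
second = [("s", 20, 0.9), ("ih", 62, 0.8), ("t", 150, 0.6)]
for pair in build_alignment(first, 100, second, 200):
    print(pair.phoneme, round(pair.score, 2))
# s 1.0
# ih 0.9
```

The returned list plays the role of the alignment structure in claim 21: it records which regions were paired, plus a score per pair. An utterance-level score could then be, for example, the mean pair score, but that choice is again an assumption.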

Current Assignee: Matsushita Electric Corporation Of America (Panasonic Holdings Corporation)
Original Assignee: Panasonic Technologies, Inc. (Panasonic Holdings Corporation)
Inventors: Applebaum, Ted H.; Morin, Philippe R.
Primary Examiner(s): MacDonald, Allen R.
Assistant Examiner(s): EDOUARD, PATRICK NESTOR
Application Number: US08/526,718
Time in Patent Office: 1,138 Days
Field of Search: 395/2.64, 395/2.6-2.63, 395/2.48, 395/2.52, 395/2.5, 395/2.51, 395/2.55
US Class Current: 704/255
CPC Class Codes:
- G10L 15/04: Segmentation; Word boundary...
- G10L 15/10: using distance or distortio...
- G10L 15/12: using dynamic programming t...
- G10L 2015/025: Phonemes, fenemes or fenone...
