System for recognizing speech
Abstract
A pattern recognition system particularly useful for recognizing speech or handwriting. An input signal is first filtered by a filter bank having two stages, where the outputs of the first stage are fed forward to the second stage of a significant number of the filters and the output of the second stage is fed back to the first stage of a significant number of the filters. Such feedback enhances the signal-to-noise ratio and resembles the coupling between the different sections of the basilar membrane of the cochlea. The output of the filter bank is a two-dimensional frequency-time representation of the original signal. A second set of filters, which takes two-dimensional signals as input, detects the presence of elementary tonotopic features such as the onset, rise, fall and frequency of any significant tones in a speech signal. A third set of filters detects any contrasts in the elementary features at various levels of resolution. After such filtering, a neural network is employed to learn patterns formed from the multi-resolution contrasts in the identified features so that the system recognizes symbols from an input signal that is continuous in time. In the case of speech, the system recognizes continuous speech in a speaker-independent manner and is also tolerant of noise.
75 Claims
1. A system for recognizing speech features in an input speech signal, said input speech signal changing over time and containing tonotopic information, said system comprising:
first means for filtering the input speech signal to provide an output having amplitudes that are functions of both tonotopy and time in a first two dimensional representation, said output indicating the tonotopic information of said input speech signal over a time period; and
second means for filtering said output to provide an output that, over time, indicates a second two dimensional representation in tonotopy and time of one or more elementary tonotopic features of the input speech signal, said features including onset, rise and fall of any significant tones of the input speech signal over time.
Dependent claims: 2 to 22.
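The two filtering stages recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the single-bin Fourier analysis used for the first stage, the diagonal coincidence detectors used for the second stage, and every name in it (`tonotopic_map`, `elementary_features`, `tone`, `RATE`, `FRAME`) are assumptions chosen only to show a first map in tonotopy and time being turned into second maps of onset, rise and fall.

```python
import math


def tonotopic_map(signal, rate, freqs, frame):
    """Stage 1 (assumed form): amplitude per (channel, frame), one channel per
    centre frequency, computed with a single-bin Fourier sum over each frame."""
    n_frames = len(signal) // frame
    rep = []
    for f in freqs:  # channels ordered from low to high frequency
        row = []
        for k in range(n_frames):
            seg = signal[k * frame:(k + 1) * frame]
            re = sum(v * math.cos(2 * math.pi * f * n / rate) for n, v in enumerate(seg))
            im = sum(v * math.sin(2 * math.pi * f * n / rate) for n, v in enumerate(seg))
            row.append(2.0 * math.hypot(re, im) / frame)
        rep.append(row)
    return rep


def elementary_features(rep):
    """Stage 2 (assumed form): onset, rise and fall maps, each in tonotopy and time."""
    channels, frames = len(rep), len(rep[0])
    onset = [[0.0] * frames for _ in range(channels)]
    rise = [[0.0] * frames for _ in range(channels)]
    fall = [[0.0] * frames for _ in range(channels)]
    for c in range(channels):
        for t in range(1, frames):
            # Onset: amplitude increase within one channel.
            onset[c][t] = max(0.0, rep[c][t] - rep[c][t - 1])
            # Rise: energy in the channel below one frame earlier, and here now.
            if c > 0:
                rise[c][t] = min(rep[c][t], rep[c - 1][t - 1])
            # Fall: energy in the channel above one frame earlier, and here now.
            if c < channels - 1:
                fall[c][t] = min(rep[c][t], rep[c + 1][t - 1])
    return onset, rise, fall


def tone(freq, rate, n):
    """n samples of a unit-amplitude sine (hypothetical test signal)."""
    return [math.sin(2 * math.pi * freq * i / rate) for i in range(n)]


# One silent frame followed by a tone stepping upward through three channels.
RATE, FRAME = 8000, 80
signal = [0.0] * FRAME + tone(500, RATE, FRAME) + tone(1000, RATE, FRAME) + tone(1500, RATE, FRAME)
rep = tonotopic_map(signal, RATE, [500, 1000, 1500], FRAME)
onset, rise, fall = elementary_features(rep)
```

For this upward-stepping tone the `rise` map responds along the diagonal while the `fall` map stays near zero. A steady broadband input would excite both detectors, so a practical second stage would need filters that are more selective than this sketch.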
23. A filter bank for improving signal to noise ratio in processing an input signal comprising:
a plurality of M filters arranged in parallel, M being a positive integer, each filter having a first stage and a second stage;
wherein the first stage of each filter includes:
(a) first delay means for delaying the input signal, and
(b) means for subtracting from the input signal or a signal derived therefrom the delayed input signal or a signal derived therefrom and adding thereto feedback signals from at least some of the second stages of the M filters or signals derived therefrom to derive an output signal;
wherein the second stage of each filter provides an output signal and includes:
(c) first means for adding the output signals of the first stages of at least some of the filters or signals derived therefrom to obtain a first sum signal;
(d) second means for delaying said first sum signal and supplying said delayed first sum signal or a signal derived therefrom to the first stages of at least some of the filters;
(e) second means for adding the sum signal and the delayed sum signal or signals derived therefrom to obtain a second sum signal;
(f) third means for delaying the output signal of the second stage and supplying said delayed output signal of the second stage or a signal derived therefrom to the first stages of at least some of the filters; and
(g) means for adding to the second sum signal or a signal derived therefrom the delayed output signal of the second stage or a signal derived therefrom to derive the output signal of the second stage.
Dependent claims: 24 to 31.
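The signal flow of elements (a) to (g) in claim 23 can be traced with a short discrete-time sketch. Everything numeric here is an assumption: the claim gives no coefficients, delay lengths or coupling pattern, so the one-sample delays, the nearest-neighbour coupling and the weights `couple`, `fb_sum`, `fb_out` and `decay` are placeholders picked only so that the loop stays stable. The sketch shows how the stages connect and makes no claim about the signal-to-noise improvement.

```python
def neighbours(i, m):
    """Filter i plus its immediate neighbours (assumed coupling pattern)."""
    return [j for j in (i - 1, i, i + 1) if 0 <= j < m]


def filter_bank(x, m=3, couple=0.25, fb_sum=-0.02, fb_out=-0.02, decay=0.5):
    """Run m parallel two-stage filters over samples x; return m output lists."""
    prev_x = 0.0            # (a) delayed input, one-sample delay assumed
    prev_s = [0.0] * m      # (d) delayed first sum signals
    prev_y = [0.0] * m      # (f) delayed second-stage outputs
    outputs = [[] for _ in range(m)]
    for sample in x:
        diff = sample - prev_x
        # (b) input minus delayed input, plus feedback from second stages.
        u = [diff + sum(fb_sum * prev_s[j] + fb_out * prev_y[j]
                        for j in neighbours(i, m))
             for i in range(m)]
        # (c) first sum: first-stage outputs of this filter and its neighbours.
        s = [sum((1.0 if j == i else couple) * u[j] for j in neighbours(i, m))
             for i in range(m)]
        # (e) second sum: first sum plus delayed first sum.
        t = [s[i] + prev_s[i] for i in range(m)]
        # (g) output: second sum plus scaled delayed second-stage output.
        y = [t[i] + decay * prev_y[i] for i in range(m)]
        for i in range(m):
            outputs[i].append(y[i])
        prev_x, prev_s, prev_y = sample, s, y
    return outputs


# Impulse response of the three-filter sketch.
impulse = [1.0] + [0.0] * 299
response = filter_bank(impulse)
```

Because every filter in this sketch shares the same weights, the channels differ only through their position in the bank. A frequency-selective bank would presumably give each filter its own delays and weights.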
32. A system for recognizing speech features in an input speech signal that has time and frequency dependent amplitudes, said system comprising means for filtering said input speech signal or a signal derived therefrom in a two dimensional representation in tonotopy and time to provide an output indicating contrast information in the representation, said contrast information in turn indicating the presence of any significant speech features in the input speech signal.
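One simple way to picture the contrast filtering of claim 32 is a centre-minus-surround operation on the two-dimensional map, repeated at several neighbourhood sizes to echo the multi-resolution contrasts mentioned in the abstract. The operator below is an assumption made for illustration, not the filter disclosed in the patent, and the names `contrast_map` and `multi_resolution_contrast` are hypothetical.

```python
def contrast_map(rep, radius=1):
    """Each cell minus the mean of its (2*radius+1)-square neighbourhood,
    clipped at the edges of the tonotopy-time map."""
    rows, cols = len(rep), len(rep[0])
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            cells = [rep[i][j]
                     for i in range(max(0, r - radius), min(rows, r + radius + 1))
                     for j in range(max(0, c - radius), min(cols, c + radius + 1))]
            out_row.append(rep[r][c] - sum(cells) / len(cells))
        out.append(out_row)
    return out


def multi_resolution_contrast(rep, radii=(1, 2, 4)):
    """Contrast maps at several neighbourhood sizes (assumed resolutions)."""
    return {radius: contrast_map(rep, radius) for radius in radii}


# An isolated peak stands out against its surround; a flat map has no contrast.
peak = [[0.0, 0.0, 0.0],
        [0.0, 9.0, 0.0],
        [0.0, 0.0, 0.0]]
peak_contrast = contrast_map(peak, radius=1)
```

A flat region yields zero contrast at every radius, so only local departures from the surround, such as an isolated tone, survive the filtering.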
55. A system for recognizing speech features in an input speech signal, said input speech signal containing tonotopic information, said system comprising:
means for filtering the input speech signal to provide a filtered output, said output indicating the tonotopic information of said input speech signal over a time period and identifying any significant speech features therein; and
a neural network comprising:
at least one pair of phoneme- or suprasegmental-related formation and deformation layers for processing the output of the filtering means to provide formation and deformation maps, said formation layer processing the output of the filtering means or of the deformation maps to identify phoneme- or suprasegmental-related features in said input speech signal, said deformation layer performing a local-averaging function on said formation map to provide the deformation map to enable the recognition of phonemes or suprasegmentals in said input speech signal irrespective of variability of speech of different speakers.
Dependent claims: 56 to 62.
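The formation and deformation pair of claim 55 can be illustrated with a toy example in which the formation layer scores a small template against the feature map and the deformation layer takes block averages of the result. Both functions are assumptions: the claim specifies only that the deformation layer performs local averaging, and the template, threshold and block size below are hypothetical values.

```python
def formation(feature_map, template, threshold=0.0):
    """Formation map (assumed form): thresholded template match at each position."""
    rows, cols = len(feature_map), len(feature_map[0])
    th, tw = len(template), len(template[0])
    out = []
    for r in range(rows - th + 1):
        out_row = []
        for c in range(cols - tw + 1):
            score = sum(feature_map[r + i][c + j] * template[i][j]
                        for i in range(th) for j in range(tw))
            out_row.append(max(0.0, score - threshold))
        out.append(out_row)
    return out


def deformation(formation_map, block=2):
    """Deformation map: local average over non-overlapping block x block cells."""
    rows, cols = len(formation_map), len(formation_map[0])
    out = []
    for r in range(0, rows - block + 1, block):
        out.append([
            sum(formation_map[r + i][c + j]
                for i in range(block) for j in range(block)) / (block * block)
            for c in range(0, cols - block + 1, block)
        ])
    return out


# The same two-frame feature, shifted by one frame or by one channel.
template = [[1.0, 1.0]]
early = [[1.0, 1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0]]
late = [[0.0, 1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0]]
lower = [[0.0, 0.0, 0.0, 0.0, 0.0], [1.0, 1.0, 0.0, 0.0, 0.0]]
maps = [deformation(formation(m, template, threshold=1.5)) for m in (early, late, lower)]
```

All three inputs produce the same deformation map, which is the point of the averaging step: small shifts in time or tonotopy, of the kind that differ between speakers, no longer change the pattern passed to later layers.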
63. A method for recognizing speech features in an input speech signal, said input speech signal changing over time and containing tonotopic information, said method comprising:
(a) filtering the input speech signal to provide an output having amplitudes that are functions of both tonotopy and time in a first two dimensional representation, said output indicating the tonotopic information of said input speech signal over a time period; and
(b) filtering said output to provide an output that, over time, indicates a second two dimensional representation in tonotopy and time of one or more elementary tonotopic features of the input speech signal, said features including onset, rise and fall of any significant tones of the input signal over time.
Dependent claims: 64 to 68.
69. A method for recognizing speech features in an input speech signal that has time and frequency dependent amplitudes, said method comprising filtering said signal or a signal derived therefrom in a two dimensional representation in tonotopy and time to provide an output indicating contrast information in the representation, said contrast information in turn indicating the presence of any significant speech features in the input speech signal.
74. A method for recognizing speech features in an input speech signal, said input speech signal containing tonotopic information, said method comprising:
filtering the input speech signal to provide a filtered output, said output indicating the tonotopic information of said input speech signal over a time period and identifying any significant speech features therein; and
processing the output of the filtering step to provide a formation map to enable the identification of phoneme- or suprasegmental-related features in said input speech signal, and performing a local-averaging function on said formation map to provide a deformation map to enable the recognition of phonemes or suprasegmentals in said input speech signal irrespective of variability of speech of different speakers.
Dependent claims: 75.
Specification