Speech recognition system and method for generating phonetic estimates
Abstract
A speech recognition system for transforming an acoustic signal into a stream of phonetic estimates includes a frequency analyzer for generating a short-time frequency representation of the acoustic signal. A novelty processor separates background components of the representation from region of interest components of the representation. The output of the novelty processor includes the region of interest components of the representation according to one or more novelty parameters. An attention processor produces a gating signal as a function of the novelty output according to attention parameters. A coincidence processor produces information regarding co-occurrences between samples of the novelty output over time and frequency. The coincidence processor selectively gates the coincidence output as a function of the gating signal according to one or more coincidence parameters. A vector pattern recognizer and a probability processor receive the gated coincidence output and produce a phonetic estimate stream representative of the acoustic signal.
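As an illustration of the first stage named in the abstract, the following is a minimal sketch of a short-time frequency representation: a signal is cut into overlapping frames and each frame is reduced to a series of DFT magnitudes. The function name `short_time_dft`, the frame length, and the hop size are assumptions chosen for this example; the source does not specify them, and no windowing function is applied here.

```python
import cmath
import math

def short_time_dft(signal, frame_len=8, hop=4):
    """Return a list of frames; each frame is a list of DFT magnitudes.

    frame_len and hop are illustrative values, not taken from the patent.
    """
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        mags = []
        # Keep only the non-redundant bins 0 .. frame_len / 2 of a real signal.
        for k in range(frame_len // 2 + 1):
            acc = sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n, x in enumerate(frame))
            mags.append(abs(acc))
        frames.append(mags)
    return frames

# A pure tone that falls exactly on bin 2 of an 8-sample frame.
tone = [math.cos(2 * math.pi * 2 * n / 8) for n in range(16)]
spec = short_time_dft(tone)
```

Each entry `spec[t][k]` plays the role of one DFT point at time instance `t` and frequency bin `k`; consecutive time instances are separated by `hop` samples.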
8 Claims
-
1. A speech recognition system for transforming an acoustic signal into a stream of phonetic estimates, comprising:
-
a frequency analyzer for receiving the acoustic signal and producing as an output a short-time frequency representation of the acoustic signal;
a novelty processor for receiving the short-time frequency representation of the acoustic signal, separating one or more background components of the representation from one or more region-of-interest components of the representation, and producing a novelty output including the region of interest components of the representation according to one or more novelty parameters;
an attention processor for receiving the novelty output and producing a gating signal as a predetermined function of the novelty output according to one or more attention parameters;
a coincidence processor for receiving the novelty output and the gating signal, and producing a coincidence output that includes co-occurrences between samples of the novelty output over time and frequency, wherein the coincidence output is selectively gated as a predetermined function of the gating signal, so as to produce a gated coincidence output according to one or more coincidence parameters; and
a vector pattern recognizer and a probability processor for receiving the gated coincidence output and producing a phonetic estimate stream representative of acoustic signal;
wherein the short-time frequency representation of the audio signal includes a series of consecutive time instances, each consecutive pair separated by a sampling interval, and each of the time instances further includes a series of discrete Fourier transform (DFT) points, such that the short-time frequency representation of the audio signal includes a series of DFT points;
wherein for each DFT point, the novelty processor (i) calculates a first average value across a first predetermined frequency range and a first predetermined time span, (ii) calculates a second average value across a second predetermined frequency range and a second predetermined time span, and (iii) subtracts the second average value from the first average value so as to produce the novelty output point; and
wherein the first frequency range, the first time span, the second frequency range and the second time span are each a function of one or more of the novelty parameters.
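The novelty computation recited in claim 1 can be sketched roughly as follows: for each DFT point, one average is taken over a first time and frequency region, a second average over a second region, and the second is subtracted from the first. The dictionary keys, the use of inclusive index offsets relative to the DFT point, and the clipping at the edges of the grid are all assumptions made for this sketch, not details from the claim.

```python
def window_mean(spec, t_lo, t_hi, f_lo, f_hi):
    """Average spec[t][f] over inclusive index ranges, clipped to the grid."""
    n_t, n_f = len(spec), len(spec[0])
    vals = [spec[t][f]
            for t in range(max(t_lo, 0), min(t_hi, n_t - 1) + 1)
            for f in range(max(f_lo, 0), min(f_hi, n_f - 1) + 1)]
    return sum(vals) / len(vals) if vals else 0.0

def novelty_point(spec, t, f, p):
    """First-region average minus second-region average at DFT point (t, f).

    p holds hypothetical novelty parameters: each entry is a (lo, hi) pair of
    offsets relative to the DFT point, in frames or in frequency bins.
    """
    first = window_mean(spec,
                        t + p["first_t"][0], t + p["first_t"][1],
                        f + p["first_f"][0], f + p["first_f"][1])
    second = window_mean(spec,
                         t + p["second_t"][0], t + p["second_t"][1],
                         f + p["second_f"][0], f + p["second_f"][1])
    return first - second

# Four time instances of three bins each; one bin jumps from 1 to 5.
spec = [[1, 1, 1], [1, 1, 1], [1, 5, 1], [1, 1, 1]]
params = {
    "first_t": (0, 0), "first_f": (0, 0),      # the point itself
    "second_t": (-2, -1), "second_f": (-1, 1),  # the two preceding frames
}
value = novelty_point(spec, 2, 1, params)  # 5.0 - 1.0
```

Here the second region is taken from the preceding frames, so a steady background cancels and a sudden change stands out. That choice of regions is only one possibility; the claim leaves both regions to the novelty parameters.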
-
-
2. A speech recognition system for transforming an acoustic signal into a stream of phonetic estimates, comprising:
-
a frequency analyzer for receiving the acoustic signal and producing as an output a short-time frequency representation of the acoustic signal;
a novelty processor for receiving the short-time frequency representation of the acoustic signal, separating one or more background components of the representation from one or more region-of-interest components of the representation, and producing a novelty output including the region of interest components of the representation according to one or more novelty parameters;
an attention processor for receiving the novelty output and producing a gating signal as a predetermined function of the novelty output according to one or more attention parameters;
a coincidence processor for receiving the novelty output and the gating signal, and producing a coincidence output that includes co-occurrences between samples of the novelty output over time and frequency, wherein the coincidence output is selectively gated as a predetermined function of the gating signal, so as to produce a gated coincidence output according to one or more coincidence parameters; and
a vector pattern recognizer and a probability processor for receiving the gated coincidence output and producing a phonetic estimate stream representative of acoustic signal;
wherein the short-time frequency representation of the audio signal includes a series of consecutive time instances, each consecutive pair separated by a sampling interval, and each of the time instances further includes a series of discrete Fourier transform (DFT) points, such that the short-time frequency representation of the audio signal includes a series of DFT points;
wherein for each DFT point, the novelty processor (i) calculates a first average value across a first predetermined frequency range and a first predetermined time span, (ii) calculates a second average value across a second predetermined frequency range and a second predetermined time span, and (iii) subtracts the second average value from the first average value so as to produce the novelty output point; and
wherein the first predetermined frequency range is substantially centered about a frequency corresponding to the DFT point, and the first predetermined time span is substantially centered about an instant in time corresponding to the DFT point.
-
-
3. A speech recognition system for transforming an acoustic signal into a stream of phonetic estimates, comprising:
-
a frequency analyzer for receiving the acoustic signal and producing as an output a short-time frequency representation of the acoustic signal;
a novelty processor for receiving the short-time frequency representation of the acoustic signal, separating one or more background components of the representation from one or more region-of-interest components of the representation, and producing a novelty output including the region of interest components of the representation according to one or more novelty parameters;
an attention processor for receiving the novelty output and producing a gating signal as a predetermined function of the novelty output according to one or more attention parameters;
a coincidence processor for receiving the novelty output and the gating signal, and producing a coincidence output that includes co-occurrences between samples of the novelty output over time and frequency, wherein the coincidence output is selectively gated as a predetermined function of the gating signal, so as to produce a gated coincidence output according to one or more coincidence parameters; and
a vector pattern recognizer and a probability processor for receiving the gated coincidence output and producing a phonetic estimate stream representative of acoustic signal;
wherein the short-time frequency representation of the audio signal includes a series of consecutive time instances, each consecutive pair separated by a sampling interval, and each of the time instances further includes a series of discrete Fourier transform (DFT) points, such that the short-time frequency representation of the audio signal includes a series of DFT points;
wherein for each DFT point, the novelty processor (i) calculates a first average value across a first predetermined frequency range and a first predetermined time span, (ii) calculates a second average value across a second predetermined frequency range and a second predetermined time span, and (iii) subtracts the second average value from the first average value so as to produce the novelty output point; and
wherein for each DFT point, the novelty processor further calculates one or more additional novelty outputs, and each additional novelty output is defined by characteristics including a distinct first frequency range, first time span, second frequency range and second time span, each characteristic being a function of one or more of the novelty parameters.
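The additional novelty outputs of claim 3 can be pictured as a small vector per DFT point, one entry per parameter set. The sketch below assumes, for simplicity, that both regions are boxes centered on the DFT point and described by half-widths; the names `box_mean`, `novelty_vector`, and `param_sets` are hypothetical.

```python
def box_mean(spec, t, f, half_t, half_f):
    """Mean of spec over a box centered at (t, f), clipped to the grid."""
    rows = range(max(t - half_t, 0), min(t + half_t, len(spec) - 1) + 1)
    cols = range(max(f - half_f, 0), min(f + half_f, len(spec[0]) - 1) + 1)
    vals = [spec[r][c] for r in rows for c in cols]
    return sum(vals) / len(vals)

def novelty_vector(spec, t, f, param_sets):
    """One novelty value per parameter set.

    Each set is ((first_half_t, first_half_f), (second_half_t, second_half_f)).
    """
    return [box_mean(spec, t, f, *first) - box_mean(spec, t, f, *second)
            for first, second in param_sets]

# A 5 x 5 grid that is zero except for a single spike in the middle.
spec = [[0] * 5 for _ in range(5)]
spec[2][2] = 9

param_sets = [
    ((0, 0), (1, 1)),  # single point against its 3 x 3 neighborhood
    ((0, 1), (2, 2)),  # three bins in one frame against the whole 5 x 5 grid
]
vec = novelty_vector(spec, 2, 2, param_sets)
```

Each parameter set emphasizes a different scale of change around the same point, which is one plausible reading of why several distinct outputs per DFT point might be useful.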
-
-
4. A speech recognition system for transforming an acoustic signal into a stream of phonetic estimates, comprising:
-
a frequency analyzer for receiving the acoustic signal and producing as an output a short-time frequency representation of the acoustic signal;
a novelty processor for receiving the short-time frequency representation of the acoustic signal, separating one or more background components of the representation from one or more region-of-interest components of the representation, and producing a novelty output including the region of interest components of the representation according to one or more novelty parameters;
an attention processor for receiving the novelty output and producing a gating signal as a predetermined function of the novelty output according to one or more attention parameters;
a coincidence processor for receiving the novelty output and the gating signal, and producing a coincidence output that includes co-occurrences between samples of the novelty output over time and frequency, wherein the coincidence output is selectively gated as a predetermined function of the gating signal, so as to produce a gated coincidence output according to one or more coincidence parameters; and
a vector pattern recognizer and a probability processor for receiving the gated coincidence output and producing a phonetic estimate stream representative of acoustic signal;
wherein the novelty parameters, the attention parameters and the coincidence parameters are selected via a genetic algorithm.
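Claim 4 states only that the parameters are selected via a genetic algorithm; it does not say how. The following is a generic sketch of such a search over integer parameter vectors, using truncation selection, one-point crossover, and single-gene mutation. Every name here, the toy fitness function, and the `target` vector are assumptions for illustration. In a real system the fitness would presumably be some measure of recognition quality, which the claim does not describe.

```python
import random

def evolve(fitness, bounds, pop_size=20, generations=40, seed=0):
    """Search integer vectors within bounds for a high fitness value."""
    rng = random.Random(seed)  # seeded so the run is repeatable

    def random_individual():
        return [rng.randint(lo, hi) for lo, hi in bounds]

    def crossover(a, b):
        cut = rng.randrange(1, len(a))  # needs at least two genes
        return a[:cut] + b[cut:]

    def mutate(ind):
        i = rng.randrange(len(ind))
        lo, hi = bounds[i]
        child = list(ind)
        child[i] = min(hi, max(lo, child[i] + rng.choice([-1, 1])))
        return child

    population = [random_individual() for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the better half unchanged, refill the rest with offspring.
        population.sort(key=fitness, reverse=True)
        parents = population[:pop_size // 2]
        children = [mutate(crossover(rng.choice(parents), rng.choice(parents)))
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    return max(population, key=fitness)

# Hypothetical parameter vector, e.g. four window sizes, and a toy fitness
# that simply rewards closeness to an arbitrary target.
bounds = [(1, 8), (1, 8), (2, 16), (2, 16)]
target = [2, 3, 8, 10]

def toy_fitness(ind):
    return -sum(abs(g - t) for g, t in zip(ind, target))

best = evolve(toy_fitness, bounds)
```

Because the better half of each generation is carried over unchanged, the best fitness in the population never decreases from one generation to the next.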
-
-
5. A speech recognition system for transforming a short-time frequency representation of an acoustic signal into a stream of coincidence vectors, comprising:
-
a novelty processor for receiving the short-time frequency representation of the audio signal, separating one or more background components of the signal from one or more region of interest components of the signal, and producing a novelty output including the region of interest components of the signal according to one or more novelty parameters; and
a coincidence processor for receiving the novelty output and the gating signal, and producing a coincidence vector that includes data describing co-occurrences between samples of the novelty output over time and frequency according to one or more coincidence parameters;
wherein the novelty parameters and the coincidence parameters are selected via a genetic algorithm.
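To make the coincidence stage more concrete, here is a minimal sketch in which a co-occurrence is modeled as the product of two novelty samples at a chosen time lag and pair of frequency bins, and the gating signal is a simple per-frame threshold. Both modeling choices are assumptions: the claims describe co-occurrences and a gating signal but do not prescribe products or thresholds, and the names `gating_signal`, `gated_coincidence`, `lag`, and `freq_pairs` are hypothetical.

```python
def gating_signal(novelty, threshold):
    """1 where a frame's summed absolute novelty exceeds threshold, else 0."""
    return [1 if sum(abs(v) for v in frame) > threshold else 0
            for frame in novelty]

def gated_coincidence(novelty, gate, lag, freq_pairs):
    """Coincidence vectors for the frames that the gate lets through.

    For frame t, each entry is novelty[t][f1] * novelty[t - lag][f2].
    """
    out = []
    for t in range(lag, len(novelty)):
        if not gate[t]:
            continue  # gated off: no coincidence vector for this frame
        out.append([novelty[t][f1] * novelty[t - lag][f2]
                    for f1, f2 in freq_pairs])
    return out

# Four frames of three-bin novelty output.
novelty = [[0, 0, 0], [1, 2, 0], [0, 3, 1], [0, 0, 0]]
gate = gating_signal(novelty, threshold=0.5)
vectors = gated_coincidence(novelty, gate, lag=1,
                            freq_pairs=[(0, 0), (1, 1), (2, 1)])
```

Only the two frames with nonzero novelty pass the gate, so the silent frames contribute no coincidence vectors to the later recognition stages.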
-
-
6. A method of transforming an acoustic signal into a stream of phonetic estimates, comprising:
-
receiving the acoustic signal and producing a short-time frequency representation of the acoustic signal;
separating one or more background components of the representation from one or more region of interest components of the representation, and producing a novelty output including the region of interest components of the representation according to one or more novelty parameters;
producing a gating signal as a predetermined function of the novelty output according to one or more attention parameters;
producing a coincidence output that includes correlations between samples of the novelty output over time and frequency, wherein the coincidence output is selectively gated as a predetermined function of the gating signal, so as to produce a gated coincidence output according to one or more coincidence parameters;
producing a phonetic estimate stream representative of acoustic signal as a function of the gated coincidence output; and
calculating, for each of a plurality of DFT points from the short-time frequency representation of the acoustic signal, one or more additional novelty outputs, wherein each additional novelty output is defined by characteristics including a distinct first frequency range, first time span, second frequency range and second time span, each characteristic being a function of one or more of the novelty parameters.
-
-
8. A method of transforming an acoustic signal into a stream of phonetic estimates, comprising:
-
receiving the acoustic signal and producing a short-time frequency representation of the acoustic signal;
separating one or more background components of the representation from one or more region of interest components of the representation, and producing a novelty output including the region of interest components of the representation according to one or more novelty parameters;
producing a gating signal as a predetermined function of the novelty output according to one or more attention parameters;
producing a coincidence output that includes correlations between samples of the novelty output over time and frequency, wherein the coincidence output is selectively gated as a predetermined function of the gating signal, so as to produce a gated coincidence output according to one or more coincidence parameters;
producing a phonetic estimate stream representative of acoustic signal as a function of the gated coincidence output; and
selecting the novelty parameters, the attention parameters and the coincidence parameters via a genetic algorithm.
-
Specification