Speech recognition system and method for generating phonotic estimates

US 6,868,380 B2
Filed: 03/23/2001
Issued: 03/15/2005
Est. Priority Date: 03/24/2000
Status: Expired due to Term

Abstract

A speech recognition system for transforming an acoustic signal into a stream of phonetic estimates includes a frequency analyzer for generating a short-time frequency representation of the acoustic signal. A novelty processor separates background components of the representation from region of interest components of the representation. The output of the novelty processor includes the region of interest components of the representation according to the novelty parameters. An attention processor produces a gating signal as a function of the novelty output according to attention parameters. A coincidence processor produces information regarding co-occurrences between samples of the novelty output over time and frequency. The coincidence processor selectively gates the coincidence output as a function of the gating signal according to one or more coincidence parameters. A vector pattern recognizer and a probability processor receive the gated coincidence output and produce a phonetic estimate stream representative of the acoustic signal.
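
The frequency analyzer stage described above can be illustrated with a small short-time DFT. This is only a sketch under stated assumptions: the function name `short_time_dft`, the rectangular (unwindowed) frames, and the `frame_len` and `hop` parameters are hypothetical choices for illustration and are not taken from the patent.

```python
import cmath
import math

def short_time_dft(signal, frame_len, hop):
    """Return magnitudes[t][k]: one row per time instance, one column per DFT bin.

    Assumption: plain rectangular frames, advanced by `hop` samples
    (the sampling interval between consecutive time instances).
    """
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        row = []
        # Only bins 0..frame_len/2 are kept; the rest mirror them for real input.
        for k in range(frame_len // 2 + 1):
            acc = sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n, x in enumerate(frame))
            row.append(abs(acc))
        frames.append(row)
    return frames

# A tone with exactly 2 cycles per 8-sample frame should peak in bin 2.
tone = [math.sin(2 * math.pi * 2 * n / 8) for n in range(32)]
spec = short_time_dft(tone, frame_len=8, hop=4)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
```

Each row of `spec` corresponds to one time instance and each column to one DFT point, which is the grid the later stages operate on.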

8 Claims

1. A speech recognition system for transforming an acoustic signal into a stream of phonetic estimates, comprising:
- a frequency analyzer for receiving the acoustic signal and producing as an output a short-time frequency representation of the acoustic signal;
  
  a novelty processor for receiving the short-time frequency representation of the acoustic signal, separating one or more background components of the representation from one or more region-of-interest components of the representation, and producing a novelty output including the region of interest components of the representation according to one or more novelty parameters;
  
  an attention processor for receiving the novelty output and producing a gating signal as a predetermined function of the novelty output according to one or more attention parameters;
  
  a coincidence processor for receiving the novelty output and the gating signal, and producing a coincidence output that includes co-occurrences between samples of the novelty output over time and frequency, wherein the coincidence output is selectively gated as a predetermined function of the gating signal, so as to produce a gated coincidence output according to one or more coincidence parameters; and
  
  a vector pattern recognizer and a probability processor for receiving the gated coincidence output and producing a phonetic estimate stream representative of acoustic signal;
  
  wherein the short-time frequency representation of the audio signal includes a series of consecutive time instances, each consecutive pair separated by a sampling interval, and each of the time instances further includes a series of discrete Fourier transform (DFT) points, such that the short-time frequency representation of the audio signal includes a series of DFT points;
  
  wherein for each DFT point, the novelty processor (i) calculates a first average value across a first predetermined frequency range and a first predetermined time span, (ii) calculates a second average value across a second predetermined frequency range and a second predetermined time span, and (iii) subtracts the second average value from the first average value so as to produce the novelty output point; and
  
  wherein the first frequency range, the first time span, the second frequency range and the second time span are each a function of one or more of the novelty parameters.
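
The difference-of-averages computation recited in claim 1 can be sketched as follows. The claim does not fix the window shapes, so the centered, edge-clipped rectangular windows and the names `window_mean` and `novelty` are assumptions made for illustration only.

```python
def window_mean(grid, t, k, half_t, half_k):
    """Mean of grid[t'][k'] over a window centered on (t, k), clipped at the edges."""
    rows = range(max(0, t - half_t), min(len(grid), t + half_t + 1))
    cols = range(max(0, k - half_k), min(len(grid[0]), k + half_k + 1))
    values = [grid[i][j] for i in rows for j in cols]
    return sum(values) / len(values)

def novelty(grid, first, second):
    """For every DFT point, subtract the second average from the first.

    `first` and `second` are hypothetical (half_t, half_k) window sizes
    standing in for the novelty parameters.
    """
    return [[window_mean(grid, t, k, *first) - window_mean(grid, t, k, *second)
             for k in range(len(grid[0]))]
            for t in range(len(grid))]

# A single bright point on a silent background.
grid = [[0.0] * 5 for _ in range(5)]
grid[2][2] = 9.0
out = novelty(grid, first=(0, 0), second=(1, 1))

# A perfectly flat background produces no novelty anywhere.
flat = novelty([[3.0] * 4 for _ in range(4)], first=(0, 0), second=(1, 1))
```

With a small first window and a larger second window, the subtraction behaves like a center-surround filter: isolated energy stands out while uniform background cancels.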

2. A speech recognition system for transforming an acoustic signal into a stream of phonetic estimates, comprising:
- a frequency analyzer for receiving the acoustic signal and producing as an output a short-time frequency representation of the acoustic signal;
  
  a novelty processor for receiving the short-time frequency representation of the acoustic signal, separating one or more background components of the representation from one or more region-of-interest components of the representation, and producing a novelty output including the region of interest components of the representation according to one or more novelty parameters;
  
  an attention processor for receiving the novelty output and producing a gating signal as a predetermined function of the novelty output according to one or more attention parameters;
  
  a coincidence processor for receiving the novelty output and the gating signal, and producing a coincidence output that includes co-occurrences between samples of the novelty output over time and frequency, wherein the coincidence output is selectively gated as a predetermined function of the gating signal, so as to produce a gated coincidence output according to one or more coincidence parameters; and
  
  a vector pattern recognizer and a probability processor for receiving the gated coincidence output and producing a phonetic estimate stream representative of acoustic signal;
  
  wherein the short-time frequency representation of the audio signal includes a series of consecutive time instances, each consecutive pair separated by a sampling interval, and each of the time instances further includes a series of discrete Fourier transform (DFT) points, such that the short-time frequency representation of the audio signal includes a series of DFT points;
  
  wherein for each DFT point, the novelty processor (i) calculates a first average value across a first predetermined frequency range and a first predetermined time span, (ii) calculates a second average value across a second predetermined frequency range and a second predetermined time span, and (iii) subtracts the second average value from the first average value so as to produce the novelty output point; and
  
  wherein the first predetermined frequency range is substantially centered about a frequency corresponding to the DFT point, and the first predetermined time span is substantially centered about an instant in time corresponding to the DFT point.
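
The claims above leave the "predetermined function" of the attention processor open. As a minimal sketch, assume the gate opens for a time instance when the mean absolute novelty across frequency exceeds a threshold; the `threshold` parameter and both function names are hypothetical stand-ins for the attention parameters, not the patent's method.

```python
def gating_signal(novelty_out, threshold):
    """One gate value per time instance: 1 if the frame looks active, else 0.

    Assumption: activity is the mean absolute novelty across all DFT points.
    """
    return [1 if sum(abs(v) for v in row) / len(row) > threshold else 0
            for row in novelty_out]

def apply_gate(coincidence_out, gate):
    """Keep only the coincidence vectors whose time instance is gated on."""
    return [vec for vec, g in zip(coincidence_out, gate) if g]

novelty_out = [[0.0, 0.1], [2.0, -3.0], [0.2, 0.0]]
gate = gating_signal(novelty_out, threshold=1.0)
kept = apply_gate([[1], [2], [3]], gate)
```

Dropping gated-off frames is one possible reading of "selectively gated"; zeroing them instead would keep the output stream aligned with the input frames.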

3. A speech recognition system for transforming an acoustic signal into a stream of phonetic estimates, comprising:
- a frequency analyzer for receiving the acoustic signal and producing as an output a short-time frequency representation of the acoustic signal;
  
  a novelty processor for receiving the short-time frequency representation of the acoustic signal, separating one or more background components of the representation from one or more region-of-interest components of the representation, and producing a novelty output including the region of interest components of the representation according to one or more novelty parameters;
  
  an attention processor for receiving the novelty output and producing a gating signal as a predetermined function of the novelty output according to one or more attention parameters;
  
  a coincidence processor for receiving the novelty output and the gating signal, and producing a coincidence output that includes co-occurrences between samples of the novelty output over time and frequency, wherein the coincidence output is selectively gated as a predetermined function of the gating signal, so as to produce a gated coincidence output according to one or more coincidence parameters; and
  
  a vector pattern recognizer and a probability processor for receiving the gated coincidence output and producing a phonetic estimate stream representative of acoustic signal;
  
  wherein the short-time frequency representation of the audio signal includes a series of consecutive time instances, each consecutive pair separated by a sampling interval, and each of the time instances further includes a series of discrete Fourier transform (DFT) points, such that the short-time frequency representation of the audio signal includes a series of DFT points;
  
  wherein for each DFT point, the novelty processor (i) calculates a first average value across a first predetermined frequency range and a first predetermined time span, (ii) calculates a second average value across a second predetermined frequency range and a second predetermined time span, and (iii) subtracts the second average value from the first average value so as to produce the novelty output point; and
  
  wherein for each DFT point, the novelty processor further calculates one or more additional novelty outputs, and each additional novelty output is defined by characteristics including a distinct first frequency range, first time span, second frequency range and second time span, each characteristic being a function of one or more of the novelty parameters.
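
Claim 3 calls for several novelty outputs per DFT point, each with its own pair of windows. One way to compute many box averages cheaply is a summed-area table, sketched below. The table is a swapped-in standard technique rather than anything the patent describes, and `configs`, `integral_image`, `box_mean`, and `novelty_bank` are illustrative names.

```python
def integral_image(grid):
    """Summed-area table with one extra row and column of zeros."""
    T, K = len(grid), len(grid[0])
    sat = [[0.0] * (K + 1) for _ in range(T + 1)]
    for t in range(T):
        for k in range(K):
            sat[t + 1][k + 1] = grid[t][k] + sat[t][k + 1] + sat[t + 1][k] - sat[t][k]
    return sat

def box_mean(sat, t0, t1, k0, k1):
    """Mean over rows t0..t1-1 and columns k0..k1-1 in constant time."""
    total = sat[t1][k1] - sat[t0][k1] - sat[t1][k0] + sat[t0][k0]
    return total / ((t1 - t0) * (k1 - k0))

def novelty_bank(grid, configs):
    """Return bank[t][k] = list of novelty outputs, one per (first, second) config."""
    T, K = len(grid), len(grid[0])
    sat = integral_image(grid)

    def mean_at(t, k, half_t, half_k):
        # Centered window, clipped to the grid (an assumption, as in claim 2).
        return box_mean(sat, max(0, t - half_t), min(T, t + half_t + 1),
                        max(0, k - half_k), min(K, k + half_k + 1))

    return [[[mean_at(t, k, *first) - mean_at(t, k, *second)
              for first, second in configs]
             for k in range(K)]
            for t in range(T)]

grid = [[0.0] * 5 for _ in range(5)]
grid[2][2] = 9.0
# Two hypothetical parameter sets: same center, different surround sizes.
configs = [((0, 0), (1, 1)), ((0, 0), (2, 2))]
bank = novelty_bank(grid, configs)
```

Each DFT point now carries a short vector of novelty values, one per parameter set, which later stages can treat as separate feature channels.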

4. A speech recognition system for transforming an acoustic signal into a stream of phonetic estimates, comprising:
- a frequency analyzer for receiving the acoustic signal and producing as an output a short-time frequency representation of the acoustic signal;
  
  a novelty processor for receiving the short-time frequency representation of the acoustic signal, separating one or more background components of the representation from one or more region-of-interest components of the representation, and producing a novelty output including the region of interest components of the representation according to one or more novelty parameters;
  
  an attention processor for receiving the novelty output and producing a gating signal as a predetermined function of the novelty output according to one or more attention parameters;
  
  a coincidence processor for receiving the novelty output and the gating signal, and producing a coincidence output that includes co-occurrences between samples of the novelty output over time and frequency, wherein the coincidence output is selectively gated as a predetermined function of the gating signal, so as to produce a gated coincidence output according to one or more coincidence parameters; and
  
  a vector pattern recognizer and a probability processor for receiving the gated coincidence output and producing a phonetic estimate stream representative of acoustic signal;
  
  wherein the novelty parameters, the attention parameters and the coincidence parameters are selected via a genetic algorithm.

5. A speech recognition system for transforming a short-time frequency representation of an acoustic signal into a stream of coincidence vectors, comprising:
- a novelty processor for receiving the short-time frequency representation of the audio signal, separating one or more background components of the signal from one or more region of interest components of the signal, and producing a novelty output including the region of interest components of the signal according to one or more novelty parameters; and
  
  a coincidence processor for receiving the novelty output and the gating signal, and producing a coincidence vector that includes data describing co-occurrences between samples of the novelty output over time and frequency according to one or more coincidence parameters;
  
  wherein the novelty parameters and the coincidence parameters are selected via a genetic algorithm.

6. A method of transforming an acoustic signal into a stream of phonetic estimates, comprising:
- receiving the acoustic signal and producing a short-time frequency representation of the acoustic signal;
  
  separating one or more background components of the representation from one or more region of interest components of the representation, and producing a novelty output including the region of interest components of the representation according to one or more novelty parameters;
  
  producing a gating signal as a predetermined function of the novelty output according to one or more attention parameters;
  
  producing a coincidence output that includes correlations between samples of the novelty output over time and frequency, wherein the coincidence output is selectively gated as a predetermined function of the gating signal, so as to produce a gated coincidence output according to one or more coincidence parameters;
  
  producing a phonetic estimate stream representative of acoustic signal as a function of the gated coincidence output; and
  
  calculating, for each of a plurality of DFT points from the short-time frequency representation of the acoustic signal, one or more additional novelty outputs, wherein each additional novelty output is defined by characteristics including a distinct first frequency range, first time span, second frequency range and second time span, each characteristic being a function of one or more of the novelty parameters.

7. A method according to claim 6, further including performing a sum of products of novelty outputs over two sets of novelty outputs according to one or more selectably variable coincidence parameters including time duration, frequency extent, base time, base frequency, delta time, delta frequency, and combinations thereof.
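
The sum of products in claim 7 can be sketched as a correlation between two rectangular patches of the novelty output. The reading used here is an assumption: the first patch starts at (base time, base frequency), the second is offset by (delta time, delta frequency), and both span the given time duration and frequency extent. The function name `coincidence` is hypothetical.

```python
def coincidence(nov, base_t, base_f, delta_t, delta_f, duration, extent):
    """Sum of products between a patch of `nov` and a shifted copy of that patch.

    nov[t][k] is the novelty output at time index t and DFT point k.
    The caller must keep both patches inside the grid.
    """
    total = 0.0
    for i in range(duration):
        for j in range(extent):
            a = nov[base_t + i][base_f + j]                      # first set
            b = nov[base_t + delta_t + i][base_f + delta_f + j]  # second set
            total += a * b
    return total

nov = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
# 1*5 + 2*6 + 4*8 + 5*9 = 94
value = coincidence(nov, base_t=0, base_f=0, delta_t=1, delta_f=1,
                    duration=2, extent=2)
```

Evaluating this for many parameter combinations would yield one coincidence vector per time instance, large where energy co-occurs at the chosen time and frequency offsets.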

8. A method of transforming an acoustic signal into a stream of phonetic estimates, comprising:
- receiving the acoustic signal and producing a short-time frequency representation of the acoustic signal;
  
  separating one or more background components of the representation from one or more region of interest components of the representation, and producing a novelty output including the region of interest components of the representation according to one or more novelty parameters;
  
  producing a gating signal as a predetermined function of the novelty output according to one or more attention parameters;
  
  producing a coincidence output that includes correlations between samples of the novelty output over time and frequency, wherein the coincidence output is selectively gated as a predetermined function of the gating signal, so as to produce a gated coincidence output according to one or more coincidence parameters;
  
  producing a phonetic estimate stream representative of acoustic signal as a function of the gated coincidence output; and
  
  selecting the novelty parameters, the attention parameters and the coincidence parameters via a genetic algorithm.
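
Selecting parameters via a genetic algorithm, as recited in claims 4, 5 and 8, can be sketched with a small integer-valued search. Everything specific here is an assumption: truncation selection, one-point crossover, uniform mutation, and especially `toy_fitness`, which stands in for whatever recognition-accuracy measure a real system would evaluate.

```python
import random

def evolve(fitness, bounds, pop_size=20, generations=30, mutation_rate=0.2, seed=0):
    """Search integer parameter vectors within `bounds` (needs at least 2 genes)."""
    rng = random.Random(seed)
    population = [[rng.randint(lo, hi) for lo, hi in bounds]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection: the fitter half survives unchanged (elitism).
        population.sort(key=fitness, reverse=True)
        parents = population[:pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            mom, dad = rng.sample(parents, 2)
            cut = rng.randrange(1, len(bounds))   # one-point crossover
            child = mom[:cut] + dad[cut:]
            for i, (lo, hi) in enumerate(bounds):
                if rng.random() < mutation_rate:  # uniform mutation
                    child[i] = rng.randint(lo, hi)
            children.append(child)
        population = parents + children
    return max(population, key=fitness)

# Hypothetical target standing in for "parameters that recognize speech well".
TARGET = [3, 1, 6, 2]

def toy_fitness(params):
    return -sum((p - t) ** 2 for p, t in zip(params, TARGET))

bounds = [(0, 8)] * 4
best = evolve(toy_fitness, bounds)
start = evolve(toy_fitness, bounds, generations=0)  # best of the initial population
```

Because the fitter half of each generation is carried over unchanged, the best fitness never decreases from one generation to the next, so `best` is at least as fit as `start`.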

Current Assignee: Eliza Corporation (Gainwell Technologies LLC)
Original Assignee: Eliza Corporation (Gainwell Technologies LLC)
Inventors: Kroeker, John
Primary Examiner(s): MCFADDEN, SUSAN IRIS
Application Number: US09/815,768
Publication Number: US 20010051871A1
Time in Patent Office: 1,453 Days
Field of Search: 704/240, 704/236, 704/243, 704/258, 704/211, 704/214, 704/216-218, 704/231, 704/233, 704/237, 704/251, 704/252, 704/254, 704/263, 704/267-269
US Class Current: 704/240
CPC Class Codes:
- G06Q 30/02   Marketing; Price estimation...
- G10L 15/02   Feature extraction for spee...
- G10L 15/10   using distance or distortio...
- G10L 15/14   using statistical models, e...
- G10L 15/1815   Semantic context, e.g. disa...
