Speech-recognition method and apparatus for recognizing phonemes in a voice signal

US 4,592,085 A
Filed: 02/23/1983
Issued: 05/27/1986
Est. Priority Date: 02/25/1982
Status: Expired due to Term

Abstract

Phoneme recognition uses the silence-phoneme and phoneme-phoneme transition spectral information rather than the phoneme information itself. The transition detector features first and second differences in level for each frequency band.

28 Claims

1. A method for recognizing particular phonemes in a voice signal having silence-phoneme and phoneme-phoneme transitions, said method comprising the steps of:
- providing an electrical signal representing said voice signal;
  
  producing a first acoustic parameter signal from said electrical signal, said first acoustic parameter signal containing phonemic information of said voice signal;
  
  generating a transition signal from the phonemic information in said first acoustic parameter signal indicating the location in said voice signal of a transition;
  
  storing said first acoustic parameter signal; and
  
  producing a second acoustic parameter signal from said stored first acoustic parameter signal using said transition signal, said second acoustic parameter signal containing phonemic information of said voice signal at said transition, whereby said second acoustic parameter signal can be compared with known phonemic information to recognize the phonemic information in said voice signal.
- - 2. A method for recognizing particular phonemes in a voice signal as in claim 1, wherein said step of producing a first acoustic parameter signal comprises the sub-steps of:
    - providing from an analog electrical signal a digital electrical signal representing said voice signal;
      
      storing in turn a plurality of said digital signals in a register means; and
      
      producing said first acoustic parameter signal from said stored digital signals by Fourier-transforming a plurality of said stored digital signals.
  - 3. A method for recognizing particular phonemes in a voice signal as in claim 1, wherein said step of generating comprises the sub-steps of:
    - separating a plurality of time frames of said first acoustic parameter signal into a plurality of frequency band signals, each said frequency band signal representing a power level of said first acoustic parameter signal in a particular frequency band and time frame;
      
      calculating from said plurality of frequency band signals an average power level at each said time frame;
      
      calculating for all said time frames a plurality of first difference levels between said average power level at each said time frame and said plurality of power levels at the same time frame;
      
      calculating for all said frequency bands a plurality of second difference levels between:
      
      (1) the lowest of said first difference levels in each said frequency band for said plurality of time frames, and
      
      (2) each said first difference level in the same frequency band across said plurality of time frames; and
      
      calculating the sum of all of said second difference levels, whereby said sum comprises said transition signal which can be evaluated to detect transitions in said voice signal.
  - 4. A method for recognizing particular phonemes in a voice signal as in claim 3, wherein said step of generating further comprises the sub-step of evaluating said transition signal to detect peaks therein by time-sampling said transition signal using a predetermined time interval and identifying as a peak level each maximum of said transition signal occurring in the middle of said time interval to thereby locate transitions in said voice signal.
  - 5. A method for recognizing particular phonemes in a voice signal as in claim 4, wherein each said first difference level is the difference between the logarithm of said respective average power level and the logarithm of said respective power level, whereby the influence on said first difference levels of variations in emphasis from phoneme to phoneme of a particular speaker is minimized.
  - 6. A method for recognizing particular phonemes in a voice signal as in claim 5, wherein a bias is applied to each said average power level and to each said power level prior to calculating the logarithms thereof, whereby the influence on said first difference levels of extraneous noise during silences in the voice signal is minimized.
  - 7. A method for recognizing particular phonemes in a voice signal as in claim 6, wherein said step of generating further comprises the sub-step of selectively weighting said power levels of said first acoustic parameter signal to accurately represent the phonemic information in said voice signal.
  - 8. A method for recognizing particular phonemes in a voice signal as in claim 1, wherein said step of storing comprises the sub-steps of:
    - separating said first acoustic parameter signal into a plurality of frequency band signals;
      
      converting said first acoustic parameter signal into a third acoustic parameter signal comprising fewer frequency band signals and containing the phonemic information in said first acoustic parameter signal; and
      
      storing said third acoustic parameter signal for use in producing said second acoustic parameter signal from said converted first acoustic parameter signal.
  - 9. A method for recognizing particular phonemes in a voice signal as in claim 8, further including the step of weighting the power level of said first acoustic parameter signal to accurately represent the phonemic information in said voice signal.
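
The steps of claims 1, 2 and 8 can be read as a small signal-processing pipeline. The following is a minimal sketch under stated assumptions, not the patented implementation: the frame length, hop size, Hann window, equal-width band grouping and the `half_width` slice around each transition are all hypothetical choices that the claims do not specify.

```python
import numpy as np

def first_acoustic_parameter(samples, frame_len=256, hop=128):
    """Claim 2 (sketch): buffer digital samples, then Fourier-transform each frame.
    frame_len, hop and the Hann window are assumptions, not taken from the claims."""
    samples = np.asarray(samples, dtype=float)
    n_frames = 1 + (len(samples) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack([samples[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    # Power spectrum per frame, shape (n_frames, frame_len // 2 + 1)
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2

def reduce_bands(power, n_bands=8):
    """Claim 8 (sketch): convert to fewer frequency band signals by summing
    adjacent FFT bins. Equal-width grouping is an assumption."""
    edges = np.linspace(0, power.shape[1], n_bands + 1).astype(int)
    return np.stack([power[:, a:b].sum(axis=1)
                     for a, b in zip(edges[:-1], edges[1:])], axis=1)

def second_acoustic_parameter(stored, transition_frames, half_width=2):
    """Claim 1, last step (sketch): read the stored parameters back at each
    transition location. half_width is a hypothetical context size."""
    out = []
    for t in transition_frames:
        lo, hi = max(0, t - half_width), min(len(stored), t + half_width + 1)
        out.append(stored[lo:hi])
    return out
```

Each array returned by `second_acoustic_parameter` would then be compared with known phonemic information; the comparison itself is outside the scope of this sketch.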

10. An apparatus for recognizing particular phonemes in a voice signal having silence-phoneme and phoneme-phoneme transition, said apparatus comprising:
- means for providing an electrical signal representing said voice signal;
  
  first parameter producing means for producing a first acoustic parameter signal from said electrical signal, said first acoustic parameter signal containing phonemic information of said voice signal;
  
  generating means for generating a transition signal from the phonemic information in said first acoustic parameter signal, said transition signal indicating the location in said voice signal of a transition;
  
  storage means for storing said first acoustic parameter signal; and
  
  second parameter producing means for producing a second acoustic parameter signal from said stored first acoustic parameter signal using said transition signal, said second acoustic parameter signal containing phonemic information of said voice signal at said transition, whereby said second acoustic parameter signal can be compared with known phonemic information to recognize the phonemic information in said voice signal.
- - 11. An apparatus for recognizing particular phonemes in a voice signal as in claim 10, wherein said first parameter producing means comprises:
    - means for converting an analog electrical signal of said voice signal to a digital electrical signal;
      
      register means for storing in turn a plurality of said digital signals; and
      
      means for producing said first acoustic parameter signal from said stored digital signals by Fourier-transforming a plurality of said stored digital signals.
  - 12. An apparatus for recognizing particular phonemes in a voice signal as in claim 10, wherein said generating means comprises:
    - means for separating said first acoustic parameter signal into a plurality of frequency band signals, each said frequency band signal representing a power level of said first acoustic parameter signal in a particular frequency band and time frame;
      
      averaging means for calculating from said plurality of frequency band signals an average power level at each said time frame;
      
      difference circuit means for calculating for all said time frames a plurality of first difference levels between said average power level at each said time frame and said plurality of power levels at the same time frame;
      
      memory means for storing a plurality of said first difference levels for a plurality of time frames;
      
      operating circuit means for determining from said stored first difference levels a plurality of minimum first difference levels, each said frequency band having a minimum first difference level for said plurality of time frames; and
      
      summing means for calculating the sum of a plurality of second difference levels, each comprising the difference between:
      
      (1) said minimum first difference level in each said frequency band, and
      
      (2) each said first difference level in the same frequency band for said plurality of time frames, whereby said sum comprises said transition signal which can be evaluated to detect transitions in said voice signal.
  - 13. An apparatus for recognizing particular phonemes in a voice signal as in claim 12, wherein said generating means further comprises peak evaluation means for evaluating said transition signal to detect peaks therein by time-sampling said transition signal using a predetermined time interval and identifying, as a peak level, each maximum of said transition signal occurring in the middle of a said time interval to thereby locate transitions in said voice signal.
  - 14. An apparatus for recognizing particular phonemes in a voice signal as in claim 13, further comprising log circuit means for calculating the logarithms of said respective average power levels and said respective power levels, and wherein said first difference levels represent the differences between said respective logarithms, whereby the influence on said first difference levels of variations in emphasis from phoneme to phoneme of a particular speaker is minimized.
  - 15. An apparatus for recognizing particular phonemes in a voice signal as in claim 14, wherein said log circuit means includes bias means for applying a bias to each said average power level and to each said power level prior to calculating the logarithms thereof, whereby the influence on said first difference levels of extraneous noise during silences in said voice signal is minimized.
  - 16. An apparatus for electrically recognizing particular phonemes in a voice signal as in claim 15, wherein said generating means further comprises weighting means for weighting said power level of said first acoustic parameter signal to accurately represent the phonemic information in said voice signal.
  - 17. An apparatus for electrically recognizing particular phonemes in a voice signal as in claim 10, wherein said storage means comprises:
    - means for separating said first acoustic parameter signal into a plurality of frequency band signals;
      
      means for converting said first acoustic parameter signal into a third acoustic parameter signal comprising fewer frequency band signals and containing the phonemic information in said first acoustic parameter signal; and
      
      means for storing said third acoustic parameter signal for use in producing said second acoustic parameter signal from said converted first acoustic parameter signal.
  - 18. An apparatus for recognizing particular phonemes in a voice signal as in claim 11, further comprising weighting means for weighting the power level of said first acoustic parameter signal to accurately represent the phonemic information in said voice signal.
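
One way to write down what the averaging, difference, operating and summing means of claims 12, 14 and 15 compute is given below. The symbols are assumptions introduced here for illustration: P(t,k) is the power in band k at frame t, K is the number of bands, B is the bias, and w is the half-length of the group of frames stored in the memory means. The sign of the first difference is not fixed by the claim wording and is also an assumption.

```latex
\bar{P}(t) = \frac{1}{K}\sum_{k=1}^{K} P(t,k)
\qquad
D(t,k) = \log\bigl(P(t,k) + B\bigr) - \log\bigl(\bar{P}(t) + B\bigr)

T(t) = \sum_{k=1}^{K}\;\sum_{\tau = t-w}^{t+w}
       \Bigl( D(\tau,k) - \min_{t-w \le \tau' \le t+w} D(\tau',k) \Bigr)
```

Under this reading, T(t) is zero when every band keeps the same normalized level across the stored frames and grows when the spectral shape changes within them. Claim 12 on its own corresponds to dropping the logarithm and the bias from D(t,k).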

19. A method for generating a transition signal for indicating the location of a transition in a voice signal having silence-phoneme and phoneme-phoneme transitions, the method comprising the steps of:
- providing an acoustic parameter signal containing phonemic information of the voice signal;
  
  separating a plurality of time frames of said acoustic parameter signal into a plurality of frequency band signals, each said frequency band signal representing a power level of said acoustic parameter signal in a particular frequency band and time frame;
  
  calculating from said plurality of frequency band signals an average power level at each said time frame;
  
  calculating for all said time frames a plurality of first difference levels between said average power level at each said time frame and said plurality of power levels at the same frame;
  
  calculating for all said frequency bands a plurality of second difference levels between:
  
  (1) the lowest of said first difference levels in each said frequency band for said plurality of time frames, and
  
  (2) each said first difference level in the same frequency band for said plurality of time frames; and
  
  calculating the sum of all of said second difference levels, whereby said sum comprises said transition signal which can be evaluated to detect transitions in said voice signal.
- - 20. A method for generating a transition signal as in claim 19, further comprising the step of evaluating said transition signal to detect peaks therein by time-sampling said transition signal using a predetermined time interval and identifying as a peak level each maximum of said transition signal occurring in the middle of a said time interval to thereby locate transitions in said voice signal.
  - 21. A method for generating a transition signal as in claim 20 wherein each said first difference level is the difference between the logarithm of said respective average power level and the logarithm of said respective power level, whereby the influence on said first difference levels of variations in emphasis from phoneme to phoneme of a particular speaker is minimized.
  - 22. A method for generating a transition signal as in claim 21, wherein a bias is applied to each said average power level and each said power level prior to calculating the logarithms thereof, whereby the influence on said first difference levels of extraneous noise during silences in the voice signal is minimized.
  - 23. A method for generating a transition signal as in claim 22 wherein said method further comprises the step of selectively weighting said power levels of said acoustic parameter signal to accurately represent the phonemic information in said voice signal.
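
The method of claims 19 to 22 can be sketched in a few lines of NumPy. This is an illustrative reading, not the patent's circuit: `half_window`, `bias`, `half_interval` and `threshold` are hypothetical parameters, the sign of the first difference is an assumption, and the peak rule is one simple interpretation of "each maximum occurring in the middle of a said time interval".

```python
import numpy as np

def transition_signal(band_power, half_window=4, bias=1.0):
    """band_power: array of shape (n_frames, n_bands) holding power levels.
    Returns one transition-signal value per frame (sketch of claims 19, 21, 22)."""
    p = np.asarray(band_power, dtype=float)
    avg = p.mean(axis=1, keepdims=True)            # average power level per frame
    # First difference levels: log band power minus log average power,
    # with a bias added before the logarithm (claims 21 and 22).
    d = np.log(p + bias) - np.log(avg + bias)
    n = len(d)
    t_sig = np.zeros(n)
    for t in range(n):
        lo, hi = max(0, t - half_window), min(n, t + half_window + 1)
        win = d[lo:hi]                              # group of frames around t
        # Second difference levels: each value minus the per-band minimum
        # over the group; their sum is the transition signal at frame t.
        t_sig[t] = (win - win.min(axis=0)).sum()
    return t_sig

def pick_peaks(t_sig, half_interval=4, threshold=0.0):
    """Sketch of claim 20: slide an interval over the signal and report frame t
    when the interval's maximum sits exactly at its middle sample."""
    peaks = []
    for t in range(half_interval, len(t_sig) - half_interval):
        seg = t_sig[t - half_interval : t + half_interval + 1]
        if t_sig[t] > threshold and np.argmax(seg) == half_interval:
            peaks.append(t)
    return peaks
```

With this reading, a stretch of frames whose spectral shape does not change yields a transition signal of zero, because every first difference equals its per-band minimum. An abrupt step between two spectra produces a flat plateau rather than a single sharp maximum, so a practical detector would likely need a tie-breaking rule that the claims do not describe.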

24. An apparatus for generating a transition signal that can be evaluated to indicate the location in a voice signal of silence-phoneme and phoneme-phoneme transitions, the apparatus comprising:
- means for separating a plurality of time frames of an acoustic parameter signal containing phonemic information of the voice signal into a plurality of frequency band signals, each said frequency band signal representing a power level of said acoustic parameter signal in a particular frequency band and time frame;
  
  averaging means for calculating from said plurality of frequency band signals an average power level at each said time frame;
  
  difference circuit means for calculating for all said time frames a plurality of first difference levels between said average power level at each said time frame and said plurality of power levels at the same time frame;
  
  memory means for storing a plurality of said first difference levels for a plurality of time frames;
  
  operating circuit means for determining from said stored first difference levels a plurality of minimum first difference levels, each said frequency band having a minimum first difference level for said plurality of time frames; and
  
  summing means for calculating the sum of a plurality of second difference levels, each comprising the difference between:
  
  (1) said minimum first difference level in each said frequency band, and
  
  (2) each said first difference level in the same frequency band for said plurality of time frames, whereby said sum comprises said transition signal which can be evaluated to detect transitions in said voice signal.
- - 25. An apparatus for generating a transition signal as in claim 24 wherein said apparatus further comprises peak evaluation means for evaluating said transition signal to detect peaks therein by time-sampling said transition signal using a predetermined time interval and identifying as a peak level each maximum of said transition signal occurring in the middle of a said time interval to thereby locate transitions in said voice signal.
  - 26. An apparatus for generating a transition signal as in claim 25 further comprising log circuit means for calculating the logarithms of said respective average power levels and said respective power levels, and wherein said first difference levels represent the differences between said respective logarithms, whereby the influence on said first difference levels of variations in emphasis from phoneme to phoneme of a particular speaker is minimized.
  - 27. An apparatus for generating a transition signal as in claim 26, wherein said log circuit means includes bias means for applying a bias to each said average power level and to each said power level prior to calculating the logarithms thereof, whereby the influence on said first difference levels of extraneous noise during silences in said voice signal is minimized.
  - 28. An apparatus for generating a transition signal as in claim 27 wherein said apparatus further comprises weighting means for weighting the power level of said acoustic parameter signal to accurately represent the phonemic information in said voice signal.
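
The effect of the bias means in claim 27 can be illustrated numerically. The sketch below uses made-up near-silent band powers and a hypothetical bias of 1.0; it only shows why adding a bias before the logarithm suppresses the influence of low-level noise on the first difference levels, and is not taken from the specification.

```python
import numpy as np

def first_difference(power, bias):
    """Log band power minus log average power for one frame, with a bias
    added before each logarithm. The sign convention is an assumption."""
    power = np.asarray(power, dtype=float)
    return np.log(power + bias) - np.log(power.mean() + bias)

# Hypothetical band powers during a silence: tiny values dominated by noise.
silence = np.array([1e-6, 4e-6, 2e-6, 9e-6])

no_bias = first_difference(silence, bias=0.0)    # noise ratios are fully visible
with_bias = first_difference(silence, bias=1.0)  # noise is swamped by the bias

spread_no_bias = no_bias.max() - no_bias.min()
spread_with_bias = with_bias.max() - with_bias.min()
```

Without the bias, the spread of the first differences depends only on the ratios between the noise powers, so silence can look like a strongly shaped spectrum. With the bias, the same frame yields first differences that are nearly equal across bands.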

Current Assignee
Sony Corporation (Sony Group Corp.)
Original Assignee
Sony Corporation (Sony Group Corp.)
Inventors
Waku, Toshihiko; Nishioka, Hisao; Akabane, Makoto; Watari, Masao
Primary Examiner(s)
Kemeny, E. S. Matt

Application Number

US06/469,114
Time in Patent Office

1,189 Days
Field of Search

381/41-50, 364/573.5
US Class Current

704/254
CPC Class Codes

G10L 15/00 Speech recognition G10L17/0...
