Automatic speech recognition

US 5,638,487 A
Filed: 12/30/1994
Issued: 06/10/1997
Est. Priority Date: 12/30/1994
Status: Expired due to Term



Abstract

A scheme for recognizing speech represented by a sequence of frames of acoustic events separated by boundaries, according to which the frames of speech are processed to assign to received frames respective boundary probabilities representative of the degree to which the frames of speech correspond to stored representations of boundaries between acoustic events. The assigned boundary probabilities are used in subsequent processing steps to enhance recognition of speech. The assignment of boundary probabilities and further adjustments of the assigned probabilities are preferably conducted by an artificial neural network (ANN).


34 Claims

1. A method for recognizing speech from a received signal representing a spoken sequence of one or more words comprising the steps of receiving a sequence of frames of acoustic events separated by boundaries, assigning to received frames respective boundary probabilities representative of the degree to which the received frames of speech correspond to stored representations of boundaries between acoustic events, selecting boundary frames based on the boundary probabilities assigned to the frames, using selected boundary frames to generate sequences of one or more words between a first selected boundary frame and a subsequent selected boundary frame, wherein multiple words in any given sequence are separated by one or more selected boundary frames, assigning a score to each generated sequence, and providing an output corresponding to recognized speech using the sequence of one or more words with the highest assigned score.
  - 2. The method of claim 1 wherein the frames of speech are received at a fixed rate.
  - 3. The method of claim 2 further comprising the steps of receiving a speech signal representing a spoken sequence of one or more words, and processing said speech signal into a sequence of overlapping frames.
  - 4. The method of claim 1 wherein a boundary probability is assigned to a given frame of speech based on information about one or more neighboring frames of speech.
  - 5. The method of claim 4 wherein a boundary probability is assigned to a given frame of speech based on information about a frame of speech adjacent to the given frame of speech.
  - 6. The method of claim 4 wherein a boundary probability is assigned to a given frame of speech based on information about two neighboring frames of speech.
  - 7. The method of claim 4 wherein a boundary probability is assigned to a given frame of speech based on information about one or more speech frames preceding the given frame of speech and one or more speech frames subsequent to the given frame of speech.
  - 8. The method of claim 1 wherein boundary probabilities are assigned by an artificial neural network (ANN).
  - 9. The method of claim 8 further comprising the step of training said ANN based on only selected portions of continuous training speech likely to include boundaries between acoustic events.
  - 10. The method of claim 9 wherein the ANN is trained based only on selected portions of continuous training speech likely to involve boundaries between phonemes.
  - 11. The method of claim 10 wherein relatively few frames from the middle of phonemes are used to train the ANN.
  - 12. The method of claim 1 further comprising the step of changing a boundary probability assigned to a given frame of speech based on a boundary probability assigned to at least one neighboring frame of speech.
  - 13. The method of claim 12 wherein the boundary probability assigned to said respective frame of speech is changed based on boundary probabilities assigned to one or more speech frames preceding said respective frame of speech and based on boundary probabilities assigned to one or more speech frames subsequent to said respective frame of speech.
  - 14. The method of claim 12 wherein an ANN determines the amount by which the boundary probability assigned to said given frame of speech is changed.
  - 15. The method of claim 1 wherein said word sequences are generated based on frames of speech assigned boundary probabilities greater than a preselected threshold value.
  - 16. The method of claim 15 wherein the step of using selected boundary frames further comprises the step of preventing generation of certain word sequences based on speech frames assigned boundary probabilities greater than a second preselected threshold value that is greater than the first preselected threshold value.
  - 17. The method of claim 15 further comprising the step of respectively assigning to stored word models probabilities of match representative of the degree to which speech segments between selected boundary frames correspond to said stored word models.
  - 18. The method of claim 1 wherein the received sequence of frames of speech is generated from continuous speech.
  - 19. The method of claim 1 wherein a score is assigned to each generated sequence based at least in part on the boundary probabilities assigned to the selected boundary frames.
  - 20. The method of claim 1 wherein the plurality of word sequences are generated by generating sequences of one or more speech segments separated by selected boundary frames, with each speech segment comprising a plurality of frames, and by assigning segment probabilities to stored sub-word models based on the degree to which a given stored sub-word model corresponds to a generated speech segment.
  - 21. The method of claim 20 further comprising the step of assigning probabilities to stored word models based on probabilities assigned to the stored sub-word models.
  - 22. The method of claim 20 wherein the stored sub-word models correspond to a selected set of phonemes.
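
The selection, generation, and scoring steps of claims 1, 15, 16, and 19 can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the threshold values, and the sample probabilities are hypothetical, the reading of claim 16 (a frame above the second threshold may not be skipped) is an assumption, and the score uses boundary probabilities only, whereas a real recognizer would also include word-model match scores.

```python
from itertools import combinations
from math import log

def select_boundaries(probs, threshold):
    """Return indices of frames whose boundary probability exceeds threshold."""
    return [i for i, p in enumerate(probs) if p > threshold]

def segmentations(boundaries, probs, must_threshold):
    """Yield candidate boundary paths from the first to the last selected frame.

    Assumption: frames whose probability exceeds must_threshold are treated
    as mandatory boundaries, so paths that skip them are never generated.
    """
    if len(boundaries) < 2:
        return
    first, last = boundaries[0], boundaries[-1]
    interior = boundaries[1:-1]
    required = {b for b in interior if probs[b] > must_threshold}
    optional = [b for b in interior if b not in required]
    for k in range(len(optional) + 1):
        for chosen in combinations(optional, k):
            yield tuple(sorted({first, last, *required, *chosen}))

def score(path, probs):
    """Boundary-only part of a sequence score: sum of log probabilities."""
    return sum(log(probs[b]) for b in path)

# Hypothetical per-frame boundary probabilities.
probs = [0.9, 0.1, 0.6, 0.05, 0.95, 0.2, 0.8]
boundaries = select_boundaries(probs, 0.5)
paths = list(segmentations(boundaries, probs, 0.9))
best = max(paths, key=lambda p: score(p, probs))
```

Each path stands for one segmentation of the utterance; the segments between consecutive boundary frames are what a recognizer would match against stored word or sub-word models. Because this sketch scores boundaries alone, it always favors the path with the fewest optional boundaries, which is why the model-match term matters in practice.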

23. A speech recognizer for recognizing speech from a received signal representing a spoken sequence of one or more words comprising a boundary classifier having an input adapted to receive a sequence of frames of acoustic events separated by boundaries, said boundary classifier being adapted to assign to received frames respective boundary probabilities representative of the degree to which the frames of speech correspond to stored representations of boundaries between acoustic events and to select boundary frames based on the boundary probabilities assigned to the frames, a network generator using boundary frames selected by said boundary classifier to generate sequences of one or more words between a first selected boundary frame and a subsequent selected boundary frame, wherein multiple words in any given sequence are separated by one or more selected boundary frames, a sequence classifier assigning a score to each generated sequence, and a processor adapted to provide an output corresponding to recognized speech using the sequence of one or more words with the highest assigned score.
  - 24. The speech recognizer of claim 23 further comprising a signal processor having an input for receiving a speech signal representing a spoken sequence of one or more words, said signal processor adapted to process said speech signal into a sequence of overlapping frames, said signal processor further having an output coupled to the input of said boundary classifier and adapted to pass frames of said sequence of frames to said boundary classifier at a fixed rate.
  - 25. The speech recognizer of claim 24 wherein said signal processor passes each of said overlapping frames to said boundary classifier.
  - 26. The speech recognizer of claim 23 wherein said boundary classifier assigns a boundary probability to a given frame of speech based on information about one or more neighboring frames of speech.
  - 27. The speech recognizer of claim 26 wherein said boundary classifier assigns a boundary probability to a given frame of speech using information about one or more speech frames preceding said given frame of speech and one or more speech frames subsequent to said given frame of speech.
  - 28. (Amended) The speech recognizer of claim 23 wherein said boundary classifier comprises an ANN for assigning boundary probabilities to received frames of speech.
  - 29. The speech recognizer of claim 28 wherein said ANN is trained based on only selected portions of continuous training speech likely to involve boundaries between acoustic events.
  - 30. The speech recognizer of claim 23 wherein said boundary classifier further comprises a peak-picker for changing the boundary probability assigned to a respective frame of speech based on a boundary probability assigned to at least one neighboring frame of speech.
  - 31. The speech recognizer of claim 30 wherein said peak-picker changes the boundary probability assigned to said respective frame of speech based on boundary probabilities assigned to one or more speech frames preceding said respective frame of speech and based on the boundary probabilities assigned to one or more frames of speech subsequent to said respective frame of speech.
  - 32. The speech recognizer of claim 31 wherein said peak-picker comprises an ANN.
  - 33. The recognizer of claim 23 wherein said network generator generates word sequences based on speech frames assigned boundary probabilities above a threshold value.
  - 34. The speech recognizer of claim 23 wherein the sequence classifier assigns a score to each generated sequence based at least in part on the boundary probabilities assigned to the selected boundary frames.
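
The peak-picker of claims 30 and 31 changes a frame's boundary probability using the probabilities of preceding and subsequent frames. The sketch below shows one simple way this could work, using a rule-based local-maximum filter as a stand-in; claim 32 describes the peak-picker as an ANN, so this is a swapped-in technique for illustration only. The function name `pick_peaks`, the `radius` parameter, and the sample values are hypothetical.

```python
def pick_peaks(probs, radius=1):
    """Keep a frame's boundary probability only if it is the largest value
    within `radius` frames on either side; otherwise set it to 0.0.

    Assumption: a plain local-maximum rule stands in for the ANN-based
    adjustment described in the claims.
    """
    adjusted = []
    for i, p in enumerate(probs):
        lo = max(0, i - radius)
        hi = min(len(probs), i + radius + 1)
        adjusted.append(p if p == max(probs[lo:hi]) else 0.0)
    return adjusted

# Hypothetical raw boundary probabilities from a boundary classifier.
raw = [0.1, 0.4, 0.7, 0.6, 0.2, 0.3, 0.9, 0.8]
adjusted = pick_peaks(raw, radius=1)
```

Suppressing non-peak frames means a run of adjacent high-probability frames yields one candidate boundary instead of several, which reduces the number of word sequences the network generator has to consider. Tied neighbors are both kept in this sketch.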


Current Assignee: Philips Electronics North America Corporation (Koninklijke Philips N.V.)
Original Assignee: PureSpeech, Inc. (Koninklijke Philips N.V.)
Inventors: Chigier, Benjamin
Primary Examiner(s): MacDonald, Allen R.
Assistant Examiner(s): Chawan, Vijay B.
Application Number: US08/366,682
Time in Patent Office: 893 Days
Field of Search: 395/2.62, 395/2.54, 395/2, 395/2.43, 395/2.57, 395/2.11, 395/2.41, 381/41, 381/43, 381/42
US Class Current: 704/253
CPC Class Codes:
G10L 15/04 Segmentation; Word boundary...
G10L 25/30 using neural networks
