Speech recognition system with efficient storage and rapid assembly of phonological graphs

US 4,980,918 A
Filed: 05/09/1985
Issued: 12/25/1990
Est. Priority Date: 05/09/1985
Status: Expired due to Fees

Abstract

A continuous speech recognition system having a speech processor and a word recognition computer subsystem, characterized by an element for developing a graph for confluent links between confluent nodes; an element for developing a graph of boundary links between adjacent words; an element for storing an inventory of confluent links and boundary links as a coding inventory; and an element for converting an unknown utterance into an encoded sequence of confluent links and boundary links corresponding to recognition sequences stored in the recognition vocabulary of the word recognition subsystem. The invention also includes a method for achieving continuous speech recognition by characterizing speech as a sequence of confluent links which are matched with candidate words. The invention applies to isolated word speech recognition in the same way as to continuous speech recognition, except that in that case there are no boundary links.

114 Citations

23 Claims

1. An apparatus for storing electronic representation of words, said apparatus comprising:
- means for storing a first electronic signal representing at least two alternative pronunciations of a portion of speech, said first signal comprising data having a data length, said first signal being identified by a first identifier with a length less than the data length of the first signal;
  
  means for storing a second electronic signal representing a first word comprising the first portion of speech, said second signal comprising the first identifier for representing at least a portion of the first word; and
  
  means for storing a third electronic signal representing a second word different from the first word, said second word comprising the first portion of speech, said third signal comprising the first identifier for representing at least a portion of the second word.
- Dependent claims (2, 3, 4, 5):
  - 2. An apparatus as claimed in claim 1, characterized in that the occurrence of any one of the alternative pronunciations of the first portion of speech is independent of the occurrence of other portions of speech preceding or following the first portion of speech.
  - 3. An apparatus as claimed in claim 1, characterized in that the occurrence of any one of the alternative pronunciations of the first portion of speech is dependent on the occurrence of other portions of speech preceding the first portion of speech.
  - 4. An apparatus as claimed in claim 1, characterized in that the occurrence of any one of the alternative pronunciations of the first portion of speech is dependent on the occurrence of other portions of speech following the first portion of speech.
  - 5. An apparatus as claimed in claim 1, characterized in that the first electronic signal comprises probability data representing the probability of occurrence of each of the alternative pronunciations of the first portion of speech.
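
The storage scheme of claims 1 to 5 can be illustrated with a minimal sketch. It assumes a plain in-memory dictionary as the store; the phone symbols, probabilities, identifiers, and the names `Alternative`, `INVENTORY`, `WORDS`, and `pronunciations` are all hypothetical and do not come from the patent. The point it shows is that two different words hold the same short identifier instead of each holding a copy of the alternative pronunciations.

```python
from dataclasses import dataclass
from itertools import product
from math import prod
from typing import Tuple


@dataclass(frozen=True)
class Alternative:
    phones: Tuple[str, ...]  # one pronunciation of the portion of speech
    probability: float       # probability data, as in claim 5 (values invented)


# Short integer identifier -> stored alternatives for one portion of speech.
# The identifier is much shorter than the data it stands for.
INVENTORY = {
    1: (Alternative(("T", "AX"), 0.7), Alternative(("T", "UW"), 0.3)),
    2: (Alternative(("D", "EY"), 1.0),),
    3: (Alternative(("M", "AA", "R", "OW"), 1.0),),
}

# Each word stores identifiers only; "today" and "tomorrow" share identifier 1.
WORDS = {"today": (1, 2), "tomorrow": (1, 3)}


def pronunciations(word):
    """Expand a word's identifiers into (phones, probability) pairs.

    Multiplying the per-portion probabilities assumes the portions are
    independent, which is an assumption of this sketch only.
    """
    portions = [INVENTORY[i] for i in WORDS[word]]
    expanded = []
    for combo in product(*portions):
        phones = tuple(p for alt in combo for p in alt.phones)
        expanded.append((phones, prod(alt.probability for alt in combo)))
    return expanded
```

Calling `pronunciations("today")` expands the two stored identifiers into two full pronunciations, one per alternative of the shared first portion.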

6. An apparatus for storing electronic representations of words, said apparatus comprising:
- means for storing a first electronic signal representing at least two alternative pronunciations of a first portion of speech, the occurrence of any one of said alternative pronunciations of the first portion of speech being independent of the occurrence of other portions of speech preceding or following the first portion of speech, said first signal comprising data having a data length, said first signal being identified by a first identifier with a length less than the data length of the first signal;
  
  means for storing a second electronic signal representing at least two alternative pronunciations of a second portion of speech different from the first portion of speech, the occurrence of any one of said alternative pronunciations of the second portion of speech being dependent on the occurrence of other portions of speech following the second portion of speech, the occurrence of any one of said alternative pronunciations of the second portion of speech being independent of the occurrence of other portions of speech preceding the second portion of speech, said second signal comprising data having a data length, said second signal being identified by a second identifier with a length less than the data length of the second signal;
  
  means for storing a third electronic signal representing at least two alternative pronunciations of a third portion of speech different from the first and second portions of speech, the occurrence of any one of said alternative pronunciations of the third portion of speech being dependent on the occurrence of other portions of speech preceding the third portion of speech, the occurrence of any one of said alternative pronunciations of the third portion of speech being independent of the occurrence of other portions of speech following the third portion of speech, said third signal comprising data having a data length, said third signal being identified by a third identifier with a length less than the data length of the third signal;
  
  means for storing a fourth electronic signal representing a first word comprising the second portion of speech, said fourth signal comprising the second identifier for representing at least a portion of the first word;
  
  means for storing a fifth electronic signal representing a second word different from the first word, said second word comprising the third portion of speech, said fifth signal comprising the third identifier for representing at least a portion of the second word; and
  
  means for storing a sixth electronic signal comprising the first identifier for representing the second portion of speech followed by the third portion of speech.
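
One possible reading of claim 6 is sketched here, under stated assumptions: the second portion is treated as a word ending whose pronunciation depends on what follows, the third portion as a word beginning whose pronunciation depends on what precedes, and the sixth signal as a table that maps such an ending/beginning pair to a context-independent identifier. The identifiers (`"D#"`, `"#Y"`, `"D+Y"`), the example words, and the names `Context`, `PORTIONS`, `BOUNDARY`, and `encode` are invented for illustration.

```python
from enum import Enum


class Context(Enum):
    INDEPENDENT = "independent"          # first portion in claim 6
    DEPENDS_ON_FOLLOWING = "following"   # second portion (here: a word ending)
    DEPENDS_ON_PRECEDING = "preceding"   # third portion (here: a word beginning)


# identifier -> (context class, alternative pronunciations); all values invented.
PORTIONS = {
    "DI": (Context.INDEPENDENT, [("D", "IH")]),
    "UW": (Context.INDEPENDENT, [("UW",)]),
    "D#": (Context.DEPENDS_ON_FOLLOWING, [("D",), ("JH",)]),
    "#Y": (Context.DEPENDS_ON_PRECEDING, [("Y",), ()]),
    "D+Y": (Context.INDEPENDENT, [("D", "Y"), ("JH",)]),
}

# Fourth and fifth signals: words stored as identifier sequences.
WORDS = {"did": ["DI", "D#"], "you": ["#Y", "UW"]}

# Sixth signal: an ending followed by a beginning is represented by one identifier.
BOUNDARY = {("D#", "#Y"): "D+Y"}


def encode(word_sequence):
    """Encode a non-empty word sequence, merging word-boundary pairs via BOUNDARY."""
    encoded = list(WORDS[word_sequence[0]])
    for word in word_sequence[1:]:
        ids = list(WORDS[word])
        pair = (encoded[-1], ids[0])
        if pair in BOUNDARY:
            # Replace the ending/beginning pair with the boundary identifier.
            encoded[-1:] = [BOUNDARY[pair]]
            ids = ids[1:]
        encoded.extend(ids)
    return encoded
```

With these tables, `encode(["did", "you"])` replaces the pair `"D#"`, `"#Y"` with the single boundary identifier, while an isolated word is returned with its identifiers unchanged.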

7. A method of storing electronic representations of words, said method comprising:
- storing a first electronic signal representing at least two alternative pronunciations of a portion of speech, said first signal comprising data having a data length, said first signal being identified by a first identifier with a length less than the data length of the first signal;
  
  storing a second electronic signal representing a first word comprising the first portion of speech, said second signal comprising the identifier for representing at least a portion of the first word; and
  
  storing a third electronic signal representing a second word different from the first word, said second word comprising the first portion of speech, said third signal comprising the first identifier for representing at least a portion of the second word.
- Dependent claims (8, 9, 10, 11):
  - 8. A method as claimed in claim 7, characterized in that the occurrence of any one of the alternative pronunciations of the first portion of speech is independent of the occurrence of other portions of speech preceding or following the first portion of speech.
  - 9. A method as claimed in claim 7, characterized in that the occurrence of any one of the alternative pronunciations of the first portion of speech is dependent on the occurrence of other portions of speech preceding the first portion of speech.
  - 10. A method as claimed in claim 7, characterized in that the occurrence of any one of the alternative pronunciations of the first portion of speech is dependent on the occurrence of other portions of speech following the first portion of speech.
  - 11. A method as claimed in claim 7, characterized in that the first electronic signal comprises probability data representing the probability of occurrence of each of the alternative pronunciations of the first portion of speech.

12. A method of storing electronic representations of words, said method comprising:
- storing a first electronic signal representing at least two alternative pronunciations of a first portion of speech, the occurrence of any one of said alternative pronunciations of the first portion of speech being independent of the occurrence of other portions of speech preceding or following the first portion of speech, said first signal comprising data having a data length, said first signal being identified by a first identifier with a length less than the data length of the first signal;
  
  storing a second electronic signal representing at least two alternative pronunciations of a second portion of speech different from the first portion of speech, the occurrence of any one of said alternative pronunciations of the second portion of speech being dependent on the occurrence of other portions of speech following the second portion of speech, the occurrence of any one of said alternative pronunciations of the second portion of speech being independent of the occurrence of other portions of speech preceding the second portion of speech, said second signal comprising data having a data length, said second signal being identified by a second identifier with a length less than the data length of the second signal;
  
  storing a third electronic signal representing at least two alternative pronunciations of a third portion of speech different from the first and second portions of speech, the occurrence of any one of said alternative pronunciations of the third portion of speech being dependent on the occurrence of other portions of speech preceding the third portion of speech, the occurrence of any one of said alternative pronunciations of the third portion of speech being independent of the occurrence of other portions of speech following the third portion of speech, said third signal comprising data having a data length, said third signal being identified by a third identifier with a length less than the data length of the third signal;
  
  storing a fourth electronic signal representing a first word comprising the second portion of speech, said fourth signal comprising the second identifier for representing at least a portion of the first word;
  
  storing a fifth electronic signal representing a second word different from the first word, said second word comprising the third portion of speech, said fifth signal comprising the third identifier for representing at least a portion of the second word; and
  
  storing a sixth electronic signal comprising the first identifier for representing the second portion of speech followed by the third portion of speech.

13. An apparatus for storing electronic representations of words, said apparatus comprising:
- means for storing a speech unit signal, said speech unit signal comprising first data representing at least a first pronunciation of a first speech unit, said speech unit signal comprising second data representing at least a second pronunciation different from the first pronunciation of the first speech unit, said first and second data having a data length, said first signal being identified by a first identifier with a length less than the data length of the first signal;
  
  means for storing a first word signal representing a first word comprising the first speech unit, said first word signal comprising the first identifier for representing at least a portion of the first word;
  
  means for storing a second word signal representing a second word different from the first word, said second word comprising the first speech unit, said second word signal comprising the first identifier for representing at least a portion of the second word;
  
  means for retrieving the first word signal and for retrieving the first identifier for representing at least a portion of the first word;
  
  means for identifying the stored speech unit signal from the retrieved first identifier, and for retrieving the first data or the second data from the speech unit signal; and
  
  means for generating a phone machine for one pronunciation of the portion of the first word from the retrieved first data or second data, but not both.
- Dependent claims (14, 15, 16, 17):
  - 14. An apparatus as claimed in claim 13, characterized in that the occurrence of either the first pronunciation or the second pronunciation of the first speech unit is independent of the occurrence of other speech units preceding or following the first speech unit.
  - 15. An apparatus as claimed in claim 13, characterized in that the occurrence of either the first pronunciation or the second pronunciation of the first speech unit is dependent on the occurrence of other speech units preceding the first speech unit.
  - 16. An apparatus as claimed in claim 13, characterized in that the occurrence of either the first pronunciation or the second pronunciation of the first speech unit is dependent on the occurrence of other speech units following the first speech unit.
  - 17. An apparatus as claimed in claim 13, characterized in that the speech unit signal further comprises probability data representing the probability of occurrence of either the first pronunciation or the second pronunciation of the first speech unit.
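
The retrieval and assembly steps of claim 13 can be sketched as follows. The claim does not fix the internal form of a phone machine, so this sketch assumes a simple left-to-right chain in which each phone contributes a self-loop arc and a forward arc; the 0.5 arc probabilities, the phone symbols, and the names `SPEECH_UNITS`, `WORDS`, and `build_phone_machine` are placeholders, not taken from the patent.

```python
# Speech unit signal: identifier -> first data and second data (two pronunciations).
SPEECH_UNITS = {
    1: {"first": ("T", "AX"), "second": ("T", "UW")},
}

# Word signals: both words contain the identifier of the same speech unit.
WORDS = {"today": (1,), "tomorrow": (1,)}


def build_phone_machine(word, choice):
    """Build a machine for exactly one pronunciation of the word's first unit.

    `choice` must be "first" or "second"; only that data is retrieved.
    """
    if choice not in ("first", "second"):
        raise ValueError("choice must be 'first' or 'second'")
    unit_id = WORDS[word][0]        # retrieve the identifier from the word signal
    unit = SPEECH_UNITS[unit_id]    # identify the stored speech unit signal
    phones = unit[choice]           # retrieve the first or second data, not both
    arcs = []
    for state, phone in enumerate(phones):
        arcs.append((state, state, phone, 0.5))      # self-loop (placeholder prob.)
        arcs.append((state, state + 1, phone, 0.5))  # advance to the next state
    return {"start": 0, "final": len(phones), "arcs": arcs}
```

For example, `build_phone_machine("today", "second")` yields a three-state chain whose arcs carry only the phones of the second pronunciation.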

18. A method of storing electronic representations of words, said method comprising the step of:
- storing a speech unit signal, said speech unit signal comprising first data representing at least a first pronunciation of a first speech unit, said speech unit signal comprising second data representing at least a second pronunciation different from the first pronunciation of the first speech unit, said first and second data having a data length, said first signal being identified by a first identifier with a length less than the data length of the first signal;
  
  storing a first word signal representing a first word comprising the first speech unit, said first word signal comprising the first identifier for representing at least a portion of the first word;
  
  storing a second word signal representing a second word different from the first word, said second word comprising the first speech unit, said second word signal comprising the first identifier for representing at least a portion of the second word;
  
  retrieving the first word signal and retrieving the first identifier for representing at least a portion of the first word;
  
  identifying the stored speech unit signal from the retrieved first identifier, and retrieving the first data or the second data from the speech unit signal; and
  
  generating a phone machine for one pronunciation of the portion of the first word from the retrieved first data or second data, but not both.
- Dependent claims (19, 20, 21, 22):
  - 19. A method as claimed in claim 18, characterized in that the occurrence of either the first pronunciation or the second pronunciation of the first speech unit is independent of the occurrence of other speech units preceding or following the first speech unit.
  - 20. A method as claimed in claim 18, characterized in that the occurrence of either the first pronunciation or the second pronunciation of the first speech unit is dependent on the occurrence of other speech units preceding the first speech unit.
  - 21. A method as claimed in claim 18, characterized in that the occurrence of either the first pronunciation or the second pronunciation of the first speech unit is dependent on the occurrence of other speech units following the first speech unit.
  - 22. A method as claimed in claim 18, characterized in that the speech unit signal further comprises probability data representing the probability of occurrence of either the first pronunciation or the second pronunciation of the first speech unit.

23. A speech recognition apparatus comprising:
- means for storing a speech unit signal, said speech unit signal comprising first data representing at least a first pronunciation of a first speech unit, said speech unit signal comprising second data representing at least a second pronunciation different from the first pronunciation of the first speech unit, said first and second data having a data length, said first signal being identified by a first identifier with a length less than the data length of the first signal;
  
  means for storing a first word signal representing a first word comprising the first speech unit, said first word signal comprising the first identifier for representing at least a portion of the first word;
  
  means for storing a second word signal representing a second word different from the first word, said second word comprising the first speech unit, said second word signal comprising the first identifier for representing at least a portion of the second word;
  
  means for retrieving the first word signal and for retrieving the first identifier for representing at least a portion of the first word;
  
  means for identifying the stored speech unit signal from the retrieved first identifier, and for retrieving the first data or the second data from the speech unit signal;
  
  means for generating a phone machine for one pronunciation of the portion of the first word from the retrieved first data or second data, but not both;
  
  means for converting a spoken sound into an utterance signal; and
  
  means for matching the utterance signal to the phone machine and for outputting a match score signal proportional to the likelihood of the phone machine producing the utterance signal.
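
The matching step of claim 23 could be illustrated with a forward-probability computation. This is a sketch under assumptions rather than the patented matcher: the phone machine is modeled as a left-to-right Markov chain, the utterance signal as a sequence of discrete acoustic labels, and every arc emits one label. The label names, the emission tables, the `stay` probability, and the function name `match_score` are hypothetical.

```python
def match_score(emissions, labels, stay=0.5):
    """Likelihood that a left-to-right machine produces `labels`.

    `emissions[i]` maps a label to its emission probability in state i.
    On each label the machine stays in state i (probability `stay`) or
    advances to state i + 1 (probability 1 - stay). The score is the total
    probability of reaching the final state after consuming every label.
    """
    n = len(emissions)
    alpha = [0.0] * (n + 1)   # alpha[i]: probability of being in state i
    alpha[0] = 1.0
    for label in labels:
        nxt = [0.0] * (n + 1)
        for i in range(n):
            e = emissions[i].get(label, 0.0)
            nxt[i] += alpha[i] * stay * e               # self-loop arc
            nxt[i + 1] += alpha[i] * (1.0 - stay) * e   # forward arc
        alpha = nxt
    return alpha[n]


# Hypothetical two-state machine: state 0 emits "a", state 1 emits "b".
EMISSIONS = [{"a": 1.0}, {"b": 1.0}]
```

A label sequence the machine cannot produce, such as `["b"]` for the machine above, scores zero, and longer sequences that dwell in a state score lower than the shortest matching path.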

Current Assignee: International Business Machines Corporation
Original Assignee: International Business Machines Corporation
Inventors: Cohen, Paul S.; Bahl, Lalit R.; Mercer, Robert L.
Primary Examiner(s): NOT, DEFINED
Assistant Examiner(s): Merecki, John A.
Application Number: US06/732,472
Time in Patent Office: 2,056 Days
Field of Search: 381/41-45, 364/513.5, 364/200, 364/900
US Class Current: 704/240
CPC Class Codes:
G10L 15/08   Speech classification or se...
G10L 15/14   using statistical models, e...
G10L 15/187   Phonemic context, e.g. pron...
