Speech-controlled phonetic typewriter or display device using two-tier approach

US 4,435,617 A
Filed: 08/13/1981
Issued: 03/06/1984
Est. Priority Date: 08/13/1981
Status: Expired due to Term

Abstract

A speech-controlled phonetic device utilizes a two-tier approach for converting an audio input into visual form. The device basically comprises: various components for identifying different phonemes, such as a sound separator, various sensors and transducers, a vowel scanner, a vowel transducer, and a diphthong transducer; an input synchronizer; a transcriber processor; and a printer or display device. The two-tier approach involves a first tier, wherein the identified speech sounds are broken down into syllabits (groupings of classes of sound), the spoken sequence of those syllabits is separated into possible words, and the grouping of the syllabits is indicated. The second tier involves the use of stored words with those respective groupings, but narrowed down to essential phonemes only. Thus, the second tier acts to eliminate, from such possible words, all except a specific word (the actually spoken word), which contains each of the detected phonemes in the proper sequence. Further features of the invention include a vowel identification circuit using both formant peak detection and envelope detection-comparison techniques, and the use of an input synchronizer to provide phoneme identifiers to the transcriber processor.
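
The two-tier idea can be illustrated with a small Python sketch. It is not the patented circuitry or the specification's procedure, neither of which is reproduced in this record. The phoneme symbols, the class table `SOUND_CLASS`, the rule that closes a syllabit after each vowel, and the toy `LEXICON` are all hypothetical. The first tier reduces the detected phonemes to classes of sound, groups them into syllabits, and uses the syllabit group to look up possible words. The second tier keeps only the possible word whose stored skeletal phonemes all occur, in the same order, in the input.

```python
from typing import Dict, List, Tuple

# Hypothetical coarse sound classes (not the patent's inventory):
# V = vowel, S = stop, F = fricative, N = nasal, L = liquid/glide.
SOUND_CLASS: Dict[str, str] = {
    **{p: "V" for p in "aeiou"},
    **{p: "S" for p in "ptkbdg"},
    **{p: "F" for p in "fsvz"},
    **{p: "N" for p in "mn"},
    **{p: "L" for p in "lrwy"},
}

# Hypothetical lexicon: syllabit group -> [(word, skeletal phoneme sequence)].
LEXICON: Dict[Tuple[str, ...], List[Tuple[str, List[str]]]] = {
    ("SVS",): [("cat", ["k", "t"]), ("dog", ["d", "g"]), ("pit", ["p", "t"])],
    ("SV", "SVN"): [("token", ["t", "o", "k", "n"]), ("taken", ["t", "a", "k", "n"])],
}


def syllabits(phonemes: List[str]) -> Tuple[str, ...]:
    """Tier 1: map phonemes to classes and close a syllabit after each vowel (assumed rule)."""
    groups: List[str] = []
    current = ""
    for p in phonemes:
        cls = SOUND_CLASS[p]
        current += cls
        if cls == "V":
            groups.append(current)
            current = ""
    if current:  # trailing consonant classes attach to the last syllabit
        if groups:
            groups[-1] += current
        else:
            groups.append(current)
    return tuple(groups)


def is_ordered_subsequence(skeleton: List[str], phonemes: List[str]) -> bool:
    """True if every skeletal phoneme occurs in `phonemes`, in the same order."""
    it = iter(phonemes)
    return all(p in it for p in skeleton)  # `in` consumes the iterator up to each match


def recognize(phonemes: List[str]) -> List[str]:
    """Tier 2: among the syllabit group's possible words, keep those whose skeleton fits."""
    candidates = LEXICON.get(syllabits(phonemes), [])
    return [word for word, skeleton in candidates if is_ordered_subsequence(skeleton, phonemes)]


print(recognize(["k", "a", "t"]))            # ['cat']
print(recognize(["t", "o", "k", "e", "n"]))  # ['token']
```

Here the skeletons keep only the phonemes needed to tell apart words that share a syllabit group (the consonants, plus the first vowel for the assumed pair `token`/`taken`), which is one possible reading of "narrowed down to essential phonemes only". If more than one candidate survived, this sketch would return all of them.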

28 Claims

1. A two-tier method of converting an audio input, comprising words made up of various sounds in a spoken sequence, into a visible form, comprising a sequence of corresponding phonemes, said method comprising the steps of:

(a) breaking down the spoken sequence of sounds into syllabits, each syllabit comprising a group of classes of sounds;

(b) grouping the syllabits into syllabit groups, each syllabit group defining corresponding possible words;

(c) providing, for each of said possible words corresponding to each syllabit group, a respective skeletal sequence of phonemes comprising a corresponding grouping of phonemes;

(d) determining, for each distinctive syllabit group, the phonemes occurring therein so as to develop an input sequence of phonemes for each syllabit group;

(e) comparing the input sequence of phonemes for each syllabit group with the respective skeletal sequence of phonemes of each of the corresponding possible words so as to determine, with reference to the phonemes in each grouping of phonemes, which possible word has a skeletal sequence of phonemes which contains, in a given sequence, phonemes all of which are found, in said given sequence, in the input sequence of phonemes, thereby identifying each of said words of said audio input; and

(f) providing said identified words of said audio input in said visible form.

2. The method of claim 1, wherein step (e) comprises organizing said skeletal sequences of phonemes by final phoneme, determining whether there is a match between a final phoneme of said grouping of phonemes and said final phoneme of any of said skeletal sequences of phonemes, and comparing each said skeletal sequence of phonemes having a matching said phoneme with said grouping of phonemes.

3. The method of claim 1, further comprising the additional step, between steps (a) and (b), of determining whether or not a predetermined period of silence occurs after a given input sequence, and, if silence does occur, defining a word comprising at least one syllabit.

4. The method of claim 3, further comprising the additional steps of determining whether or not said defined word comprises more than a predetermined number of phonemes, and, if said defined word comprises more than a predetermined number of phonemes, further processing the defined word as a long word, and, if said defined word does not comprise more than a predetermined number of phonemes, further processing said defined word as a short word.
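
Claims 2 to 4 add three refinements, shown together in the hypothetical sketch below. Each non-silent item in `frames` stands for one detected phoneme and `None` for a silent frame. The values of `SILENCE_FRAMES` and `LONG_WORD_PHONEMES`, the lexicon, and the reading of claim 2's "grouping of phonemes" as the phonemes of the input word are assumptions, not details taken from the patent. A long enough silence closes a word (claim 3), the phoneme count routes it to long- or short-word processing (claim 4), and an index keyed by final phoneme limits the skeletal comparison to entries that end the same way (claim 2).

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

SILENCE_FRAMES = 3      # hypothetical "predetermined period of silence", counted in frames
LONG_WORD_PHONEMES = 3  # hypothetical "predetermined number of phonemes"

Entry = Tuple[str, List[str]]  # (word, skeletal phoneme sequence)


def segment_words(frames: Sequence[Optional[str]]) -> List[List[str]]:
    """Claim 3 sketch: a run of SILENCE_FRAMES silent frames (None) closes the current word."""
    words: List[List[str]] = []
    current: List[str] = []
    silent = 0
    for frame in frames:
        if frame is None:
            silent += 1
            if silent >= SILENCE_FRAMES and current:
                words.append(current)
                current = []
        else:
            silent = 0
            current.append(frame)
    if current:  # the end of the input also closes a word
        words.append(current)
    return words


def word_length_class(word: List[str]) -> str:
    """Claim 4 sketch: more than LONG_WORD_PHONEMES phonemes means long-word processing."""
    return "long" if len(word) > LONG_WORD_PHONEMES else "short"


def index_by_final_phoneme(lexicon: List[Entry]) -> Dict[str, List[Entry]]:
    """Claim 2 sketch: organize the skeletal sequences by their final phoneme."""
    index: Dict[str, List[Entry]] = defaultdict(list)
    for word, skeleton in lexicon:
        index[skeleton[-1]].append((word, skeleton))
    return dict(index)


def is_ordered_subsequence(skeleton: List[str], phonemes: List[str]) -> bool:
    """True if every skeletal phoneme occurs in `phonemes`, in the same order."""
    it = iter(phonemes)
    return all(p in it for p in skeleton)


def match_word(phonemes: List[str], index: Dict[str, List[Entry]]) -> List[str]:
    """Compare only the skeletons filed under the input word's final phoneme."""
    return [word for word, skeleton in index.get(phonemes[-1], [])
            if is_ordered_subsequence(skeleton, phonemes)]


# Hypothetical lexicon and frame stream.
LEXICON: List[Entry] = [("cast", ["k", "s", "t"]), ("mist", ["m", "s", "t"]),
                        ("dog", ["d", "g"]), ("man", ["m", "n"])]
index = index_by_final_phoneme(LEXICON)
frames = ["k", "a", None, "s", "t", None, None, None, "d", "o", "g"]

for w in segment_words(frames):
    print(w, word_length_class(w), match_word(w, index))
# ['k', 'a', 's', 't'] long ['cast']
# ['d', 'o', 'g'] short ['dog']
```

The single silent frame after `a` is shorter than `SILENCE_FRAMES`, so it does not split the first word; the run of three silent frames does. Because of the final-phoneme index, `match_word` never examines `dog` or `man` when the input ends in `t`.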

5. A two-tier system for converting an audio input, comprising words made up of various sounds in a spoken sequence, into a visible form, comprising a sequence of corresponding phonemes, said system comprising:

first means for breaking down the spoken sequence of sounds into syllabits, each syllabit comprising a group of classes of sounds;

second means for grouping the syllabits into syllabit groups, each syllabit group defining corresponding possible words;

third means for providing, for each of said possible words corresponding to each syllabit group, a respective skeletal sequence of phonemes comprising a corresponding grouping of phonemes;

fourth means for determining, for each distinctive syllabit group, the phonemes occurring therein so as to develop an input sequence of phonemes for each syllabit group;

fifth means for comparing the input sequence of phonemes for each syllabit group with the respective skeletal sequences of phonemes of each of the corresponding possible words so as to determine, with reference to the phonemes in each grouping of phonemes, which possible word has a skeletal sequence of phonemes which contains, in a given sequence, phonemes all of which are found, in said given sequence, in the input sequence of phonemes, thereby identifying each of said words of said audio input; and

sixth means for providing said identified words of said audio input in said visible form.

6. The system of claim 5, wherein said first means comprises at least one transducer for receiving and processing said audio input, and issuing identification outputs, vowel identification circuitry for receiving and processing said audio input to determine which of said various sounds comprise vowels, and issuing corresponding vowel identification outputs, an input synchronizer for receiving and synchronizing said identification outputs of said at least one transducer and said vowel identification outputs of said vowel identification circuitry, and providing phoneme identification outputs, and a processor responsive to said phoneme identification outputs for breaking down the spoken sequence of sounds into syllabits.

7. The system of claim 6, wherein said vowel identification circuitry comprises a vowel scanner and a vowel transducer.

8. The system of claim 7, wherein said vowel scanner comprises a first formant peak detector for receiving and processing said audio input to detect a first formant peak in a first predetermined frequency range of said audio input, and a second formant peak detector for receiving and processing said audio input to detect a second formant peak in a second predetermined frequency range.

9. The system of claim 8, wherein said vowel scanner comprises at least one envelope detection and comparison network, each said at least one envelope detection and comparison network comprising a pair of envelope detectors for receiving and processing said audio input to determine amounts of energy stored in envelopes of said audio input in respective frequency ranges, and issuing corresponding detection outputs, and a comparator for comparing said corresponding detection outputs so as to selectively issue a corresponding vowel scanner output in correspondence thereto.

10. The system of claim 7, wherein said vowel transducer comprises at least one envelope detection network, each said at least one envelope detection network comprising a pair of envelope detectors for receiving and processing said audio input to determine respective quantities of energy stored in said audio input within respective predetermined frequency ranges, and issuing respective detector outputs, and a comparator for comparing said respective detector outputs so as to selectively issue corresponding comparison outputs.

11. The system of claim 10, wherein said vowel scanner issues vowel scanner outputs, said vowel transducer comprising gate means responsive to said comparison outputs and said vowel scanner outputs for issuing said vowel identification outputs.

12. The system of claim 6, further comprising a diphthong transducer connected to said vowel identification circuitry and receiving said vowel identification outputs therefrom, said diphthong transducer comprising an envelope detector for receiving and processing said audio input to determine the quantity of energy stored in an envelope in said audio input, and issuing a detector output, a comparator responsive to said detector output from said envelope detector and to said audio input for issuing a comparison output, and gate means responsive to said comparison output and to said vowel identification outputs of said vowel identification circuitry for selectively issuing diphthong identification outputs.

13. The system of claim 12, said diphthong transducer further comprising a ratio memory responsive to said comparison output for issuing a memory output corresponding to at least one predetermined ratio, and a further comparator responsive to said comparison output and to said memory output for issuing a further comparison output, said gate means being responsive to said further comparison output for selectively issuing said diphthong identification outputs.

14. The system of claim 6, wherein said input synchronizer comprises at least one sampler for receiving at least one of said phoneme identification outputs and said vowel identification outputs to provide sampler outputs, and at least one digital encoder for receiving and encoding said sampler outputs to provide encoder outputs corresponding to said at least one of said phoneme identification outputs and said vowel identification outputs.
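
Claims 8 to 10 describe analog circuits: formant peak detectors working in two predetermined frequency ranges, and pairs of envelope detectors whose outputs feed a comparator. A digital approximation makes the idea concrete. The sketch below substitutes the Goertzel algorithm for the analog detectors, which is our choice and not something this record describes. The 8 kHz sample rate, the two search ranges, the 50 Hz probe spacing and the two-tone test signal (700 Hz and 1200 Hz standing in for a first and second formant) are all assumptions.

```python
import math
from typing import List, Sequence, Tuple

SAMPLE_RATE = 8000          # Hz, assumed
F1_RANGE = (250.0, 900.0)   # assumed first-formant search range, Hz
F2_RANGE = (800.0, 2500.0)  # assumed second-formant search range, Hz
STEP = 50.0                 # assumed spacing of the probe frequencies, Hz

Band = Tuple[float, float]


def goertzel_power(samples: Sequence[float], freq: float) -> float:
    """Squared spectral magnitude of `samples` at `freq`, via the Goertzel recurrence."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / SAMPLE_RATE)
    s1 = s2 = 0.0
    for x in samples:
        s1, s2 = x + coeff * s1 - s2, s1
    return s1 * s1 + s2 * s2 - coeff * s1 * s2


def probe_freqs(lo: float, hi: float) -> List[float]:
    """Evenly spaced probe frequencies from lo to hi inclusive."""
    n = int((hi - lo) // STEP) + 1
    return [lo + k * STEP for k in range(n)]


def formant_peak(samples: Sequence[float], band: Band) -> float:
    """Stand-in for a formant peak detector: the probe in `band` with the most power."""
    return max(probe_freqs(*band), key=lambda f: goertzel_power(samples, f))


def band_energy(samples: Sequence[float], band: Band) -> float:
    """Stand-in for an envelope detector: total probe power across `band`."""
    return sum(goertzel_power(samples, f) for f in probe_freqs(*band))


def band_a_dominates(samples: Sequence[float], band_a: Band, band_b: Band) -> bool:
    """Stand-in for the comparator: True when band_a carries more energy than band_b."""
    return band_energy(samples, band_a) > band_energy(samples, band_b)


# Synthetic 50 ms test signal: two tones standing in for F1 and F2.
samples = [math.sin(2 * math.pi * 700 * n / SAMPLE_RATE)
           + 0.5 * math.sin(2 * math.pi * 1200 * n / SAMPLE_RATE) for n in range(400)]

print(formant_peak(samples, F1_RANGE), formant_peak(samples, F2_RANGE))  # 700.0 1200.0
print(band_a_dominates(samples, F1_RANGE, F2_RANGE))                     # True
```

Both test tones complete a whole number of cycles in the 400-sample window (20 Hz bin spacing), so a probe that lands exactly on one tone picks up that tone's full power and nothing from the other. Real speech would need windowing and a finer formant estimate.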

15. In a system for converting an audio input, comprising words made up of various sounds in a spoken sequence, into a visible form, comprising a sequence of corresponding phonemes, said system comprising:

at least one transducer for receiving and processing said audio input to derive at least one phoneme identification output; and

vowel identification means for receiving and processing said audio input to provide vowel identification outputs;

the improvement wherein said vowel identification means comprises a vowel scanner for scanning said audio input to obtain preliminary vowel identification outputs, and a vowel transducer for receiving and processing said audio input so as to provide an enabling signal selecting one of said preliminary vowel identification outputs, whereby to provide said vowel identification outputs of said vowel identification means.

16. In the system of claim 15, wherein said vowel scanner comprises formant peak detector means for detecting a pair of formant peaks in a respective pair of frequency ranges.

17. In the system of claim 15, further comprising diphthong transducer means connected to said vowel identification means for receiving said vowel identification outputs, and for receiving and processing said audio input in accordance with said vowel identification outputs, so as to provide final vowel identification outputs identifying specific vowels in said audio input, and diphthong identification outputs identifying specific diphthongs in said audio input.

18. In the system of claim 17, further comprising input synchronizer means connected to said at least one transducer and to said diphthong transducer means for receiving and synchronizing said phoneme identification outputs, said final vowel identification outputs and said diphthong identification outputs, so as to provide said sequence of corresponding phonemes comprising said visible form.

19. In the system of claim 18, further comprising processor means connected to said input synchronizer means and responsive to said sequence of corresponding phonemes for breaking down the sequence of corresponding phonemes into syllabits, each syllabit comprising a group of classes of sounds.

20. In the system of claim 19, wherein said processing means groups the syllabits into syllabit groups, each syllabit group defining corresponding possible words, and wherein said processor means provides, for each of said possible words corresponding to each syllabit group, a respective skeletal sequence of phonemes comprising a corresponding grouping of phonemes.

21. In the system of claim 20, wherein said processor means compares the input sequence of phonemes for each syllabit group with the respective skeletal sequences of phonemes of each of the corresponding possible words so as to determine, with reference to the phonemes in each grouping of phonemes, which possible word has a skeletal sequence of phonemes which contains, in a given sequence, phonemes all of which are found, in said given sequence, in the input sequence of phonemes, thereby identifying each of said words of said audio input, whereby to provide said identified words of said audio input in said visible form.

22. In the system of claim 15, further comprising input synchronizer means connected to said at least one transducer and to said vowel identification means for receiving and synchronizing said at least one phoneme identification output and said vowel identification outputs, respectively, so as to provide said sequence of corresponding phonemes comprising said visible form.

23. In the system of claim 22, further comprising processor means connected to said input synchronizer means for receiving and processing said sequence of corresponding phonemes so as to break down the spoken sequence of sounds into syllabits, each syllabit comprising a group of classes of sounds.

24. In the system of claim 23, wherein said processing means groups the syllabits into syllabit groups, each syllabit group defining corresponding possible words, and wherein said processor means provides, for each of said possible words corresponding to each syllabit group, a skeletal sequence of phonemes comprising a corresponding grouping of phonemes.

25. In the system of claim 24, wherein said processor means compares the input sequence of phonemes for each syllabit group with the respective skeletal sequences of phonemes of each of the corresponding possible words so as to determine, with reference to the phonemes in each grouping of phonemes, which possible word has a skeletal sequence of phonemes which contains, in a given sequence, phonemes all of which are found, in said given sequence, in the input sequence of phonemes, thereby identifying each of said words of said audio input, whereby to provide said identified words of said audio input in said visible form.

28. In the system of claim 16, wherein said vowel scanner further comprises envelope detector means for receiving and processing said audio input to determine amounts of energy stored in envelopes of said audio input in respective frequency ranges, and issuing corresponding detection outputs, and comparison means for comparing said corresponding detection outputs so as to selectively issue a corresponding vowel scanner output in correspondence thereto.
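
Claim 15's improvement (a scanner that proposes preliminary vowel identifications and a transducer whose enabling signal selects one of them) and the input synchronizer of claims 18 and 22 can be modeled as simple data flow. This is a hypothetical software analogue of the claimed hardware: the vowel labels, the enabling-signal names in `ENABLE_BAND`, the time-stamped `(time, label)` events and the integer codes in `ENCODING` are invented for illustration. Merging the time-sorted streams with the standard library's `heapq.merge` plays the role of the synchronizer's sampling and encoding.

```python
import heapq
from typing import Dict, Iterable, List, Optional, Set, Tuple

# Hypothetical: which transducer band-comparison output enables each preliminary vowel.
ENABLE_BAND: Dict[str, str] = {"i": "high_f2", "a": "high_f1", "u": "low_f1_f2"}

# Hypothetical digital codes standing in for the synchronizer's encoder outputs.
ENCODING: Dict[str, int] = {"k": 1, "t": 2, "s": 3, "a": 10, "i": 11, "u": 12}

Event = Tuple[float, str]  # (time in seconds, phoneme label)


def select_vowel(preliminary: List[str], asserted: Set[str]) -> Optional[str]:
    """Gate: pass the first preliminary vowel whose enabling signal is asserted."""
    for vowel in preliminary:
        if ENABLE_BAND.get(vowel) in asserted:
            return vowel
    return None


def synchronize(consonants: Iterable[Event], vowels: Iterable[Event]) -> List[int]:
    """Merge two time-sorted identifier streams into one encoded phoneme sequence."""
    return [ENCODING[label] for _, label in heapq.merge(consonants, vowels)]


vowel = select_vowel(["i", "a"], {"high_f1"})  # scanner proposes i or a; transducer enables a
codes = synchronize([(0.00, "k"), (0.20, "t")], [(0.10, vowel)])
print(vowel, codes)  # a [1, 10, 2]
```

`heapq.merge` assumes each input stream is already sorted by time, which fits transducers that report identifications as they occur.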

26. In a system for converting an audio input, comprising words made up of various sounds in a spoken sequence, into a visible form, comprising a sequence of corresponding phonemes, said system comprising:

phoneme identifying means responsive to said audio input for identifying said sequence of corresponding phonemes, and processor means for receiving and processing said sequence of corresponding phonemes to provide said identified words of said audio input in said visible form;

the improvement wherein said processor means breaks down the spoken sequence of sounds into syllabits, each syllabit comprising a group of classes of sounds, and wherein said processor means groups the syllabits into syllabit groups, each syllabit group defining corresponding possible words, and provides, for each of said possible words corresponding to each syllabit group, a respective skeletal sequence of phonemes comprising a corresponding grouping of phonemes.

27. In the system of claim 26, wherein said processor means compares the input sequence of phonemes for each syllabit group with the respective skeletal sequences of phonemes of each of the corresponding possible words so as to determine, with reference to the phonemes in each grouping of phonemes, which possible word has a skeletal sequence of phonemes which contains, in a given sequence, phonemes all of which are found, in said given sequence, in the input sequence of phonemes, thereby identifying each of said words of said audio input.


Current Assignee: Griggs Talkwriter Corporation
Original Assignee: David T. Griggs
Inventors: Griggs, David T.
Primary Examiner(s): Kemeny, Emanuel S.
Application Number: US06/292,717
Time in Patent Office: 936 Days
Field of Search: 179/1 SA, 179/1 SB, 179/1 SD, 179/1 SE, 364/513, 340/146.3 WD
US Class Current: 704/254
CPC Class Codes: G10L 25/87 Detection of discrete point...
