Method and arrangement for speech to text conversion

US 5,752,227 A
Filed: 05/01/1995
Issued: 05/12/1998
Est. Priority Date: 05/10/1994
Status: Expired due to Term

Abstract

The present invention relates to a method and arrangement for speech to text conversion. A string of phonemes is identified from a given input speech. The different phonemes are identified and are joined together to form words and phrases/sentences. The words are checked lexically, any words which are not found in the language concerned being excluded. The phrases/sentences are checked syntactically, any word combinations which do not occur in the language concerned being excluded. A model of the speech is obtained by the process. The intonation patterns of the model and of the input speech are determined and compared; the words and phrases/sentences of the model whose intonation patterns do not correspond with those of the input speech are excluded from the model. A representation of the words, and/or word combinations, which best corresponds with the input speech is then provided, preferably in the form of a printout of the related text.
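The lexical and syntactic exclusion steps described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy `LEXICON`, the `ALLOWED_BIGRAMS` table (a crude stand-in for a real syntactic check), and all function names are hypothetical and do not come from the patent.

```python
from itertools import product

# Hypothetical toy data; none of these entries come from the patent.
LEXICON = {"recognize", "wreck", "a", "nice", "speech", "beach"}
ALLOWED_BIGRAMS = {
    ("recognize", "speech"),
    ("wreck", "a"),
    ("a", "nice"),
    ("nice", "beach"),
}

def expand_candidates(slots):
    """Join per-position word hypotheses into every possible word sequence."""
    return [list(seq) for seq in product(*slots)]

def lexically_valid(sequence):
    """A sequence passes only if every word is found in the lexicon."""
    return all(word in LEXICON for word in sequence)

def syntactically_valid(sequence):
    """Assumed stand-in for syntax: every adjacent word pair must be allowed."""
    return all(pair in ALLOWED_BIGRAMS for pair in zip(sequence, sequence[1:]))

def build_speech_model(slots):
    """Keep only the word combinations that survive both checks."""
    candidates = expand_candidates(slots)
    candidates = [c for c in candidates if lexically_valid(c)]
    return [c for c in candidates if syntactically_valid(c)]
```

With the hypothetical input `[["wreck", "rek"], ["a"], ["nice", "nyce"], ["beach", "speech"]]`, the non-words are dropped by the lexical check and "nice speech" is dropped by the bigram check, leaving a single candidate for the later intonation comparison.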

22 Claims

1. A method for speech to text conversion including the steps of:
- identifying phonemes from a segment of input speech to be converted into text;
  
  interpreting the phonemes as possible word combinations to establish a speech model of the segment of input speech;
  
  determining a first intonation pattern of a first fundamental tone of the speech model including first maximum and minimum values of the first fundamental tone, and respective positions of the first maximum and minimum values;
  
  determining a second fundamental tone of the input speech;
  
  determining a second intonation pattern of the second fundamental tone of the input speech including second maximum and minimum values of the second fundamental tone, and respective positions of the second maximum and minimum values;
  
  comparing the second and first intonation patterns of the input speech and the speech model, respectively, to identify the word combinations in the speech model having intonation patterns which best correspond with the second intonation pattern of the word combinations of the input speech; and
  
  providing a representation of the at least one of corresponding words and word combinations which best correspond with the input speech.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 19, 20, 21, 22):
  - 2. A method as claimed in claim 1, wherein the representation of the word combinations, from which the speech model is formed, is in print out form.
  - 3. A method as claimed in claim 1, wherein the step of identifying phonemes comprises the step of combining the phonemes into allophone strings, and wherein the step of interpreting the phonemes comprises the step of establishing the speech model using the allophone strings which include at least one of different sounds, sound combinations, and unpronounced parts of the word combinations.
  - 4. A method as claimed in claim 1, wherein the step of identifying phonemes comprises the step of combining the phonemes into allophone strings, and wherein the step of interpreting the phonemes comprises the steps of:
    - establishing the speech model from the allophone strings,
    - checking the words in the speech model lexically,
    - checking phrases in the speech model syntactically, and
    - excluding words and phrases which are not linguistically possible from the speech model.
  - 5. A method as claimed in claim 4, wherein the step of interpreting further comprises the step of checking spelling and transcription of the word combinations in the speech model.
  - 6. A method as claimed in claim 1, further comprising the steps of:
    - distinguishing the meaning of words which sound alike but have different stresses, and
    - identifying phrases whose meanings change in dependence upon sentence stress.
  - 7. A method as claimed in claim 1, wherein the step of identifying phonemes comprises the step of combining the phonemes into allophone strings, and wherein the step of interpreting the phonemes comprises the steps of:
    - establishing the speech model from the allophone strings,
    - checking the words in the speech model lexically,
    - checking spelling and transcription of the words in the speech model,
    - checking phrases in the speech model syntactically,
    - excluding words and phrases which are not linguistically possible from the speech model,
    - distinguishing meanings of words which sound alike but have different stresses, and
    - identifying phrases whose meanings change in dependence upon the sentence stress.
  - 8. A method as claimed in claim 1, wherein the step of identifying phonemes comprises the steps of:
    - identifying phonemes occurring in different languages without training, and
    - excluding phonemes which do not exist in a particular language.
  - 9. A method as claimed in claim 1, wherein the step of identifying the phonemes comprises identifying the phonemes from the input speech using a Hidden Markov model.
  - 19. A system responsive to spoken words including an arrangement as claimed in claim 10, or operating in accordance with the method as claimed in claim 1.
  - 20. A system as claimed in claim 19, wherein the system includes a voice-responsive word processing unit for the production of textual information from spoken words.
  - 21. A system as claimed in claim 19, further comprising a voice-responsive telex apparatus.
  - 22. A system as claimed in claim 19, further comprising means for transmitting words via a telecommunications device.
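The intonation steps of claim 1 (maximum and minimum values of the fundamental tone, their positions, and the comparison between speech model and input speech) can be sketched as follows. This is a minimal illustration under stated assumptions: the fundamental tone is taken to be a list of F0 samples in Hz, and the `pattern_distance` scoring rule and its `mismatch_penalty` are hypothetical choices, since the claims do not say how correspondence is measured.

```python
def intonation_pattern(f0):
    """Return (position, value, kind) for each local maximum/minimum of an F0 contour."""
    extrema = []
    for i in range(1, len(f0) - 1):
        if f0[i] > f0[i - 1] and f0[i] > f0[i + 1]:
            extrema.append((i, f0[i], "max"))
        elif f0[i] < f0[i - 1] and f0[i] < f0[i + 1]:
            extrema.append((i, f0[i], "min"))
    return extrema

def pattern_distance(a, b, mismatch_penalty=1000.0):
    """Assumed mismatch score between two patterns; lower means a closer match."""
    if len(a) != len(b):
        return mismatch_penalty  # different numbers of extrema
    total = 0.0
    for (pos_a, val_a, kind_a), (pos_b, val_b, kind_b) in zip(a, b):
        if kind_a != kind_b:
            total += mismatch_penalty
        else:
            # Crude sum of position offset (samples) and value offset (Hz).
            total += abs(pos_a - pos_b) + abs(val_a - val_b)
    return total

def best_match(speech_f0, model_f0_by_candidate):
    """Pick the candidate whose model contour best corresponds to the input speech."""
    target = intonation_pattern(speech_f0)
    return min(
        model_f0_by_candidate,
        key=lambda name: pattern_distance(
            target, intonation_pattern(model_f0_by_candidate[name])
        ),
    )
```

A caller would pass the measured contour of the input speech together with one predicted contour per surviving word combination; the candidate with the lowest score is the one reported as text.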

10. An arrangement for speech to text conversion comprising:
- speech recognition means for identifying phonemes from a segment of input speech to be converted into text;
  
  word-interpretation means for interpreting the phonemes as possible word combinations to establish a speech model of the segment of input speech;
  
  first analysing means for determining a first intonation pattern of a first fundamental tone of the speech model including first maximum and minimum values of the first fundamental tone, and respective positions of the first maximum and minimum values;
  
  extraction means for extracting a second fundamental tone, and respective positions of second maximum and minimum values;
  
  comparison means for comparing the second and first intonation patterns of the input speech and the speech model, respectively, to identify the word combinations in the speech model having intonation patterns which best correspond with the second intonation pattern of word combinations of the input speech.
- Dependent claims (11, 12, 13, 14, 15, 16, 17, 18):
  - 11. An arrangement as claimed in claim 10, wherein the text selection means comprises a printer for providing a print out of the word combinations which best correspond with the input speech.
  - 12. An arrangement as claimed in claim 10, wherein the speech recognition means comprises means for combining the phonemes into allophone strings, and wherein the word-interpretation means comprises means for establishing the speech model using at least one of different sounds, sound combinations, and unpronounced parts of the word combinations.
  - 13. An arrangement as claimed in claim 10, further comprising:
    - checking means for lexically checking words in the speech model, for syntactically checking words in the speech model and for syntactically checking phrases in the speech model, and
    - excluding means for excluding from the speech model words and phrases which are not linguistically possible.
  - 14. An arrangement as claimed in claim 13, wherein the checking means further comprises for checking spelling and transcription of words in the speech model.
  - 15. An arrangement as claimed in claim 10, wherein the comparison means comprises:
    - means adapted to distinguish meanings of words which sound alike but have different stresses, and
    - means for distinguishing phrases, whose meanings change in dependence upon sentence stress.
  - 16. An arrangement as claimed in claim 10, further comprising checking means for lexically checking words in the speech model by checking spelling and transcription of the words in the speech model, and for syntactically checking phrases in the speech model, means for excluding from the speech model words and phrases which are not linguistically possible, and wherein the comparison means comprises:
    - means to distinguish meanings of words which sound alike but have different stresses, and
    - means to distinguish phrases whose meanings change in dependence upon sentence stress.
  - 17. An arrangement as claimed in claim 10, wherein the speech recognition means comprises means adapted to identify phonemes occurring in different languages, without training, to exclude phonemes which do not exist in a particular language.
  - 18. An arrangement as claimed in claim 10, wherein the speech recognition means comprises means for identifying the phonemes from the input speech using a Hidden Markov model.
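Claims 9 and 18 recite identifying phonemes with a Hidden Markov model without giving details. As a hedged illustration only, the sketch below uses standard Viterbi decoding over a toy model; the function name, the probability tables, and the idea of discrete acoustic labels as observations are all assumptions, not the patent's implementation.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable hidden-state (phoneme) sequence for the observations."""
    # best[t][s] = (probability of the best path ending in s at time t, previous state)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            # Choose the predecessor that maximizes the path probability into s.
            prob, prev = max(
                (best[-1][p][0] * trans_p[p][s] * emit_p[s][obs], p) for p in states
            )
            column[s] = (prob, prev)
        best.append(column)
    # Backtrack from the most probable final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    return path[::-1]
```

In this sketch the hidden states stand for phonemes and each observation is a coarse acoustic label per frame; a practical recognizer would use continuous acoustic features and log probabilities to avoid numerical underflow.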

Current Assignee: Intellectual Ventures I LLC (Intellectual Ventures LLC)
Original Assignee: Telia AB (Government of Norway)
Inventors: Lyberg, Bertil
Primary Examiner(s): MacDonald, Allen R.
Assistant Examiner(s): Opsasnick, Michael N.
Application Number: US08/432,062
Time in Patent Office: 1,107 Days
Field of Search: 395/2.44, 395/2.59, 395/2.64-2.66, 364/419.03
US Class Current: 704/235
CPC Class Codes:
G10L 15/1807   using prosody or stress
G10L 15/187   Phonemic context, e.g. pron...
G10L 15/19   Grammatical context, e.g. d...
G10L 2015/025   Phonemes, fenemes or fenone...
