Method and arrangement for speech to text conversion
Abstract
The present invention relates to a method and arrangement for speech to text conversion. A string of phonemes is identified from a given input speech. The identified phonemes are joined together to form words and phrases/sentences. The words are checked lexically, and any words not found in the language concerned are excluded. The phrases/sentences are checked syntactically, and any word combinations that do not occur in the language concerned are excluded. This process yields a model of the speech. The intonation patterns of the model and of the input speech are then determined and compared, and any words and phrases/sentences of the model whose intonation patterns do not correspond with those of the input speech are excluded from the model. A representation of the words and/or word combinations that best correspond with the input speech is then provided, preferably in the form of a printout of the related text.
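The lexical and syntactic exclusion steps described in the abstract can be illustrated with a minimal sketch. It assumes that the phoneme string has already been turned into a few alternative word hypotheses per position. The toy lexicon `LEXICON`, the allowed word-pair table `ALLOWED_BIGRAMS` (a simple stand-in for a real syntax check) and the function names are all hypothetical and do not come from the patent.

```python
from itertools import product
from typing import List, Sequence, Tuple

# Hypothetical toy lexicon for "the language concerned".
LEXICON = {"the", "cat", "sat", "on", "mat", "cap"}

# Hypothetical stand-in for a syntax check: word pairs allowed to be adjacent.
ALLOWED_BIGRAMS = {
    ("the", "cat"), ("cat", "sat"), ("sat", "on"),
    ("on", "the"), ("the", "mat"), ("the", "cap"),
}


def lexically_valid(words: Sequence[str]) -> List[str]:
    """Lexical check: drop word hypotheses that are not in the lexicon."""
    return [w for w in words if w in LEXICON]


def candidate_sentences(word_slots: Sequence[Sequence[str]]) -> List[Tuple[str, ...]]:
    """Build the speech model: every lexically valid word combination whose
    adjacent word pairs all pass the (toy) syntactic check."""
    filtered = [lexically_valid(slot) for slot in word_slots]
    return [
        combo
        for combo in product(*filtered)
        if all(pair in ALLOWED_BIGRAMS for pair in zip(combo, combo[1:]))
    ]


if __name__ == "__main__":
    slots = [["the", "thee"], ["cat", "cap", "kat"], ["sat"]]
    print(candidate_sentences(slots))
```

In this example, "thee" and "kat" are removed by the lexical check. The combination "the cap sat" is then removed because the pair ("cap", "sat") is not in the table, so only "the cat sat" remains in the model. The surviving combinations would then go on to the intonation comparison.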
22 Claims
1. A method for speech to text conversion including the steps of:
identifying phonemes from a segment of input speech to be converted into text;
interpreting the phonemes as possible word combinations to establish a speech model of the segment of input speech;
determining a first intonation pattern of a first fundamental tone of the speech model including first maximum and minimum values of the first fundamental tone, and respective positions of the first maximum and minimum values;
determining a second fundamental tone of the input speech;
determining a second intonation pattern of the second fundamental tone of the input speech including second maximum and minimum values of the second fundamental tone, and respective positions of the second maximum and minimum values;
comparing the second and first intonation patterns of the input speech and the speech model, respectively, to identify the word combinations in the speech model having intonation patterns which best correspond with the second intonation pattern of the word combinations of the input speech; and
providing a representation of the at least one of corresponding words and word combinations which best correspond with the input speech.
(Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 19, 20, 21, 22.)
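The intonation-pattern steps of claim 1 reduce a fundamental-tone (F0) contour to its maximum and minimum values and their positions. A minimal sketch follows. It assumes the contour is given as one F0 value per fixed-length analysis frame, with `None` marking unvoiced frames. The frame step, the names `IntonationPattern` and `intonation_pattern`, and the choice to report positions in seconds are all hypothetical. The claim does not say how F0 is extracted or how positions are measured. The same function would be applied both to the modelled fundamental tone and to the one measured from the input speech.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class IntonationPattern:
    max_value: float  # highest fundamental-tone value (Hz)
    max_pos: float    # position of the maximum (seconds from segment start)
    min_value: float  # lowest fundamental-tone value (Hz)
    min_pos: float    # position of the minimum (seconds from segment start)


def intonation_pattern(
    f0: Sequence[Optional[float]], frame_step: float = 0.01
) -> IntonationPattern:
    """Return the max/min of an F0 contour and where they occur.

    f0 holds one value per analysis frame (hypothetical 10 ms frames by
    default); None marks unvoiced frames, which carry no fundamental tone.
    """
    voiced = [(i * frame_step, v) for i, v in enumerate(f0) if v is not None]
    if not voiced:
        raise ValueError("contour has no voiced frames")
    # On ties, max()/min() keep the earliest frame.
    t_max, v_max = max(voiced, key=lambda p: p[1])
    t_min, v_min = min(voiced, key=lambda p: p[1])
    return IntonationPattern(v_max, t_max, v_min, t_min)


if __name__ == "__main__":
    contour = [None, 110.0, 140.0, 180.0, 150.0, None, 95.0]
    print(intonation_pattern(contour))
```

Unvoiced frames are skipped rather than treated as 0 Hz. Otherwise every pause would be reported as the minimum of the fundamental tone.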
10. An arrangement for speech to text conversion comprising:
speech recognition means for identifying phonemes from a segment of input speech to be converted into text;
word-interpretation means for interpreting the phonemes as possible word combinations to establish a speech model of the segment of input speech;
first analysing means for determining a first intonation pattern of a first fundamental tone of the speech model including first maximum and minimum values of the first fundamental tone, and respective positions of the first maximum and minimum values;
extraction means for extracting a second fundamental tone, and respective positions of second maximum and minimum values;
comparison means for comparing the second and first intonation patterns of the input speech and the speech model, respectively, to identify the word combinations in the speech model having intonation patterns which best correspond with the second intonation pattern of word combinations of the input speech.
(Dependent claims: 11, 12, 13, 14, 15, 16, 17, 18.)
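The "comparison means" can be sketched as scoring each candidate word combination by how far its modelled intonation pattern lies from the pattern measured on the input speech, and then keeping the closest candidate or candidates. Each pattern is assumed here to be a tuple `(max_value, max_pos, min_value, min_pos)` in Hz and seconds. The weighted absolute-difference distance, its weights, and the function names are hypothetical choices for illustration. The claim does not specify a distance measure.

```python
from typing import Dict, List, Tuple

# (max_value Hz, max_pos s, min_value Hz, min_pos s)
Pattern = Tuple[float, float, float, float]


def pattern_distance(
    a: Pattern, b: Pattern, hz_weight: float = 1.0, time_weight: float = 100.0
) -> float:
    """Hypothetical weighted L1 distance between two intonation patterns.

    time_weight converts position differences (s) to a scale comparable
    with frequency differences (Hz); both weights are illustrative only.
    """
    a_max, a_tmax, a_min, a_tmin = a
    b_max, b_tmax, b_min, b_tmin = b
    return (hz_weight * (abs(a_max - b_max) + abs(a_min - b_min))
            + time_weight * (abs(a_tmax - b_tmax) + abs(a_tmin - b_tmin)))


def best_candidates(model_patterns: Dict[str, Pattern], input_pattern: Pattern) -> List[str]:
    """Return the word combination(s) whose modelled pattern best
    corresponds with the input-speech pattern (ties are all kept)."""
    scores = {cand: pattern_distance(p, input_pattern) for cand, p in model_patterns.items()}
    best = min(scores.values())
    return [cand for cand, score in scores.items() if score == best]


if __name__ == "__main__":
    measured = (180.0, 0.30, 95.0, 0.60)
    models = {
        "reading A": (200.0, 0.55, 100.0, 0.10),
        "reading B": (175.0, 0.28, 90.0, 0.62),
    }
    print(best_candidates(models, measured))
```

Here "reading B" is chosen, because its peak and trough fall close to those of the measured contour in both height and position. Candidates whose score stays far from the best could instead be excluded from the speech model, as described in the abstract.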
Specification