SPEECH RECOGNITION SYSTEM AND PROGRAM THEREFOR
Abstract
An unknown word is additionally registered in a speech recognition dictionary by utilizing a correction result, and a new pronunciation of the word that has been registered in a speech recognition dictionary is additionally registered in the speech recognition dictionary, thereby increasing the accuracy of speech recognition. The start time and finish time of each phoneme unit in speech data corresponding to each phoneme included in a phoneme sequence acquired by a phoneme sequence converting section 13 are added to the phoneme sequence. A phoneme sequence extracting section 15 extracts from the phoneme sequence a phoneme sequence portion composed of phonemes existing in a segment corresponding to the period from the start time to the finish time of the word segment of the word corrected by a word correcting section 9 and the extracted phoneme sequence portion is determined as the pronunciation of the corrected word. An additional registration section 17 combines the corrected word with the pronunciation determined by a pronunciation determining section 16 and additionally registers the combination as new word pronunciation data in the speech recognition dictionary 5 if it is determined that a word obtained after correction has not been registered in the speech recognition dictionary 5. The additional registration section 17 additionally registers the pronunciation determined by the pronunciation determining section 16 as another pronunciation of the corrected word if it is determined that the corrected word has been registered.
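The extraction step described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `TimedPhoneme` and `extract_pronunciation`, the use of integer milliseconds, and the rule that a phoneme counts as "existing in the segment" only when its whole unit lies inside the word segment are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TimedPhoneme:
    """One phoneme unit with the start and finish times added by the converter.

    Hypothetical type; times are integer milliseconds (an assumption).
    """
    symbol: str
    start_ms: int
    finish_ms: int


def extract_pronunciation(phonemes: List[TimedPhoneme],
                          word_start_ms: int,
                          word_finish_ms: int) -> List[str]:
    """Return the symbols of phonemes lying inside the corrected word's segment.

    Assumption: a phoneme belongs to the segment only if it is fully
    contained in [word_start_ms, word_finish_ms]. The source does not
    specify how partially overlapping phonemes are treated.
    """
    return [
        p.symbol
        for p in phonemes
        if p.start_ms >= word_start_ms and p.finish_ms <= word_finish_ms
    ]


# Made-up phoneme sequence for a short utterance.
sequence = [
    TimedPhoneme("k", 0, 80),
    TimedPhoneme("o", 80, 200),
    TimedPhoneme("N", 200, 300),
    TimedPhoneme("p", 300, 360),
    TimedPhoneme("a", 360, 480),
]

# The corrected word is assumed to span 80 ms to 300 ms.
portion = extract_pronunciation(sequence, 80, 300)
```

In this sketch `portion` plays the role of the phoneme sequence portion that the pronunciation determining section would adopt as the pronunciation of the corrected word.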
13 Claims
1. A speech recognition system comprising:
a speech recognition section that converts speech data into text data by using a speech recognition dictionary containing a large volume of word pronunciation data each constituted by a combination of a word and one or more corresponding pronunciations, each pronunciation including one or more phonemes, and that has a function of adding to the text data a start time and a finish time of a word segment in the speech data corresponding to each word included in text data;
a word correcting section that presents competitive candidates for each word in the text data acquired from the speech recognition section, allows each word to be corrected by selecting a correct word from among the competitive candidates for correction if the correct word is included in the competitive candidates, or by manually inputting a correct word if no correct word is included in the competitive candidates;
a phoneme sequence converting section that recognizes the speech data in units of phoneme, converts the recognized speech data into a phoneme sequence composed of a plurality of phonemes, and that has a function of adding to the phoneme sequence a start time and a finish time of each phoneme unit in the speech data corresponding to each phoneme included in the phoneme sequence;
a phoneme sequence extracting section that extracts from the phoneme sequence a phoneme sequence portion composed of one or more phonemes existing in a segment corresponding to a period of the start time and finish time of the word segment of a word corrected by the word correcting section;
a pronunciation determining section that determines the phoneme sequence portion as the pronunciation of the word corrected by the word correcting section; and
an additional registration section that combines the corrected word with the pronunciation determined by the pronunciation determining section as new word pronunciation data and additionally registers the new word pronunciation data in the speech recognition dictionary if it is determined that the corrected word has not been registered in the speech recognition dictionary, or additionally registers the pronunciation determined by the pronunciation determining section in the speech recognition dictionary as another pronunciation of the corrected word if it is determined that the corrected word is a registered word that has already been registered in the speech recognition dictionary.
Dependent claims: 2, 3, 4, 5, 10, 11
6. A program for speech recognition system that is stored in a computer-readable recording medium, the program causing a computer to function as:
a speech recognition section that converts speech data into text data by using a speech recognition dictionary containing a large volume of word pronunciation data each constituted by a combination of a word and one or more corresponding pronunciations, each pronunciation including one or more phonemes, and that has a function of adding to the text data a start time and a finish time of a word segment in the speech data corresponding to each word included in text data;
a word correcting section that presents competitive candidates for each word in the text data acquired from the speech recognition section, allows each word to be corrected by selecting a correct word from among the competitive candidates for correction if the correct word is included in the competitive candidates, or by manually inputting a correct word if no correct word is included in the competitive candidates;
a phoneme sequence converting section that recognizes the speech data in units of phoneme, converts the recognized speech data into a phoneme sequence composed of a plurality of phonemes, and that has a function of adding to the phoneme sequence a start time and a finish time of each phoneme unit in the speech data corresponding to each phoneme included in the phoneme sequence;
a phoneme sequence extracting section that extracts from the phoneme sequence a phoneme sequence portion composed of one or more phonemes existing in a segment corresponding to a period of the start time and finish time of the word segment of a word corrected by the word correcting section;
a pronunciation determining section that determines the phoneme sequence portion as the pronunciation of the word corrected by the word correcting section; and
an additional registration section that combines the corrected word with the pronunciation determined by the pronunciation determining section as new word pronunciation data and additionally registers the new word pronunciation data in the speech recognition dictionary if it is determined that the corrected word has not been registered in the speech recognition dictionary, or additionally registers the pronunciation determined by the pronunciation determining section in the speech recognition dictionary as another pronunciation of the corrected word if it is determined that the corrected word is a registered word that has already been registered in the speech recognition dictionary.
Dependent claims: 7, 8, 9, 12, 13
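The two registration branches recited in claims 1 and 6 can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the claimed implementation: the dictionary is modeled as a plain mapping from a word to a list of pronunciations, the function name `register_pronunciation` and its return labels are hypothetical, and skipping a pronunciation that is already present is an added assumption the claims do not address.

```python
from typing import Dict, List, Sequence, Tuple

# A pronunciation is modeled as a tuple of phoneme symbols (an assumption).
Pronunciation = Tuple[str, ...]
SpeechDictionary = Dict[str, List[Pronunciation]]


def register_pronunciation(dictionary: SpeechDictionary,
                           corrected_word: str,
                           phonemes: Sequence[str]) -> str:
    """Register a corrected word and its determined pronunciation.

    Returns a hypothetical label describing which branch was taken.
    """
    pronunciation = tuple(phonemes)

    # Branch 1: the corrected word is not yet registered, so the word and
    # its pronunciation are added together as new word pronunciation data.
    if corrected_word not in dictionary:
        dictionary[corrected_word] = [pronunciation]
        return "new_word"

    # Branch 2: the word is already registered, so the pronunciation is
    # added as another pronunciation of that word.
    if pronunciation not in dictionary[corrected_word]:
        dictionary[corrected_word].append(pronunciation)
        return "new_pronunciation"

    # Assumption beyond the claims: an identical pronunciation is not
    # stored twice.
    return "unchanged"


# Made-up dictionary contents for illustration.
lexicon: SpeechDictionary = {"tomato": [("t", "o", "m", "a", "t", "o")]}

first = register_pronunciation(lexicon, "kiwi", ["k", "i", "w", "i"])
second = register_pronunciation(lexicon, "tomato", ["t", "o", "m", "e", "t", "o"])
third = register_pronunciation(lexicon, "kiwi", ["k", "i", "w", "i"])
```

Storing several pronunciations per word mirrors the claim language, in which each dictionary entry combines a word with one or more corresponding pronunciations.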
Specification