Method and apparatus for speech recognition
Abstract
A method and apparatus for speech recognition collate an input utterance with an acoustic model corresponding to a hypothesis, thereby obtaining a recognition score. The hypothesis is expressed as a connection of utterance segments, such as phonemes or syllables, and is developed along the length of the input utterance according to an inter-word connection rule. Within a word, all hypotheses whose score lies within a predetermined threshold of the maximum score are held until the word end, irrespective of how many hypotheses there are. At a word end, the hypotheses are narrowed to a predetermined number of top-ranking hypotheses in descending order of score.
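The two pruning rules in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function name `select_hypotheses`, the tuple layout, and the choice to take the maximum score over all hypotheses of the current frame are assumptions made for illustration.

```python
def select_hypotheses(hyps, threshold, top_n):
    """Prune hypotheses for one frame.

    hyps: list of (label, score, at_word_end) tuples; a higher score is better.
    threshold: score margin below the best score that is still kept (assumed name).
    top_n: number of word-end hypotheses allowed to survive (assumed name).
    """
    if not hyps:
        return []
    best = max(score for _, score, _ in hyps)
    # Threshold rule: keep everything close enough to the best score,
    # however many hypotheses that turns out to be.
    within_beam = [h for h in hyps if h[1] >= best - threshold]
    # Hypotheses still inside a word are all kept.
    inside = [h for h in within_beam if not h[2]]
    # Hypotheses at a word end are further narrowed to the top_n best.
    ends = sorted((h for h in within_beam if h[2]),
                  key=lambda h: h[1], reverse=True)[:top_n]
    return inside + ends


example = [
    ("a", -1.0, False), ("b", -2.0, False), ("c", -3.0, False),
    ("d", -1.5, True), ("e", -2.5, True), ("f", -9.0, True),
]
kept = select_hypotheses(example, threshold=4.0, top_n=1)
print([label for label, _, _ in kept])
```

In this toy input, "f" falls outside the threshold, all three within-word hypotheses survive, and only the best word-end hypothesis ("d") is kept.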
4 Claims
1. A method for speech recognition comprising:
a feature-amount extracting step for extracting a feature amount based on a processing frame of an input utterance;
a storing step for determining whether a current processing frame is within or at the end of at least one candidate word within a hypothesis, the at least one candidate word developed from the hypothesis, and storing one or more candidate words on the basis of a first hypothesis-storage determining criterion when within the word and on the basis of a second hypothesis-storage determining criterion when at the word end;
a developing step for developing the hypothesis, each candidate word within the hypothesis selected from words previously registered, by extending utterance segments to at least one processing frame following the current processing frame to express the candidate word when the candidate word is within the word and by joining a new candidate word to follow according to an inter-word connection rule when at the word end;
an operating step of computing a similarity measure between the feature amount extracted from the input utterance and a frame-based feature amount of an acoustic model of the developed hypothesis for the current processing frame, and calculating a new recognition score from a) the similarity measure and b) a recognition score of the developed hypothesis of up to a frame immediately preceding the current processing frame calculated from the similarity measure; and
a step of repeating the storing step, the developing step and the operating step until a last processing frame of the input utterance, and outputting the stored one or more candidate words for each processing frame as a recognition result approximate to the input utterance, in a decreasing order of recognition score,
wherein the first hypothesis-storage determining criterion selects candidate words from the developed hypothesis within a predetermined threshold from a maximum value of the recognition score,
wherein a number of candidate words stored according to the first hypothesis-storage determining criterion when within the word is independent of the second hypothesis-storage determining criterion, and
the second hypothesis-storage determining criterion selects a subset of candidate words from among the candidate words selected according to the first hypothesis-storage determining criterion, the subset of candidate words selected according to a predetermined number of upper ranking recognition scores.
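The storing, developing and operating steps recited above, repeated frame by frame, can be sketched as a toy decoder. This is only an illustration under strong simplifying assumptions that the claim does not state: each utterance segment lasts exactly one frame, features are scalars, the similarity measure is a negative squared distance added to the previous score, and the inter-word connection rule lets any registered word follow any other. All names (`Hyp`, `store`, `develop`, `operate`, `recognize`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hyp:
    words: tuple   # candidate words joined so far, including the current one
    pos: int       # index of the segment matched in the current word
    score: float   # accumulated recognition score (higher is better)


def at_word_end(h, lexicon):
    return h.pos == len(lexicon[h.words[-1]]) - 1


def store(hyps, lexicon, beam, top_n):
    """Storing step: threshold rule everywhere, top-N rule at word ends only."""
    if not hyps:
        return []
    best = max(h.score for h in hyps)
    kept = [h for h in hyps if h.score >= best - beam]       # first criterion
    inside = [h for h in kept if not at_word_end(h, lexicon)]
    ends = sorted((h for h in kept if at_word_end(h, lexicon)),
                  key=lambda h: h.score, reverse=True)[:top_n]  # second criterion
    return inside + ends


def develop(hyps, lexicon):
    """Developing step: extend within a word, or join a new word at a word end."""
    out = []
    for h in hyps:
        if at_word_end(h, lexicon):
            # Assumed connection rule: every registered word may follow.
            out.extend(Hyp(h.words + (w,), 0, h.score) for w in lexicon)
        else:
            out.append(Hyp(h.words, h.pos + 1, h.score))
    return out


def operate(hyps, x, lexicon):
    """Operating step: add the frame similarity to the previous score."""
    return [Hyp(h.words, h.pos,
                h.score - (x - lexicon[h.words[-1]][h.pos]) ** 2)
            for h in hyps]


def recognize(frames, lexicon, beam=5.0, top_n=2):
    hyps = operate([Hyp((w,), 0, 0.0) for w in lexicon], frames[0], lexicon)
    for x in frames[1:]:
        hyps = store(hyps, lexicon, beam, top_n)
        hyps = develop(hyps, lexicon)
        hyps = operate(hyps, x, lexicon)
    # Only hypotheses that finish on a word end count as results.
    final = [h for h in store(hyps, lexicon, beam, top_n)
             if at_word_end(h, lexicon)]
    return sorted(final, key=lambda h: h.score, reverse=True)


lexicon = {"a": [1.0, 2.0], "b": [5.0]}   # word -> per-frame template features
results = recognize([1.0, 2.0, 5.0], lexicon)
print([(h.words, h.score) for h in results])
```

A real recognizer would let a segment span several frames and would use an acoustic model rather than scalar templates; the sketch only shows how the two storage criteria fit into the per-frame loop.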
2. An apparatus for speech recognition comprising:
a feature-amount extracting section for extracting a feature amount based on a processing frame of an input utterance;
a search control section for controlling to develop a hypothesis, the hypothesis being at least one hypothetic candidate word, the hypothetic candidate word selected from candidate words previously registered, by extending utterance segments to at least one processing frame following a current processing frame to express the hypothetic candidate word when the hypothesis is within the word and by joining a new hypothetic candidate word to follow according to an inter-word connection rule previously determined when at the word end;
a similarity computing section for computing a similarity measure between the feature amount extracted from the input utterance and a frame feature amount of an acoustic model of the developed hypothesis for the current processing frame;
a search operating section for operating a recognition score from the similarity measure and recognition score of the developed hypothesis of up to a processing frame immediately preceding the current processing frame;
a hypothesis determining section for determining whether the current processing frame is within the word or at the word end of the at least one hypothetic candidate word of the developed hypothesis and using the recognition score to select from among the at least one hypothetic candidate word according to a first determining criterion when within the word and to select from among the at least one hypothetic candidate word according to a second determining criterion when at the word end to form a selected hypothesis;
a hypothesis storing device for storing the selected hypothesis;
a word hypothesis registering device for registering as a new hypothesis the stored hypothesis and the recognition score; and
a recognition result output section for continuing a frame-based processing of the input utterance to a last processing frame of the input utterance and outputting at least one stored hypothesis in a decreasing order of recognition score,
wherein the first determining criterion selects from among the at least one hypothetic candidate words within a predetermined threshold from a maximum value of the recognition score,
wherein a number of candidate words stored according to the first determining criterion when within the word is independent of the second determining criterion, and
the second determining criterion selects a subset of hypothetic candidate words from among the at least one hypothetic candidate words selected according to the first determining criterion, the subset of candidate words selected according to a predetermined number of upper ranking recognition scores.
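The score update and the two determining criteria recited above can be written compactly. The notation below is an assumption introduced only for illustration and does not appear in the claims: $H_t$ is the set of developed hypotheses at frame $t$, $W_t \subseteq H_t$ the hypotheses at a word end, $S_t(h)$ the recognition score, $d_t(h)$ the similarity measure, $\theta$ the predetermined threshold and $N$ the predetermined number. The additive score update is likewise an assumption; the claim only says that the score is obtained from the similarity measure and the preceding score.

```latex
\begin{align*}
S_t(h) &= S_{t-1}(h) + d_t(h)
  && \text{(assumed additive update)} \\
B_t &= \bigl\{\, h \in H_t : S_t(h) \ge \max_{h' \in H_t} S_t(h') - \theta \,\bigr\}
  && \text{(first criterion)} \\
E_t &= \text{the } N \text{ highest-scoring elements of } B_t \cap W_t
  && \text{(second criterion)} \\
\text{stored}_t &= (B_t \setminus W_t) \cup E_t
\end{align*}
```

The size of $B_t \setminus W_t$ is not bounded by $N$, which reflects the statement that the number of candidate words kept within a word is independent of the second criterion.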
3. A computer program stored in a computer-readable medium for causing a computer to execute a method, said method comprising:
a feature-amount extracting step for extracting a feature amount based on a processing frame of an input utterance;
a storing step for determining whether a current processing frame is within or at the end of at least one candidate word within a hypothesis, the at least one candidate word developed from the hypothesis, and storing one or more candidate words on the basis of a first hypothesis-storage determining criterion when within the word and on the basis of a second hypothesis-storage determining criterion when at the word end;
a developing step for developing the hypothesis, each candidate word within the hypothesis selected from words previously registered, by extending utterance segments to at least one processing frame following the current processing frame to express the candidate word when the candidate word is within the word and by joining a new candidate word to follow according to an inter-word connection rule when at the word end;
an operating step of computing a similarity measure between the feature amount extracted from the input utterance and a frame-based feature amount of an acoustic model of the developed hypothesis for the current processing frame, and calculating a new recognition score from a) the similarity measure and b) a recognition score of the developed hypothesis of up to a processing frame immediately preceding the current processing frame calculated from the similarity measure; and
a step of repeating the storing step, the developing step and the operating step until a last processing frame of the input utterance, and outputting the stored one or more candidate words for each processing frame as a recognition result approximate to the input utterance in a decreasing order of recognition score,
wherein the first hypothesis-storage determining criterion selects candidate words from the developed hypothesis within a predetermined threshold from a maximum value of the recognition score,
wherein a number of candidate words stored according to the first hypothesis-storage determining criterion when within the word is independent of the second hypothesis-storage determining criterion, and
the second hypothesis-storage determining criterion selects a subset of candidate words from among the candidate words selected according to the first hypothesis-storage determining criterion, the subset of candidate words selected according to a predetermined number of upper ranking recognition scores.
4. A computer-readable recording medium recording a computer program to allow a computer to execute a method, said method comprising:
a feature-amount extracting step for extracting a feature amount based on a processing frame of an input utterance;
a storing step for determining whether a current processing frame is within or at the end of at least one candidate word within a hypothesis, the at least one candidate word developed from the hypothesis, and storing one or more candidate words on the basis of a first hypothesis-storage determining criterion when within the word and on the basis of a second hypothesis-storage determining criterion when at the word end;
a developing step for developing the hypothesis, each candidate word within the hypothesis selected from words previously registered, by extending utterance segments to at least one processing frame following the current processing frame to express the candidate word when the candidate word is within the word and by joining a new candidate word to follow according to an inter-word connection rule when at the word end;
an operating step of computing a similarity measure between the feature amount extracted from the input utterance and a frame-based feature amount of an acoustic model of the developed hypothesis for the current processing frame, and calculating a new recognition score from a) the similarity measure and b) a recognition score of the developed hypothesis of up to a processing frame immediately preceding the current processing frame calculated from the similarity measure; and
a step of repeating the storing step, the developing step and the operating step until a last processing frame of the input utterance, and outputting the stored one or more candidate words for each processing frame as a recognition result approximate to the input utterance in a decreasing order of recognition score,
wherein the first hypothesis-storage determining criterion selects candidate words from the developed hypothesis within a predetermined threshold from a maximum value of the recognition score,
wherein a number of candidate words stored according to the first hypothesis-storage determining criterion when within the word is independent of the second hypothesis-storage determining criterion, and
the second hypothesis-storage determining criterion selects a subset of candidate words from among the candidate words selected according to the first hypothesis-storage determining criterion, the subset of candidate words selected according to a predetermined number of upper ranking recognition scores.
Specification