Recognition confidence measuring by lexical distance between candidates
First Claim
1. A recognition confidence measurement method comprising:
detecting a feature vector of an input speech signal through normalization of a histogram and extracting a phoneme string from the detected feature vector of the input speech signal;
obtaining information about a distance between a phoneme string of the input speech signal and a phoneme string of a predetermined vocabulary, by a predetermined phoneme confusion matrix, wherein the phoneme confusion matrix sets the distance to be decreased in proportion to the increase of the matching degree between phonemes;
extracting as candidates, the phoneme string of the vocabulary that has a higher similarity which denotes a comparatively shorter distance from the speech signal, from a predetermined dictionary;
estimating a lexical distance between the extracted candidates, the estimating of the lexical distance comprising selecting a pair of candidates from the extracted candidates and performing a dynamic matching of the selected pair of candidates; and
determining whether the input speech signal is an in-vocabulary, based on the lexical distance, wherein the dynamic matching comprises:
for each phoneme in a first candidate of the selected pair of candidates, determining as a first matching pair the phoneme in the first candidate and the corresponding phoneme in a second candidate of the selected pair of candidates and determining as a second matching pair the phoneme in the first candidate and a phoneme having an identical shape to the phoneme in the first candidate;
wherein the estimating of the lexical distance further comprises:
calculating a score for the pair of candidates; and
estimating the lexical distance using the calculated score;
wherein the calculating of the score calculates the score using the phoneme confusion matrix.
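The distance and candidate-extraction steps of the claim can be illustrated with a minimal sketch. It assumes a toy confusion matrix in which identical phonemes cost 0, easily confused phonemes cost little, and unrelated phonemes cost 1, so that the distance shrinks as the matching degree grows. The names `CONFUSION`, `GAP_COST`, `string_distance`, and `extract_candidates`, the cost values, and the use of a standard edit-distance alignment are all illustrative assumptions; none of them is taken from the patent.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical confusion-matrix distances (all values are assumptions):
# identical phonemes cost 0.0, listed confusable pairs cost less than 1.0,
# and every other pair costs 1.0.
CONFUSION: Dict[Tuple[str, str], float] = {
    ("b", "p"): 0.3,
    ("d", "t"): 0.3,
    ("m", "n"): 0.4,
}
GAP_COST = 1.0  # assumed cost of inserting or deleting one phoneme


def phoneme_distance(a: str, b: str) -> float:
    """Look up the distance between two phonemes, treating the matrix as symmetric."""
    if a == b:
        return 0.0
    return CONFUSION.get((a, b), CONFUSION.get((b, a), 1.0))


def string_distance(x: Sequence[str], y: Sequence[str]) -> float:
    """Alignment cost between two phoneme strings (weighted edit distance)."""
    prev = [j * GAP_COST for j in range(len(y) + 1)]
    for i in range(1, len(x) + 1):
        curr = [i * GAP_COST]
        for j in range(1, len(y) + 1):
            curr.append(min(
                prev[j] + GAP_COST,                                  # delete x[i-1]
                curr[j - 1] + GAP_COST,                              # insert y[j-1]
                prev[j - 1] + phoneme_distance(x[i - 1], y[j - 1]),  # substitute
            ))
        prev = curr
    return prev[-1]


def extract_candidates(
    observed: Sequence[str],
    dictionary: Dict[str, List[str]],
    n_best: int = 2,
) -> List[Tuple[str, float]]:
    """Return the n_best dictionary entries closest to the observed phoneme string."""
    scored = sorted(
        (string_distance(observed, phones), word)
        for word, phones in dictionary.items()
    )
    return [(word, dist) for dist, word in scored[:n_best]]


if __name__ == "__main__":
    toy_dictionary = {
        "bat": ["b", "ae", "t"],
        "pat": ["p", "ae", "t"],
        "man": ["m", "ae", "n"],
    }
    print(extract_candidates(["b", "ae", "d"], toy_dictionary))
```

With this toy data the two closest entries are "bat" and "pat", because the observed string differs from them only by confusable phonemes. How many candidates to keep is a tuning choice that the sketch exposes as `n_best`.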
Abstract
A recognition confidence measurement method, medium, and system are provided that can more accurately determine whether an input speech signal is in-vocabulary, by extracting an optimum number of candidates that match a phoneme string extracted from the input speech signal and estimating a lexical distance between the extracted candidates. The method includes: extracting a phoneme string from a feature vector of an input speech signal; extracting candidates by matching the extracted phoneme string against the phoneme strings of vocabulary entries registered in a predetermined dictionary; estimating a lexical distance between the extracted candidates; and determining whether the input speech signal is in-vocabulary, based on the lexical distance.
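The last two steps of the abstract, estimating a lexical distance between candidates and deciding on that basis, can be sketched as follows. This is an illustration under stated assumptions, not the patented procedure: `positional_score` is a toy stand-in for the dynamic matching (it does not use a confusion matrix), the lexical distance is taken to be the mean pairwise score, and the decision rule (a small distance between candidates suggests an in-vocabulary input) together with the `threshold` value are assumptions that this section does not confirm.

```python
from itertools import combinations, zip_longest
from typing import Callable, Optional, Sequence

PhonemeString = Sequence[str]
PairScore = Callable[[PhonemeString, PhonemeString], float]


def positional_score(a: PhonemeString, b: PhonemeString) -> float:
    """Toy stand-in for dynamic matching: fraction of positions that differ.

    Positions past the end of the shorter string count as mismatches.
    """
    pairs = list(zip_longest(a, b))
    if not pairs:
        return 0.0
    mismatches = sum(1 for p, q in pairs if p != q)
    return mismatches / len(pairs)


def lexical_distance(
    candidates: Sequence[PhonemeString],
    score: Optional[PairScore] = None,
) -> float:
    """Mean pairwise score over all candidate pairs (assumed aggregation)."""
    scorer = score or positional_score
    pair_scores = [scorer(a, b) for a, b in combinations(candidates, 2)]
    return sum(pair_scores) / len(pair_scores) if pair_scores else 0.0


def is_in_vocabulary(
    candidates: Sequence[PhonemeString],
    threshold: float = 0.5,  # assumed value; would be tuned on held-out data
    score: Optional[PairScore] = None,
) -> bool:
    # Assumed rule: candidates that cluster tightly suggest an in-vocabulary input.
    return lexical_distance(candidates, score) <= threshold


if __name__ == "__main__":
    close = [["b", "ae", "t"], ["p", "ae", "t"], ["b", "ae", "d"]]
    far = [["b", "ae", "t"], ["s", "ih", "ng"], ["m", "uw", "n", "z"]]
    print(is_in_vocabulary(close), is_in_vocabulary(far))
```

A scorer based on a phoneme confusion matrix, as the claims require, could be passed through the `score` parameter in place of the toy default.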
17 Claims
1. A recognition confidence measurement method as recited in full under First Claim above. Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9.
10. A non-transitory computer readable storage medium storing a program for implementing a recognition confidence measurement method comprising:
detecting a feature vector of an input speech signal through normalization of a histogram and extracting a phoneme string from the feature vector of the input speech signal;
obtaining information about a distance between a phoneme string of the input speech signal and a phoneme string of a predetermined vocabulary, by a predetermined phoneme confusion matrix which sets the distance to be decreased in proportion to the increase of a matching degree between phonemes;
extracting as candidates, the phoneme string of the vocabulary that has a higher similarity and that has a comparatively shorter distance from the speech signal, from a predetermined dictionary;
estimating a lexical distance between the extracted candidates, the estimating of the lexical distance comprising performing a dynamic matching of a pair of candidates selected from the extracted candidates; and
determining whether the input speech signal is an in-vocabulary, based on the lexical distance, wherein the dynamic matching comprises:
for each phoneme in a first candidate of the selected pair of candidates, determining as a first matching pair the phoneme in the first candidate and the corresponding phoneme in a second candidate of the selected pair of candidates and determining as a second matching pair the phoneme in the first candidate and a phoneme having an identical shape to the phoneme in the first candidate;
wherein the estimating of the lexical distance further comprises:
calculating a score for the pair of candidates; and
estimating the lexical distance using the calculated score;
wherein the calculating of the score calculates the score using the phoneme confusion matrix.
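The claims recite detecting the feature vector "through normalization of a histogram" without giving details in this section. One common reading is histogram equalization, in which each feature dimension is remapped so that its empirical distribution matches a fixed reference. The sketch below assumes that reading and a standard normal reference; the function name `histogram_equalize` and the rank-based CDF estimate are illustrative choices, not the patent's stated procedure.

```python
from statistics import NormalDist
from typing import List


def histogram_equalize(values: List[float]) -> List[float]:
    """Map one feature dimension onto a standard normal reference distribution.

    Each value is replaced by the reference quantile of its empirical CDF,
    so the ordering of the values is preserved while their histogram is reshaped.
    Tied values receive adjacent ranks in their original order.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    reference = NormalDist()  # mean 0, standard deviation 1 (assumed reference)
    equalized = [0.0] * n
    for rank, idx in enumerate(order):
        # (rank + 0.5) / n keeps the CDF estimate strictly inside (0, 1).
        equalized[idx] = reference.inv_cdf((rank + 0.5) / n)
    return equalized


if __name__ == "__main__":
    # One feature dimension observed over five frames (made-up numbers).
    print(histogram_equalize([12.0, 3.5, 7.1, 40.2, 5.0]))
```

In a full front end this mapping would be applied independently to every dimension of the feature vector over a window of frames, before phoneme decoding.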
11. A recognition confidence measurement system comprising:
at least one processor to control one or more of the following units:
a phoneme string extraction unit detecting a feature vector of an input speech signal through normalization of a histogram and extracting a phoneme string from the feature vector of the input speech signal;
a candidate extraction unit obtaining information about a distance between a phoneme string of the input speech signal and a phoneme string of a predetermined vocabulary, by a predetermined phoneme confusion matrix which sets the distance to be decreased in proportion to the increase of a matching degree between phonemes, extracting as candidates the phoneme string of the vocabulary that has a higher similarity and that has a comparatively shorter distance from the speech signal, from a predetermined dictionary;
a distance estimation unit estimating a lexical distance between the extracted candidates, the estimating of the lexical distance comprising performing a dynamic matching of a pair of candidates selected from the extracted candidates; and
a registration determination unit determining whether the input speech signal is an in-vocabulary, based on the lexical distance, wherein the dynamic matching comprises:
for each phoneme in a first candidate of the selected pair of candidates, determining as a first matching pair the phoneme in the first candidate and the corresponding phoneme in a second candidate of the selected pair of candidates and determining as a second matching pair the phoneme in the first candidate and a phoneme having an identical shape to the phoneme in the first candidate;
wherein the estimating of the lexical distance further comprises:
calculating a score for the pair of candidates; and
estimating the lexical distance using the calculated score;
wherein the calculating of the score calculates the score using the phoneme confusion matrix.
Dependent claims: 12, 13, 14, 15
16. A recognition confidence measurement method comprising:
extracting candidates by matching a phoneme string of a speech signal, extracted by a feature vector which is detected from the speech signal through normalization of a histogram, and phoneme strings of vocabularies registered in a predetermined dictionary;
obtaining information about a distance between a phoneme string of the input speech signal and a phoneme string of a predetermined vocabulary, by a predetermined phoneme confusion matrix which sets the distance to be decreased in proportion to the increase of a matching degree between phonemes;
extracting as candidates, the phoneme string of the vocabulary that has a higher similarity and that has a comparatively shorter distance from the speech signal, from a predetermined dictionary;
estimating a lexical distance between the extracted candidates, the estimating of the lexical distance comprising performing a dynamic matching of a pair of candidates selected from the extracted candidates; and
determining whether the speech signal is an in-vocabulary, based on the lexical distance, wherein the dynamic matching comprises:
for each phoneme in a first candidate of the selected pair of candidates, determining as a first matching pair the phoneme in the first candidate and the corresponding phoneme in a second candidate of the selected pair of candidates and determining as a second matching pair the phoneme in the first candidate and a phoneme having an identical shape to the phoneme in the first candidate;
wherein the estimating of the lexical distance further comprises:
calculating a score for the pair of candidates; and
estimating the lexical distance using the calculated score;
wherein the calculating of the score calculates the score using the phoneme confusion matrix.
Dependent claims: 17
Specification