Speech recognition device with reference transformation means
Abstract
A speech recognition device (8) can receive, over a first receive channel (21) and a second receive channel (25, 28), speech information (SI) that is colored by the respective receive channel (21, 25, 28). The device comprises reference storage means (36) for storing reference information (RI1) that features the type of pronunciation of words by a plurality of reference speakers; receive channel adaptation means (30, 38, 44) for adapting the stored reference information (RI, ARI) to the first or second receive channel (21, 25, 28) used by a user; user adaptation means (37) for adapting the stored reference information (RI1, RI2, RI3) to the type of pronunciation of words by the user of the speech recognition device (8); and speech recognition means (29) for recognizing text information (TI) to be assigned to the supplied speech information (SI), in which the reference information (ARI1, ARI2, ARI3) adapted by the receive channel adaptation means (30, 38, 44) and by the user adaptation means (37) is evaluated. The receive channel adaptation means (30, 38, 44) include reference transformation means (T1-2, T1-3, T2-3) arranged for transforming first reference information (RI1, ARI1) adapted to the first receive channel (21) into second reference information (RI2, RI3, ARI2, ARI3) adapted to the second receive channel (25, 28) in accordance with a transformation matrix (T1-2, T1-3, T2-3). The adapted first reference information (RI1, ARI1) to be transformed may, but need not, already have been adapted to the user by the user adaptation means (37).
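As an illustration only, the transformation of first reference information into second reference information in accordance with a transformation matrix might be sketched as a matrix product over feature vectors. The function name `transform_reference`, the array layout (one feature vector per row), the dimensions, and all numeric values below are assumptions made for this sketch and are not taken from the patent.

```python
import numpy as np


def transform_reference(ri1: np.ndarray, t12: np.ndarray) -> np.ndarray:
    """Map channel-1 reference feature vectors to channel 2.

    ri1: shape (n_vectors, d1), one feature vector per row (assumed layout).
    t12: shape (d2, d1), hypothetical transformation matrix T1-2.
    Returns an array of shape (n_vectors, d2).
    """
    return ri1 @ t12.T


# Hypothetical sizes: 4 frequency sub-ranges on channel 1, 3 on channel 2.
ri1 = np.array([[1.0, 2.0, 3.0, 4.0],
                [0.5, 0.5, 0.5, 0.5]])

# Illustrative matrix: keep sub-range 0, average sub-ranges 1 and 2,
# and double sub-range 3.
t12 = np.array([[1.0, 0.0, 0.0, 0.0],
                [0.0, 0.5, 0.5, 0.0],
                [0.0, 0.0, 0.0, 2.0]])

ri2 = transform_reference(ri1, t12)
print(ri2)
```

The matrix is deliberately non-square here, so that the transformed feature vectors cover a different number of frequency sub-ranges than the originals. The same call would apply whether or not the input vectors had already been adapted to the user.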
10 Claims
1. A speech recognition device (8) to which can be applied via a first receive channel (21) and a second receive channel (25, 28) speech information (SI) colored by the respective receive channel (21, 25, 28), wherein the device used on the first channel is different than the device used on the second channel, the speech recognition device comprising:
reference storage means (36) for storing reference information (RI1) featuring the type of pronunciation of words by a plurality of reference speakers;
receive channel adaptation means (30, 38, 44) for adapting the stored reference information (RI, ARI) to the first or second receive channel (21, 25, 28) used by a user, wherein the first reference information (RI1, ARI1) and the second reference information (RI2, RI3, ARI2, ARI3) is formed by feature vectors (FV), while each feature vector (FV) features the speech information (SI) in a respective frequency sub-range, and in that the feature vectors (FV) of the first reference information (RI1, ARI1) feature the speech information (SI) in different frequency sub-ranges from the feature vectors (FV) of the second reference information (RI2, RI3, ARI2, ARI3);
user adaptation means (37) for adapting the stored reference information (RI1, RI2, RI3) to the type of pronunciation of words by the user of the speech recognition device (8); and
speech recognition means (29) for recognizing text information (TI) to be assigned to the supplied speech information (SI), while reference information (ARI1, ARI2, ARI3) adapted by the receive channel adaptation means (30, 38, 44) and the user adaptation means (37) is evaluated,
characterized in that the receive channel adaptation means (30, 38, 44) include reference transformation means (T1-2, T1-3, T2-3) which are arranged for transforming first reference information (RI1, ARI1) adapted to the first receive channel (21) into second reference information (RI2, RI3, ARI2, ARI3) adapted to the second receive channel (25, 28) in accordance with a transformation matrix (T1-2, T1-3, T2-3), while the adapted first reference information (RI1, ARI1) to be transformed by the reference transformation means (T1-2, T1-3, T2-3) may, but need not, already have been adapted to the user by the user adaptation means (37).
3. A speech recognition method (8) of recognizing text information (TI) to be assigned to speech information (SI), where the speech information (SI) is colored by a first receive channel (21) or a second receive channel (25, 28) and the speech recognition method (8) includes the following steps:
adapting (30, 38, 44) reference information (RI1, RI2, RI3) that features the type of pronunciation of words by a plurality of reference speakers to the first or second receive channel (21, 25, 28) used by a user, wherein the first reference information (RI1, ARI1) and the second reference information (RI2, RI3, ARI2, ARI3) is formed by feature vectors (FV), while each feature vector (FV) features the speech information (SI) in a respective frequency sub-range, and in that the feature vectors (FV) of the first reference information (RI1, ARI1) feature the speech information (SI) in different frequency sub-ranges from the feature vectors (FV) of the second reference information (RI2, RI3, ARI2, ARI3);
adapting (37) the reference information (RI1, RI2, RI3) to the type of pronunciation of words by the user of the speech recognition method; and
recognizing the text information (TI) to be assigned to the speech information (SI), while the reference information (ARI1, ARI2, ARI3) adapted to the first receive channel (21) or the second receive channel (25, 28) and to the user is evaluated,
characterized in that first reference information (RI1, ARI1) adapted to the first receive channel (21) is transformed into second reference information (RI2, RI3, ARI2, ARI3) adapted to the second receive channel (25, 28), while the adapted first reference information (RI1, ARI1) to be transformed may, but need not, already have been adapted to the user.
5. A reference determining method (1) of determining first reference information (RI1) adapted to a first receive channel (2) for a speech recognition method (8), while the reference determining method (1) includes the following steps:
analyzing (14) speech information (SI) received from a plurality of first reference speakers over the first receive channel (2), each reference speaker using a substantially similar input device, and producing the first reference information (RI1) adapted to the first receive channel (2), characterized in that speech information (SI) received from a plurality of second reference speakers using substantially similar input devices over a second receive channel (4, 6) is analyzed, wherein the first reference information (RI1, ARI1) and the second reference information (RI2, RI3, ARI2, ARI3) is formed by feature vectors (FV), while each feature vector (FV) features the speech information (SI) in a respective frequency sub-range, and in that the feature vectors (FV) of the first reference information (RI1, ARI1) feature the speech information (SI) in different frequency sub-ranges from the feature vectors (FV) of the second reference information (RI2, RI3, ARI2, ARI3); and
a transformation matrix (T1-2, T1-3) is determined for transforming the first reference information (RI1) into second reference information (RI2, RI3) adapted to the second receive channel (4, 6), wherein said devices used by the plurality of second reference speakers are different than those used by the plurality of first reference speakers.
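The claim does not state how the transformation matrix is determined. One possible approach, shown purely as a sketch, is an ordinary least-squares fit between corresponding feature vectors from the two channels. The pairing of vectors (row i of one array corresponding to row i of the other, for example per-sound averages on each channel), the function name `estimate_transformation`, and the synthetic data are all assumptions introduced for illustration.

```python
import numpy as np


def estimate_transformation(ri1: np.ndarray, ri2: np.ndarray) -> np.ndarray:
    """Least-squares estimate of a matrix T such that ri2 ≈ ri1 @ T.T.

    ri1: shape (n, d1), channel-1 reference feature vectors (rows).
    ri2: shape (n, d2), corresponding channel-2 feature vectors (rows).
    Returns T with shape (d2, d1).
    """
    # lstsq solves ri1 @ X ≈ ri2 for X (shape d1 x d2); T is its transpose.
    x, *_ = np.linalg.lstsq(ri1, ri2, rcond=None)
    return x.T


# Synthetic stand-in data: 50 paired vectors, 4 sub-ranges on channel 1
# and 3 on channel 2, generated from a known matrix so the fit can be checked.
rng = np.random.default_rng(0)
true_t = rng.normal(size=(3, 4))
ri1 = rng.normal(size=(50, 4))
ri2 = ri1 @ true_t.T

t12 = estimate_transformation(ri1, ri2)
print(np.allclose(t12, true_t))
```

With noise-free synthetic data the fit recovers the generating matrix; with real recordings from different speakers and input devices the result would only be an approximation, and how the vectors are paired would matter considerably.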
7. A computer program product (1, 8) which can be directly loaded into the internal memory of a digital computer and includes software code sections suitable for execution by the computer for recognizing text information (TI) to be assigned to speech information (SI), where the speech information (SI) is colored by the input devices used on a first receive channel (21) or a second receive channel (25, 28), wherein the input device used on the first channel is different than the input device used on the second channel, by the following steps:
adapting (30, 38, 44) reference information (RI1, RI2, RI3) that features the type of pronunciation of words by a plurality of reference speakers to the first or second receive channel (21, 25, 28) used by a user, wherein the first reference information (RI1, ARI1) and the second reference information (RI2, RI3, ARI2, ARI3) is formed by feature vectors (FV), while each feature vector (FV) features the speech information (SI) in a respective frequency sub-range, and in that the feature vectors (FV) of the first reference information (RI1, ARI1) feature the speech information (SI) in different frequency sub-ranges from the feature vectors (FV) of the second reference information (RI2, RI3, ARI2, ARI3);
adapting (37) the reference information (RI1, RI2, RI3) to the type of pronunciation of words by the user of the speech recognition method; and
recognizing the text information (TI) to be assigned to the speech information (SI), while the reference information (ARI1, ARI2, ARI3) adapted to the first receive channel (21) or the second receive channel (25, 28) and to the user is evaluated,
characterized in that first reference information (RI1, ARI1) adapted to the first receive channel (21) is transformed into second reference information (RI2, RI3, ARI2, ARI3) adapted to the second receive channel (25, 28), while the adapted first reference information (RI1, ARI1) to be transformed may, but need not, already have been adapted to the user.
9. A computer program product (1, 8) which can be directly loaded into the internal memory of a digital computer and includes software code sections suitable for execution by the computer for:
analyzing (14) speech information (SI) received from a plurality of first reference speakers over the first receive channel (2), each reference speaker using a substantially similar input device, and producing the first reference information (RI1) adapted to the first receive channel (2), characterized in that speech information (SI) received from a plurality of second reference speakers using substantially similar input devices over a second receive channel (4, 6) is analyzed, wherein the first reference information (RI1, ARI1) and the second reference information (RI2, RI3, ARI2, ARI3) is formed by feature vectors (FV), while each feature vector (FV) features the speech information (SI) in a respective frequency sub-range, and in that the feature vectors (FV) of the first reference information (RI1, ARI1) feature the speech information (SI) in different frequency sub-ranges from the feature vectors (FV) of the second reference information (RI2, RI3, ARI2, ARI3); and
a transformation matrix (T1-2, T1-3) is determined for transforming the first reference information (RI1) into second reference information (RI2, RI3) adapted to the second receive channel (4, 6), wherein said devices used by the plurality of second reference speakers are different than those used by the plurality of first reference speakers.
Specification