Speech recognition device with reference transformation means

US 7,146,317 B2
Filed: 02/22/2001
Issued: 12/05/2006
Est. Priority Date: 02/25/2000
Status: Expired due to Term

Abstract

A speech recognition device (8), to which can be applied over a first receive channel (21) and a second receive channel (25, 28) speech information (SI) that is colored by the respective receive channel (21, 25, 28), comprises reference storage means (36) for storing reference information (RI1) that features the type of pronunciation of words by a plurality of reference speakers and receive channel adaptation means (30, 38, 44) for adapting the stored reference information (RI, ARI) to a first or second receive channel (21, 25, 28) used by a user and user adaptation means (37) for adapting the stored reference information (RI1, RI2, RI3) to the type of pronunciation of words by the user of the speech recognition device (8) and speech recognition means (29) for recognizing text information (TI) to be assigned to the fed speech information (SI), while reference information (ARI1, ARI2, ARI3) adapted by the receive channel adaptation means (30, 38, 44) and by the user adaptation means (37) is evaluated, where now the receive channel adaptation means (30, 38, 44) include reference transformation means (T1-2, T1-3, T2-3) which are arranged for transforming first reference information (RI1, ARI1) adapted to the first receive channel (21) into second reference information (RI2, RI3, ARI2, ARI3) adapted to the second receive channel (25, 28) in accordance with a transformation matrix (T1-2, T1-3, T2-3) and where the adapted first reference information (RI1, ARI1) to be transformed by the reference transformation means (T1-2, T1-3, T2-3) may, but need not, already have been adapted to the user by the user adaptation means (37).
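The transformation step described in the abstract can be pictured as a matrix-vector product applied to every stored feature vector. The sketch below is a minimal illustration under stated assumptions, not the patented implementation: reference information is modeled as a plain list of feature vectors, the matrix values are invented for the example, and the names `transform_reference`, `RI1`, `T_1_2` and `RI2` are hypothetical. The matrix is deliberately non-square, reflecting that the two channels may describe the speech in different frequency sub-ranges.

```python
def transform_reference(matrix, reference_vectors):
    """Apply a transformation matrix to each feature vector.

    matrix: list of rows; every row must be as long as an input vector.
    reference_vectors: feature vectors adapted to the first receive channel.
    Returns feature vectors expressed for the second receive channel.
    """
    transformed = []
    for fv in reference_vectors:
        if any(len(row) != len(fv) for row in matrix):
            raise ValueError("matrix row length must match feature vector length")
        transformed.append([sum(w * x for w, x in zip(row, fv)) for row in matrix])
    return transformed


# Hypothetical first reference information: three components per vector.
RI1 = [[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]]

# Hypothetical transformation matrix mapping three components to two.
T_1_2 = [
    [0.5, 0.5, 0.0],
    [0.0, 0.0, 1.0],
]

RI2 = transform_reference(T_1_2, RI1)
print(RI2)
```

Because the same function accepts any list of feature vectors, it could be applied either to speaker-independent reference information or to reference information that has already been adapted to a user, which matches the "may, but need not" wording of the abstract.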

10 Claims

1. A speech recognition device (8) to which can be applied via a first receive channel (21) and a second receive channel (25, 28) speech information (SI) colored by the respective receive channel (21, 25, 28), wherein the device used on the first channel is different than the device used on the second channel, the speech recognition device comprising:
- reference storage means (36) for storing reference information (RI1) featuring the type of pronunciation of words by a plurality of reference speakers; and
  
  receive channel adaptation means (30, 38, 44) for adapting the stored reference information (RI, ARI) to the first or second receive channel (21, 25, 28) used by a user, wherein the first reference information (RI1, ARI1) and the second reference information (RI2, RI3, ARI2, ARI3) is formed by feature vectors (FV), while each feature vector (FV) features the speech information (SI) in a respective frequency sub-range, and in that the feature vectors (FV) of the first reference information (RI1, ARI1) feature the speech information (SI) in different frequency sub-ranges from the feature vectors (FV) of the second reference information (RI1, RI3, ARI2, ARI3); and
  
  user adaptation means (37) for adapting the stored reference information (RI1, RI2, RI3) to the type of pronunciation of words by the user of the speech recognition device (8); and
  
  speech recognition means (29) for recognizing text information (TI) to be assigned to the supplied speech information (SI), while reference information (ARI1, ARI2, ARI3) adapted by the receive channel adaptation means (30, 38, 44) and the user adaptation means (37) is evaluated, characterized in that the receive channel adaptation means (30, 38, 44) include reference transformation means (T1-2, T1-3, T2-3) which are arranged for transforming first reference information (RI1, ARI1) adapted to the first receive channel (21) into second reference information (RI2, RI3, ARI2, ARI3) adapted to the second receive channel (25, 28) in accordance with a transformation matrix (T1-2, T1-3, T2-3), while the adapted first reference information (RI1, ARI1) to be transformed by the reference transformation means (T1-2, T1-3, T2-3) may, but need not, already have been adapted to the user by the user adaptation means (37).

2. A speech recognition device (8) as claimed in claim 1, characterized in that channel detection means (30) are provided which are arranged for detecting the first receive channel (21) or second receive channel (25, 28) selected by the user for entering speech information (SI), and in that selection means (44) are provided which are arranged for selecting first reference information (ARI1) or second reference information (ARI2, ARI3) adapted to the selected first receive channel (21) or second receive channel (25, 28) for evaluation by the speech recognition means (29).
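
The detect-then-select behavior of claim 2 can be sketched as two small lookups. This is an illustrative sketch only: the claim does not say how a channel is detected, so the example assumes that the channel is identified by the input port on which the speech arrives. The port names, the placeholder feature values, and the names `PORT_TO_CHANNEL`, `ADAPTED_REFERENCE`, `detect_channel` and `select_reference` are all hypothetical.

```python
# Assumption: each input port corresponds to exactly one receive channel.
PORT_TO_CHANNEL = {
    "microphone_jack": "channel_21",
    "telephone_line": "channel_25",
    "dictation_device": "channel_28",
}

# Placeholder adapted reference information, one entry per receive channel.
ADAPTED_REFERENCE = {
    "channel_21": [[1.0, 2.0, 3.0]],  # stands in for ARI1
    "channel_25": [[1.5, 3.0]],       # stands in for ARI2
    "channel_28": [[0.9, 2.1, 2.8]],  # stands in for ARI3
}


def detect_channel(input_port):
    """Return the receive channel associated with the given input port."""
    try:
        return PORT_TO_CHANNEL[input_port]
    except KeyError:
        raise ValueError(f"unknown input port: {input_port!r}") from None


def select_reference(channel, adapted_reference=ADAPTED_REFERENCE):
    """Return the reference information adapted to the detected channel."""
    return adapted_reference[channel]


channel = detect_channel("telephone_line")
reference = select_reference(channel)
print(channel, reference)
```

Keeping detection and selection as separate functions mirrors the claim's separation of channel detection means (30) and selection means (44).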

3. A speech recognition method (8) of recognizing text information (TI) to be assigned to speech information (SI), where the speech information (SI) is colored by a first receive channel (21) or a second receive channel (25, 28) and the speech recognition method (8) includes the following steps:
- adapting (30, 38, 44) reference information (RI1, RI2, RI3) that features the type of pronunciation of words by a plurality of reference speakers to the first or second receive channel (21, 25, 28) used by a user, wherein the first reference information (RI1, ARI1) and the second reference information (RI2, RI3, ARI2, ARI3) is formed by feature vectors (FV), while each feature vector (FV) features the speech information (SI) in a respective frequency sub-range, and in that the feature vectors (FV) of the first reference information (RI1, ARI1) feature the speech information (SI) in different frequency sub-ranges from the feature vectors (FV) of the second reference information (RI1, RI3, ARI2, ARI3); and
  
  adapting (37) the reference information (RI1, RI2, RI3) to the type of pronunciation of words by the user of the speech recognition method; and
  
  recognizing the text information (TI) to be assigned to the speech information (SI), while the reference information (ARI1, ARI2, ARI3) adapted to the first receive channel (21) or the second receive channel (25, 28) and to the user is evaluated, characterized in that first reference information (RI1, ARI1) adapted to the first receive channel (21) is transformed into second reference information (RI2, RI3, ARI2, ARI3) adapted to the second receive channel (25, 28), while the adapted first reference information (RI1, ARI1) to be transformed may, but need not, already have been adapted to the user.

4. A speech recognition method (8) as claimed in claim 3, characterized in that there is detected (30) which of the receive channels (21, 25, 28) was selected by the user for entering the speech information (SI) and in that the first reference information (ARI1) or second reference information (ARI2, ARI3) adapted to the selected receive channel (21, 25, 28) is used for the evaluation by the speech recognition means (29).

5. A reference determining method (1) of determining first reference information (RI1) adapted to a first receive channel (2) for a speech recognition method (8), while the reference determining method (1) includes the following steps:
- analyzing (14) speech information (SI) received from a plurality of first reference speakers over the first receive channel (2), each reference speaker using a substantially similar input device, and producing the first reference information (RI1) adapted to the first receive channel (2), characterized in that speech information (SI) received from a plurality of second reference speakers using substantially similar input devices over a second receive channel (4, 6) is analyzed, wherein the first reference information (RI1, ARI1) and the second reference information (RI2, RI3, ARI2, ARI3) is formed by feature vectors (FV), while each feature vector (FV) features the speech information (SI) in a respective frequency sub-range, and in that the feature vectors (FV) of the first reference information (RI1, ARI1) feature the speech information (SI) in different frequency sub-ranges from the feature vectors (FV) of the second reference information (RI1, RI3, ARI2, ARI3); and
  
  a transformation matrix (T1-2, T1-3) is determined for transforming the first reference information (RI1) into second reference information (RI2, RI3) adapted to the second receive channel (4, 6), wherein said devices used by the plurality of second reference speakers are different than those used by the plurality of first reference speakers.

6. A reference determining method (1) as claimed in claim 5, characterized in that for determining the first reference information (RI1) and the transformation matrix (T1-2, T1-3, T2-3) the first receive channel (2) and the second receive channel (4, 6) are formed by a plurality of terminal units (3, 5, 7), which are typical of the first receive channel (2) and the second receive channel (4, 6).
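
Claims 5 and 6 require that a transformation matrix be determined from speech analyzed on both channels, but they do not state how. One possible approach, shown here purely as an assumption, is an ordinary least-squares fit over paired feature vectors, where the k-th vector from the first channel is taken to correspond to the k-th vector from the second channel. The function name `estimate_transformation` and the sample data are hypothetical, and the pairing itself is an assumption that the claims do not spell out.

```python
def estimate_transformation(first_vectors, second_vectors):
    """Least-squares estimate of T such that T applied to x approximates y.

    first_vectors, second_vectors: equally long lists of paired feature
    vectors (x, y). Returns T as a list of rows (len(y) rows, len(x) columns).
    """
    if len(first_vectors) != len(second_vectors) or not first_vectors:
        raise ValueError("need the same non-zero number of paired vectors")
    n = len(first_vectors[0])
    m = len(second_vectors[0])
    pairs = list(zip(first_vectors, second_vectors))

    # Normal equations: (X^T X) W = X^T Y, where T is the transpose of W.
    a = [[sum(x[i] * x[j] for x, _ in pairs) for j in range(n)] for i in range(n)]
    b = [[sum(x[i] * y[k] for x, y in pairs) for k in range(m)] for i in range(n)]

    # Gauss-Jordan elimination with partial pivoting on the system [a | b].
    aug = [a[i] + b[i] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("training vectors do not determine a unique matrix")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]

    w = [row[n:] for row in aug]                             # n rows, m columns
    return [[w[i][k] for i in range(n)] for k in range(m)]  # transpose


# Hypothetical paired training data generated from a known matrix [[2, 0], [1, 3]].
first = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
second = [[2.0, 1.0], [0.0, 3.0], [2.0, 4.0]]
T = estimate_transformation(first, second)
print(T)
```

The same routine also handles the non-square case, where the second channel's feature vectors have a different number of components than the first channel's, provided the training vectors from the first channel span its feature space.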

7. A computer program product (1, 8) which can be directly loaded into the internal memory of a digital computer and includes software code sections suitable for execution by the computer for recognizing text information (TI) to be assigned to speech information (SI), where the speech information (SI) is colored by the input devices used on a first receive channel (21) or a second receive channel (25, 28), wherein the input device used on the first channel is different than the input device used on the second channel, by the following steps:
- adapting (30, 38, 44) reference information (RI1, RI2, RI3) that features the type of pronunciation of words by a plurality of reference speakers to the first or second receive channel (21, 25, 28) used by a user, wherein the first reference information (RI1, ARI1) and the second reference information (RI2, RI3, ARI2, ARI3) is formed by feature vectors (FV), while each feature vector (FV) features the speech information (SI) in a respective frequency sub-range, and in that the feature vectors (FV) of the first reference information (RI1, ARI1) feature the speech information (SI) in different frequency sub-ranges from the feature vectors (FV) of the second reference information (RI1, RI3, ARI2, ARI3); and
  
  adapting (37) the reference information (RI1, RI2, RI3) to the type of pronunciation of words by the user of the speech recognition method; and
  
  recognizing the text information (TI) to be assigned to the speech information (SI), while the reference information (ARI1, ARI2, ARI3) adapted to the first receive channel (21) or the second receive channel (25, 28) and to the user is evaluated, characterized in that first reference information (RI1, ARI1) adapted to the first receive channel (21) is transformed into second reference information (RI2, RI3, ARI2, ARI3) adapted to the second receive channel (25, 28), while the adapted first reference information (RI1, ARI1) to be transformed may, but need not, already have been adapted to the user.

8. A computer program product as claimed in claim 7, characterized in that it is stored on a medium that can be read by a computer.

9. A computer program product (1, 8) which can be directly loaded into the internal memory of a digital computer and includes software code sections suitable for execution by the computer for:
- analyzing (14) speech information (SI) received from a plurality of first reference speakers over the first receive channel (2), each reference speaker using a substantially similar input device, and producing the first reference information (RI1) adapted to the first receive channel (2), characterized in that speech information (SI) received from a plurality of second reference speakers using substantially similar input devices over a second receive channel (4, 6) is analyzed, wherein the first reference information (RI1, ARI1) and the second reference information (RI2, RI3, ARI2, ARI3) is formed by feature vectors (FV), while each feature vector (FV) features the speech information (SI) in a respective frequency sub-range, and in that the feature vectors (FV) of the first reference information (RI1, ARI1) feature the speech information (SI) in different frequency sub-ranges from the feature vectors (FV) of the second reference information (RI1, RI3, ARI2, ARI3); and
  
  a transformation matrix (T1-2, T1-3) is determined for transforming the first reference information (RI1) into second reference information (RI2, RI3) adapted to the second receive channel (4, 6), wherein said devices used by the plurality of second reference speakers are different than those used by the plurality of first reference speakers.

10. A computer program product as claimed in claim 9, characterized in that it is stored on a medium that can be read by a computer.

Current Assignee: Nuance Communications Austria Gmbh (Microsoft Corporation)
Original Assignee: Koninklijke Philips Electronics N.V. (Koninklijke Philips N.V.)
Inventors: Bartosik, Heinrich Franz
Primary Examiner(s): Azad, Abul K.
Application Number: US09/790,420
Publication Number: US 20010025240A1
Time in Patent Office: 2,112 Days
Field of Search: 704/233, 704/234, 704/203, 704/204, 704/224
US Class Current: 704/234
CPC Class Codes:
G10L 15/065 Adaptation
G10L 21/0216 characterised by the method...
