Speech recognition with language-dependent model vectors

US 7,630,878 B2
Filed: 05/04/2004
Issued: 12/08/2009
Est. Priority Date: 07/28/2003
Status: Active Grant

Abstract

Speaker-dependent speech recognition is performed upon detecting a speech signal encompassing a voice command. The speech signal is divided into time frames and characterized in each detected time frame by forming a corresponding property vector. A language-independent feature vector sequence is formed from one or several property vectors and then stored. The language-independent feature vector sequence is allocated to a language-dependent sequence of model vectors in a speech resource having a plurality of model vectors. A piece of allocation information indicating allocation of the language-independent feature vector sequence to a language-dependent sequence of model vectors is stored, then the voice command allocated to the model vector sequence is identified.
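
The processing chain in the abstract (framing, per-frame feature vectors, assignment to model vectors, stored assignment information) can be illustrated with a minimal Python sketch. This is not the patented implementation. The toy features (log energy and zero-crossing rate), the nearest-neighbour assignment by squared Euclidean distance, and every name used (`split_into_frames`, `model_vectors`, the command label, and so on) are assumptions made only for illustration.

```python
import math


def split_into_frames(samples, frame_len, hop):
    """Break the signal into overlapping time frames of frame_len samples."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def feature_vector(frame):
    """Toy 2-D feature (assumption): log energy and zero-crossing rate."""
    energy = sum(x * x for x in frame) / len(frame)
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return (math.log(energy + 1e-10), crossings / (len(frame) - 1))


def nearest_model_vector(feature, model_vectors):
    """Index of the model vector closest to the feature vector."""
    def sq_dist(i):
        return sum((f - m) ** 2 for f, m in zip(feature, model_vectors[i]))
    return min(range(len(model_vectors)), key=sq_dist)


def assign_sequence(features, model_vectors):
    """Map a feature vector sequence to a sequence of model vector indices."""
    return [nearest_model_vector(f, model_vectors) for f in features]


# Hypothetical input: a short synthetic tone standing in for a spoken command.
signal = [math.sin(2 * math.pi * i / 16) for i in range(64)]
frames = split_into_frames(signal, frame_len=16, hop=8)
features = [feature_vector(frame) for frame in frames]

# Hypothetical language resource: three model vectors in the same 2-D space.
model_vectors = [(-5.0, 0.0), (-0.7, 0.1), (0.0, 0.5)]

# Store the feature sequence together with the assignment information.
voice_tag = {
    "command": "call home",  # hypothetical command label
    "features": features,
    "assignment": assign_sequence(features, model_vectors),
}
```

Keeping `features` alongside `assignment` is the point of the sketch: the stored feature sequence does not depend on the language resource, so it can be assigned again later without a new recording.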

12 Claims

1. A method for speaker-dependent speech recognition, comprising:
- capturing a speech signal, including a speech command, of a speaker;
  
  breaking down the speech signal into time frames;
  
  characterizing the speech signal in each captured time frame by forming a corresponding feature vector;
  
  forming a language-independent feature vector sequence from at least one feature vector;
  
  storing the language-independent feature vector sequence;
  
  assigning the language-independent feature vector sequence to a language-dependent sequence of model vectors in a first language resource which includes a multiplicity of language-dependent model vectors;
  
  storing first assignment information which specifies assignment of the language-independent feature vector sequence to the language-dependent sequence of model vectors;
  
  recognizing the speech command which is assigned to the language-dependent sequence of model vectors;
  
  selecting a second language resource different from the first language resource;
  
  assigning the language-independent feature vector sequence previously stored to a language-dependent model vector sequence in the second language resource; and
  
  storing second assignment information regarding said assigning of the language-independent feature vector sequence to the language-dependent model vector sequence in the second language resource.
- Dependent claims (2 to 10):
  - 2. A method as claimed in claim 1, wherein the speech signal is made up of acoustic units.
  - 3. A method as claimed in claim 2, wherein each of the first and second language resources is based on a Hidden Markov Modeling of acoustic units of a speech signal.
  - 4. A method as claimed in claim 3, wherein an acoustic unit is formed by a word or a phoneme.
  - 5. A method as claimed in claim 3, wherein an acoustic unit is formed by word segments or groups of related phonemes.
  - 6. A method as claimed in claim 5, wherein different language resources are assigned to at least one of different languages and different language environments.
  - 7. A method as claimed in claim 6, wherein different language environments indicate different environmental noise situations.
  - 8. A method as claimed in claim 7, further comprising reducing dimensionality of the feature vector or the language-independent feature vector sequence by a matrix multiplication before assigning to the model vector or the model vector sequence.
  - 9. A method as claimed in claim 8, further comprising specifying the matrix for dimensional reduction from one of a Linear Discriminant Analysis, a Principal Component Analysis and an Independent Component Analysis.
  - 10. A method as claimed in claim 9, wherein the speaker-independent speech recognition is language-dependent.
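
Claims 8 and 9 describe reducing the dimensionality of a feature vector by a matrix multiplication before assignment. A minimal Python sketch of that single step follows. The projection matrix here is a hand-written placeholder, not one derived from a Linear Discriminant Analysis, Principal Component Analysis, or Independent Component Analysis, and the function name `reduce_dimension` is hypothetical.

```python
def reduce_dimension(vector, matrix):
    """Project an n-dimensional vector to len(matrix) dimensions.

    Each row of `matrix` has n weights; the result is the matrix-vector product.
    """
    if any(len(row) != len(vector) for row in matrix):
        raise ValueError("every matrix row must match the vector length")
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


# Placeholder 2x4 projection matrix (assumption, not a trained transform).
projection = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.5, 0.5, 0.0],
]

feature = [2.0, 4.0, 6.0, 8.0]
reduced = reduce_dimension(feature, projection)  # 4 dimensions down to 2
```

In practice the matrix would be estimated from training data by one of the analyses named in claim 9; the sketch only shows how a given matrix is applied.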

11. A communication device, comprising:
- a microphone recording a speech signal, including a speech command, of a speaker;
  
  a processor processing the speech signal by breaking down the speech signal into time frames, characterizing the speech signal in each captured time frame by forming a corresponding feature vector and forming a language-independent feature vector sequence from at least one feature vector;
  
  a storage unit storing the language-independent feature vector sequence obtained from the speech signal; and
  
  a speech recognition entity, coupled to the microphone, configured for at least speaker-dependent speech recognition by assigning the language-independent feature vector sequence to a language-dependent sequence of model vectors in a first language resource which includes a multiplicity of language-dependent model vectors, storing first assignment information which specifies assignment of the language-independent feature vector sequence to the language-dependent sequence of model vectors, recognizing the speech command which is assigned to the language-dependent sequence of model vectors, selecting a second language resource different from the first language resource, assigning the language-independent feature vector sequence previously stored to a language-dependent model vector sequence in the second language resource, and storing second assignment information corresponding thereto.
- Dependent claims (12):
  - 12. A communication device as claimed in claim 11, wherein said speech recognition entity simultaneously uses speaker-dependent and speaker-independent vocabularies.
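
The final steps of claims 1 and 11 (selecting a second language resource and re-assigning the previously stored feature vector sequence) can be sketched as follows. This is a hedged illustration only: the `VoiceTag` class, the resource names, the sample vectors, and the nearest-neighbour assignment rule are all assumptions, not details taken from the patent.

```python
def assign(features, model_vectors):
    """Assign each feature vector to the index of its nearest model vector."""
    def nearest(feature):
        return min(
            range(len(model_vectors)),
            key=lambda i: sum((f - m) ** 2
                              for f, m in zip(feature, model_vectors[i])),
        )
    return [nearest(f) for f in features]


class VoiceTag:
    """Stored command: language-independent features plus per-resource assignments."""

    def __init__(self, command, features):
        self.command = command
        self.features = features   # stored once, reused for every resource
        self.assignments = {}      # resource name -> model vector indices

    def assign_to(self, resource_name, model_vectors):
        """Create and store assignment information for one language resource."""
        self.assignments[resource_name] = assign(self.features, model_vectors)


# Hypothetical stored feature sequence and two hypothetical language resources.
tag = VoiceTag("call home", [(0.0, 0.0), (1.2, 1.2), (2.0, 2.0)])
resource_1 = [(0.0, 0.0), (2.0, 2.0)]
resource_2 = [(2.1, 2.1), (0.9, 0.9), (-0.2, 0.1)]

tag.assign_to("resource_1", resource_1)  # first assignment information
tag.assign_to("resource_2", resource_2)  # after switching resources, no re-recording
```

Because the feature sequence is stored independently of any resource, switching resources only requires recomputing the index sequence.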

Current Assignee: Cerence Operating Company (Cerence Inc.)
Original Assignee: SVOX AG (Microsoft Corporation)
Inventors: Stan, Sorel; Fingscheidt, Tim
Primary Examiner(s): Lerner, Martin
Application Number: US10/566,293
Publication Number: US 20070112568A1
Time in Patent Office: 2,044 Days
Field of Search: 704/8, 704/231, 704/233, 704/243, 704/255, 704/256, 704/277, 379/88.06
US Class Current: 704/8
CPC Class Codes:
G10L 15/063   Training
G10L 15/142   Hidden Markov Models [HMMs]
G10L 2015/223   Execution procedure of a sp...
