Method for Speech Recognition on All Languages and for Inputing words using Speech Recognition

US 20110066434A1
Filed: 09/29/2009
Published: 03/17/2011
Est. Priority Date: 09/17/2009
Status: Active Grant

Abstract

The invention can recognize all languages and input words. It needs m unknown voices to represent m categories of known words with similar pronunciations. Words can be pronounced in any language, dialect or accent. Each word is classified into one of the m categories, represented by its most similar unknown voice. When a user pronounces a word, the invention finds its F most similar unknown voices. All words in the F categories represented by those F unknown voices are arranged according to their pronunciation similarity and alphabetic letters. The pronounced word should be among the top words. Since the method only finds the F most similar unknown voices from m (=500) unknown voices, and since the same word can be classified into several categories, the recognition method is stable for all users and can quickly and accurately recognize all languages (English, Chinese, etc.) and input many more words without using samples.

6 Claims

1. A method for speech recognition on all languages and for inputing words using speech recognition, which provides speech recognition on all languages and a method to input words, comprising:
- (1). a word may be in English, Chinese or any other language, and an unknown voice is the pronunciation of an unknown word; the invention needs m unknown (or known) voices and a database of commonly-used known words; each unknown voice has samples and all known words have no samples;

  (2). a pre-processor to delete noise and the time intervals without speech signal;

  (3). a method to normalize the whole speech waveform of an unknown voice (or a word), using E equal elastic frames (windows) without filter and without overlap, and to transform the waveform into an equal-sized E×P matrix of linear predictive coding cepstra (LPCC), such that the same unknown voices (or words) have about the same LPCC at the same time position in their equal-sized E×P matrices of LPCC;

  (4). for each unknown voice of the m unknown voices, find the sample mean and variance of the linear predictive coding cepstra (LPCC); an E×P matrix of sample means and variances represents an unknown voice, and an unknown voice represents a category of known words with pronunciation similar to the unknown voice;

  (5). a speaker with standard and clear utterance pronounces all words in the database, and if the user pronounces using different languages or dialects or with special accents, let the user pronounce all words;

  (6). a method to normalize the whole speech waveform of a pronounced word, using E equal elastic frames (windows) without filter and without overlap, to transform the waveform into an equal-sized E×P matrix of linear predictive coding cepstra (LPCC);

  (7). a simplified Bayesian classifier to compare the E×P matrix of sample means and variances of LPCC of an unknown voice with the E×P matrix of LPCC of the pronounced word and to use the Bayesian distance (similarity) to find the most similar unknown voice to the pronounced word; the pronounced word is put into the category of known words represented by its most similar unknown voice; all pronounced words are classified into m categories of known words; each category contains known words with similar pronunciation; a pronounced word may be classified into several categories;

  (8). a user pronounces a word, which is transformed into an E×P matrix of linear predictive coding cepstra (LPCC);

  (9). the simplified Bayesian classifier finds the F most similar unknown voices for the pronounced word, i.e., the simplified Bayesian classifier uses the F smallest Bayesian distances (similarity) to the pronounced word to find the F most similar unknown voices;

  (10). all known words from the F categories represented by the F most similar unknown voices are arranged in decreasing similarity according to the (absolute) distances (similarity) of their E×P matrices of LPCC to the matrix of LPCC of the pronounced word; the word pronounced by the user should be among the several top known words (left-hand side);

  (11). all known words in the F categories, after being arranged in decreasing similarity, are partitioned into several equal segments; each segment of known words is arranged in a line according to their alphabetic letters (or the number of strokes of a Chinese character), i.e., all known words in the F categories are arranged into a matrix according to their pronunciation similarity to the pronounced word and their alphabetic letters; the pronounced word is easily found in the matrix by using the pronunciation similarity and the alphabetic letters (the number of strokes in Chinese) of the word pronounced by the user;

  (12). a method to recognize a sentence or name;

  (13). a skill to help recognize unsuccessful words, unsuccessful sentences or names, and help input words;

  (14). the sample means and variances of the m unknown voices are considered to be constants, which are independent of languages, accents, persons and sex; hence the recognition method of the invention is stable for all users and any user can easily use the invention to recognize and input a large number of words;

  (15). for the same word, a user can pronounce it in any language (English, Chinese, Japanese, German, etc.), accent or dialect, even incorrectly or completely wrong; the Bayesian classifier classifies the same word into several categories, hence a user can easily use the invention to recognize a word, a sentence or a name and to input a word.
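Steps (4) and (9) to (11) of claim 1 can be illustrated with a minimal Python sketch. It assumes the distances have already been computed elsewhere and are passed in as plain dictionaries. The function names (`voice_statistics`, `top_f_voices`, `arrange_candidates`) and the `segment_len` parameter are hypothetical and do not come from the patent, and the use of the unbiased sample variance is an assumption.

```python
from statistics import mean, variance
from typing import Dict, List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]  # an E x P matrix of LPCC values


def voice_statistics(samples: Sequence[Matrix]) -> Tuple[List[List[float]], List[List[float]]]:
    """Step (4): cell-wise sample mean and variance over one voice's samples.

    Needs at least two samples, because statistics.variance divides by n - 1.
    """
    e, p = len(samples[0]), len(samples[0][0])
    means = [[mean(s[j][l] for s in samples) for l in range(p)] for j in range(e)]
    variances = [[variance([s[j][l] for s in samples]) for l in range(p)] for j in range(e)]
    return means, variances


def top_f_voices(voice_distance: Dict[str, float], f: int) -> List[str]:
    """Step (9): the F voices with the smallest distance to the pronounced word."""
    return sorted(voice_distance, key=lambda v: voice_distance[v])[:f]


def arrange_candidates(word_distance: Dict[str, float], segment_len: int) -> List[List[str]]:
    """Steps (10)-(11): rank words by distance, cut the ranking into segments,
    then sort each segment alphabetically so it forms one row of the matrix.

    The last row may be shorter when the word count is not a multiple of segment_len.
    """
    ranked = sorted(word_distance, key=lambda w: (word_distance[w], w))
    segments = [ranked[i:i + segment_len] for i in range(0, len(ranked), segment_len)]
    return [sorted(segment) for segment in segments]


if __name__ == "__main__":
    print(top_f_voices({"v1": 3.2, "v2": 1.1, "v3": 2.0, "v4": 5.0}, 2))
    print(arrange_candidates({"hat": 0.4, "cat": 0.5, "cut": 0.9, "cot": 0.7}, 2))
```

Rows nearer the top hold the words closest in pronunciation, and the alphabetical order inside each row is what lets a user scan a row quickly. For Chinese characters the claim sorts by stroke count instead, which this sketch does not model.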
  - 2. A method for speech recognition on all languages and for inputing words using speech recognition of claim 1 wherein said step (2) further includes two methods to delete noise:
    - (a). in a unit time interval, compute the variance of sampled points in the unit time interval and if the variance is less than the variance of noise, delete the unit time interval;

      (b). in a unit time interval, compute the total sum of absolute distances between two consecutive sampled points and if the total sum of absolute distances is less than that of noise, delete the time interval.
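The two noise tests in claim 2 can be sketched as interchangeable measures applied to fixed-length units of the signal. This is only an illustration: the function names, the `unit_len` and `noise_level` parameters, and the choice of population variance are assumptions, since the claim does not say how the noise level is estimated or which variance estimator is meant.

```python
from statistics import pvariance
from typing import Callable, List, Sequence


def unit_variance(unit: Sequence[float]) -> float:
    """Method (a): variance of the sampled points in one unit time interval."""
    return pvariance(unit)


def unit_total_variation(unit: Sequence[float]) -> float:
    """Method (b): sum of absolute differences between consecutive sampled points."""
    return sum(abs(b - a) for a, b in zip(unit, unit[1:]))


def delete_noise(
    samples: Sequence[float],
    unit_len: int,
    measure: Callable[[Sequence[float]], float],
    noise_level: float,
) -> List[float]:
    """Drop every unit whose measure falls below the noise level; keep the rest in order."""
    kept: List[float] = []
    for start in range(0, len(samples), unit_len):
        unit = samples[start:start + unit_len]
        if measure(unit) >= noise_level:
            kept.extend(unit)
    return kept


if __name__ == "__main__":
    signal = [0.0, 0.1, 0.0, 0.1, 5.0, -5.0, 5.0, -5.0, 0.1, 0.0, 0.1, 0.0]
    print(delete_noise(signal, 4, unit_variance, 1.0))
    print(delete_noise(signal, 4, unit_total_variation, 1.0))
```

Both measures are cheap to compute per unit, and the noise threshold for each would have to be calibrated on a recording known to contain no speech.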
  - 3. A method for speech recognition on all languages and for inputing words using speech recognition of claim 1 wherein said step (3) further includes a method for normalization of the signal waveform of a word or an unknown voice into an equal-sized E×P matrix of linear predictive coding cepstra (LPCC):
    - (a). a method to uniformly partition the whole waveform of a word or an unknown voice into E equal sections, where each section forms an elastic frame (window) without filter and without overlap, such that the E equal elastic frames can contract and expand themselves to cover the whole waveform;

      (b). in each elastic frame, use a linear regression model to estimate the nonlinear time-varying waveform to produce a set of regression coefficients, i.e., linear predictive coding (LPC) coefficients, by the least squares method;

      (c). use Durbin's recursive equations
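The normalization in claim 3 can be sketched as follows. The claim text is cut off at step (c), so everything after the frame partition is an assumption. The sketch uses the textbook autocorrelation method with the Levinson-Durbin recursion to obtain P LPC coefficients per frame, then the standard LPC-to-cepstrum recursion c_n = a_n + Σ_{k=1}^{n-1} (k/n) c_k a_{n-k}. All function names are hypothetical.

```python
from typing import List, Sequence


def elastic_frames(samples: Sequence[float], e: int) -> List[Sequence[float]]:
    """Step (a): E contiguous frames, no window function, no overlap.

    Frame length scales with the waveform length, so any utterance yields E frames.
    """
    n = len(samples)
    bounds = [i * n // e for i in range(e + 1)]
    return [samples[bounds[i]:bounds[i + 1]] for i in range(e)]


def autocorrelation(frame: Sequence[float], p: int) -> List[float]:
    """Autocorrelation lags R(0)..R(P) of one frame."""
    return [sum(frame[n] * frame[n + k] for n in range(len(frame) - k)) for k in range(p + 1)]


def levinson_durbin(r: Sequence[float], p: int) -> List[float]:
    """Solve for a_1..a_P in s(n) ~ sum_k a_k s(n-k) by Durbin's recursion."""
    a = [0.0] * (p + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, p + 1):
        if err <= 0.0:  # silent or perfectly predicted frame: stop early
            break
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a[1:]


def lpc_to_cepstrum(a: Sequence[float]) -> List[float]:
    """Convert LPC coefficients a_1..a_P into cepstra c_1..c_P."""
    p = len(a)
    c = [0.0] * p
    for n in range(1, p + 1):
        acc = a[n - 1]
        for k in range(1, n):
            acc += (k / n) * c[k - 1] * a[n - k - 1]
        c[n - 1] = acc
    return c


def lpcc_matrix(samples: Sequence[float], e: int, p: int) -> List[List[float]]:
    """E x P matrix of LPCC: one row of P cepstra per elastic frame."""
    return [
        lpc_to_cepstrum(levinson_durbin(autocorrelation(frame, p), p))
        for frame in elastic_frames(samples, e)
    ]
```

Because the frame boundaries are proportional to the waveform length, a slow and a fast utterance of the same word both map to an E×P matrix with rows covering the same relative positions, which is the property step (3) of claim 1 relies on.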
  - 4. A method for speech recognition on all languages and for inputing words using speech recognition of claim 1 wherein said step (7) further includes a simplified Bayesian classifier to compare the E×P matrix of sample means and variances of LPCC of an unknown voice with the E×P matrix of linear predictive coding cepstra (LPCC) of the pronounced word:
    - (a). a pronounced word is represented by an E×P matrix of linear predictive coding cepstra (LPCC), denoted X={X_jl}, j=1, . . . , E, l=1, . . . , P;

      (b). assume that the E×P values {X_jl} are independent and have normal distributions;

      (c). if the pronounced word is compared with an unknown voice ω_i, i=1, . . . , m, then {X_jl} have the means and variances (μ_ijl, σ_ijl²), which are estimated by the sample means and sample variances of the samples of ω_i;

      (d). the density of X is
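The text of claim 4 stops at step (d), so the following is a sketch under the assumptions stated in (b) and (c) only. If the X_jl are independent normal variables with means μ_ijl and variances σ_ijl², the density of X under voice ω_i is the product of the E×P univariate normal densities. Taking the negative logarithm and dropping the constant term gives the score Σ ln σ_ijl + ½ Σ ((x_jl − μ_ijl)/σ_ijl)², which is smaller for more similar voices. Whether this is exactly the "Bayesian distance" the patent uses is an assumption, and the names `bayesian_distance` and `nearest_voices` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]  # E x P


def bayesian_distance(x: Matrix, mean: Matrix, var: Matrix) -> float:
    """Negative log-density of x under independent normals, constant term dropped.

    Every variance must be strictly positive.
    """
    total = 0.0
    for x_row, m_row, v_row in zip(x, mean, var):
        for value, m, v in zip(x_row, m_row, v_row):
            # 0.5 * ln(variance) equals ln(sigma)
            total += 0.5 * math.log(v) + 0.5 * (value - m) ** 2 / v
    return total


def nearest_voices(x: Matrix, voices: Dict[str, Tuple[Matrix, Matrix]], f: int) -> List[str]:
    """Names of the F voices with the smallest distance to x."""
    ranked = sorted(voices, key=lambda name: bayesian_distance(x, *voices[name]))
    return ranked[:f]


if __name__ == "__main__":
    x = [[0.0, 1.0]]
    voices = {
        "A": ([[0.0, 1.0]], [[1.0, 1.0]]),
        "B": ([[2.0, 1.0]], [[1.0, 1.0]]),
        "C": ([[0.0, 1.0]], [[4.0, 4.0]]),
    }
    print(nearest_voices(x, voices, 2))
```

The log-variance term penalizes voices whose features are widely spread, so a voice with matching means but large variances ranks below one that matches tightly.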
  - 5. A method for speech recognition on all languages and for inputing words using speech recognition of claim 1 wherein step (12) further includes a method to recognize a sentence and a name:
    - (a). a sentence or name is represented by a long sequence of sampled speech points; in each unit time interval, first compute the total sum of distances between any two consecutive points; if the total sum is less than the total sum of noise, the unit time interval does not have a speech signal; if the unit time intervals without speech signal accumulate to a certain amount (more than the time between two syllables in a word), there must be a border line between two pronounced words (a Chinese syllable is considered as a word with one syllable); the sentence or name is thus partitioned into D pronounced words;

      (b). in the sentence and name database, find the sentence or name uttered by a speaker; since a pronounced word may be partitioned into two words, pick up from the sentence and name database the matching sentences or names with D−1, D and D+1 known words for matching the sentence or name uttered by the user;

      (c). for each of the D pronounced words, find its F most similar unknown voices using the Bayesian classifier, i.e., the F E×P matrices of means and variances that have the F shortest Bayesian distances to the E×P matrix of LPCC representing the pronounced word; a sentence or name is represented by a D×F matrix of unknown voices;

      (d). for each pronounced word, arrange all known words from the F categories represented by its F most similar unknown voices in decreasing similarity according to the (absolute) distances (pronunciation similarity) of the matrices of LPCC of the known words from the F categories to the matrix of LPCC of the pronounced word; there are D lines of decreasingly similar known words, which should contain the sentence or name uttered by the user;

      (e). if a matching sentence or name in the sentence and name database has exactly D known words, then match each known word of the matching sentence or name with each line of decreasingly similar known words of the D lines in row order from the first row to the last one; if each row of decreasingly similar known words contains its corresponding known word of the matching sentence or name, then all D pronounced words are recognized correctly and hence the matching sentence or name is the sentence or name uttered by the speaker;

      (f). if a matching sentence or name in the sentence and name database does not have exactly D known words, or if in (e) at least one line of decreasingly similar known words does not have a known word of the matching sentence or name, use a screen window of 3 consecutive lines of decreasingly similar known words to find the sentence or name, i.e., the (i−1)-th, i-th and (i+1)-th lines of decreasingly similar known words in the screen window check the i-th known word of the matching sentence or name, and compute the probability (the number of known words of the matching sentence or name in the screen window divided by the total number of words in the matching sentence or name); the invention selects the matching sentence or name in the sentence and name database with the highest probability to be the sentence or name uttered by the user.
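Steps (a), (e) and (f) of claim 5 can be sketched in Python as below. The sketch assumes each unit time interval has already been labeled as speech or silence, and that the D lines of candidate words from step (d) are given as lists of strings. The names `split_words`, `window_score` and `recognize_sentence`, and the `min_gap` parameter, are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def split_words(speech_flags: Sequence[bool], min_gap: int) -> List[Tuple[int, int]]:
    """Step (a): cut wherever at least `min_gap` consecutive units are silent.

    Returns (start, end) unit indices per pronounced word, end exclusive.
    """
    words: List[Tuple[int, int]] = []
    start: Optional[int] = None
    silent = 0
    for i, is_speech in enumerate(speech_flags):
        if is_speech:
            if start is None:
                start = i
            silent = 0
        elif start is not None:
            silent += 1
            if silent >= min_gap:
                words.append((start, i - silent + 1))
                start, silent = None, 0
    if start is not None:
        words.append((start, len(speech_flags) - silent))
    return words


def window_score(candidate: Sequence[str], lines: Sequence[Sequence[str]]) -> float:
    """Step (f): fraction of candidate words found in lines i-1, i, i+1."""
    hits = 0
    for i, word in enumerate(candidate):
        lo, hi = max(0, i - 1), min(len(lines), i + 2)
        if any(word in lines[row] for row in range(lo, hi)):
            hits += 1
    return hits / len(candidate)


def recognize_sentence(
    lines: Sequence[Sequence[str]], database: Sequence[Sequence[str]]
) -> Optional[Sequence[str]]:
    """Steps (b), (e), (f): exact row-by-row match first, else best window score."""
    d = len(lines)
    pool = [s for s in database if max(1, d - 1) <= len(s) <= d + 1]
    for sentence in pool:
        if len(sentence) == d and all(w in line for w, line in zip(sentence, lines)):
            return sentence
    return max(pool, key=lambda s: window_score(s, lines), default=None)


if __name__ == "__main__":
    lines = [["new", "knew"], ["york", "yolk"]]
    database = [["old", "york"], ["new", "york", "city"]]
    print(recognize_sentence(lines, database))
```

The three-line window tolerates a segmentation that is off by one word, which is why step (b) keeps database entries with D−1, D and D+1 words in the candidate pool.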
  - 6. A method for speech recognition on all languages and for inputing words using speech recognition of claim 1 wherein step (13) further includes a skill to help recognize unsuccessful words, sentences and names and also help input unsuccessful words:
    - (a). if a pronounced word cannot be recognized, then the pronounced word is either not in the F categories represented by its F most similar unknown voices or not in any of the m categories;

      (b). if a pronounced word is not in the F categories represented by its F most similar unknown voices, the user pronounces the unsuccessful word again, the invention finds its most similar unknown voice, the pronounced word is relocated into its proper category represented by the most similar unknown voice, and the pronounced word will be recognized and input correctly;

      (c). if the pronounced word is not in any of the m categories, the user pronounces the new word again, the invention finds its most similar unknown voice, and the new pronounced word is added to the category represented by the most similar unknown voice;

      (d). if a sentence or a name is not recognized, the user utters the sentence or name again, the sentence or name is partitioned into D pronounced words, the invention finds the most similar unknown voice for each pronounced word, each pronounced word is relocated into its proper category represented by its most similar unknown voice, and then the sentence or name will be recognized correctly;

      (e). the invention only relocates unsuccessful words into other categories; it does not change any features (sample means and variances) of the m unknown voices; the sample means and variances of the m unknown voices are considered to be constants, which are independent of languages, accents, persons and sex; the recognition method of the invention is stable for all users and any user can easily use the invention to recognize and input words.
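The relocation described in claim 6 only touches category membership, never the voice statistics. A minimal sketch, assuming categories are stored as a dictionary from voice name to a set of words: the function name `relocate` and the `keep_existing` flag are hypothetical, and whether a relocated word should also stay in its previous categories is not stated in the claim, so the flag exposes both readings.

```python
from typing import Dict, Set


def relocate(
    categories: Dict[str, Set[str]],
    word: str,
    nearest_voice: str,
    keep_existing: bool = False,
) -> None:
    """Put `word` into the category of its most similar voice.

    Covers steps (b) and (c): an already-known word is moved, a new word is added.
    The voice means and variances are stored elsewhere and are not modified.
    """
    if not keep_existing:
        for members in categories.values():
            members.discard(word)
    categories.setdefault(nearest_voice, set()).add(word)


if __name__ == "__main__":
    categories = {"v1": {"cat"}, "v2": {"dog"}}
    relocate(categories, "cat", "v2")   # known word moved to a better category
    relocate(categories, "bird", "v3")  # new word added
    print(categories)
```

Since only the word lists change, a correction made for one user does not alter the distances computed for any other user, which matches the stability argument in step (e).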

Current Assignee: Li-Chuan Liao, Shih-Hon Li, Shih-Tzung Li, Tai-Jan Lee Li, Tze-Fen Li
Original Assignee: Li-Chuan Liao, Shih-Hon Li, Shih-Tzung Li, Tai-Jan Lee Li, Tze-Fen Li
Inventors: Liao, Li-Chuan; Li, Tai-Jan Lee; Li, Shih-Tzung; Li, Shih-Hon; Li, Tze-Fen
Granted Patent: US 8,352,263 B2
US Class Current: 704/241
CPC Class Codes: G10L 15/10 using distance or distortio...
