Method for speech recognition on all languages and for inputing words using speech recognition

US 8,352,263 B2
Filed: 09/29/2009
Issued: 01/08/2013
Est. Priority Date: 09/17/2009
Status: Expired due to Fees

Abstract

The invention can recognize all languages and input words. It needs m unknown voices to represent m categories of known words with similar pronunciations. Words can be pronounced in any language, dialect or accent. Each word is classified into one of the m categories, represented by its most similar unknown voice. When a user pronounces a word, the invention finds its F most similar unknown voices. All words in the F categories represented by those F unknown voices are arranged according to their pronunciation similarity and alphabetic letters. The pronounced word should be among the top words. Because only the F most similar unknown voices are selected from the m (=500) unknown voices, and because the same word can be classified into several categories, the recognition method is stable for all users and can quickly and accurately recognize all languages (English, Chinese, etc.) and input many more words without using samples.
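
The lookup described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: it assumes the "Bayesian distance" is the negative log-density of independent normal variables with constants dropped, and the function names (`bayesian_distance`, `top_f_voices`, `rank_candidates`), the toy 2×2 feature matrices, and the dictionary layout are all hypothetical.

```python
import numpy as np

def bayesian_distance(x, mean, var):
    """Assumed distance: negative log of an independent-normal density, constants dropped."""
    return float(np.sum(0.5 * np.log(var) + 0.5 * (x - mean) ** 2 / var))

def top_f_voices(x, means, variances, f):
    """Indices of the f unknown voices with the smallest distance to x."""
    distances = [bayesian_distance(x, mu, var) for mu, var in zip(means, variances)]
    return [int(i) for i in np.argsort(distances)[:f]]

def rank_candidates(x, means, variances, categories, word_features, f):
    """Collect known words from the f nearest categories, most similar first."""
    voices = top_f_voices(x, means, variances, f)
    words = {w for v in voices for w in categories[v]}
    # Rank by absolute distance between feature matrices; ties break alphabetically.
    return sorted(words, key=lambda w: (float(np.sum(np.abs(word_features[w] - x))), w))

# Toy data (hypothetical): m = 3 unknown voices with 2x2 feature matrices.
means = [np.zeros((2, 2)), np.ones((2, 2)), np.full((2, 2), 5.0)]
variances = [np.ones((2, 2)), np.ones((2, 2)), np.ones((2, 2))]
categories = {0: ["bat", "cat"], 1: ["cat", "hat"], 2: ["dog"]}  # a word may sit in several categories
word_features = {
    "bat": np.zeros((2, 2)),
    "cat": np.ones((2, 2)),
    "hat": np.full((2, 2), 0.5),
    "dog": np.full((2, 2), 5.0),
}
x = np.full((2, 2), 0.9)  # features of the word the user just pronounced

print(rank_candidates(x, means, variances, categories, word_features, f=2))
# prints ['cat', 'hat', 'bat']
```

Only the F nearest categories are searched, so "dog" never enters the candidate list even though it is in the database; this is the narrowing step that the abstract credits for the method's speed.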

28 Citations

4 Claims

1. A method for speech recognition on all languages and for inputting words, wherein a word is language independent and an unknown voice provides pronunciation of an unknown word, wherein m unknown voices having samples and a database of commonly-used known words not having samples is used, the method comprising:
- (a) using a pre-processor to delete noise and the time interval without speech signal;
  
  (b) normalizing the whole speech waveform of an unknown voice (or a word), using E equal elastic frames (windows) without filter and without overlap and to transform the waveform into an equal-sized E×P matrix, such that E is equal to P, of the linear predict coding cepstra (LPCC) such that the same unknown voices (or words) have about the same LPCC at the same time position in their equal-sized E×P matrices of LPCC;
  
  (c) for each unknown voice of m unknown voices, finding the sample mean and variance of linear predict coding cepstra (LPCC), a E×P matrix of sample means and variances representing an unknown voice and an unknown voice representing a category of known words with similar pronunciation to the unknown voice;
  
  (d) pronouncing with a speaker standard and clear utterance pronunciations of all words in the database wherein if the user pronunciations use different languages or dialects or with special accents, letting the user pronounce all the words;
  
  (e) normalizing the whole speech waveform of a pronounced word, using E equal elastic frames (windows) without filter and without overlap to transform the waveform into an equal-sized E×P matrix of linear predict coding cepstra (LPCC);
  
  (f) comparing with a simplified Bayesian classifier the E×P matrix of linear predict coding cepstra (LPCC) of the pronounced word and using Bayesian distance (similarity) to find the most similar unknown voice to the pronounced word, the pronounced word being put into the category of known words represented by its most similar unknown voice, all pronounced words being classified into m categories of known words, each category containing known words with similar pronunciations, wherein a pronounced word may be classified into several categories;
  
  (g) pronunciation by a user of a word, which is transformed into a E×P matrix of linear predict coding cepstra (LPCC);
  
  (h) finding with the simplified Bayesian classifier the F most similar unknown voices for the pronounced word, wherein the simplified Bayesian classifier uses the F least Bayesian distances (similarities) to the pronounced word to find the F most similar unknown voices;
  
  (i) representing all known words from F categories, wherein the F most unknown voices are arranged in a decreasing similarity according to their (absolute) distances (similarities) of the E×P matrices of LPCC of the known words from F categories to the matrix of LPCC of the pronounced word;
  
  (j) arranging all known words into F categories in a decreasing similarity and partitioning them into several equal segments, wherein each segment of known words is arranged in a line according to their alphabetic letters or the number of strokes of Chinese character, wherein all known words in F categories are arranged into a matrix according to their pronunciation similarity to the pronounced word and their alphabetic letters, the pronounced word being found in the matrix by using the pronunciation similarity and the alphabetic letters or number of strokes in Chinese;
  
  (k) recognizing a sentence or name within the voice;
  
  (l) recognizing unsuccessful words, unsuccessful sentences or names and providing help to input words;
  
  (m) representing the sample means and variances of m unknown voices using constants, which are independent of languages, accents, person and sex; and
  
  (n) using the Bayesian classifier to classify the word into several categories, using any language-independent word or any accent or any dialect to pronounce the word, even if pronounced incorrectly or completely wrong.
- - 2. The method of claim 1, wherein step (h) further includes a simplified Bayesian classifier to compare the E×P matrix of samples means and variances of LPCC of an unknown voice with the E×P matrix of linear predict coding cepstra (LPCC) of the pronounced word, further comprising;
    - wherein a pronounced word is represented by a E×P matrix of linear predict coding cepstra (LPCC), represented by X={X_jl}, j=1, . . . , E, l=1, . . . , P;
      
      wherein E×P and {X_jl} are independent and have normal distributions;
      
      wherein if the pronounced word is compared with an unknown voice ω_i, i=1, . . . , m, then {X_jl} has the means and variances (μ_ijl, σ²_ijl) which are estimated by the sample means and sample variances of the samples of ω_i;
      
      wherein the density of X is
  - 3. The method of claim 1, wherein step (k) of recognizing a sentence or name within the voice further comprises the steps of:
    - (k1) representing a sentence or name by a long sequence of speech sampled points, in a unit time interval, wherein the total sum of distances between any two consecutive points is computed, wherein if the total sum is less than the total sum of noise, the unit time interval does not have a speech signal, and wherein if the unit time intervals without speech signal are accumulated to an amount more than the time between two syllables in a word it is determined to be a border between two pronounced words (a Chinese syllable being considered as a word with one syllable), the sentence or name is partitioned into D pronounced words;
      
      (k2) finding the sentence or name uttered by a speaker in the sentence and name database, wherein since a pronounced word may be partitioned into two words, in the sentence and name database, picking up the matching sentences or names with D−1, D and D+1 known words for matching the sentence or name uttered by the user;
      
      (k3) for each of D pronounced words, finding its F most similar unknown voices using the Bayesian classifier, and the F E×P matrices of means and variances having the F shortest Bayesian distances to the E×P matrix of LPCC representing the pronounced word, a sentence or name being represented by a D×F matrix of unknown voices;
      
      (k4) for each pronounced word, arranging all known words from F categories represented by its F most similar unknown voices in a decreasing similarity according to their absolute distances (pronunciation similarity) of the matrices of LPCC of the known words from F categories to the matrix of LPCC of the pronounced word, there being D lines of decreasingly similar known words which contain the sentence or name uttered by the user;
      
      (k5) if a matching sentence or name in the sentence and name database exactly D known words, then matching each known word of the matching sentence or name with each line of decreasingly similar known words of D lines in a row order from the first row to the last row, if each row of decreasingly similar known words contains its corresponding known word of the matching sentence or name, then a number of D pronounced words being recognized correctly as the sentence or name uttered by the speaker;
      
      (k6) if a matching sentence or name in the sentence and name database does not have the exact number of D pronounced words, or if in (k5) at least one line of decreasingly similar known words does not have a known word or the matching sentence or name, using a screen window of 3 consecutive lines of decreasingly similar known words to find the sentence or name, and the (i−1)-th, i-th and (i+1)-th lines of decreasingly similar known words in the screen window checks the i-th known word of the matching sentence or name and computing the probability (the number of known words of the matching sentence or name in the screen window divided by total number of words in the matching sentence or name), wherein a matching sentence or name in the sentence and name database with the highest probability to be the sentence or name uttered by the user is selected.
  - 4. The method of claim 1, wherein step (l) of recognizing unsuccessful words, unsuccessful sentences or names and providing help to input words further comprises:
    - (l1) if a pronounced word cannot be recognized, then determining whether the pronounced word is not in F categories represented by its F most similar unknown voices or not in all m categories;
      
      (l2) if a pronounced word is not in the F categories represented by its F most similar unknown voices, receiving from the user a pronunciation of the unsuccessful word again, finding it most similar unknown voice, and relocating the pronounced word into its proper category represented by the most similar unknown voice, and recognizing the pronounced word and input correctly;
      
      (l3) if the pronounced word is not in all m categories, receiving from the user a pronunciation of the new word again, and finding its most similar unknown voice wherein the new pronounced word is added to the category represented by the most similar unknown voice;
      
      (l4) if a sentence or a name is not recognized, receiving from the user an utterance of the sentence or name again, partitioning the sentence or name into D pronounced words, finding the most similar unknown voice for each pronounced word, relocating the pronounced word into its proper category represented by its most similar unknown voice and then correctly recognizing the sentence or name;
      
      (l5) only relocating unsuccessful words into another category, and not changing any features (sample means and variances) of m unknown voices as the sample means and variances of m unknown voices are considered to be constants, which are independent of languages, accents, persons and sex, wherein the recognition method is stable for all users.
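
Steps (k2), (k5) and (k6) of claim 3 can be illustrated with a short Python sketch. It assumes each of the D pronounced words has already been reduced to a line of candidate known words, as in step (k4). The function names and the toy database are hypothetical, and clipping the 3-line window at the first and last lines is an assumption where the claim is silent.

```python
def exact_match(lines, sentence):
    """Step (k5): the sentence has D words and line i contains its i-th word."""
    return len(sentence) == len(lines) and all(
        word in line for word, line in zip(sentence, lines)
    )

def window_score(lines, sentence):
    """Step (k6): share of the sentence's words found in lines i-1, i, i+1."""
    hits = 0
    for i, word in enumerate(sentence):
        window = lines[max(0, i - 1): i + 2]  # clipped at the edges (assumption)
        if any(word in line for line in window):
            hits += 1
    return hits / len(sentence)

def best_sentence(lines, database):
    """Step (k2) filter, then (k5), then (k6) as a fallback."""
    d = len(lines)
    # Keep non-empty database entries with D-1, D or D+1 known words.
    pool = [s for s in database if s and d - 1 <= len(s) <= d + 1]
    for sentence in pool:
        if exact_match(lines, sentence):
            return sentence
    return max(pool, key=lambda s: window_score(lines, s), default=None)

# Toy example (hypothetical data): D = 3 lines of ranked candidates.
database = [["new", "york", "city"], ["new", "jersey"], ["york", "city"]]
lines = [["new", "knew"], ["york", "yolk"], ["city", "sitting"]]
print(best_sentence(lines, database))

# One word was split in two by the segmenter, so D = 4 and only (k6) applies.
split_lines = [["new"], ["yo", "you"], ["york", "rock"], ["city"]]
print(best_sentence(split_lines, database))
```

The window is what makes the match tolerant of segmentation errors: when a word is split, the correct candidate shifts by one line, and checking the neighbouring lines still finds it.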

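The normalization in steps (b) and (e) of claim 1 can be sketched as follows. This is an illustration under assumptions rather than the patent's own procedure: it uses the autocorrelation method with the Levinson-Durbin recursion for the LPC coefficients and the standard LPC-to-cepstrum recursion, neither of which is spelled out in the claims. The default E = P = 12 and the function names are hypothetical; the claim only requires E equal to P.

```python
import numpy as np

def lpc(frame, order):
    """LPC coefficients a[1..order], with s[n] ~ sum_k a[k] * s[n-k] (Levinson-Durbin)."""
    n = len(frame)
    r = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 1e-12:  # silent or perfectly predicted frame: stop early
            break
        k = (r[i] - np.dot(a[1:i], r[i - 1:0:-1])) / err
        prev = a.copy()
        a[i] = k
        a[1:i] = prev[1:i] - k * prev[i - 1:0:-1]
        err *= 1.0 - k * k
    return a[1:]

def lpcc(a):
    """Cepstra c[1..P] from LPC coefficients via c_n = a_n + sum_{k<n} (k/n) c_k a_{n-k}."""
    p = len(a)
    c = np.zeros(p)
    for n in range(1, p + 1):
        c[n - 1] = a[n - 1] + sum((k / n) * c[k - 1] * a[n - k - 1] for k in range(1, n))
    return c

def lpcc_matrix(signal, e=12, p=12):
    """Split the whole waveform into E equal, non-overlapping frames; return an E x P matrix."""
    signal = np.asarray(signal, dtype=float)
    if len(signal) < e * (p + 1):
        raise ValueError("signal too short for the requested E and P")
    # The frame length stretches with the utterance; no window function is applied.
    frames = np.array_split(signal, e)
    return np.vstack([lpcc(lpc(frame, p)) for frame in frames])

# Utterances of different lengths map to matrices of the same size.
rng = np.random.default_rng(0)
short_utterance = rng.standard_normal(2000)
long_utterance = rng.standard_normal(3500)
print(lpcc_matrix(short_utterance).shape, lpcc_matrix(long_utterance).shape)
# prints (12, 12) (12, 12)
```

Because the frame length scales with the utterance, row j of the matrix always describes the same relative time position, which is what lets matrices from fast and slow speakers be compared element by element.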
Current Assignee: Li-Chuan Liao, Shih-Hon Li, Shih-Tzung Li, Tai-Jan Lee Li, Tze-Fen Li
Original Assignee: Li-Chuan Liao, Shih-Hon Li, Shih-Tzung Li, Tai-Jan Lee Li, Tze-Fen Li
Inventors: Li, Tai-Jan Lee; Li, Shih-Tzung; Li, Shih-Hon; Liao, Li-Chuan; Li, Tze-Fen
Primary Examiner(s): Desir, Pierre-Louis
Assistant Examiner(s): Kovacek, David M
Application Number: US12/569,155
Publication Number: US 20110066434A1
Time in Patent Office: 1,197 Days
Field of Search: 704/1-10, 704/219-220, 704/232-245, 704/246-250, 704/251-257, 704/E17.001-E17.016, 704/E15.001-E15.05, 704/E11.001-E11.007
US Class Current: 704/243
CPC Class Codes: G10L 15/10 using distance or distortio...
