
Method for speech recognition on all languages and for inputing words using speech recognition

  • US 8,352,263 B2
  • Filed: 09/29/2009
  • Issued: 01/08/2013
  • Est. Priority Date: 09/17/2009
  • Status: Expired due to Fees
First Claim

1. A method for speech recognition on all languages and for inputting words, wherein a word is language independent and an unknown voice provides pronunciation of an unknown word, wherein m unknown voices having samples and a database of commonly-used known words not having samples is used, the method comprising:

    (a) using a pre-processor to delete noise and the time interval without speech signal;

    (b) normalizing the whole speech waveform of an unknown voice (or a word), using E equal elastic frames (windows) without filter and without overlap and to transform the waveform into an equal-sized E×P matrix, such that E is equal to P, of the linear predict coding cepstra (LPCC) such that the same unknown voices (or words) have about the same LPCC at the same time position in their equal-sized E×P matrices of LPCC;

    (c) for each unknown voice of m unknown voices, finding the sample mean and variance of linear predict coding cepstra (LPCC), a E×P matrix of sample means and variances representing an unknown voice and an unknown voice representing a category of known words with similar pronunciation to the unknown voice;

    (d) pronouncing with a speaker standard and clear utterance pronunciations of all words in the database wherein if the user pronunciations use different languages or dialects or with special accents, letting the user pronounce all the words;

    (e) normalizing the whole speech waveform of a pronounced word, using E equal elastic frames (windows) without filter and without overlap to transform the waveform into an equal-sized E×P matrix of linear predict coding cepstra (LPCC);

    (f) comparing with a simplified Bayesian classifier the E×P matrix of linear predict coding cepstra (LPCC) of the pronounced word and using Bayesian distance (similarity) to find the most similar unknown voice to the pronounced word, the pronounced word being put into the category of known words represented by its most similar unknown voice, all pronounced words being classified into m categories of known words, each category containing known words with similar pronunciations, wherein a pronounced word may be classified into several categories;

    (g) pronunciation by a user of a word, which is transformed into a E×P matrix of linear predict coding cepstra (LPCC);

    (h) finding with the simplified Bayesian classifier the F most similar unknown voices for the pronounced word, wherein the simplified Bayesian classifier uses the F least Bayesian distances (similarities) to the pronounced word to find the F most similar unknown voices;

    (i) representing all known words from F categories, wherein the F most unknown voices are arranged in a decreasing similarity according to their (absolute) distances (similarities) of the E×P matrices of LPCC of the known words from F categories to the matrix of LPCC of the pronounced word;

    (j) arranging all known words into F categories in a decreasing similarity and partitioning them into several equal segments, wherein each segment of known words is arranged in a line according to their alphabetic letters or the number of strokes of Chinese character, wherein all known words in F categories are arranged into a matrix according to their pronunciation similarity to the pronounced word and their alphabetic letters, the pronounced word being found in the matrix by using the pronunciation similarity and the alphabetic letters or number of strokes in Chinese;

    (k) recognizing a sentence or name within the voice;

    (l) recognizing unsuccessful words, unsuccessful sentences or names and providing help to input words;

    (m) representing the sample means and variances of m unknown voices using constants, which are independent of languages, accents, person and sex; and

    (n) using the Bayesian classifier to classify the word into several categories, using any language-independent word or any accent or any dialect to pronounce the word, even if pronounced incorrectly or completely wrong.
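
As an illustration of steps (b), (f) and (h), the following Python sketch shows one way to cut a waveform into E equal, non-overlapping elastic frames and to rank unknown voices by a Bayesian distance. The claim does not spell out the distance formula; the sketch assumes independent Gaussian features in each cell of the E×P matrix, so the distance is the sum over cells of ln σ + ½((x − μ)/σ)². The function names (`elastic_frames`, `bayes_distance`, `most_similar_voices`) are hypothetical, and the LPCC computation itself is omitted.

```python
import math


def elastic_frames(samples, E):
    """Split a waveform into E contiguous, non-overlapping frames.

    Frame length scales with the waveform length ("elastic"), so every
    utterance yields exactly E frames regardless of its duration.
    """
    n = len(samples)
    if n < E:
        raise ValueError("waveform is shorter than the number of frames")
    # Integer boundaries i*n//E cover the whole waveform with no gaps or overlap.
    bounds = [i * n // E for i in range(E + 1)]
    return [samples[bounds[i]:bounds[i + 1]] for i in range(E)]


def bayes_distance(x, mean, var):
    """Assumed distance: sum over cells of ln(sigma) + 0.5 * ((x - mu) / sigma)**2.

    x, mean and var are E x P nested lists; var holds variances (sigma**2),
    so ln(sigma) is written as 0.5 * ln(var).
    """
    total = 0.0
    for x_row, m_row, v_row in zip(x, mean, var):
        for xv, mv, vv in zip(x_row, m_row, v_row):
            total += 0.5 * math.log(vv) + 0.5 * (xv - mv) ** 2 / vv
    return total


def most_similar_voices(x, voices, F):
    """Return the names of the F voices with the smallest distance to x.

    voices maps a voice name to its (mean, var) pair of E x P matrices.
    The result is ordered from most similar to least similar.
    """
    ranked = sorted(voices, key=lambda name: bayes_distance(x, *voices[name]))
    return ranked[:F]
```

Computing the frame boundaries from the waveform length is what keeps the feature matrix the same size for short and long utterances; a fixed frame length would instead give a variable number of rows.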
