Speech recognition method

US 20030229497A1
Filed: 12/31/2002
Published: 12/11/2003
Est. Priority Date: 04/21/2000
Status: Active Grant

Abstract

In accordance with the present invention, a speech recognition method is disclosed. It uses a microphone to receive audible sounds input by a user into a first computing device having a program with a database consisting of (i) digital representations of known audible sounds and associated alphanumeric representations of the known audible sounds and (ii) digital representations of known audible sounds corresponding to mispronunciations resulting from known classes of mispronounced words and phrases. The method is performed by receiving the audible sounds in the form of the electrical output of the microphone. A particular audible sound to be recognized is converted into a digital representation of the audible sound. The digital representation of the particular audible sound is then compared to the digital representations of the known audible sounds in the database to determine which of those known audible sounds is most likely to be the particular audible sound. A speech recognition output consisting of the alphanumeric representation associated with the audible sound most likely to be the particular audible sound is then produced. An error indication is then received from the user indicating that there is an error in recognition. The user also indicates the proper alphanumeric representation of the particular audible sound. This allows the system to determine whether the error is a result of a known type or instance of mispronunciation. In response to a determination of error corresponding to a known type or instance of mispronunciation, the system presents an interactive training program from the computer to the user to enable the user to correct such mispronunciation.

13 Claims

1. A method of recognizing spoken language, comprising:
- (a) collecting a plurality of spoken language samples for a first word or phrase, each of said plurality of samples being associated with a language color, such as a regional pronunciation, a mispronunciation, an emotion or Lessac energy;
  
  (b) processing said spoken language samples into a database suitable for input into a speech recognition algorithm, said database comprising a plurality of graphemes, each of said graphemes comprising a written representation of its associated spoken language sample, said database further comprising a color associated with at least some of said graphemes;
  
  (c) receiving speech to be recognized and processing said speech to be recognized for input into a speech recognition algorithm;
  
  (d) inputting a first portion of said processed speech to be recognized into a speech recognition algorithm, said speech recognition algorithm selecting among processed spoken language samples of different color but corresponding to the same graphemes to obtain a best match to a particular processed language sample and an associated recognized grapheme;
  
  (e) outputting said recognized graphemes;
  
  (f) inputting a second portion of said processed speech into the same speech recognition algorithm, said speech recognition algorithm, at least initially, limiting its selection among processed spoken language samples to processed spoken language samples of a single color, to efficiently obtain a best match to a particular processed language sample and additional associated recognized graphemes; and
  
  (g) outputting said additional recognized graphemes.
- Dependent claims (2, 3):
  - 2. A method as in claim 1, wherein said spoken language samples are collected by random collection of speech samples and segregating them according to color.
  - 3. A method as in claim 1, wherein said spoken language samples are collected by repeated generation of the same graphemes with different color using trained speakers, said trained speakers speaking the words with the desired colors.
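
Steps (d) through (g) of claim 1 can be illustrated with a small sketch. It is not the patented implementation: the `Sample` class, the color labels, the two-number feature tuples, the Euclidean distance, and the `max_distance` fallback threshold are all assumptions made for illustration. The first portion of speech is matched against samples of every color; later portions are matched, at least initially, only against samples of the color found so far, with a fall back to the full database when the restricted match is poor.

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class Sample:
    grapheme: str    # written representation of the spoken sample
    color: str       # hypothetical color label, e.g. a regional pronunciation
    features: tuple  # toy stand-in for the processed speech sample


def best_match(features, samples):
    """Return the stored sample closest to the input features."""
    return min(samples, key=lambda s: math.dist(features, s.features))


def recognize(portions, database, max_distance=1.0):
    """Match each portion of speech, narrowing to one color after the first match."""
    graphemes = []
    color = None
    for features in portions:
        match = None
        if color is not None:
            # At least initially, search only samples of the current color.
            same_color = [s for s in database if s.color == color]
            candidate = best_match(features, same_color)
            if math.dist(features, candidate.features) <= max_distance:
                match = candidate
        if match is None:
            # First portion, or a poor restricted match: search every color.
            match = best_match(features, database)
            color = match.color
        graphemes.append(match.grapheme)
    return graphemes, color


# Hypothetical toy database: two graphemes, each recorded in two colors.
database = [
    Sample("hello", "neutral", (0.0, 0.0)),
    Sample("hello", "southern", (1.0, 0.0)),
    Sample("world", "neutral", (0.0, 5.0)),
    Sample("world", "southern", (1.0, 5.0)),
]
result = recognize([(0.9, 0.1), (0.8, 5.1)], database)
```

In this toy run the first portion is closest to the "southern" sample of "hello", so the second portion is compared only against the "southern" samples, which halves the search space while still finding "world".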

4. A method of speech recognition using a microphone to receive audible sounds input by a user into a computing device coupled to said microphone, said computing device having a program with database information comprising (i) digital representations of known audible sounds corresponding to proper pronunciations of phonemes and associated alphanumeric representations of said known audible sounds corresponding to proper pronunciations of phonemes, forming a first database, and (ii) digital representations of known audible sounds corresponding to mispronunciations, forming a second database, the method comprising the steps of:
- (a) receiving said audible sounds in the form of an electrical output of said microphone;
  
  (b) converting said electrical output corresponding to a particular audible sound into a digital representation of said particular audible sound;
  
  (c) comparing said digital representation of said particular audible sound to said digital representations of said known audible sounds in said first and second databases to determine a match with the one of said known audible sounds most likely to be the particular audible sound being compared to the sounds in said database; and
  
  (d) outputting as a speech recognition output the alphanumeric representations associated with said audible sound most likely to be said particular audible sound.
- Dependent claims (5):
  - 5. A method as in claim 4, further comprising:
    - (e) outputting an error indication in response to a match with a known audible sound corresponding to a known mispronunciation;
      
      (f) in response to a determination of error corresponding to a known type or instance of mispronunciation, giving the user the option of receiving speech training or training said program to recognize the user's speech pattern; and
      
      (g) in response to exercise of said option, presenting an interactive training program from said computing device to said user to enable said user to correct such mispronunciation.
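
The comparison and error-indication steps of claims 4 and 5 can be sketched as a nearest-match search over both databases. This is a minimal sketch under stated assumptions: the `KnownSound` class, the toy feature tuples, and the idea that each mispronunciation entry carries the alphanumeric representation of the intended sound are hypothetical and are not specified by the claims.

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class KnownSound:
    text: str                       # alphanumeric representation to output
    features: tuple                 # toy digital representation of the sound
    mispronunciation: bool = False  # True for entries of the second database


def recognize_sound(features, proper_db, mispronunciation_db):
    """Return (text, error_flag) for the closest known sound in either database."""
    candidates = list(proper_db) + list(mispronunciation_db)
    best = min(candidates, key=lambda k: math.dist(features, k.features))
    # The flag plays the role of the error indication of claim 5, step (e).
    return best.text, best.mispronunciation


# Hypothetical entries: two proper pronunciations and one known mispronunciation.
proper_db = [
    KnownSound("three", (0.0, 0.0)),
    KnownSound("tree", (4.0, 0.0)),
]
mispronunciation_db = [
    KnownSound("three", (1.5, 0.0), mispronunciation=True),
]

flagged = recognize_sound((1.4, 0.0), proper_db, mispronunciation_db)
clean = recognize_sound((0.2, 0.0), proper_db, mispronunciation_db)
```

Here `flagged` matches the mispronunciation entry, so the intended text is still output but the caller learns that training could be offered; `clean` matches a proper pronunciation and raises no error indication.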

6. A method of speech recognition using a microphone to receive audible sounds input by a user into a computing device coupled to said microphone, said computing device having a program with database information comprising (i) digital representations of known audible sounds corresponding to proper pronunciations of phonemes and associated alphanumeric representations of said known audible sounds corresponding to proper pronunciations of phonemes, forming a first database, and (ii) digital representations of known audible sounds corresponding to mispronunciations, forming a second database, the method comprising the steps of:
- (a) generating said database information by (i) having a person, who normally speaks said known audible sounds properly, speak said properly pronounced known audible sounds, and digitizing said properly pronounced known audible sounds spoken by said person who properly speaks said known audible sounds to form a first database of digital representations of said properly pronounced known audible sounds and (ii) having a person who usually speaks said known audible sounds corresponding to mispronunciations and digitizing said known mispronounced audible sounds spoken by said person who usually speaks said known mispronounced audible sounds corresponding to mispronunciations to form a second database;
  
  (b) receiving said audible sounds in the form of an electrical output of said microphone receiving speech to be recognized;
  
  (c) converting said electrical output corresponding to a particular audible sound into a digital representation of said particular audible sound to be recognized;
  
  (d) comparing said digital representation of said particular audible sound to be recognized to said digital representations of said known audible sounds in said first and second databases to determine a match with the one of said known audible sounds most likely to be the particular audible sound to be recognized being compared to the sounds in said database; and
  
  (e) outputting as a speech recognition output the alphanumeric representations associated with said audible sound most likely to be said particular audible sound.
- Dependent claims (7, 8, 9, 10, 11, 12, 13):
  - 7. A method as in claim 6, further comprising:
    - (f) outputting an error indication in response to a match with a known audible sound corresponding to a known mispronunciation; and
      
      (g) in response to a determination of error corresponding to a known mispronunciation, presenting an interactive training program from said computing device to said user to enable said user to correct such mispronunciation.
  - 8. A method as in claim 6, further comprising:
    - (e) outputting an error indication in response to a match with a known audible sound corresponding to a known mispronunciation; and
      
      (f) in response to a determination of error corresponding to a known mispronunciation, presenting an interactive training program from said computing device to said user to enable said user to correct such mispronunciation using Lessac System techniques.
  - 9. A method as in claim 6 further comprising:
    - (e) outputting an error indication in response to a match with a known audible sound corresponding to a known mispronunciation; and
      
      (f) in response to the detection of repeated instances or a reliable single instance of pronunciation error, presenting an interactive training program from said computer to said user to enable said user to correct such mispronunciation.
  - 10. A method of speech recognition as in claim 8, wherein said presenting an interactive training program from said computer to said user to enable said user to correct such mispronunciation is optional and is performed when elected by the user.
  - 11. A method of speech recognition as in claim 8, wherein said user is presented with an interactive training program in response to the detection of repeated instances or a reliable single instance of pronunciation error.
  - 12. A method of speech recognition as in claim 8, wherein said user is presented with an interactive training program in response to the detection of repeated instances or a reliable single instance of pronunciation error.
  - 13. A method of speech recognition as in claim 8, wherein said database information comprising (i) digital representations of known audible sounds corresponding to proper pronunciations of phonemes and associated alphanumeric representations of said known audible sounds corresponding to proper pronunciations of phonemes and (ii) digital representations of known audible sounds corresponding to mispronunciations is formed by (i) having a person, who normally speaks said known audible sounds properly, speak said known audible sounds, and digitizing said known audible sounds spoken by said person who properly speaks said known audible sounds; and (ii) having a person who usually speaks said known audible sounds corresponding to mispronunciations and digitizing said known audible sounds spoken by said person who usually speaks said known audible sounds corresponding to mispronunciations.
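
Claims 9, 11 and 12 present training in response to "repeated instances or a reliable single instance" of a pronunciation error. One way to read that condition is sketched below. The `TrainingTrigger` class, the per-mispronunciation counter, the `repeat_threshold` of 3, and the use of a match confidence of 0.95 as the test for a "reliable" instance are all assumptions; the claims do not define these values.

```python
from collections import Counter


class TrainingTrigger:
    """Decide when to present the interactive training program (hypothetical policy)."""

    def __init__(self, repeat_threshold=3, reliable_confidence=0.95):
        self.repeat_threshold = repeat_threshold
        self.reliable_confidence = reliable_confidence
        self.counts = Counter()

    def observe(self, mispronunciation_id, confidence):
        """Record one detected mispronunciation; return True if training should be presented."""
        self.counts[mispronunciation_id] += 1
        reliable_single = confidence >= self.reliable_confidence
        repeated = self.counts[mispronunciation_id] >= self.repeat_threshold
        return reliable_single or repeated


trigger = TrainingTrigger()
decisions = [trigger.observe("th_as_f", 0.6) for _ in range(3)]
single = trigger.observe("r_as_w", 0.99)
```

With these assumed thresholds, a low-confidence error triggers training only on its third occurrence, while a single high-confidence error triggers it immediately. Per claim 10, the caller could still leave the final choice to the user.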

Current Assignee: Lessac Technologies, Inc.
Original Assignee: Lessac Technologies Incorporated
Inventors: Wilson, H. Donald; Marple, Gary; Handal, Anthony H.; Lessac, Michael

Granted Patent: US 7,280,964 B2
US Class Current: 704/270.1
CPC Class Codes

G09B 19/04   Speaking with audible prese...

G09B 21/00   Teaching, or communicating ...

G09B 5/04   with audible presentation o...

G10L 13/10   Prosody rules derived from ...

G10L 15/063   Training

G10L 15/187   Phonemic context, e.g. pron...

G10L 2015/0638   Interactive procedures
