Bifurcated speaker specific and non-speaker specific speech recognition method and apparatus

US 6,070,139 A
Filed: 08/20/1996
Issued: 05/30/2000
Est. Priority Date: 08/21/1995
Status: Expired due to Term

Abstract

Bifurcated speaker specific and non-speaker specific method and apparatus is provided for enabling speech-based remote control and for recognizing the speech of an unspecified speaker at extremely high recognition rates regardless of the speaker's age, sex, or individual speech mannerisms. A device main unit is provided with a speech recognition processor for recognizing speech and taking an appropriate action, and with a user terminal containing specific speaker capture and/or preprocessing capabilities. The user terminal exchanges data with the speech recognition processor using radio transmission. The user terminal may be provided with a conversion rule generator that compares the speech of a user with previously compiled standard speech feature data and, based on this comparison result, generates a conversion rule for converting the speaker's speech feature parameters to corresponding standard speaker's feature information. The speech recognition processor, in turn, may reference the conversion rule developed in the user terminal and perform speech recognition based on the input speech feature parameters that have been converted above.
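
The conversion rule generator described in the abstract can be illustrated with a minimal sketch. The abstract does not state what form the conversion rule takes, so the per-dimension additive offset used here is an assumption chosen for brevity, and the names `generate_conversion_rule` and `convert_features` are hypothetical. The sketch compares a user's feature vectors with standard feature vectors for the same enrollment utterances and derives an offset that is then applied to new input.

```python
from statistics import fmean

def generate_conversion_rule(user_frames, standard_frames):
    """Return a per-dimension offset that shifts user features toward the standard ones.

    Assumption: the rule is a simple additive bias; the patent abstract does not
    specify the form of the rule.
    """
    if not user_frames or len(user_frames) != len(standard_frames):
        raise ValueError("need equally many, non-empty user and standard frames")
    dims = len(user_frames[0])
    return [
        fmean(s[d] for s in standard_frames) - fmean(u[d] for u in user_frames)
        for d in range(dims)
    ]

def convert_features(frame, rule):
    """Apply the stored conversion rule to one feature vector."""
    return [x + offset for x, offset in zip(frame, rule)]

# Toy enrollment data (made-up numbers, two-dimensional features).
user = [[1.0, 4.0], [3.0, 6.0]]
standard = [[2.0, 3.0], [4.0, 5.0]]

rule = generate_conversion_rule(user, standard)
converted = convert_features([2.0, 5.0], rule)
```

In this toy case the rule is computed once in the user terminal and reused for every later utterance, which mirrors the "generated in advance" wording of the claims.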

18 Claims

1. A speech recognition device, comprising:
- a data processing terminal, comprising:
  
  a speech input unit to receive sounds including speech and translate the received speech into digital form;
  
  a speech analyzer coupled to said speech input unit to generate voice feature parameters for the received digitized speech; and
  
  a speaker accommodation unit comprising:
  
  a first feature reference memory for storing pre-registered non-specific speaker feature information,
  
  a conversion rule generated in advance highlighting variations between previously stored specific speaker feature information and the pre-registered non-specific speaker feature information, and
  
  a feature converter for generating converted voice feature parameters received from said speech analyzer based on the conversion rule, and
  
  a speech recognition processor, comprising:
  
  a second feature reference memory for storing standard feature information corresponding to pre-registered phrases;
  
  a phrase detector to determine whether the converted voice feature parameters substantially match any pre-registered phrases in said second feature reference memory and generate phrase detection data in response thereto; and
  
  a comprehension controller coupled to said phrase detector to receive the phrase detection data, to recognize a meaning of the received speech based on the received phrase detection data, and to perform at least one of controlling an action and formulating an appropriate response responsive to the recognized meaning;
  
  wherein said data processing terminal transmits the converted voice feature parameters to said speech recognition processor which is in radio frequency communication with said data processing terminal to receive the converted voice feature parameters.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9)
  - 2. The speech recognition processor of claim 1, wherein said data processing terminal includes a radio frequency transmitter coupled to said speaker accommodation unit to transmit the converted voice feature parameters to said speech recognition processor and wherein said speech recognition processor includes a complementary radio frequency receiver in radio frequency communication with said terminal transmitter to receive the converted voice feature parameters.
  - 3. The speech recognition processor of claim 1, wherein said speaker accommodation unit further comprises a conversion rule generator for generating the conversion rule and a conversion rule memory coupled to the conversion rule generator for storing the conversion rule.
  - 4. The speech recognition processor of claim 3, wherein said conversion rule generator and said conversion rule memory are housed in a removable cartridge in releasable communication with said data processing terminal.
  - 5. The speech recognition processor of claim 4, wherein said conversion rule generator includes an input speaker codebook generator for generating in advance a mapping function highlighting variations between previously stored specific speaker information and the pre-registered non-specific speaker information, wherein said conversion rule memory includes a speaker codebook coupled to said input speaker codebook generator to retain the generated mapping function, and wherein said feature converter includes a vector quantization unit in communication with said speaker codebook to generate the converted voice feature parameters based on the retained mapping function.
  - 6. The speech recognition device of claim 5, wherein said input speaker codebook generator and said speaker codebook are housed in a removable cartridge in releasable communication with said data processing terminal.
  - 7. The speech recognition device of claim 5, wherein said speaker codebook comprises disparate input speaker and standard speaker codebooks.
  - 8. The speech recognition device of claim 1, wherein said speech recognition processor further comprises a speech synthesizer in communication with said comprehension controller to selectively generate synthesized audio corresponding to the appropriate response formulated by said comprehension controller, and a speech output unit in communication with said speech synthesizer to audibly reproduce the synthesized audio.
  - 9. The speech recognition device of claim 1, wherein said speech recognition processor further comprises a drive controller in communication with said comprehension controller for performing the appropriate action responsive to the recognizing meaning.
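
Claims 5 and 7 recite a vector quantization unit that works with input speaker and standard speaker codebooks. One way to picture this is the sketch below, which quantizes an incoming feature vector to its nearest input speaker codeword and emits the standard speaker codeword at the same index. The index-aligned pairing of the two codebooks is an assumption made for illustration; the claims only say that the speaker codebook retains a mapping function. The function names and the toy codebook values are hypothetical.

```python
import math

def nearest_index(frame, codebook):
    """Index of the codeword closest to the frame (Euclidean distance)."""
    return min(range(len(codebook)), key=lambda i: math.dist(frame, codebook[i]))

def convert_with_codebooks(frame, input_codebook, standard_codebook):
    """Map a speaker-specific frame onto the standard speaker's feature space.

    Assumption: codeword i of the input speaker codebook corresponds to
    codeword i of the standard speaker codebook.
    """
    return list(standard_codebook[nearest_index(frame, input_codebook)])

# Toy codebooks with two codewords each (made-up values).
input_codebook = [[0.0, 0.0], [10.0, 10.0]]
standard_codebook = [[1.0, 1.0], [8.0, 9.0]]

converted = convert_with_codebooks([9.0, 11.0], input_codebook, standard_codebook)
```

Because only the codebooks depend on the individual speaker, they are the natural contents of the removable cartridge mentioned in claims 4 and 6.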

10. A speech recognition device, comprising:
- a speech input unit to receive sounds including speech and translate the received speech into digital form;
  
  a speech analyzer coupled to said speech input unit to generate voice feature parameters for the received digitized speech;
  
  a data processing terminal including a speaker accommodation unit comprising:
  
  a first feature reference memory for storing pre-registered non-specific speaker feature information,
  
  a conversion rule generated in advance highlighting variations between previously stored specific speaker feature information and the pre-registered non-specific speaker feature information, and
  
  a feature converter for generating converted voice feature parameters received from said speech analyzer based on the conversion rule, and
  
  a speech recognition processor, comprising:
  
  a second feature reference memory for storing standard feature information corresponding to pre-registered phrases;
  
  a phrase detector to determine whether the converted voice feature parameters substantially match any pre-registered phrases in said second feature reference memory and generate phrase detection data in response thereto; and
  
  a comprehension controller coupled to said phrase detector to receive the phrase detection data, to recognize a meaning of the received speech based on the received phrase detection data, and to perform at least one of controlling an action and formulating an appropriate response responsive to the recognized meaning;
  
  wherein said speech analyzer transmits the voice feature parameters to said data processing terminal which is in radio frequency communication with said speech analyzer to receive the voice feature parameters.
- Dependent Claims (11, 12, 13, 14, 15, 16, 17, 18)
  - 11. The speech recognition device of claim 10, further comprising a first radio frequency transceiver coupled to said speech analyzer and said feature converter and in radio frequency communication with said data processing terminal to transmit the voice feature parameters to said terminal and receive conversion rule information therefrom, and wherein said terminal includes a complementary second radio frequency transceiver coupled to said speaker accommodation unit to enable bidirectional data exchange with said first transceiver.
  - 12. The speech recognition device of claim 10, wherein said speaker accommodation unit further comprises a conversion rule generator for generating the conversion rule and a conversion rule memory coupled to the conversion rule generator for storing the conversion rule.
  - 13. The speech recognition device of claim 12, wherein said conversion rule generator and said conversion rule memory are housed in a removable cartridge in releasable communication with said data processing terminal.
  - 14. The speech recognition device of claim 13, wherein said conversion rule generator includes an input speaker codebook generator for generating in advance a mapping function highlighting variations between previously stored specific speaker information and the pre-registered non-specific speaker information, wherein said conversion rule memory includes a speaker codebook coupled to said input speaker codebook generator to retain the generated mapping function, and wherein said feature converter includes a vector quantization unit in communication with said speaker codebook to generate the converted voice feature parameters based on the retained mapping function.
  - 15. The speech recognition device of claim 14, wherein said input speaker codebook generator and said speaker codebook are housed in a removable cartridge in releasable communication with said data processing terminal.
  - 16. The speech recognition device of claim 14, wherein said speaker codebook comprises disparate input speaker and standard speaker codebooks.
  - 17. The speech recognition device of claim 10, wherein said speech recognition processor further comprises a speech synthesizer in communication with said comprehension controller to selectively generate synthesized audio corresponding to the appropriate response formulated by said comprehension controller, and a speech output unit in communication with said speech synthesizer to audibly reproduce the synthesized audio.
  - 18. The speech recognition device of claim 10, further comprising a drive controller in communication with said comprehension controller for performing the appropriate action responsive to the recognized meaning.
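
Both independent claims recite a phrase detector that produces phrase detection data and a comprehension controller that acts on it. The sketch below is a rough illustration under several assumptions: each pre-registered phrase is reduced to a single template vector, the match score is `1 / (1 + distance)`, and "substantially match" is modeled as a score threshold of 0.5. None of these details, nor the phrases and responses used, come from the claims; they are hypothetical.

```python
import math

def detect_phrases(features, templates, threshold=0.5):
    """Return {phrase: score} for every template that substantially matches.

    Assumption: score = 1 / (1 + Euclidean distance), thresholded at 0.5.
    """
    detections = {}
    for phrase, template in templates.items():
        score = 1.0 / (1.0 + math.dist(features, template))
        if score >= threshold:
            detections[phrase] = score
    return detections

def comprehend(detections, responses):
    """Pick the best-scoring phrase and return the response registered for it."""
    if not detections:
        return None  # nothing recognized, so no action or response
    best = max(detections, key=detections.get)
    return responses.get(best)

# Hypothetical pre-registered phrases and responses.
templates = {"good morning": [1.0, 2.0], "good night": [5.0, 5.0]}
responses = {"good morning": "Good morning to you.", "good night": "Sleep well."}

detections = detect_phrases([1.0, 2.5], templates)
reply = comprehend(detections, responses)
```

In a device along the lines of claims 17 and 18, the returned response would be handed to a speech synthesizer or a drive controller rather than returned as a string.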

Current Assignee: Seiko Epson Corporation (Seiko Group)
Original Assignee: Seiko Epson Corporation (Seiko Group)
Inventors: Urano, Osamu; Edatsune, Isao; Miyazawa, Yasunaga; Inazumi, Mitsuhiro; Hasegawa, Hiroshi
Primary Examiner(s): Zele, Krista
Assistant Examiner(s): Opsasnick, Michael N.
Application Number: US08/699,874
Time in Patent Office: 1,379 Days
Field of Search: 704/246, 704/251, 704/256, 704/270, 704/275, 704/201
US Class Current: 704/275
CPC Class Codes:
G10L 15/065   Adaptation
G10L 15/30   Distributed recognition, e....
G10L 2015/088   Word spotting
