Interactive speech recognition combining speaker-independent and speaker-specific word recognition, and having a response-creation capability
First Claim
1. An interactive speech recognition apparatus, comprising:
a voice input unit to receive voice and translate the received voice into digital form;
a voice analysis unit in communication with said voice input unit to generate characteristic voice data for the received digitized voice;
a non-specific speaker word identification unit in communication with said voice analysis unit to determine whether the characteristic voice data substantially matches standard characteristic voice information corresponding to pre-registered expressions and generate identified non-specific speaker expression data in response thereto;
a specific speaker word enrollment unit in communication with said voice analysis unit to register individual expressions spoken by a specific speaker and generate identified specific speaker expression data based on the characteristic voice data;
a speech recognition and dialogue management unit in communication with said non-specific speaker word identification unit and said specific speaker word enrollment unit to receive the identified non-specific speaker expression data and the identified specific speaker expression data respectively therefrom, for selecting one of said non-specific speaker word identification unit and said specific speaker word enrollment unit to recognize a meaning from the received voice based on the received identified expression data, and to formulate an appropriate response from existing response data corresponding to the recognized meaning;
a response creation function in communication with said speech recognition and dialogue management unit to enable a user to create response data based on speaker input; and
a voice synthesizer in communication with said speech recognition and dialogue management unit to generate synthesized audio corresponding to the appropriate response formulated by said speech recognition and dialogue management unit.
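The arrangement recited in the claim above can be sketched as a small pipeline of cooperating units. This is a hypothetical illustration of the claimed structure, not an implementation from the patent: all class and method names are invented, features are represented as plain tuples, and an exact-equality check stands in for the claim's "substantially matches" test.

```python
# Hypothetical sketch of the claimed apparatus. Names and the exact-match
# comparison are assumptions; the patent does not specify an implementation.

class NonSpecificWordIdentifier:
    """Matches voice features against pre-registered, speaker-independent expressions."""
    def __init__(self, registered):
        self.registered = registered          # expression -> reference features

    def identify(self, features):
        for expr, ref in self.registered.items():
            if features == ref:               # stand-in for "substantially matches"
                return expr
        return None


class SpecificSpeakerEnroller:
    """Registers, and later recognizes, a specific speaker's own expressions."""
    def __init__(self):
        self.enrolled = {}

    def enroll(self, expr, features):
        self.enrolled[expr] = features

    def identify(self, features):
        for expr, ref in self.enrolled.items():
            if features == ref:
                return expr
        return None


class DialogueManager:
    """Selects one identification unit and formulates a response from response data."""
    def __init__(self, non_specific, specific, responses):
        self.non_specific = non_specific
        self.specific = specific
        self.responses = responses            # recognized expression -> response text

    def respond(self, features):
        # Try the speaker-independent unit first, then the speaker-specific one.
        expr = self.non_specific.identify(features) or self.specific.identify(features)
        if expr is None:
            return "unrecognized"
        return self.responses.get(expr, "no response registered")
```

In this sketch the response table passed to `DialogueManager` plays the role of the claim's "existing response data"; the response creation function of the claim would correspond to letting the user add entries to that table.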
1 Assignment
0 Petitions
Abstract
A technique for improving speech recognition in low-cost, speech-interactive devices. The technique implements a speaker-specific word enrollment and detection unit in parallel with a word detection unit, permitting comprehension of spoken commands or messages through binary questions when no recognizable words are found. Preferably, specific-speaker detection is based on the speaker's own personal list of words or expressions. Other facets include complementing non-specific, pre-registered word characteristic information with individual, speaker-specific verbal characteristics to improve recognition in cases where the speaker has unusual speech mannerisms or an accent, and response alteration, in which speaker-specific registration functions are leveraged to provide access to, and permit changes in, a predefined responses table according to user needs and tastes.
93 Citations
11 Claims
1. An interactive speech recognition apparatus, comprising:
a voice input unit to receive voice and translate the received voice into digital form;
a voice analysis unit in communication with said voice input unit to generate characteristic voice data for the received digitized voice;
a non-specific speaker word identification unit in communication with said voice analysis unit to determine whether the characteristic voice data substantially matches standard characteristic voice information corresponding to pre-registered expressions and generate identified non-specific speaker expression data in response thereto;
a specific speaker word enrollment unit in communication with said voice analysis unit to register individual expressions spoken by a specific speaker and generate identified specific speaker expression data based on the characteristic voice data;
a speech recognition and dialogue management unit in communication with said non-specific speaker word identification unit and said specific speaker word enrollment unit to receive the identified non-specific speaker expression data and the identified specific speaker expression data respectively therefrom, for selecting one of said non-specific speaker word identification unit and said specific speaker word enrollment unit to recognize a meaning from the received voice based on the received identified expression data, and to formulate an appropriate response from existing response data corresponding to the recognized meaning;
a response creation function in communication with said speech recognition and dialogue management unit to enable a user to create response data based on speaker input; and
a voice synthesizer in communication with said speech recognition and dialogue management unit to generate synthesized audio corresponding to the appropriate response formulated by said speech recognition and dialogue management unit.
- View Dependent Claims (2, 3, 4, 5, 6, 7)
8. An interactive speech recognition method, comprising the steps of:
perceiving voice;
translating the perceived voice into corresponding digital form;
generating characteristic voice data for the perceived digitized voice;
enrolling specific speaker expressions spoken by a specific speaker;
determining whether the characteristic voice data generated in said characteristic voice data generating step substantially matches standard characteristic voice information corresponding to at least one of: the specific speaker expressions enrolled in said specific speaker expression enrolling step; and non-specific speaker expressions prestored in memory;
generating identified expression data if it is determined in said determining step that the characteristic voice data generated in said characteristic voice data generating step substantially matches standard characteristic voice information by selecting one member of the group consisting of the specific-speaker and non-specific speaker expressions;
assimilating a contextual meaning based on the identified expression data generated in said identified expression data generating step;
formulating an appropriate response from existing response data based on said assimilated contextual meaning;
enabling a user to create response data based on speaker input; and
synthesizing audio corresponding to the appropriate formulated response.
- View Dependent Claims (9, 10, 11)
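The determining and selecting steps of the method claim can be sketched as a single best-match search over both expression stores. This is a minimal sketch under stated assumptions: features are numeric vectors, "substantially matches" is taken to mean Euclidean distance under a threshold, and the function name, stores, and threshold value are all invented for illustration.

```python
# Assumed interpretation of the claim's matching step: find the closest
# expression across the speaker-specific (enrolled) and non-specific
# (prestored) stores, accepting it only within a distance threshold.

def best_match(features, specific, non_specific, threshold=1.0):
    """Return (expression, source) for the closest stored expression within
    the threshold, else (None, None)."""
    def distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    best = (None, None)
    best_d = threshold
    for source, store in (("specific", specific), ("non-specific", non_specific)):
        for expr, ref in store.items():
            d = distance(features, ref)
            if d <= best_d:
                best, best_d = (expr, source), d
    return best
```

The returned `source` tag mirrors the claim's "selecting one member of the group consisting of the specific-speaker and non-specific speaker expressions."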
Specification