Interactive voice recognition method and apparatus using affirmative/negative content discrimination

US 5,899,972 A
Filed: 09/29/1995
Issued: 05/04/1999
Est. Priority Date: 06/22/1995
Status: Expired due to Term


1 Assignment

0 Petitions

Abstract

A technique for improving voice recognition in low-cost, speech-interactive devices. The technique implements an affirmative/negative discrimination unit in parallel with a word detection unit so that spoken commands or messages given in response to binary (yes/no) questions can be understood even when no recognizable words are found. Preferably, affirmative/negative discrimination uses either analysis of the spoken vowels or detection of negative language descriptors in the perceived message or command. Other facets include keyword identification within the perceived message or command; confidence match level comparison or correlation table compilation to increase the accuracy of word-based recognition; volume analysis; and inclusion of ambient environment information in generating responses to perceived messages or queries.
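
The abstract and claim 1 describe a controller that consults a recognition mode flag: when the flag is clear it acts on word detection results, and when it is set (after the device has asked a binary question) it relies only on the affirmative/negative signal, then resets the flag based on its reply. The Python sketch below is a minimal illustration under stated assumptions, not the patented implementation: the class `ConversationController`, the `yes_no_mode` and `pending_question` fields, the phrases, and the single "weather" question are all hypothetical, and the upstream word detector and yes/no discriminator are replaced by already-computed inputs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConversationController:
    """Toy dialogue controller; all names and phrases are hypothetical."""

    # Stands in for the claimed "recognition mode boolean":
    # False = word recognition (clear), True = affirmative/negative mode (set).
    yes_no_mode: bool = False
    # The binary question most recently asked (the "prior responses" context).
    pending_question: Optional[str] = None

    def respond(self, detected_word: Optional[str],
                affirmative: Optional[bool]) -> str:
        """detected_word: output of the word detector (None if nothing matched).
        affirmative: output of the yes/no discriminator (None if undecided)."""
        if self.yes_no_mode:
            # Mode set: ignore word detection and use only the yes/no signal.
            if affirmative is None:
                return "Please answer yes or no."  # mode stays set
            question, self.pending_question = self.pending_question, None
            self.yes_no_mode = False  # reset the mode after answering
            if question == "weather":
                return "It will be sunny today." if affirmative else "All right."
            return "OK." if affirmative else "Never mind."

        # Mode clear: act on the detected pre-registered expression.
        if detected_word == "good morning":
            # The reply is a binary question, so set the mode for the next turn.
            self.yes_no_mode = True
            self.pending_question = "weather"
            return "Good morning! Shall I tell you the weather?"
        if detected_word is None:
            return "Sorry, I did not understand."
        return f"You said '{detected_word}'."


if __name__ == "__main__":
    c = ConversationController()
    print(c.respond("good morning", None))
    # The word detector finds nothing in "uh-huh", but the yes/no unit says yes.
    print(c.respond(None, True))
```

The point of the flag is that a reply the word detector cannot match (an unregistered "uh-huh", say) is still usable, because in yes/no mode only its polarity matters.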

384 Citations

23 Claims

1. An interactive voice recognition apparatus, comprising:
- a voice input unit to receive voice and translate the received voice into digital form;
  
  a voice analysis unit in communication with said voice input unit to generate characteristic voice data for the received digitized voice;
  
  a word detection unit in communication with said voice analysis unit to determine whether the characteristic voice data substantially matches standard characteristic voice information corresponding to pre-registered expressions and generates detected expression data in response thereto;
  
  an affirmative/negative discrimination unit in communication with said voice analysis unit to characterize whether the characteristic voice data can be characterized as an affirmative or negative response and generates an affirmative/negative signal in response thereto;
  
  a voice comprehension and conversation control unit in communication with said word detection unit and said affirmative/negative discrimination unit to:
  
  interrogate a recognition mode boolean;
  
  receive the detected data generated by said word detection unit, determine a contextual meaning based on the received detected data, and formulate an appropriate response if the recognition mode boolean is clear;
  
  receive the affirmative/negative signal generated by said affirmative/negative discrimination unit and formulate the appropriate response based on the received affirmative/negative signal and prior responses if the recognition mode boolean is set; and
  
  reset the recognition mode boolean based on the formulated appropriate response; and
  
  a voice synthesizer in communication with said voice comprehension and conversation control unit to generate synthesized audio corresponding to the appropriate response formulated by said voice comprehension and conversation control unit.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
- - 2. The voice recognition apparatus of claim 1, further comprising:
    - a first memory in communication with said word detection unit to store standard characteristic voice information corresponding to the pre-registered expressions; and
      
      wherein said word detection unit:
      
      compares standard characteristic voice information associated with each pre-registered expression obtained from said first memory with the characteristic voice data generated by said voice analysis unit;
      
      generates detection data, comprising a numerical confidence match level, a starting detection time, and an ending detection time relative to the characteristic voice data, for each pre-registered expression; and
      
      transmits the generated detection data to said voice comprehension and conversation control unit.
  - 3. The voice recognition apparatus of claim 2, wherein, for a preselected time period relative to the characteristic voice data, said voice comprehension and conversation control unit:
    - identifies each pre-registered expression whose associated detection data match confidence level exceeds a predetermined minimum threshold as a potential recognition candidate; and
      
      selects an actual recognized candidate from the potential recognition candidate having the highest relative detection data match confidence level if more than one potential recognition candidate has been identified during the preselected time period.
  - 4. The voice recognition apparatus of claim 2, further comprising:
    - a second memory in communication with said voice comprehension and conversation control unit to store a correlation table; and
      
      wherein, for a preselected time period relative to the characteristic voice data, said voice comprehension and conversation control unit:
      
      identifies each pre-registered expression whose associated detection data match confidence level exceeds a predetermined minimum threshold as a potential recognition candidate; and
      
      if more than one potential recognition candidate has been identified during the preselected time period:
      
      compiles a correlation table based on a detection relationship between potential recognition candidates; and
      
      selects an actual recognition candidate based on the compiled correlation table.
  - 5. The voice recognition apparatus of claim 2, further comprising:
    - a second memory in communication with said voice comprehension and conversation control unit to store a plurality of expression context rules; and
      
      wherein said voice comprehension and conversation control unit:
      
      identifies each pre-registered expression whose associated detection data match confidence level exceeds a predetermined minimum threshold as a recognition candidate;
      
      if at least two recognition candidates have been identified, determines a relationship therebetween based on the expression context rules stored in said second memory; and
      
      formulates the appropriate response based on the determined relationship.
  - 6. The voice recognition apparatus of claim 2, further comprising:
    - a second memory in communication with said voice comprehension and conversation control unit to store a plurality of expression context rules; and
      
      wherein said voice comprehension and conversation control unit:
      
      identifies each pre-registered expression whose associated detection data match confidence level exceeds a predetermined minimum threshold as a recognition candidate;
      
      if at least two recognition candidates have been identified, determines whether a relationship therebetween exists based on the expression context rules stored in said second memory; and
      
      formulates an error message if no relationship has been determined.
  - 7. The voice recognition apparatus of claim 1, wherein said affirmative/negative discrimination unit detects the first occurrence of a vowel component in the characteristic voice data generated by said voice analysis unit and generates the affirmative/negative signal according to the detected vowel component.
  - 8. The voice recognition apparatus of claim 1, wherein said affirmative/negative discrimination unit detects the presence of negative language descriptors in the characteristic voice data generated by said voice analysis unit and generates the affirmative/negative signal if any negative language descriptors have been detected.
  - 9. The voice recognition apparatus of claim 1, wherein said voice analysis unit generates a volume signal extracted from the digitized perceived voice; and
    - said voice comprehension and conversation control unit selectively formulates the appropriate response responsive to the volume signal generated by said voice analysis unit.
  - 10. The voice recognition apparatus of claim 1, wherein said voice comprehension and conversation control unit disables said voice input unit when said voice synthesis unit is generating synthesized audio.
  - 11. The voice recognition apparatus of claim 1, wherein said voice comprehension and conversation control unit sets a dedicated recognition mode for subsequent word detection operations if, and only if, a first pre-registered expression having a predefined contextual relationship with a second pre-registered expression is detected by said word detection unit.
  - 12. The voice recognition apparatus of claim 1, further comprising:
    - a fluctuation data detection unit in communication with said voice comprehension and conversation control unit to measure and retain ambient fluctuation data including time, temperature, barometric pressure, date, and apparatus status information; and
      
      wherein said voice comprehension and conversation control unit receives the ambient fluctuation data from said fluctuation data detection unit and formulates the appropriate response based thereon.
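
Claims 2 and 3 describe detection data carrying a numerical confidence match level plus starting and ending detection times for each pre-registered expression, with the controller keeping only expressions above a minimum threshold within a preselected time period and choosing the highest-scoring one when several qualify. The Python sketch below is a minimal illustration under assumptions not stated in the claims: confidences lie on a 0 to 1 scale, times are in seconds, and "within the preselected time period" is read as the detection starting and ending inside a fixed window. `Detection` and `select_candidate` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Detection:
    """Detection data for one pre-registered expression (hypothetical layout)."""
    word: str
    confidence: float  # numerical confidence match level, assumed 0..1
    start: float       # starting detection time, assumed seconds
    end: float         # ending detection time, assumed seconds


def select_candidate(detections: List[Detection], threshold: float,
                     window_start: float, window_end: float) -> Optional[Detection]:
    """Return the highest-confidence detection above threshold inside the window."""
    # Potential recognition candidates: above the minimum threshold and
    # detected entirely within the preselected time period (an assumption).
    candidates = [
        d for d in detections
        if d.confidence > threshold
        and window_start <= d.start and d.end <= window_end
    ]
    if not candidates:
        return None  # no expression was recognized in this window
    # With several candidates, keep the one with the highest confidence level.
    return max(candidates, key=lambda d: d.confidence)


if __name__ == "__main__":
    frame = [
        Detection("good morning", 0.9, 0.1, 0.8),
        Detection("good night", 0.7, 0.1, 0.8),
        Detection("weather", 0.4, 1.0, 1.5),
    ]
    print(select_candidate(frame, 0.5, 0.0, 1.0).word)  # good morning
```

Claim 4 replaces the simple maximum with a correlation table built from how candidates co-occur, and claims 5 and 6 check candidate pairs against stored context rules; those variants would replace the final `max` step.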

13. An interactive voice recognition method, comprising the steps of:
- perceiving voice;
  
  translating the perceived voice into corresponding digital form;
  
  generating characteristic voice data for the perceived digitized voice;
  
  determining whether the characteristic voice data generated in said characteristic voice data generating step substantially matches standard characteristic voice information corresponding to pre-registered expressions;
  
  generating detected expression data if it is determined in said determining step that the characteristic voice data generated in said characteristic voice data generating step substantially matches standard characteristic voice information corresponding to at least one of the pre-registered expressions;
  
  characterizing whether the characteristic voice data generated in said characteristic voice data generating step constitutes either an affirmative or negative statement and generating a content characterization responsive thereto;
  
  assimilating a contextual meaning based on the detected expression data generated in said detected expression data generating step;
  
  based on a recognition mode, performing one of:
  
  formulating an appropriate response based on said assimilated contextual meaning assimilated in said assimilating step if the recognition mode is set for word recognition; and
  
  formulating the appropriate response based on the content characterization generated by said characterizing step if the recognition mode is set for affirmative/negative discrimination;
  
  resetting the recognition mode based on the formulated appropriate response; and
  
  synthesizing audio corresponding to the appropriate formulated response.
- Dependent Claims (14, 15, 16, 17, 18, 19, 20, 21, 22, 23)
- - 14. The voice recognition method of claim 13, wherein said determining step comprises comparing standard characteristic voice information associated with each pre-registered expression with the characteristic voice data generated in said characteristic voice data generating step; and
    - wherein said detected expression data generating step comprises generating a numerical confidence match level, a starting detection time, and an ending detection time relative to the generated characteristic voice data for each pre-registered expression.
  - 15. The voice recognition method of claim 14, wherein, for a preselected time period relative to the characteristic voice data, the steps of the method further comprise identifying each pre-registered expression whose associated detected expression data match confidence level exceeds a predetermined minimum threshold as a potential recognition candidate; and
    - selecting an actual recognized candidate from the potential recognition candidate having the highest relative detection data match confidence level if more than one potential recognition candidate has been identified.
  - 16. The voice recognition method of claim 14, wherein, for a preselected time period relative to the characteristic voice data, the steps of the method comprise:
    - identifying each pre-registered expression whose associated detected expression data match confidence level exceeds a predetermined minimum threshold as a potential recognition candidate; and
      
      if more than one potential recognition candidate has been identified:
      
      compiling a correlation table based on a detection relationship between potential recognition candidates; and
      
      selecting an actual recognition candidate based on the compiled correlation table.
  - 17. The voice recognition method of claim 14, further comprising:
    - identifying each pre-registered expression whose associated detected expression data match confidence level exceeds a predetermined minimum threshold as a recognition candidate;
      
      determining a relationship therebetween based on prestored expression context rules if at least two recognition candidates have been identified; and
      
      formulating the appropriate response based on the determined relationship, if any.
  - 18. The voice recognition method of claim 14, further comprising:
    - identifying each pre-registered expression whose associated detected expression data match confidence level exceeds a predetermined minimum threshold as a recognition candidate;
      
      determining whether a relationship therebetween exists based on prestored expression context rules if at least two recognition candidates have been identified; and
      
      formulating an error message if no relationship has been determined.
  - 19. The voice recognition method of claim 13, wherein said characterizing step comprises scanning for the first occurrence of a vowel component in the characteristic voice data generated in said characteristic voice data generating step and generating the content characterization according to the first vowel component.
  - 20. The voice recognition method of claim 13, wherein said characterizing step comprises scanning the characteristic voice data generated in said characteristic voice data generating step for the presence of negative language descriptors and indicating the content characterization as negative if any negative language descriptors have been detected.
  - 21. The voice recognition method of claim 13, further comprising:
    - extracting a volume level of the perceived voice; and
      
      selectively formulating the appropriate response with respect to the extracted volume level.
  - 22. The voice recognition method of claim 13, further comprising setting a dedicated recognition mode for subsequent word detection operations if, and only if, a first pre-registered expression having a predefined contextual relationship with a second pre-registered expression is assimilated in said contextual meaning assimilating step.
  - 23. The voice recognition method of claim 13, further comprising selectively augmenting the appropriate response with ambient fluctuation data including time, temperature, barometric pressure, date, and apparatus status information.
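
Claims 19 and 20 (mirroring apparatus claims 7 and 8) name two ways to characterize a reply as affirmative or negative: from the first vowel component, or from the presence of negative language descriptors. The claims operate on acoustic characteristic voice data; the Python sketch below deliberately simplifies this to a romanized text transcript. The vowel-to-polarity mapping (treating an initial "i", as in Japanese "iie", as negative) and the descriptor list are assumptions for illustration, not taken from the claims, and all function names are hypothetical.

```python
from typing import Optional

# Hypothetical inventories; the claims do not enumerate them.
VOWELS = "aeiou"
NEGATIVE_DESCRIPTORS = {"no", "not", "nope", "never", "don't", "iie", "iya"}


def first_vowel(transcript: str) -> Optional[str]:
    """Return the first vowel letter in the transcript, or None."""
    for ch in transcript.lower():
        if ch in VOWELS:
            return ch
    return None


def classify_by_first_vowel(transcript: str,
                            negative_vowels: frozenset = frozenset("i")) -> Optional[bool]:
    """Claim 19 style: True = affirmative, False = negative, None = undecided."""
    vowel = first_vowel(transcript)
    if vowel is None:
        return None
    # Assumed mapping: Japanese "iie"/"iya" (no) start with "i",
    # while "hai"/"ee"/"un" (yes) do not.
    return vowel not in negative_vowels


def classify_by_descriptors(transcript: str) -> bool:
    """Claim 20 style: negative if any negative descriptor appears."""
    tokens = transcript.lower().replace(",", " ").split()
    negative = any(t.strip(".!?") in NEGATIVE_DESCRIPTORS for t in tokens)
    return not negative  # otherwise treated as affirmative (an assumption)


if __name__ == "__main__":
    print(classify_by_first_vowel("hai"), classify_by_first_vowel("iie"))
    print(classify_by_descriptors("No, thanks"), classify_by_descriptors("Sure, go ahead"))
```

The vowel test needs no vocabulary at all, which suits a low-cost device, but it depends on the language's yes/no words differing in their first vowel; the descriptor test is language-agnostic in form but requires a registered list of negative words.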

Current Assignee
Seiko Epson Corporation (Seiko Group)
Original Assignee
Seiko Epson Corporation (Seiko Group)
Inventors
Miyazawa, Yasunaga; Inazumi, Mitsuhiro; Hasegawa, Hiroshi; Edatsune, Isao
Primary Examiner(s)
Dorvil, Richemond

Application Number

US08/536,550
Time in Patent Office

1,313 Days
Field of Search

395/2.79, 395/2.4, 395/2.44, 395/2.52, 395/2.55, 395/2.64, 704/270, 704/231, 704/235, 704/243, 704/246, 704/255, 704/256, 704/257, 704/258, 704/275, 704/254, 704/250, 704/249, 704/239, 704/240, 704/236
US Class Current

704/249
CPC Class Codes

G10L 15/22 Procedures used during a sp...

G10L 2015/225 Feedback of the input speech
