SELECTIVE SPEECH RECOGNITION FOR CHAT AND DIGITAL PERSONAL ASSISTANT SYSTEMS

US 20160027440A1
Filed: 03/15/2013
Published: 01/28/2016
Est. Priority Date: 03/15/2013
Status: Active Grant

Abstract

Disclosed are computer-implemented methods and systems for dynamic selection of speech recognition systems for use in Chat Information Systems (CIS) based on multiple criteria and the context of human-machine interaction. Specifically, once a first user audio input is received, it is analyzed to locate specific triggers, determine the context of the interaction, or predict the subsequent user audio inputs. Based on at least one of these criteria, one of a free-dictation recognizer, a pattern-based recognizer, an address book based recognizer, or a dynamically created recognizer is selected for recognizing the subsequent user audio input. The methods described herein increase the accuracy of automatic recognition of user voice commands, thereby enhancing the overall user experience of CIS, chat agents, and similar digital personal assistant systems.

32 Claims

1. A method for speech recognition in a chat information system (CIS), the method comprising:
- receiving, by a processor operatively coupled to a memory, an audio input;
  
  recognizing, by a first speech recognizer of a plurality of speech recognizers, a first part of the audio input to generate a first recognized input;
  
  identifying, by the processor, at least one trigger in the first recognized input;
  
  based on the identification, selecting, by the processor, a second speech recognizer of the plurality of speech recognizers; and
  
  recognizing, by the second speech recognizer, a second part of the audio input to generate a second recognized input.
  - 2. The method of claim 1, further comprising separating, by the processor, the audio input into a plurality of parts having at least the first part of the audio input and the second part of the audio input.
  - 3. The method of claim 2, wherein the separating of the audio input comprises recognizing, by one of the plurality of speech recognizers, at least a beginning part of the audio input to generate a recognized input.
  - 4. The method of claim 3, further comprising selecting, by the processor, the first speech recognizer based at least in part on the recognized input.
  - 5. The method of claim 1, wherein the at least one trigger includes a type of the audio input identified based at least in part on the first recognized input.
  - 6. The method of claim 5, wherein the type of the audio input includes a free speech input or a pattern-based speech input.
  - 7. The method of claim 6, wherein the pattern-based speech input includes at least one of the following:
    - a name, a nickname, a title, an address, and a number.
  - 8. The method of claim 1, wherein the first speech recognizer or the second speech recognizer includes a pattern-based speech recognizer.
  - 9. The method of claim 1, wherein the first speech recognizer or the second speech recognizer includes a free-dictation recognizer.
  - 10. The method of claim 1, wherein the first speech recognizer or the second speech recognizer includes an address book based recognizer.
  - 11. The method of claim 1, wherein the first speech recognizer or the second speech recognizer includes a dynamically created recognizer.
  - 12. The method of claim 1, further comprising combining, by the processor, the first recognized input and the second recognized input.
  - 13. The method of claim 1, further comprising generating, by the CIS, a response based at least in part on the first recognized input or the second recognized input.
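
The flow of claims 1, 2, and 12 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the audio input is modeled as a list of already-segmented word tokens, and the recognizers, the contact list, the trigger table, and every name below (`free_dictation`, `address_book`, `TRIGGERS`, `recognize_selectively`) are hypothetical stand-ins.

```python
from typing import Callable, Dict, List

Recognizer = Callable[[List[str]], str]

# Hypothetical stand-in recognizers; real ones would decode audio samples.
def free_dictation(part: List[str]) -> str:
    return " ".join(part)

ADDRESS_BOOK = ["Artem", "Ilya", "Olga"]  # assumed contact list

def address_book(part: List[str]) -> str:
    # Keep only tokens that match a known contact name (case-insensitive).
    names = {name.lower(): name for name in ADDRESS_BOOK}
    return " ".join(names[t.lower()] for t in part if t.lower() in names)

# Assumed trigger table: a trigger word found in the first recognized input
# selects the recognizer used for the second part of the same audio input.
TRIGGERS: Dict[str, Recognizer] = {"call": address_book, "note": free_dictation}

def recognize_selectively(audio: List[str], split_at: int = 1) -> str:
    # Separate the audio input into a first and a second part (claim 2).
    first_part, second_part = audio[:split_at], audio[split_at:]
    # Recognize the first part with the first recognizer.
    first_text = free_dictation(first_part)
    # Identify a trigger in the first recognized input, if any.
    trigger = next((w for w in first_text.lower().split() if w in TRIGGERS), None)
    # Select the second recognizer; fall back to free dictation.
    second = TRIGGERS.get(trigger or "", free_dictation)
    second_text = second(second_part)
    # Combine both recognized inputs (claim 12).
    return f"{first_text} {second_text}".strip()
```

The fixed `split_at` index is a simplification; claim 3 instead describes finding the split by recognizing the beginning of the audio input.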

14. A method for speech recognition in a CIS, the method comprising:
- receiving, by a processor operatively coupled with a memory, a first audio input;
  
  recognizing, by a first speech recognizer of a plurality of speech recognizers, at least a part of the first audio input to generate a first recognized input;
  
  receiving, by the processor, a second audio input;
  
  identifying, by the processor, at least one trigger in the first recognized input;
  
  based on the identification, selecting, by the processor, a second speech recognizer of the plurality of speech recognizers; and
  
  recognizing, by the second speech recognizer, at least a part of the second audio input to generate a second recognized input.
  - 15. The method of claim 14, wherein the at least one trigger includes a type of the first audio input, wherein the type of the audio input includes a free speech input or a pattern-based speech input.
  - 16. The method of claim 14, wherein the at least one trigger includes a predetermined word or phrase.
  - 17. The method of claim 14, wherein the at least one trigger includes a predetermined word pattern.
  - 18. The method of claim 14, wherein the at least one trigger includes an indication of a type of the second audio input.
  - 19. The method of claim 18, further comprising predicting, by the processor, the type of the second audio input based at least in part on one or more outputs generated by the CIS.
  - 20. The method of claim 18, wherein the prediction is based at least in part on a chat context between a user and the CIS.
  - 21. The method of claim 18, further comprising dynamically generating, by the processor, a pattern-based speech recognizer based at least in part on the prediction.
  - 22. The method of claim 14, wherein the first speech recognizer or the second speech recognizer includes one of the following:
    - a pattern-based speech recognizer, a free-dictation recognizer, an address book based recognizer, and a dynamically created recognizer.
  - 23. The method of claim 14, further comprising combining, by the processor, the first recognized input and the second recognized input.
  - 24. The method of claim 14, further comprising generating, by the processor, a response of the CIS based at least in part on the first recognized input or the second recognized input.
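
Claims 17 and 21 combine a word-pattern trigger in the first recognized input with a pattern-based recognizer generated on the fly for the second audio input. One way this might look is sketched below. The trigger patterns, the item lists, and the function names are all assumptions made for illustration, the second audio input is modeled as a rough text transcript, and fuzzy matching from the standard-library `difflib` module is swapped in for a real constrained-vocabulary recognizer.

```python
import difflib
import re
from typing import Callable, List, Optional

# Assumed word patterns acting as triggers (claim 17), each paired with the
# list of items the next utterance is predicted to come from.
TRIGGER_PATTERNS = [
    (re.compile(r"\bplay (a )?song\b", re.IGNORECASE),
     ["Yesterday", "Imagine", "Hey Jude"]),
    (re.compile(r"\bset (the )?language\b", re.IGNORECASE),
     ["English", "Russian", "German"]),
]

def make_pattern_recognizer(items: List[str]) -> Callable[[str], Optional[str]]:
    """Build a recognizer that can only return one of `items` (claim 21)."""
    lowered = {item.lower(): item for item in items}

    def recognize(noisy_text: str) -> Optional[str]:
        # Pick the closest allowed item, or None if nothing is close enough.
        match = difflib.get_close_matches(
            noisy_text.lower(), list(lowered), n=1, cutoff=0.6
        )
        return lowered[match[0]] if match else None

    return recognize

def select_second_recognizer(
    first_recognized: str,
) -> Optional[Callable[[str], Optional[str]]]:
    """Return a dynamically created recognizer when a trigger pattern matches."""
    for pattern, items in TRIGGER_PATTERNS:
        if pattern.search(first_recognized):
            return make_pattern_recognizer(items)
    return None  # the caller would fall back to free dictation
```

Restricting the second recognizer to a short list is what lets a noisy transcript still resolve to a valid item, which is the accuracy gain the abstract refers to.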

25. A method for speech recognition in a CIS, the method comprising:
- receiving, by a processor operatively coupled with a memory, a first audio input;
  
  recognizing, by a first speech recognizer of a plurality of speech recognizers, at least a part of the first audio input to generate a first recognized input;
  
  providing, by the processor, a response to the first recognized input utilizing the CIS;
  
  determining, by the processor, a type of the response;
  
  receiving, by the processor, a second audio input;
  
  based on the determination, selecting, by the processor, a second speech recognizer of the plurality of speech recognizers; and
  
  recognizing, by the second speech recognizer, at least a part of the second audio input to generate a second recognized input.
  - 26. The method of claim 25, wherein the selecting of the second speech recognizer includes selecting, by the processor, a free-dictation recognizer, when the type of response defines that the second audio input includes a free speech of a user.
  - 27. The method of claim 25, wherein the selecting of the second speech recognizer includes selecting, by the processor, a pattern-based recognizer, when the type of response defines that the second audio input includes a pattern-based speech of a user.
  - 28. The method of claim 25, wherein the selecting of the second speech recognizer includes selecting, by the processor, an address book based recognizer, when the type of response defines that the second audio input includes a name or nickname from a digital address book.
  - 29. The method of claim 25, wherein the selecting of the second speech recognizer includes selecting, by the processor, a dynamically created recognizer, when the type of response defines that the second audio input includes an item from a list storing items of the same type.
  - 30. The method of claim 25, wherein the response is generated by the CIS.
  - 31. The method of claim 25, further comprising generating, by the processor, a second response utilizing the CIS based at least in part on the second recognized input.
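
Claims 26 to 29 amount to a lookup from the type of the system's own response to the recognizer used for the user's next utterance. The sketch below shows that mapping under stated assumptions: the `ResponseType` categories, the recognizer names, and the keyword-based classifier are hypothetical, and a real CIS would more likely know the response type from its dialog state than infer it from text.

```python
import re
from enum import Enum, auto

class ResponseType(Enum):
    OPEN_QUESTION = auto()   # next input expected to be free speech (claim 26)
    PATTERN_PROMPT = auto()  # next input expected to follow a pattern (claim 27)
    CONTACT_PROMPT = auto()  # next input expected to be a contact name (claim 28)
    LIST_PROMPT = auto()     # next input expected to be a list item (claim 29)

# Assumed recognizer identifiers, one per response type.
RECOGNIZER_FOR_RESPONSE = {
    ResponseType.OPEN_QUESTION: "free_dictation",
    ResponseType.PATTERN_PROMPT: "pattern_based",
    ResponseType.CONTACT_PROMPT: "address_book",
    ResponseType.LIST_PROMPT: "dynamically_created",
}

def determine_response_type(response: str) -> ResponseType:
    """Crude keyword classifier for the CIS response (illustrative only)."""
    words = set(re.findall(r"[a-z']+", response.lower()))
    if words & {"who", "whom"}:
        return ResponseType.CONTACT_PROMPT
    if "which" in words:
        return ResponseType.LIST_PROMPT
    if words & {"number", "address"}:
        return ResponseType.PATTERN_PROMPT
    return ResponseType.OPEN_QUESTION

def select_recognizer(response: str) -> str:
    """Select the second recognizer based on the determined response type."""
    return RECOGNIZER_FOR_RESPONSE[determine_response_type(response)]
```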

32. A system for speech recognition, the system comprising:
- a communication module configured to receive one or more audio inputs;
  
  two or more speech recognizers configured to generate recognized inputs; and
  
  a decision making logic configured to identify at least one trigger in one of the recognized inputs and, based on the at least one trigger, select one of the two or more speech recognizers for performing speech recognition of at least a part of the one or more audio inputs;
  
  wherein the at least one trigger includes a type of the one or more audio inputs or prediction regarding a type of the one or more audio inputs.
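
The three components of claim 32 could be wired together along the following lines. This is only a structural sketch: the class names, the substring-based trigger rules, and the use of plain strings in place of audio are assumptions, and the recognizers are passed in as ordinary callables.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class CommunicationModule:
    """Receives audio inputs; here they are plain strings (assumption)."""
    inbox: List[str] = field(default_factory=list)

    def receive(self, audio: str) -> str:
        self.inbox.append(audio)
        return audio

@dataclass
class DecisionLogic:
    """Maps a trigger found in a recognized input to a recognizer name."""
    rules: Dict[str, str]
    default: str = "free_dictation"

    def select(self, recognized: str) -> str:
        for trigger, name in self.rules.items():
            if trigger in recognized.lower():
                return name
        return self.default

@dataclass
class SpeechSystem:
    comm: CommunicationModule
    recognizers: Dict[str, Callable[[str], str]]
    logic: DecisionLogic
    current: str = "free_dictation"

    def handle(self, audio: str) -> str:
        raw = self.comm.receive(audio)
        recognized = self.recognizers[self.current](raw)
        # A trigger in this result decides which recognizer handles the
        # next audio input.
        self.current = self.logic.select(recognized)
        return recognized
```

Keeping the selection state in `current` means each recognized input acts as the prediction for the type of the next one, which is one reading of the final clause of the claim.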

Current Assignee
Google LLC (Alphabet Inc.)
Original Assignee
OOO "Speaktoit"
Inventors
Goncharuk, Artem; Platonov, Ilya Andreevich; Gelfenbeyn, Olga Aleksandrovna; Gelfenbeyn, Ilya Genadevich; Sirotin, Pavel Aleksandrovich

Granted Patent

US 9,875,741 B2
US Class Current

1/1
CPC Class Codes

G10L 15/02   Feature extraction for spee...

G10L 15/07   to the speaker

G10L 15/22   Procedures used during a sp...

G10L 15/32   Multiple recognisers used i...

G10L 2015/088   Word spotting

G10L 2015/223   Execution procedure of a sp...

G10L 2015/228   of application context
