System and method for performing dual mode speech recognition

US 9,330,669 B2
Filed: 02/12/2015
Issued: 05/03/2016
Est. Priority Date: 11/18/2011
Status: Active Grant

Abstract

A system and method is presented for performing dual mode speech recognition, employing a local recognition module on a mobile device and a remote recognition engine on a server device. The system accepts a spoken query from a user, and both the local recognition module and the remote recognition engine perform speech recognition operations on the query, returning a transcription and confidence score, subject to a latency cutoff time. If both sources successfully transcribe the query, then the system accepts the result having the higher confidence score. If only one source succeeds, then that result is accepted. In either case, if the remote recognition engine does succeed in transcribing the query, then a client vocabulary is updated if the remote system result includes information not present in the client vocabulary.

17 Claims

1. A method for performing dual mode speech recognition, comprising:
- receiving a spoken query from a user;
  
  processing the spoken query, including;
  
  sending the spoken query to a local recognition system on a mobile device;
  
  transmitting the spoken query to a remote recognition system via a communications link; and
  
  setting a latency timer period to a preset timeout value;
  
  in the event that the spoken query is not recognized by either the local recognition system or the remote recognition system within the latency timer period, choosing recognition failure as a final result;
  
  in the event that the spoken query is recognized by both the local recognition system and the remote recognition system within the latency timer period, obtaining a recognition result and associated recognition score from both the local recognition system and the remote recognition system and choosing the recognition result associated with the higher recognition score as the final result;
  
  in the event that the spoken query is recognized by only the local recognition within the latency timer period, obtaining a recognition result from the local recognition system, and choosing the local recognition result as the final result;
  
  in the event that the spoken query is recognized by only the remote recognition system within the latency timer period, obtaining a recognition result from the remote recognition system, and choosing the remote recognition result as the final result;
  
  taking action on behalf of the user based on the final result.
- - 2. The method of claim 1, wherein:
    - the local recognition system maintains a client vocabulary programmed to describe words or phrases available to be recognized.
  - 3. The method of claim 2, further comprising:
    - determining that the remote recognition system contains vocabulary information that is not contained in the client vocabulary of the local recognition system.
  - 4. The method of claim 2, further comprising:
    - receiving vocabulary information updates from the remote recognition system; and
      
      updating the client vocabulary of the local recognition system with the received vocabulary information.
  - 5. The method of claim 2, wherein one or more words from the client vocabulary are assigned at least one of:
    - a frequency value that indicates how often the word is used; and
      
      a recency value that indicates when the word was last used.
  - 6. The method of claim 5, further comprising removing a word from the client vocabulary based at least on a frequency value or a recency value.
  - 7. The method of claim 1, further comprising:
    - receiving from the local recognition system and the remote recognition system a transcription that is an estimate for the text of what the spoken query said.
  - 8. The method of claim 7, wherein the recognition score received from the local recognition system and remote recognition system measures the confidence in the accuracy of the respective transcription.
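
The selection logic recited in claim 1 can be illustrated with a short sketch. This is not the patented implementation; it is a minimal example under stated assumptions. The names `Recognition`, `choose_final`, and `recognize_dual` are hypothetical, each recognizer is modeled as a plain function that returns a result or `None`, the latency timer is modeled as a timeout on two worker threads, and a tie in scores is resolved in favor of the remote result (the claim does not address ties).

```python
from concurrent.futures import ThreadPoolExecutor, wait
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Recognition:
    transcription: str  # estimate of the text of the spoken query
    score: float        # confidence in the transcription; higher is better


# Hypothetical recognizer interface: returns None when the query is not recognized.
Recognizer = Callable[[bytes], Optional[Recognition]]


def choose_final(local: Optional[Recognition],
                 remote: Optional[Recognition]) -> Optional[Recognition]:
    """Pick the final result from whatever arrived within the latency period."""
    if local is None and remote is None:
        return None                      # recognition failure
    if local is None:
        return remote                    # only the remote system succeeded
    if remote is None:
        return local                     # only the local system succeeded
    # Both succeeded: take the higher score (assumption: ties go to remote).
    return local if local.score > remote.score else remote


def recognize_dual(query: bytes, local: Recognizer, remote: Recognizer,
                   timeout_s: float = 2.0) -> Optional[Recognition]:
    """Send the query to both recognizers and arbitrate after at most timeout_s."""
    pool = ThreadPoolExecutor(max_workers=2)
    try:
        local_future = pool.submit(local, query)
        remote_future = pool.submit(remote, query)
        done, _ = wait([local_future, remote_future], timeout=timeout_s)

        def in_time(future):
            # A result counts only if it finished in time and did not raise.
            if future in done and future.exception() is None:
                return future.result()
            return None

        return choose_final(in_time(local_future), in_time(remote_future))
    finally:
        # Do not block on a recognizer that missed the latency cutoff.
        pool.shutdown(wait=False)
```

Keeping `choose_final` separate from the threading code makes the four cases of the claim (neither, both, local only, remote only) easy to check in isolation.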

9. A system for dual mode speech recognition, comprising:
- a local recognition system housed in a mobile device, including;
  
  a communication module programmed to communicate with a user and other devices and for receiving a spoken query;
  
  a recognition module programmed to recognize and transcribe audio content;
  
  a control module; and
  
  a client vocabulary programmed to describe words or phrases available to the recognition module;
  
  a remote recognition system housed in a server, including;
  
  a recognition engine programmed to recognize and transcribe audio content;
  
  a vocabulary download module programmed to provide updates to the vocabulary update module;
  
  a latency timer;
  
  wherein the control module of the local recognition system is programmed to;
  
  set a latency timer period to a preset timeout value; and
  
  in the event that the spoken query is recognized by both the local recognition system and the remote recognition system within the latency timer period, obtain a recognition result and associated recognition score from both the local recognition system and the remote recognition system, and choosing the final result as the recognition result associated with the higher recognition score.
- - 10. The system of claim 9, wherein the control module of the local recognition system is further programmed to send the spoken query to the recognition module of the local recognition system and the remote recognition system.
  - 11. The system of claim 9, wherein the control module of the local recognition system is programmed to:
    - in the event that the spoken query is recognized by only the local recognition within the latency timer period, obtaining a recognition result and associated score from the local recognition system; and
      
      choosing the local recognition result as the final result; and
      
      in the event that the spoken query is recognized by only the remote recognition system within the latency timer period, obtaining a recognition result and associated score from the remote recognition system; and
      
      choosing the remote recognition result as the final result.
  - 12. The system of claim 9, further comprising a vocabulary updater module that is programmed to remove one or more words from the client vocabulary.
  - 13. The system of claim 12, wherein one or more words from the client vocabulary are assigned a priority value that indicates the word's importance, and the vocabulary updater module is further programmed to remove a word selected from the client vocabulary based at least on the priority assigned the selected word.
  - 14. The system of claim 12, wherein one or more words from the client vocabulary are assigned a frequency value that indicates how often the word is used and a recency value that indicates when the word was last used, and the client vocabulary updater module is further programmed to remove a word selected from the client vocabulary based at least on the frequency value or recency value associated with the selected word.
  - 15. The system of claim 9, wherein the recognition module of the local recognition system and the recognition module of the remote recognition system are each programmed to output:
    - a transcription that is an estimate for the text of what the spoken query said; and
      
      a score associated with the respective transcription that measures confidence in the accuracy of the associated transcription.
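
Claims 12 through 14 describe removing words from the client vocabulary based on priority, frequency, or recency, without fixing a particular policy. The sketch below shows one possible policy. The names `VocabEntry` and `prune`, the size limit, and the eviction order (lowest priority first, then least frequently used, then least recently used) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class VocabEntry:
    word: str
    frequency: int     # how often the word is used
    last_used: float   # recency: timestamp of the most recent use
    priority: int = 0  # importance of the word; higher means keep longer


def prune(vocab: Dict[str, VocabEntry], max_size: int) -> List[str]:
    """Remove entries until at most max_size remain; return the removed words."""
    excess = len(vocab) - max_size
    if excess <= 0:
        return []
    # Assumed eviction order: priority, then frequency, then recency (ascending).
    ranked = sorted(vocab.values(),
                    key=lambda e: (e.priority, e.frequency, e.last_used))
    removed = [entry.word for entry in ranked[:excess]]
    for word in removed:
        del vocab[word]
    return removed
```

For example, with a limit of two entries, a rarely used, low-priority word that was last spoken long ago is removed before a frequently used word or a high-priority one.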

16. A system for dual mode speech recognition, comprising:
- a latency timer;
  
  a local recognition system housed in a mobile device, including;
  
  a communication module programmed to communicate with a user and other devices and to receive a spoken query;
  
  a recognition module programmed to recognize and transcribe audio content;
  
  a control module;
  
  a client vocabulary programmed to describe words or phrases available to the recognition module; and
  
  a remote recognition system housed in a server, including;
  
  a recognition engine programmed to recognize and transcribe audio content;
  
  a vocabulary download module programmed to provide updates to the vocabulary update module;
  
  wherein the control module of the local recognition system is programmed to;
  
  set a latency timer period to a predefined value;
  
  in the event that the spoken query is recognized by both the local recognition system and the remote recognition system within the latency timer period, obtain a recognition result and associated recognition score from both the local recognition system and the remote recognition system, and choosing the final result as the recognition result associated with the higher recognition score;
  
  in the event that the spoken query is recognized by only the local recognition within the latency timer period, obtaining a recognition result and associated score from the local recognition system; and
  
  choosing the local recognition result as the final result; and
  
  in the event that the spoken query is recognized by only the remote recognition system within the latency timer period, obtaining a recognition result and associated score from the remote recognition system; and
  
  choosing the remote recognition result as the final result.

17. A method for performing dual mode speech recognition, comprising:
- receiving a spoken query from a user;
  
  processing the spoken query, including;
  
  sending the spoken query to a local recognition system on a mobile device;
  
  transmitting the spoken query to a remote recognition system via a communications link; and
  
  setting a latency timer period to a preset timeout value;
  
  in the event that the spoken query is recognized by both the local recognition system and the remote recognition system within the latency timer period, obtaining a recognition result and associated recognition score from both the local recognition system and the remote recognition system, and choosing the final result as the recognition result associated with the higher recognition score; and
  
  in the event that the spoken query is recognized by the remote recognition system within the latency timer period, upon determining that the remote recognition result contains vocabulary information not contained within a client vocabulary maintained within the local recognition system, requesting that the remote recognition system update the client vocabulary.
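
The last step of claim 17 can be sketched as a set difference between the words in the remote result and the client vocabulary. This is an illustrative sketch only: the function names, the whitespace tokenization, the lowercase normalization, and the `request_update` callback are assumptions, and the claim does not specify how vocabulary information is compared or requested.

```python
from typing import Callable, Set


def missing_vocabulary(remote_transcription: str,
                       client_vocabulary: Set[str]) -> Set[str]:
    """Return words in the remote result that the client vocabulary lacks.

    Assumes the client vocabulary stores lowercase single words.
    """
    words = {w.strip(".,!?").lower() for w in remote_transcription.split()}
    words.discard("")  # drop tokens that were punctuation only
    return words - client_vocabulary


def maybe_request_update(remote_transcription: str,
                         client_vocabulary: Set[str],
                         request_update: Callable[[Set[str]], None]) -> Set[str]:
    """Ask the remote system for a vocabulary update only when something is missing."""
    missing = missing_vocabulary(remote_transcription, client_vocabulary)
    if missing:
        request_update(missing)  # hypothetical hook into the remote system
    return missing
```

The callback keeps the sketch independent of any transport; in a real client it would wrap whatever request the remote system actually exposes.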

Current Assignee: Soundhound AI IP LLC
Original Assignee: SoundHound Incorporated
Inventors: Stonehocker, Timothy P.; Mohajer, Keyvan; Mont-Reynaud, Bernard
Primary Examiner(s): PULLIAS, JESSE SCOTT
Application Number: US14/621,024
Publication Number: US 20150154959A1
Time in Patent Office: 446 Days
Field of Search: 704/231-257, 704/270-275
US Class Current: 1/1
CPC Class Codes

G10L 15/04   Segmentation; Word boundary...

G10L 15/063   Training

G10L 15/08   Speech classification or se...

G10L 15/26   Speech to text systems G10L...

G10L 15/30   Distributed recognition, e....

G10L 15/34   Adaptation of a single reco...

G10L 17/06   Decision making techniques;...

G10L 2015/0635   updating or merging of old ...

G10L 2015/081   Search algorithms, e.g. Bau...
