System and method for performing dual mode speech recognition

US 9,691,390 B2
Filed: 03/30/2016
Issued: 06/27/2017
Est. Priority Date: 11/18/2011
Status: Active Grant

Abstract

A system and method is presented for performing dual mode speech recognition, employing a local recognition module on a mobile device and a remote recognition engine on a server device. The system accepts a spoken query from a user, and both the local recognition module and the remote recognition engine perform speech recognition operations on the query, returning a transcription and confidence score, subject to a latency cutoff time. If both sources successfully transcribe the query, then the system accepts the result having the higher confidence score. If only one source succeeds, then that result is accepted. In either case, if the remote recognition engine does succeed in transcribing the query, then a client vocabulary is updated if the remote system result includes information not present in the client vocabulary.
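
The selection rule in the abstract can be sketched as two small functions. This is an illustrative sketch only, not the patented implementation: the `Recognition` type, the function names, the whitespace tokenization, and the tie-break toward the local result are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class Recognition:
    """One recognizer's output: a transcription plus a confidence score."""
    text: str
    confidence: float


def choose_result(local: Optional[Recognition],
                  remote: Optional[Recognition]) -> Optional[Recognition]:
    """Pick the final result from whatever arrived before the latency cutoff.

    None stands for a source that failed or missed the cutoff.
    """
    if local is not None and remote is not None:
        # Both sources succeeded: accept the higher confidence score.
        # Preferring local on a tie is an assumption, not stated in the abstract.
        return remote if remote.confidence > local.confidence else local
    # Only one source (or neither) succeeded: accept whichever exists.
    return local if local is not None else remote


def words_to_add(remote: Optional[Recognition],
                 client_vocabulary: Set[str]) -> Set[str]:
    """Words in the remote result that the client vocabulary lacks."""
    if remote is None:
        return set()
    # Whitespace splitting is a simplification of real vocabulary matching.
    return set(remote.text.lower().split()) - client_vocabulary
```

With a local result scored 0.62 and a remote result scored 0.91, `choose_result` returns the remote result, and `words_to_add` reports any remote words the client has not yet stored.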

15 Claims

1. A method for performing dual mode speech recognition, comprising:
- receiving at a device a query from a user;
  
  sending the query to a first recognition system;
  
  sending the query to a second recognition system;
  
  receiving at least a first recognition result from either the first recognition system or the second recognition system;
  
  producing a final result considering the first recognition result; and
  
  setting a latency timer to a timeout value, wherein the first recognition system maintains a first vocabulary and the second recognition system maintains a second vocabulary, and whereby the final result is produced at or before the time that the latency timer reaches the timeout value.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10):
  - 2. The method of claim 1 wherein the final result is produced at the time of receiving the first recognition result.
  - 3. The method of claim 1 wherein the final result is produced at the time of receiving a second recognition result, the final result selected from either the first recognition result or the second recognition result.
  - 4. The method of claim 1 further comprising:
    - receiving a first recognition score associated with the first recognition result; and
      
      producing the final result based on the first recognition score.
  - 5. The method of claim 4 further comprising:
    - receiving a second recognition result;
      
      receiving a second recognition score associated with the second recognition result; and
      
      basing the producing of the final result on the greater of the first recognition score and the second recognition score.
  - 6. The method of claim 1, wherein:
    - the first recognition system is a local recognition system local to the device that receives the query from the user;
      
      the second recognition system is a remote recognition system; and
      
      the first recognition system sends the query to the remote recognition system over a communications link.
  - 7. The method of claim 1, further comprising:
    - determining that the second vocabulary contains at least one word that is not contained in the first vocabulary.
  - 8. The method of claim 1, further comprising:
    - the first recognition system receiving vocabulary information from the second recognition system; and
      
      updating the first vocabulary with the received vocabulary information.
  - 9. The method of claim 1, wherein one or more words from the first vocabulary are assigned at least one of:
    - a frequency value that indicates how often the word is used; and
      
      a recency value that indicates when the word was last used.
  - 10. The method of claim 9, further comprising removing a word from the first vocabulary based at least on the frequency value or the recency value.
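
Claims 9 and 10 describe per-word frequency and recency values and removal based on them. The sketch below shows one way such bookkeeping could look. The class names, the fixed `capacity`, and the eviction order (lowest frequency first, oldest use as the tie-break) are assumptions for illustration; the claims do not prescribe a specific policy.

```python
import time
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class WordStats:
    frequency: int     # how often the word has been used
    last_used: float   # timestamp of the most recent use (recency)


class ClientVocabulary:
    """Hypothetical first vocabulary with frequency and recency values."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.words: Dict[str, WordStats] = {}

    def record_use(self, word: str, now: Optional[float] = None) -> None:
        """Add the word if new, otherwise bump its frequency and recency."""
        now = time.time() if now is None else now
        stats = self.words.get(word)
        if stats is None:
            self.words[word] = WordStats(frequency=1, last_used=now)
        else:
            stats.frequency += 1
            stats.last_used = now

    def prune(self) -> List[str]:
        """Remove words until the vocabulary fits within capacity."""
        removed: List[str] = []
        while len(self.words) > self.capacity:
            # Evict the least frequently used word; break ties by oldest use.
            victim = min(
                self.words,
                key=lambda w: (self.words[w].frequency, self.words[w].last_used),
            )
            del self.words[victim]
            removed.append(victim)
        return removed
```

Keeping `prune` separate from `record_use` lets the caller decide when eviction runs, for example after a batch of vocabulary updates rather than after every query.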

11. A client for dual mode speech recognition, the client comprising:
- an interface enabled to receive a query from a user;
  
  a communication module enabled to send the query to a server and receive a remote recognition result from a server;
  
  a local recognition module enabled to create a local recognition result from the query;
  
  a latency timer;
  
  a control module enabled to receive a notification from the latency timer and to select between the local recognition result and the remote recognition result; and
  
  a client vocabulary enabled to describe words or phrases available to the local recognition module.
- Dependent claims (12, 13, 14):
  - 12. The client of claim 11 further comprising a vocabulary update module enabled to update the client vocabulary.
  - 13. The client of claim 12, wherein one or more words from the client vocabulary are assigned at least one of a frequency value and a recency value, and the client vocabulary update module removes the one or more words from the client vocabulary based on the at least one of a frequency value and a recency value.
  - 14. The client of claim 11, wherein the control module is enabled to:
    - receive a recognition score from the server;
      
      receive a recognition score from the local recognition module; and
      
      choose a recognition result based on the recognition score from the server and the recognition score from the local recognition module.
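
The interaction between the latency timer and the control module in claims 11 and 14 can be sketched with `asyncio`. This is a minimal sketch under stated assumptions: the `recognize_dual` function, the `Result` tuple layout, and the recognizer callables are hypothetical, and the code waits for both results or the timeout (closer to claim 3) rather than returning on the first result (claim 2).

```python
import asyncio
from typing import Awaitable, Callable, List, Optional, Tuple

Result = Tuple[str, float]                     # (transcription, recognition score)
Recognizer = Callable[[bytes], Awaitable[Result]]


async def recognize_dual(query: bytes,
                         local: Recognizer,
                         remote: Recognizer,
                         timeout_s: float) -> Optional[Result]:
    """Run both recognizers and select a result by the latency timeout."""
    tasks = [asyncio.create_task(local(query)),
             asyncio.create_task(remote(query))]
    # asyncio.wait returns once all tasks finish or the timeout elapses,
    # whichever comes first; the timeout plays the role of the latency timer.
    done, pending = await asyncio.wait(tasks, timeout=timeout_s)
    for task in pending:
        task.cancel()                          # too late to be considered
    # Keep only recognizers that finished without raising.
    results: List[Result] = [t.result() for t in done if t.exception() is None]
    # Choose the result with the greater recognition score, if any arrived.
    return max(results, key=lambda r: r[1], default=None)
```

If the remote recognizer misses the timeout, only the local result is in `done`, so it is returned even when its score is modest; if neither finishes in time, the function returns `None`.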

15. A server for dual mode speech recognition, the server comprising:
- a recognition engine enabled to create a recognition result from audio content;
  
  a communication module enabled to receive a query from a client and send the recognition result to the client; and
  
  a vocabulary download module enabled to respond to requests from the client to send updates to a client vocabulary.
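
Claim 15 does not say how the vocabulary download module computes an update. One possible shape, shown purely as an assumption, is a versioned word log: the client reports the last version it saw and the server returns the words added since. The class name, method names, and versioning scheme below are hypothetical.

```python
from typing import List, Tuple


class VocabularyDownloadModule:
    """Hypothetical server-side module that answers client update requests."""

    def __init__(self) -> None:
        # Words in the order the server added them; the list length
        # doubles as a monotonically increasing version number.
        self._log: List[str] = []

    def add_word(self, word: str) -> None:
        """Record a new server vocabulary word, ignoring duplicates."""
        if word not in self._log:
            self._log.append(word)

    def updates_since(self, client_version: int) -> Tuple[int, List[str]]:
        """Return (current_version, words added after client_version)."""
        return len(self._log), self._log[client_version:]
```

A client that last synchronized at version 1 would call `updates_since(1)` and receive only the words appended after that point, together with the version to store for its next request.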

Current Assignee: Soundhound AI IP LLC
Original Assignee: SoundHound Incorporated
Inventors: Mohajer, Keyvan; Stonehocker, Timothy; Mont-Reynaud, Bernard
Primary Examiner(s): PULLIAS, JESSE SCOTT
Application Number: US15/085,944
Publication Number: US 20160217788A1
Time in Patent Office: 454 Days
Field of Search: 704/231-257, 704/270-275
US Class Current
CPC Class Codes

G10L 15/04   Segmentation; Word boundary...

G10L 15/063   Training

G10L 15/08   Speech classification or se...

G10L 15/26   Speech to text systems G10L...

G10L 15/30   Distributed recognition, e....

G10L 15/34   Adaptation of a single reco...

G10L 17/06   Decision making techniques;...

G10L 2015/0635   updating or merging of old ...

G10L 2015/081   Search algorithms, e.g. Bau...
