Methods and apparatus for hybrid speech recognition processing

US 10,971,157 B2
Filed: 01/11/2017
Issued: 04/06/2021
Est. Priority Date: 01/11/2017
Status: Active Grant

Abstract

Methods and apparatus for selectively performing speech processing in a hybrid speech processing system. The hybrid speech processing system includes at least one mobile electronic device and a network-connected server remotely located from the at least one mobile electronic device. The mobile electronic device is configured to use an embedded speech recognizer to process at least a portion of input audio to produce recognized text. A controller on the mobile electronic device determines whether to send information from the mobile electronic device to the server for speech processing. The determination of whether to send the information is based, at least in part, on an analysis of the input audio, the recognized text, or a semantic category associated with the recognized text.
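
The controller decision described in the abstract can be illustrated with a minimal sketch. Everything named here is hypothetical: `LocalResult`, `SERVER_CATEGORIES`, and `should_send_to_server` do not appear in the patent, and the choice of which semantic categories go to the server is an assumed policy for illustration only.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass
class LocalResult:
    """Hypothetical container for what the embedded recognizer produced."""
    text: str
    semantic_category: Optional[str] = None          # None if no category was determined
    detected_languages: FrozenSet[str] = frozenset()  # language codes found by text analysis


# Assumed policy: semantic categories that this sketch hands off to the server.
SERVER_CATEGORIES = {"web_search", "dictation"}


def should_send_to_server(result: LocalResult, embedded_language: str) -> bool:
    """Return True if either trigger fires: a server-bound semantic category,
    or a detected language other than the embedded recognizer's language."""
    if result.semantic_category in SERVER_CATEGORIES:
        return True
    return any(lang != embedded_language for lang in result.detected_languages)
```

In this sketch, an utterance recognized entirely in the embedded language and tagged with a local-only category stays on the device; either a server-bound category or a second detected language is enough to trigger sending.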

194 Citations

14 Claims

1. A mobile electronic device for use in a hybrid speech processing system comprising the mobile electronic device and a network-connected server remotely located from the mobile electronic device, the mobile electronic device comprising:
- an input interface configured to receive input audio comprising speech;
- an embedded speech recognizer configured to perform speech recognition in a first language and process at least a portion of the input audio to produce first recognized text;
- a controller configured to determine whether to send information to the server for speech processing, wherein the information includes the at least a portion of the input audio and/or at least a portion of the first recognized text, wherein:
  - the determination of whether to send the information to the server for speech processing is based, at least in part, on a semantic category associated with the first recognized text or on an analysis of the first recognized text, the analysis of the first recognized text comprising:
    - detecting at least one language that the speech may include based at least in part on the analysis of the first recognized text,
    - determining whether the detected at least one language includes a second language different from the first language for which the embedded speech recognizer is configured to perform speech recognition, and
    - in response to determining that the detected at least one language includes the second language different from the first language for which the embedded speech recognizer is configured to perform speech recognition, determining to send at least a portion of the speech to the server; and
- a network interface configured to send the information to the server in response to determining that the information should be sent to the server.
- - 2. The mobile electronic device of claim 1, wherein the embedded speech recognizer is configured to perform speech recognition using at least one model that includes information about the first language and the second language; and
    - wherein determining that the detected at least one language includes the second language different from the first language comprises determining that the detected at least one language includes the second language based, at least in part, on an identification by the embedded speech recognizer that the speech may include at least one word in the second language.
  - 3. The mobile electronic device of claim 1, wherein the controller is further configured to apply a language identification process to at least a portion of the speech to identify the second language in the speech; and
    - wherein determining that the at least one language includes the second language different from the first language comprises detecting a mismatch between the second language identified by the language identification process and the first language for which the embedded speech recognizer is configured to perform speech recognition.
  - 4. The mobile electronic device of claim 3, wherein the embedded speech recognizer is further configured to output a confidence score associated with a result of the speech recognition indicated by the first recognized text; and
    - wherein the controller is configured to apply the language identification process to the at least a portion of the speech only when the confidence score is below a threshold value.
  - 5. The mobile electronic device of claim 1, wherein the information sent to the server includes the at least a portion of the input audio and an indication that the speech includes at least one word in the second language different from the first language.
  - 6. The mobile electronic device of claim 1, wherein the controller is further configured to:
    - receive second recognized text from the server, wherein the second recognized text corresponds to the at least a portion of the speech sent to the server;
    - combine the at least a portion of the first recognized text corresponding to the first language and the second recognized text corresponding to the second language different from the first language into a multi-language speech recognition result; and
    - output the multi-language speech recognition result.
  - 7. The mobile electronic device of claim 1, further comprising:
    - an embedded natural language understanding (NLU) engine configured to process the at least a portion of the first recognized text to determine a semantic category associated with the first recognized text;
    - wherein determining whether to send information to the server for speech processing comprises determining whether to send information to the server for speech processing based, at least in part, on the determined semantic category.
  - 8. The mobile electronic device of claim 7, further comprising:
    - at least one storage device configured to store privacy settings for transmitting information to the server;
    - wherein the controller is further programmed to present a user interface on the mobile electronic device, wherein the user interface enables a user of the mobile electronic device to specify at least some of the privacy settings stored on the at least one storage device; and
    - wherein determining whether to send information to the server for speech processing further comprises determining whether to send information to the server for speech processing based, at least in part, on the stored privacy settings.
  - 9. The mobile electronic device of claim 8, wherein the user interface enables the user to specify general privacy settings that are not category specific, and wherein the stored privacy settings are determined based, at least in part, on the general privacy settings specified by the user.
  - 10. The mobile electronic device of claim 8, wherein the user interface enables the user to specify privacy settings for each of a plurality of semantic categories.
  - 11. The mobile electronic device of claim 1, wherein the information sent to the server includes the at least a portion of the first recognized text and does not include any of the input audio.
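
Claims 3, 4, and 6 together describe a confidence-gated language check followed by a merge of the local and server results. The sketch below is one hedged reading of that flow, not the patented implementation: the function names, the `0.6` threshold, the `(text, is_first_language)` segment representation, and the assumption of a single contiguous second-language span are all illustrative choices. The language identifier is passed in as a callable because the claims do not say how it works.

```python
from typing import Callable, List, Optional, Tuple

# Assumed value; the claims only require "a threshold value".
CONFIDENCE_THRESHOLD = 0.6


def detect_mismatch(
    audio: bytes,
    confidence: float,
    embedded_language: str,
    identify_language: Callable[[bytes], str],
    threshold: float = CONFIDENCE_THRESHOLD,
) -> Optional[str]:
    """Return the mismatching language code, or None if there is no mismatch.

    Language identification runs only when the recognizer's confidence is
    below the threshold (the gating described in claim 4)."""
    if confidence >= threshold:
        return None
    language = identify_language(audio)
    return language if language != embedded_language else None


def combine_results(local_segments: List[Tuple[str, bool]], server_text: str) -> str:
    """Merge local and server text into one multi-language result.

    Each local segment is (text, is_first_language). Segments flagged False
    are dropped, and server_text is placed where the first such segment was."""
    parts: List[str] = []
    inserted = False
    for text, is_first_language in local_segments:
        if is_first_language:
            parts.append(text)
        elif not inserted:
            parts.append(server_text)
            inserted = True
    return " ".join(parts)
```

For example, with local segments `[("play", True), ("lavy on rose", False), ("please", True)]` and a server result of `"La Vie en rose"`, `combine_results` returns `"play La Vie en rose please"`.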

12. A method for use in a hybrid speech processing system comprising a mobile electronic device and a network-connected server remotely located from the mobile electronic device, the method comprising:
- processing, by an embedded speech recognizer on the mobile electronic device and configured to perform speech recognition in a first language, at least a portion of input audio to produce recognized text;
- determining, by a controller, whether to send information from the mobile electronic device to the server for speech processing, wherein the information includes the at least a portion of the input audio and/or at least a portion of the recognized text, wherein:
  - the determination of whether to send the information to the server for speech processing is based, at least in part, on a semantic category associated with the recognized text or on an analysis of the recognized text, the analysis of the recognized text comprising:
    - detecting at least one language that the speech may include based at least in part on the analysis of the recognized text,
    - determining whether the detected at least one language includes a second language different from the first language for which the embedded speech recognizer is configured to perform speech recognition, and
    - in response to determining that the detected at least one language includes the second language different from the first language for which the embedded speech recognizer is configured to perform speech recognition, determining to send at least a portion of the speech to the server; and
- sending the information from the mobile electronic device to the server in response to determining that the information should be sent to the server.
- - 13. The method of claim 12, wherein performing speech recognition in the first language comprises using at least one model that includes information about the first language and the second language; and
    - wherein determining that the detected at least one language includes the second language different from the first language comprises determining that the detected at least one language includes the second language based, at least in part, on an identification by the embedded speech recognizer that the speech may include at least one word in the second language.
  - 14. The method of claim 12, further comprising:
    - processing, by an embedded natural language understanding (NLU) engine, the at least a portion of the recognized text to determine a semantic category associated with the recognized text;
    - wherein determining whether to send information to the server for speech processing comprises determining whether to send information to the server for speech processing based, at least in part, on the determined semantic category.
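
Claims 7 to 10 (and claim 14 on the method side) tie the send decision to a semantic category, and the device claims add stored privacy settings that may be general or category specific. The following is a minimal sketch of how such settings could gate transmission. `PrivacySettings`, `allow_by_default`, `per_category`, and `may_send` are hypothetical names, and the rule that a category-specific setting overrides the general one is an assumption that the claims do not state.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class PrivacySettings:
    """Hypothetical stored privacy settings.

    allow_by_default models a general, non-category-specific setting;
    per_category models settings given for individual semantic categories."""
    allow_by_default: bool = True
    per_category: Dict[str, bool] = field(default_factory=dict)

    def allows(self, category: str) -> bool:
        # Assumed precedence: a category-specific setting wins over the general one.
        return self.per_category.get(category, self.allow_by_default)


def may_send(category: str, settings: PrivacySettings, wants_server: bool) -> bool:
    """Send only if the controller wants server processing and the settings permit it."""
    return wants_server and settings.allows(category)
```

With `PrivacySettings(allow_by_default=True, per_category={"messaging": False})`, a "messaging" utterance stays on the device even when the controller would otherwise send it, while other categories follow the general setting.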

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Inventors: Willett, Daniel; Pinto, Joel; Ganong, III, William F.
Primary Examiner(s): Sharma, Neeraj
Application Number: US15/403,762
Publication Number: US 20180197545A1
Time in Patent Office: 1,546 Days
Field of Search: None
CPC Class Codes:
G10L 15/005   Language recognition
G10L 15/1815   Semantic context, e.g. disa...
G10L 15/30   Distributed recognition, e....
