Method and system for hybrid decoding for enhanced end-user privacy and low latency

US 10,803,871 B2
Filed: 11/26/2018
Issued: 10/13/2020
Est. Priority Date: 05/26/2016
Status: Active Grant

Abstract

Methods described herein provide functionality for automatic speech recognition (ASR). One such embodiment performs speech recognition using received speech recognition result candidates, where the received candidates were generated by performing Statistical Language Model (SLM) based speech recognition on one or more frames of audio data. In turn, such an embodiment transmits results of the speech recognition, performed using the received speech recognition result candidates, to a user device via a communications network. Results of the speech recognition are available with lower latency than with a pure cloud-based ASR solution.
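The server-side step described in the abstract can be illustrated with a minimal sketch. It assumes the device has already sent a short list of text candidates with log-probability scores, and that the server rescores them with its own language model before returning the best one. The `Candidate` type, the `rescore` function, the interpolation weight, and the toy unigram table are all hypothetical; the patent does not specify this data layout or scoring rule.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str            # hypothesis produced on the user device
    device_score: float  # log-probability assigned on the device (assumed)


def unigram_logprob(text, logprobs, unk=-10.0):
    """Score a hypothesis with a toy unigram model; unknown words get `unk`."""
    return sum(logprobs.get(word, unk) for word in text.split())


def rescore(candidates, server_lm, weight=0.5):
    """Pick the candidate with the best interpolated device/server score."""
    def combined(c):
        server_score = unigram_logprob(c.text, server_lm)
        return (1 - weight) * c.device_score + weight * server_score
    return max(candidates, key=combined)


# Hypothetical server-side model, e.g. informed by a user's contact names.
server_lm = {"call": -1.0, "carl": -4.0, "cole": -6.0}
candidates = [Candidate("call cole", -2.0), Candidate("call carl", -2.5)]
best = rescore(candidates, server_lm)
```

Here the device slightly prefers "call cole", but the server model shifts the final choice to "call carl". Because only candidates are rescored, the server never needs the raw audio in this sketch.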

47 Citations

20 Claims

1. A computer-implemented method of performing automatic speech recognition, the method implemented by a processor executing program instructions stored in memory and comprising:
- performing speech recognition using speech recognition result candidates received from a user device via a communications network, the speech recognition result candidates generated by the user device's performing speech recognition on one or more frames of audio data, the speech recognition performed by the user device on the one or more frames of audio data combining two parallel layers of speech recognition that utilize language models stored on the user device, wherein at least one layer of the two parallel layers is a Statistical Language Model (SLM) based layer; and
- transmitting to the user device, via the communications network, results of the speech recognition performed using the speech recognition result candidates.
- Dependent claims (2 to 12):
  - 2. The computer-implemented method of claim 1 wherein the performing speech recognition using speech recognition result candidates further comprises:
    - performing speech recognition using both the speech recognition result candidates and personal data of a user of the user device that is only related to the speech recognition result candidates.
  - 3. The computer-implemented method of claim 1 further comprising:
    - destroying a user's personal information related to the speech recognition result candidates upon performing the speech recognition using the speech recognition result candidates.
  - 4. The computer-implemented method of claim 1 wherein the SLM based layer is a first SLM based layer and wherein performing the speech recognition using the speech recognition result candidates employs a second SLM based layer.
  - 5. The computer-implemented method of claim 1 wherein the speech recognition result candidates are compressed.
  - 6. The computer-implemented method of claim 1 wherein the speech recognition result candidates are a compressed pronunciation form of the speech recognition result candidates.
  - 7. The computer-implemented method of claim 1 wherein the speech recognition result candidates are encrypted.
  - 8. The computer-implemented method of claim 1 wherein the speech recognition on the one or more frames of audio data combining two parallel layers is performed on the user device.
  - 9. The computer-implemented method of claim 1 wherein the SLM layer is based on at least one of:
    - a unigram SLM;
    - a pruned n-gram SLM; and
    - a SLM implemented using a finite state machine (FSM).
  - 10. The computer-implemented method of claim 1 wherein the speech recognition result candidates include audio features that are feature-adapted, the feature-adapted audio features being speaker independent.
  - 11. The computer-implemented method of claim 1 wherein at least one layer of the two parallel layers is a command-oriented finite state (FST) recognizer layer.
  - 12. The computer-implemented method of claim 1 wherein the SLM based layer is configured to utilize device information local to the user device.
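The device-side behavior recited in claims 1, 5, 9, and 11 can be sketched roughly as follows. This is a simplified illustration, not the patented implementation: it starts from hypothetical text hypotheses rather than audio frames, runs the two layers one after the other rather than in parallel, and uses `zlib` over JSON as one possible compression scheme. The names `COMMANDS`, `UNIGRAM_LOGPROBS`, `command_layer`, `slm_layer`, `pack`, and `unpack` are assumptions introduced for this example. Encryption (claim 7) is not shown.

```python
import json
import zlib

# Hypothetical on-device resources (assumptions for illustration only).
COMMANDS = {"call home", "play music"}
UNIGRAM_LOGPROBS = {"call": -1.0, "home": -2.0, "hone": -5.0}


def command_layer(hypotheses):
    """Stand-in for a command-oriented layer: keep exact grammar matches."""
    return [{"text": h, "layer": "command", "score": 0.0}
            for h in hypotheses if h in COMMANDS]


def slm_layer(hypotheses, unk=-10.0):
    """Stand-in for a unigram SLM layer: score every hypothesis."""
    return [{"text": h, "layer": "slm",
             "score": sum(UNIGRAM_LOGPROBS.get(w, unk) for w in h.split())}
            for h in hypotheses]


def pack(candidates):
    """Serialize and compress candidates before sending them to the server."""
    return zlib.compress(json.dumps(candidates).encode("utf-8"))


def unpack(payload):
    """Inverse of `pack`, as the server might apply on receipt."""
    return json.loads(zlib.decompress(payload).decode("utf-8"))


hypotheses = ["call home", "call hone"]
candidates = command_layer(hypotheses) + slm_layer(hypotheses)
payload = pack(candidates)
```

The combined candidate list, not the audio, is what leaves the device in this sketch, which is the property the title associates with end-user privacy.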

13. A computer system for performing automatic speech recognition, the computer system comprising:
- a processor; and
- a memory with computer code instructions stored thereon, the processor and the memory, with the computer code instructions, being configured to cause the system to:
  - perform speech recognition using speech recognition result candidates received from a user device via a communications network, the speech recognition result candidates generated by the user device's performing speech recognition on one or more frames of audio data, the speech recognition performed by the user device on the one or more frames of audio data combining two parallel layers of speech recognition that utilize language models stored on the user device, wherein at least one layer of the two parallel layers is a Statistical Language Model (SLM) based layer; and
  - transmit to the user device, via the communications network, results of the speech recognition performed using the speech recognition result candidates.
- Dependent claims (14 to 19):
  - 14. The system of claim 13 wherein, in performing the speech recognition using speech recognition result candidates, the processor and the memory, with the computer code instructions, are further configured to cause the system to:
    - perform speech recognition using both the speech recognition result candidates and personal data of a user of the user device that is only related to the speech recognition result candidates.
  - 15. The system of claim 13 wherein the processor and the memory, with the computer code instructions, are further configured to cause the system to:
    - destroy a user's personal information related to the speech recognition result candidates upon performing the speech recognition using the speech recognition result candidates.
  - 16. The system of claim 13 wherein the speech recognition on the one or more frames of audio data combining two parallel layers is performed on the user device.
  - 17. The system of claim 13 wherein the speech recognition result candidates include audio features that are feature-adapted, the feature-adapted audio features being speaker independent.
  - 18. The system of claim 13 wherein at least one layer of the two parallel layers is a command-oriented finite state (FST) recognizer layer.
  - 19. The system of claim 13 wherein the SLM based layer is configured to utilize device information local to the user device.

20. A computer program product for performing automatic speech recognition, the computer program product comprising:
- one or more non-transitory computer-readable tangible storage devices and program instructions stored on at least one of the one or more tangible storage devices, the program instructions, when loaded and executed by a processor, cause an apparatus associated with the processor to:
  - perform speech recognition using speech recognition result candidates received from a user device via a communications network, the speech recognition result candidates generated by the user device's performing speech recognition on one or more frames of audio data, the speech recognition performed by the user device on the one or more frames of audio data combining two parallel layers of speech recognition that utilize language models stored on the user device, wherein at least one layer of the two parallel layers is a Statistical Language Model (SLM) based layer; and
  - transmit to the user device, via the communications network, results of the speech recognition performed using the speech recognition result candidates.

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Inventors: Quillen, Carl Benjamin; Parihar, Naveen
Primary Examiner(s): Sirjani, Fariba
Application Number: US16/200,451
Publication Number: US 20190214014A1
Time in Patent Office: 687 Days
CPC Class Codes:
G10L 15/14   using statistical models, e...
G10L 15/18   using natural language mode...
G10L 15/197   Probabilistic grammars, e.g...
G10L 15/30   Distributed recognition, e....
G10L 15/32   Multiple recognisers used i...
H04L 63/04   for providing a confidentia...
H04L 63/0428   wherein the data content is...
