PRIVACY-SENSITIVE SPEECH MODEL CREATION VIA AGGREGATION OF MULTIPLE USER MODELS

US 20150287401A1
Filed: 06/22/2015
Published: 10/08/2015
Est. Priority Date: 11/05/2012
Status: Active Grant

2 Assignments

0 Petitions

Abstract

Techniques disclosed herein include systems and methods for privacy-sensitive training data collection for updating acoustic models of speech recognition systems. In one embodiment, the system locally creates adaptation data from raw audio data. Such adaptation data can include derived statistics and/or acoustic model update parameters. The derived statistics and/or updated acoustic model data can then be sent to a speech recognition server or third-party entity. Because the audio data and transcriptions have already been processed, the statistics or acoustic model data is devoid of any human-readable or machine-readable information that would enable reconstruction of the audio data. Thus, the converted data sent to a server does not include personal or confidential information. Third-party servers can then continually update speech models without storing personal and confidential utterances of users.
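
The abstract does not say which statistics are derived on the device. As a minimal sketch, assume the device has already converted audio into fixed-length feature frames and keeps only a frame count plus per-dimension sums and sums of squares. The class name `DerivedStats`, the function `derive_stats`, and the choice of statistics are assumptions made for illustration, not details from the patent.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class DerivedStats:
    """Hypothetical accumulator: frame count, per-dimension sum and sum of squares."""
    dim: int
    count: int = 0
    total: List[float] = field(default_factory=list)
    total_sq: List[float] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Start from zero vectors when no initial values are supplied.
        if not self.total:
            self.total = [0.0] * self.dim
        if not self.total_sq:
            self.total_sq = [0.0] * self.dim

    def add_frame(self, frame: Sequence[float]) -> None:
        """Fold one feature frame into the running statistics."""
        if len(frame) != self.dim:
            raise ValueError("frame has unexpected dimension")
        self.count += 1
        for i, x in enumerate(frame):
            self.total[i] += x
            self.total_sq[i] += x * x


def derive_stats(frames: Sequence[Sequence[float]], dim: int) -> DerivedStats:
    """Reduce a batch of frames to aggregate statistics (assumed form)."""
    stats = DerivedStats(dim)
    for frame in frames:
        stats.add_frame(frame)
    return stats


# Toy feature frames standing in for locally processed audio.
frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
stats = derive_stats(frames, dim=2)
print(stats.count, stats.total, stats.total_sq)
```

Only the aggregate values would leave the device in this sketch. Whether a given set of statistics actually prevents reconstruction depends on the statistics chosen and the amount of data they summarize, which this example does not attempt to establish.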

188 Citations

20 Claims

1. A computer-implemented method of speech recognition processing, the computer-implemented method comprising:
- receiving a spoken utterance;
- storing audio data from the spoken utterance at a first-party device;
- creating adaptation data from the audio data via processing at the first-party device, the adaptation data being in a format that prevents reconstruction of the audio data; and
- transmitting the adaptation data to a third-party server.
- Dependent claims 2 to 15:
  - 2. The computer-implemented method of claim 1, wherein creating the adaptation data occurs after collecting a predetermined amount of audio data.
  - 3. The computer-implemented method of claim 1, wherein creating the adaptation data includes deriving statistical data from the audio data.
  - 4. The computer-implemented method of claim 3, wherein transmitting the adaptation data includes transmitting derived statistical data to a server that aggregates derived statistical data from multiple client devices.
  - 5. The computer-implemented method of claim 1, wherein creating the adaptation data includes creating updated acoustic model data.
  - 6. The computer-implemented method of claim 5, wherein transmitting the adaptation data includes transmitting the updated acoustic model data to a server that aggregates local acoustic models into a global acoustic model.
  - 7. The computer-implemented method of claim 5, wherein the updated acoustic model data is a version of an acoustic model used at the third-party server.
  - 8. The computer-implemented method of claim 1, wherein creating the adaptation data from the audio data includes processing a subset of the audio data and discarding a remaining portion of the audio data.
  - 9. The computer-implemented method of claim 1, wherein storing audio data at the first-party device includes storing audio data at a first-party computer that is in network communication with a client device that received the spoken utterance.
  - 10. The computer-implemented method of claim 1, wherein storing audio data at the first-party device includes storing audio data at a mobile client device that received the spoken utterance.
  - 11. The computer-implemented method of claim 1, wherein storing audio data from the spoken utterance includes storing audio waveform files and corresponding transcriptions; and wherein the adaptation data being in a format that prevents reconstruction of the audio data includes the adaptation data being in a format that prevents reconstruction of the corresponding transcriptions by human or machine, and the adaptation data being in a format that prevents reconstruction of the corresponding waveform files by human or machine.
  - 12. The computer-implemented method of claim 1, wherein the adaptation data being in a format that prevents reconstruction of the audio data includes the adaptation data being in a format that is not readable by human or machine.
  - 13. The computer-implemented method of claim 1, wherein receiving the spoken utterance includes receiving a voice command or voice query at a mobile client device.
  - 14. The computer-implemented method of claim 1, wherein transmitting the adaptation data includes sending a compressed version of the adaptation data to the third-party server.
  - 15. The computer-implemented method of claim 1, wherein transmitting the adaptation data includes sending an encrypted version of the adaptation data to the third-party server.
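
The client-side flow in claims 1, 2, and 14 can be sketched as follows, under stated assumptions: audio is represented as feature frames, the "predetermined amount" is a frame-count threshold, the adaptation data is a count with per-dimension sums and sums of squares, and the payload is JSON compressed with `zlib`. The class `AdaptationClient`, its method names, and the payload layout are hypothetical. Encryption (claim 15) is left out because the Python standard library has no general-purpose cipher.

```python
import json
import zlib
from typing import List, Optional, Sequence


class AdaptationClient:
    """Hypothetical on-device collector; names and payload layout are assumptions."""

    def __init__(self, min_frames: int, dim: int) -> None:
        self.min_frames = min_frames  # assumed "predetermined amount" of data
        self.dim = dim
        self._buffer: List[Sequence[float]] = []

    @property
    def buffered_frames(self) -> int:
        return len(self._buffer)

    def on_utterance(self, frames: Sequence[Sequence[float]]) -> Optional[bytes]:
        """Store frames locally; return a compressed payload once enough are collected."""
        for frame in frames:
            if len(frame) != self.dim:
                raise ValueError("frame has unexpected dimension")
        self._buffer.extend(frames)
        if len(self._buffer) < self.min_frames:
            return None  # keep collecting; nothing leaves the device yet

        # Derive aggregate statistics from everything buffered so far.
        total = [0.0] * self.dim
        total_sq = [0.0] * self.dim
        for frame in self._buffer:
            for i, x in enumerate(frame):
                total[i] += x
                total_sq[i] += x * x
        payload = {"count": len(self._buffer), "sum": total, "sum_sq": total_sq}

        # Drop the locally stored frames once the statistics exist.
        self._buffer.clear()
        return zlib.compress(json.dumps(payload).encode("utf-8"))


client = AdaptationClient(min_frames=3, dim=2)
first = client.on_utterance([[1.0, 2.0]])               # below threshold
second = client.on_utterance([[3.0, 4.0], [5.0, 6.0]])  # threshold reached
print(first, json.loads(zlib.decompress(second)))
```

The returned bytes are what a transport layer would send to the server. Clearing the buffer after each payload is a design choice of this sketch, not a requirement stated in the claims.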

16. A system for speech processing, the system comprising:
- a processor; and
- a memory coupled to the processor, the memory storing instructions that, when executed by the processor, cause the system to perform the operations of:
  - receiving a spoken utterance;
  - storing audio data from the spoken utterance at a first-party device;
  - creating adaptation data from the audio data via processing at the first-party device, the adaptation data being in a format that prevents reconstruction of the audio data; and
  - transmitting the adaptation data to a third-party server.
- Dependent claims 17 to 19:
  - 17. The system of claim 16, wherein creating the adaptation data occurs after collecting a predetermined amount of audio data.
  - 18. The system of claim 16, wherein creating the adaptation data includes deriving statistical data from the audio data, and wherein transmitting the adaptation data includes transmitting derived statistical data to a server that aggregates derived statistical data from multiple client devices.
  - 19. The system of claim 16, wherein creating the adaptation data includes creating updated acoustic model data, and wherein transmitting the adaptation data includes transmitting the updated acoustic model data to a server that aggregates local acoustic models into a global acoustic model.
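
Claims 6 and 19 describe a server that aggregates local acoustic models into a global acoustic model but do not fix the aggregation rule. One possible rule, shown here purely as an assumption, is a weighted average of per-device parameter vectors, with each device weighted by the number of frames it processed. The function name `aggregate_local_models` and the representation of a model as a flat list of floats are hypothetical.

```python
from typing import List, Sequence, Tuple


def aggregate_local_models(
    local_models: Sequence[Tuple[float, Sequence[float]]]
) -> List[float]:
    """Combine (weight, parameter_vector) pairs into one weighted-average vector.

    Weighted averaging is an assumed aggregation rule, not one taken from the claims.
    """
    total_weight = sum(weight for weight, _ in local_models)
    if total_weight <= 0:
        raise ValueError("need at least one model with positive weight")

    dim = len(local_models[0][1])
    global_params = [0.0] * dim
    for weight, params in local_models:
        if len(params) != dim:
            raise ValueError("local models must share one parameter layout")
        for i, value in enumerate(params):
            global_params[i] += weight * value / total_weight
    return global_params


# Two devices: one contributed 1 frame, the other 3 frames.
global_model = aggregate_local_models([(1, [0.0, 2.0]), (3, [4.0, 6.0])])
print(global_model)
```

In this sketch the server only ever sees parameter vectors and weights, never the audio or transcriptions stored on the first-party devices.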

20. A computer program product including a non-transitory computer-storage medium having instructions stored thereon for processing data information, such that the instructions, when carried out by a processing device, cause the processing device to perform the operations of:
- receiving a spoken utterance;
- storing audio data from the spoken utterance at a first-party device;
- creating adaptation data from the audio data via processing at the first-party device, the adaptation data being in a format that prevents reconstruction of the audio data; and
- transmitting the adaptation data to a third-party server.

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Inventors: Lee, Antonio R.; Novak, Petr; Olsen, Peder Andreas; Goel, Vaibhava

Granted Patent: US 9,424,836 B2
US Class Current: 1/1
CPC Class Codes

G06F 21/6245   Protecting personal data, e...

G06F 21/78   to assure secure storage of...

G10L 15/04   Segmentation; Word boundary...

G10L 15/065   Adaptation

H04L 63/0407   wherein the identity of one...
