Server-side ASR adaptation to speaker, device and noise condition via non-ASR audio transmission

US 9,679,560 B2
Filed: 02/28/2013
Issued: 06/13/2017
Est. Priority Date: 02/28/2013
Status: Active Grant


Abstract

A mobile device is adapted for automatic speech recognition (ASR). A user interface for interaction with a user includes an input microphone for obtaining speech inputs from the user for automatic speech recognition, and an output interface for system output to the user based on ASR results that correspond to the speech input. A local controller obtains a sample of non-ASR audio from the input microphone for ASR-adaptation to channel-specific ASR characteristics, and then provides a representation of the non-ASR audio to a remote ASR server for server-side adaptation to the channel-specific ASR characteristics, and then provides a representation of an unknown ASR speech input from the input microphone to the remote ASR server for determining ASR results corresponding to the unknown ASR speech input, and then provides the system output to the output interface.
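
The order of operations in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: `InProcessAsrServer`, `MobileClient`, the method names `adapt` and `recognize`, the `channel_id` key, and the use of raw byte strings as the "representation" are all hypothetical stand-ins, and the server is simulated in-process rather than reached over a network.

```python
from dataclasses import dataclass, field


@dataclass
class InProcessAsrServer:
    """Hypothetical stand-in for the remote ASR server (no network, no real ASR)."""
    adapted_channels: dict = field(default_factory=dict)
    log: list = field(default_factory=list)

    def adapt(self, channel_id: str, non_asr_repr: bytes) -> None:
        # Server-side adaptation step: this sketch only records how much
        # non-ASR data has been received for the channel.
        self.adapted_channels[channel_id] = (
            self.adapted_channels.get(channel_id, 0) + len(non_asr_repr)
        )
        self.log.append("adapt")

    def recognize(self, channel_id: str, speech_repr: bytes) -> str:
        # Placeholder "recognition": reports whether adaptation data was seen.
        self.log.append("recognize")
        adapted = channel_id in self.adapted_channels
        return f"<{len(speech_repr)} bytes, adapted={adapted}>"


@dataclass
class MobileClient:
    """Hypothetical local controller on the mobile device."""
    server: InProcessAsrServer
    channel_id: str

    def send_non_asr_audio(self, sample: bytes) -> None:
        # Provide a representation of the non-ASR audio to the server.
        self.server.adapt(self.channel_id, sample)

    def recognize_speech(self, sample: bytes) -> str:
        # Provide the ASR speech input, receive results, build the system output.
        results = self.server.recognize(self.channel_id, sample)
        return f"output: {results}"


server = InProcessAsrServer()
client = MobileClient(server, channel_id="device-1")
client.send_non_asr_audio(b"\x00" * 320)                 # non-ASR audio first
system_output = client.recognize_speech(b"\x01" * 640)   # then the ASR input
print(system_output)
```

The point of the sketch is only the sequencing: the adaptation data reaches the server before the speech input that is to be recognized.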


20 Claims

1. A mobile device adapted for automatic speech recognition (ASR) and employing at least one hardware implemented computer processor, the mobile device comprising:
- an input microphone for obtaining speech inputs from a user for automatic speech recognition;
- an output interface for providing a system output to the user; and
- a local controller configured to:
  - obtain a sample comprising non-ASR audio from the input microphone,
  - provide a representation of the non-ASR audio to a remote ASR server for server-side adaptation to channel-specific ASR characteristics,
  - obtain a sample comprising an unknown ASR speech input from the input microphone,
  - provide a representation of the unknown ASR speech input to the remote ASR server,
  - receive, from the remote ASR server, ASR results corresponding to the unknown ASR speech input, and
  - provide, based on the ASR results corresponding to the unknown ASR speech input, the system output to the output interface.

2. The mobile device according to claim 1, wherein the non-ASR audio comprises audio sampled by the input microphone during a rolling sample window before the unknown ASR speech input.

3. The mobile device according to claim 1, wherein the non-ASR audio comprises non-ASR speech audio sampled by the input microphone before the unknown ASR speech input.

4. The mobile device according to claim 3, wherein the non-ASR speech audio comprises speech data sampled from data windows having a length of one second or less.

5. The mobile device according to claim 1, wherein the representation of the non-ASR audio comprises pre-processed ASR adaptation data produced by the mobile device from the sample comprising non-ASR audio.

6. The mobile device according to claim 5, wherein the ASR adaptation data comprises at least one of:
- background noise model data, and
- ASR acoustic model adaptation data.

7. The mobile device according to claim 1, wherein the representation of the non-ASR audio is limited to speech feature data.
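
Claims 2 and 4 refer to a rolling sample window and to data windows of one second or less. One way such a window could be kept on the device is a fixed-capacity ring buffer, sketched below. The claims do not prescribe a data structure; the class name `RollingSampleWindow`, the sample rate, and the window length are assumptions chosen for illustration.

```python
from collections import deque


class RollingSampleWindow:
    """Keeps only the most recent `seconds` of microphone samples."""

    def __init__(self, sample_rate_hz: int, seconds: float) -> None:
        self.capacity = int(sample_rate_hz * seconds)
        # With maxlen set, the oldest samples are discarded automatically.
        self._buf = deque(maxlen=self.capacity)

    def push(self, samples) -> None:
        self._buf.extend(samples)

    def snapshot(self) -> list:
        # Intended to be called at the moment the ASR speech input begins.
        return list(self._buf)


# Assumed toy values: 8 samples per second and a 1-second window.
window = RollingSampleWindow(sample_rate_hz=8, seconds=1.0)
window.push(range(0, 5))
window.push(range(5, 12))
pre_asr_audio = window.snapshot()
print(pre_asr_audio)  # the 8 most recent samples, 4 through 11
```

A real device would use a realistic sample rate (for example 16 kHz); the tiny numbers here only make the eviction behavior easy to see.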

8. A method comprising:
- obtaining, by an input microphone on a mobile device, a sample comprising non-automatic speech recognition (ASR) audio;
- transmitting, by the mobile device and to a server, a representation of the non-ASR audio for server-side adaptation to channel-specific ASR characteristics;
- receiving, by the input microphone on the mobile device, a sample comprising unknown ASR speech input;
- transmitting, by the mobile device and to the server, a representation of the unknown ASR speech input;
- receiving, from the server, ASR results corresponding to the unknown ASR speech input; and
- outputting, by the mobile device, the ASR results.

9. The method according to claim 8, wherein the non-ASR audio comprises audio sampled by the input microphone during a rolling sample window before the unknown ASR speech input.

10. The method according to claim 8, wherein the non-ASR audio comprises non-ASR speech audio sampled by the input microphone before the unknown ASR speech input.

11. The method according to claim 10, wherein the non-ASR speech audio comprises speech data sampled from data windows having a length of one second or less.

12. The method according to claim 8, wherein the representation of the non-ASR audio comprises pre-processed ASR adaptation data produced by the mobile device from the sample comprising non-ASR audio.

13. The method according to claim 12, wherein the ASR adaptation data comprises at least one of:
- background noise model data, and
- ASR acoustic model adaptation data.

14. The method according to claim 8, wherein the representation of the non-ASR audio is limited to speech feature data.
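
Claims 12 and 13 allow the device to send pre-processed adaptation data, such as background noise model data, instead of the audio itself. The sketch below shows one simple possibility: summarizing non-ASR audio as the mean and variance of per-frame log energy. This model form is an illustrative assumption and is not taken from the patent; the function names, the frame length, and the payload keys are likewise hypothetical.

```python
import math


def frame_log_energies(samples, frame_len):
    """Log energy of each full, non-overlapping frame of `frame_len` samples."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        energies.append(math.log(energy + 1e-10))  # small floor avoids log(0)
    return energies


def noise_model_data(samples, frame_len=4):
    """Summarize non-ASR audio as mean/variance of frame log energies.

    Illustrative assumption only: the patent does not specify this model form.
    """
    e = frame_log_energies(samples, frame_len)
    if not e:
        raise ValueError("need at least one full frame of samples")
    mean = sum(e) / len(e)
    var = sum((v - mean) ** 2 for v in e) / len(e)
    return {"frames": len(e), "log_energy_mean": mean, "log_energy_var": var}


# Constant-amplitude toy "noise": every frame has energy 1.0, so the log
# energy is approximately 0 and the variance across frames is approximately 0.
payload = noise_model_data([1.0, -1.0] * 8, frame_len=4)
print(payload)
```

Sending a few statistics like these, rather than the waveform, is one way the "representation" could stay small on a mobile uplink.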

15. A non-transitory computer-readable medium having computer-executable program instructions stored thereon that, when executed by a processor, cause the processor to:
- obtain, using a microphone, a sample comprising non-automatic speech recognition (ASR) audio;
- transmit, to a server, a representation of the non-ASR audio for server-side adaptation to channel-specific ASR characteristics;
- obtain, using the microphone, a sample comprising unknown ASR speech input;
- transmit, to the server, a representation of the unknown ASR speech input;
- receive, from the server, ASR results corresponding to the unknown ASR speech input; and
- output the ASR results.

16. The non-transitory computer-readable medium according to claim 15, wherein the non-ASR audio comprises audio sampled during a rolling sample window before the unknown ASR speech input.

17. The non-transitory computer-readable medium according to claim 15, wherein the non-ASR audio comprises non-ASR speech audio sampled by the microphone before the unknown ASR speech input.

18. The non-transitory computer-readable medium according to claim 17, wherein the non-ASR speech audio comprises speech data sampled from data windows having a length of one second or less.

19. The non-transitory computer-readable medium according to claim 15, wherein the representation of the non-ASR audio comprises pre-processed ASR adaptation data produced from the sample comprising non-ASR audio.

20. The non-transitory computer-readable medium according to claim 15, wherein the representation of the non-ASR audio is limited to speech feature data.


Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Inventors: Ganong, III, William F.; Dahan, Jean-Guy E.; Wu, Jianxiong; Willett, Daniel
Primary Examiner(s): Baker, Charlotte M
Application Number: US14/770,371
Publication Number: US 20160012819A1
Time in Patent Office: 1,566 Days
Field of Search: 704233, 704235, 704E15043, 704246, 704251, 704255, 704 9, 704275, 704201, 704238, 704239, 704240, 704270, 7042701, 37940605, 3792661, 379 8801
US Class Current
CPC Class Codes
- G06F 3/167   Audio in a user interface, ...
- G10L 15/07   to the speaker
- G10L 15/075   supervised, i.e. under mach...
- G10L 15/14   using statistical models, e...
- G10L 15/1815   Semantic context, e.g. disa...
- G10L 15/20   Speech recognition techniqu...
- G10L 15/22   Procedures used during a sp...
- G10L 15/30   Distributed recognition, e....
- G10L 2015/223   Execution procedure of a sp...
- G10L 2015/226   using non-speech characteri...
- G10L 25/84   for discriminating voice fr...
