User dedicated automatic speech recognition

US 10,789,950 B2
Filed: 01/22/2018
Issued: 09/29/2020
Est. Priority Date: 03/16/2012
Status: Active Grant

Abstract

A multi-mode voice controlled user interface is described. The user interface is adapted to conduct a speech dialog with one or more possible speakers and includes a broad listening mode which accepts speech inputs from the possible speakers without spatial filtering, and a selective listening mode which limits speech inputs to a specific speaker using spatial filtering. The user interface switches listening modes in response to one or more switching cues.
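The mode switching described above can be illustrated as a small state machine. This is a minimal sketch under stated assumptions, not the patented implementation: the vocabularies, the switching word "hello system", the end-of-dialog word "goodbye", and the `ListeningUI` class are all hypothetical, and spatial filtering is stood in for by a simple speaker-identity check. Falling back to the broad mode when the speaker's location is lost is an inference from the "so long as a location of the specific speaker is known" language in the claims.

```python
from enum import Enum, auto
from typing import Optional

# Hypothetical vocabularies: the broad-mode set is kept small, and the
# selective-mode set is a strict superset of it, hence larger.
SWITCH_WORD = "hello system"  # assumed mode switching word
END_WORD = "goodbye"          # assumed end-of-dialog marker
BROAD_VOCAB = frozenset({SWITCH_WORD, "tv on"})
SELECTIVE_VOCAB = BROAD_VOCAB | frozenset(
    {"volume up", "volume down", "next channel", END_WORD}
)


class Mode(Enum):
    BROAD = auto()
    SELECTIVE = auto()


class ListeningUI:
    """Toy two-mode interface; starts in the broad listening mode."""

    def __init__(self) -> None:
        self.mode = Mode.BROAD
        self.target: Optional[str] = None  # speaker engaged in selective mode

    def _to_broad(self) -> None:
        self.mode = Mode.BROAD
        self.target = None

    def handle(self, text: str, speaker: str,
               location: Optional[float]) -> Optional[str]:
        """Return the accepted utterance, or None if it is rejected.

        `location` is a hypothetical direction estimate in degrees,
        or None when the speaker cannot be localized.
        """
        if self.mode is Mode.BROAD:
            # Broad mode: any speaker is heard, but only the small vocabulary.
            if text not in BROAD_VOCAB:
                return None
            if text == SWITCH_WORD and location is not None:
                self.mode = Mode.SELECTIVE
                self.target = speaker
            return text

        # Selective mode: stand-in for spatial filtering, other speakers
        # are ignored entirely.
        if speaker != self.target:
            return None
        if location is None:
            # Location no longer known: assumed fallback to broad mode.
            self._to_broad()
            return None
        if text not in SELECTIVE_VOCAB:
            return None
        if text == END_WORD:
            self._to_broad()
        return text
```

In this sketch, "volume up" is rejected while the interface is in the broad mode, accepted from the engaged speaker after the switching word, and rejected from any other speaker for as long as the selective mode lasts.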

26 Citations

20 Claims

1. A device for automatic speech recognition (ASR) comprising:
- a multi-mode voice-controlled user interface employing at least one hardware implemented computer processor, wherein the user interface is adapted to conduct a speech dialog with one or more possible speakers and includes:
  
  a broad listening mode which accepts speech inputs from the possible speakers without spatial filtering and has an associated limited broad mode recognition vocabulary; and
  
  a selective listening mode which limits speech inputs to a specific speaker using spatial filtering and has an associated selective mode recognition vocabulary that is larger than the limited broad mode recognition vocabulary,
  
  wherein the user interface is adapted to:
  
  switch from the broad listening mode to the selective listening mode in response to one or more switching cues,
  
  in the selective listening mode, engage the specific speaker in a dialog using the selective mode recognition vocabulary, and
  
  the user interface is adapted to remain in the selective listening mode so long as a location of the specific speaker is known.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9):
  - 2. A device according to claim 1, wherein the switching cues include one or more mode switching words from the speech inputs.
  - 3. A device according to claim 1, wherein the switching cues include one or more dialog states in the speech dialog.
  - 4. A device according to claim 1, wherein the switching cues include one or more visual cues from the possible speakers.
  - 5. A device according to claim 1, wherein the selective listening mode uses acoustic speaker localization for the spatial filtering.
  - 6. A device according to claim 1, wherein the selective listening mode uses image processing for the spatial filtering.
  - 7. A device according to claim 1, wherein the user interface operates in the selective listening mode simultaneously in parallel for each of a plurality of selected speakers, so that each of the plurality of selected speakers has its own selective listening mode and dialog with the user interface.
  - 8. A device according to claim 1, wherein the user interface is adapted to operate in both listening modes in parallel, whereby the user interface accepts speech inputs in the broad listening mode, and at the same time accepts speech inputs from at least one selected speaker in at least one selective listening mode.
  - 9. The device according to claim 1, wherein the user interface is adapted to switch from the selective listening mode to the broad listening mode in response to either an end of the dialog or an activation word.

10. A computer program product encoded in a non-transitory computer-readable medium for operating an automatic speech recognition (ASR) system, the product comprising:
- program code executable to conduct a speech dialog with one or more possible speakers via a multi-mode voice-controlled user interface adapted to:
  
  accept speech inputs from the possible speakers in a broad listening mode without spatial filtering, the broad listening mode having an associated limited broad mode recognition vocabulary; and
  
  limit speech inputs to a specific speaker in a selective listening mode using spatial filtering, the selective listening mode having an associated selective mode recognition vocabulary that is larger than the limited broad mode recognition vocabulary,
  
  wherein the program code is executable to cause the user interface to:
  
  switch from the broad listening mode to the selective listening mode in response to one or more switching cues,
  
  in the selective listening mode, engage the specific speaker in a dialog using the selective mode recognition vocabulary, and
  
  the program code is executable to cause the user interface to remain in the selective listening mode so long as a location of the specific speaker is known.
- Dependent claims (11):
  - 11. The computer program product of claim 10, wherein the program code is executable to switch from the selective listening mode to the broad listening mode in response to either an end of the dialog or an activation word.

12. A method for automatic speech recognition (ASR) comprising:
- employing a multi-mode voice-controlled user interface having a computer processor to conduct a speech dialog with one or more possible speakers by:
  
  employing a broad listening mode which accepts speech inputs from the possible speakers without spatial filtering and has an associated limited broad mode recognition vocabulary; and
  
  employing a selective listening mode which limits speech inputs to a specific speaker using spatial filtering and has an associated selective mode recognition vocabulary that is larger than the limited broad mode recognition vocabulary,
  
  the user interface:
  
  switching from the broad listening mode to the selective listening mode in response to one or more switching cues,
  
  in the selective listening mode, engaging the specific speaker in a dialog using the selective mode recognition vocabulary, and
  
  remaining in the selective listening mode so long as a location of the specific speaker is known.
- Dependent claims (13, 14, 15, 16, 17, 18, 19, 20):
  - 13. The method according to claim 12, wherein the switching cues include one or more mode switching words from the speech inputs.
  - 14. The method according to claim 12, wherein the switching cues include one or more dialog states in the speech dialog.
  - 15. The method according to claim 12, wherein the switching cues include one or more visual cues from the possible speakers.
  - 16. The method according to claim 12, wherein the selective listening mode includes using acoustic speaker localization for the spatial filtering.
  - 17. The method according to claim 12, wherein the selective listening mode includes using image processing for the spatial filtering.
  - 18. The method according to claim 12, wherein the user interface operates in selective listening mode simultaneously in parallel for each of a plurality of selected speakers, so that each of the plurality of selected speakers has its own selective listening mode and dialog with the user interface.
  - 19. The method according to claim 12, wherein the user interface operates in both listening modes in parallel, such that the user interface accepts speech inputs in the broad listening mode, and at the same time accepts speech inputs from at least one selected speaker in at least one selective listening mode.
  - 20. The method according to claim 12, including the user interface switching from the selective listening mode to the broad listening mode in response to either an end of the dialog or an activation word.
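The parallel operation recited in claims 7, 8, 18 and 19 can be pictured as a router that keeps one selective session per selected speaker while the broad mode keeps listening to everyone else. The following is a rough sketch only: `ParallelListeningUI`, the vocabularies, the switching word "hello system" and the end word "goodbye" are hypothetical names chosen for illustration, and matching on a speaker label stands in for real spatial filtering.

```python
from typing import Dict, List, Optional

# Hypothetical vocabularies; the selective set is larger than the broad set.
SWITCH_WORD = "hello system"  # assumed mode switching word
END_WORD = "goodbye"          # assumed end-of-dialog marker
BROAD_VOCAB = frozenset({SWITCH_WORD})
SELECTIVE_VOCAB = BROAD_VOCAB | frozenset({"volume up", "volume down", END_WORD})


class ParallelListeningUI:
    """Toy router: one selective dialog per selected speaker, with the
    broad mode still active for all other speakers."""

    def __init__(self) -> None:
        # speaker label -> dialog history of that speaker's selective session
        self.sessions: Dict[str, List[str]] = {}

    def handle(self, text: str, speaker: str,
               location: Optional[float]) -> Optional[str]:
        """Return the accepted utterance, or None if it is rejected."""
        session = self.sessions.get(speaker)
        if session is not None:
            # This speaker has their own selective listening mode.
            if location is None:
                # Location lost: assumed to end that speaker's session.
                del self.sessions[speaker]
                return None
            if text not in SELECTIVE_VOCAB:
                return None
            if text == END_WORD:
                del self.sessions[speaker]
            else:
                session.append(text)
            return text

        # Everyone else is still heard in the broad mode, in parallel.
        if text not in BROAD_VOCAB:
            return None
        if text == SWITCH_WORD and location is not None:
            self.sessions[speaker] = []  # open a dedicated selective session
        return text
```

Each selected speaker gets a separate dialog history, so one speaker ending their dialog does not affect another speaker's session, and a third speaker can still be picked up through the broad mode at any time.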

Current Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Inventors: Wolff, Tobias; Buck, Markus; Haulick, Tim; Suhadi
Primary Examiner(s): Kim, Jonathan C
Application Number: US15/876,545
Publication Number: US 20180158461A1
Time in Patent Office: 981 Days
Field of Search: None
US Class Current
CPC Class Codes

G06F 3/167   Audio in a user interface, ...

G10L 15/183   using context dependencies,...

G10L 15/22   Procedures used during a sp...

G10L 15/28   Constructional details of s...

G10L 2015/228   of application context

G10L 2021/02166   Microphone arrays; Beamforming

G10L 25/51   for comparison or discrimin...
