Determining dialog states for language models
First Claim
1. A computer-implemented method, comprising:
receiving, by a computing device, audio data for a voice input to the computing device, wherein the voice input corresponds to an unknown stage of a multi-stage voice dialog between the computing device and a user of the computing device;
determining an initial prediction for the unknown stage of the multi-stage voice dialog;
providing, by the computing device and to a voice dialog system, (i) the audio data for the voice input to the computing device and (ii) an indication of the initial prediction for the unknown stage of the multi-stage voice dialog;
receiving, by the computing device and from the voice dialog system, a transcription of the voice input, wherein the transcription was generated by processing the audio data with a model that was biased according to parameters that correspond to a refined prediction for the unknown stage of the multi-stage voice dialog, wherein the voice dialog system is configured to determine the refined prediction for the unknown stage of the multi-stage voice dialog based on (i) the initial prediction for the unknown stage of the multi-stage voice dialog and (ii) additional information that describes a context of the voice input, and wherein the additional information that describes the context of the voice input is independent of content of the voice input; and
presenting the transcription of the voice input with the computing device.
2 Assignments
0 Petitions
Abstract
Systems, methods, devices, and other techniques are described herein for determining dialog states that correspond to voice inputs and for biasing a language model based on the determined dialog states. In some implementations, a method includes receiving, at a computing system, audio data that indicates a voice input and determining a particular dialog state, from among a plurality of dialog states, which corresponds to the voice input. A set of n-grams can be identified that are associated with the particular dialog state that corresponds to the voice input. In response to identifying the set of n-grams that are associated with the particular dialog state that corresponds to the voice input, a language model can be biased by adjusting probability scores that the language model indicates for n-grams in the set of n-grams. The voice input can be transcribed using the adjusted language model.
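The abstract describes biasing a language model by raising the probability scores of n-grams associated with the dialog state determined for the voice input. A minimal sketch of that biasing step is below; the toy model, the state-to-n-gram mapping, and the boost factor are all invented for illustration and are not taken from the patent.

```python
import math

# Hypothetical toy language model: n-grams mapped to log-probabilities.
base_log_probs = {
    ("cheese",): math.log(0.01),
    ("pepperoni",): math.log(0.005),
    ("large",): math.log(0.02),
    ("please",): math.log(0.03),
}

# Invented association between dialog states and their characteristic n-grams.
state_ngrams = {
    "pizza_toppings": {("cheese",), ("pepperoni",)},
    "pizza_size": {("large",)},
}

def bias_language_model(log_probs, dialog_state, boost=math.log(5.0)):
    """Return a copy of the model with scores raised for n-grams
    associated with the given dialog state; other scores are unchanged."""
    ngrams = state_ngrams.get(dialog_state, set())
    return {ng: lp + boost if ng in ngrams else lp for ng, lp in log_probs.items()}

biased = bias_language_model(base_log_probs, "pizza_toppings")
# n-grams tied to the detected state now outscore their base-model values
assert biased[("cheese",)] > base_log_probs[("cheese",)]
assert biased[("please",)] == base_log_probs[("please",)]
```

A decoder using the biased scores would then favor transcriptions containing the state's n-grams, which is the effect the abstract attributes to the adjusted language model.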
189 Citations
18 Claims
1. A computer-implemented method, comprising:
receiving, by a computing device, audio data for a voice input to the computing device, wherein the voice input corresponds to an unknown stage of a multi-stage voice dialog between the computing device and a user of the computing device;
determining an initial prediction for the unknown stage of the multi-stage voice dialog;
providing, by the computing device and to a voice dialog system, (i) the audio data for the voice input to the computing device and (ii) an indication of the initial prediction for the unknown stage of the multi-stage voice dialog;
receiving, by the computing device and from the voice dialog system, a transcription of the voice input, wherein the transcription was generated by processing the audio data with a model that was biased according to parameters that correspond to a refined prediction for the unknown stage of the multi-stage voice dialog, wherein the voice dialog system is configured to determine the refined prediction for the unknown stage of the multi-stage voice dialog based on (i) the initial prediction for the unknown stage of the multi-stage voice dialog and (ii) additional information that describes a context of the voice input, and wherein the additional information that describes the context of the voice input is independent of content of the voice input; and
presenting the transcription of the voice input with the computing device.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
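The steps of claim 1 amount to an exchange: the device sends audio plus an initial guess at the dialog stage, and the dialog system refines that guess using context that is independent of what was said before decoding. A toy sketch of that flow follows; the function names, the refinement rule, and the canned transcripts are all hypothetical, and the decode step is faked rather than performed on real audio.

```python
def refine_prediction(initial_prediction, context):
    """Combine the device's initial stage guess with content-independent
    context (here, which app screen is open) to pick a refined stage."""
    if context.get("screen") == "payment":
        return "collect_payment"
    return initial_prediction

def voice_dialog_system(audio_data, initial_prediction, context):
    """Stand-in for the claimed voice dialog system."""
    stage = refine_prediction(initial_prediction, context)
    # A real system would bias its speech model toward n-grams for `stage`
    # before decoding the audio; this sketch substitutes canned outputs.
    fake_transcripts = {
        "collect_payment": "pay with my saved card",
        "choose_item": "a large cheese pizza",
    }
    return fake_transcripts.get(stage, "")

transcription = voice_dialog_system(
    audio_data=b"...",                 # raw audio bytes (elided)
    initial_prediction="choose_item",  # device's initial stage guess
    context={"screen": "payment"},     # independent of the utterance content
)
assert transcription == "pay with my saved card"
```

Note how the context overrides the device's initial prediction: the refined stage, not the initial one, selects the biasing parameters, which mirrors the claim's two-step prediction.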
10. One or more non-transitory computer-readable media having instructions stored thereon that, when executed by one or more processors of a computing device, cause the one or more processors to perform operations comprising:
receiving, by the computing device, audio data for a voice input to the computing device, wherein the voice input corresponds to an unknown stage of a multi-stage voice dialog between the computing device and a user of the computing device;
determining an initial prediction for the unknown stage of the multi-stage voice dialog;
providing, by the computing device and to a voice dialog system, (i) the audio data for the voice input to the computing device and (ii) an indication of the initial prediction for the unknown stage of the multi-stage voice dialog;
receiving, by the computing device and from the voice dialog system, a transcription of the voice input, wherein the transcription was generated by processing the audio data with a model that was biased according to parameters that correspond to a refined prediction for the unknown stage of the multi-stage voice dialog, wherein the voice dialog system is configured to determine the refined prediction for the unknown stage of the multi-stage voice dialog based on (i) the initial prediction for the unknown stage of the multi-stage voice dialog and (ii) additional information that describes a context of the voice input, and wherein the additional information that describes the context of the voice input is independent of content of the voice input; and
presenting the transcription of the voice input with the computing device.
Dependent claims: 11, 12, 13, 14, 15
16. A computing device comprising:
one or more processors; and
one or more computer-readable media having instructions stored thereon that, when executed by the one or more processors, cause performance of operations comprising:
receiving, by the computing device, audio data for a voice input to the computing device, wherein the voice input corresponds to an unknown stage of a multi-stage voice dialog between the computing device and a user of the computing device;
determining an initial prediction for the unknown stage of the multi-stage voice dialog;
providing, by the computing device and to a voice dialog system, (i) the audio data for the voice input to the computing device and (ii) an indication of the initial prediction for the unknown stage of the multi-stage voice dialog;
receiving, by the computing device and from the voice dialog system, a transcription of the voice input, wherein the transcription was generated by processing the audio data with a model that was biased according to parameters that correspond to a refined prediction for the unknown stage of the multi-stage voice dialog, wherein the voice dialog system is configured to determine the refined prediction for the unknown stage of the multi-stage voice dialog based on (i) the initial prediction for the unknown stage of the multi-stage voice dialog and (ii) additional information that describes a context of the voice input, and wherein the additional information that describes the context of the voice input is independent of content of the voice input; and
presenting the transcription of the voice input with the computing device.
Dependent claims: 17, 18
Specification