Determining Dialog States for Language Models

US 20170270929A1
Filed: 03/16/2016
Published: 09/21/2017
Est. Priority Date: 03/16/2016
Status: Active Grant


2 Assignments

0 Petitions

Abstract

Systems, methods, devices, and other techniques are described herein for determining dialog states that correspond to voice inputs and for biasing a language model based on the determined dialog states. In some implementations, a method includes receiving, at a computing system, audio data that indicates a voice input and determining a particular dialog state, from among a plurality of dialog states, which corresponds to the voice input. A set of n-grams can be identified that are associated with the particular dialog state that corresponds to the voice input. In response to identifying the set of n-grams that are associated with the particular dialog state that corresponds to the voice input, a language model can be biased by adjusting probability scores that the language model indicates for n-grams in the set of n-grams. The voice input can be transcribed using the adjusted language model.
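
The pipeline in the abstract (dialog state, then n-gram set, then biased language model, then transcription) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation. It assumes a toy unigram language model (`BASE_LM`), a hypothetical `DIALOG_STATE_NGRAMS` table, a multiplicative boost followed by renormalization as the biasing rule, and a short list of candidate transcriptions with acoustic scores that stands in for a real recognizer's search. All names and numbers are invented for illustration.

```python
import math

# Hypothetical toy unigram language model (word -> probability, sums to 1).
# Unigrams keep the sketch short; the same idea applies to higher-order n-grams.
BASE_LM = {
    "add": 0.2, "pepperoni": 0.005, "pepper": 0.095, "only": 0.1,
    "mushrooms": 0.05, "the": 0.3, "a": 0.25,
}

# Hypothetical mapping from dialog state to its associated n-grams.
DIALOG_STATE_NGRAMS = {
    "select_toppings": {"pepperoni", "mushrooms"},
    "confirm_order": {"yes", "no", "confirm"},
}


def bias_language_model(lm, ngrams, boost=10.0):
    """Return a copy of `lm` with the probabilities of `ngrams` scaled by `boost`, then renormalized.

    N-grams missing from `lm` are ignored in this sketch.
    """
    biased = {w: p * boost if w in ngrams else p for w, p in lm.items()}
    total = sum(biased.values())
    return {w: p / total for w, p in biased.items()}


def lm_log_prob(lm, words, floor=1e-9):
    """Sum of unigram log-probabilities; unseen words get a small floor probability."""
    return sum(math.log(lm.get(w, floor)) for w in words)


def transcribe(hypotheses, lm):
    """Pick the hypothesis with the best combined acoustic + language model score.

    `hypotheses` is a list of (text, acoustic_log_likelihood) pairs.
    """
    return max(hypotheses, key=lambda h: h[1] + lm_log_prob(lm, h[0].split()))[0]


def recognize(hypotheses, dialog_state):
    """Bias the base model toward the dialog state's n-grams, then transcribe."""
    ngrams = DIALOG_STATE_NGRAMS.get(dialog_state, set())
    return transcribe(hypotheses, bias_language_model(BASE_LM, ngrams))


candidates = [("add pepperoni", -5.0), ("add pepper only", -5.0)]
print(transcribe(candidates, BASE_LM))           # add pepper only
print(recognize(candidates, "select_toppings"))  # add pepperoni
```

With equal acoustic scores, the unbiased model prefers "add pepper only". Boosting the n-grams of the `select_toppings` state flips the result to "add pepperoni". Renormalizing keeps the biased scores a valid distribution. Other biasing rules, such as additive log-space bonuses, would fit the same outline.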

23 Claims

1. A computer-implemented method, comprising:
- receiving, at a computing system, audio data that indicates a voice input that was provided to a computing device;
  
  receiving, at the computing system, display data that characterizes an interface that was displayed on a screen of the computing device when the voice input was provided to the computing device;
  
  determining, by the computing system and based at least on the display data that characterizes the interface that was displayed on the screen of the computing device when the voice input was provided to the computing device, a particular dialog state, from among a plurality of dialog states, that corresponds to the voice input;
  
  identifying a set of n-grams that are associated with the particular dialog state that corresponds to the voice input, wherein the set of n-grams are associated with the particular dialog state based at least on n-grams in the set of n-grams occurring frequently in historical voice inputs that correspond to the dialog state;
  
  in response to identifying the set of n-grams that are associated with the particular dialog state that corresponds to the voice input, biasing a language model by adjusting probability scores that the language model indicates for n-grams in the set of n-grams; and
  
  transcribing the voice input using the adjusted language model.
- Dependent claims (2, 3, 4, 5, 6, 7, 9, 21, 22):
  - 2. The computer-implemented method of claim 1, wherein the plurality of dialog states respectively indicate a plurality of stages of user voice interactions with the computing device that pertain to a particular task.
  - 3. The computer-implemented method of claim 1, further comprising:
    - receiving a second voice input at the computing system;
      
      determining a second particular dialog state, from among the plurality of dialog states, that corresponds to the second voice input; and
      
      identifying a second set of n-grams that are associated with the second particular dialog state that corresponds to the second voice input, wherein the second set of n-grams are different than the set of n-grams that are associated with the particular dialog state that corresponds to the voice input.
  - 4. The computer-implemented method of claim 1, wherein determining the particular dialog state that corresponds to the voice input comprises:
    - identifying a second particular dialog state, from among the plurality of dialog states, that corresponds to a second voice input that was provided to the computing device before the voice input for a same high-level task as the voice input; and
      
      determining the particular dialog state that corresponds to the voice input comprises selecting the particular dialog state from among multiple possible dialog states based on (i) an indication that the second particular dialog state is a preceding dialog state and (ii) the display data that characterizes the interface that was displayed on the screen of the computing device when the voice input was provided to the computing device.
  - 5. The computer-implemented method of claim 1, wherein determining the particular dialog state that corresponds to the voice input further comprises:
    - generating a transcription of the voice input; and
      
      determining a match between one or more n-grams that occur in the transcription of the voice input and one or more n-grams in the set of n-grams that are associated with the particular dialog state.
  - 6. The computer-implemented method of claim 5, wherein determining the match comprises determining a semantic relationship between the one or more n-grams that occur in the transcription of the voice input and the one or more n-grams in the set of n-grams that are associated with the particular dialog state.
  - 7. The computer-implemented method of claim 1, wherein the display data is first context information associated with the voice input; and the method further comprises receiving second context information associated with the voice input other than the display data, wherein determining the particular dialog state that corresponds to the voice input comprises selecting the particular dialog state further based on the second context information associated with the voice input.
  - 9. The computer-implemented method of claim 1, further comprising receiving, at the computing system, an application identifier that indicates an application to which the voice input was directed at the computing device, wherein the plurality of dialog states pertain to an application-specific task for the application to which the voice input was directed.
  - 21. The computer-implemented method of claim 1, wherein the computing device is separate from the computing system.
  - 22. The computer-implemented method of claim 1, wherein the display data comprises a hash value that results from processing information about the interface that was displayed on the screen of the computing device when the voice input was provided to the computing device with a hash function.
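
Claims 4 and 22 describe selecting a dialog state from display data, optionally sent as a hash, together with the preceding dialog state in the same task. The sketch below is one possible reading under stated assumptions, not the patented implementation. The screen identifiers, the `SCREEN_TO_STATES` lookup table, the `ALLOWED_NEXT` transition table, and the rule of taking the first remaining candidate are all hypothetical. SHA-256 is simply one choice of hash function.

```python
import hashlib


def hash_display_data(interface_description: str) -> str:
    """Hash a description of the on-screen interface (cf. claim 22); SHA-256 is an arbitrary choice."""
    return hashlib.sha256(interface_description.encode("utf-8")).hexdigest()


# Hypothetical table: screen hash -> dialog states seen with that screen,
# listed in priority order (e.g. most frequent first).
SCREEN_TO_STATES = {
    hash_display_data("pizza_app/toppings_screen"): ["select_toppings", "remove_toppings"],
    hash_display_data("pizza_app/checkout_screen"): ["confirm_order"],
}

# Hypothetical transitions between the dialog states of one high-level task.
ALLOWED_NEXT = {
    "select_size": {"select_toppings"},
    "select_toppings": {"remove_toppings", "confirm_order"},
    "remove_toppings": {"select_toppings", "confirm_order"},
}


def determine_dialog_state(display_hash, previous_state=None):
    """Pick a dialog state from the screen hash, narrowed by the preceding state (cf. claim 4)."""
    candidates = SCREEN_TO_STATES.get(display_hash, [])
    if previous_state is not None:
        followers = ALLOWED_NEXT.get(previous_state, set())
        narrowed = [s for s in candidates if s in followers]
        if narrowed:  # fall back to the screen-only candidates if no transition fits
            candidates = narrowed
    return candidates[0] if candidates else None


toppings = hash_display_data("pizza_app/toppings_screen")
print(determine_dialog_state(toppings))                                    # select_toppings
print(determine_dialog_state(toppings, previous_state="select_toppings"))  # remove_toppings
```

The same screen maps to two possible states. Knowing that the previous turn was already in `select_toppings` shifts the choice to `remove_toppings`, which illustrates the kind of disambiguation claim 4 describes. Sending a hash instead of the raw interface description keeps the display data compact.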

8. (canceled)

10-17. (canceled)

18. A computing system comprising:
- one or more processors; and
  
  one or more computer-readable media having instructions stored thereon that, when executed, cause performance of operations comprising:
  
  receiving audio data that indicates a voice input that was provided to a computing device;
  
  receiving display data that characterizes an interface that was displayed on a screen of the computing device when the voice input was provided to the computing device;
  
  determining, based at least on the display data that characterizes the interface that was displayed on the screen of the computing device when the voice input was provided to the computing device, a particular dialog state, from among a plurality of dialog states, that corresponds to the voice input;
  
  identifying a set of n-grams that are associated with the particular dialog state that corresponds to the voice input, wherein the set of n-grams are associated with the particular dialog state based at least on n-grams in the set of n-grams occurring frequently in historical voice inputs that correspond to the dialog state;
  
  in response to identifying the set of n-grams that are associated with the particular dialog state that corresponds to the voice input, adjusting a language model by increasing probability scores indicated by the language model of n-grams in the set of n-grams; and
  
  transcribing the voice input using the adjusted language model.
- Dependent claims (19, 20):
  - 19. The computing system of claim 18, wherein the plurality of dialog states respectively indicate a plurality of stages of user voice interactions with the computing device that pertain to a particular task.
  - 20. The computing system of claim 18, wherein the operations further comprise:
    - receiving a second voice input;
      
      determining a second particular dialog state, from among the plurality of dialog states, that corresponds to the second voice input; and
      
      identifying a second set of n-grams that are associated with the second particular dialog state that corresponds to the second voice input, wherein the second set of n-grams are different than the set of n-grams that are associated with the particular dialog state that corresponds to the voice input.
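
Claims 1, 18, and 23 associate each dialog state's n-grams with how often those n-grams occur in historical voice inputs for that state, and claim 20 notes that different states yield different sets. The minimal sketch below assumes labeled historical transcriptions. It interprets "occurring frequently" as appearing in at least a fixed fraction (`min_fraction`, a hypothetical parameter) of a state's inputs. The sample history and state names are invented for illustration.

```python
from collections import Counter, defaultdict


def ngrams(words, n):
    """All contiguous n-grams of a word list, as tuples."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def frequent_ngrams_by_state(history, max_n=2, min_fraction=0.6):
    """Map each dialog state to the n-grams found in at least `min_fraction` of its inputs.

    `history` is a list of (dialog_state, transcription) pairs.
    """
    inputs_containing = defaultdict(Counter)  # state -> n-gram -> inputs containing it
    inputs_per_state = Counter()              # state -> number of historical inputs
    for state, text in history:
        words = text.lower().split()
        present = set()  # count each n-gram at most once per input
        for n in range(1, max_n + 1):
            present.update(ngrams(words, n))
        inputs_containing[state].update(present)
        inputs_per_state[state] += 1
    return {
        state: {g for g, c in counts.items() if c / inputs_per_state[state] >= min_fraction}
        for state, counts in inputs_containing.items()
    }


history = [
    ("select_toppings", "add pepperoni"),
    ("select_toppings", "add mushrooms and pepperoni"),
    ("select_toppings", "pepperoni please"),
    ("confirm_order", "yes place the order"),
    ("confirm_order", "yes"),
]
print(frequent_ngrams_by_state(history))
```

On this sample, `select_toppings` keeps `("add",)` and `("pepperoni",)`, while `confirm_order` keeps only `("yes",)`. The two states therefore receive different n-gram sets. This version counts the inputs that contain an n-gram rather than raw occurrences, so one long input cannot dominate. A fuller system might also compare against background frequencies so that common words such as "the" are not selected.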

23. One or more non-transitory computer-readable media having instructions stored thereon that, when executed by one or more processors, cause the processors to perform operations comprising:
- receiving audio data that indicates a voice input that was provided to a computing device;
  
  receiving display data that characterizes an interface that was displayed on a screen of the computing device when the voice input was provided to the computing device;
  
  determining, based at least on the display data that characterizes the interface that was displayed on the screen of the computing device when the voice input was provided to the computing device, a particular dialog state, from among a plurality of dialog states, that corresponds to the voice input;
  
  identifying a set of n-grams that are associated with the particular dialog state that corresponds to the voice input, wherein the set of n-grams are associated with the particular dialog state based at least on n-grams in the set of n-grams occurring frequently in historical voice inputs that correspond to the dialog state;
  
  in response to identifying the set of n-grams that are associated with the particular dialog state that corresponds to the voice input, adjusting a language model by increasing probability scores indicated by the language model of n-grams in the set of n-grams; and
  
  transcribing the voice input using the adjusted language model.

Current Assignee
Google LLC (Alphabet Inc.)
Original Assignee
Google Inc. (Alphabet Inc.)
Inventors
Aleksic, Petar; Moreno Mengibar, Pedro J.

Granted Patent

US 9,978,367 B2
CPC Class Codes

G06F 40/295   Named entity recognition

G06F 40/30   Semantic analysis

G10L 15/065   Adaptation

G10L 15/183   using context dependencies,...

G10L 15/197   Probabilistic grammars, e.g...

G10L 15/22   Procedures used during a sp...

G10L 15/26   Speech to text systems G10L...
