Determining dialog states for language models
First Claim
1. A computer-implemented method, comprising:
receiving, at a computing system, audio data that indicates a first voice input that was provided to a computing device;
determining that the first voice input is part of a voice dialog that includes a plurality of pre-defined dialog states arranged to receive a series of voice inputs related to a particular task, wherein each dialog state is mapped to: (i) a set of display data characterizing content that is designated for display when voice inputs for the dialog state are received, and (ii) a set of n-grams;
receiving, at the computing system, first display data that characterizes content that was displayed on a screen of the computing device when the first voice input was provided to the computing device;
selecting, by the computing system, a particular dialog state of the plurality of pre-defined dialog states that corresponds to the first voice input, including determining a match between the first display data and the corresponding set of display data that is mapped to the particular dialog state;
biasing a language model by adjusting probability scores that the language model indicates for n-grams in the corresponding set of n-grams that are mapped to the particular dialog state; and
transcribing the voice input using the biased language model.
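The selecting step above matches what was on the device's screen against the display data mapped to each pre-defined dialog state. A minimal sketch of that idea, assuming a simple set-overlap match; all class and function names here are hypothetical and not from the patent:

```python
# Hypothetical sketch: selecting a pre-defined dialog state by matching
# the display data captured from the device screen against the display
# data mapped to each state. Overlap counting is an assumption; the
# patent does not specify how the match is computed.
from dataclasses import dataclass

@dataclass(frozen=True)
class DialogState:
    name: str
    display_data: frozenset  # content designated for display in this state
    ngrams: frozenset        # n-grams mapped to this state

def select_dialog_state(states, first_display_data):
    """Return the state whose mapped display data best matches the content
    that was displayed when the voice input was provided, or None if no
    state's display data overlaps at all."""
    def overlap(state):
        return len(state.display_data & first_display_data)
    best = max(states, key=overlap)
    return best if overlap(best) > 0 else None

# Example: a pizza-ordering voice dialog with two states.
states = [
    DialogState("pizza_size", frozenset({"Small", "Medium", "Large"}),
                frozenset({"small", "medium", "large"})),
    DialogState("toppings", frozenset({"Pepperoni", "Mushrooms"}),
                frozenset({"pepperoni", "mushrooms", "extra cheese"})),
]
state = select_dialog_state(states, frozenset({"Pepperoni", "Mushrooms"}))
```

Once a state is selected, its mapped n-gram set is what drives the biasing step recited in the claim.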
Abstract
Systems, methods, devices, and other techniques are described herein for determining dialog states that correspond to voice inputs and for biasing a language model based on the determined dialog states. In some implementations, a method includes receiving, at a computing system, audio data that indicates a voice input and determining a particular dialog state, from among a plurality of dialog states, which corresponds to the voice input. A set of n-grams can be identified that are associated with the particular dialog state that corresponds to the voice input. In response to identifying the set of n-grams that are associated with the particular dialog state that corresponds to the voice input, a language model can be biased by adjusting probability scores that the language model indicates for n-grams in the set of n-grams. The voice input can be transcribed using the adjusted language model.
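The biasing described in the abstract adjusts the probability scores the language model indicates for the n-grams associated with the determined dialog state. A minimal sketch of one way such an adjustment could work, assuming a unigram score table and a multiplicative boost with renormalization (the function name and boost scheme are assumptions, not the patent's method):

```python
# Hypothetical sketch: biasing a language model toward the n-grams
# mapped to the determined dialog state. A multiplicative boost with
# renormalization is one simple realization of "adjusting probability
# scores"; the patent does not prescribe a specific adjustment.
def bias_language_model(base_scores, state_ngrams, boost=2.0):
    """Multiply the scores of state-mapped n-grams by `boost`, then
    renormalize so the scores still sum to 1."""
    adjusted = {ng: p * (boost if ng in state_ngrams else 1.0)
                for ng, p in base_scores.items()}
    total = sum(adjusted.values())
    return {ng: p / total for ng, p in adjusted.items()}

# Example: acoustically similar hypotheses; the dialog state maps the
# n-grams {"large", "large pizza"}, so those scores are boosted.
base = {"large": 0.2, "lodge": 0.2, "barge": 0.2, "large pizza": 0.4}
biased = bias_language_model(base, {"large", "large pizza"})
```

After biasing, "large" and "large pizza" outscore the acoustically similar alternatives, which is the effect the abstract describes: the transcription step then favors words the current dialog state makes likely.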
12 Claims
1. A computer-implemented method, comprising:
receiving, at a computing system, audio data that indicates a first voice input that was provided to a computing device;
determining that the first voice input is part of a voice dialog that includes a plurality of pre-defined dialog states arranged to receive a series of voice inputs related to a particular task, wherein each dialog state is mapped to: (i) a set of display data characterizing content that is designated for display when voice inputs for the dialog state are received, and (ii) a set of n-grams;
receiving, at the computing system, first display data that characterizes content that was displayed on a screen of the computing device when the first voice input was provided to the computing device;
selecting, by the computing system, a particular dialog state of the plurality of pre-defined dialog states that corresponds to the first voice input, including determining a match between the first display data and the corresponding set of display data that is mapped to the particular dialog state;
biasing a language model by adjusting probability scores that the language model indicates for n-grams in the corresponding set of n-grams that are mapped to the particular dialog state; and
transcribing the voice input using the biased language model.
(Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9)
10. A computing system comprising:
one or more processors; and
one or more computer-readable media having instructions stored thereon that, when executed, cause performance of operations comprising:
receiving audio data that indicates a first voice input that was provided to a computing device;
determining that the first voice input is part of a voice dialog that includes a plurality of pre-defined dialog states arranged to receive a series of voice inputs related to a particular task, wherein each dialog state is mapped to: (i) a set of display data characterizing content that is designated for display when voice inputs for the dialog state are received, and (ii) a set of n-grams;
receiving first display data that characterizes content that was displayed on a screen of the computing device when the first voice input was provided to the computing device;
selecting a particular dialog state of the plurality of pre-defined dialog states that corresponds to the first voice input, including determining a match between the first display data and the corresponding set of display data that is mapped to the particular dialog state;
biasing a language model by adjusting probability scores that the language model indicates for n-grams in the corresponding set of n-grams that are mapped to the particular dialog state; and
transcribing the voice input using the biased language model.
(Dependent claim: 11)
12. One or more non-transitory computer-readable media having instructions stored thereon that, when executed by one or more processors, cause the processors to perform operations comprising:
receiving audio data that indicates a first voice input that was provided to a computing device;
determining that the first voice input is part of a voice dialog that includes a plurality of pre-defined dialog states arranged to receive a series of voice inputs related to a particular task, wherein each dialog state is mapped to: (i) a set of display data characterizing content that is designated for display when voice inputs for the dialog state are received, and (ii) a set of n-grams;
receiving first display data that characterizes content that was displayed on a screen of the computing device when the first voice input was provided to the computing device;
selecting a particular dialog state of the plurality of pre-defined dialog states that corresponds to the first voice input, including determining a match between the first display data and the corresponding set of display data that is mapped to the particular dialog state;
biasing a language model by adjusting probability scores that the language model indicates for n-grams in the corresponding set of n-grams that are mapped to the particular dialog state; and
transcribing the voice input using the biased language model.
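Each independent claim ends with transcribing the voice input using the biased language model. A hypothetical sketch of that final step, assuming the recognizer rescores candidate transcriptions by combining acoustic scores with the biased language-model scores (the weighting scheme and all names are assumptions, not the patent's method):

```python
# Hypothetical sketch: transcribing with the biased language model by
# rescoring candidate hypotheses. Log-linear combination of acoustic
# and LM scores is a common recognizer pattern assumed here for
# illustration; the patent does not specify a decoding algorithm.
import math

def transcribe(candidates, biased_lm_scores, lm_weight=0.5):
    """Combine each candidate's acoustic log-probability with the biased
    LM's log-probability and return the highest-scoring hypothesis."""
    def score(hyp):
        acoustic = candidates[hyp]                       # log P(audio | hyp)
        lm = math.log(biased_lm_scores.get(hyp, 1e-9))   # log P(hyp), biased
        return acoustic + lm_weight * lm
    return max(candidates, key=score)

# Acoustically, "lodge" narrowly beats "large"; the biased LM, which
# boosted the dialog state's n-grams, flips the decision to "large".
candidates = {"large": math.log(0.48), "lodge": math.log(0.52)}
biased_lm = {"large": 0.25, "lodge": 0.125}
best = transcribe(candidates, biased_lm)
```

This illustrates why the claims bias the model before transcription: the adjusted n-gram scores let dialog-state context override small acoustic differences.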
Specification