Language model biasing modulation

US 9,460,713 B1
Filed: 03/30/2015
Issued: 10/04/2016
Est. Priority Date: 03/30/2015
Status: Active Grant

2 Assignments

0 Petitions

Abstract

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for modulating language model biasing. In some implementations, context data is received. A likely context associated with a user is determined based on at least a portion of the context data. One or more language model biasing parameters based at least on the likely context associated with the user is selected. A context confidence score associated with the likely context based on at least a portion of the context data is determined. One or more language model biasing parameters based at least on the context confidence score is adjusted. A baseline language model based at least on the one or more of the adjusted language model biasing parameters is biased. The baseline language model is provided for use by an automated speech recognizer (ASR).
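As a rough illustration of the flow the abstract describes, the sketch below selects biasing parameters for a likely context, scales them by a context confidence score, and applies them to a toy baseline model. The context names, the boost table, the linear scaling rule, and every function name are assumptions made for illustration; the record above does not specify any of them.

```python
import math

# Hypothetical baseline model: log-scores for a few bigrams.
BASELINE = {
    ("call", "mom"): math.log(0.02),
    ("play", "jazz"): math.log(0.01),
    ("navigate", "home"): math.log(0.005),
}

# Hypothetical biasing parameters: per-context additive log-score boosts.
CONTEXT_BOOSTS = {
    "music_app": {("play", "jazz"): 2.0},
    "maps_app": {("navigate", "home"): 2.5},
}


def select_biasing_parameters(likely_context):
    """Return a copy of the boosts for the likely context (empty if unknown)."""
    return dict(CONTEXT_BOOSTS.get(likely_context, {}))


def adjust_biasing_parameters(params, confidence):
    """Scale each boost by the confidence, clamped to [0, 1] (assumed rule)."""
    c = min(max(confidence, 0.0), 1.0)
    return {ngram: boost * c for ngram, boost in params.items()}


def bias_language_model(baseline, params):
    """Add boosts to the baseline log-scores; the result is not renormalized."""
    return {ngram: score + params.get(ngram, 0.0)
            for ngram, score in baseline.items()}


def modulated_model(likely_context, confidence):
    """Select, adjust, and apply biasing parameters in sequence."""
    params = select_biasing_parameters(likely_context)
    adjusted = adjust_biasing_parameters(params, confidence)
    return bias_language_model(BASELINE, adjusted)
```

Under these assumptions, a confidence of 0 leaves the baseline scores unchanged, while a confidence of 1 applies the full boost for the selected context. A recognizer would then score candidate transcriptions against the returned model.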

17 Claims

1. A computer-implemented method comprising:
- receiving audio data encoding an utterance of a user;
  
  receiving context data associated with the received audio data;
  
  determining a likely context associated with a user, based on at least a portion of the context data;
  
  selecting one or more language model biasing parameters based at least on the likely context associated with the user;
  
  determining a context confidence score associated with the likely context based on at least a portion of the context data, and additional context data indicating (i) that the user has switched between applications, (ii) a time difference between a presentation of a search result and a user response to the presentation of the search result, (iii) gaze tracking data, or (iv) a user behavior in response to visible content;
  
  adjusting one or more of the language model biasing parameters based at least on the context confidence score;
  
  biasing a baseline language model based at least on one or more of the adjusted language model biasing parameters;
  
  providing the biased language model for use by an automated speech recognizer (ASR);
  
  generating a transcription of the received audio data using the biased language model; and
  
  transmitting the generated transcription for display on a client computing device.
  - 2. The method of claim 1, wherein the context confidence score reflects a likelihood that the likely context remains associated with the user.
  - 3. The method of claim 1, wherein adjusting one or more of the language model biasing parameters comprises:
    - comparing the context confidence score associated with the likely context to a threshold context confidence score value;
      
      determining, based at least on comparing the context confidence score associated with the likely context to the threshold context confidence score value, that the likely context data does not remain associated with the user;
      
      in response to determining that the likely context data does not remain associated with the user, interpolating one or more selected language model biasing parameters based on at least the likely context; and
      
      providing, for output to a language module biaser, one or more adjusted language model biasing parameters based on at least the likely context with decreased biasing weights.
  - 4. The method of claim 3, wherein the interpolating the one or more language model biasing parameters comprises:
    - reducing, by a first magnitude, based on additional context data indicating a time difference between a presentation of a search result and a user response to the presentation of the search result;
      
      or reducing, by a second magnitude, based on additional context data including gaze tracking data.
  - 5. The method of claim 1, wherein the baseline language model indicates scores associated with different n-gram sequences.
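Claims 3 and 4 can be read as a threshold test followed by a weight reduction whose size depends on the kind of additional context data. The following is a minimal sketch under stated assumptions: the threshold value, the two reduction magnitudes, the signal labels, and the multiplicative form of the reduction are all hypothetical, since the claims only require that a first and a second magnitude exist.

```python
from typing import Dict, Tuple

NGram = Tuple[str, ...]

# Hypothetical threshold context confidence score value.
THRESHOLD = 0.6

# Hypothetical reduction magnitudes, keyed by the kind of additional context data.
REDUCTION_BY_SIGNAL = {
    "response_delay": 0.5,  # assumed "first magnitude"
    "gaze": 0.8,            # assumed "second magnitude"
}


def adjust_biasing_parameters(
    params: Dict[NGram, float], confidence: float, signal: str
) -> Dict[NGram, float]:
    """Return biasing weights, reduced when confidence falls below the threshold.

    Treating "below threshold" as "context no longer associated with the user"
    is an assumption of this sketch.
    """
    if confidence >= THRESHOLD:
        return dict(params)  # context is treated as still current
    keep = 1.0 - REDUCTION_BY_SIGNAL[signal]
    return {ngram: weight * keep for ngram, weight in params.items()}
```

For example, with a weight of 2.0 on the bigram ("play", "jazz"), a confidence of 0.3 and the "response_delay" signal would halve the weight, while a confidence of 0.9 would leave it unchanged.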

6. A non-transitory computer storage device encoded with a computer program, the program comprising instructions that when executed by one or more computers cause the one or more computers to perform operations comprising:
- receiving audio data encoding an utterance of a user;
  
  receiving context data associated with the received audio data;
  
  determining a likely context associated with a user, based on at least a portion of the context data;
  
  selecting one or more language model biasing parameters based at least on the likely context associated with the user;
  
  determining a context confidence score associated with the likely context based on at least a portion of the context data, and additional context data indicating (i) that the user has switched between applications, (ii) a time difference between a presentation of a search result and a user response to the presentation of the search result, (iii) gaze tracking data, or (iv) a user behavior in response to visible content;
  
  adjusting one or more of the language model biasing parameters based at least on the context confidence score;
  
  biasing a baseline language model based at least on one or more of the adjusted language model biasing parameters;
  
  providing the biased language model for use by an automated speech recognizer (ASR);
  
  generating a transcription of the received audio data using the biased language model; and
  
  transmitting the generated transcription for display on a client computing device.
  - 7. The device of claim 6, wherein the context confidence score reflects a likelihood that the likely context remains associated with the user.
  - 8. The device of claim 6, wherein adjusting one or more of the language model biasing parameters comprises:
    - comparing the context confidence score associated with the likely context to a threshold context confidence score value;
      
      determining, based at least on comparing the context confidence score associated with the likely context to the threshold context confidence score value, that the likely context data does not remain associated with the user;
      
      in response to determining that the likely context data does not remain associated with the user, interpolating one or more selected language model biasing parameters based on at least the likely context; and
      
      providing, for output to a language module biaser, one or more adjusted language model biasing parameters based on at least the likely context with decreased biasing weights.
  - 9. The device of claim 8, wherein the interpolating the one or more language model biasing parameters comprises:
    - reducing, by a first magnitude, based on additional context data indicating a time difference between a presentation of a search result and a user response to the presentation of the search result;
      
      or reducing, by a second magnitude, based on additional context data including gaze tracking data.

10. A system comprising:
- one or more processors; and
  
  a non-transitory computer-readable medium coupled to the one or more computers having instructions stored thereon, which, when executed by the one or more computers, cause the one or more computers to perform operations comprising;
  
  receiving audio data encoding an utterance of a user;
  
  receiving context data associated with the received audio data;
  
  determining a likely context associated with a user, based on at least a portion of the context data;
  
  selecting one or more language model biasing parameters based at least on the likely context associated with the user;
  
  determining a context confidence score associated with the likely context based on at least a portion of the context data, and additional context data indicating (i) that the user has switched between applications, (ii) a time difference between a presentation of a search result and a user response to the presentation of the search result, (iii) gaze tracking data, or (iv) a user behavior in response to visible content;
  
  adjusting one or more of the language model biasing parameters based at least on the context confidence score;
  
  biasing a baseline language model based at least on one or more of the adjusted language model biasing parameters;
  
  providing the biased language model for use by an automated speech recognizer (ASR);
  
  generating a transcription of the received audio data using the biased language model; and
  
  transmitting the generated transcription for display on a client computing device.
  - 11. The system of claim 10, wherein the context confidence score reflects a likelihood that the likely context remains associated with the user.
  - 12. The system of claim 10, wherein adjusting one or more of the language model biasing parameters comprises:
    - comparing the context confidence score associated with the likely context to a threshold context confidence score value;
      
      determining, based at least on comparing the context confidence score associated with the likely context to the threshold context confidence score value, that the likely context data does not remain associated with the user;
      
      in response to determining that the likely context data does not remain associated with the user, interpolating one or more selected language model biasing parameters based on at least the likely context; and
      
      providing, for output to a language module biaser, one or more adjusted language model biasing parameters based on at least the likely context with decreased biasing weights.
  - 13. The system of claim 12, wherein the interpolating the one or more language model biasing parameters comprises:
    - reducing, by a first magnitude, based on additional context data indicating a time difference between a presentation of a search result and a user response to the presentation of the search result;
      
      or reducing, by a second magnitude, based on additional context data including gaze tracking data.
  - 14. The system of claim 10, wherein the baseline language model indicates scores associated with different n-gram sequences.
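Independent claims 1, 6, and 10 list four kinds of additional context data that may inform the context confidence score. One way such signals could lower a score is sketched below. The field names, the multiplicative penalties and their values, and the 10-second delay cutoff are assumptions for illustration only and are not taken from the claims.

```python
from dataclasses import dataclass


@dataclass
class AdditionalContext:
    """Hypothetical container for the four signal types named in the claims."""
    switched_apps: bool = False            # (i) user switched between applications
    response_delay_s: float = 0.0          # (ii) delay between result and response
    gaze_on_content: bool = True           # (iii) gaze tracking data
    ignored_visible_content: bool = False  # (iv) behavior toward visible content


def context_confidence(base_score: float, extra: AdditionalContext) -> float:
    """Lower a base confidence score for each signal suggesting a stale context."""
    score = base_score
    if extra.switched_apps:
        score *= 0.3  # assumed penalty
    if extra.response_delay_s > 10.0:  # assumed cutoff in seconds
        score *= 0.5
    if not extra.gaze_on_content:
        score *= 0.6
    if extra.ignored_visible_content:
        score *= 0.7
    return min(max(score, 0.0), 1.0)  # keep the score within [0, 1]
```

With no adverse signals the base score passes through unchanged; each adverse signal multiplies it down, so several signals together push the score toward zero.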

15. A computer-implemented method comprising:
- receiving audio data encoding an utterance of a user;
  
  receiving context data associated with the received audio data;
  
  determining a likely context associated with a user, based on at least a portion of the context data;
  
  selecting one or more language model biasing parameters based at least on the likely context associated with the user;
  
  determining a context confidence score associated with the likely context based on at least a portion of the context data;
  
  comparing the context confidence score associated with the likely context to a threshold context confidence score value;
  
  determining, based at least on comparing the context confidence score associated with the likely context to the threshold context confidence score value, that the likely context data does not remain associated with the user;
  
  in response to determining that the likely context data does not remain associated with the user, interpolating one or more selected language model biasing parameters based on at least the likely context;
  
  biasing, by a language module biaser, a baseline language model based at least on one or more of the interpolated language model biasing parameters;
  
  providing the biased language model for use by an automated speech recognizer (ASR);
  
  generating a transcription of the received audio data using the biased language model; and
  
  transmitting the generated transcription to a client computing device.

16. A non-transitory computer storage device encoded with a computer program, the program comprising instructions that when executed by one or more computers cause the one or more computers to perform operations comprising:
- receiving audio data encoding an utterance of a user;
  
  receiving context data associated with the received audio data;
  
  determining a likely context associated with a user, based on at least a portion of the context data;
  
  selecting one or more language model biasing parameters based at least on the likely context associated with the user;
  
  determining a context confidence score associated with the likely context based on at least a portion of the context data;
  
  comparing the context confidence score associated with the likely context to a threshold context confidence score value;
  
  determining, based at least on comparing the context confidence score associated with the likely context to the threshold context confidence score value, that the likely context data does not remain associated with the user;
  
  in response to determining that the likely context data does not remain associated with the user, interpolating one or more selected language model biasing parameters based on at least the likely context;
  
  biasing, by a language module biaser, a baseline language model based at least on one or more of the interpolated language model biasing parameters;
  
  providing the biased language model for use by an automated speech recognizer (ASR);
  
  generating a transcription of the received audio data using the biased language model; and
  
  transmitting the generated transcription to a client computing device.

17. A system comprising:
- one or more processors; and
  
  a non-transitory computer-readable medium coupled to the one or more computers having instructions stored thereon, which, when executed by the one or more computers, cause the one or more computers to perform operations comprising;
  
  receiving audio data encoding an utterance of a user;
  
  receiving context data associated with the received audio data;
  
  determining a likely context associated with a user, based on at least a portion of the context data;
  
  selecting one or more language model biasing parameters based at least on the likely context associated with the user;
  
  determining a context confidence score associated with the likely context based on at least a portion of the context data;
  
  comparing the context confidence score associated with the likely context to a threshold context confidence score value;
  
  determining, based at least on comparing the context confidence score associated with the likely context to the threshold context confidence score value, that the likely context data does not remain associated with the user;
  
  in response to determining that the likely context data does not remain associated with the user, interpolating one or more selected language model biasing parameters based on at least the likely context;
  
  biasing, by a language module biaser, a baseline language model based at least on one or more of the interpolated language model biasing parameters;
  
  providing the biased language model for use by an automated speech recognizer (ASR);
  
  generating a transcription of the received audio data using the biased language model; and
  
  transmitting the generated transcription to a client computing device.

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Google Inc. (Alphabet Inc.)
Inventors: Moreno Mengibar, Pedro J.; Aleksic, Petar
Primary Examiner(s): Singh, Satwant
Application Number: US14/673,731
Publication Number: US 20160293163A1
Time in Patent Office: 554 Days
Field of Search: None
US Class Current: 1/1
CPC Class Codes:
G10L 15/07   to the speaker
G10L 15/183   using context dependencies,...
G10L 15/197   Probabilistic grammars, e.g...
G10L 15/24   Speech recognition using no...