Language model biasing system

US 10,311,860 B2
Filed: 02/14/2017
Issued: 06/04/2019
Est. Priority Date: 02/14/2017
Status: Active Grant

Abstract

Methods, systems, and apparatus for receiving audio data corresponding to a user utterance and context data, identifying an initial set of one or more n-grams from the context data, generating an expanded set of one or more n-grams based on the initial set of n-grams, adjusting a language model based at least on the expanded set of n-grams, determining one or more speech recognition candidates for at least a portion of the user utterance using the adjusted language model, adjusting a score for a particular speech recognition candidate determined to be included in the expanded set of n-grams, determining a transcription of the user utterance that includes at least one of the one or more speech recognition candidates, and providing the transcription of the user utterance for output.
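
The sequence of steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the language model is reduced to a toy table of probabilities over whole phrases, the "expansion" is a plain lookup table standing in for whatever produces related n-grams, and every name and constant here (`expand_ngrams`, `adjust_language_model`, `transcribe`, `boost`, `new_mass`, `bonus`) is an assumption made for illustration.

```python
import math

def expand_ngrams(initial, related):
    """Initial n-grams plus related ones from a (hypothetical) lookup table."""
    expanded = set(initial)
    for ngram in initial:
        expanded.update(related.get(ngram, ()))
    return expanded

def adjust_language_model(lm, expanded, boost=2.0, new_mass=0.05):
    """Return a biased copy of a toy phrase-probability table.

    Known n-grams are boosted; unknown ones receive a small mass so the
    model becomes able to predict them. The result is renormalized.
    """
    adjusted = dict(lm)
    for ngram in expanded:
        if ngram in adjusted:
            adjusted[ngram] *= boost
        else:
            adjusted[ngram] = new_mass
    total = sum(adjusted.values())
    return {ngram: p / total for ngram, p in adjusted.items()}

def transcribe(acoustic_scores, lm, initial, related, bonus=3.0):
    """Bias the model, score candidates, rescore, and pick the best one."""
    expanded = expand_ngrams(initial, related)
    adjusted = adjust_language_model(lm, expanded)
    # Candidates are limited to phrases the adjusted model can predict.
    candidates = {
        text: score + math.log(adjusted[text])
        for text, score in acoustic_scores.items()
        if text in adjusted
    }
    # Second pass: raise the score of candidates found in the expanded set.
    for text in candidates:
        if text in expanded:
            candidates[text] += bonus
    return max(candidates, key=candidates.get)
```

With a model that only knows "sue she" and "restaurants", context containing "restaurants", and a lookup table relating "restaurants" to "sushi", the sketch selects "sushi"; with no context it falls back to "sue she", since the unadjusted model cannot predict "sushi" at all.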

20 Claims

1. A computer-implemented method comprising:
- receiving audio data corresponding to a user utterance and context data for the user utterance;
  
  identifying, based on the context data, an initial set of one or more n-grams including one or more n-grams that do not represent speech preceding the user utterance;
  
  generating an expanded set of one or more n-grams based at least on the initial set of n-grams, the expanded set of n-grams comprising one or more n-grams that are different from the n-grams in the initial set of n-grams;
  
  based at least on the expanded set of n-grams, adjusting a language model trained to predict a first set of n-grams to be able to predict an additional n-gram in the expanded set of n-grams;
  
  determining one or more speech recognition candidates for at least a portion of the user utterance using the adjusted language model, wherein each speech recognition candidate comprises one or more words;
  
  after determining the one or more speech recognition candidates, adjusting a score for a particular speech recognition candidate based on determining that the particular speech recognition candidate is included in the expanded set of n-grams;
  
  after adjusting the score for the particular speech recognition candidate, determining, a transcription for the user utterance that includes at least one of the one or more speech recognition candidates; and
  
  providing the transcription of the user utterance for output.
- - 2. The computer-implemented method of claim 1, wherein adjusting the language model based at least on the expanded set of n-grams comprises adjusting recognition scores for one or more of the n-grams of the expanded set of n-grams in the language model.
  - 3. The computer-implemented method of claim 1, wherein adjusting the language model based at least on the expanded set of n-grams comprises adjusting probability masses assigned to one or more n-grams of the expanded set of n-grams in the language model.
  - 4. The computer-implemented method of claim 1, wherein the language model includes one or more placeholder transitions, and wherein adding the one or more n-grams of the expanded set of n-grams to the language model comprises:
    - assigning a particular n-gram of the expanded set of n-grams to a particular placeholder transition.
  - 5. The computer-implemented method of claim 4, wherein assigning the particular n-gram of the expanded set of n-grams to the particular placeholder transition comprises:
    - adjusting a recognition score for the particular placeholder transition that is assigned the particular n-gram of the expanded set of n-grams.
  - 6. The computer-implemented method of claim 1, wherein generating the expanded set of n-grams based at least on the initial set of n-grams comprises:
    - sending one or more of the initial set of n-grams to each of one or more language expansion services; and
      
      receiving, from the one or more language expansion services and in response to sending the one or more of the initial set of n-grams to each of the one or more language expansion services, the expanded set of one or more n-grams.
  - 7. The computer-implemented method of claim 1, wherein one or more of the expanded set of n-grams are included in a hash map, and wherein adjusting the score for the particular speech recognition candidate determined to be included in the expanded set of n-grams comprises:
    - determining that the particular speech recognition candidate is included in the hash map.
  - 8. The computer-implemented method of claim 1, wherein the audio data corresponding to the user utterance corresponds to a particular segment of a spoken user input that comprises multiple segments;
    - and wherein the language model is a general language model or a general language model that has been influenced during processing of a segment preceding the particular segment of the spoken user input.
  - 9. The computer-implemented method of claim 1, wherein the context data does not include one or more words included in a previous transcription of a user utterance.
  - 10. The computer-implemented method of claim 1, wherein the language model is adjusted based at least on the expanded set of n-grams prior to receiving the audio data corresponding to the user utterance.
  - 11. The computer-implemented method of claim 1, wherein adjusting the language model is performed in response to receiving the context data for the user utterance, the context data indicating a context at a time the user utterance is spoken.
  - 12. The method of claim 1, wherein receiving the audio data comprises receiving audio data detected by a user device;
    - and wherein the initial set of n-grams comprises one or more words or phrases displayed on a screen of the user device.
  - 13. The method of claim 1, wherein receiving the audio data comprises receiving audio data detected by a user device;
    - and wherein one or more of the initial set of n-grams are provided by an application running on the user device.
  - 14. The method of claim 1, wherein receiving the audio data comprises receiving audio data detected by a user device;
    - and wherein the initial set of n-grams include a predetermined set of terms corresponding to a current dialog state corresponding one of multiple steps for carrying out a task.
  - 15. The method of claim 1, wherein the context data includes an application identifier or a dialog state identifier;
    - wherein the method comprises:
      
      retrieving a set of terms based on the application identifier or dialog state identifier; and
      
      including the retrieved set of terms in the expanded set of one or more terms.
  - 16. The method of claim 1, wherein receiving the context data comprises receiving context data indicating a current context of a user device when the user device generates the audio data for the user utterance.
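
Claims 7, 14, and 15 describe looking up terms by application or dialog state identifier and checking candidates against a hash map. A small Python sketch of that idea follows, under stated assumptions: the table `DIALOG_STATE_TERMS`, its keys and terms, the function names, and the fixed `bonus` value are all hypothetical and chosen only to make the example concrete. A Python `dict` serves as the hash map.

```python
# Hypothetical table of predetermined terms per (application, dialog state).
DIALOG_STATE_TERMS = {
    ("pizza_app", "choose_size"): ["small", "medium", "large"],
    ("pizza_app", "choose_topping"): ["pepperoni", "mushroom"],
}

def build_bias_map(initial_ngrams, app_id, dialog_state, bonus=1.5):
    """Collect biasing terms into a hash map of term -> score bonus."""
    retrieved = DIALOG_STATE_TERMS.get((app_id, dialog_state), [])
    terms = list(initial_ngrams) + retrieved
    return {term.lower(): bonus for term in terms}

def rescore(candidates, bias_map):
    """Add the bonus to candidates present in the map; best candidate first."""
    rescored = [
        (text, score + bias_map.get(text.lower(), 0.0))
        for text, score in candidates
    ]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)
```

For example, when the dialog state is "choose_size", a candidate "large" with score -2.5 overtakes "lodge" at -2.0 after the bonus is applied, because only "large" is found in the map. The membership test is a constant-time dictionary lookup on average, which is one plausible reason to store the expanded set this way.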

17. A system comprising:
- one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:
  
  receiving audio data corresponding to a user utterance and context data for the user utterance;
  
  identifying, based on the context data, an initial set of one or more n-grams including one or more n-grams that do not represent speech preceding the user utterance;
  
  generating an expanded set of one or more n-grams based at least on the initial set of n-grams, the expanded set of n-grams comprising one or more n-grams that are different from the n-grams in the initial set of n-grams;
  
  based at least on the expanded set of n-grams, adjusting a language model trained to predict a first set of n-grams to be able to predict an additional n-gram in the expanded set of n-grams;
  
  determining one or more speech recognition candidates for at least a portion of the user utterance using the adjusted language model, wherein each speech recognition candidate comprises one or more words;
  
  after determining the one or more speech recognition candidates, adjusting a score for a particular speech recognition candidate based on determining that the particular speech recognition candidate is included in the expanded set of n-grams;
  
  after adjusting the score for the particular speech recognition candidate, determining, a transcription for the user utterance that includes at least one of the one or more speech recognition candidates; and
  
  providing the transcription of the user utterance for output.
- - 18. The system of claim 17, wherein adjusting the language model based at least on the expanded set of n-grams comprises adjusting the language model to generate scores for candidate transcriptions using adjusted probability scores different from probability scores determined through training of the language model, the adjusted probability scores indicating increased probabilities for one or more of the n-grams of the expanded set of n-grams compared to the probability scores determined through training of the language model.
  - 19. The system of claim 17, wherein the language model includes one or more placeholder transitions, and wherein adding the one or more n-grams of the expanded set of n-grams to the language model comprises:
    - assigning a particular n-gram of the expanded set of n-grams to a particular placeholder transition for determining the one or more speech recognition candidates for at least a portion of the user utterance; and
      
      wherein the operations comprise:
      
      after determining the one or more speech recognition candidates for at least a portion of the user utterance using the adjusted language model, removing the assignment of the particular n-gram to the particular placeholder transition.
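
The placeholder-transition mechanism of claims 4, 5, and 19 can be pictured with the following Python sketch. It assumes a drastically simplified model in which trained n-grams map to log-probability scores and a fixed number of empty slots stand in for placeholder transitions; the class name `PlaceholderLanguageModel` and its methods are hypothetical and are not taken from the patent.

```python
class PlaceholderLanguageModel:
    """Toy model with trained scores plus reusable placeholder slots."""

    def __init__(self, scores, num_placeholders=2):
        self.scores = dict(scores)                     # trained n-gram -> log-prob
        self.placeholders = [None] * num_placeholders  # None or (ngram, score)

    def assign(self, ngram, score):
        """Bind an n-gram and its recognition score to the first free slot."""
        for index, slot in enumerate(self.placeholders):
            if slot is None:
                self.placeholders[index] = (ngram, score)
                return index
        raise RuntimeError("no free placeholder transition")

    def release(self, index):
        """Remove the assignment so the slot can be reused later."""
        self.placeholders[index] = None

    def score(self, ngram):
        """Score from an assigned placeholder if any, else the trained score."""
        for slot in self.placeholders:
            if slot is not None and slot[0] == ngram:
                return slot[1]
        return self.scores.get(ngram)
```

In this sketch an n-gram absent from training scores as `None` until it is assigned to a slot, scores normally while assigned, and scores as `None` again once the assignment is removed, which mirrors the assign-then-remove sequence recited in claim 19.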

20. A non-transitory computer-readable storage device storing software comprising instructions executable by one or more computers which, upon such execution, cause the one or more computers to perform operations comprising:
- receiving audio data corresponding to a user utterance and context data for the user utterance;
  
  identifying an initial set of one or more n-grams from the context data;
  
  generating an expanded set of one or more n-grams based at least on the initial set of n-grams, the expanded set of n-grams comprising one or more n-grams that are different from the n-grams in the initial set of n-grams;
  
  adjusting a language model based at least on the expanded set of n-grams;
  
  determining one or more speech recognition candidates for at least a portion of the user utterance using the adjusted language model, wherein each speech recognition candidate comprises one or more words;
  
  after determining the one or more speech recognition candidates for at least a portion of the user utterance using the adjusted language model, undoing the adjustment to the language model;
  
  after determining the one or more speech recognition candidates, adjusting a score for a particular speech recognition candidate determined to be included in the expanded set of n-grams;
  
  after adjusting the score for the particular speech recognition candidate, determining, a transcription for the user utterance that includes at least one of the one or more speech recognition candidates; and
  
  providing the transcription of the user utterance for output.
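
Claim 20 recites undoing the language model adjustment once candidates have been determined, before the candidate score is adjusted. One way to sketch that ordering in Python is a context manager that restores the model on exit. The function name `temporarily_biased`, the log-score table, and the `boost` and `new_score` constants are assumptions for illustration only.

```python
from contextlib import contextmanager

@contextmanager
def temporarily_biased(lm_scores, expanded, boost=1.0, new_score=-2.0):
    """Bias a toy n-gram -> log-score table in place, then undo on exit."""
    original = dict(lm_scores)
    try:
        for ngram in expanded:
            if ngram in lm_scores:
                lm_scores[ngram] += boost
            else:
                lm_scores[ngram] = new_score
        yield lm_scores
    finally:
        # Undo the adjustment so the next utterance starts from the base model.
        lm_scores.clear()
        lm_scores.update(original)

lm = {"sue she": -1.0}
expanded = {"sushi"}

# Candidates are determined while the adjustment is active.
with temporarily_biased(lm, expanded) as biased:
    candidates = {t: biased[t] for t in ("sue she", "sushi") if t in biased}

# The model is already restored here; rescoring uses only the expanded set.
for text in candidates:
    if text in expanded:
        candidates[text] += 1.5
best = max(candidates, key=candidates.get)
```

Restoring the table in a `finally` clause means the base model is recovered even if candidate generation raises an exception, so a failed request would not leave stale biasing behind in this sketch.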

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Google LLC (Alphabet Inc.)
Inventors: Aleksic, Petar; Moreno Mengibar, Pedro J.
Primary Examiner(s): Chawan, Vijay B
Application Number: US15/432,620
Publication Number: US 20180233131A1
Time in Patent Office: 840 Days
Field of Search: 704/231, 704/246, 704/256.4
CPC Class Codes:
G10L 15/01   Assessment or evaluation of...
G10L 15/07   to the speaker
G10L 15/1815   Semantic context, e.g. disa...
G10L 15/187   Phonemic context, e.g. pron...
G10L 15/197   Probabilistic grammars, e.g...
G10L 15/30   Distributed recognition, e....
