Adjusting language models

US 8,352,245 B1
Filed: 03/31/2011
Issued: 01/08/2013
Est. Priority Date: 12/30/2010
Status: Active Grant

First Claim

1. A computer-implemented method, comprising:

accessing audio data;

accessing information that indicates a first context, the first context comprising a first physical environment or physical state of a device that records the audio data;

accessing at least one term;

accessing information that indicates a second context, the second context comprising a second physical environment or physical state associated with the accessed term;

determining a similarity score that indicates a degree of similarity between the first physical environment or physical state and the second physical environment or physical state;

adjusting a language model based on the accessed term and the determined similarity score to generate an adjusted language model, wherein the adjusted language model includes the accessed term and a weighting value assigned to the accessed term based on the similarity score; and

performing speech recognition on the audio data using the adjusted language model to select one or more candidate transcriptions for a portion of the audio data.
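
The steps of claim 1 can be sketched end to end in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation. The `Context` fields, the exponential similarity function, the linear boost controlled by `max_boost`, and the unigram language model are all hypothetical choices made for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class Context:
    """Hypothetical physical context of a recording device."""
    latitude: float   # degrees
    longitude: float  # degrees
    speed_mps: float  # level of motion, metres per second


def context_similarity(a: Context, b: Context) -> float:
    """Similarity in (0, 1]; 1.0 when the two contexts are identical.

    Assumption: the score decays exponentially with a crude combined distance
    (planar degrees plus a scaled speed difference); the scales are arbitrary.
    """
    geo = math.hypot(a.latitude - b.latitude, a.longitude - b.longitude)
    motion = abs(a.speed_mps - b.speed_mps) / 10.0
    return math.exp(-(geo + motion))


def adjust_language_model(base_lm: dict, term: str, similarity: float,
                          max_boost: float = 4.0) -> dict:
    """Copy a unigram LM, up-weight `term` in proportion to `similarity`, renormalize."""
    adjusted = dict(base_lm)
    initial = adjusted.get(term, 1e-6)  # small floor so an unseen term is still added
    adjusted[term] = initial * (1.0 + max_boost * similarity)
    total = sum(adjusted.values())
    return {word: weight / total for word, weight in adjusted.items()}


def select_transcription(candidates: list, lm: dict) -> list:
    """Return the word list of the candidate with the best acoustic + LM log score."""
    def score(candidate):
        words, acoustic_logprob = candidate
        return acoustic_logprob + sum(math.log(lm.get(w, 1e-9)) for w in words)
    return max(candidates, key=score)[0]


# Example: two candidates that the acoustic model scores identically.
base = {"directions": 0.3, "to": 0.5, "boston": 0.12, "austin": 0.08}
candidates = [(["directions", "to", "boston"], -5.0),
              (["directions", "to", "austin"], -5.0)]
device = Context(30.27, -97.74, 0.0)        # where the audio is recorded
term_context = Context(30.27, -97.74, 0.0)  # context associated with "austin"
adjusted = adjust_language_model(base, "austin", context_similarity(device, term_context))
print(select_transcription(candidates, base))      # ['directions', 'to', 'boston']
print(select_transcription(candidates, adjusted))  # ['directions', 'to', 'austin']
```

When the term's context matches the device's context, the boost tips an acoustic tie toward "austin"; with a distant term context the similarity is close to zero, the boost is negligible, and the base model's preference for "boston" is kept.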

2 Assignments

0 Petitions

Abstract

Methods, systems, and apparatuses, including computer programs encoded on a computer storage medium, for adjusting language models. In one aspect, a method includes accessing audio data. Information that indicates a first context is accessed, the first context being associated with the audio data. At least one term is accessed. Information that indicates a second context is accessed, the second context being associated with the term. A similarity score is determined that indicates a degree of similarity between the second context and the first context. A language model is adjusted based on the accessed term and the determined similarity score to generate an adjusted language model. Speech recognition is performed on the audio data using the adjusted language model to select one or more candidate transcriptions for a portion of the audio data.

29 Claims

1. A computer-implemented method, comprising:
- accessing audio data;
  
  accessing information that indicates a first context, the first context comprising a first physical environment or physical state of a device that records the audio data;
  
  accessing at least one term;
  
  accessing information that indicates a second context, the second context comprising a second physical environment or physical state associated with the accessed term;
  
  determining a similarity score that indicates a degree of similarity between the first physical environment or physical state and the second physical environment or physical state;
  
  adjusting a language model based on the accessed term and the determined similarity score to generate an adjusted language model, wherein the adjusted language model includes the accessed term and a weighting value assigned to the accessed term based on the similarity score; and
  
  performing speech recognition on the audio data using the adjusted language model to select one or more candidate transcriptions for a portion of the audio data.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 24, 25, 26, 27):
  - 2. The computer-implemented method of claim 1, wherein the adjusted language model indicates a probability of an occurrence of a term in a sequence of terms based on other terms in the sequence.
  - 3. The computer-implemented method of claim 1, wherein adjusting a language model comprises accessing a stored language model and adjusting the stored language model based on the similarity score.
  - 4. The computer-implemented method of claim 3, wherein adjusting the accessed language model comprises adjusting the accessed language model to increase a probability in the language model that the accessed term will be selected as a candidate transcription for the audio data.
  - 5. The computer-implemented method of claim 4, wherein adjusting the accessed language model to increase a probability in the language model includes changing an initial weighting value assigned to the term based on the similarity score.
  - 6. The computer-implemented method of claim 1, comprising determining that the accessed term was entered by a user, wherein:
    - the audio data encodes speech of the user;
      
      the first context includes the environment in which the speech occurred; and
      
      the second context includes the environment in which the accessed term was previously entered by the user.
  - 7. The computer-implemented method of claim 1, wherein the information that indicates the first context and the information that indicates the second context each indicate a geographic location.
  - 8. The computer-implemented method of claim 1, wherein the information that indicates a first context and information that indicates the second context each indicate a document type or application type.
  - 9. The method of claim 1, wherein:
    - the information indicating the first context includes a first identifier for a recipient of a first message that is currently being dictated;
      
      the information indicating the second context includes a second identifier of a recipient of a second message that was previously sent; and
      
      determining the similarity score comprises determining the similarity score based in part on a degree of similarity of the first identifier and the second identifier.
  - 10. The computer-implemented method of claim 1, comprising identifying at least one second term related to the accessed term, wherein the language model includes the second term, and wherein adjusting the language model includes assigning a weighting value to the second term based on the similarity score.
  - 11. The computer-implemented method of claim 1, wherein the accessed term was recognized from a speech sequence and wherein the audio data is a continuation of the speech sequence.
  - 24. The computer-implemented method of claim 1, wherein:
    - accessing information that indicates the first context comprises accessing information that indicates a first level of motion of the device that records the audio data when the audio data is recorded;
      
      accessing information that indicates the second context comprises accessing information that indicates a second level of motion associated with the accessed term; and
      
      determining the similarity score comprises determining a similarity score that indicates a degree of similarity between the first level of motion and the second level of motion.
  - 25. The computer-implemented method of claim 1, wherein:
    - accessing information that indicates the first context comprises accessing information that indicates a first orientation of the device that records the audio data when the audio data is recorded;
      
      accessing information that indicates the second context comprises accessing information that indicates a second orientation associated with the accessed term; and
      
      determining the similarity score comprises determining a similarity score that indicates a degree of similarity between the first orientation and the second orientation.
  - 26. The computer-implemented method of claim 1, wherein:
    - accessing information that indicates the first context comprises accessing information that indicates a first time when the audio data is recorded;
      
      accessing information that indicates the second context comprises accessing information that indicates a second time associated with the accessed term; and
      
      determining the similarity score comprises determining a similarity score that indicates a degree of similarity between the first time and the second time.
  - 27. The computer-implemented method of claim 1, wherein:
    - accessing information that indicates the first context comprises accessing information that indicates a first docking station type for a docking station to which the device that records the audio data is connected when the audio data is recorded;
      
      accessing information that indicates the second context comprises accessing information that indicates a second docking station type associated with the accessed term; and
      
      determining the similarity score comprises determining a similarity score that indicates a degree of similarity between the first docking station type and the second docking station type.

12. A system comprising:
- one or more computers; and
  
  a computer-readable medium coupled to the one or more computers having instructions stored thereon which, when executed by the one or more computers, cause the one or more computers to perform operations comprising:
  
  accessing audio data;
  
  accessing information that indicates a first context, the first context comprising a first physical environment or physical state of a device that records the audio data;
  
  accessing at least one term;
  
  accessing information that indicates a second context, the second context comprising a second physical environment or physical state associated with the accessed term;
  
  determining a similarity score that indicates a degree of similarity between the first physical environment or physical state and the second physical environment or physical state;
  
  adjusting a language model based on the accessed term and the determined similarity score to generate an adjusted language model, wherein the adjusted language model includes the accessed term and a weighting value assigned to the accessed term based on the similarity score; and
  
  performing speech recognition on the audio data using the adjusted language model to select one or more candidate transcriptions for a portion of the audio data.
- Dependent claims (13, 14, 15, 16, 28, 29):
  - 13. The system of claim 12, wherein adjusting a language model comprises accessing a stored language model and adjusting the stored language model based on the similarity score.
  - 14. The system of claim 13, wherein adjusting the accessed language model comprises adjusting the accessed language model to increase a probability in the language model that the accessed term will be selected as a candidate transcription for the audio data.
  - 15. The system of claim 14, wherein adjusting the accessed language model to increase a probability in the language model includes changing an initial weighting value assigned to the term based on the similarity score.
  - 16. The system of claim 12, wherein the operations comprise determining that the accessed term was entered by a user, wherein:
    - the audio data encodes speech of the user;
      
      the first context includes the environment in which the speech occurred; and
      
      the second context includes the environment in which the accessed term was previously entered by the user.
  - 28. The system of claim 12, wherein:
    - accessing information that indicates the first context comprises accessing information that indicates a first geographic location of the device that records the audio data when the audio data is recorded;
      
      accessing information that indicates the second context comprises accessing information that indicates a second geographic location associated with the accessed term; and
      
      determining the similarity score comprises determining a similarity score that indicates a degree of similarity between the first geographic location and the second geographic location.
  - 29. The system of claim 12, wherein:
    - accessing information that indicates the first context comprises accessing information that indicates a first location type of a location of the device that records the audio data when the audio data is recorded;
      
      accessing information that indicates the second context comprises accessing information that indicates a second location type associated with the accessed term; and
      
      determining the similarity score comprises determining a similarity score that indicates a degree of similarity between the first location type and the second location type.

17. A non-transitory computer storage medium encoded with a computer program, the program comprising instructions that when executed by one or more computers cause the one or more computers to perform operations comprising:
- accessing audio data;
  
  accessing information that indicates a first context, the first context comprising a first physical environment or physical state of a device that records the audio data;
  
  accessing at least one term;
  
  accessing information that indicates a second context, the second context comprising a second physical environment or physical state associated with the accessed term;
  
  determining a similarity score that indicates a degree of similarity between the first physical environment or physical state and the second physical environment or physical state;
  
  adjusting a language model based on the accessed term and the determined similarity score to generate an adjusted language model, wherein the adjusted language model includes the accessed term and a weighting value assigned to the accessed term based on the similarity score; and
  
  performing speech recognition on the audio data using the adjusted language model to select one or more candidate transcriptions for a portion of the audio data.
- Dependent claims (18, 19, 20):
  - 18. The non-transitory computer storage medium of claim 17, wherein adjusting a language model comprises accessing a stored language model and adjusting the stored language model based on the similarity score.
  - 19. The non-transitory computer storage medium of claim 18, wherein adjusting the accessed language model comprises adjusting the accessed language model to increase a probability in the language model that the accessed term will be selected as a candidate transcription for the audio data.
  - 20. The non-transitory computer storage medium of claim 17, wherein the operations comprise determining that the accessed term was entered by a user, wherein:
    - the audio data encodes speech of the user;
      
      the first context includes the environment in which the speech occurred; and
      
      the second context includes the environment in which the accessed term was previously entered by the user.
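
Claim 20 (like claims 6 and 16) covers terms the user entered earlier, each paired with the context in which it was entered. When a term appears several times in the history, one plausible choice, assumed here rather than taken from the claims, is to keep its best match to the current context. `HistoryEntry`, its `location_type` and `hour` fields, and the equal weighting inside `entry_similarity` are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class HistoryEntry:
    """A term the user previously entered, with the context recorded at that time."""
    term: str
    location_type: str  # hypothetical label such as "home", "office", "car"
    hour: int           # hour of day when the term was entered, 0-23


def entry_similarity(entry, location_type, hour):
    """Hypothetical score in [0, 1]: half for matching location type, half for time closeness."""
    location_score = 1.0 if entry.location_type == location_type else 0.0
    diff = abs(entry.hour - hour) % 24
    diff = min(diff, 24 - diff)  # circular distance, so 23:00 and 01:00 are 2 hours apart
    return 0.5 * location_score + 0.5 * (1.0 - diff / 12.0)


def term_similarities(history, location_type, hour):
    """Map each previously entered term to its best similarity with the current context."""
    best = defaultdict(float)
    for entry in history:
        score = entry_similarity(entry, location_type, hour)
        best[entry.term] = max(best[entry.term], score)
    return dict(best)


history = [
    HistoryEntry("gym", "home", 7),
    HistoryEntry("standup", "office", 9),
    HistoryEntry("gym", "office", 18),
]
print(term_similarities(history, location_type="office", hour=9))
# {'gym': 0.625, 'standup': 1.0}
```

Each resulting score can then drive the per-term weighting value that the adjusted language model assigns to that term.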

21. A computer-implemented method comprising:
- transmitting, at a client device, audio data to a server system;
  
  identifying a first context of the client device, the first context comprising a first physical environment or physical state of the client device;
  
  transmitting information indicating the first context to the server system; and
  
  receiving, at the client device, a transcription of at least a portion of the audio data at the client device, the server system having:
  - accessed at least one term,
  - accessed information that indicates a second context, the second context comprising a second physical environment or physical state associated with the accessed term,
  - determined a similarity score that indicates a degree of similarity between the first physical environment or physical state and the second physical environment or physical state,
  - adjusted a language model based on the accessed term and the determined similarity score to generate an adjusted language model, wherein the adjusted language model includes the accessed term and a weighting value assigned to the accessed term based on the similarity score,
  - performed speech recognition on the audio data using the adjusted language model to select one or more candidate transcriptions for a portion of the audio data, and
  - transmitted the transcription to the client device.
- Dependent claims (22, 23):
  - 22. The computer-implemented method of claim 21, wherein having adjusted a language model comprises having accessed a stored language model and having adjusted the stored language model based on the similarity score.
  - 23. The computer-implemented method of claim 21, wherein the server system has further determined that the accessed term was entered by a user, and wherein:
    - the audio data encodes speech of the user;
      
      the first context includes the environment in which the speech occurred; and
      
      the second context includes the environment in which the accessed term was previously entered by the user.
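
Claim 21 splits the work between a client that sends audio together with its own context and a server that performs the scoring, adjustment, and recognition. The sketch below shows only the message exchange, using JSON with base64-encoded audio. The field names and the `recognize` callback, which stands in for the server-side steps, are assumptions for illustration.

```python
import base64
import json


def build_request(audio_bytes, context):
    """Client side: package the audio and the device's own context in one JSON message.

    The field names are hypothetical; claim 21 only requires that both reach the server.
    """
    return json.dumps({
        "audio": base64.b64encode(audio_bytes).decode("ascii"),
        "context": context,
    })


def handle_request(message, recognize):
    """Server side: unpack the message, run recognition, and return the transcription.

    `recognize(audio_bytes, context)` is a caller-supplied stand-in for the
    similarity scoring, language model adjustment, and decoding steps.
    """
    payload = json.loads(message)
    audio = base64.b64decode(payload["audio"])
    transcription = recognize(audio, payload["context"])
    return json.dumps({"transcription": transcription})


def fake_recognizer(audio, context):
    # Placeholder only: a real recognizer would decode `audio` with an adjusted model.
    return "navigate home" if context.get("dock") == "car" else ""


request = build_request(b"\x00\x01raw-pcm", {"dock": "car", "speed_mps": 20.0})
response = json.loads(handle_request(request, fake_recognizer))
print(response["transcription"])  # navigate home
```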

Current Assignee
Google LLC (Alphabet Inc.)
Original Assignee
Google Inc. (Alphabet Inc.)
Inventors
Lloyd, Matthew I.
Primary Examiner(s)
Opsasnick, Michael N

Application Number
US13/077,106
Time in Patent Office
649 Days
Field of Search
704/9
US Class Current
704/9
CPC Class Codes

G10L 15/18   using natural language mode...

G10L 15/183   using context dependencies,...

G10L 15/22   Procedures used during a sp...

G10L 15/26   Speech to text systems G10L...

G10L 2015/223   Execution procedure of a sp...

G10L 25/12   the extracted parameters be...

G10L 25/51   for comparison or discrimin...
