Speech recognition using dock context

US 8,296,142 B2
Filed: 03/04/2011
Issued: 10/23/2012
Est. Priority Date: 01/21/2011
Status: Active Grant

Abstract

Methods, systems, and apparatuses, including computer programs encoded on a computer storage medium, for performing speech recognition using dock context. In one aspect, a method includes accessing audio data that includes encoded speech. Information that indicates a docking context of a client device is accessed, the docking context being associated with the audio data. A plurality of language models is identified. At least one of the plurality of language models is selected based on the docking context. Speech recognition is performed on the audio data using the selected language model to identify a transcription for a portion of the audio data.
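
As an illustration of what "performing speech recognition using the selected language model" can mean in practice, the sketch below treats a language model as a toy bigram model that scores candidate transcriptions. Everything in it is an assumption made for illustration: the `BigramModel` class, the add-one smoothing, the training sentences, and the candidate strings do not come from the patent, which does not prescribe a particular model type.

```python
import math
from collections import Counter


class BigramModel:
    """Toy bigram language model with add-one smoothing (hypothetical, illustrative only)."""

    def __init__(self, sentences):
        self.unigrams = Counter()  # counts of each term used as a left context
        self.bigrams = Counter()   # counts of (previous term, current term) pairs
        self.vocab = set()
        for sentence in sentences:
            tokens = ["<s>"] + sentence.lower().split()
            self.vocab.update(tokens)
            for prev, cur in zip(tokens, tokens[1:]):
                self.unigrams[prev] += 1
                self.bigrams[(prev, cur)] += 1

    def log_prob(self, sentence):
        """Log-probability of a sentence: each term is conditioned on the term before it."""
        tokens = ["<s>"] + sentence.lower().split()
        vocab_size = len(self.vocab) + 1  # +1 reserves probability mass for unseen terms
        total = 0.0
        for prev, cur in zip(tokens, tokens[1:]):
            numerator = self.bigrams[(prev, cur)] + 1
            denominator = self.unigrams[prev] + vocab_size
            total += math.log(numerator / denominator)
        return total


# Two hypothetical topical models, each trained on a tiny made-up corpus.
navigation = BigramModel(["navigate to main street", "directions to elm street"])
media = BigramModel(["play the next song", "play main theme"])

# Candidate transcriptions that an acoustic front end might propose (assumed).
candidates = ["navigate to main street", "navigate two main street"]
best = max(candidates, key=navigation.log_prob)
print(best)
```

With the navigation model selected, the first candidate scores higher because the pair "navigate to" was seen in training while "navigate two" was not. A production recognizer would combine such scores with acoustic scores; that step is omitted here.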

169 Citations

29 Claims

1. A computer-implemented method, comprising:
- receiving, at a server system, audio data that includes encoded speech, the encoded speech having been detected by a client device;
  
  receiving, at the server system, information that indicates a docking context of the client device while the speech encoded in the audio data was detected by the client device;
  
  identifying a plurality of language models, each of the plurality of language models indicating a probability of an occurrence of a term in a sequence of terms based on other terms in the sequence;
  
  for each of the plurality of language models, determining a weighting value to assign to the language model based on the docking context by accessing a stored weighting value associated with the docking context, the weighting value indicating a probability that using the language model will generate a correct transcription of the encoded speech;
  
  selecting at least one of the plurality of language models based on the assigned weighting values; and
  
  performing speech recognition on the audio data using the selected language model to identify a transcription for a portion of the audio data.
- Dependent claims (2, 3, 4, 5):
  - 2. The computer-implemented method of claim 1, wherein the docking context indicates a type of docking station to which the client device was connected while the speech encoded in the audio data was detected by the client device.
  - 3. The computer-implemented method of claim 1, wherein the encoded speech includes one or more query terms, and wherein the transcription includes the query terms, and wherein the method further comprises:
    - generating a search query that includes the query terms;
      
      performing a search using the search query; and
      
      providing information indicating the results of the search to the client device.
  - 4. The computer-implemented method of claim 1, wherein each of the plurality of language models is trained for a particular topical category of words.
  - 5. The computer-implemented method of claim 1, wherein determining a weighting value based on the docking context comprises:
    - determining that the client device is connected to a vehicle docking station; and
      
      in response to determining that the client device is connected to a vehicle docking station, determining, for a navigation language model trained to output addresses, a weighting value that increases the probability that the navigation language model is selected relative to the other language models in the plurality of language models.
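
The weighting and selection steps of claims 1 and 5 can be sketched as a table lookup followed by a ranking. This is a minimal sketch under stated assumptions: the context names, model names, and numeric weights in `STORED_WEIGHTS` are hypothetical, as are the helper functions `assign_weights` and `select_models`; the claims require only that a stored weighting value associated with the docking context is accessed for each model.

```python
# Hypothetical stored weighting values, keyed by docking context.
# Each value approximates the probability that the named language model
# yields a correct transcription in that context (numbers are made up).
STORED_WEIGHTS = {
    "vehicle_dock": {"navigation": 0.6, "media": 0.3, "general": 0.1},
    "media_dock": {"navigation": 0.1, "media": 0.7, "general": 0.2},
    "undocked": {"navigation": 0.2, "media": 0.2, "general": 0.6},
}


def assign_weights(docking_context):
    """Return a copy of the stored weights for the context; unknown contexts fall back to 'undocked'."""
    return dict(STORED_WEIGHTS.get(docking_context, STORED_WEIGHTS["undocked"]))


def select_models(weights, top_k=1):
    """Return the names of the top_k language models, highest weight first."""
    ranked = sorted(weights, key=weights.get, reverse=True)
    return ranked[:top_k]


weights = assign_weights("vehicle_dock")
print(select_models(weights))
```

In this sketch the vehicle dock entry gives the navigation model the largest weight, which mirrors claim 5: connecting to a vehicle docking station increases the chance that the navigation model is selected relative to the others.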

6. A computer-implemented method, comprising:
- accessing audio data that includes encoded speech;
  
  accessing information that indicates a docking context of a client device, the docking context being associated with the audio data;
  
  identifying a plurality of language models;
  
  determining, for each of the plurality of language models, a weighting value based on the docking context, the weighting value indicating a probability that the language model will indicate a correct transcription for the encoded speech;
  
  selecting at least one of the plurality of language models based on the weighting values; and
  
  performing speech recognition on the audio data using the selected at least one language model to identify a transcription for a portion of the audio data.
- Dependent claims (7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18):
  - 7. The computer-implemented method of claim 6, wherein the information that indicates a docking context of the client device indicates a connection between the client device and a second device with which the client device is physically connected.
  - 8. The computer-implemented method of claim 6, wherein the information that indicates a docking context of the client device indicates a connection between the client device and a second device with which the client device is wirelessly connected.
  - 9. The computer-implemented method of claim 6, wherein the speech encoded in the audio data was detected by the client device, and wherein the information that indicates a docking context indicates whether the client device was connected to a docking station while the speech encoded in the audio data was detected by the client device.
  - 10. The computer-implemented method of claim 6, wherein the speech encoded in the audio data was detected by the client device, and wherein the information that indicates a docking context indicates a type of docking station to which the client device was connected while the speech encoded in the audio data was detected by the client device.
  - 11. The computer-implemented method of claim 6, wherein the encoded speech includes one or more spoken query terms, and wherein the transcription includes a transcription of the spoken query terms, and wherein the method further comprises:
    - causing a search engine to perform a search using the transcription of the one or more spoken query terms; and
      
      providing information indicating the results of the search query to the client device.
  - 12. The computer-implemented method of claim 6, wherein determining weighting values for each of the plurality of language models comprises accessing stored weighting values associated with the docking context.
  - 13. The computer-implemented method of claim 6, wherein determining weighting values for each of the plurality of language models comprises accessing stored weighting values and altering the stored weighting values based on the docking context.
  - 14. The computer-implemented method of claim 6, wherein each of the plurality of language models is trained for a particular topical category of words.
  - 15. The computer-implemented method of claim 6, wherein determining a weighting value based on the docking context comprises:
    - determining that the docking context includes a connection to a vehicle docking station; and
      
      in response to determining that the docking context includes a connection to a vehicle docking station, determining, for a navigation language model trained to output addresses, a weighting value that increases the probability that the navigation language model is selected relative to the other language models in the plurality of language models.
  - 16. The computer-implemented method of claim 6, wherein:
    - the docking context indicates docking of the client device with a first docking station; and
      
      determining, for each of the plurality of language models, the weighting value based on the docking context comprises:
      
      determining that the first docking station has a particular docking station type from a predetermined set of docking station types; and
      
      determining weighting values that correspond to the particular docking station type.
  - 17. The computer-implemented method of claim 6, wherein determining, for each of the plurality of language models, the weighting value based on the docking context comprises:
    - determining the weighting value for each language model before using the language model to identify a transcription for the audio data.
  - 18. The computer-implemented method of claim 6, wherein performing speech recognition on the audio data using the selected at least one language model occurs in response to selecting the at least one language model.
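
Claims 13 and 16 describe two refinements: altering stored weighting values based on the docking context, and first resolving the dock to one of a predetermined set of docking station types. The sketch below shows one way these could fit together. It is an assumption-laden illustration: the type set `DOCK_TYPES`, the identifier format `"type:serial"`, the multiplicative `ADJUSTMENTS`, and the renormalization step are all hypothetical choices, not details taken from the patent.

```python
# Hypothetical predetermined set of docking station types (claim 16).
DOCK_TYPES = {"vehicle", "desktop", "media"}

# Hypothetical context-independent stored weights (claim 13 starts from these).
BASE_WEIGHTS = {"navigation": 0.25, "media": 0.25, "web_search": 0.5}

# Hypothetical per-type multipliers used to alter the stored weights.
ADJUSTMENTS = {
    "vehicle": {"navigation": 4.0},
    "media": {"media": 4.0},
    "desktop": {},
}


def classify_dock(dock_id):
    """Map an identifier such as 'vehicle:abc123' to a known dock type, or None."""
    prefix = dock_id.split(":", 1)[0]
    return prefix if prefix in DOCK_TYPES else None


def adjusted_weights(dock_id):
    """Scale the stored weights by the dock type's multipliers, then renormalize to sum to 1."""
    dock_type = classify_dock(dock_id) if dock_id else None
    factors = ADJUSTMENTS.get(dock_type, {})
    raw = {name: w * factors.get(name, 1.0) for name, w in BASE_WEIGHTS.items()}
    total = sum(raw.values())
    return {name: w / total for name, w in raw.items()}


weights = adjusted_weights("vehicle:abc123")
print(max(weights, key=weights.get))
```

Because the weights are computed from the docking context alone, they are available before any language model is run on the audio, which is the ordering claim 17 describes.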

19. A system comprising:
- one or more processors; and
  
  a computer-readable medium coupled to the one or more processors having instructions stored thereon which, when executed by the one or more processors, cause the system to perform operations comprising:
  
  accessing audio data that includes encoded speech;
  
  accessing information that indicates a docking context of a client device, the docking context being associated with the audio data;
  
  identifying a plurality of language models;
  
  determining, for each of the plurality of language models, a weighting value based on the docking context, the weighting value indicating a probability that the language model will indicate a correct transcription for the encoded speech;
  
  selecting at least one of the plurality of language models based on the weighting values; and
  
  performing speech recognition on the audio data using the selected at least one language model to identify a transcription for a portion of the audio data.
- Dependent claims (20, 21):
  - 20. The system of claim 19, wherein the speech encoded in the audio data was detected by the client device, and wherein the information that indicates a docking context indicates whether the client device was connected to a docking station while the speech encoded in the audio data was detected by the client device.
  - 21. The system of claim 19, wherein the speech encoded in the audio data was detected by the client device, and wherein the information that indicates a docking context indicates a type of docking station to which the client device was connected while the speech encoded in the audio data was detected by the client device.

22. A non-transitory computer storage medium encoded with a computer program, the program comprising instructions that when executed by one or more computers cause the one or more computers to perform operations comprising:
- accessing audio data that includes encoded speech;
  
  accessing information that indicates a docking context of a client device, the docking context being associated with the audio data;
  
  identifying a plurality of language models;
  
  determining, for each of the plurality of language models, a weighting value based on the docking context, the weighting value indicating a probability that the language model will indicate a correct transcription for the encoded speech;
  
  selecting at least one of the plurality of language models based on the weighting values; and
  
  performing speech recognition on the audio data using the selected at least one language model to identify a transcription for a portion of the audio data.
- Dependent claims (23, 24, 25):
  - 23. The non-transitory computer storage medium of claim 22, wherein the speech encoded in the audio data was detected by the client device, and wherein the information that indicates a docking context indicates whether the client device was connected to a docking station while the speech encoded in the audio data was detected by the client device.
  - 24. The non-transitory computer storage medium of claim 22, wherein the speech encoded in the audio data was detected by the client device, and wherein the information that indicates a docking context indicates a type of docking station to which the client device was connected while the speech encoded in the audio data was detected by the client device.
  - 25. The non-transitory computer storage medium of claim 22, wherein the encoded speech includes one or more spoken query terms, and wherein the transcription includes a transcription of the spoken query terms, and wherein the operations further comprise:
    - causing a search engine to perform a search using the transcription of the one or more spoken query terms; and
      
      providing information indicating the results of the search query to the client device.

26. A computer-implemented method comprising:
- detecting audio containing speech at a client device;
  
  encoding the detected audio as audio data;
  
  transmitting the audio data to a server system;
  
  identifying a docking context of the client device;
  
  transmitting information indicating the docking context to the server system; and
  
  receiving a transcription of at least a portion of the audio data at the client device, the server system having
  
  determined, for each of a plurality of language models, a weighting value based on the docking context, the weighting value indicating a probability that the language model will indicate a correct transcription for the encoded speech,
  
  selected at least one of the plurality of language models based on the weighting values, and
  
  generated the transcription by performing speech recognition on the audio data using the selected at least one language model, and
  
  transmitted the transcription to the client device.
- Dependent claims (27, 28, 29):
  - 27. The computer-implemented method of claim 26, wherein the identified docking context is the docking context of the client device at the time the audio is detected.
  - 28. The computer-implemented method of claim 26, wherein the information indicating a docking context of the client device indicates a connection between the client device and a second device with which the client device is physically connected.
  - 29. The computer-implemented method of claim 26, wherein the information indicating a docking context of the client device indicates a connection between the client device and a second device with which the client device is wirelessly connected.
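
On the client side, claims 26 to 29 amount to sending two things to the server: the encoded audio and information indicating the docking context, including whether the connection to the second device is physical or wireless. The sketch below shows one possible request payload. The `DockingContext` fields, the JSON layout, and the base64 encoding of the audio are hypothetical; the patent does not specify a wire format.

```python
import base64
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class DockingContext:
    """Hypothetical description of the dock state at the time the audio was detected."""
    docked: bool
    dock_type: Optional[str] = None    # e.g. "vehicle"
    connection: Optional[str] = None   # "physical" or "wireless"


def build_request(audio_bytes: bytes, context: DockingContext) -> str:
    """Serialize encoded audio and docking context into one JSON request body."""
    return json.dumps({
        "audio": base64.b64encode(audio_bytes).decode("ascii"),
        "docking_context": asdict(context),
    })


# Placeholder bytes stand in for real encoded speech.
request_body = build_request(b"\x00\x01\x02", DockingContext(True, "vehicle", "wireless"))
print(request_body)
```

Capturing the context in the same request as the audio keeps the two associated, so the server can weight language models using the dock state that applied when the speech was detected (claim 27).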

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Google Inc. (Alphabet Inc.)
Inventors: Lloyd, Matthew I.; Risbood, Pankaj
Primary Examiner(s): Neway, Samuel G
Application Number: US13/040,553
Publication Number: US 20120191448A1
Time in Patent Office: 599 Days
Field of Search: 704/231-257
US Class Current: 704/257
CPC Class Codes:

G10L 15/183   using context dependencies,...

G10L 15/22   Procedures used during a sp...

H04M 1/04   Supports for telephone tran...

H04M 1/6075   adapted for handsfree use i...

H04M 2250/74   with voice recognition mean...
