SPEECH RECOGNITION USING DOCK CONTEXT
Abstract
Methods, systems, and apparatuses, including computer programs encoded on a computer storage medium, for performing speech recognition using dock context. In one aspect, a method includes accessing audio data that includes encoded speech. Information that indicates a docking context of a client device is accessed, the docking context being associated with the audio data. A plurality of language models is identified. At least one of the plurality of language models is selected based on the docking context. Speech recognition is performed on the audio data using the selected language model to identify a transcription for a portion of the audio data.
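The selection step the abstract describes — pick one of a plurality of language models based on the docking context, then decode with it — can be sketched as a simple mapping from dock type to model. The dock types, model names, and function names below are illustrative assumptions, not taken from the patent:

```python
# Hypothetical sketch of dock-context-based language model selection.
# Dock types and model names are assumptions for illustration only.

CAR_DOCK = "car"
DESK_DOCK = "desk"
MEDIA_DOCK = "media"

# A plurality of language models, keyed by the docking context they suit.
LANGUAGE_MODELS = {
    CAR_DOCK: "navigation_lm",   # driving: destinations, directions
    DESK_DOCK: "dictation_lm",   # office use: email and document dictation
    MEDIA_DOCK: "media_lm",      # living room: song and movie titles
}
DEFAULT_LM = "general_lm"

def select_language_model(docking_context):
    """Select at least one language model based on the docking context."""
    return LANGUAGE_MODELS.get(docking_context, DEFAULT_LM)

def transcribe(audio_data, docking_context):
    """Perform speech recognition on the audio using the selected model."""
    lm = select_language_model(docking_context)
    # A real system would run a speech decoder here; this sketch only
    # records which model the docking context selected.
    return {"language_model": lm, "transcript": None}
```

Under these assumptions, the same audio submitted from a car dock would be decoded with a navigation-biased model, while from a desk dock it would be decoded with a dictation model.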
31 Claims
1-5. (canceled)
6. A computer-implemented method, comprising:

accessing first audio data that includes encoded speech;
accessing information that indicates a first docking context of a client device, the first docking context being associated with the first audio data;
identifying a plurality of language models;
determining that the first docking context indicates docking of the client device with a first docking station of a first type;
selecting at least a first language model of the plurality of language models based on determining that the first docking context indicates docking of the client device with the first docking station of the first type;
performing speech recognition on the first audio data using the selected first language model to identify a transcription for a portion of the first audio data;
accessing second audio data that includes encoded speech;
accessing information that indicates a second docking context of the client device, the second docking context being associated with the second audio data;
determining that the second docking context indicates docking of the client device with a second docking station of a second type, the second type being different from the first type;
selecting at least a second language model of the plurality of language models based on determining that the second docking context indicates docking of the client device with the second docking station of the second type, the second language model being different from the first language model; and
performing speech recognition on the second audio data using the second language model to identify a transcription for a portion of the second audio data,
wherein docking stations of the first type provide capabilities for one or more first manners of using the client device, and wherein docking stations of the second type provide capabilities for one or more second manners of using the client device that are different from the one or more first manners of using the client device.

Dependent claims: 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 26, 27, 28, 29, 30
17. A system comprising:

one or more processors; and
a computer-readable medium coupled to the one or more processors having instructions stored thereon which, when executed by the one or more processors, cause the system to perform operations comprising:
accessing first audio data that includes encoded speech;
accessing information that indicates a first docking context of a client device, the first docking context being associated with the first audio data;
identifying a plurality of language models;
determining that the first docking context indicates docking of the client device with a first docking station of a first type;
selecting at least a first language model of the plurality of language models based on determining that the first docking context indicates docking of the client device with the first docking station of the first type;
performing speech recognition on the first audio data using the first language model to identify a transcription for a portion of the first audio data;
accessing second audio data that includes encoded speech;
accessing information that indicates a second docking context of the client device, the second docking context being associated with the second audio data;
determining that the second docking context indicates docking of the client device with a second docking station of a second type, the second type being different from the first type;
selecting at least a second language model of the plurality of language models based on determining that the second docking context indicates docking of the client device with the second docking station of the second type, the second language model being different from the first language model; and
performing speech recognition on the second audio data using the second language model to identify a transcription for a portion of the second audio data,
wherein docking stations of the first type provide capabilities for one or more first manners of using the client device, and wherein docking stations of the second type provide capabilities for one or more second manners of using the client device that are different from the one or more first manners of using the client device.

Dependent claims: 18, 19, 20
21. A non-transitory computer storage medium encoded with a computer program, the program comprising instructions that when executed by one or more computers cause the one or more computers to perform operations comprising:

accessing first audio data that includes encoded speech;
accessing information that indicates a first docking context of a client device, the first docking context being associated with the first audio data;
identifying a plurality of language models;
determining that the first docking context indicates docking of the client device with a first docking station of a first type;
selecting at least a first language model of the plurality of language models based on determining that the first docking context indicates docking of the client device with the first docking station of the first type;
performing speech recognition on the first audio data using the first language model to identify a transcription for a portion of the first audio data;
accessing second audio data that includes encoded speech;
accessing information that indicates a second docking context of the client device, the second docking context being associated with the second audio data;
determining that the second docking context indicates docking of the client device with a second docking station of a second type, the second type being different from the first type;
selecting at least a second language model of the plurality of language models based on determining that the second docking context indicates docking of the client device with the second docking station of the second type, the second language model being different from the first language model; and
performing speech recognition on the second audio data using the second language model to identify a transcription for a portion of the second audio data,
wherein docking stations of the first type provide capabilities for one or more first manners of using the client device, and wherein docking stations of the second type provide capabilities for one or more second manners of using the client device that are different from the one or more first manners of using the client device.

Dependent claims: 22, 23, 24, 25
31. A computer-implemented method, comprising:

accessing first audio data that includes encoded speech;
accessing information that indicates a first docking context of a first client device, the first docking context being associated with the first audio data;
identifying a plurality of language models;
determining that the first docking context indicates docking of the first client device with a first docking station of a first type;
selecting at least a first language model of the plurality of language models based on determining that the first docking context indicates docking of the first client device with the first docking station of the first type;
performing speech recognition on the first audio data using the first language model to identify a transcription for a portion of the first audio data;
accessing second audio data that includes encoded speech;
accessing information that indicates a second docking context of a second client device, the second docking context being associated with the second audio data, the second client device being different from the first client device;
determining that the second docking context indicates docking of the second client device with a second docking station of a second type, the second type being different from the first type;
selecting at least a second language model of the plurality of language models based on determining that the second docking context indicates docking of the second client device with the second docking station of the second type, the second language model being different from the first language model; and
performing speech recognition on the second audio data using the second language model to identify a transcription for a portion of the second audio data,
wherein docking stations of the first type provide capabilities for one or more first manners of using a client device, and wherein docking stations of the second type provide capabilities for one or more second manners of using a client device that are different from the one or more first manners of using a client device.
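Claim 31 differs from claim 6 in that the two recognition requests come from different client devices, while the same pool of language models serves both: the model choice depends on the docking context of the request, not on which device sent it. A minimal server-side sketch of that flow, with hypothetical class, field, and model names (all assumptions, not from the patent):

```python
# Hypothetical sketch of claim 31's flow: two different client devices,
# two dock types, one shared pool of language models. All names below
# are illustrative assumptions, not from the patent.

MODEL_POOL = {"car": "navigation_lm", "desk": "dictation_lm"}

class RecognitionRequest:
    def __init__(self, device_id, dock_type, audio_data):
        self.device_id = device_id   # which client device sent the audio
        self.dock_type = dock_type   # docking context associated with it
        self.audio_data = audio_data

def handle(request):
    """Select a model from the shared pool based on the request's docking
    context, independent of which device the request came from."""
    lm = MODEL_POOL.get(request.dock_type, "general_lm")
    return (request.device_id, lm)

# First audio data from a first client device in a car dock; second audio
# data from a different device in a desk dock. Different models are chosen
# because the dock types differ, not because the devices differ.
first = handle(RecognitionRequest("device-A", "car", b"..."))
second = handle(RecognitionRequest("device-B", "desk", b"..."))
```

Swapping the two device identifiers would not change which model is selected for each request, which is the point of keying the selection on dock type alone.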
Specification