Speech recognition models based on location indicia

US 8,831,957 B2
Filed: 10/15/2012
Issued: 09/09/2014
Est. Priority Date: 08/01/2012
Status: Active Grant

Abstract

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for performing speech recognition using models that are based on where, within a building, a speaker makes an utterance are disclosed. The methods, systems, and apparatus include actions of receiving data corresponding to an utterance, and obtaining location indicia for an area within a building where the utterance was spoken. Further actions include selecting one or more models for speech recognition based on the location indicia, wherein each of the selected one or more models is associated with a weight based on the location indicia. Additionally, the actions include generating a composite model using the selected one or more models and the respective weights of the selected one or more models. And the actions also include generating a transcription of the utterance using the composite model.
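
As an illustration of the weighting described in the abstract, the sketch below forms a composite language model by linearly interpolating per-area unigram models. Linear interpolation, the unigram form, the area names, and every probability and weight are assumptions chosen for illustration; the abstract does not state how the selected models are combined.

```python
from typing import Dict

# Hypothetical per-area unigram language models: word -> probability.
AREA_MODELS: Dict[str, Dict[str, float]] = {
    "kitchen": {"timer": 0.5, "oven": 0.4, "channel": 0.1},
    "living_room": {"timer": 0.1, "oven": 0.1, "channel": 0.8},
}


def composite_model(
    models: Dict[str, Dict[str, float]], weights: Dict[str, float]
) -> Dict[str, float]:
    """Linearly interpolate the weighted models after normalizing the weights."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    combined: Dict[str, float] = {}
    for area, weight in weights.items():
        for word, prob in models[area].items():
            combined[word] = combined.get(word, 0.0) + (weight / total) * prob
    return combined


# Illustrative weights, standing in for values derived from location indicia.
composite = composite_model(AREA_MODELS, {"kitchen": 0.75, "living_room": 0.25})
```

Because each input model sums to one and the weights are normalized, the composite is itself a probability distribution over the same vocabulary.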

28 Claims

1. A computer-implemented method comprising:
- receiving, at a processing system, data corresponding to an utterance;
- obtaining, at the processing system, location indicia for an area within a building where the utterance was spoken;
- determining, at the processing system, a set of likelihoods based on the location indicia, each likelihood in the set corresponding to a likelihood that the utterance was spoken in a particular area of the building from a plurality of candidate areas of the building;
- selecting, at the processing system, one or more candidate areas of the building from the plurality of candidate areas of the building based on the set of likelihoods;
- accessing, for each selected candidate area of the building, a model for speech recognition associated with the respective candidate area of the building;
- generating, at the processing system, a composite model using the accessed models for speech recognition and the likelihoods associated with the corresponding candidate areas of the building; and
- generating, at the processing system, a transcription of the utterance using the composite model.

2. The method of claim 1, wherein receiving data corresponding to the utterance comprises receiving data corresponding to the utterance from a client device, and wherein obtaining location indicia for an area within a building where the utterance was spoken comprises receiving location indicia for the area within the building where the utterance was spoken from the client device.

3. The method of claim 2, wherein the location indicia comprises location data based on short-range wireless radio transmissions received at the client device.

4. The method of claim 1, wherein obtaining the location indicia comprises:
- generating one or more candidate transcriptions of the utterance using a location-independent language model; and
- based on comparing the one or more candidate transcriptions with phrases in one or more location-dependent language models, identifying one or more candidate areas within the building.

5. The method of claim 1, wherein receiving data corresponding to the utterance comprises receiving data corresponding to the utterance from a processing system at the building, and wherein obtaining location indicia for an area within a building where the utterance was spoken comprises receiving location indicia for the area within the building where the utterance was spoken from the processing system at the building.

6. The method of claim 5, wherein the location indicia comprises location data obtained from the processing system, wherein the processing system localizes the utterance using a microphone array arranged in the building, and wherein the microphone array is operatively coupled to the processing system.

7. The method of claim 1, wherein each model for speech recognition associated with the candidate areas of the building comprises a language model; and wherein the composite model comprises a composite language model.

8. The method of claim 1, wherein each model for speech recognition associated with the candidate areas of the building comprises an acoustic model; and wherein the composite model comprises a composite acoustic model.
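
The selecting and transcribing steps of claim 1 can be sketched as follows. This is a minimal illustration, not the claimed implementation: the likelihood threshold, the fallback to the single most likely area, the unigram models, the probability floor for unseen words, and all numeric values are assumptions introduced here.

```python
import math
from typing import Dict, List


def select_areas(likelihoods: Dict[str, float], threshold: float = 0.2) -> Dict[str, float]:
    """Keep areas whose likelihood meets the threshold, then renormalize."""
    kept = {area: p for area, p in likelihoods.items() if p >= threshold}
    if not kept:
        # Assumed fallback: keep only the single most likely area.
        best_area = max(likelihoods, key=likelihoods.get)
        kept = {best_area: likelihoods[best_area]}
    total = sum(kept.values())
    return {area: p / total for area, p in kept.items()}


def score(words: List[str], models: Dict[str, Dict[str, float]],
          weights: Dict[str, float], floor: float = 1e-6) -> float:
    """Log-probability of a word sequence under the likelihood-weighted mixture."""
    return sum(
        math.log(sum(w * models[area].get(word, floor) for area, w in weights.items()))
        for word in words
    )


# Hypothetical per-area unigram models and likelihoods.
models = {
    "kitchen": {"set": 0.3, "a": 0.3, "timer": 0.3, "time": 0.1},
    "bedroom": {"set": 0.3, "a": 0.3, "timer": 0.1, "time": 0.3},
    "garage": {"set": 0.5, "a": 0.5},
}
likelihoods = {"kitchen": 0.7, "bedroom": 0.25, "garage": 0.05}

weights = select_areas(likelihoods)  # "garage" falls below the threshold
hypotheses = [["set", "a", "timer"], ["set", "a", "time"]]
best = max(hypotheses, key=lambda h: score(h, models, weights))
```

With these illustrative numbers the kitchen model dominates the mixture, so the hypothesis ending in "timer" scores higher than the one ending in "time".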

9. A system comprising:
- one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:
  - receiving data corresponding to an utterance;
  - obtaining location indicia for an area within a building where the utterance was spoken;
  - determining a set of likelihoods based on the location indicia, each likelihood in the set corresponding to a likelihood that the utterance was spoken in a particular area of the building from a plurality of candidate areas of the building;
  - selecting one or more candidate areas of the building from the plurality of candidate areas of the building based on the set of likelihoods;
  - accessing, for each selected candidate area of the building, a model for speech recognition associated with the respective candidate area of the building;
  - generating a composite model using the accessed models for speech recognition and the likelihoods associated with the corresponding candidate areas of the building; and
  - generating a transcription of the utterance using the composite model.

10. The system of claim 9, wherein receiving data corresponding to the utterance comprises receiving data corresponding to the utterance from a client device, and wherein obtaining location indicia for an area within a building where the utterance was spoken comprises receiving location indicia for the area within the building where the utterance was spoken from the client device.

11. The system of claim 10, wherein the location indicia comprises location data based on short-range wireless radio transmissions received at the client device.

12. The system of claim 9, wherein obtaining the location indicia comprises:
- generating one or more candidate transcriptions of the utterance using a location-independent language model; and
- based on comparing the one or more candidate transcriptions with phrases in one or more location-dependent language models, identifying one or more candidate areas within the building.

13. The system of claim 9, wherein receiving data corresponding to the utterance comprises receiving data corresponding to the utterance from a processing system at the building, and wherein obtaining location indicia for an area within a building where the utterance was spoken comprises receiving location indicia for the area within the building where the utterance was spoken from the processing system at the building.

14. The system of claim 13, wherein the location indicia comprises location data obtained from the processing system, wherein the processing system localizes the utterance using a microphone array arranged in the building, and wherein the microphone array is operatively coupled to the processing system.

15. The system of claim 9, wherein each model for speech recognition associated with the candidate areas of the building comprises a language model; and wherein the composite model comprises a composite language model.

16. The system of claim 9, wherein each model for speech recognition associated with the candidate areas of the building comprises an acoustic model; and wherein the composite model comprises a composite acoustic model.
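
Claims 10 and 11 recite location indicia based on short-range wireless radio transmissions received at the client device. One way such indicia might be turned into the set of likelihoods is sketched below. The one-beacon-per-area setup, the signal-strength readings in dBm, the softmax mapping, and the `scale_db` parameter are all assumptions made for illustration; the claims do not prescribe any particular mapping.

```python
import math
from typing import Dict


def area_likelihoods(rssi_dbm: Dict[str, float], scale_db: float = 6.0) -> Dict[str, float]:
    """Map per-area signal strengths (dBm) to likelihoods with a softmax."""
    if not rssi_dbm:
        raise ValueError("at least one reading is required")
    strongest = max(rssi_dbm.values())
    # Subtract the maximum before exponentiating for numerical stability.
    scores = {
        area: math.exp((value - strongest) / scale_db)
        for area, value in rssi_dbm.items()
    }
    total = sum(scores.values())
    return {area: s / total for area, s in scores.items()}


# Hypothetical readings: a stronger (less negative) signal suggests a closer beacon.
readings = {"kitchen": -48.0, "living_room": -60.0, "bedroom": -75.0}
likelihoods = area_likelihoods(readings)
```

A smaller `scale_db` concentrates the likelihood on the strongest beacon, while a larger value spreads it across neighboring areas.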

17. A non-transitory computer-readable medium storing software comprising instructions executable by one or more computers which, upon such execution, cause the one or more computers to perform operations comprising:
- receiving data corresponding to an utterance;
- obtaining location indicia for an area within a building where the utterance was spoken;
- determining a set of likelihoods based on the location indicia, each likelihood in the set corresponding to a likelihood that the utterance was spoken in a particular area of the building from a plurality of candidate areas of the building;
- selecting one or more candidate areas of the building from the plurality of candidate areas of the building based on the set of likelihoods;
- accessing, for each selected candidate area of the building, a model for speech recognition associated with the respective candidate area of the building;
- generating a composite model using the accessed models for speech recognition and the likelihoods associated with the corresponding candidate areas of the building; and
- generating a transcription of the utterance using the composite model.

18. The computer-readable medium of claim 17, wherein receiving data corresponding to the utterance comprises receiving data corresponding to the utterance from a client device, and wherein obtaining location indicia for an area within a building where the utterance was spoken comprises receiving location indicia for the area within the building where the utterance was spoken from the client device.

19. The computer-readable medium of claim 18, wherein the location indicia comprises location data based on short-range wireless radio transmissions received at the client device.

20. The computer-readable medium of claim 17, wherein obtaining the location indicia comprises:
- generating one or more candidate transcriptions of the utterance using a location-independent language model; and
- based on comparing the one or more candidate transcriptions with phrases in one or more location-dependent language models, identifying one or more candidate areas within the building.

21. The computer-readable medium of claim 17, wherein receiving data corresponding to the utterance comprises receiving data corresponding to the utterance from a processing system at the building, and wherein obtaining location indicia for an area within a building where the utterance was spoken comprises receiving location indicia for the area within the building where the utterance was spoken from the processing system at the building.

22. The computer-readable medium of claim 21, wherein the location indicia comprises location data obtained from the processing system, wherein the processing system localizes the utterance using a microphone array arranged in the building, and wherein the microphone array is operatively coupled to the processing system.

23. The computer-readable medium of claim 17, wherein each model for speech recognition associated with the candidate areas of the building comprises a language model; and wherein the composite model comprises a composite language model.

24. The computer-readable medium of claim 17, wherein each model for speech recognition associated with the candidate areas of the building comprises an acoustic model; and wherein the composite model comprises a composite acoustic model.
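
Claim 20 obtains location indicia from the utterance itself, by comparing candidate transcriptions against phrases in location-dependent language models. The sketch below illustrates that comparison with a simple case-insensitive substring match. The phrase sets, the area names, the matching rule, and the hit count used as a score are hypothetical; the claim does not specify how the comparison is performed.

```python
from typing import Dict, List, Set

# Hypothetical phrases drawn from location-dependent language models.
AREA_PHRASES: Dict[str, Set[str]] = {
    "kitchen": {"preheat the oven", "set a timer"},
    "living_room": {"change the channel", "turn up the volume"},
}


def candidate_areas(transcriptions: List[str],
                    area_phrases: Dict[str, Set[str]]) -> Dict[str, int]:
    """Count, per area, how many transcriptions contain one of its phrases."""
    counts: Dict[str, int] = {}
    for area, phrases in area_phrases.items():
        hits = sum(
            1
            for text in transcriptions
            if any(phrase in text.lower() for phrase in phrases)
        )
        if hits:
            counts[area] = hits
    return counts


# Candidate transcriptions, as a location-independent model might produce them.
candidates = ["Preheat the oven to 350", "reheat the oven to 350"]
areas = candidate_areas(candidates, AREA_PHRASES)
```

Only the first candidate contains a kitchen phrase, so the kitchen is the sole candidate area with a count of one; the counts could then be normalized into likelihoods.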

25. A client device comprising:
- one or more processors and one or more storage devices storing instructions that are operable, when executed by the one or more processors, to cause the one or more processors to perform operations comprising:
  - receiving an utterance at a client device;
  - obtaining, at the client device, location indicia for an area within a building where the utterance was spoken;
  - communicating, from the client device to a server, data corresponding to the utterance and the location indicia for the area within the building where the utterance was spoken; and
  - receiving, at the client device, a transcription of the utterance, wherein the transcription of the utterance was generated using a composite model, and wherein the composite model was generated using one or more selected models associated with one or more predetermined areas of the building, each of the one or more selected models being selected based on a likelihood that the utterance was spoken in the corresponding predetermined area of the building.

26. The client device of claim 25, wherein the location indicia comprises location data based on short-range wireless radio transmissions received at the client device.

27. The client device of claim 25, wherein the composite model comprises a composite acoustic model, and wherein the one or more models and respective weights of the one or more models comprises one or more acoustic models and respective weights of the one or more acoustic models.

28. The client device of claim 25, wherein the composite model comprises a composite language model, and wherein the one or more models and respective weights of the one or more models comprises one or more language models and respective weights of the one or more language models.
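
The client-side exchange in claim 25 can be sketched as a request that bundles the utterance data with the location indicia, followed by receipt of a transcription. The JSON payload, the field names `audio_b64` and `location_indicia`, the `send` callback, and the `fake_server` stub are hypothetical stand-ins; the claim does not define a wire format or transport.

```python
import base64
import json
from dataclasses import asdict, dataclass
from typing import Callable, Dict


@dataclass
class RecognitionRequest:
    audio_b64: str                      # utterance audio, base64-encoded
    location_indicia: Dict[str, float]  # e.g. beacon id -> signal strength (dBm)


def build_request(audio: bytes, indicia: Dict[str, float]) -> str:
    """Serialize the utterance data and location indicia as one JSON payload."""
    request = RecognitionRequest(base64.b64encode(audio).decode("ascii"), indicia)
    return json.dumps(asdict(request))


def transcribe(audio: bytes, indicia: Dict[str, float],
               send: Callable[[str], str]) -> str:
    """Send the payload via `send` (a stand-in for the network call)."""
    response = json.loads(send(build_request(audio, indicia)))
    return response["transcription"]


def fake_server(payload: str) -> str:
    """Stub server: checks the payload shape and returns a canned transcription."""
    request = json.loads(payload)
    assert "audio_b64" in request and "location_indicia" in request
    return json.dumps({"transcription": "set a timer"})


result = transcribe(b"\x00\x01", {"beacon-kitchen": -48.0}, fake_server)
```

Passing the transport in as a callback keeps the sketch runnable without a network; a real client would replace `fake_server` with its own request function.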

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Google Inc. (Alphabet Inc.)
Inventors: Taubman, Gabriel; Strope, Brian
Primary Examiner(s): Chawan, Vijay B
Application Number: US13/651,566
Publication Number: US 20140039888A1
Time in Patent Office: 694 Days
Field of Search: 704/275, 704/257, 704/246, 704/243, 704/247, 704/226, 704/270, 704/270.1, 704/235, 379/88.04, 379/88.02, 379/88.01
US Class Current: 704/275
CPC Class Codes:
- G10L 15/183: using context dependencies,...
- G10L 15/30: Distributed recognition, e....
- G10L 2015/0635: updating or merging of old ...
- G10L 2015/226: using non-speech characteri...
- H04M 1/72457: according to geographic loc...
