Environmentally aware speech recognition

US 9,159,315 B1
Filed: 01/07/2013
Issued: 10/13/2015
Est. Priority Date: 01/07/2013
Status: Active Grant

Abstract

Examples of methods and systems for implementing environmentally aware speech recognition are described. In some examples, a method may be performed by a computing device within a system to adapt an acoustic model for a particular language to one or more environmental conditions. A device may receive one or more spoken utterances and, based on the utterances, a system containing the device may determine an acoustic model for the particular language. The system may adapt the acoustic model using one or more data sets, depending on the environmental conditions at the location of the device, or may obtain another acoustic model that is adapted to the environmental conditions. In some examples, the system may also adapt the acoustic model using one or more data sets based on the voice characteristics of the speaker of the one or more spoken utterances.
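
The selection step described in the abstract can be illustrated with a minimal sketch. It assumes each adaptation data set is tagged with a coarse location label and a set of environmental condition tags; the class, function names, tags, and sample data below are all hypothetical and are not taken from the patent. The adaptation itself is only a stand-in that records which data sets were chosen.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AdaptationDataSet:
    name: str
    location: str            # hypothetical coarse location label, e.g. "nyc"
    conditions: frozenset    # hypothetical condition tags, e.g. {"street_noise"}


def select_adaptation_sets(data_sets, location, conditions):
    """Pick data sets that match the location and share at least one condition tag."""
    observed = frozenset(conditions)
    return [
        ds for ds in data_sets
        if ds.location == location and ds.conditions & observed
    ]


def adapt_model(base_model, selected):
    """Stand-in for real adaptation: return a new record listing the data sets applied."""
    return {**base_model, "adapted_with": [ds.name for ds in selected]}


# Hypothetical pool of data sets to choose from.
DATA_SETS = [
    AdaptationDataSet("nyc_street", "nyc", frozenset({"street_noise", "rain"})),
    AdaptationDataSet("nyc_office", "nyc", frozenset({"quiet"})),
    AdaptationDataSet("sf_street", "sf", frozenset({"street_noise"})),
]

base = {"language": "en-US"}  # acoustic model chosen from the identified language
chosen = select_adaptation_sets(DATA_SETS, "nyc", ["street_noise"])
adapted = adapt_model(base, chosen)
print(adapted)
```

The base model record is left unchanged and a new record is returned, which mirrors the claim language of obtaining "another acoustic model" rather than modifying the original in place.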

20 Claims

1. A method, comprising:
- receiving, via a device, one or more spoken utterances;
  
  based on the one or more spoken utterances, identifying a language of the one or more spoken utterances;
  
  determining an acoustic model for a particular language based on the identified language, wherein the acoustic model for the particular language is configured for use in speech recognition;
  
  determining a location of the device;
  
  determining one or more environmental conditions regarding an environment of the location of the device;
  
  determining from among a plurality of data sets at least one adaptation data set based on the one or more environmental conditions and the location of the device, wherein the at least one adaptation data set includes information that enables recognition of speech preferences associated with the location; and
  
  using the at least one adaptation data set, adapting the acoustic model for the particular language to obtain another acoustic model that is adapted to the one or more environmental conditions and the location of the device.
- Dependent claims (2, 3, 4, 5, 6, 7):
  - 2. The method of claim 1, further comprising:
    - based on the one or more spoken utterances, determining one or more voice characteristics of a speaker of the one or more spoken utterances;
      
      determining from among the plurality of data sets at least one speaker data set based on the one or more voice characteristics of the speaker; and
      
      using the at least one speaker data set to further adapt the acoustic model for the particular language to the speaker.
  - 3. The method of claim 1, further comprising:
    - determining a type of wireless connection in use by the device;
      
      based on the type of wireless connection, determining from among the plurality of data sets at least one device data set; and
      
      using the at least one device data set to further adapt the acoustic model for the particular language.
  - 4. The method of claim 1, wherein the one or more environmental conditions include one or more weather conditions based on the location of the device, and the method further comprises:
    - determining the one or more weather conditions of the location of the device;
      
      based on the one or more weather conditions, determining from among the plurality of data sets at least one weather data set; and
      
      using the at least one weather data set to further adapt the acoustic model for the particular language.
  - 5. The method of claim 1, wherein using the at least one adaptation data set to adapt the acoustic model for the particular language comprises:
    - using one or more of the following techniques: maximum likelihood linear regression (MLLR), vocal tract length normalization (VTLN), or maximum a posteriori (MAP) adaptation.
  - 6. The method of claim 1, further comprising:
    - determining a rate of movement of the device; and
      
      adapting the acoustic model for the particular language based on the rate of movement of the device.
  - 7. The method of claim 1, wherein determining the acoustic model for the particular language based on the identified language comprises:
    - sending the one or more spoken utterances to a server; and
      
      receiving from the server the acoustic model.
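
Claim 5 names MLLR, VTLN, and MAP adaptation without spelling them out. As an illustration only, the sketch below shows the textbook MAP update for the mean of a single Gaussian component, which interpolates between the prior mean and the adaptation data according to how much data the component has seen. It is not taken from the patent; the function name, the `tau` prior weight, and the default value are assumptions.

```python
def map_adapt_mean(prior_mean, frames, posteriors, tau=10.0):
    """MAP update of one Gaussian mean.

    new_mean = (tau * prior_mean + sum_t gamma_t * x_t) / (tau + sum_t gamma_t)

    prior_mean: list of floats, the mean from the unadapted model
    frames:     list of feature vectors x_t from the adaptation data set
    posteriors: list of occupation probabilities gamma_t, one per frame
    tau:        prior weight (assumed value); larger tau trusts the prior more
    """
    occupancy = sum(posteriors)
    dim = len(prior_mean)
    # Posterior-weighted sum of the adaptation frames, per dimension.
    weighted_sum = [
        sum(g * x[d] for g, x in zip(posteriors, frames)) for d in range(dim)
    ]
    return [
        (tau * prior_mean[d] + weighted_sum[d]) / (tau + occupancy)
        for d in range(dim)
    ]


# Two frames fully assigned to this component, with a prior weight of 2.
print(map_adapt_mean([0.0, 0.0], [[2.0, 4.0], [2.0, 4.0]], [1.0, 1.0], tau=2.0))
```

With no adaptation frames the function returns the prior mean unchanged, so a component that never appears in the adaptation data set keeps its original parameters.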

8. A non-transitory computer readable medium having stored therein instructions, that when executed by a computing system, cause the computing system to perform functions comprising:
- receiving, via a device, one or more spoken utterances;
  
  based on the one or more spoken utterances, identifying a language of the one or more spoken utterances;
  
  determining an acoustic model for a particular language based on the identified language, wherein the acoustic model for the particular language is configured for use in speech recognition;
  
  determining a location of the device;
  
  determining one or more environmental conditions regarding an environment of the location of the device;
  
  determining from among a plurality of data sets at least one adaptation data set based on the one or more environmental conditions and the location of the device, wherein the at least one adaptation data set includes information that enables recognition of speech preferences associated with the location; and
  
  using the at least one adaptation data set, adapting the acoustic model for the particular language to obtain another acoustic model that is adapted to the one or more environmental conditions and the location of the device.
- Dependent claims (9, 10, 11, 12, 13, 14):
  - 9. The non-transitory computer readable medium of claim 8, wherein the functions further comprise:
    - based on the one or more spoken utterances, determining one or more voice characteristics of a speaker of the one or more spoken utterances;
      
      determining from among the plurality of data sets at least one speaker data set based on the one or more voice characteristics of the speaker; and
      
      using the at least one speaker data set to further adapt the acoustic model for the particular language to the speaker.
  - 10. The non-transitory computer readable medium of claim 8, wherein the functions further comprise:
    - determining a type of wireless connection in use by the device;
      
      based on the type of wireless connection, determining from among the plurality of data sets at least one device data set; and
      
      using the at least one device data set to further adapt the acoustic model for the particular language.
  - 11. The non-transitory computer readable medium of claim 8, wherein the one or more environmental conditions include one or more weather conditions based on the location of the device, and the functions further comprise:
    - determining the one or more weather conditions of the location of the device;
      
      based on the one or more weather conditions, determining from among the plurality of data sets at least one weather data set; and
      
      using the at least one weather data set to further adapt the acoustic model for the particular language.
  - 12. The non-transitory computer readable medium of claim 8, wherein the function of using the at least one adaptation data set, adapting the acoustic model for the particular language to obtain another acoustic model that is adapted to the one or more environmental conditions comprises:
    - using one or more of the following techniques: maximum likelihood linear regression (MLLR), vocal tract length normalization (VTLN), or maximum a posteriori (MAP) adaptation.
  - 13. The non-transitory computer readable medium of claim 8, wherein the functions further comprise:
    - determining a rate of movement of the device;
      
      adapting the acoustic model for the particular language based on the rate of movement of the device.
  - 14. The non-transitory computer readable medium of claim 8, wherein the function for determining the acoustic model for the particular language based on the identified language comprises:
    - sending the one or more spoken utterances to a server; and
      
      receiving from the server the acoustic model.

15. A system, comprising:
- at least one processor; and
  
  data storage comprising program instructions executable by the at least one processor to cause the system to perform functions comprising:
  
  receiving, via a device, one or more spoken utterances;
  
  based on the one or more spoken utterances, identifying a language of the one or more spoken utterances;
  
  determining an acoustic model for a particular language based on the identified language, wherein the acoustic model for the particular language is configured for use in speech recognition;
  
  determining a location of the device;
  
  determining one or more environmental conditions regarding an environment of the location of the device;
  
  determining from among a plurality of data sets at least one adaptation data set based on the one or more environmental conditions and the location of the device, wherein the at least one adaptation data set includes information that enables recognition of speech preferences associated with the location; and
  
  using the at least one adaptation data set, adapting the acoustic model for the particular language to obtain another acoustic model that is adapted to the one or more environmental conditions and the location of the device.
- Dependent claims (16, 17, 18, 19, 20):
  - 16. The system of claim 15, wherein the functions further comprise:
    - based on the one or more spoken utterances, determining one or more voice characteristics of a speaker of the one or more spoken utterances;
      
      determining from among the plurality of data sets at least one speaker data set based on the one or more voice characteristics of the speaker; and
      
      using the at least one speaker data set to further adapt the acoustic model for the particular language to the speaker.
  - 17. The system of claim 15, wherein the functions further comprise:
    - determining a type of wireless connection in use by the device;
      
      based on the type of wireless connection, determining from among the plurality of data sets at least one device data set; and
      
      using the at least one device data set to further adapt the acoustic model for the particular language.
  - 18. The system of claim 15, wherein the one or more environmental conditions include one or more weather conditions based on the location of the device, and the functions further comprise:
    - determining the one or more weather conditions of the location of the device;
      
      based on the one or more weather conditions, determining from among the plurality of data sets at least one weather data set; and
      
      using the at least one weather data set to further adapt the acoustic model for the particular language.
  - 19. The system of claim 15, wherein the functions further comprise:
    - determining a rate of movement of the device;
      
      adapting the acoustic model for the particular language based on the rate of movement of the device.
  - 20. The system of claim 15, wherein the function for determining the acoustic model for the particular language based on the identified language comprises:
    - sending the one or more spoken utterances to a server; and
      
      receiving from the server the acoustic model.
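
The dependent claims layer further signals on top of location and environment: the type of wireless connection, weather conditions, and the device's rate of movement. One way to picture this is a function that maps whichever signals are available to the names of additional data sets. The sketch below is an assumption-laden illustration, not the patent's method; the name prefixes, the 5 m/s threshold, and the function itself are hypothetical.

```python
def extra_data_sets(connection_type=None, weather=None, speed_mps=None):
    """Map optional device signals to names of further adaptation data sets."""
    selected = []
    if connection_type is not None:
        # A device data set chosen from the wireless connection type.
        selected.append(f"device:{connection_type}")
    if weather is not None:
        # A weather data set chosen from the weather at the device's location.
        selected.append(f"weather:{weather}")
    if speed_mps is not None:
        # Hypothetical threshold separating vehicle travel from walking pace.
        selected.append("motion:vehicle" if speed_mps > 5.0 else "motion:on_foot")
    return selected


print(extra_data_sets(connection_type="wifi", weather="rain", speed_mps=12.0))
```

Signals that are unavailable are simply skipped, so the base adaptation still applies when the device reports nothing beyond its location.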

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Google Inc. (Alphabet Inc.)
Inventors: Mengibar, Pedro J. Moreno; Weinstein, Eugene
Primary Examiner(s): Saint Cyr, Leonard
Application Number: US13/735,592
Time in Patent Office: 1,009 Days
Field of Search: 704/246, 704/247, 704/251, 704/252, 704/255
US Class Current: 1/1
CPC Class Codes:
G10L 15/005   Language recognition
G10L 15/07   to the speaker
G10L 15/20   Speech recognition techniqu...
G10L 2015/226   using non-speech characteri...
