Acoustic model adaptation using geographic information
First Claim
1. A system comprising:
- one or more computers; and
a computer-readable medium coupled to the one or more computers having instructions stored thereon which, when executed by the one or more computers, cause the one or more computers to perform operations comprising:
receiving an audio signal that corresponds to an utterance recorded by a mobile device;
determining a geographic location associated with the mobile device;
determining a geographic location type associated with the geographic location;
selecting a subset of geotagged audio signals based on the geographic location type associated with the geographic location of the mobile device, and based on context data associated with the utterance, wherein the context data comprises data that references a time or a date when the utterance was recorded by the mobile device, data that references a speed or an amount of motion measured by the mobile device when the utterance was recorded, data that references settings of the mobile device, or data that references a type of the mobile device;
adapting one or more acoustic models for the geographic location type; and
performing speech recognition on the audio signal using the one or more acoustic models that are adapted for the geographic location type.
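The patent does not disclose source code, but the selection step recited above can be sketched as a filter over a corpus of geotagged recordings: keep signals whose location type matches the utterance's, and whose context data (time, motion, device) resembles the utterance's context. All class and function names below (`GeotaggedAudio`, `Context`, `select_subset`) and the matching thresholds are illustrative assumptions, not part of the claim.

```python
from dataclasses import dataclass

# Hypothetical record types; field names are assumptions for illustration.
@dataclass
class GeotaggedAudio:
    signal_id: str
    location_type: str   # e.g. "street", "restaurant", "car"
    hour: int            # hour of day when the signal was recorded
    speed_kmh: float     # device speed when the signal was recorded
    device_type: str

@dataclass
class Context:
    hour: int
    speed_kmh: float
    device_type: str

def select_subset(corpus, location_type, ctx,
                  hour_window=2, speed_tol_kmh=10.0):
    """Select geotagged signals that share the utterance's geographic
    location type and match at least one element of its context data
    (time of day, measured motion, or device type)."""
    subset = []
    for a in corpus:
        if a.location_type != location_type:
            continue  # location type is a hard requirement in this sketch
        matches_context = (
            abs(a.hour - ctx.hour) <= hour_window or
            abs(a.speed_kmh - ctx.speed_kmh) <= speed_tol_kmh or
            a.device_type == ctx.device_type
        )
        if matches_context:
            subset.append(a)
    return subset
```

In this sketch the location type gates the search and the context data refines it; the claim itself does not fix any particular matching rule or thresholds.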
Abstract
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for enhancing speech recognition accuracy. In one aspect, a method includes receiving an audio signal that corresponds to an utterance recorded by a mobile device, determining a geographic location associated with the mobile device, adapting one or more acoustic models for the geographic location, and performing speech recognition on the audio signal using the one or more acoustic models that are adapted for the geographic location.
16 Claims
1. A system as recited above under First Claim. Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14.
15. A non-transitory computer storage medium encoded with a computer program, the program comprising instructions that, when executed by one or more computers, cause the one or more computers to perform operations comprising:
receiving an audio signal that corresponds to an utterance recorded by a mobile device;
determining a geographic location associated with the mobile device;
determining a geographic location type associated with the geographic location;
after receiving the audio signal that corresponds to the utterance, selecting a subset of geotagged audio signals based on the geographic location type associated with the geographic location of the mobile device, and based on context data associated with the utterance, wherein the context data comprises data that references a time or a date when the utterance was recorded by the mobile device, data that references a speed or an amount of motion measured by the mobile device when the utterance was recorded, data that references settings of the mobile device, or data that references a type of the mobile device;
adapting one or more acoustic models for the geographic location type using the subset of geotagged audio signals selected after receiving the audio signal that corresponds to the utterance; and
performing speech recognition on the audio signal using the one or more acoustic models that are adapted for the geographic location type.
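Claim 15 emphasizes that adaptation uses a subset selected only after the audio arrives. One common way to adapt an acoustic model from a small, freshly selected sample is a MAP-style interpolation between the model's prior statistics and the sample statistics; the patent does not mandate any particular adaptation method, so the function below is a minimal illustrative sketch, with `tau` an assumed prior-weight parameter.

```python
import statistics

def map_adapt_mean(prior_mean, subset_features, tau=10.0):
    """Shift a model's prior Gaussian mean toward the statistics of the
    freshly selected geotagged subset. Larger tau means the prior
    resists the new data more strongly (less adaptation)."""
    n = len(subset_features)
    if n == 0:
        return prior_mean  # nothing selected: leave the model unchanged
    sample_mean = statistics.fmean(subset_features)
    # Standard MAP interpolation between prior and sample statistics.
    return (tau * prior_mean + n * sample_mean) / (tau + n)
```

Because the subset is chosen per utterance, this update can run on demand at recognition time, matching the "selected after receiving the audio signal" ordering in the claim.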
16. A computer-implemented method comprising:
receiving an audio signal that corresponds to an utterance recorded by a mobile device;
receiving a data tag associated with the audio signal, wherein the data tag identifies an accent of a user of the mobile device;
determining a geographic location associated with the mobile device;
determining a geographic location type associated with the geographic location;
selecting a subset of geotagged audio signals based on the geographic location type associated with the geographic location of the mobile device, and based on context data associated with the utterance, wherein the context data comprises data that references a time or a date when the utterance was recorded by the mobile device, data that references a speed or an amount of motion measured by the mobile device when the utterance was recorded, data that references settings of the mobile device, or data that references a type of the mobile device;
adapting one or more acoustic models for the geographic location type based on the accent of the user of the mobile device; and
performing speech recognition on the audio signal using the one or more acoustic models that are adapted for the geographic location type.
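Claim 16 additionally conditions adaptation on an accent identified by a data tag. A simple way to realize this, assuming each geotagged signal carries an accent annotation (an assumption, not something the claim specifies), is to narrow the selected subset to accent-matched signals before adapting, falling back to the full subset when no match exists. The function name and dict schema below are hypothetical.

```python
def filter_by_accent(subset, accent_tag, fallback=True):
    """Prefer geotagged signals whose annotated accent matches the data
    tag received with the utterance; optionally fall back to the
    unfiltered subset when no accent-matched signal exists."""
    matched = [a for a in subset if a.get("accent") == accent_tag]
    if matched or not fallback:
        return matched
    return subset  # no accent match: adapt from the location-type subset
```

The fallback keeps adaptation location-specific even for accents absent from the corpus; whether to fall back or skip accent-based adaptation entirely is a design choice the claim leaves open.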
Specification