DATA DRIVEN PRONUNCIATION LEARNING WITH CROWD SOURCING
First Claim
1. A computer-implemented method comprising:
obtaining audio samples of speech corresponding to a particular term;
obtaining candidate pronunciations for the particular term;
generating, for each candidate pronunciation for the particular term and audio sample of speech corresponding to the particular term, a score reflecting a level of similarity between the candidate pronunciation and the audio sample;
aggregating the scores for each candidate pronunciation; and
adding one or more candidate pronunciations for the particular term to a pronunciation lexicon based on the aggregated scores for the candidate pronunciations.
2 Assignments
0 Petitions
Abstract
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for determining pronunciations for particular terms. The methods, systems, and apparatus include actions of obtaining audio samples of speech corresponding to a particular term and obtaining candidate pronunciations for the particular term. Further actions include generating, for each candidate pronunciation for the particular term and audio sample of speech corresponding to the particular term, a score reflecting a level of similarity between the candidate pronunciation and the audio sample. Additional actions include aggregating the scores for each candidate pronunciation and adding one or more candidate pronunciations for the particular term to a pronunciation lexicon based on the aggregated scores for the candidate pronunciations.
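The actions summarized in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: it assumes each audio sample has already been reduced to a phoneme sequence, uses `difflib.SequenceMatcher` as a stand-in for an acoustic similarity score, aggregates by arithmetic mean, and admits candidates whose aggregated score reaches an arbitrary threshold of 0.8. The function names, the threshold, and the example phoneme sequences are all hypothetical.

```python
from difflib import SequenceMatcher
from statistics import mean


def similarity(candidate, sample_phones):
    """Stand-in score in [0, 1]. A real system would score the audio
    acoustically; sequence overlap is only an assumption for this sketch."""
    return SequenceMatcher(None, candidate, sample_phones).ratio()


def learn_pronunciations(term, samples, candidates, lexicon, threshold=0.8):
    """Score each (candidate, sample) pair, aggregate per candidate, and add
    candidates that clear the (assumed) threshold to the lexicon.
    Assumes `samples` is non-empty."""
    aggregated = {
        cand: mean(similarity(cand, sample) for sample in samples)
        for cand in candidates
    }
    for cand, score in aggregated.items():
        if score >= threshold:
            lexicon.setdefault(term, []).append(cand)
    return aggregated


# Hypothetical data: three "audio samples" (as phoneme tuples) for one term.
samples = [
    ("t", "ah", "m", "ey", "t", "ow"),
    ("t", "ah", "m", "ey", "t", "ow"),
    ("t", "ah", "m", "aa", "t", "ow"),
]
candidates = [
    ("t", "ah", "m", "ey", "t", "ow"),
    ("t", "ow", "m", "ae", "t"),
]
lexicon = {}
scores = learn_pronunciations("tomato", samples, candidates, lexicon)
```

With this data only the first candidate is added to `lexicon`, because the second overlaps too little with any sample to reach the threshold. Other aggregation rules (for example a median, or a count of samples for which a candidate scores best) would fit the same structure.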
228 Citations
20 Claims
1. A computer-implemented method comprising:
obtaining audio samples of speech corresponding to a particular term;
obtaining candidate pronunciations for the particular term;
generating, for each candidate pronunciation for the particular term and audio sample of speech corresponding to the particular term, a score reflecting a level of similarity between the candidate pronunciation and the audio sample;
aggregating the scores for each candidate pronunciation; and
adding one or more candidate pronunciations for the particular term to a pronunciation lexicon based on the aggregated scores for the candidate pronunciations.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A computer-readable medium storing software comprising instructions executable by one or more computers which, upon such execution, cause the one or more computers to perform operations comprising:
obtaining audio samples of speech corresponding to a particular term;
obtaining candidate pronunciations for the particular term;
generating, for each candidate pronunciation for the particular term and audio sample of speech corresponding to the particular term, a score reflecting a level of similarity between the candidate pronunciation and the audio sample;
aggregating the scores for each candidate pronunciation; and
adding one or more candidate pronunciations for the particular term to a pronunciation lexicon based on the aggregated scores for the candidate pronunciations.
Dependent claims: 10, 11, 12, 13, 14
15. A system comprising:
one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:
obtaining audio samples of speech corresponding to a particular term;
obtaining candidate pronunciations for the particular term;
generating, for each candidate pronunciation for the particular term and audio sample of speech corresponding to the particular term, a score reflecting a level of similarity between the candidate pronunciation and the audio sample;
aggregating the scores for each candidate pronunciation; and
adding one or more candidate pronunciations for the particular term to a pronunciation lexicon based on the aggregated scores for the candidate pronunciations.
Dependent claims: 16, 17, 18, 19, 20
Specification