DATA DRIVEN PRONUNCIATION LEARNING WITH CROWD SOURCING

US 20150006178A1
Filed: 06/28/2013
Published: 01/01/2015
Est. Priority Date: 06/28/2013
Status: Active Grant

Abstract

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for determining pronunciations for particular terms. The methods, systems, and apparatus include actions of obtaining audio samples of speech corresponding to a particular term and obtaining candidate pronunciations for the particular term. Further actions include generating, for each candidate pronunciation for the particular term and audio sample of speech corresponding to the particular term, a score reflecting a level of similarity between the candidate pronunciation and the audio sample. Additional actions include aggregating the scores for each candidate pronunciation and adding one or more candidate pronunciations for the particular term to a pronunciation lexicon based on the aggregated scores for the candidate pronunciations.
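
The actions in the abstract can be read as a small pipeline: score every (candidate, sample) pair, aggregate per candidate, then update the lexicon. The following is a minimal sketch of that flow, not the patented implementation. The names `learn_pronunciations` and `toy_similarity`, the choice of the mean as the aggregate, and the `top_k` parameter are all assumptions made for illustration. In particular, `toy_similarity` compares phoneme tuples with `difflib` as a stand-in, whereas a real system would presumably score a candidate against recorded audio using an acoustic model.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from statistics import mean
from typing import Callable, Dict, List, Sequence, Tuple

# A pronunciation is modeled as a tuple of phoneme symbols (an assumption).
Pronunciation = Tuple[str, ...]


def toy_similarity(candidate: Pronunciation, sample: Pronunciation) -> float:
    """Stand-in scorer: each 'audio sample' is faked as a phoneme tuple.

    Higher means more similar. A real scorer would operate on audio.
    """
    return SequenceMatcher(None, candidate, sample).ratio()


def learn_pronunciations(
    term: str,
    audio_samples: Sequence[Pronunciation],
    candidates: Sequence[Pronunciation],
    score_fn: Callable[[Pronunciation, Pronunciation], float],
    lexicon: Dict[str, List[Pronunciation]],
    top_k: int = 1,
) -> Dict[Pronunciation, float]:
    """Score each candidate against each sample, aggregate, update the lexicon."""
    scores = defaultdict(list)
    for cand in candidates:
        for sample in audio_samples:
            scores[cand].append(score_fn(cand, sample))

    # Aggregate with the mean (one possible choice; the source does not fix one).
    aggregated = {cand: mean(vals) for cand, vals in scores.items()}

    # Add the top_k candidates with the highest aggregated score.
    best = sorted(aggregated, key=aggregated.get, reverse=True)[:top_k]
    entry = lexicon.setdefault(term, [])
    for pron in best:
        if pron not in entry:
            entry.append(pron)
    return aggregated
```

With three samples, two of which match one candidate exactly, that candidate gets the higher mean score and is the one added to the lexicon.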

20 Claims

1. A computer-implemented method comprising:
- obtaining audio samples of speech corresponding to a particular term;
  
  obtaining candidate pronunciations for the particular term;
  
  generating, for each candidate pronunciation for the particular term and audio sample of speech corresponding to the particular term, a score reflecting a level of similarity between the candidate pronunciation and the audio sample;
  
  aggregating the scores for each candidate pronunciation; and
  
  adding one or more candidate pronunciations for the particular term to a pronunciation lexicon based on the aggregated scores for the candidate pronunciations.
  - 2. The method of claim 1, wherein adding one or more candidate pronunciations for the particular term comprises:
    - identifying a candidate pronunciation of the candidate pronunciations with an aggregated score that indicates a closer level of similarity between the candidate pronunciation and the audio samples than levels of similarity between the other candidate pronunciations and the audio samples; and
      
      adding the identified candidate expression to the pronunciation lexicon.
  - 3. The method of claim 1, wherein adding one or more candidate pronunciations for the particular term comprises:
    - adding all candidate pronunciations and the aggregated scores for the candidate pronunciations to the pronunciation lexicon.
  - 4. The method of claim 1, wherein obtaining the candidate pronunciations for the particular term comprises:
    - generating the candidate pronunciations for the particular term based on letters, graphemes, or other units, in the particular term and one or more rules for pronunciation.
  - 5. The method of claim 1, wherein obtaining the candidate pronunciations for the particular term comprises:
    - obtaining a previous set of candidate pronunciations for the particular term;
      
      generating, for each candidate pronunciation of the particular term in the previous set and audio sample of speech corresponding to the particular term, a score reflecting a level of similarity between the candidate pronunciation of the previous set and the audio sample;
      
      aggregating the scores for each candidate pronunciation of the previous set;
      
      determining that no aggregated score for a candidate pronunciation of the previous set reflects a level of similarity between the candidate pronunciation and the audio samples that is closer than levels of similarity between the other candidate pronunciations and the audio samples by a predetermined amount; and
      
      obtaining the candidate pronunciations based on the candidate pronunciation with the aggregated score that indicates a closer level of similarity between the audio samples and the other candidate pronunciations.
  - 6. The method of claim 1, wherein obtaining audio samples comprises:
    - accessing query transcription logs;
      
      identifying the particular term in the query transcription logs; and
      
      identifying one or more portions of query audio logs corresponding to the identified particular term in the query transcription logs as the audio samples.
  - 7. The method of claim 1, wherein obtaining audio samples comprises:
    - receiving audio samples of multiple different people speaking the particular term in response to a prompt to speak the particular term.
  - 8. The method of claim 1, further comprising:
    - determining the pronunciation lexicon does not include an accurate pronunciation for the particular term, wherein obtaining audio samples of speech corresponding to a particular term is in response to determining the pronunciation lexicon does not include an accurate pronunciation for the particular term.
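
The refinement loop of claim 5 can be sketched as follows: score a previous candidate set, and if no candidate leads the others by a predetermined amount, derive a new set from the best-scoring candidate. This is an illustrative sketch under stated assumptions, not the claimed implementation. The `ALTERNATIVES` substitution table, the `margin` default, and the function names are hypothetical, and the scorer again fakes audio samples as phoneme tuples.

```python
from difflib import SequenceMatcher
from statistics import mean
from typing import Callable, Dict, List, Sequence, Tuple

Pron = Tuple[str, ...]

# Hypothetical phoneme substitutions used to derive new candidates.
ALTERNATIVES: Dict[str, List[str]] = {"ey": ["aa"], "aa": ["ey"], "ow": ["ah"]}


def toy_score(candidate: Pron, sample: Pron) -> float:
    """Stand-in similarity; higher is closer. Not an acoustic score."""
    return SequenceMatcher(None, candidate, sample).ratio()


def aggregate(cands: Sequence[Pron], samples: Sequence[Pron],
              score_fn: Callable[[Pron, Pron], float]) -> Dict[Pron, float]:
    """Mean score per candidate; assumes at least one sample."""
    return {c: mean(score_fn(c, s) for s in samples) for c in cands}


def has_clear_winner(aggregated: Dict[Pron, float], margin: float) -> bool:
    """True if the best score beats the runner-up by at least `margin`."""
    ranked = sorted(aggregated.values(), reverse=True)
    return len(ranked) == 1 or ranked[0] - ranked[1] >= margin


def variants(pron: Pron) -> List[Pron]:
    """Single-substitution variants of a pronunciation."""
    out = []
    for i, phoneme in enumerate(pron):
        for alt in ALTERNATIVES.get(phoneme, []):
            out.append(pron[:i] + (alt,) + pron[i + 1:])
    return out


def refine_candidates(previous: Sequence[Pron], samples: Sequence[Pron],
                      score_fn: Callable[[Pron, Pron], float],
                      margin: float = 0.05) -> List[Pron]:
    """Keep `previous` if it has a clear winner, else expand around the best."""
    aggregated = aggregate(previous, samples, score_fn)
    if has_clear_winner(aggregated, margin):
        return list(previous)
    best = max(aggregated, key=aggregated.get)
    refined = [best]
    for v in variants(best):
        if v not in refined:
            refined.append(v)
    return refined
```

The refined set would then be scored and aggregated again as in claim 1. Repeating the step until a clear winner emerges is one plausible usage, though the claim itself describes a single round.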

9. A computer-readable medium storing software comprising instructions executable by one or more computers which, upon such execution, cause the one or more computers to perform operations comprising:
- obtaining audio samples of speech corresponding to a particular term;
  
  obtaining candidate pronunciations for the particular term;
  
  generating, for each candidate pronunciation for the particular term and audio sample of speech corresponding to the particular term, a score reflecting a level of similarity between the candidate pronunciation and the audio sample;
  
  aggregating the scores for each candidate pronunciation; and
  
  adding one or more candidate pronunciations for the particular term to a pronunciation lexicon based on the aggregated scores for the candidate pronunciations.
  - 10. The medium of claim 9, wherein adding one or more candidate pronunciations for the particular term comprises:
    - identifying a candidate pronunciation of the candidate pronunciations with an aggregated score that indicates a closer level of similarity between the candidate pronunciation and the audio samples than levels of similarity between the other candidate pronunciations and the audio samples; and
      
      adding the identified candidate expression to the pronunciation lexicon.
  - 11. The medium of claim 9, wherein adding one or more candidate pronunciations for the particular term comprises:
    - adding all candidate pronunciations and the aggregated scores for the candidate pronunciations to the pronunciation lexicon.
  - 12. The medium of claim 9, wherein obtaining the candidate pronunciations for the particular term comprises:
    - obtaining a previous set of candidate pronunciations for the particular term;
      
      generating, for each candidate pronunciation of the particular term in the previous set and audio sample of speech corresponding to the particular term, a score reflecting a level of similarity between the candidate pronunciation of the previous set and the audio sample;
      
      aggregating the scores for each candidate pronunciation of the previous set;
      
      determining that no aggregated score for a candidate pronunciation of the previous set reflects a level of similarity between the candidate pronunciation and the audio samples that is closer than levels of similarity between the other candidate pronunciations and the audio samples by a predetermined amount; and
      
      obtaining the candidate pronunciations based on the candidate pronunciation with the aggregated score that indicates a closer level of similarity between the audio samples and the other candidate pronunciations.
  - 13. The medium of claim 9, wherein obtaining audio samples comprises:
    - accessing query transcription logs;
      
      identifying the particular term in the query transcription logs; and
      
      identifying one or more portions of query audio logs corresponding to the identified particular term in the query transcription logs as the audio samples.
  - 14. The medium of claim 9, wherein obtaining audio samples comprises:
    - receiving audio samples of multiple different people speaking the particular term in response to a prompt to speak the particular term.
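
Mining audio samples from query logs, as in claims 6, 13, and 19, can be sketched roughly as below. The log layout is an assumption: the source does not say how transcription logs map onto audio, so this sketch supposes each transcribed query carries word-level start and end times. `WordAlignment`, `TranscribedQuery`, `find_term_segments`, and `slice_audio` are hypothetical names, and only single-word terms are handled.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class WordAlignment:
    word: str
    start: float  # seconds from the start of the query audio
    end: float


@dataclass
class TranscribedQuery:
    query_id: str
    words: List[WordAlignment]


def find_term_segments(
    term: str, transcripts: Sequence[TranscribedQuery]
) -> List[Tuple[str, float, float]]:
    """Return (query_id, start, end) for each occurrence of `term`."""
    needle = term.lower()
    segments = []
    for query in transcripts:
        for w in query.words:
            if w.word.lower() == needle:
                segments.append((query.query_id, w.start, w.end))
    return segments


def slice_audio(samples: Sequence[int], sample_rate: int,
                start: float, end: float) -> Sequence[int]:
    """Cut the portion of a waveform between `start` and `end` seconds."""
    return samples[round(start * sample_rate):round(end * sample_rate)]
```

Each returned segment identifies a portion of the corresponding query audio that can serve as one audio sample for the term.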

15. A system comprising:
- one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:
  
  obtaining audio samples of speech corresponding to a particular term;
  
  obtaining candidate pronunciations for the particular term;
  
  generating, for each candidate pronunciation for the particular term and audio sample of speech corresponding to the particular term, a score reflecting a level of similarity between the candidate pronunciation and the audio sample;
  
  aggregating the scores for each candidate pronunciation; and
  
  adding one or more candidate pronunciations for the particular term to a pronunciation lexicon based on the aggregated scores for the candidate pronunciations.
  - 16. The system of claim 15, wherein adding one or more candidate pronunciations for the particular term comprises:
    - identifying a candidate pronunciation of the candidate pronunciations with an aggregated score that indicates a closer level of similarity between the candidate pronunciation and the audio samples than levels of similarity between the other candidate pronunciations and the audio samples; and
      
      adding the identified candidate expression to the pronunciation lexicon.
  - 17. The system of claim 15, wherein adding one or more candidate pronunciations for the particular term comprises:
    - adding all candidate pronunciations and the aggregated scores for the candidate pronunciations to the pronunciation lexicon.
  - 18. The system of claim 15, wherein obtaining the candidate pronunciations for the particular term comprises:
    - obtaining a previous set of candidate pronunciations for the particular term;
      
      generating, for each candidate pronunciation of the particular term in the previous set and audio sample of speech corresponding to the particular term, a score reflecting a level of similarity between the candidate pronunciation of the previous set and the audio sample;
      
      aggregating the scores for each candidate pronunciation of the previous set;
      
      determining that no aggregated score for a candidate pronunciation of the previous set reflects a level of similarity between the candidate pronunciation and the audio samples that is closer than levels of similarity between the other candidate pronunciations and the audio samples by a predetermined amount; and
      
      obtaining the candidate pronunciations based on the candidate pronunciation with the aggregated score that indicates a closer level of similarity between the audio samples and the other candidate pronunciations.
  - 19. The system of claim 15, wherein obtaining audio samples comprises:
    - accessing query transcription logs;
      
      identifying the particular term in the query transcription logs; and
      
      identifying one or more portions of query audio logs corresponding to the identified particular term in the query transcription logs as the audio samples.
  - 20. The system of claim 15, wherein obtaining audio samples comprises:
    - receiving audio samples of multiple different people speaking the particular term in response to a prompt to speak the particular term.
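
The dependent claims describe two ways of updating the lexicon: adding only the candidate whose aggregated score indicates the closest similarity (claims 2, 10, 16), or adding every candidate together with its aggregated score (claims 3, 11, 17). A minimal sketch of both policies follows. It assumes a higher score means closer similarity, and the lexicon layout (a list of `(pronunciation, score)` pairs per term) is hypothetical, as is storing the score alongside the single best candidate.

```python
from typing import Dict, List, Tuple

Pron = Tuple[str, ...]
Lexicon = Dict[str, List[Tuple[Pron, float]]]


def add_best(lexicon: Lexicon, term: str, aggregated: Dict[Pron, float]) -> None:
    """Add only the candidate with the highest aggregated score."""
    best = max(aggregated, key=aggregated.get)
    lexicon.setdefault(term, []).append((best, aggregated[best]))


def add_all_with_scores(lexicon: Lexicon, term: str,
                        aggregated: Dict[Pron, float]) -> None:
    """Add every candidate with its aggregated score, best first."""
    ranked = sorted(aggregated.items(), key=lambda kv: kv[1], reverse=True)
    lexicon.setdefault(term, []).extend(ranked)
```

Keeping the scores leaves the choice among alternatives to whatever consumes the lexicon, at the cost of a larger entry per term.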

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Google Inc. (Alphabet Inc.)
Inventors: Peng, Fuchun; Beaufays, Francoise; Strope, Brian; Lei, Xin; Moreno Mengibar, Pedro J.; Strohman, Trevor D.
Granted Patent: US 9,741,339 B2
US Class Current: 704/254
CPC Class Codes:
G09B 17/006   with audible presentation o...
G10L 13/08   Text analysis or generation...
G10L 15/06   Creation of reference templ...
G10L 15/18   using natural language mode...
