Annotating maps with user-contributed pronunciations
Abstract
Systems and methods are provided to select a most typical pronunciation of a location name on a map from a plurality of user pronunciations. A server generates a reference speech model based on the user pronunciations, compares each user pronunciation with the speech model, and selects a pronunciation based on the comparison. Alternatively, the server measures the distance between each user pronunciation and every other user pronunciation and selects a pronunciation based on those distances. The server then annotates the map with the selected pronunciation and provides audio output of the location name to a user device upon a user's request.
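The abstract's alternative path (compare each user pronunciation against every other and keep the most typical one) amounts to medoid selection. Below is a minimal sketch of that step, assuming each pronunciation has already been reduced to a fixed-length feature vector; the patent leaves the actual representation and distance measure open, and `most_typical` and `euclidean` are illustrative names, not terms from the patent:

```python
from itertools import combinations

def euclidean(a, b):
    """Distance between two fixed-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def most_typical(pronunciations, dist=euclidean):
    """Return the index of the pronunciation whose average distance
    to every other pronunciation is lowest (the medoid)."""
    n = len(pronunciations)
    totals = [0.0] * n
    for i, j in combinations(range(n), 2):
        d = dist(pronunciations[i], pronunciations[j])
        totals[i] += d
        totals[j] += d
    # Average each total over the n - 1 other pronunciations.
    return min(range(n), key=lambda i: totals[i] / (n - 1))
```

With pronunciations `[[0, 0], [1, 1], [10, 10]]`, the middle one is selected: it is closest, on average, to the other two.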
25 Claims
1. A method of selecting a user spoken utterance, the method comprising:
receiving, at a processing device, a set of user spoken utterances of a text string, each spoken utterance being a pronunciation of the text string by a corresponding different user and comprising a location name or a point of interest;
generating, at the processing device, a speech model based on the text string and the set of received user spoken utterances from each corresponding user;
comparing the generated speech model to each pronunciation received in the set of user spoken utterances;
selecting a given one of the received user spoken utterances, as a most typical pronunciation of the text string, based on measured distance values between the speech model for the selected user spoken utterance and every other generated speech model, wherein selecting the given user spoken utterance based on the measured distance values includes identifying either:
a sequence of modeling states from one user spoken utterance having a shortest average distance from other sequences for all other user spoken utterances in the plurality of user spoken utterances, or
a sequence of modeling states from one user spoken utterance having a lowest average distance from other sequences for all other user spoken utterances in the plurality of user spoken utterances;
annotating a mapping application with the selected pronunciation of the selected spoken utterance; and
providing audio information of the selected user spoken utterance to a user device in response to selection of the location or point of interest by a user.
(View Dependent Claims: 2, 3, 4, 5, 6, 24, 25)
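Claim 1 measures distance values between speech models without fixing a metric. Dynamic time warping (DTW) is one standard way to compare acoustic sequences of unequal length, sketched here under the assumption that each utterance or model is a list of per-frame feature vectors; `dtw_distance` is an illustrative name, not a term from the claim:

```python
def dtw_distance(seq_a, seq_b):
    """Dynamic time warping distance between two sequences of
    per-frame feature vectors (lists of floats). Smaller means the
    two pronunciations are acoustically closer."""
    inf = float("inf")
    n, m = len(seq_a), len(seq_b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Local frame-to-frame distance (Euclidean).
            d = sum((x - y) ** 2 for x, y in zip(seq_a[i - 1], seq_b[j - 1])) ** 0.5
            # Extend the cheapest of the three allowed warping moves.
            cost[i][j] = d + min(cost[i - 1][j],      # insertion
                                 cost[i][j - 1],      # deletion
                                 cost[i - 1][j - 1])  # match
    return cost[n][m]
```

Identical sequences score zero, and the alignment tolerates differing lengths, which is why DTW is a common fit for comparing spoken renditions of the same text string.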
7. A method of selecting a user spoken utterance, comprising:
receiving, at a processing device, a plurality of user spoken utterances of a text string, each user spoken utterance being a pronunciation of the text string by a corresponding different user and comprising a location name or a point of interest;
the processing device generating a speech model for each received user spoken utterance from each corresponding user;
the processing device measuring a distance value between each of the generated speech models and every other generated speech model;
the processing device selecting a given one of the user spoken utterances, as a most typical pronunciation of the text string, based on the measured distance values between the speech model for the selected user spoken utterance and every other generated speech model, wherein selecting the given user spoken utterance based on the measured distance values includes identifying either:
a sequence of modeling states from one user spoken utterance having a shortest average distance from other sequences for all other user spoken utterances in the plurality of user spoken utterances, or
a sequence of modeling states from one user spoken utterance having a lowest average distance from other sequences for all other user spoken utterances in the plurality of user spoken utterances;
annotating a mapping application with the selected user spoken utterance; and
providing audible output of the text string based on the selected user spoken utterance in response to a user input.
(View Dependent Claims: 8, 9)
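Claim 7's selection step keeps the utterance whose sequence of modeling states has the lowest average distance to all the others. As an illustration only, if each utterance were reduced to a discrete state-label sequence (e.g. decoded HMM states), edit distance is one plausible stand-in for the unspecified distance measure; both function names are hypothetical:

```python
def edit_distance(a, b):
    """Levenshtein distance between two state-label sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution
        prev = cur
    return prev[-1]

def most_typical_sequence(sequences):
    """Index of the state sequence with the lowest average edit
    distance to every other sequence."""
    n = len(sequences)
    def avg(i):
        return sum(edit_distance(sequences[i], sequences[j])
                   for j in range(n) if j != i) / (n - 1)
    return min(range(n), key=avg)
```

For sequences `["AAB", "AAB", "AAB", "ABB"]` the majority rendition wins, since each copy of it is at distance zero from the other copies.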
10. A method of providing audible output of a location name or a point of interest on an electronically generated map, the method comprising:
receiving, at a processing device, a plurality of user spoken utterances of the location name or point of interest, each spoken utterance being a pronunciation of the location name or point of interest by a corresponding different user;
the processing device selecting one of the user spoken utterances, as a most typical pronunciation of the text string, from the plurality of user spoken utterances based on acoustic features of the pronunciations of each of the plurality of user spoken utterances, the selection based on the acoustic features of the pronunciations including one or more of:
a sequence of modeling states from one user spoken utterance having a shortest average distance from other sequences for all other user spoken utterances in the plurality of user spoken utterances, or
a sequence of modeling states from one user spoken utterance having a lowest average distance from other sequences for all other user spoken utterances in the plurality of user spoken utterances;
annotating the electronically generated map with the selected user spoken utterance; and
providing audio output of the location name or point of interest to a user device based on the selected user spoken utterance.
(View Dependent Claims: 11, 12, 13, 14, 15, 16)
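Claim 10 selects on "acoustic features of the pronunciations" without naming the features. As a toy stand-in for the richer features a real system would compute (e.g. MFCCs), the sketch below turns raw audio samples into per-frame log-energy values; `frame_features` and its frame/hop parameters are assumptions for illustration:

```python
import math

def frame_features(samples, frame_len=400, hop=160):
    """Reduce a raw sample stream to one log-energy value per frame,
    a minimal acoustic-feature sequence for distance comparisons.
    Defaults correspond to 25 ms frames / 10 ms hop at 16 kHz."""
    feats = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        energy = sum(s * s for s in frame) / frame_len
        feats.append(math.log(energy + 1e-10))  # floor avoids log(0)
    return feats
```

The resulting per-frame sequence is exactly the kind of input the distance-based selection in the earlier claims operates on.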
17. A server apparatus for providing audible data to a user device on a network, the server apparatus comprising:
a memory for storing information including map information; and
a processor operatively coupled to the memory and being configured to:
receive a plurality of user spoken utterances of a location name or point of interest, each spoken utterance being a pronunciation of the location name or point of interest by a corresponding different user;
select one of the user spoken utterances from the plurality of user spoken utterances, as a most typical pronunciation of the text string, based on acoustic features of the pronunciations of each of the plurality of user spoken utterances, the selection based on the acoustic features of the pronunciations including one or more of:
a sequence of modeling states from one user spoken utterance having a shortest average distance from other sequences for all other user spoken utterances in the plurality of user spoken utterances, or
a sequence of modeling states from one user spoken utterance having a lowest average distance from other sequences for all other user spoken utterances in the plurality of user spoken utterances;
annotate a computer generated map with the selected user spoken utterance; and
provide audio output of the location name or point of interest to the user device based on the selected user spoken utterance.
(View Dependent Claims: 18, 19, 20, 21, 22, 23)
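The server of claim 17 can be sketched as an in-memory pipeline: collect per-location submissions, re-select the most typical pronunciation after each submission, and serve the winner's audio when a user picks the location on the map. Everything below (the class name, the fixed-length feature vectors, the Euclidean distance) is an assumption for illustration; the claim fixes none of it:

```python
class PronunciationServer:
    """Toy model of the claimed server. Each submission pairs the
    user's audio with a precomputed feature vector for that audio."""

    def __init__(self):
        self._submissions = {}  # location -> list of (audio, features)
        self._annotations = {}  # location -> currently selected audio

    def submit(self, location, audio, features):
        """Record a user pronunciation and refresh the annotation."""
        self._submissions.setdefault(location, []).append((audio, features))
        self._annotations[location] = self._select(location)

    def _select(self, location):
        """Audio of the medoid: lowest average distance to the rest."""
        subs = self._submissions[location]
        if len(subs) == 1:
            return subs[0][0]
        def dist(a, b):
            return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
        def avg_dist(i):
            return sum(dist(subs[i][1], s[1])
                       for j, s in enumerate(subs) if j != i) / (len(subs) - 1)
        best = min(range(len(subs)), key=avg_dist)
        return subs[best][0]

    def audio_for(self, location):
        """Audio returned when a user selects the location on the map."""
        return self._annotations.get(location)
```

A caller submits several renditions of "Lake Geneva" and then queries `audio_for("Lake Geneva")`, which returns the audio of whichever submission sits closest, on average, to the others.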
Specification