Annotating maps with user-contributed pronunciations

US 9,672,816 B1
Filed: 01/30/2015
Issued: 06/06/2017
Est. Priority Date: 06/16/2010
Status: Active Grant

Abstract

Systems and methods are provided to select a most typical pronunciation of a location name on a map from a plurality of user pronunciations. A server generates a reference speech model based on user pronunciations, compares the user pronunciations with the speech model, and selects a pronunciation based on the comparison. Alternatively, the server compares the distance between one of the user pronunciations and every other user pronunciation and selects a pronunciation based on the comparison. The server then annotates the map with the selected pronunciation and provides the audio output of the location name to a user device upon a user's request.
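The distance-based alternative in the abstract can be illustrated with a small sketch. The abstract does not say how the distance is computed or how a pronunciation is represented, so everything below is an assumption: each pronunciation is taken to be a sequence of acoustic feature vectors, the distance is dynamic time warping (DTW) with a Euclidean frame cost, and the selected pronunciation is the one with the smallest total distance to all others. The names `dtw_distance` and `select_most_typical` are hypothetical.

```python
import math
from typing import List, Sequence

Frame = Sequence[float]      # one feature vector (assumed representation)
Utterance = Sequence[Frame]  # one user pronunciation as a frame sequence


def frame_distance(a: Frame, b: Frame) -> float:
    """Euclidean distance between two frames of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(u: Utterance, v: Utterance) -> float:
    """Dynamic time warping cost between two non-empty utterances."""
    n, m = len(u), len(v)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(u[i - 1], v[j - 1])
            # Extend the cheapest of the three allowed predecessor paths.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def select_most_typical(utterances: List[Utterance]) -> int:
    """Return the index of the utterance closest, in total, to all others."""
    if not utterances:
        raise ValueError("at least one utterance is required")
    totals = [
        sum(dtw_distance(u, v) for j, v in enumerate(utterances) if j != i)
        for i, u in enumerate(utterances)
    ]
    return min(range(len(utterances)), key=totals.__getitem__)
```

With three single-frame utterances at 0.0, 1.0 and 5.0, the middle one has the smallest summed distance and would be selected. The pairwise comparison is quadratic in the number of pronunciations, which is one reason a model-based comparison may be preferable for popular locations.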

20 Claims

1. A method comprising:
- receiving, by one or more processors, a text string for a location or point of interest;
  
  receiving, by the one or more processors, a plurality of speech signals, each speech signal in the plurality of speech signals comprising a user pronunciation of the text string for the location or point of interest;
  
  adapting, by the one or more processors, a phoneme-based speech model based on the text string with the received plurality of speech signals;
  
  determining, by the one or more processors, a score for each of the received plurality of speech signals based on a similarity of each speech signal in the plurality of speech signals with the adapted phoneme-based speech model;
  
  the one or more processors selecting one of the plurality of speech signals as a most common pronunciation of the text string based on the determined scores;
  
  annotating, by the one or more processors, an electronic map including the location or point of interest with the most common pronunciation of the text string; and
  
  providing, by the one or more processors, audio information of the most common pronunciation to a given client device for a user of the given client device to hear the most common pronunciation.
- Dependent claims (2, 3, 4, 5, 6, 7, 19, 20):
  - 2. The method of claim 1, wherein the audio information is provided in response to a selection of the location or point of interest is related to an image or other object indicating with the location or point of interest associated with the electronic map.
  - 3. The method of claim 1, further comprising selecting one or more additional ones of the plurality of speech signals as variant pronunciations of the text string.
  - 4. The method of claim 3, further comprising annotating the electronic map with the one or more variant pronunciations of the text string.
  - 5. The method of claim 3, further comprising providing information to a given client device for a user of the given client device to hear the most common pronunciation or the one or more variant pronunciations.
  - 6. The method of claim 1, wherein the audio information is provided upon any of:
    - selection of an image of the location or point of interest,
    - selection of a name of the location or point of interest, or
    - typing in text indicating the name of the location or point of interest.
  - 7. The method of claim 1, wherein audio information of the most common pronunciation is provided to a given client device for presentation to a user independent of a visual form of the location or point of interest displayed on the given client device.
  - 19. The method of claim 1, further comprising, prior to adapting the phoneme-based speech model, normalizing each user pronunciation and reducing a vocal tract length effect of the user pronunciation.
  - 20. The method of claim 1, wherein determining the score for each of the received plurality of speech signals includes, for each speech signal, determining at least one of a phoneme log-likelihood or a phoneme log-posterior probability score.
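The adapt, score, and select steps of claim 1, together with the log-likelihood score named in claim 20, can be sketched as follows. This is a toy illustration under stated assumptions, not the patented implementation: each phoneme is modeled by a one-dimensional Gaussian, frames are assumed to be already aligned to the phonemes of the text string, and "adapting" is reduced to interpolating each phoneme mean toward the mean of the observed frames. The names `adapt_model`, `log_likelihood`, `select_most_common`, and the `weight` parameter are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumed representation: frames already aligned to phonemes of the text string.
AlignedUtterance = List[Tuple[str, float]]  # (phoneme, scalar feature)
Model = Dict[str, Tuple[float, float]]      # phoneme -> (mean, variance)


def adapt_model(base: Model, utterances: List[AlignedUtterance],
                weight: float = 0.5) -> Model:
    """Move each phoneme mean toward the mean of the observed frames.

    Every observed phoneme is assumed to exist in the base model.
    """
    observed = defaultdict(list)
    for utt in utterances:
        for phoneme, x in utt:
            observed[phoneme].append(x)
    adapted = dict(base)
    for phoneme, xs in observed.items():
        mean, var = base[phoneme]
        sample_mean = sum(xs) / len(xs)
        adapted[phoneme] = ((1 - weight) * mean + weight * sample_mean, var)
    return adapted


def log_likelihood(model: Model, utt: AlignedUtterance) -> float:
    """Average per-frame Gaussian log-likelihood of one utterance."""
    if not utt:
        raise ValueError("utterance has no frames")
    total = 0.0
    for phoneme, x in utt:
        mean, var = model[phoneme]
        total += -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)
    return total / len(utt)


def select_most_common(base: Model,
                       utterances: List[AlignedUtterance]) -> Tuple[int, List[float]]:
    """Adapt the model, score every utterance, return the best index and scores."""
    model = adapt_model(base, utterances)
    scores = [log_likelihood(model, u) for u in utterances]
    best = max(range(len(utterances)), key=scores.__getitem__)
    return best, scores
```

Averaging over frames keeps long and short recordings comparable. The normalization step of claim 19 (reducing the vocal tract length effect) would run before `adapt_model` and is not modeled here.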

8. A server apparatus for providing audible data to user devices on a network, the server apparatus comprising:
- a memory storing map information of a location or point of interest associated with an electronic map; and
  
  one or more processors operatively coupled to the memory and being configured to:
  
  receive a text string for the location or point of interest;
  
  receive a plurality of speech signals, each speech signal in the plurality of speech signals comprising a user pronunciation of the text string for the location or point of interest;
  
  adapt a phoneme-based speech model based on the text string with the received plurality of speech signals;
  
  determine a score for each of the received plurality of speech signals based on a similarity of each speech signal in the plurality of speech signals with the adapted phoneme-based speech model;
  
  select one of the plurality of speech signals as a most common pronunciation of the text string based on the determined score;
  
  annotate the electronic map with the most common pronunciation of the text string; and
  
  provide audio information of the most common pronunciation to a given user device for a user of the given user device to hear the most common pronunciation.
- Dependent claims (9, 10, 11, 12, 13, 14):
  - 9. The server apparatus of claim 8, wherein the one or more processors are configured to provide the most common pronunciation of the text string to the given user device in response to a selection of the location or point of interest.
  - 10. The server apparatus of claim 8, wherein the one or more processors are further configured to select one or more additional ones of the plurality of speech signals as variant pronunciations of the text string.
  - 11. The server apparatus of claim 10, wherein the one or more processors are further configured to annotate the electronic map with the one or more variant pronunciations of the text string.
  - 12. The server apparatus of claim 10, wherein the one or more processors are further configured to provide information to a given user device for a user of the given user device to hear at least one of the one or more variant pronunciations.
  - 13. The server apparatus of claim 8, wherein the audio information is provided upon any of:
    - selection of an image of the location or point of interest,
    - selection of a name of the location or point of interest, or
    - typing in text indicating the name of the location or point of interest.
  - 14. The server apparatus of claim 8, wherein the most common pronunciation is provided to a given user device for presentation to a user independent of a visual form of the location or point of interest displayed on the given user device.
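The annotate and provide steps of claims 8 through 13 can be pictured as a small lookup structure on the server side. The sketch below is only one possible shape, under assumptions not found in the claims: audio is held as raw bytes, a location is keyed by an opaque identifier, and typed text is matched case-insensitively against the stored name. The class and method names (`AnnotatedMap`, `audio_for_selection`, `audio_for_typed_name`) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional


@dataclass
class PlaceAnnotation:
    name: str                    # text string for the location or point of interest
    most_common: bytes           # audio of the selected pronunciation
    variants: List[bytes] = field(default_factory=list)  # optional variants


class AnnotatedMap:
    """In-memory stand-in for map data annotated with pronunciations."""

    def __init__(self) -> None:
        self._by_id: Dict[str, PlaceAnnotation] = {}

    def annotate(self, place_id: str, name: str, most_common: bytes,
                 variants: Iterable[bytes] = ()) -> None:
        """Attach the selected pronunciation (and any variants) to a place."""
        self._by_id[place_id] = PlaceAnnotation(name, most_common, list(variants))

    def audio_for_selection(self, place_id: str) -> Optional[bytes]:
        """Audio to return when a user selects the place's image or name."""
        ann = self._by_id.get(place_id)
        return ann.most_common if ann else None

    def audio_for_typed_name(self, text: str) -> Optional[bytes]:
        """Audio to return when a user types the place's name."""
        wanted = text.strip().casefold()
        for ann in self._by_id.values():
            if ann.name.casefold() == wanted:
                return ann.most_common
        return None
```

For example, after `annotate("poi-1", "Worcester", audio)`, both `audio_for_selection("poi-1")` and `audio_for_typed_name(" worcester ")` return the stored audio, while an unknown identifier returns `None`.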

15. A non-transitory computer-readable recording medium storing instructions thereon, the instructions, when executed by one or more processors, cause the one or more processors to perform a method of:
- receiving a text string for a location or point of interest;
  
  receiving a plurality of speech signals, each speech signal of the plurality of speech signals comprising a user pronunciation of the text string for the location or point of interest;
  
  adapting a phoneme-based speech model based on the text string with the received plurality of speech signals;
  
  determining a score for each of the received plurality of speech signals based on a similarity of each speech signal in the plurality of speech signals with the adapted phoneme-based speech model;
  
  selecting one of the plurality of speech signals as a most common pronunciation of the text string based on the determined score;
  
  annotating the electronic map with the most common pronunciation of the text string; and
  
  providing audio information of the most common pronunciation to a given client device for a user of the given client device to hear the most common pronunciation.
- Dependent claims (16, 17, 18):
  - 16. The non-transitory computer-readable recording medium of claim 15, wherein the method further comprises selecting one or more additional ones of the plurality of speech signals as variant pronunciations of the text string.
  - 17. The non-transitory computer-readable recording medium of claim 16, wherein the method further comprises annotating the electronic map with the one or more variant pronunciations of the text string.
  - 18. The non-transitory computer-readable recording medium of claim 16, wherein the method further comprises providing information to a given client device for a user of the given client device to hear the one or more variant pronunciations.

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Google Inc. (Alphabet Inc.)
Inventors: Chechik, Gal
Primary Examiner(s): Desir, Pierre-Louis
Assistant Examiner(s): Sharma, Neeraj
Application Number: US14/609,858
Time in Patent Office: 858 Days
Field of Search: 704/243, 704/2, 704/260, 704/221, 704/251, 704/9, 704/236, 704/246, 704/200, 704/254, 704/267, 704/231, 704/277, 701/532, 379/88.16
US Class Current
CPC Class Codes

G10L 13/02   Methods for producing synth...

G10L 13/06   Elementary speech units use...

G10L 15/08   Speech classification or se...

G10L 2015/085   Methods for reducing search...
