Annotating maps with user-contributed pronunciations
Abstract
Systems and methods are provided to select a most typical pronunciation of a location name on a map from a plurality of user pronunciations. A server generates a reference speech model based on user pronunciations, compares the user pronunciations with the speech model, and selects a pronunciation based on the comparison. Alternatively, the server compares the distance between one of the user pronunciations and every other user pronunciation and selects a pronunciation based on the comparison. The server then annotates the map with the selected pronunciation and provides the audio output of the location name to a user device upon a user's request.
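The distance-based alternative in the abstract can be illustrated with a minimal sketch. It assumes each user pronunciation has already been converted to a sequence of numeric feature frames, and it uses dynamic time warping as the distance measure; both choices are assumptions for illustration, since the abstract does not name a feature type or a distance. The names `dtw_distance` and `select_most_typical` are hypothetical.

```python
from typing import List, Sequence

Frame = Sequence[float]      # one acoustic feature vector (assumed representation)
Utterance = Sequence[Frame]  # one user pronunciation as a sequence of frames


def frame_distance(a: Frame, b: Frame) -> float:
    """Euclidean distance between two feature frames."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def dtw_distance(u: Utterance, v: Utterance) -> float:
    """Dynamic time warping cost between two utterances of possibly different length."""
    inf = float("inf")
    # cost[i][j] holds the best alignment cost of u[:i] against v[:j].
    cost = [[inf] * (len(v) + 1) for _ in range(len(u) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(u) + 1):
        for j in range(1, len(v) + 1):
            d = frame_distance(u[i - 1], v[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[len(u)][len(v)]


def select_most_typical(utterances: List[Utterance]) -> int:
    """Return the index of the utterance with the smallest total distance to all others."""
    if not utterances:
        raise ValueError("at least one utterance is required")
    totals = [
        sum(dtw_distance(u, v) for k, v in enumerate(utterances) if k != i)
        for i, u in enumerate(utterances)
    ]
    return min(range(len(utterances)), key=totals.__getitem__)
```

Under these assumptions the selected pronunciation is the medoid of the set: the recording that is, in total, closest to all the other recordings.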
20 Claims
1. A method comprising:
receiving, by one or more processors, a text string for a location or point of interest;
receiving, by the one or more processors, a plurality of speech signals, each speech signal in the plurality of speech signals comprising a user pronunciation of the text string for the location or point of interest;
adapting, by the one or more processors, a phoneme-based speech model based on the text string with the received plurality of speech signals;
determining, by the one or more processors, a score for each of the received plurality of speech signals based on a similarity of each speech signal in the plurality of speech signals with the adapted phoneme-based speech model;
the one or more processors selecting one of the plurality of speech signals as a most common pronunciation of the text string based on the determined scores;
annotating, by the one or more processors, an electronic map including the location or point of interest with the most common pronunciation of the text string; and
providing, by the one or more processors, audio information of the most common pronunciation to a given client device for a user of the given client device to hear the most common pronunciation.
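The adapt, score, and select steps of claim 1 can be sketched in a heavily simplified form. The sketch below is not the claimed phoneme-based model: it assumes each speech signal has been reduced to a fixed-length feature vector, stands in a single Gaussian mean for the model derived from the text string, and uses a MAP-style mean update as the adaptation. The names `adapt_mean`, `score`, `select_pronunciation`, and the relevance factor `tau` are all hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]  # assumed fixed-length summary of one speech signal


def adapt_mean(prior_mean: Vector, samples: List[Vector], tau: float = 2.0) -> List[float]:
    """MAP-style update that pulls the prior mean toward the mean of the samples.

    tau is a hypothetical relevance factor; larger values trust the prior more.
    """
    n = len(samples)
    return [
        (tau * p + sum(s[d] for s in samples)) / (tau + n)
        for d, p in enumerate(prior_mean)
    ]


def score(sample: Vector, model_mean: Vector) -> float:
    """Unit-variance Gaussian log-likelihood up to a constant; higher means more similar."""
    return -0.5 * sum((x - m) ** 2 for x, m in zip(sample, model_mean))


def select_pronunciation(prior_mean: Vector, samples: List[Vector]) -> Tuple[int, List[float]]:
    """Adapt the model with all samples, score each sample, return (best index, scores)."""
    if not samples:
        raise ValueError("at least one speech signal is required")
    adapted = adapt_mean(prior_mean, samples)
    scores = [score(s, adapted) for s in samples]
    best = max(range(len(samples)), key=scores.__getitem__)
    return best, scores
```

Because the model is adapted with every received signal, a recording that sits near the bulk of the other recordings scores highest, which is one way to read "most common pronunciation" in the claim.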
8. A server apparatus for providing audible data to user devices on a network, the server apparatus comprising:
a memory storing map information of a location or point of interest associated with an electronic map; and
one or more processors operatively coupled to the memory and being configured to:
receive a text string for the location or point of interest;
receive a plurality of speech signals, each speech signal in the plurality of speech signals comprising a user pronunciation of the text string for the location or point of interest;
adapt a phoneme-based speech model based on the text string with the received plurality of speech signals;
determine a score for each of the received plurality of speech signals based on a similarity of each speech signal in the plurality of speech signals with the adapted phoneme-based speech model;
select one of the plurality of speech signals as a most common pronunciation of the text string based on the determined score;
annotate the electronic map with the most common pronunciation of the text string; and
provide audio information of the most common pronunciation to a given user device for a user of the given user device to hear the most common pronunciation.
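The memory, annotate, and provide elements of claim 8 can be pictured as a small in-memory store. This is only an illustrative sketch: the classes `PointOfInterest` and `MapStore`, the string place identifier, and the use of raw bytes for audio are all assumptions, and a real server would sit behind a network interface that the sketch omits.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class PointOfInterest:
    name: str                              # the text string for the location
    pronunciation: Optional[bytes] = None  # selected audio, once annotated


@dataclass
class MapStore:
    """Hypothetical stand-in for the memory storing map information."""
    places: Dict[str, PointOfInterest] = field(default_factory=dict)

    def annotate(self, place_id: str, name: str, audio: bytes) -> None:
        """Attach the selected pronunciation to a location, creating the entry if needed."""
        poi = self.places.setdefault(place_id, PointOfInterest(name=name))
        poi.pronunciation = audio

    def audio_for(self, place_id: str) -> Optional[bytes]:
        """Return the audio to send to a user device, or None if nothing is annotated."""
        poi = self.places.get(place_id)
        return poi.pronunciation if poi else None
```

A request handler would call `audio_for` with the identifier of the location the user selected and return the bytes to the device for playback.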
15. A non-transitory computer-readable recording medium storing instructions thereon, the instructions, when executed by one or more processors, cause the one or more processors to perform a method of:
receiving a text string for a location or point of interest;
receiving a plurality of speech signals, each speech signal of the plurality of speech signals comprising a user pronunciation of the text string for the location or point of interest;
adapting a phoneme-based speech model based on the text string with the received plurality of speech signals;
determining a score for each of the received plurality of speech signals based on a similarity of each speech signal in the plurality of speech signals with the adapted phoneme-based speech model;
selecting one of the plurality of speech signals as a most common pronunciation of the text string based on the determined score;
annotating the electronic map with the most common pronunciation of the text string; and
providing audio information of the most common pronunciation to a given client device for a user of the given client device to hear the most common pronunciation.
Specification