Annotating maps with user-contributed pronunciations
Abstract
Systems and methods are provided to select a most typical pronunciation of a location name on a map from a plurality of user pronunciations. A server generates a reference speech model based on the user pronunciations, compares each user pronunciation with the speech model, and selects a pronunciation based on the comparison. Alternatively, the server measures the distance between each user pronunciation and every other user pronunciation and selects a pronunciation based on those distances. The server then annotates the map with the selected pronunciation and provides audio output of the location name to a user device upon a user's request.
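The abstract's alternative path (compare each user pronunciation against every other and keep the most typical one) amounts to medoid selection. Below is a minimal sketch of that step, assuming each pronunciation has already been reduced to a fixed-length feature vector; the patent leaves the actual representation and distance measure open, and `most_typical` and `euclidean` are illustrative names, not terms from the patent:

```python
from itertools import combinations

def euclidean(a, b):
    """Distance between two fixed-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def most_typical(pronunciations, dist=euclidean):
    """Return the index of the pronunciation whose average distance
    to every other pronunciation is lowest (the medoid)."""
    n = len(pronunciations)
    totals = [0.0] * n
    for i, j in combinations(range(n), 2):
        d = dist(pronunciations[i], pronunciations[j])
        totals[i] += d
        totals[j] += d
    # Average each total over the n - 1 other pronunciations.
    return min(range(n), key=lambda i: totals[i] / (n - 1))
```

With pronunciations `[[0, 0], [1, 1], [10, 10]]`, the middle one is selected: it is closest, on average, to the other two.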
25 Claims
1. A method of selecting a user spoken utterance, the method comprising:
receiving, at a processing device, a set of user spoken utterances of a text string, each spoken utterance being a pronunciation of the text string by a corresponding different user and comprising a location name or a point of interest;
generating, at the processing device, a speech model based on the text string and the set of received user spoken utterances from each corresponding user;
comparing the generated speech model to each pronunciation received in the set of user spoken utterances;
selecting a given one of the received user spoken utterances, as a most typical pronunciation of the text string, based on measured distance values between the speech model for the selected user spoken utterance and every other generated speech model, wherein selecting the given user spoken utterance based on the measured distance values includes identifying either:
a sequence of modeling states from one user spoken utterance having a shortest average distance from other sequences for all other user spoken utterances in the plurality of user spoken utterances, or
a sequence of modeling states from one user spoken utterance having a lowest average distance from other sequences for all other user spoken utterances in the plurality of user spoken utterances;
annotating a mapping application with the selected pronunciation of the selected spoken utterance; and
providing audio information of the selected user spoken utterance to a user device in response to selection of the location or point of interest by a user.
(View Dependent Claims: 2, 3, 4, 5, 6, 24, 25)
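Claim 1 measures distance values between speech models without fixing a metric. Dynamic time warping (DTW) is one standard way to compare acoustic sequences of unequal length, sketched here under the assumption that each utterance or model is a list of per-frame feature vectors; `dtw_distance` is an illustrative name, not a term from the claim:

```python
def dtw_distance(seq_a, seq_b):
    """Dynamic time warping distance between two sequences of
    per-frame feature vectors (lists of floats). Smaller means the
    two pronunciations are acoustically closer."""
    inf = float("inf")
    n, m = len(seq_a), len(seq_b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Local frame-to-frame distance (Euclidean).
            d = sum((x - y) ** 2 for x, y in zip(seq_a[i - 1], seq_b[j - 1])) ** 0.5
            # Extend the cheapest of the three allowed warping moves.
            cost[i][j] = d + min(cost[i - 1][j],      # insertion
                                 cost[i][j - 1],      # deletion
                                 cost[i - 1][j - 1])  # match
    return cost[n][m]
```

Identical sequences score zero, and the alignment tolerates differing lengths, which is why DTW is a common fit for comparing spoken renditions of the same text string.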
7. A method of selecting a user spoken utterance, comprising:
receiving, at a processing device, a plurality of user spoken utterances of a text string, each user spoken utterance being a pronunciation of the text string by a corresponding different user and comprising a location name or a point of interest;
the processing device generating a speech model for each received user spoken utterance from each corresponding user;
the processing device measuring a distance value between each of the generated speech models and every other generated speech model;
the processing device selecting a given one of the user spoken utterances, as a most typical pronunciation of the text string, based on the measured distance values between the speech model for the selected user spoken utterance and every other generated speech model, wherein selecting the given user spoken utterance based on the measured distance values includes identifying either:
a sequence of modeling states from one user spoken utterance having a shortest average distance from other sequences for all other user spoken utterances in the plurality of user spoken utterances, or
a sequence of modeling states from one user spoken utterance having a lowest average distance from other sequences for all other user spoken utterances in the plurality of user spoken utterances;
annotating a mapping application with the selected user spoken utterance; and
providing audible output of the text string based on the selected user spoken utterance in response to a user input.
(View Dependent Claims: 8, 9)
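Claim 7's selection step keeps the utterance whose sequence of modeling states has the lowest average distance to all the others. As an illustration only, if each utterance were reduced to a discrete state-label sequence (e.g. decoded HMM states), edit distance is one plausible stand-in for the unspecified distance measure; both function names are hypothetical:

```python
def edit_distance(a, b):
    """Levenshtein distance between two state-label sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution
        prev = cur
    return prev[-1]

def most_typical_sequence(sequences):
    """Index of the state sequence with the lowest average edit
    distance to every other sequence."""
    n = len(sequences)
    def avg(i):
        return sum(edit_distance(sequences[i], sequences[j])
                   for j in range(n) if j != i) / (n - 1)
    return min(range(n), key=avg)
```

For sequences `["AAB", "AAB", "AAB", "ABB"]` the majority rendition wins, since each copy of it is at distance zero from the other copies.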
10. A method of providing audible output of a location name or a point of interest on an electronically generated map, the method comprising:
receiving, at a processing device, a plurality of user spoken utterances of the location name or point of interest, each spoken utterance being a pronunciation of the location name or point of interest by a corresponding different user;
the processing device selecting one of the user spoken utterances, as a most typical pronunciation of the text string, from the plurality of user spoken utterances based on acoustic features of the pronunciations of each of the plurality of user spoken utterances, the selection based on the acoustic features of the pronunciations including one or more of:
a sequence of modeling states from one user spoken utterance having a shortest average distance from other sequences for all other user spoken utterances in the plurality of user spoken utterances, or
a sequence of modeling states from one user spoken utterance having a lowest average distance from other sequences for all other user spoken utterances in the plurality of user spoken utterances;
annotating the electronically generated map with the selected user spoken utterance; and
providing audio output of the location name or point of interest to a user device based on the selected user spoken utterance.
(View Dependent Claims: 11, 12, 13, 14, 15, 16)
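Claim 10 selects on "acoustic features of the pronunciations" without naming the features. As a toy stand-in for the richer features a real system would compute (e.g. MFCCs), the sketch below turns raw audio samples into per-frame log-energy values; `frame_features` and its frame/hop parameters are assumptions for illustration:

```python
import math

def frame_features(samples, frame_len=400, hop=160):
    """Reduce a raw sample stream to one log-energy value per frame,
    a minimal acoustic-feature sequence for distance comparisons.
    Defaults correspond to 25 ms frames / 10 ms hop at 16 kHz."""
    feats = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        energy = sum(s * s for s in frame) / frame_len
        feats.append(math.log(energy + 1e-10))  # floor avoids log(0)
    return feats
```

The resulting per-frame sequence is exactly the kind of input the distance-based selection in the earlier claims operates on.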
17. A server apparatus for providing audible data to a user device on a network, the server apparatus comprising:
a memory for storing information including map information; and
a processor operatively coupled to the memory and being configured to:
receive a plurality of user spoken utterances of a location name or point of interest, each spoken utterance being a pronunciation of the location name or point of interest by a corresponding different user;
select one of the user spoken utterances from the plurality of user spoken utterances, as a most typical pronunciation of the text string, based on acoustic features of the pronunciations of each of the plurality of user spoken utterances, the selection based on the acoustic features of the pronunciations including one or more of:
a sequence of modeling states from one user spoken utterance having a shortest average distance from other sequences for all other user spoken utterances in the plurality of user spoken utterances, or
a sequence of modeling states from one user spoken utterance having a lowest average distance from other sequences for all other user spoken utterances in the plurality of user spoken utterances;
annotate a computer generated map with the selected user spoken utterance; and
provide audio output of the location name or point of interest to the user device based on the selected user spoken utterance.
(View Dependent Claims: 18, 19, 20, 21, 22, 23)
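The server of claim 17 can be sketched as an in-memory pipeline: collect per-location submissions, re-select the most typical pronunciation after each submission, and serve the winner's audio when a user picks the location on the map. Everything below (the class name, the fixed-length feature vectors, the Euclidean distance) is an assumption for illustration; the claim fixes none of it:

```python
class PronunciationServer:
    """Toy model of the claimed server. Each submission pairs the
    user's audio with a precomputed feature vector for that audio."""

    def __init__(self):
        self._submissions = {}  # location -> list of (audio, features)
        self._annotations = {}  # location -> currently selected audio

    def submit(self, location, audio, features):
        """Record a user pronunciation and refresh the annotation."""
        self._submissions.setdefault(location, []).append((audio, features))
        self._annotations[location] = self._select(location)

    def _select(self, location):
        """Audio of the medoid: lowest average distance to the rest."""
        subs = self._submissions[location]
        if len(subs) == 1:
            return subs[0][0]
        def dist(a, b):
            return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
        def avg_dist(i):
            return sum(dist(subs[i][1], s[1])
                       for j, s in enumerate(subs) if j != i) / (len(subs) - 1)
        best = min(range(len(subs)), key=avg_dist)
        return subs[best][0]

    def audio_for(self, location):
        """Audio returned when a user selects the location on the map."""
        return self._annotations.get(location)
```

A caller submits several renditions of "Lake Geneva" and then queries `audio_for("Lake Geneva")`, which returns the audio of whichever submission sits closest, on average, to the others.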
Specification