DISAMBIGUATING HETERONYMS IN SPEECH SYNTHESIS
First Claim
1. A method for operating an intelligent automated assistant, the method comprising:
at an electronic device with a processor and memory storing one or more programs for execution by the processor:
receiving, from a user, a speech input containing a heteronym and one or more additional words;
processing the speech input using an automatic speech recognition system to determine at least one of:
a phonemic string corresponding to the heteronym as pronounced by the user in the speech input; and
a frequency of occurrence of an n-gram with respect to a corpus, wherein the n-gram includes the heteronym and the one or more additional words;
determining a correct pronunciation of the heteronym based on at least one of the phonemic string and the frequency of occurrence of the n-gram;
generating a dialogue response to the speech input, wherein the dialogue response includes the heteronym; and
outputting the dialogue response as a speech output, wherein the heteronym in the dialogue response is pronounced in the speech output according to the determined correct pronunciation.
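Claim 1 does not say how the phonemic string and the n-gram frequency are turned into a pronunciation. The Python sketch below shows one hypothetical way to do it and is not the patented implementation. It assumes a small pronunciation lexicon (`LEXICON`) and corpus counts keyed by n-gram and pronunciation label (`NGRAM_COUNTS`), both invented for illustration. It first matches the user's own phonemes against the candidate pronunciations by edit distance. If that match is inconclusive, it falls back to the pronunciation that the surrounding n-grams most often carry in the corpus.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical lexicon: heteronym -> {pronunciation label: ARPAbet-style phonemes}.
LEXICON: Dict[str, Dict[str, List[str]]] = {
    "live": {"verb": ["L", "IH", "V"], "adjective": ["L", "AY", "V"]},
    "read": {"present": ["R", "IY", "D"], "past": ["R", "EH", "D"]},
}

# Hypothetical corpus statistics: how often each n-gram was observed
# with each pronunciation label of the heteronym it contains.
NGRAM_COUNTS: Dict[Tuple[str, ...], Dict[str, int]] = {
    ("live", "concert"): {"adjective": 120, "verb": 3},
    ("i", "live", "in"): {"verb": 95, "adjective": 1},
}


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # delete pa
                            curr[j - 1] + 1,             # insert pb
                            prev[j - 1] + (pa != pb)))   # substitute
        prev = curr
    return prev[-1]


def label_from_phonemes(word: str, heard: Sequence[str], max_dist: int = 1) -> Optional[str]:
    """Pick the lexicon pronunciation closest to what the user actually said.

    Returns None when no candidate is close enough or the best match is tied.
    """
    ranked = sorted((edit_distance(heard, phones), label)
                    for label, phones in LEXICON[word].items())
    if ranked[0][0] > max_dist or (len(ranked) > 1 and ranked[0][0] == ranked[1][0]):
        return None
    return ranked[0][1]


def ngrams_containing(words: Sequence[str], target: str, n: int) -> List[Tuple[str, ...]]:
    """All lower-cased n-grams of `words` that include `target`."""
    lowered = [w.lower() for w in words]
    grams = [tuple(lowered[i:i + n]) for i in range(len(lowered) - n + 1)]
    return [g for g in grams if target in g]


def label_from_ngrams(ngrams: Sequence[Tuple[str, ...]]) -> Optional[str]:
    """Sum per-label corpus counts over the n-grams; return the most frequent label."""
    totals: Dict[str, int] = {}
    for gram in ngrams:
        for label, count in NGRAM_COUNTS.get(gram, {}).items():
            totals[label] = totals.get(label, 0) + count
    return max(totals, key=totals.__getitem__) if totals else None


def determine_pronunciation(word: str, heard: Optional[Sequence[str]],
                            ngrams: Sequence[Tuple[str, ...]]) -> Optional[List[str]]:
    """Prefer the user's own pronunciation; fall back to n-gram frequency."""
    label = label_from_phonemes(word, heard) if heard else None
    if label is None:
        label = label_from_ngrams(ngrams)
    return LEXICON[word][label] if label else None


words = "I live in Cupertino".split()
print(determine_pronunciation("live", None, ngrams_containing(words, "live", 3)))  # ['L', 'IH', 'V']
print(determine_pronunciation("live", ["L", "AY", "V"], []))                      # ['L', 'AY', 'V']
```

The sketch gives the user's own pronunciation priority because it is the most direct evidence available. That ordering is an assumption here, and so is the one-phoneme tolerance (`max_dist`).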
Abstract
Systems and processes for disambiguating heteronyms in speech synthesis are provided. In one example process, a speech input containing a heteronym can be received from a user. The speech input can be processed using an automatic speech recognition system to determine a phonemic string corresponding to the heteronym as pronounced by the user in the speech input. A correct pronunciation of the heteronym can be determined based on the phonemic string, on an n-gram language model of the automatic speech recognition system, or on both. A dialogue response to the speech input can be generated, where the dialogue response can include the heteronym. The dialogue response can be output as a speech output, in which the heteronym is pronounced according to the correct pronunciation.
25 Claims
1. A method for operating an intelligent automated assistant, the method comprising:
at an electronic device with a processor and memory storing one or more programs for execution by the processor:
receiving, from a user, a speech input containing a heteronym and one or more additional words;
processing the speech input using an automatic speech recognition system to determine at least one of:
a phonemic string corresponding to the heteronym as pronounced by the user in the speech input; and
a frequency of occurrence of an n-gram with respect to a corpus, wherein the n-gram includes the heteronym and the one or more additional words;
determining a correct pronunciation of the heteronym based on at least one of the phonemic string and the frequency of occurrence of the n-gram;
generating a dialogue response to the speech input, wherein the dialogue response includes the heteronym; and
outputting the dialogue response as a speech output, wherein the heteronym in the dialogue response is pronounced in the speech output according to the determined correct pronunciation.
Dependent claims: 2-13
14. A method for operating an intelligent automated assistant, the method comprising:
at an electronic device with a processor and memory storing one or more programs for execution by the processor:
receiving, from a user, a speech input;
processing the speech input using an automatic speech recognition system to determine a text string corresponding to the speech input;
determining an actionable intent based on the text string;
generating a dialogue response to the speech input based on the actionable intent, wherein the dialogue response includes a heteronym;
determining a correct pronunciation of the heteronym using an n-gram language model of the automatic speech recognition system and based on the heteronym and one or more additional words in the dialogue response; and
outputting the dialogue response as a speech output, wherein the heteronym in the dialogue response is pronounced in the speech output according to the determined correct pronunciation.
Dependent claims: 15-19
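In claim 14 the heteronym appears in the generated dialogue response rather than in the user's speech, and the pronunciation is chosen with the ASR system's n-gram language model. The claim does not say how the model is applied. One hypothetical approach, assumed here and not taken from the claim, is to give the language model pronunciation-tagged tokens (`live_IH`, `live_AY`). Each tag is then substituted into the response, and the system keeps the tag whose sentence scores highest. The sketch uses an add-one smoothed bigram model trained on a toy corpus. `TaggedBigramLM`, `choose_pronunciation`, and the corpus are illustrative names and data.

```python
import math
from collections import Counter
from typing import List, Sequence


class TaggedBigramLM:
    """Add-one smoothed bigram LM whose vocabulary contains pronunciation-tagged
    heteronym tokens such as "live_IH" and "live_AY" (a hypothetical scheme)."""

    def __init__(self, corpus: Sequence[Sequence[str]]) -> None:
        self.history: Counter = Counter()  # c(a): times token a is followed by another token
        self.pairs: Counter = Counter()    # c(a, b): times token b follows token a
        vocab = {"</s>"}
        for sentence in corpus:
            tokens = ["<s>", *sentence, "</s>"]
            vocab.update(sentence)
            self.history.update(tokens[:-1])
            self.pairs.update(zip(tokens, tokens[1:]))
        self.vocab_size = len(vocab)

    def log_prob(self, sentence: Sequence[str]) -> float:
        """Sum of log P(b | a) with add-one smoothing over the predictable vocabulary."""
        tokens = ["<s>", *sentence, "</s>"]
        return sum(
            math.log((self.pairs[(a, b)] + 1) / (self.history[a] + self.vocab_size))
            for a, b in zip(tokens, tokens[1:])
        )


def choose_pronunciation(lm: TaggedBigramLM, response: Sequence[str],
                         heteronym: str, tags: Sequence[str]) -> str:
    """Return the tag whose tagged version of the dialogue response the LM scores highest."""
    def tagged(tag: str) -> List[str]:
        return [f"{w}_{tag}" if w == heteronym else w for w in response]
    return max(tags, key=lambda tag: lm.log_prob(tagged(tag)))


# Hypothetical toy training corpus with pronunciation-tagged heteronyms.
CORPUS = [
    ["the", "concert", "is", "live_AY", "tonight"],
    ["watch", "the", "game", "live_AY"],
    ["where", "do", "you", "live_IH"],
    ["i", "live_IH", "in", "cupertino"],
    ["they", "live_IH", "in", "paris"],
]
lm = TaggedBigramLM(CORPUS)
print(choose_pronunciation(lm, "you live in cupertino".split(), "live", ["IH", "AY"]))     # IH
print(choose_pronunciation(lm, "the show is live tonight".split(), "live", ["IH", "AY"]))  # AY
```

The sketch uses add-one smoothing only to keep it short. A production language model would typically use a stronger smoothing or back-off scheme.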
20. A method for operating an intelligent automated assistant, the method comprising:
at an electronic device with a processor and memory storing one or more programs for execution by the processor:
receiving, from a user, a speech input containing a heteronym and one or more additional words;
processing the speech input using an automatic speech recognition system to determine a phonemic string corresponding to the heteronym as pronounced by the user in the speech input;
generating a dialogue response to the speech input, wherein the dialogue response includes the heteronym; and
outputting the dialogue response as a speech output, wherein the heteronym in the dialogue response is pronounced in the speech output according to the phonemic string.
Dependent claims: 21-23
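Claim 20 skips disambiguation and reuses the user's own phonemic string when speaking the response. If the speech synthesizer accepts W3C SSML, one way to carry that string through is the SSML `<phoneme>` element with `alphabet="ipa"`. The sketch below assumes that the ASR system emits ARPAbet symbols. It converts them with a hypothetical, deliberately partial ARPAbet-to-IPA table. `response_to_ssml` and `to_ipa` are illustrative names.

```python
from typing import Sequence
from xml.sax.saxutils import escape, quoteattr

# Hypothetical, partial ARPAbet -> IPA table covering only the phonemes used here.
ARPABET_TO_IPA = {"L": "l", "IH": "ɪ", "AY": "aɪ", "V": "v",
                  "R": "ɹ", "IY": "i", "EH": "ɛ", "D": "d"}

SPEAK_OPEN = ('<speak version="1.1" xmlns="http://www.w3.org/2001/10/synthesis" '
              'xml:lang="en-US">')


def to_ipa(phonemes: Sequence[str]) -> str:
    """Convert an ARPAbet phoneme sequence to an IPA string (stress marks omitted)."""
    return "".join(ARPABET_TO_IPA[p] for p in phonemes)


def response_to_ssml(response: Sequence[str], heteronym: str,
                     heard: Sequence[str]) -> str:
    """Wrap each occurrence of the heteronym in an SSML <phoneme> element that
    carries the user's own pronunciation; escape every other word as plain text."""
    parts = []
    for word in response:
        if word.lower() == heteronym:
            ph = quoteattr(to_ipa(heard))  # quoted, XML-safe attribute value
            parts.append(f'<phoneme alphabet="ipa" ph={ph}>{escape(word)}</phoneme>')
        else:
            parts.append(escape(word))
    return SPEAK_OPEN + " ".join(parts) + "</speak>"


print(response_to_ssml("Here are restaurants near where you live".split(),
                       "live", ["L", "IH", "V"]))
```

If a symbol is missing from the table, the code raises `KeyError`. In a sketch, that is better than silently emitting a wrong pronunciation. A real system would need a complete mapping, including stress marks.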
24. A non-transitory computer-readable storage medium comprising instructions for causing one or more processors to:
receive, from a user, a speech input containing a heteronym and one or more additional words;
process the speech input using an automatic speech recognition system to determine a text string corresponding to the speech input, wherein processing the speech input includes determining at least one of:
a phonemic string corresponding to the heteronym as pronounced by the user in the speech input; and
a frequency of occurrence of an n-gram with respect to a corpus, wherein the n-gram includes the heteronym and the one or more additional words;
determine an actionable intent based on the text string;
determine a correct pronunciation of the heteronym based on at least one of the phonemic string, the frequency of occurrence of the n-gram, and the actionable intent;
generate a dialogue response to the speech input, wherein the dialogue response includes the heteronym; and
output the dialogue response as a speech output, wherein the heteronym in the dialogue response is pronounced in the speech output according to the determined correct pronunciation.
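Claim 24 (and its device counterpart, claim 25) adds the actionable intent as a third signal, alongside the phonemic string and the n-gram frequency. The claims require only that the pronunciation be based on at least one of these signals. The weighted vote below is a hypothetical way to combine whichever signals are present. The weights, the intent names (`get_weather`, `book_flight`), and the `INTENT_PREFERENCES` table are assumptions for illustration. The example uses the heteronym "Nice", which can be the adjective or the French city.

```python
from collections import defaultdict
from typing import Dict, Mapping, Optional, Tuple

# Hypothetical table: (actionable intent, heteronym) -> pronunciation label implied by the intent.
INTENT_PREFERENCES: Dict[Tuple[str, str], str] = {
    ("get_weather", "nice"): "city",  # "weather in Nice" suggests the French city, /niːs/
    ("book_flight", "nice"): "city",
}

# Hypothetical signal weights; a deployed system would tune these on held-out data.
WEIGHTS = {"phonemic": 0.5, "ngram": 0.3, "intent": 0.2}


def combine_signals(heteronym: str,
                    phonemic_label: Optional[str],
                    ngram_counts: Mapping[str, int],
                    intent: Optional[str]) -> Optional[str]:
    """Weighted vote over whichever of the three signals are available."""
    scores: Dict[str, float] = defaultdict(float)
    if phonemic_label is not None:            # signal 1: how the user said it
        scores[phonemic_label] += WEIGHTS["phonemic"]
    total = sum(ngram_counts.values())
    if total > 0:                             # signal 2: normalized n-gram frequencies
        for label, count in ngram_counts.items():
            scores[label] += WEIGHTS["ngram"] * count / total
    preferred = INTENT_PREFERENCES.get((intent, heteronym)) if intent else None
    if preferred is not None:                 # signal 3: actionable intent
        scores[preferred] += WEIGHTS["intent"]
    return max(scores, key=scores.__getitem__) if scores else None


print(combine_signals("nice", None, {"adjective": 80, "city": 20}, "get_weather"))  # city
print(combine_signals("nice", None, {"adjective": 80, "city": 20}, None))           # adjective
```

In the first call, the intent bonus of 0.2 lifts "city" to about 0.26. That outweighs the n-gram lean toward "adjective" (0.24 versus 0.06). Without an intent, the same counts favor "adjective".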
25. An electronic device comprising:
one or more processors;
memory;
one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for:
receiving, from a user, a speech input containing a heteronym and one or more additional words;
processing the speech input using an automatic speech recognition system to determine a text string corresponding to the speech input, wherein processing the speech input includes determining at least one of:
a phonemic string corresponding to the heteronym as pronounced by the user in the speech input; and
a frequency of occurrence of an n-gram with respect to a corpus, wherein the n-gram includes the heteronym and the one or more additional words;
determining an actionable intent based on the text string;
determining a correct pronunciation of the heteronym based on at least one of the phonemic string, the frequency of occurrence of the n-gram, and the actionable intent;
generating a dialogue response to the speech input, wherein the dialogue response includes the heteronym; and
outputting the dialogue response as a speech output, wherein the heteronym in the dialogue response is pronounced in the speech output according to the determined correct pronunciation.
Specification