Unsupervised data-driven pronunciation modeling
Abstract
Pronunciation for an input word is modeled by generating a set of candidate phoneme strings having pronunciations close to the input word in an orthographic space. Phoneme sub-strings in the set are selected as the pronunciation. In one aspect, a first closeness measure between phoneme strings for words chosen from a dictionary and contexts within the input word is used to determine the candidate phoneme strings. The words are chosen from the dictionary based on a second closeness measure between a representation of the input word in the orthographic space and orthographic anchors corresponding to the words in the dictionary. In another aspect, the phoneme sub-strings are selected by aligning the candidate phoneme strings on common phoneme sub-strings to produce an occurrence count, which is used to choose the phoneme sub-strings for the pronunciation.
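The two steps in the abstract (finding dictionary words that are orthographically close to the input word, then counting shared phoneme sub-strings across their pronunciations) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy `DICTIONARY`, the use of letter trigram counts with cosine similarity as a stand-in for the orthographic space, and the use of phoneme bigrams in place of the alignment step. It is not the patented method.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy dictionary: word -> phoneme string (ARPAbet-like symbols).
DICTIONARY = {
    "night": ["N", "AY", "T"],
    "light": ["L", "AY", "T"],
    "sight": ["S", "AY", "T"],
    "nice": ["N", "AY", "S"],
    "table": ["T", "EY", "B", "AH", "L"],
}


def letter_ngrams(word, n=3):
    """Count the letter n-grams of a word padded with '#' boundary markers."""
    padded = "#" + word + "#"
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a if key in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def closest_words(word, dictionary, k=3):
    """Return the k dictionary words whose n-gram vectors are closest to `word`."""
    target = letter_ngrams(word)
    ranked = sorted(
        dictionary,
        key=lambda w: cosine(target, letter_ngrams(w)),
        reverse=True,
    )
    return ranked[:k]


def substring_counts(phoneme_strings, n=2):
    """Count how often each phoneme n-gram occurs across the candidate strings."""
    counts = Counter()
    for phones in phoneme_strings:
        for i in range(len(phones) - n + 1):
            counts[tuple(phones[i:i + n])] += 1
    return counts


# "fight" is treated as the out-of-vocabulary word in this sketch.
neighbours = closest_words("fight", DICTIONARY)
counts = substring_counts(DICTIONARY[w] for w in neighbours)
```

With this toy data, `neighbours` holds "night", "light" and "sight", and `counts[("AY", "T")]` is 3, so the shared sub-string "AY T" would be the natural choice for the ending of the unknown word. A real system would need a proper alignment of the candidate strings and a much larger dictionary.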
62 Claims
1. A computerized method comprising:
- receiving pronunciation data for an out-of-vocabulary word, the pronunciation data comprising phoneme sub-strings selected from candidate phoneme strings having pronunciation data associated with orthographic anchors that are close to a vector representation of the out-of-vocabulary word in an orthographical vector space defined by a dictionary; and
- reproducing the pronunciation data for the out-of-vocabulary word as an audible signal.
Dependent claims: 2, 3, 4, 5, 6
7. A computerized method comprising:
- receiving an orthographical vector space comprising a vector representation of an out-of-vocabulary word and orthographic anchors for in-vocabulary words, the orthographic vector space defined by a dictionary;
- selecting phoneme sub-strings from the dictionary according to a closeness measure between the vector representation of the out-of-vocabulary word and the orthographic anchors for the in-vocabulary words in the orthographic vector space; and
- generating pronunciation data for the out-of-vocabulary word from the selected phoneme sub-strings.
Dependent claims: 8, 9, 10, 11, 12, 13, 14, 15, 16
17. A computerized method comprising:
- storing pronunciation data for an out-of-vocabulary word, the pronunciation data comprising phoneme sub-strings selected from candidate phoneme strings having pronunciation data associated with orthographic anchors that are close to a vector representation of the out-of-vocabulary word in an orthographical vector space defined by a dictionary; and
- transmitting the pronunciation data for the out-of-vocabulary word as an audible signal.
Dependent claims: 18, 19, 20
21. A computer-readable storage medium storing computer-executable instructions which, when executed, cause a processing system to perform a method comprising:
- receiving pronunciation data for an out-of-vocabulary word, the pronunciation data comprising phoneme sub-strings selected from candidate phoneme strings having pronunciation data associated with orthographic anchors that are close to a vector representation of the out-of-vocabulary word in an orthographical vector space defined by a dictionary; and
- reproducing the pronunciation data for the out-of-vocabulary word as an audible signal.
Dependent claims: 22, 23, 24, 25
26. A computer-readable storage medium storing computer-executable instructions comprising:
- an instruction to receive pronunciation data for an out-of-vocabulary word, the pronunciation data comprising phoneme sub-strings selected from candidate phoneme strings having pronunciation data associated with orthographic anchors that are close to a vector representation of the out-of-vocabulary word in an orthographical vector space defined by a dictionary; and
- an instruction to reproduce the pronunciation data for the out-of-vocabulary word as an audible signal.
Dependent claims: 27, 28, 29, 30
31. A computer-readable storage medium storing computer-executable instructions which, when executed, cause a processing system to perform a method comprising:
- receiving an orthographical vector space comprising a vector representation of an out-of-vocabulary word and orthographic anchors for in-vocabulary words, the orthographic vector space defined by a dictionary;
- selecting phoneme sub-strings from the dictionary according to a closeness measure between the vector representation of the out-of-vocabulary word and the orthographic anchors for the in-vocabulary words in the orthographic vector space; and
- generating pronunciation data for the out-of-vocabulary word from the selected phoneme sub-strings.
Dependent claims: 32, 33, 34, 35
36. A computer-readable storage medium storing computer-executable instructions comprising:
- an instruction to receive an orthographical vector space comprising a vector representation of an out-of-vocabulary word and orthographic anchors for in-vocabulary words, the orthographic vector space defined by a dictionary;
- an instruction to select phoneme sub-strings from the dictionary according to a closeness measure between the vector representation of the out-of-vocabulary word and the orthographic anchors for the in-vocabulary words; and
- an instruction to generate pronunciation data for the out-of-vocabulary word from the selected phoneme sub-strings.
Dependent claims: 37, 38, 39, 40
41. A computer-readable storage medium storing computer-executable instructions which, when executed, cause a data processing system to perform a method comprising:
- storing pronunciation data for an out-of-vocabulary word, the pronunciation data comprising phoneme sub-strings selected from candidate phoneme strings having pronunciation data associated with orthographic anchors that are close to a vector representation of the out-of-vocabulary word in an orthographical space defined by a dictionary; and
- transmitting the pronunciation data for the out-of-vocabulary word as an audible signal.
Dependent claims: 42, 43, 44
45. A computer-readable storage medium storing computer-executable instructions comprising:
- an instruction to store pronunciation data for an out-of-vocabulary word, the pronunciation data comprising phoneme sub-strings selected from candidate phoneme strings having pronunciation data associated with orthographic anchors that are close to a vector representation of the out-of-vocabulary word in an orthographical vector space defined by a dictionary; and
- an instruction to transmit the pronunciation data for the out-of-vocabulary word as an audible signal.
Dependent claims: 46, 47, 48
49. An apparatus comprising:
- means for receiving pronunciation data for an out-of-vocabulary word, the pronunciation data comprising phoneme sub-strings selected from candidate phoneme strings having pronunciation data associated with orthographic anchors that are close to a vector representation of the out-of-vocabulary word in an orthographical vector space defined by a dictionary; and
- means for reproducing the pronunciation data for the out-of-vocabulary word as an audible signal.
Dependent claims: 50, 51, 52, 53
54. An apparatus comprising:
- means for receiving an orthographical vector space comprising a vector representation of an out-of-vocabulary word and orthographic anchors for in-vocabulary words, the orthographic vector space defined by a dictionary;
- means for selecting phoneme sub-strings from the dictionary according to a closeness measure between the vector representation of the out-of-vocabulary word and the orthographic anchors for the in-vocabulary words in the orthographical vector space; and
- means for generating pronunciation data for the out-of-vocabulary word from the selected phoneme sub-strings.
Dependent claims: 55, 56, 57, 58
59. An apparatus comprising:
- means for storing pronunciation data for an out-of-vocabulary word, the pronunciation data comprising phoneme sub-strings selected from candidate phoneme strings having pronunciation data associated with orthographic anchors that are close to a vector representation of the out-of-vocabulary word in an orthographical vector space defined by a dictionary; and
- means for transmitting the pronunciation data for the out-of-vocabulary word as an audible signal.
Dependent claims: 60, 61, 62
Specification