Unsupervised data-driven pronunciation modeling

US 7,702,509 B2
Filed: 11/21/2006
Issued: 04/20/2010
Est. Priority Date: 09/13/2002
Status: Expired due to Fees

Abstract

Pronunciation for an input word is modeled by generating a set of candidate phoneme strings having pronunciations close to the input word in an orthographic space. Phoneme sub-strings in the set are selected as the pronunciation. In one aspect, a first closeness measure between phoneme strings for words chosen from a dictionary and contexts within the input word is used to determine the candidate phoneme strings. The words are chosen from the dictionary based on a second closeness measure between a representation of the input word in the orthographic space and orthographic anchors corresponding to the words in the dictionary. In another aspect, the phoneme sub-strings are selected by aligning the candidate phoneme strings on common phoneme sub-strings to produce an occurrence count, which is used to choose the phoneme sub-strings for the pronunciation.
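As a rough illustration of the second closeness measure, the sketch below ranks dictionary words by how close their spelling is to an input word. It is a simplification under stated assumptions: words are represented by raw letter-trigram counts with `~` boundary markers and compared by cosine, with no dimensionality reduction, and the function names and toy word list are hypothetical rather than taken from the patent.

```python
import math
from collections import Counter


def letter_ngrams(word, n=3):
    """Count the letter n-grams of a word padded with '~' boundary markers."""
    padded = "~" * (n - 1) + word.lower() + "~" * (n - 1)
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(
        sum(c * c for c in b.values())
    )
    return dot / norm if norm else 0.0


def closest_words(oov_word, dictionary_words, top=3):
    """Return up to `top` dictionary words whose spelling is closest to oov_word."""
    target = letter_ngrams(oov_word)
    scored = [(cosine(target, letter_ngrams(w)), w) for w in dictionary_words]
    # Highest similarity first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [w for score, w in scored[:top] if score > 0.0]


# Hypothetical toy dictionary; a real one would also carry phoneme strings.
toy_dictionary = ["krishnan", "fish", "dog"]
print(closest_words("krishna", toy_dictionary))
```

The words returned here would play the role of the orthographic neighborhood whose phoneme strings become candidates for the pronunciation.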

62 Claims

1. A computerized method comprising:
- receiving pronunciation data for an out-of-vocabulary word, the pronunciation data comprising phoneme sub-strings selected from candidate phoneme strings having pronunciation data associated with orthographic anchors that are close to a vector representation of the out-of-vocabulary word in an orthographical vector space defined by a dictionary; and
  
  reproducing the pronunciation data for the out-of-vocabulary word as an audible signal.
- View Dependent Claims (2, 3, 4, 5, 6)
- - 2. The computerized method of claim 1 further comprising:
    - storing the pronunciation data for subsequent reproduction.
  - 3. The computerized method of claim 1, wherein the orthographic vector space comprises the vector representation of the out-of-vocabulary word and the orthographic anchors corresponding to in-vocabulary words.
  - 4. The computerized method of claim 3, wherein the representation of the out-of-vocabulary word and the orthographic anchors are feature vectors.
  - 5. The computerized method of claim 4, wherein the feature vector $\tilde{v}_p$ for the out-of-vocabulary word $\tilde{w}_p$ is calculated as
    $\tilde{v}_p = \tilde{v}_p S = \tilde{w}_p^T U$, where S is a singular diagonal matrix, U is a left singular matrix, $\tilde{v}_p$ is a vector within a right singular matrix $V^T$ corresponding to the out-of-vocabulary word, and $^T$ denotes matrix transposition.
  - 6. The computerized method of claim 1, wherein the dictionary comprises phoneme strings for in-vocabulary words.
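
The relation $\tilde{v}_p S = \tilde{w}_p^T U$ in claim 5 has the form of the usual fold-in identity for a singular value decomposition. A short derivation follows, assuming (these names are not in the claim) that $W$ is the matrix whose columns $w_j$ are the orthographic count vectors of the in-vocabulary words and that $U$ has orthonormal columns:

```latex
\begin{aligned}
W &= U S V^{T}
  && \text{(SVD of the dictionary matrix; } W \text{ is an assumed name)} \\
w_j &= U S v_j^{T}
  && \text{(column } j \text{ of } W;\ v_j \text{ is row } j \text{ of } V\text{)} \\
U^{T} w_j &= S v_j^{T}
  && \text{(since } U^{T} U = I\text{)} \\
w_j^{T} U &= v_j S
  && \text{(transpose; } S \text{ is diagonal, so } S^{T} = S\text{)} \\
\tilde{w}_p^{T} U &= \tilde{v}_p S
  && \text{(same relation for the out-of-vocabulary word, holding } U, S \text{ fixed)}
\end{aligned}
```

Read this way, the out-of-vocabulary word is placed in the existing space without recomputing the decomposition, which is an approximation that holds only to the extent that the new word would not change $U$ and $S$ appreciably.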

7. A computerized method comprising:
- receiving an orthographical vector space comprising a vector representation of an out-of-vocabulary word and orthographic anchors for in-vocabulary words, the orthographic vector space defined by a dictionary;
  
  selecting phoneme sub-strings from the dictionary according to a closeness measure between the vector representation of the out-of-vocabulary word and the orthographic anchors for the in-vocabulary words in the orthographic vector space; and
  
  generating pronunciation data for the out-of-vocabulary word from the selected phoneme sub-strings.
- View Dependent Claims (8, 9, 10, 11, 12, 13, 14, 15, 16)
- - 8. The computerized method of claim 7, wherein the representation of the out-of-vocabulary word and the orthographic anchors are feature vectors.
  - 9. The computerized method of claim 8, wherein the feature vector $\tilde{v}_p$ for the out-of-vocabulary word $\tilde{w}_p$ is calculated as
    $\tilde{v}_p = \tilde{v}_p S = \tilde{w}_p^T U$, where S is a singular diagonal matrix, U is a left singular matrix, $\tilde{v}_p$ is a vector within a right singular matrix $V^T$ corresponding to the out-of-vocabulary word, and $^T$ denotes matrix transposition.
  - 10. The computerized method of claim 7, wherein selecting phoneme sub-strings comprises:
    - forming an orthographic neighborhood from in-vocabulary words corresponding to orthographic anchors that satisfy the closeness measure; and
      
      creating a pronunciation data neighborhood from phoneme strings for the in-vocabulary words in the orthographic neighborhood, the phoneme strings having at least one phoneme sub-string in the dictionary.
  - 11. The computerized method of claim 10, wherein selecting phoneme sub-strings further comprises:
    - selecting phoneme strings in the pronunciation data neighborhood for each context within the out-of-vocabulary word;
      
      aligning the selected phoneme strings on common phoneme sub-strings; and
      
      selecting a phoneme sub-string for each context from the common phoneme sub-strings.
  - 12. The computerized method of claim 11 further comprising:
    - merging the phoneme sub-strings for adjacent contexts when two phoneme sub-strings overlap.
  - 13. The computerized method of claim 11, wherein aligning the phoneme strings comprises:
    - calculating a minimum cost alignment A(k, l) between two phoneme strings $\varphi_1 \ldots \varphi_k \ldots \varphi_K$ and $\psi_1 \ldots \psi_l \ldots \psi_L$ having a length K and L respectively.
  - 14. The computerized method of claim 13, wherein the minimum cost alignment is calculated as
    $A(k,l) = \min\{A(k-1,l-1) + C(k,l),\ G(i,k),\ H(j,l)\}$, where C(k, l) is the cost of substituting phoneme $\psi_l$ for phoneme $\varphi_k$, g(i, k) is the cost of a gap $\varphi_i \ldots \varphi_k$, h(j, l) is the cost of a gap $\psi_j \ldots \psi_l$,
  - 15. The computerized method of claim 7, wherein the closeness measure comprises the cosine of the angle between the representation of the out-of-vocabulary word and an orthographic anchor.
  - 16. The computerized method of claim 15, wherein the cosine K between the representation of the out-of-vocabulary word $\tilde{v}_p$ and an orthographic anchor $v_j$ is calculated using
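
The minimum cost alignment of claims 13 and 14 can be approximated with a standard dynamic-programming table. The sketch below is not the claimed recurrence: the gap terms G and H are not fully spelled out in the text above, so a constant per-phoneme gap cost stands in for them, and the substitution cost is a flat penalty for any mismatch. The function name, the default costs, and the example phoneme strings are all assumptions made for illustration.

```python
def align_phonemes(phi, psi, sub_cost=1.0, gap_cost=1.0):
    """Align two phoneme lists; return (total cost, list of aligned pairs).

    A pair (p, None) means p in phi faces a gap in psi, and vice versa.
    """
    K, L = len(phi), len(psi)
    # A[k][l] holds the minimum cost of aligning phi[:k] with psi[:l].
    A = [[0.0] * (L + 1) for _ in range(K + 1)]
    for k in range(1, K + 1):
        A[k][0] = A[k - 1][0] + gap_cost
    for l in range(1, L + 1):
        A[0][l] = A[0][l - 1] + gap_cost
    for k in range(1, K + 1):
        for l in range(1, L + 1):
            c = 0.0 if phi[k - 1] == psi[l - 1] else sub_cost
            A[k][l] = min(
                A[k - 1][l - 1] + c,      # match or substitution
                A[k - 1][l] + gap_cost,   # phi[k-1] aligned to a gap
                A[k][l - 1] + gap_cost,   # psi[l-1] aligned to a gap
            )
    # Trace back from the bottom-right corner to recover one optimal alignment.
    pairs = []
    k, l = K, L
    while k > 0 or l > 0:
        if k > 0 and l > 0:
            c = 0.0 if phi[k - 1] == psi[l - 1] else sub_cost
            if A[k][l] == A[k - 1][l - 1] + c:
                pairs.append((phi[k - 1], psi[l - 1]))
                k, l = k - 1, l - 1
                continue
        if k > 0 and (l == 0 or A[k][l] == A[k - 1][l] + gap_cost):
            pairs.append((phi[k - 1], None))
            k -= 1
        else:
            pairs.append((None, psi[l - 1]))
            l -= 1
    pairs.reverse()
    return A[K][L], pairs


cost, pairs = align_phonemes("K R IH SH N AH".split(), "K R IH SH N AH N".split())
print(cost)
print(pairs)
```

Aligned pairs in which both sides agree mark the common phoneme sub-strings on which the candidate strings line up.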

17. A computerized method comprising:
- storing pronunciation data for an out-of-vocabulary word, the pronunciation data comprising phoneme sub-strings selected from candidate phoneme strings having pronunciation data associated with orthographic anchors that are close to a vector representation of the out-of-vocabulary word in an orthographical vector space defined by a dictionary; and
  
  transmitting the pronunciation data for the out-of-vocabulary word as an audible signal.
- View Dependent Claims (18, 19, 20)
- - 18. The computerized method of claim 17, wherein the orthographic vector space comprises the vector representation of the out-of-vocabulary word and the orthographic anchors corresponding to in-vocabulary words.
  - 19. The computerized method of claim 18, wherein the representation of the out-of-vocabulary word and the orthographic anchors are feature vectors.
  - 20. The computerized method of claim 17, wherein the dictionary comprises phoneme strings for in-vocabulary words.

21. A computer-readable storage medium storing computer-executable instructions which, when executed, cause a processing system to perform a method comprising:
- receiving pronunciation data for an out-of-vocabulary word, the pronunciation data comprising phoneme sub-strings selected from candidate phoneme strings having pronunciation data associated with orthographic anchors that are close to a vector representation of the out-of-vocabulary word in an orthographical vector space defined by a dictionary; and
  
  reproducing the pronunciation data for the out-of-vocabulary word as an audible signal.
- View Dependent Claims (22, 23, 24, 25)
- - 22. The computer-readable storage medium of claim 21, wherein the method further comprises:
    - storing the pronunciation data for subsequent reproduction.
  - 23. The computer-readable storage medium of claim 21, wherein the orthographic vector space comprises the vector representation of the out-of-vocabulary word and the orthographic anchors corresponding to in-vocabulary words.
  - 24. The computer-readable storage medium of claim 23, wherein the representation of the out-of-vocabulary word and the orthographic anchors are feature vectors.
  - 25. The computer-readable storage medium of claim 21, wherein the dictionary comprises phoneme strings for in-vocabulary words.

26. A computer-readable storage medium storing computer-executable instructions comprising:
- an instruction to receive pronunciation data for an out-of-vocabulary word, the pronunciation data comprising phoneme sub-strings selected from candidate phoneme strings having pronunciation data associated with orthographic anchors that are close to a vector representation of the out-of-vocabulary word in an orthographical vector space defined by a dictionary; and
  
  an instruction to reproduce the pronunciation data for the out-of-vocabulary word as an audible signal.
- View Dependent Claims (27, 28, 29, 30)
- - 27. The computer-readable storage medium of claim 26 further comprising:
    - an instruction to store the pronunciation data for subsequent reproduction.
  - 28. The computer-readable storage medium of claim 26, wherein the orthographic vector space comprises the vector representation of the out-of-vocabulary word and the orthographic anchors corresponding to in-vocabulary words.
  - 29. The computer-readable storage medium of claim 28, wherein the representation of the out-of-vocabulary word and the orthographic anchors are feature vectors.
  - 30. The computer-readable storage medium of claim 26, wherein the dictionary comprises phoneme strings for in-vocabulary words.

31. A computer-readable storage medium storing computer-executable instructions which, when executed, cause a processing system to perform a method comprising:
- receiving an orthographical vector space comprising a vector representation of an out-of-vocabulary word and orthographic anchors for in-vocabulary words, the orthographic vector space defined by a dictionary;
  
  selecting phoneme sub-strings from the dictionary according to a closeness measure between the vector representation of the out-of-vocabulary word and the orthographic anchors for the in-vocabulary words in the orthographic vector space; and
  
  generating pronunciation data for the out-of-vocabulary word from the selected phoneme sub-strings.
- View Dependent Claims (32, 33, 34, 35)
- - 32. The computer-readable storage medium of claim 31, wherein the representation of the out-of-vocabulary word and the orthographic anchors are feature vectors.
  - 33. The computer-readable storage medium of claim 31, wherein selecting phoneme sub-strings comprises:
    - forming an orthographic neighborhood from in-vocabulary words corresponding to orthographic anchors that satisfy the closeness measure; and
      
      creating a pronunciation data neighborhood from phoneme strings for the in-vocabulary words in the orthographic neighborhood, the phoneme strings having at least one phoneme sub-string in the dictionary.
  - 34. The computer-readable storage medium of claim 33, wherein selecting phoneme sub-strings comprises:
    - selecting phoneme strings in the pronunciation data neighborhood for each context within the out-of-vocabulary word;
      
      aligning the selected phoneme strings on common phoneme sub-strings; and
      
      selecting a phoneme sub-string for each context from the common phoneme sub-strings.
  - 35. The computer-readable storage medium of claim 34, wherein the method further comprises:
    - merging the phoneme sub-strings for adjacent contexts when two phoneme sub-strings overlap.
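
The select, align, and select-again steps recited in claim 34 can be approximated for a single context by an occurrence count over shared phoneme sub-strings. The sketch below is an assumption-laden stand-in, not the claimed alignment: it enumerates contiguous sub-strings of two or three phonemes in each candidate, counts in how many candidates each one occurs, and picks the most widely shared. The function name, the length bounds, the tie-breaking rule, and the candidate strings are hypothetical.

```python
from collections import Counter


def most_common_substring(candidates, min_len=2, max_len=3):
    """Pick the phoneme sub-string that occurs in the most candidate strings.

    Returns (sub-string as a tuple, number of candidates containing it),
    or (None, 0) when no candidate is long enough to contain a sub-string.
    """
    counts = Counter()
    for phonemes in candidates:
        seen = set()
        for n in range(min_len, max_len + 1):
            for i in range(len(phonemes) - n + 1):
                seen.add(tuple(phonemes[i:i + n]))
        # Count each sub-string at most once per candidate string.
        counts.update(seen)
    if not counts:
        return None, 0
    # Prefer the highest count, then the longer sub-string, then a fixed
    # lexicographic order so the result is deterministic.
    best = max(counts, key=lambda s: (counts[s], len(s), s))
    return best, counts[best]


# Hypothetical candidate phoneme strings for one context of the input word.
candidates = [
    ["K", "R", "IH", "S"],
    ["K", "R", "IH", "SH"],
    ["G", "R", "IH", "N"],
]
print(most_common_substring(candidates))
```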

36. A computer-readable storage medium storing computer-executable instructions comprising:
- an instruction to receive an orthographical vector space comprising a vector representation of an out-of-vocabulary word and orthographic anchors for in-vocabulary words, the orthographic vector space defined by a dictionary;
  
  an instruction to select phoneme sub-strings from the dictionary according to a closeness measure between the vector representation of the out-of-vocabulary word and the orthographic anchors for the in-vocabulary words; and
  
  an instruction to generate pronunciation data for the out-of-vocabulary word from the selected phoneme sub-strings.
- View Dependent Claims (37, 38, 39, 40)
- - 37. The computer-readable storage medium of claim 36, wherein the representation of the out-of-vocabulary word and the orthographic anchors are feature vectors.
  - 38. The computer-readable storage medium of claim 36, wherein the instruction to select phoneme sub-strings comprises:
    - an instruction to form an orthographic neighborhood from in-vocabulary words corresponding to orthographic anchors that satisfy the closeness measure; and
      
      an instruction to create a pronunciation data neighborhood from phoneme strings for the in-vocabulary words in the orthographic neighborhood, the phoneme strings having at least one phoneme sub-string in the dictionary.
  - 39. The computer-readable storage medium of claim 38, wherein the instruction to select phoneme sub-strings further comprises:
    - an instruction to select phoneme strings in the pronunciation data neighborhood for each context within the out-of-vocabulary word;
      
      an instruction to align the selected phoneme strings on common phoneme sub-strings; and
      
      an instruction to select a phoneme sub-string for each context from the common phoneme sub-strings.
  - 40. The computer-readable storage medium of claim 39 further comprising:
    - an instruction to merge the phoneme sub-strings for adjacent contexts when two phoneme sub-strings overlap.

41. A computer-readable storage medium storing computer-executable instructions which, when executed, cause a data processing system to perform a method comprising:
- storing pronunciation data for an out-of-vocabulary word, the pronunciation data comprising phoneme sub-strings selected from candidate phoneme strings having pronunciation data associated with orthographic anchors that are close to a vector representation of the out-of-vocabulary word in an orthographical space defined by a dictionary; and
  
  transmitting the pronunciation data for the out-of-vocabulary word as an audible signal.
- View Dependent Claims (42, 43, 44)
- - 42. The computer-readable storage medium of claim 41, wherein the orthographic vector space comprises the vector representation of the out-of-vocabulary word and orthographic anchors corresponding to in-vocabulary words.
  - 43. The computer-readable storage medium of claim 42, wherein the representation of the out-of-vocabulary word and the orthographic anchors are feature vectors.
  - 44. The computer-readable storage medium of claim 41, wherein the dictionary comprises phoneme strings for in-vocabulary words.

45. A computer-readable storage medium storing computer-executable instructions comprising:
- an instruction to store pronunciation data for an out-of-vocabulary word, the pronunciation data comprising phoneme sub-strings selected from candidate phoneme strings having pronunciation data associated with orthographic anchors that are close to a vector representation of the out-of-vocabulary word in an orthographical vector space defined by a dictionary; and
  
  an instruction to transmit the pronunciation data for the out-of-vocabulary word as an audible signal.
- View Dependent Claims (46, 47, 48)
- - 46. The computer-readable storage medium of claim 45, wherein the orthographic vector space comprises the representation of the out-of-vocabulary word and the orthographic anchors corresponding to in-vocabulary words.
  - 47. The computer-readable storage medium of claim 46, wherein the representation of the out-of-vocabulary word and the orthographic anchors are feature vectors.
  - 48. The computer-readable storage medium of claim 45, wherein the dictionary comprises phoneme strings for in-vocabulary words.

49. An apparatus comprising:
- means for receiving pronunciation data for an out-of-vocabulary word, the pronunciation data comprising phoneme sub-strings selected from candidate phoneme strings having pronunciation data associated with orthographic anchors that are close to a vector representation of the out-of-vocabulary word in an orthographical vector space defined by a dictionary; and
  
  means for reproducing the pronunciation data for the out-of-vocabulary word as an audible signal.
- View Dependent Claims (50, 51, 52, 53)
- - 50. The apparatus of claim 49 further comprising:
    - means for storing the pronunciation data for subsequent reproduction.
  - 51. The apparatus of claim 49, wherein the orthographic vector space comprises the vector representation of the out-of-vocabulary word and the orthographic anchors corresponding to in-vocabulary words.
  - 52. The apparatus of claim 51, wherein the representation of the out-of-vocabulary word and the orthographic anchors are feature vectors.
  - 53. The apparatus of claim 49, wherein the dictionary comprises phoneme strings for in-vocabulary words.

54. An apparatus comprising:
- means for receiving an orthographical vector space comprising a vector representation of an out-of-vocabulary word and orthographic anchors for in-vocabulary words, the orthographic vector space defined by a dictionary;
  
  means for selecting phoneme sub-strings from the dictionary according to a closeness measure between the vector representation of the out-of-vocabulary word and the orthographic anchors for the in-vocabulary words in the orthographical vector space; and
  
  means for generating pronunciation data for the out-of-vocabulary word from the selected phoneme sub-strings.
- View Dependent Claims (55, 56, 57, 58)
- - 55. The apparatus of claim 54, wherein the representation of the out-of-vocabulary word and the orthographic anchors are feature vectors.
  - 56. The apparatus of claim 54, wherein the means for selecting phoneme sub-strings comprises:
    - means for forming an orthographic neighborhood from in-vocabulary words corresponding to orthographic anchors that satisfy the closeness measure; and
      
      means for creating a pronunciation data neighborhood from phoneme strings for the in-vocabulary words in the orthographic neighborhood, the phoneme strings having at least one phoneme sub-string in the dictionary.
  - 57. The apparatus of claim 56, wherein the means for selecting phoneme sub-strings further comprises:
    - means for selecting phoneme strings in the pronunciation data neighborhood for each context within the out-of-vocabulary word;
      
      means for aligning the selected phoneme strings on common phoneme sub-strings; and
      
      means for selecting a phoneme sub-string for each context from the common phoneme sub-strings.
  - 58. The apparatus of claim 57 further comprising:
    - means for merging the phoneme sub-strings for adjacent contexts when two phoneme sub-strings overlap.
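
The merging of sub-strings for adjacent contexts, recited in claim 58, can be pictured as joining two phoneme sequences on their shared boundary. This is a minimal sketch under assumptions: overlap is taken to mean that a suffix of the left sub-string equals a prefix of the right one, the longest such overlap wins, and plain concatenation is used when there is no overlap (a case the claim does not address). The function names and phoneme values are hypothetical.

```python
from functools import reduce


def merge_overlapping(left, right):
    """Join two phoneme lists, collapsing the longest suffix/prefix overlap."""
    for size in range(min(len(left), len(right)), 0, -1):
        if left[-size:] == right[:size]:
            return left + right[size:]
    # No overlap found: fall back to concatenation (an assumption).
    return left + right


def merge_contexts(substrings):
    """Merge the sub-strings chosen for successive contexts, left to right."""
    return reduce(merge_overlapping, substrings, [])


# Hypothetical per-context selections for three adjacent contexts.
per_context = [["K", "R", "IH"], ["R", "IH", "SH"], ["SH", "N", "AH"]]
print(merge_contexts(per_context))
```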

59. An apparatus comprising:
- means for storing pronunciation data for an out-of-vocabulary word, the pronunciation data comprising phoneme sub-strings selected from candidate phoneme strings having pronunciation data associated with orthographic anchors that are close to a vector representation of the out-of-vocabulary word in an orthographical vector space defined by a dictionary; and
  
  means for transmitting the pronunciation data for the out-of-vocabulary word as an audible signal.
- View Dependent Claims (60, 61, 62)
- - 60. The apparatus of claim 59, wherein the orthographic space comprises the vector representation of the out-of-vocabulary word and the orthographic anchors corresponding to in-vocabulary words.
  - 61. The apparatus of claim 60, wherein the representation of the out-of-vocabulary word and the orthographic anchors are feature vectors.
  - 62. The apparatus of claim 59, wherein the dictionary comprises phoneme strings for in-vocabulary words.

Current Assignee: Apple Inc.
Original Assignee: Apple Inc.
Inventors: Bellegarda, Jerome R.
Primary Examiner(s): ABEBE, DANIEL DEMELASH
Application Number: US11/603,586
Publication Number: US 20070067173A1
Time in Patent Office: 1,246 Days
Field of Search: 704/243, 704/244, 704/258, 704/260
US Class Current: 704/258
CPC Class Codes:
G10L 15/063 Training
G10L 15/187 Phonemic context, e.g. pron...
