SYSTEM AND METHOD FOR USER-SPECIFIED PRONUNCIATION OF WORDS FOR SPEECH SYNTHESIS AND RECOGNITION

US 20170178619A1
Filed: 02/28/2017
Published: 06/22/2017
Est. Priority Date: 06/07/2013
Status: Active Grant

Abstract

A method is performed at an electronic device with one or more processors and memory storing one or more programs for execution by the one or more processors. A first speech input including at least one word is received. A first phonetic representation of the at least one word is determined, the first phonetic representation comprising a first set of phonemes selected from a speech recognition phonetic alphabet. The first set of phonemes is mapped to a second set of phonemes to generate a second phonetic representation, where the second set of phonemes is selected from a speech synthesis phonetic alphabet. The second phonetic representation is stored in association with a text string corresponding to the at least one word.
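
The mapping step described in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the phoneme symbols, the `RECOGNITION_TO_SYNTHESIS` table, and the `PronunciationStore` class are hypothetical and do not come from the patent, which does not disclose a concrete alphabet or data structure in this excerpt.

```python
# Hypothetical mapping from a speech recognition phonetic alphabet to a
# speech synthesis phonetic alphabet. The symbols are illustrative only.
RECOGNITION_TO_SYNTHESIS = {
    "f": "F",
    "ih": "IH",
    "l": "L",
    "ah": "AX",
    "p": "P",
}


class PronunciationStore:
    """Hypothetical store keeping a phonetic representation per text string."""

    def __init__(self):
        self.entries = {}

    def save(self, text, phonemes):
        # Store a copy so later changes to the caller's list do not leak in.
        self.entries[text.lower()] = list(phonemes)

    def lookup(self, text):
        return self.entries.get(text.lower())


def map_phonemes(recognition_phonemes, table=RECOGNITION_TO_SYNTHESIS):
    """Map each recognition phoneme to its synthesis counterpart."""
    missing = [p for p in recognition_phonemes if p not in table]
    if missing:
        raise KeyError(f"no synthesis phoneme for: {missing}")
    return [table[p] for p in recognition_phonemes]


def learn_pronunciation(text, recognition_phonemes, store):
    """Generate the second phonetic representation and store it with the text."""
    synthesis_phonemes = map_phonemes(recognition_phonemes)
    store.save(text, synthesis_phonemes)
    return synthesis_phonemes


store = PronunciationStore()
learned = learn_pronunciation("Philip", ["f", "ih", "l", "ah", "p"], store)
```

In this sketch the stored entry is keyed by the lowercased text string, which is a design choice of the example rather than something the abstract specifies.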

38 Claims

1-9. (canceled)

10. A method for learning word pronunciations, comprising:
- at an electronic device with one or more processors and memory storing one or more programs for execution by the one or more processors:
  
  detecting an error in a speech based interaction with a digital assistant;
  
  in response to detecting the error, receiving a speech input from a user, the speech input including a pronunciation of one or more words; and
  
  storing the pronunciation in association with a text string corresponding to the one or more words.
- Dependent Claims (11, 12, 13, 14)
  - 11. The method of claim 10, wherein the one or more words were received in a prior speech input provided by the user, and wherein the error is an error in speech recognition of the one or more words.
  - 12. The method of claim 10, wherein the one or more words were output in a speech output by the electronic device, and wherein the error is an error in speech synthesis of the one or more words.
  - 13. The method of claim 10, further comprising:
    - receiving a speech input including the one or more words;
      
      performing speech recognition on the speech input to generate a text string corresponding to the one or more words;
      
      determining a confidence metric of the text string; and
      
      detecting the error based on a determination that the confidence metric does not meet a predetermined threshold.
  - 14. The method of claim 10, further comprising:
    - synthesizing a speech output including the one or more words; and
      
      detecting the error based on an indication from the user that the one or more words were pronounced incorrectly.
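
The recognition-error path of claims 10 and 13 can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: the threshold value, the function names, the callback `ask_user_to_pronounce`, and the dictionary used as a lexicon are all hypothetical, and "does not meet" is read here as "is strictly below".

```python
# Assumed value; the claims only require a "predetermined threshold".
CONFIDENCE_THRESHOLD = 0.6


def recognition_error_detected(confidence, threshold=CONFIDENCE_THRESHOLD):
    """Report an error when the confidence metric does not meet the threshold."""
    return confidence < threshold


def handle_speech_input(text, confidence, ask_user_to_pronounce, lexicon):
    """Detect an error, then receive and store a user-provided pronunciation.

    text                  -- text string produced by speech recognition
    confidence            -- confidence metric of that text string
    ask_user_to_pronounce -- hypothetical callback returning the pronunciation
    lexicon               -- dict mapping text strings to pronunciations
    Returns True if a pronunciation was learned, False otherwise.
    """
    if not recognition_error_detected(confidence):
        return False
    # In response to the error, receive a speech input with the pronunciation.
    pronunciation = ask_user_to_pronounce(text)
    # Store the pronunciation in association with the text string.
    lexicon[text] = pronunciation
    return True


lexicon = {}
learned = handle_speech_input(
    "Siobhan", 0.35, lambda t: ["sh", "ih", "v", "ao", "n"], lexicon
)
skipped = handle_speech_input("hello", 0.9, lambda t: [], lexicon)
```

A real system would capture audio in place of the callback; the lambda stands in for that step so the control flow stays visible.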

15-28. (canceled)

29. A non-transitory computer readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by an electronic device, cause the device to:
- detect an error in a speech based interaction with a digital assistant;
  
  in response to detecting the error, receive a speech input from a user, the speech input including a pronunciation of one or more words; and
  
  store the pronunciation in association with a text string corresponding to the one or more words.
- Dependent Claims (30, 31, 32, 33)
  - 30. The computer readable storage medium of claim 29, wherein the one or more words were received in a prior speech input provided by the user, and wherein the error is an error in speech recognition of the one or more words.
  - 31. The computer readable storage medium of claim 29, wherein the one or more words were output in a speech output by the electronic device, and wherein the error is an error in speech synthesis of the one or more words.
  - 32. The computer readable storage medium of claim 29, wherein the instructions further cause the device to:
    - receive a speech input including the one or more words;
      
      perform speech recognition on the speech input to generate a text string corresponding to the one or more words;
      
      determine a confidence metric of the text string; and
      
      detect the error based on a determination that the confidence metric does not meet a predetermined threshold.
  - 33. The computer readable storage medium of claim 29, wherein the instructions further cause the device to:
    - synthesize a speech output including the one or more words; and
      
      detect the error based on an indication from the user that the one or more words were pronounced incorrectly.

34. An electronic device, comprising:
- one or more processors; and
  
  memory storing one or more programs, the one or more programs including instructions, which when executed by the one or more processors, cause the one or more processors to:
  
  detect an error in a speech based interaction with a digital assistant;
  
  in response to detecting the error, receive a speech input from a user, the speech input including a pronunciation of one or more words; and
  
  store the pronunciation in association with a text string corresponding to the one or more words.
- Dependent Claims (35, 36, 37, 38)
  - 35. The device of claim 34, wherein the one or more words were received in a prior speech input provided by the user, and wherein the error is an error in speech recognition of the one or more words.
  - 36. The device of claim 34, wherein the one or more words were output in a speech output by the electronic device, and wherein the error is an error in speech synthesis of the one or more words.
  - 37. The device of claim 34, wherein the instructions further cause the one or more processors to:
    - receive a speech input including the one or more words;
      
      perform speech recognition on the speech input to generate a text string corresponding to the one or more words;
      
      determine a confidence metric of the text string; and
      
      detect the error based on a determination that the confidence metric does not meet a predetermined threshold.
  - 38. The device of claim 34, wherein the instructions further cause the one or more processors to:
    - synthesize a speech output including the one or more words; and
      
      detect the error based on an indication from the user that the one or more words were pronounced incorrectly.
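
The synthesis-error path recited in claims 14, 33, and 38 can be sketched in the same spirit. The names below (`synthesize`, `correct_after_feedback`, `naive_g2p`, and the dictionary lexicon) are hypothetical, and the letter-by-letter fallback is only a placeholder for whatever default pronunciation a real synthesizer would produce.

```python
def naive_g2p(text):
    """Placeholder default pronunciation: one symbol per letter (assumption)."""
    return list(text.lower())


def synthesize(text, lexicon, default_g2p=naive_g2p):
    """Return the phoneme sequence to speak, preferring a stored pronunciation."""
    if text in lexicon:
        return lexicon[text]
    return default_g2p(text)


def correct_after_feedback(text, user_flagged_error, record_user_pronunciation, lexicon):
    """Store a user-provided pronunciation when the user flags the output as wrong.

    user_flagged_error        -- True if the user indicated a mispronunciation
    record_user_pronunciation -- hypothetical callback returning the pronunciation
    Returns True if the lexicon was updated, False otherwise.
    """
    if not user_flagged_error:
        return False
    lexicon[text] = record_user_pronunciation(text)
    return True


lexicon = {}
before = synthesize("Nguyen", lexicon)
updated = correct_after_feedback("Nguyen", True, lambda t: ["w", "ih", "n"], lexicon)
after = synthesize("Nguyen", lexicon)
```

After the correction, later synthesis of the same text string uses the stored pronunciation instead of the default.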

Current Assignee: Apple Inc.
Original Assignee: Apple Inc.
Inventors: GRUBER, Thomas R.; NAIK, Devang K.; WEINER, Liam; BINDER, Justin G.; SRISUWANANUKORN, Charles; WILLIAMS, Shaun Eric; CHEN, Hong; EVERMANN, Gunnar; NAPOLITANO, Lia T.

Granted Patent: US 9,966,060 B2
CPC Class Codes

G10L 13/027   Concept to speech synthesis...

G10L 13/04   Details of speech synthesis...

G10L 13/08   Text analysis or generation...

G10L 15/063   Training

G10L 15/22   Procedures used during a sp...

G10L 15/26   Speech to text systems G10L...

G10L 2015/0631   Creating reference template...

G10L 2015/0638   Interactive procedures
