SYSTEM AND METHOD FOR UNIFIED NORMALIZATION IN TEXT-TO-SPEECH AND AUTOMATIC SPEECH RECOGNITION

US 20160049144A1
Filed: 08/18/2014
Published: 02/18/2016
Est. Priority Date: 08/18/2014
Status: Active Grant

Abstract

A system, method, and computer-readable storage devices use a single set of normalization protocols and a single language lexicon (or dictionary) for both text-to-speech (TTS) and automatic speech recognition (ASR). The system receives input (which is either text to be converted to speech or ASR training text), then normalizes the input. The system produces, using the normalized input and a dictionary configured for both automatic speech recognition and text-to-speech processing, output which is either phonemes corresponding to the input or text corresponding to the input for training the ASR system. When the output is phonemes corresponding to the input, the system generates speech by performing prosody generation and unit selection synthesis using the phonemes. When the output is text corresponding to the input, the system trains both an acoustic model and a language model for use in future speech recognition.
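The flow described in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and does not come from the patent: the `LEXICON`, `ABBREVIATIONS`, and `DIGITS` tables, the ARPAbet-style phoneme symbols, and the `normalize`, `to_phonemes`, and `process` functions. Prosody generation, unit selection synthesis, and model training are left out; the sketch only shows one normalizer and one dictionary feeding both branches.

```python
import re

# Hypothetical shared dictionary: one word-to-phoneme table for both paths.
LEXICON = {
    "doctor": ["D", "AA1", "K", "T", "ER0"],
    "smith": ["S", "M", "IH1", "TH"],
    "has": ["HH", "AE1", "Z"],
    "two": ["T", "UW1"],
    "cats": ["K", "AE1", "T", "S"],
}

# Hypothetical normalization tables, also shared by both paths.
ABBREVIATIONS = {"dr.": "doctor"}
DIGITS = {"2": "two"}


def normalize(text):
    """Lowercase, expand known abbreviations and digits, strip punctuation."""
    tokens = []
    for raw in text.lower().split():
        if raw in ABBREVIATIONS:
            tokens.append(ABBREVIATIONS[raw])
            continue
        word = re.sub(r"[^a-z0-9']", "", raw)
        if word:
            tokens.append(DIGITS.get(word, word))
    return tokens


def to_phonemes(tokens):
    """Look up each token in the shared dictionary (KeyError if unknown)."""
    return [phoneme for token in tokens for phoneme in LEXICON[token]]


def process(text, mode):
    """Run the shared normalizer, then branch on the kind of output wanted."""
    tokens = normalize(text)
    if mode == "tts":
        # Phonemes would go on to prosody generation and unit selection.
        return {"phonemes": to_phonemes(tokens)}
    if mode == "asr":
        # Normalized text would go on to acoustic and language model training.
        return {"training_text": " ".join(tokens)}
    raise ValueError("mode must be 'tts' or 'asr'")
```

Calling `process("Dr. Smith has 2 cats.", "asr")` and `process("Dr. Smith has 2 cats.", "tts")` passes the same token sequence through both branches, which is the property the single normalization step is meant to guarantee.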

20 Claims

1. A method comprising:
- receiving input;
- normalizing the input, to yield normalized input;
- generating, using the normalized input and a dictionary configured for both automatic speech recognition and text-to-speech processing, output comprising one of phonemes corresponding to the input and text corresponding to the input;
- when the output comprises the phonemes corresponding to the input, generating speech by performing prosody generation and unit selection synthesis using the phonemes; and
- when the output comprises the text corresponding to the input, training both an acoustic model and a language model for use in future speech recognition.

2. The method of claim 1, wherein the training of the acoustic model and the language model uses the phonemes.

3. The method of claim 2, wherein the acoustic model and the language model are used in generating future speech and recognizing future speech.

4. The method of claim 1, further comprising outputting the speech to a user as part of a dialog system.

5. The method of claim 1, wherein the dictionary comprises syllable boundaries.

6. The method of claim 1, wherein the dictionary comprises contextual rules.

7. The method of claim 6, wherein the output is generated based on the contextual rules.
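Claims 5 through 7 mention a dictionary that carries syllable boundaries and contextual rules. The claims do not say how such entries are stored, so the following is only one possible sketch. The entry layout, the `context_rules` field, the `next_starts_with_vowel` condition, and the `pronounce` function are all hypothetical. The example rule (pronouncing "the" with a different vowel before a vowel-initial word) is a familiar pattern in English, used here purely as an illustration.

```python
# Stress-free vowel symbols (ARPAbet style), used to test the next word.
VOWELS = {"AA", "AE", "AH", "AO", "EH", "ER", "IH", "IY", "OW", "UW"}

# Hypothetical dictionary: each entry stores phonemes grouped by syllable,
# plus optional rules that override the default based on context.
LEXICON = {
    "the": {
        "syllables": [["DH", "AH0"]],
        "context_rules": [
            {"next_starts_with_vowel": True, "syllables": [["DH", "IY0"]]},
        ],
    },
    "apple": {"syllables": [["AE1", "P"], ["AH0", "L"]]},
    "cat": {"syllables": [["K", "AE1", "T"]]},
}


def starts_with_vowel(word):
    """True if the word's default pronunciation begins with a vowel."""
    first = LEXICON[word]["syllables"][0][0]
    return first.rstrip("012") in VOWELS  # drop the stress digit


def pronounce(tokens):
    """Return syllabified phonemes per token, applying contextual rules."""
    result = []
    for i, word in enumerate(tokens):
        entry = LEXICON[word]
        syllables = entry["syllables"]
        following = tokens[i + 1] if i + 1 < len(tokens) else None
        for rule in entry.get("context_rules", []):
            if following is not None and (
                rule["next_starts_with_vowel"] == starts_with_vowel(following)
            ):
                syllables = rule["syllables"]
                break
        result.append(syllables)
    return result
```

Keeping the syllable grouping in the entry means the same lookup could serve a synthesizer that needs boundaries for prosody and a recognizer that only needs the flattened phoneme sequence.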

8. A system comprising:
- a processor; and
- a computer-readable storage medium having instructions stored which, when executed by the processor, cause the processor to perform operations comprising:
  - receiving input;
  - normalizing the input, to yield normalized input;
  - generating, using the normalized input and a dictionary configured for both automatic speech recognition and text-to-speech processing, output comprising one of phonemes corresponding to the input and text corresponding to the input;
  - when the output comprises the phonemes corresponding to the input, generating speech by performing prosody generation and unit selection synthesis using the phonemes; and
  - when the output comprises the text corresponding to the input, training both an acoustic model and a language model for use in future speech recognition.

9. The system of claim 8, wherein the training of the acoustic model and the language model uses the phonemes.

10. The system of claim 9, wherein the acoustic model and the language model are used in generating future speech and recognizing future speech.

11. The system of claim 8, the computer-readable storage medium having additional instructions stored which, when executed by the processor, cause the processor to perform operations comprising outputting the speech to a user as part of a dialog system.

12. The system of claim 8, wherein the dictionary comprises syllable boundaries.

13. The system of claim 8, wherein the dictionary comprises contextual rules.

14. The system of claim 13, wherein the output is generated based on the contextual rules.

15. A computer-readable storage device having instructions stored which, when executed by a computing device, cause the computing device to perform operations comprising:
- receiving input;
- normalizing the input, to yield normalized input;
- generating, using the normalized input and a dictionary configured for both automatic speech recognition and text-to-speech processing, output comprising one of phonemes corresponding to the input and text corresponding to the input;
- when the output comprises the phonemes corresponding to the input, generating speech by performing prosody generation and unit selection synthesis using the phonemes; and
- when the output comprises the text corresponding to the input, training both an acoustic model and a language model for use in future speech recognition.

16. The computer-readable storage device of claim 15, wherein the training of the acoustic model and the language model uses the phonemes.

17. The computer-readable storage device of claim 16, wherein the acoustic model and the language model are used in generating future speech and recognizing future speech.

18. The computer-readable storage device of claim 15, having additional instructions stored which, when executed by the computing device, cause the computing device to perform operations comprising outputting the speech to a user as part of a dialog system.

19. The computer-readable storage device of claim 15, wherein the dictionary comprises syllable boundaries.

20. The computer-readable storage device of claim 15, wherein the dictionary comprises contextual rules.
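The independent claims end with training a language model on the text output. The claims do not specify what kind of language model is trained, so the sketch below swaps in a plain bigram count model purely for illustration. The function names `train_bigram_counts` and `bigram_prob` are hypothetical, and the input sentences are assumed to have already passed through the shared normalization step (lowercased, abbreviations and digits expanded).

```python
from collections import Counter


def train_bigram_counts(normalized_sentences):
    """Count adjacent token pairs, with sentence start and end markers."""
    counts = Counter()
    for sentence in normalized_sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        counts.update(zip(tokens, tokens[1:]))
    return counts


def bigram_prob(counts, prev, word):
    """Unsmoothed maximum-likelihood estimate of P(word | prev)."""
    total = sum(c for (p, _), c in counts.items() if p == prev)
    return counts[(prev, word)] / total if total else 0.0
```

Because the training text and the synthesis text would pass through the same normalizer, a token such as "two" is counted under one spelling rather than being split between "2" and "two".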

Current Assignee: AT&T Intellectual Property I LP (AT&T, Inc.)
Original Assignee: AT&T Intellectual Property I LP (AT&T, Inc.)
Inventors: CONKIE, Alistair D.; GOLIPOUR, Ladan

Granted Patent: US 10,199,034 B2
US Class Current: 1/1
CPC Class Codes
- G10L 13/06: Elementary speech units use...
- G10L 13/08: Text analysis or generation...
- G10L 15/063: Training
- G10L 15/183: using context dependencies,...
- G10L 2015/025: Phonemes, fenemes or fenone...
