Text-to-speech pre-processing

US 10,366,686 B2
Filed: 09/26/2017
Issued: 07/30/2019
Est. Priority Date: 09/26/2017
Status: Active Grant

Abstract

Examples of techniques for text-to-speech pre-processing for speech recognition and speech synthesis are disclosed. In one example implementation, a computer-implemented method includes receiving, by a processing device, an automated speech recognition output comprising an n-best list and associated confidence scores. The method further includes performing, by the processing device, a TTS pre-processing on the n-best list and associated confidence scores to generate a read back message, wherein the read back message comprises a read back instruction. The method further includes sending, by the processing device, the read back message to a TTS speech synthesizer for generating an audible signal based on the read back message to cause an audio device to present the read back message.
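The three steps in the abstract (receive the ASR output, pre-process it into a read back message, send it to a synthesizer) can be sketched roughly as follows. This is a minimal illustration only, not the patented implementation: the names `AsrOutput`, `ReadBackMessage`, `preprocess`, and `read_back`, the 0.5 confidence threshold, and the 400 ms pause are all assumptions introduced here, and the synthesizer is a plain callable standing in for a real TTS engine.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class AsrOutput:
    """Hypothetical container for an n-best list and its confidence scores."""
    n_best: list[str]
    confidences: list[float]


@dataclass
class ReadBackMessage:
    """Hypothetical read back message: text plus read back instructions."""
    text: str
    instructions: list[dict] = field(default_factory=list)


def preprocess(asr: AsrOutput, low_conf: float = 0.5) -> ReadBackMessage:
    # Pick the hypothesis with the highest confidence score.
    best = max(range(len(asr.n_best)), key=lambda i: asr.confidences[i])
    message = ReadBackMessage(text=asr.n_best[best])
    # Assumed policy: attach a pause instruction when confidence is low,
    # giving the listener time to notice a possible misrecognition.
    if asr.confidences[best] < low_conf:
        message.instructions.append({"type": "pause", "ms": 400})
    return message


def read_back(asr: AsrOutput,
              synthesize: Callable[[ReadBackMessage], None]) -> ReadBackMessage:
    # Pre-process, then hand the message to the (stubbed) synthesizer.
    message = preprocess(asr)
    synthesize(message)
    return message
```

Passing the synthesizer in as a callable keeps the sketch testable without audio hardware; a real system would replace it with a call into its TTS engine.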

15 Claims

1. A computer-implemented method for text-to-speech (TTS) pre-processing, the method comprising:
- receiving, by a processing device, an automated speech recognition output comprising an n-best list and associated confidence scores;
  
  receiving, by the processing device, a user model from a model database, the user model containing proper names, favorite places, and user-specified vocabulary;
  
  receiving, by the processing device, a context model from the model database, the context model containing text or query history information, location context information, and date and time context information;
  
  performing, by the processing device, a TTS pre-processing on the n-best list and associated confidence scores to generate a read back message, wherein the read back message comprises a read back instruction, wherein performing the TTS pre-processing further comprises aligning n-best list items on the n-best list, parsing the n-best list items, and identifying strong words and weak words using the associated confidence scores, the user model, the context model, parsing results from parsing the n-best list items, and an n-best list alignment including repetitions across the n-best list items; and
  
  sending, by the processing device, the read back message to a TTS speech synthesizer for generating an audible signal based on the read back message to cause an audio device to present the read back message.
  - 2. The computer-implemented method of claim 1, wherein the read back instruction comprises a pause instruction.
  - 3. The computer-implemented method of claim 1, wherein the read back instruction comprises an enunciation instruction.
  - 4. The computer-implemented method of claim 1, wherein the read back instruction comprises an intonation instruction.
  - 5. The computer-implemented method of claim 1, wherein the read back instruction comprises a volume instruction.
  - 6. The computer-implemented method of claim 1, wherein performing the TTS pre-processing further comprises comparing a user pronunciation to a default pronunciation and detecting and marking mismatches.
  - 7. The computer-implemented method of claim 6, wherein performing the TTS pre-processing further comprises altering and adapting the read back message by adding intonation information, pause information, volume information, and enunciation information to the read back message and switching words within the read back message to user pronunciation if a mismatch is marked.
  - 8. The computer-implemented method of claim 1, further comprising:
    - enabling, by the processing device, a user to alter a word or a phrase in the read back message while the audio device presents the read back message.
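One way to picture the strong word and weak word identification recited in claim 1 is sketched below. It is an assumption-laden illustration, not the claimed method: alignment is approximated with `difflib.SequenceMatcher` against the top hypothesis, "repetitions across the n-best list items" is reduced to the fraction of hypotheses that repeat a word at its aligned position, the user model is reduced to a set of lower-cased vocabulary words, and the parsing results and context model are left out entirely. The function names and both thresholds are hypothetical.

```python
from difflib import SequenceMatcher


def align_support(n_best):
    """Fraction of hypotheses that repeat each word of the top hypothesis."""
    top = n_best[0].split()
    counts = [1] * len(top)  # the top hypothesis agrees with itself
    for hypothesis in n_best[1:]:
        tokens = hypothesis.split()
        matcher = SequenceMatcher(a=top, b=tokens, autojunk=False)
        for block in matcher.get_matching_blocks():
            for i in range(block.a, block.a + block.size):
                counts[i] += 1
    return [count / len(n_best) for count in counts]


def classify_words(n_best, confidences, user_vocab,
                   min_conf=0.6, min_support=0.7):
    """Label each word of the top hypothesis as 'strong' or 'weak'."""
    top = n_best[0].split()
    labels = []
    for word, support in zip(top, align_support(n_best)):
        # Assumed rule: user-model words are trusted outright; other words
        # need a confident hypothesis and enough repetition across the list.
        in_user_model = word.lower() in user_vocab
        strong = in_user_model or (
            confidences[0] >= min_conf and support >= min_support
        )
        labels.append((word, "strong" if strong else "weak"))
    return labels


labels = classify_words(
    ["call Ute Winter", "call you Winter", "call Ute winner"],
    [0.8, 0.5, 0.4],
    user_vocab={"ute"},
)
```

In this example "Winter" is repeated in only two of three hypotheses, so it falls below the assumed support threshold and is labeled weak, while "Ute" is kept strong because it appears in the user vocabulary.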

9. A system for text-to-speech (TTS) pre-processing, the system comprising:
- a memory comprising computer readable instructions; and
  
  a processing device for executing the computer readable instructions for performing a method, the method comprising:
  
  receiving, by the processing device, an automated speech recognition output comprising an n-best list and associated confidence scores;
  
  receiving, by the processing device, a user model from a model database, the user model containing proper names, favorite places, and user-specified vocabulary;
  
  receiving, by the processing device, a context model from the model database, the context model containing text or query history information, location context information, and date and time context information;
  
  performing, by the processing device, the TTS pre-processing on the n-best list and associated confidence scores to generate a read back message, wherein the read back message comprises a read back instruction, wherein performing the TTS pre-processing further comprises aligning n-best list items on the n-best list, parsing the n-best list items, and identifying strong words and weak words using the associated confidence scores, the user model, the context model, parsing results from parsing the n-best list items, and an n-best list alignment including repetitions across the n-best list items; and
  
  sending, by the processing device, the read back message to a TTS speech synthesizer for generating an audible signal based on the read back message to cause an audio device to present the read back message.
  - 10. The system of claim 9, wherein the read back instruction comprises a pause instruction indicating a length of time of a pause.
  - 11. The system of claim 9, wherein the read back instruction comprises an enunciation instruction presented as bold text that represents text to be read back with more enunciation than non-bold text.
  - 12. The system of claim 9, wherein the read back instruction comprises an intonation instruction as a visual indicator, the visual indicator being one of an up arrow or a down arrow, wherein the up arrow denotes an increase in intonation, and wherein the down arrow denotes a decrease in intonation.
  - 13. The system of claim 9, wherein the read back instruction comprises a volume instruction.
  - 14. The system of claim 9, wherein performing the TTS pre-processing further comprises:
    - aligning n-best list items on the n-best list;
      
      parsing the n-best list items;
      
      identifying strong words and weak words using the associated confidence scores, a user model, a context model, parsing results from parsing the n-best list items, and an n-best list alignment including repetitions across the n-best list items;
      
      comparing a user pronunciation to a default pronunciation and detecting and marking mismatches; and
      
      altering and adapting the read back message by adding intonation information, pause information, volume information, and enunciation information to the read back message and switching words within the read back message to user pronunciation if a mismatch is marked.
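The pronunciation comparison in claims 6, 7, and 14 (compare a user pronunciation to a default pronunciation, mark mismatches, and switch to the user pronunciation where a mismatch is marked) might look something like the sketch below. The lexicon format (a dictionary from lower-cased word to a phoneme string), the phoneme strings themselves, and the name `mark_mismatches` are assumptions made for illustration; the claims do not specify any of them.

```python
def mark_mismatches(words, default_lexicon, user_lexicon):
    """Mark words whose user pronunciation differs from the default one."""
    marked = []
    for word in words:
        key = word.lower()
        default = default_lexicon.get(key)
        user = user_lexicon.get(key)
        # A mismatch exists only when the user model has its own entry
        # and that entry differs from the default pronunciation.
        mismatch = user is not None and user != default
        marked.append({
            "word": word,
            # Switch to the user pronunciation if a mismatch is marked.
            "pronunciation": user if mismatch else default,
            "mismatch": mismatch,
        })
    return marked


# Illustrative, made-up phoneme strings.
default_lexicon = {"call": "K AO L", "ute": "Y UW T"}
user_lexicon = {"ute": "UW T AH"}
marked = mark_mismatches(["call", "Ute"], default_lexicon, user_lexicon)
```

Words absent from both lexicons come back with a pronunciation of `None`, which leaves the decision to the synthesizer's own defaults.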

15. A computer program product for text-to-speech (TTS) pre-processing, the computer program product comprising:
- a computer readable storage medium having program instructions embodied therewith, wherein the computer readable storage medium is not a transitory signal per se, the program instructions executable by a processing device to cause the processing device to perform a method comprising:
  
  receiving, by the processing device, an automated speech recognition output comprising an n-best list and associated confidence scores;
  
  performing, by the processing device, the TTS pre-processing on the n-best list and associated confidence scores to generate a read back message, wherein the read back message comprises a read back instruction; and
  
  sending, by the processing device, the read back message to a TTS speech synthesizer for generating an audible signal based on the read back message to cause an audio device to present the read back message, wherein the read back instruction comprises a pause instruction indicating a length of time of a pause, an enunciation instruction presented as bold text that represents text to be read back with more enunciation than non-bold text, an intonation instruction as a visual indicator, the visual indicator being one of an up arrow or a down arrow, wherein the up arrow denotes an increase in intonation, and wherein the down arrow denotes a decrease in intonation, and a volume instruction, the volume instruction indicating a volume level based on a noise level detected via a microphone in a vehicle, and wherein the volume level is adjusted based on a signal-to-noise ratio.
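The read back instructions listed in claim 15 (a timed pause, bold text for enunciation, up and down arrows for intonation, and a noise-dependent volume) can be illustrated with a small rendering sketch. Everything concrete here is an assumption rather than something the claim specifies: the `**word**` notation for bold, the `[pause Nms]` notation, the per-token dictionary keys, and the rule that the playback level is the measured noise level plus a target signal-to-noise ratio, clamped to a fixed range.

```python
def volume_for_noise(noise_db, target_snr_db=15.0, min_db=50.0, max_db=85.0):
    """Playback level that keeps an assumed target SNR over cabin noise."""
    return max(min_db, min(max_db, noise_db + target_snr_db))


def render_read_back(tokens, noise_db):
    """Render annotated tokens into marked-up text plus a volume level."""
    parts = []
    for token in tokens:
        text = token["word"]
        if token.get("enunciate"):
            text = f"**{text}**"  # bold: read back with more enunciation
        if token.get("intonation") == "up":
            text += "↑"  # up arrow: increase in intonation
        elif token.get("intonation") == "down":
            text += "↓"  # down arrow: decrease in intonation
        parts.append(text)
        if token.get("pause_ms"):
            parts.append(f"[pause {token['pause_ms']}ms]")  # timed pause
    return {"text": " ".join(parts), "volume_db": volume_for_noise(noise_db)}


message = render_read_back(
    [
        {"word": "call"},
        {"word": "Ute", "enunciate": True, "pause_ms": 300},
        {"word": "Winter", "intonation": "up"},
    ],
    noise_db=60.0,
)
```

With the assumed defaults, a 60 dB noise reading yields a 75 dB playback level, and readings above 70 dB are capped at the 85 dB ceiling.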

Current Assignee: GM Global Technology Operations LLC (General Motors Company)
Original Assignee: GM Global Technology Operations LLC (General Motors Company)
Inventors: Winter, Ute; Grost, Timothy
Primary Examiner(s): Sharma, Neeraj
Application Number: US15/715,695
Publication Number: US 20190096387A1
Time in Patent Office: 672 Days
Field of Search: None
US Class Current:
CPC Class Codes:
G10L 13/04   Details of speech synthesis...
G10L 13/08   Text analysis or generation...
G10L 15/08   Speech classification or se...
G10L 15/22   Procedures used during a sp...
G10L 2015/221   Announcement of recognition...
