Distributed speech unit inventory for TTS systems

US 9,159,314 B2
Filed: 01/14/2013
Issued: 10/13/2015
Est. Priority Date: 01/14/2013
Status: Active Grant

Abstract

In a text-to-speech (TTS) system, a database including sample speech units for unit selection may be configured for use by a local device. The local unit database may be created from a more comprehensive unit database. The local unit database may include units which provide sufficient TTS results for frequently input text. Speech synthesis may then be performed by concatenating locally available units with units from a remote device including the comprehensive unit database. Aspects of the speech synthesis may be performed by the remote device and/or the local device.
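
The abstract's idea of deriving a local unit database from a more comprehensive one can be illustrated with a minimal sketch. It assumes that units are keyed by a string ID, that a log of previously requested unit IDs is available, and that the local device has a fixed unit budget. The function name `build_local_inventory` and the parameters `usage_log` and `max_units` are hypothetical and do not come from the patent.

```python
from collections import Counter


def build_local_inventory(full_inventory, usage_log, max_units):
    """Keep the max_units most frequently requested units (illustrative policy only).

    full_inventory: dict mapping unit ID -> audio bytes (the comprehensive database)
    usage_log: iterable of unit IDs requested in the past (hypothetical input)
    """
    # Count only units that actually exist in the comprehensive database.
    counts = Counter(u for u in usage_log if u in full_inventory)
    chosen = [unit_id for unit_id, _ in counts.most_common(max_units)]
    return {unit_id: full_inventory[unit_id] for unit_id in chosen}


# Toy data: diphone-like IDs with placeholder audio bytes.
full = {"h-e": b"\x01", "e-l": b"\x02", "l-o": b"\x03", "o-w": b"\x04"}
log = ["h-e", "e-l", "h-e", "l-o", "e-l", "h-e", "zz"]
local = build_local_inventory(full, log, max_units=2)
```

Frequency of use is only one of the criteria the claims mention; storage configuration, user preference, or desired quality could replace or weight the ranking in the same structure.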

25 Claims

1. A computing device for performing text-to-speech (TTS) processing, comprising:
- at least one processor;
  
  a memory device including instructions operable to be executed by the at least one processor to perform a set of actions, configuring the at least one processor;
  
  to access a local database of speech units to be used in unit selection speech synthesis, wherein the local database is comprised from a larger database of speech units;
  
  to receive text data for TTS processing;
  
  to determine desired speech units to synthesize the received text data;
  
  to identify first desired speech units in the local database;
  
  to determine the second desired speech units are not in the local database;
  
  to determine that the second desired speech units are in the larger database located at a remote device;
  
  to receive the second desired speech units;
  
  to concatenate audio segments corresponding to the first desired speech units in the local database and audio segments corresponding to the second desired speech units; and
  
  to output audio data comprising speech corresponding to the received text data.
- Dependent claims (2, 3, 4, 5)
  - 2. The computing device of claim 1, wherein the local unit database is configured based at least in part on a desired TTS result quality, storage configuration of the device, user preference, frequency of use of units in the local unit database, or frequency of TTS activity of the device.
  - 3. The computing device of claim 1, wherein the local unit database is configured based at least in part on a desired level of network or processing activity of the remote device.
  - 4. The computing device of claim 1, wherein identifying the second desired speech units comprises comparing the desired speech units with a list of remotely available speech units.
  - 5. The computing device of claim 1, wherein the local unit database comprises at least one example of each available speech unit.
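
The flow recited in claim 1 (look up desired units locally, check the remainder against the remotely available units as in claim 4, receive them, then concatenate) might be sketched as follows. This is a simplified illustration, not the patented implementation: real unit selection scores candidates with target and join costs, whereas this sketch assumes exact lookup by unit ID. The names `synthesize`, `remote_index`, and `fetch_remote` are hypothetical.

```python
def synthesize(desired_units, local_db, remote_index, fetch_remote):
    """Concatenate audio for desired_units, using local units where possible.

    desired_units: ordered list of unit IDs for the input text
    local_db: dict mapping unit ID -> audio bytes held on the device
    remote_index: set of unit IDs known to be available remotely
    fetch_remote: callable taking a list of unit IDs, returning {id: audio bytes}
    """
    # Units that are not in the local database.
    missing = [u for u in desired_units if u not in local_db]

    # Confirm the missing units are listed as remotely available.
    unavailable = [u for u in missing if u not in remote_index]
    if unavailable:
        raise KeyError(f"units not available locally or remotely: {unavailable}")

    # Request each missing unit once, preserving first-seen order.
    fetched = fetch_remote(list(dict.fromkeys(missing))) if missing else {}

    # Concatenate segments in the order the text requires.
    return b"".join(
        local_db[u] if u in local_db else fetched[u] for u in desired_units
    )


# Toy data standing in for a device and a remote server.
local_db = {"h-e": b"HE", "l-o": b"LO"}
remote_db = {"h-e": b"HE", "e-l": b"EL", "l-o": b"LO"}


def fetch(unit_ids):
    return {i: remote_db[i] for i in unit_ids}


audio = synthesize(["h-e", "e-l", "l-o"], local_db, set(remote_db), fetch)
```

Passing the fetch step in as a callable keeps the sketch independent of any particular network transport.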

6. A method comprising:
- receiving text data for text-to-speech processing;
  
  determining first desired speech units and second desired speech units from the received text data;
  
  determining that a local database does not include the first desired speech units;
  
  receiving first audio segments corresponding to the first desired speech units from a remote database;
  
  receiving second audio segments corresponding to the second desired speech units from the local database; and
  
  creating audio corresponding to the received text data using the first audio segments and the second audio segments.
- Dependent claims (7, 8, 9, 10, 11, 12, 13)
  - 7. The method of claim 6, further comprising identifying the first audio segments and second audio segments by a local device.
  - 8. The method of claim 6, further comprising identifying the first audio segments and second audio segments by a remote device.
  - 9. The method of claim 6, wherein the local database is comprised from speech units selected from the remote database.
  - 10. The method of claim 6, further comprising reconfiguring the local database after creating the audio.
  - 11. The method of claim 10, wherein the reconfiguring comprises removing speech units from the local database.
  - 12. The method of claim 10, wherein the reconfiguring is based at least in part on a user preference, a network load, a storage configuration of a local device, an application operated by a user, desired inclusion of foreign speech units, and/or desired speech synthesis quality.
  - 13. The method of claim 10, wherein the reconfiguring is based at least in part on a frequency of use of at least one speech unit.
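
Claims 10, 11, and 13 describe reconfiguring the local database after synthesis, including removing units and taking frequency of use into account. One way this could look, under the assumption of a fixed capacity and a least-used-first removal policy, is sketched below. The class `LocalUnitDatabase` and its methods are hypothetical names chosen for illustration.

```python
from collections import Counter


class LocalUnitDatabase:
    """Toy local unit store with frequency-based reconfiguration."""

    def __init__(self, capacity):
        self.capacity = capacity      # maximum number of units to keep (assumed)
        self.units = {}               # unit ID -> audio bytes
        self.use_counts = Counter()   # unit ID -> times used in synthesis

    def add(self, unit_id, audio):
        self.units[unit_id] = audio

    def record_use(self, unit_ids):
        # Call after each synthesis with the units that were used.
        self.use_counts.update(unit_ids)

    def reconfigure(self):
        """Remove the least-used units until the store fits its capacity."""
        excess = len(self.units) - self.capacity
        if excess <= 0:
            return []
        # Stable sort: ties keep insertion order, so older units go first.
        by_use = sorted(self.units, key=lambda u: self.use_counts[u])
        removed = by_use[:excess]
        for unit_id in removed:
            del self.units[unit_id]
        return removed


db = LocalUnitDatabase(capacity=2)
db.add("a", b"A")
db.add("b", b"B")
db.add("c", b"C")
db.record_use(["a", "c", "a"])
removed = db.reconfigure()
```

The other factors listed in claim 12, such as network load or user preference, could feed into the sort key instead of the raw use count.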

14. A computing device, comprising:
- at least one processor;
  
  a memory device including instructions operable to be executed by the at least one processor to perform a set of actions, configuring the at least one processor;
  
  to receive text data for text-to-speech processing;
  
  to determine first desired speech units and second desired speech units from the received text data;
  
  to determine that a local database does not include the first desired speech units;
  
  to identify the first desired speech units in a remote database for use in synthesizing the received text data;
  
  to identify the second desired speech units in the local database for use in synthesizing the received text data;
  
  to send first audio segments corresponding to the first desired speech units to a local device comprising the local database; and
  
  to send instructions to the local device to concatenate the first audio segments with second audio segments corresponding to the second desired speech units stored at the local device.
- Dependent claims (15, 16, 17, 18, 19)
  - 15. The computing device of claim 14, wherein the local database is comprised from speech units selected from the remote database.
  - 16. The computing device of claim 14, wherein the at least one processor is further configured to reconfigure the local database after performing the concatenation.
  - 17. The computing device of claim 16, wherein the at least one processor is further configured to remove speech units from the local database.
  - 18. The computing device of claim 16, wherein the at least one processor is configured to reconfigure based at least in part on a user preference, a network load, a storage configuration of a local device, an application operated by a user, desired inclusion of foreign speech units, and/or desired speech synthesis quality.
  - 19. The computing device of claim 16, wherein the at least one processor is configured to reconfigure based at least in part on a frequency of use of at least one speech unit.
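
Claim 14 places the unit lookup on the remote device, which sends the missing audio segments together with instructions for concatenation. A minimal sketch of such a response is shown below, assuming the remote device already knows which unit IDs the local device holds. The function `plan_synthesis` and the response layout (`"segments"`, `"concatenate"`) are hypothetical and are not specified by the patent.

```python
def plan_synthesis(desired_units, local_unit_ids, remote_db):
    """Build a response for the local device (illustrative format only).

    desired_units: ordered list of unit IDs needed for the text
    local_unit_ids: set of unit IDs the local device is assumed to hold
    remote_db: dict mapping unit ID -> audio bytes on the remote device
    """
    segments = {}   # audio the local device lacks, to be sent over the network
    order = []      # concatenation instructions, one step per unit
    for unit_id in desired_units:
        if unit_id in local_unit_ids:
            order.append(("local", unit_id))
        else:
            segments[unit_id] = remote_db[unit_id]
            order.append(("sent", unit_id))
    return {"segments": segments, "concatenate": order}


response = plan_synthesis(
    ["h-e", "e-l", "l-o"],
    local_unit_ids={"h-e", "l-o"},
    remote_db={"e-l": b"EL"},
)
```

The local device would then walk the `"concatenate"` list, taking each segment from its own database or from the received `"segments"` as indicated.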

20. A non-transitory computer-readable storage medium storing processor-executable instructions for controlling a computing device, comprising:
- program code to receive text data for text-to-speech processing;
  
  program code to determine first desired speech units and second desired speech units from the received text data;
  
  program code to determine that a local database does not include the first desired speech units;
  
  program code to identify the first desired speech units in a remote database for use in synthesizing the received text data;
  
  program code to identify the second desired speech units in the local database for use in synthesizing the received text data;
  
  program code to send first audio segments corresponding to the first desired speech units to a local device comprising the local database; and
  
  program code to send instructions to the local device to concatenate the first audio segments with second audio segments corresponding to the second desired speech units stored at the local device.
- Dependent claims (21, 22, 23, 24, 25)
  - 21. The non-transitory computer-readable storage medium of claim 20, wherein the local database is comprised from speech units selected from the remote database.
  - 22. The non-transitory computer-readable storage medium of claim 20, further comprising program code to reconfigure the local database after performing the speech synthesis.
  - 23. The non-transitory computer-readable storage medium of claim 22, wherein the program code to reconfigure comprises program code to remove speech units from the local database.
  - 24. The non-transitory computer-readable storage medium of claim 22, wherein the program code to reconfigure is based at least in part on a user preference, a network load, a storage configuration of a local device, an application operated by a user, desired inclusion of foreign speech units, and/or desired speech synthesis quality.
  - 25. The non-transitory computer-readable storage medium of claim 22, wherein the program code to reconfigure is based at least in part on a frequency of use of at least one speech unit.

Current Assignee: Amazon Technologies, Inc. (Amazon.com, Inc.)
Original Assignee: Amazon Technologies, Inc. (Amazon.com, Inc.)
Inventors: Osowski, Lukasz M.; Kaszczuk, Michal T.
Primary Examiner(s): MCFADDEN, SUSAN IRIS
Application Number: US13/740,762
Publication Number: US 20140200894A1
Time in Patent Office: 1,002 Days
Field of Search: 704/260
US Class Current: 1/1
CPC Class Codes:
G10L 13/04   Details of speech synthesis...
G10L 13/047   Architecture of speech synt...
G10L 13/08   Text analysis or generation...
