SYSTEM AND METHOD FOR DISTRIBUTED TEXT-TO-SPEECH SYNTHESIS AND INTELLIGIBILITY

US 20100268539A1
Filed: 04/21/2009
Published: 10/21/2010
Est. Priority Date: 04/21/2009
Status: Active Grant

Abstract

A method and system for distributed text-to-speech synthesis and intelligibility, and more particularly to distributed text-to-speech synthesis on handheld portable computing devices that can be used for example to generate intelligible audio prompts that help a user interact with a user interface of the handheld portable computing device. The text-to-speech distributed system 70 receives a text string from the guest devices and comprises a text analyzer 72, a prosody analyzer 74, a database 14 that the text analyzer and prosody analyzer refer to, and a speech synthesizer 80. Elements of the speech synthesizer 80 are resident on the host device and the guest device and an audio index representation of the audio file associated with the text string is produced at the host device and transmitted to the guest device for producing the audio file at the guest device.
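The host/guest split described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy inventories, the function names `host_select_units` and `guest_concatenate`, and the greedy longest-match selection are hypothetical and are not taken from the patent, which describes selection guided by a text analyzer and a prosody analyzer. The sketch only shows the general idea that the host sends a list of unit identifiers (the audio index representation) rather than audio, and that the guest rebuilds the audio from its own inventory holding the same units.

```python
# Hypothetical toy inventory: unit id -> (text label, audio samples).
# A real inventory would hold recorded speech units; these are placeholders.
HOST_INVENTORY = {
    0: ("hel", [0.1, 0.2]),
    1: ("lo", [0.3]),
    2: (" ", [0.0]),
    3: ("wor", [0.4, 0.5]),
    4: ("ld", [0.6]),
}

# Second inventory on the guest device, assumed to contain the same units.
GUEST_INVENTORY = dict(HOST_INVENTORY)


def host_select_units(text, inventory):
    """Return the audio index: a list of unit ids covering `text`.

    Uses greedy longest-match selection, an illustrative stand-in for
    the unit selection performed on the host.
    """
    # Try longer unit labels first so the longest match wins.
    labels = sorted(
        ((label, uid) for uid, (label, _) in inventory.items()),
        key=lambda pair: len(pair[0]),
        reverse=True,
    )
    index, pos = [], 0
    while pos < len(text):
        for label, uid in labels:
            if text.startswith(label, pos):
                index.append(uid)
                pos += len(label)
                break
        else:
            raise ValueError(f"no unit covers text at position {pos}")
    return index


def guest_concatenate(index, inventory):
    """Rebuild the audio by concatenating the units named in the index."""
    samples = []
    for uid in index:
        samples.extend(inventory[uid][1])
    return samples


# Host side: text string -> audio index representation.
audio_index = host_select_units("hello world", HOST_INVENTORY)
# Guest side: audio index representation -> audio samples.
audio = guest_concatenate(audio_index, GUEST_INVENTORY)
```

In this sketch only `audio_index` would cross the link between the devices, which is far smaller than the sample data it stands for.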

20 Claims

1. A method for creating an audio index representation of an audio file from text input in a form of a text string and producing the audio file from the audio index representation, the method comprising:
- receiving the text string;
  
  converting the text string to an audio index representation of an audio file associated with the text string at a text-to-speech synthesizer, the converting including selecting at least one audio unit from a first audio unit synthesis inventory having a plurality of audio units, the selected at least one audio unit forming the audio file;
  
  representing the selected at least one audio unit with the audio index representation; and
  
  reproducing the audio file by concatenating the audio units identified in the audio index representation from the first audio unit inventory or a second audio unit synthesis inventory having the audio units identified in the audio index representation.
- Dependent claims (2, 3, 4, 5, 6, 7, 8):
  - 2. The method of claim 1 wherein converting the text string to an audio index representation of the audio file associated with the text string is on a host device.
  - 3. The method of claim 2 wherein reproducing the audio file by concatenating the audio units is on a guest device.
  - 4. The method of claim 1 wherein converting the text string to the audio index representation of an audio file associated with the text string further comprises analyzing the text string with a text analyzer.
  - 5. The method of claim 1 wherein converting the text string to the audio index representation of an audio file associated with the text string further comprises analyzing the text string with a prosody analyzer.
  - 6. The method of claim 1 wherein selecting the at least one audio unit from the first audio unit synthesis inventory having a plurality of audio units comprises matching audio units from speech corpus and text corpus of the first audio unit synthesis inventory.
  - 7. The method of claim 1 wherein the audio file generates intelligible and natural-sounding speech.
  - 8. The method of claim 7 wherein the intelligible and natural-sounding speech is generated using reproduction of competing voices.

9. A method for distributed text-to-speech synthesis comprising:
- receiving text input in a form of a text string at a host device from a separate source;
  
  creating an audio index representation of an audio file from the text string on the host device and producing the audio file on a guest device from the audio index representation, the creating of the audio index representation including converting the text string to an audio index representation of an audio file associated with the text string at a text-to-speech synthesizer, the converting including selecting at least one audio unit from a first audio unit synthesis inventory having a plurality of audio units, the selected at least one audio unit forming the audio file;
  
  representing the selected at least one audio unit with the audio index representation; and
  
  producing the audio file from the audio index representation including reproducing the audio file by concatenating the audio units identified in the audio index representation from the first audio unit synthesis inventory or a second audio unit synthesis inventory having the audio units identified in the audio index representation.
- Dependent claims (10, 11, 12, 13, 14):
  - 10. The method of claim 9 wherein converting the text string to the audio index representation of an audio file associated with the text string further comprises analyzing the text string with a text analyzer.
  - 11. The method of claim 9 wherein converting the text string to the audio index representation of an audio file associated with the text string further comprises analyzing the text string with a prosody analyzer.
  - 12. The method of claim 9 wherein selecting at least one audio unit from the first audio unit synthesis inventory having a plurality of audio units comprises matching audio units from speech corpus and text corpus of the unit synthesis inventory.
  - 13. The method of claim 9 wherein the audio file generates intelligible and natural-sounding speech.
  - 14. The method of claim 13 wherein the intelligible and natural-sounding speech is generated using reproduction of competing voices.

15. A system for distributed text-to-speech synthesis comprising:
- a guest device configured for sending text input in the form of a text string to a host device for converting the text string to an audio index representation of an audio file associated with the text string, the converting at the host system including selecting at least one audio unit from an audio unit synthesis inventory having a plurality of audio units and wherein the guest device further comprises;
  
  a unit-concatenative module and a second inventory of synthesis units, the unit-concatenative module configured for producing the audio file from the audio index representation by concatenating the audio units identified in the audio index representation from the first audio unit synthesis inventory or a second audio unit synthesis inventory having the audio units identified in the audio index representation.
- Dependent claims (16, 17, 18, 19):
  - 16. The system as recited in claim 15 further comprising:
    - the host device, wherein the host device and the guest device are in communication with each other, the host device adapted to receive a text input in a form of text string from either the guest device or any other source;
      
      the host device having a unit-selection module configured to create an audio index representation of an audio file from the text string on the host device and to convert the text string to an audio index representation of an audio file associated with the text string at a text-to-speech synthesizer, the unit-selection module being arranged to select at least one audio unit from an audio unit inventory having a plurality of audio units, the selected at least one audio unit forming the audio file, the selected at least one audio unit being represented by the audio index representation.
  - 17. The system of claim 15 wherein the audio file generates intelligible and natural-sounding speech.
  - 18. The system of claim 15 wherein the intelligible and natural-sounding speech is generated using reproduction of competing voices.
  - 19. The system of claim 15 wherein the guest device is a portable handheld device.

20. A host system for creating an audio index representation of an audio file from a text input in a form of text string and producing the audio file from the audio index representation, the method comprising:
- a text-to-speech synthesizer for receiving a text string and converting the text string to an audio index representation of an audio file associated with the text string at a text-to-speech synthesizer, the text-to-speech synthesizer comprising a unit-selection unit and an audio unit inventory having a plurality of audio units, the unit-selection unit for selecting at least one audio unit from the audio unit inventory, the selected at least one audio unit forming the audio file, and representing the selected at least one audio unit with the audio index representation, for reproduction of the audio file by concatenating the audio units identified in the audio index representation from the audio unit inventory or another audio unit synthesis inventory having the audio units identified in the audio index representation.

Current Assignee: Creative Technology Ltd.
Original Assignee: Creative Technology Ltd.
Inventors: Xu, Jun; Lee, Teck Chee
Granted Patent: US 9,761,219 B2
US Class Current: 704/260
CPC Class Codes:
G10L 13/04   Details of speech synthesis...
G10L 13/07   Concatenation rules
G10L 13/08   Text analysis or generation...
