Text-to-speech synthesis system

US 7,523,036 B2
Filed: 03/21/2006
Issued: 04/21/2009
Est. Priority Date: 06/01/2001
Status: Expired due to Fees

Abstract

The present invention is intended to provide a text-to-speech synthesis apparatus, including a storage for storing phoneme data of a plurality of speakers; a selector for selecting one of the plurality of speakers in accordance with an operation performed by a user; a searcher for searching the storage for phoneme data of the speaker selected by the selector; a text-to-speech synthesis processor for linking the phoneme data of the speaker retrieved by the searcher to convert input data into a synthetic speech; and a fee-charge controller for controlling a fee-charge operation for the user in accordance with the phoneme data selected by the selector. Consequently, the user can perform text-to-speech synthesis on the desired input data such as drama data by use of the obtained phoneme data.

50 Claims

1. A text-to-speech synthesis apparatus comprising:
- storage means for storing phoneme data of a plurality of speaker voices;
  
  selecting means for selecting at least two speaker voices from said plurality of speaker voices;
  
  searching means for searching said storage means for phoneme data of the speaker voices selected by said selecting means; and
  
  text-to-speech synthesis processing means for linking said phoneme data of said speaker voices retrieved by said searching means to convert input data into a synthetic speech;
  
  wherein said text-to-speech synthesis processing means can convert said input data into a synthetic speech including at least two speaker voices.
  - 2. The text-to-speech synthetic apparatus according to claim 1, wherein said input data includes at least two parts and said selecting means selects speaker voices corresponding to each part of said input data.
  - 3. The text-to-speech synthetic apparatus according to claim 2, wherein said input data includes conversation parts of at least two characters and selecting means selects speaker voices corresponding to each character in the conversation parts.
  - 4. The text-to-speech synthetic apparatus according to claim 1, wherein said speaker voices includes at least one voice of celebrities, entertainers, actors, actresses, voice actors, politicians, characters of movies or animations.
  - 5. The text-to-speech synthetic apparatus according to claim 1, further comprising:
    - fee-charge control means for controlling a fee-charge operation for the said user in accordance with said phoneme data selected by said selecting means, wherein said fee-charge control means sends, to an external settlement center, fee-charge data corresponding to said phoneme data selected by said selecting means.
  - 6. The text-to-speech synthesis apparatus according to claim 1, wherein said storage means stores prosody data for each of said plurality of speaker voices, said searching means searches for said prosody data along with said phoneme data of the speaker voice selected by said selecting means, and said text-to-speech synthesis processing means converts said input data into a synthetic speech on a basis of said searched phoneme data and said prosody data.
  - 7. The text-to-speech synthesis apparatus according to claim 1, wherein said input data is at least one of voice data and text data.
  - 8. The text-to-speech synthesis apparatus according to claim 1, further comprising input means for directly inputting said input data.
  - 9. The text-to-speech synthesis apparatus according to claim 1, further comprising communication means for receiving said input data via a network.
  - 10. The text-to-speech synthesis apparatus according to claim 1, wherein said storage means stores script data and said text-to-speech synthesis processing means links said phoneme data of said speaker voices searched by said searching means to convert said script data into a synthetic speech.
  - 11. The text-to-speech synthesis apparatus according to claim 10, wherein said storage means stores said script data in a classified manner and said selecting means selects said script data along with said one speaker.
  - 12. The text-to-speech synthesis apparatus according to claim 10, wherein said script data is at least one of voice data and text data.
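
The arrangement recited in claims 1 to 3 can be illustrated with a minimal sketch. It is not the patented implementation: the store layout, the `Part` type, the function names, and the use of tagged strings in place of audio units are all assumptions made for illustration, and "linking" is modeled as simple concatenation.

```python
from dataclasses import dataclass

# Hypothetical stand-in for the stored phoneme data: each speaker voice maps
# a phoneme symbol to one unit of audio (represented here by a tagged string).
PHONEME_STORE = {
    "voice_a": {"HH": "a:HH", "AY": "a:AY", "B": "a:B"},
    "voice_b": {"HH": "b:HH", "AY": "b:AY", "B": "b:B"},
}


@dataclass
class Part:
    character: str   # who speaks this part of the input data
    phonemes: list   # phoneme symbols already derived from the text


def select_voices(casting):
    """Selecting step: one speaker voice per character, at least two overall."""
    if len(set(casting.values())) < 2:
        raise ValueError("at least two speaker voices must be selected")
    return casting


def search_phoneme_data(voice):
    """Searching step: look up the stored phoneme data of one speaker voice."""
    return PHONEME_STORE[voice]


def synthesize(parts, casting):
    """Link the retrieved phoneme units, part by part, into one output."""
    speech = []
    for part in parts:
        units = search_phoneme_data(casting[part.character])
        speech.extend(units[p] for p in part.phonemes)
    return speech


parts = [Part("Alice", ["HH", "AY"]), Part("Bob", ["B", "AY"])]
casting = select_voices({"Alice": "voice_a", "Bob": "voice_b"})
speech = synthesize(parts, casting)
```

Here `speech` interleaves units from both voices in the order of the conversation parts, which is the sense in which the output includes at least two speaker voices.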

13. A text-to-speech synthesis apparatus comprising:
- selecting means for selecting at least two speaker voices;
  
  transmitting means for transmitting speaker voice identification data for identifying said speaker voices selected by said selecting means to another apparatus;
  
  receiving means for receiving phoneme data of said speaker voices corresponding to said speaker voice identification data transmitted from said transmitting means; and
  
  text-to-speech synthesis processing means for linking said phoneme data of said speaker voices received by said receiving means to convert input data into a synthetic speech, wherein said text-to-speech synthesis processing means can convert said input data into a synthetic speech including at least said two speaker voices including said speaker voice corresponding to said speaker voice identification data.
  - 14. The text-to-speech synthesis apparatus according to claim 13, wherein said input data includes at least two parts and said selecting means selects speaker voices corresponding to each part of said input data.
  - 15. The text-to-speech synthesis apparatus according to claim 13, wherein said input data includes conversation parts of at least two characters and selecting means selects speaker voices corresponding to each character in the conversation parts.
  - 16. The text-to-speech synthesis apparatus according to claim 13, wherein said speaker voices includes at least one voice of celebrities, entertainers, actors, actresses, voice actors, politicians, characters of movies or animations.
  - 17. The text-to-speech synthesis apparatus according to claim 13, further comprising:
    - fee-charge control means for controlling a fee-charge operation for the said user in accordance with said phoneme data selected by said selecting means, wherein said fee-charge control means sends, to an external settlement center, fee-charge data corresponding to said phoneme data to be received by said receiving means.
  - 18. The text-to-speech synthesis apparatus according to claim 13, wherein said input data is at least one of voice data and text data.
  - 19. The text-to-speech synthesis apparatus according to claim 13, further comprising input means for directly inputting said input data.
  - 20. The text-to-speech synthesis apparatus according to claim 13, further comprising communication means for receiving said input data via a network.
  - 21. The text-to-speech synthesis apparatus according to claim 13, wherein said selecting means selects script data as specified by said user;
    - said transmitting means transmits, to another apparatus, script identification data for identifying said script data selected by said selecting means;
      
      said receiving means receives phoneme data of a speaker voice corresponding to said speaker voice identification data transmitted by said transmitting means and said script data corresponding to said script identification data;
      
      said text-to-speech synthesis processing means links said phoneme data of said speaker voice received by said receiving means to convert said script data into a synthetic speech.
  - 22. The text-to-speech synthesis apparatus according to claim 21, wherein said receiving means receives said synthetic speech of said script data obtained by said another apparatus on the basis of said speaker voice identification data and said script identification data transmitted by said transmitting means.
  - 23. The text-to-speech synthesis apparatus according to claim 21, wherein said script data is at least one of voice data and text data.
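
The networked variant in claim 13 differs from claim 1 in that the phoneme data lives on another apparatus and is fetched by speaker voice identification data. The sketch below models that exchange in-process; `VoiceServer`, `Terminal`, and the method names are hypothetical, and no real network protocol is implied.

```python
class VoiceServer:
    """Hypothetical 'another apparatus' that holds the phoneme data."""

    def __init__(self, store):
        self._store = store

    def fetch(self, voice_ids):
        # Return phoneme data only for the identified speaker voices.
        return {vid: self._store[vid] for vid in voice_ids}


class Terminal:
    """Hypothetical client: selects, transmits IDs, receives, synthesizes."""

    def __init__(self, server):
        self.server = server
        self.received = {}

    def select_and_request(self, voice_ids):
        if len(set(voice_ids)) < 2:
            raise ValueError("at least two speaker voices must be selected")
        # Transmitting and receiving are collapsed into one call here.
        self.received = self.server.fetch(voice_ids)

    def synthesize(self, parts):
        # parts: list of (voice_id, [phoneme symbols]); linking is modeled
        # as concatenation of the received units.
        return [self.received[vid][p] for vid, phonemes in parts for p in phonemes]


server = VoiceServer({
    "voice_a": {"HH": "a:HH", "AY": "a:AY"},
    "voice_b": {"HH": "b:HH", "AY": "b:AY"},
    "voice_c": {"HH": "c:HH", "AY": "c:AY"},
})
terminal = Terminal(server)
terminal.select_and_request(["voice_a", "voice_c"])
speech = terminal.synthesize([("voice_a", ["HH"]), ("voice_c", ["AY"])])
```

Only the phoneme data of the two requested voices reaches the terminal; `voice_b` stays on the server.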

24. A text-to-speech synthesis apparatus comprising:
- a memory for storing phoneme data of a plurality of speaker voices;
  
  a selecting section for selecting any one of said plurality of speaker voices;
  
  a search section for searching said memory for the phoneme data of the speaker voices selected by said selecting section;
  
  a text-to-speech synthesis processing section for linking said phoneme data of said speaker voices retrieved by said search section to convert script data into a synthetic speech;
  
  a storage section for accumulating said synthetic speech converted from said script data on the basis of the phoneme data of said plurality of speaker voices; and
  
  a reproducing section for retrieving said synthetic speech of said speaker voices selected by said selecting section and reproducing said synthetic speech, wherein said text-to-speech synthesis processing means can convert said script into a synthetic speech including at least said two speaker voices including said speaker voice corresponding to said speaker voice identification data.
  - 25. The text-to-speech synthesis apparatus according to claim 24, wherein said script includes at least two parts and said selecting means selects speaker voices corresponding to each part of said script.
  - 26. The text-to-speech synthesis apparatus according to claim 24, wherein said script includes conversation parts of at least two characters and selecting means selects speaker voices corresponding to each character in the conversation parts.
  - 27. The text-to-speech synthesis apparatus according to claim 24, wherein said speaker voices includes at least one voice of celebrities, entertainers, actors, actresses, voice actors, politicians, characters of movies or animations.
  - 28. The text-to-speech synthesis apparatus according to claim 24, further comprising:
    - fee-charge control means for controlling a fee-charge operation for the said user in accordance with said phoneme data selected by said selecting means, wherein said fee-charge control section sends, to an external settlement center, fee-charge data corresponding to the phoneme data of the one speaker selected by said selecting section.
  - 29. The text-to-speech synthesis apparatus according to claim 24, wherein said memory stores prosody data for each of said plurality of speaker voices, said search section searches for said prosody data along with said phoneme data of said one speaker selected by said selecting section;
    - and said text-to-speech synthesis processing section converts said script data into a synthetic speech on the basis of said user-specified phoneme data and prosody data.
  - 30. The text-to-speech synthesis apparatus according to claim 24, wherein said script data is at least one of voice data and text data.
  - 31. The text-to-speech synthesis apparatus according to claim 24, further comprising an input section for directly inputting said script data.
  - 32. The text-to-speech synthesis apparatus according to claim 24, further comprising a communication section for receiving said script data via a network.
  - 33. The text-to-speech synthesis apparatus according to claim 24, wherein said memory stores said script data and said text-to-speech synthesis processing section links said phoneme data of said speaker voice retrieved by said search section to convert said script data into a synthetic speech.
  - 34. The text-to-speech synthesis apparatus according to claim 33, wherein said memory stores said script data in a classified manner and said selecting section selects said script data along with said speaker voice.
  - 35. The text-to-speech synthesis apparatus according to claim 33, wherein said script data is at least one of voice data and text data.
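
Claim 24 adds a storage section that accumulates synthetic speech for the stored speaker voices and a reproducing section that retrieves the speech for the selected voice. One possible reading, sketched below, is a pre-computed archive keyed by script and voice. The `SpeechArchive` class, its keys, and the tagged-string audio units are assumptions, not details taken from the patent.

```python
# Hypothetical phoneme store; tagged strings stand in for audio units.
PHONEME_STORE = {
    "voice_a": {"HH": "a:HH", "AY": "a:AY"},
    "voice_b": {"HH": "b:HH", "AY": "b:AY"},
}


def synthesize(script_phonemes, voice):
    """Link one voice's phoneme units for the whole script."""
    units = PHONEME_STORE[voice]
    return [units[p] for p in script_phonemes]


class SpeechArchive:
    """Accumulates synthetic speech per (script, voice) and replays it."""

    def __init__(self):
        self._speech = {}

    def accumulate(self, script_id, script_phonemes):
        # Convert the script once for every stored speaker voice.
        for voice in PHONEME_STORE:
            self._speech[(script_id, voice)] = synthesize(script_phonemes, voice)

    def reproduce(self, script_id, voice):
        # Retrieve the already-synthesized speech for the selected voice.
        return self._speech[(script_id, voice)]


archive = SpeechArchive()
archive.accumulate("greeting", ["HH", "AY"])
playback = archive.reproduce("greeting", "voice_b")
```

Pre-computing trades storage for latency: reproduction becomes a lookup rather than a synthesis pass.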

36. A text-to-speech synthesis apparatus comprising:
- a selecting section for selecting at least two speaker voices;
  
  a transmitting section for transmitting, to another apparatus, speaker voice identification data for identifying said speaker voices selected by said selecting section;
  
  a receiving section for receiving phoneme data of the speaker voice corresponding to said speaker voice identification data transmitted by said transmitting section and a synthetic speech of said speaker voice;
  
  a text-to-speech synthesis processing section for linking said phoneme data of said speaker voice received by said receiving section to convert script data into a synthetic speech; and
  
  a reproducing section for reproducing said synthetic speech received by said receiving means;
  
  wherein said text-to-speech synthesis processing means can convert said script into a synthetic speech including at least said two speaker voices including said speaker voice corresponding to said speaker voice identification data.
  - 37. The text-to-speech synthesis apparatus according to claim 36, wherein said script includes at least two parts and said selecting means selects speaker voices corresponding to each part of said script.
  - 38. The text-to-speech synthesis apparatus according to claim 36, wherein said script includes conversation parts of at least two characters and selecting means selects speaker voices corresponding to each character in the conversation parts.
  - 39. The text-to-speech synthesis apparatus according to claim 36, wherein said speaker voices includes at least one voice of celebrities, entertainers, actors, actresses, voice actors, politicians, characters of movies or animations.
  - 40. The text-to-speech synthesis apparatus according to claim 36, further comprising:
    - a fee-charge control section for controlling a fee-charge operation for said user in accordance with said phoneme data received by said receiving section;
      
      wherein said fee-charge control section sends, to an external settlement center, fee-charge data corresponding to said phoneme data received by said receiving section.
  - 41. The text-to-speech synthesis apparatus according to claim 36, wherein said script data is at least one of voice data and text data.
  - 42. The text-to-speech synthesis apparatus according to claim 36, further comprising an input section for directly inputting said script data.
  - 43. The text-to-speech synthesis apparatus according to claim 36, further comprising a communication section for receiving said script data via a network.
  - 44. The text-to-speech synthesis apparatus according to claim 36, wherein said selecting section selects script data as specified by a user;
    - said transmitting section transmits, to said another apparatus, script identification data for identifying script data corresponding to the speaker voice selected by said selecting section;
      
      said receiving section receives phoneme data of a speaker voice corresponding to said speaker voice identification data transmitted by said transmitting section and said script data corresponding to said script identification data;
      
      said text-to-speech synthesis processing section links said phoneme data of said speaker voice received by said receiving section to convert said script data into a synthetic speech; and
      
      said fee-charge control section controls a fee-charge operation for said user in accordance with said phoneme data of said speaker voice received by said receiving section and said script data.
  - 45. The text-to-speech synthesis apparatus according to claim 44, wherein said receiving section receives the synthetic speech of said script data generated on said another apparatus on the basis of said speaker voice identification data and said script identification data transmitted by said transmitting section and said fee-charge control section controls a fee-charge operation for said user in accordance with said synthetic speech received by said receiving section.
  - 46. The text-to-speech synthesis apparatus according to claim 44, wherein said script data is at least one of voice data and text data.
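
The fee-charge control recited in claims 40 and 44 (and in claims 5, 17, and 28) sends fee-charge data to an external settlement center according to the phoneme data, and optionally the script data, that the user obtains. The sketch below shows one way such a record could be assembled. The fee tables, the amounts, the record fields, and the `SettlementCenter` class are invented for illustration; the claims do not specify any of them.

```python
from dataclasses import dataclass, field

# Hypothetical price lists; the claims do not state amounts or currency.
VOICE_FEES = {"voice_a": 100, "voice_b": 250}
SCRIPT_FEES = {"drama_01": 300}


@dataclass
class SettlementCenter:
    """Stand-in for the external settlement center: it only logs records."""
    records: list = field(default_factory=list)

    def submit(self, record):
        self.records.append(record)


def charge(user_id, voice_ids, center, script_id=None):
    """Build fee-charge data for the received phoneme data and send it."""
    amount = sum(VOICE_FEES[v] for v in voice_ids)
    if script_id is not None:
        amount += SCRIPT_FEES[script_id]  # script data is charged as well
    record = {
        "user": user_id,
        "voices": sorted(voice_ids),
        "script": script_id,
        "amount": amount,
    }
    center.submit(record)
    return record


center = SettlementCenter()
voices_only = charge("user-1", ["voice_b", "voice_a"], center)
with_script = charge("user-1", ["voice_a"], center, script_id="drama_01")
```

Keeping the settlement center behind a single `submit` call mirrors the claim language, in which the apparatus only sends fee-charge data and the settlement itself happens elsewhere.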

47. A text-to-speech synthesis method comprising the steps of:
- selecting at least two speaker voices;
  
  searching phoneme data of the speaker voices selected at selecting step; and
  
  text-to-speech synthesis processing for linking said phoneme data of said speaker voices retrieved in said searching step to convert input data into a synthetic speech, wherein said text-to-speech synthesis processing can convert said input data into a synthetic speech including at least said two speaker voices.

48. A computer readable recording medium device on which is stored a text-to-speech synthesis program which, when implemented by a computer, comprises acts of:
- selecting at least two speaker voices;
  
  searching phoneme data of the speaker voices selected at selecting step; and
  
  text-to-speech synthesis processing for linking said phoneme data of said speaker voices retrieved in said searching step to convert input data into a synthetic speech;
  
  wherein said text-to-speech synthesis processing can convert said input data into a synthetic speech including at least said two speaker voices.

49. A text-to-speech synthesis method comprising the steps of:
- selecting at least two speaker voices;
  
  transmitting speaker voice identification data for identifying said speaker voices selected in said selecting step to another apparatus;
  
  receiving phoneme data of said speaker voices corresponding to said speaker voice identification data transmitted in said transmitting step; and
  
  text-to-speech synthesis processing linking said phoneme data of said speaker voices received in said receiving step to convert input data into a synthetic speech;
  
  wherein said text-to-speech synthesis processing can convert said input data into a synthetic speech including at least said two speaker voices.

50. A computer readable recording medium device on which is stored a text-to-speech synthesis program which, when implemented by a computer, comprises acts of:
- selecting at least two speaker voices;
  
  transmitting speaker voice identification data for identifying said speaker voices selected in said selecting step to another apparatus;
  
  receiving phoneme data of said speaker voices corresponding to said speaker voice identification data transmitted in said transmitting step; and
  
  text-to-speech synthesis processing linking said phoneme data of said speaker voice received in said receiving step to convert input data into a synthetic speech;
  
  wherein said text-to-speech synthesis processing can convert said input data into a synthetic speech including at least said two speaker voices.

Current Assignee: Sony Corporation (Sony Group Corp.)
Original Assignee: Sony Corporation (Sony Group Corp.)
Inventors: Akabane, Makoto; Yamada, Keiichi; Kudo, Junichi; Shiraishi, Goro; Yano, Hajime; Tange, Akira
Primary Examiner(s): McFadden, Susan
Application Number: US11/385,210
Publication Number: US 20060161437A1
Time in Patent Office: 1,127 Days
Field of Search: 704/260
US Class Current: 704/260
CPC Class Codes:
G06Q 30/06   Buying, selling or leasing ...
G10L 13/00   Speech synthesis; Text to s...
G10L 13/08   Text analysis or generation...