Text-to-speech conversion system

US 20030074196A1
Filed: 07/19/2001
Published: 04/17/2003
Est. Priority Date: 01/25/2001
Status: Active Grant

3 Assignments

0 Petitions

Abstract

The system according to the invention comprises a text-to-speech conversion processing unit, together with a phrase dictionary and a waveform dictionary, each connected independently to the conversion processing unit. The conversion processing unit converts any Japanese text inputted from outside into speech. In the phrase dictionary, sound-related terms representing actually recorded sounds are previously registered, for example, the notations of onomatopoeic words, background sounds, lyrics, music titles, and so forth. In the waveform dictionary, waveform data obtained from the actually recorded sounds are previously registered in correspondence with the sound-related terms. When a term in the text matches a sound-related term registered in the phrase dictionary upon collation of the two, the conversion processing unit outputs the actually recorded waveform data registered in the waveform dictionary for that sound-related term as the speech waveform of the term.
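The core lookup described in the abstract (collate each term with the phrase dictionary, then substitute recorded waveform data for any match) can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `PHRASE_DICTIONARY`, `WAVEFORM_DICTIONARY`, `synthesize_by_rule`, and `convert` are hypothetical, waveforms are modeled as plain lists of samples, and the input is assumed to be already split into terms (Japanese text would need morphological analysis first).

```python
from typing import Dict, List

# Hypothetical phrase dictionary: notation of a sound-related term -> waveform file name.
PHRASE_DICTIONARY: Dict[str, str] = {
    "woof": "dog_bark.wav",
}

# Hypothetical waveform dictionary: file name -> recorded samples.
# Real entries would be loaded from recordings; a short list stands in here.
WAVEFORM_DICTIONARY: Dict[str, List[float]] = {
    "dog_bark.wav": [0.5, -0.5, 0.25],
}


def synthesize_by_rule(term: str) -> List[float]:
    """Placeholder for rule-based synthesis (assumption: one silent sample per character)."""
    return [0.0] * len(term)


def convert(terms: List[str]) -> List[float]:
    """Concatenate per-term waveforms, using recorded data for registered sound-related terms."""
    output: List[float] = []
    for term in terms:
        file_name = PHRASE_DICTIONARY.get(term)  # collate the term with the phrase dictionary
        if file_name is not None:
            output.extend(WAVEFORM_DICTIONARY[file_name])  # recorded sound replaces synthesis
        else:
            output.extend(synthesize_by_rule(term))
    return output
```

For example, `convert(["the", "dog", "went", "woof"])` yields ten synthesized samples followed by the three recorded samples for "woof".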

55 Citations

49 Claims

1. A text-to-speech conversion system for converting a text into a speech waveform, and outputting the speech waveform, said system comprising:
- a conversion processing unit for converting a text inputted from outside into a speech waveform;
  
  a phrase dictionary for previously registering sound-related terms to be expressed as natural sound data of actually recorded sounds; and
  
  a waveform dictionary for previously registering waveform data corresponding to the sound-related terms, obtained from the actually recorded sounds, wherein said conversion processing unit has a function such that as for a term in the text matching a sound-related term registered in said phrase dictionary upon collation of the former with the latter, waveform data corresponding to the relevant sound-related term matching the term in the text, registered in said waveform dictionary, is outputted as a speech waveform of the term.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 36, 37, 46):
  - 2. A text-to-speech conversion system according to claim 1, further comprising an application determination unit for determining whether or not the term in the text satisfies application conditions for the collation thereof with said phrase dictionary, and reading out only the sound-related term matching the term satisfying the application conditions from said phrase dictionary to said conversion processing unit.
  - 3. A text-to-speech conversion system according to claim 1, further comprising a controller for editing the registered contents of the sound-related terms registered in said phrase dictionary, and the waveform data registered in said waveform dictionary, respectively.
  - 4. A text-to-speech conversion system according to claim 1, wherein said phrase dictionary is an onomatopoeic word dictionary for registering onomatopoeic words.
  - 5. A text-to-speech conversion system according to claim 2, wherein said application conditions include a condition such that the term in the text is surrounded by quotation marks.
  - 6. A text-to-speech conversion system according to claim 2, wherein said application conditions include a condition such that a specific symbol is provided before and/or after the term in the text.
  - 7. A text-to-speech conversion system according to claim 2, wherein said application conditions include a condition such that in the case where the sound-related terms together with information on the subject thereof are registered in said phrase dictionary, there is a match between the information on the subject and the grammatical subject of the text.
  - 8. A text-to-speech conversion system according to claim 2, further comprising application conditions change means capable of changing said application conditions.
  - 36. A text-to-speech conversion system according to claim 1, wherein the sound-related terms registered in said phrase dictionary include a notation of the relevant sound-related term, and a waveform file name corresponding to the notation, while the waveform data registered in said waveform dictionary are natural sound data of actually recorded sounds, and stored as waveform files.
  - 37. A text-to-speech conversion system according to claim 1, wherein the sound-related terms registered in said phrase dictionary include a notation of the relevant sound-related term, and a waveform file name corresponding to the notation, while the waveform data registered in said waveform dictionary are natural sound data of actually recorded sounds, and stored as waveform files, said conversion processing unit comprising:
    - an input unit to which the text is inputted;
      
      a pronunciation dictionary for registering pronunciation of respective words;
      
      a text analyzer connected to said input unit, said pronunciation dictionary, and said phrase dictionary, for generating a phonetic/prosodic symbol string of the text by using the waveform file name of the sound-related term registered in said phrase dictionary against a term registered in both said pronunciation dictionary and said phrase dictionary among terms in the text inputted from said input unit, and by using the pronunciation of the respective words registered in said pronunciation dictionary against other terms;
      
      a speech waveform memory for storing speech element data; and
      
      a rule-based speech synthesizer connected to said speech waveform memory, said waveform dictionary, and said text analyzer, for converting respective symbols except said waveform file name, in said phonetic/prosodic symbol string, into a speech waveform with the use of said speech element data while reading out waveform data corresponding to said waveform file name from said waveform dictionary, thereby outputting a synthesized waveform consisting of the speech waveform and the waveform data.
  - 46. A text-to-speech conversion system according to claim 2, wherein said application determination unit comprises a rules dictionary for storing the application conditions, and a condition determination unit for determining whether or not said phrase dictionary is to be applied, interconnecting said conversion processing unit and said phrase dictionary.
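
Claims 5, 6, and 8 describe application conditions that gate the phrase-dictionary lookup: the term must be surrounded by quotation marks or flanked by a specific symbol, and the conditions can be changed. A minimal sketch of such a condition check, assuming the rules dictionary of claim 46 is a list of (opening, closing) delimiter pairs; `DEFAULT_CONDITIONS`, `satisfies_conditions`, and the `#` marker are hypothetical choices, not taken from the patent.

```python
import re
from typing import Sequence, Tuple

# Hypothetical rules dictionary: (opening, closing) delimiter pairs.
# The "#" marker is an assumed example of the "specific symbol" in claim 6.
DEFAULT_CONDITIONS: Tuple[Tuple[str, str], ...] = (
    ('"', '"'),  # claim 5: term surrounded by quotation marks
    ("#", ""),   # claim 6: specific symbol immediately before the term
)


def satisfies_conditions(
    text: str,
    term: str,
    conditions: Sequence[Tuple[str, str]] = DEFAULT_CONDITIONS,
) -> bool:
    """Return True if `term` occurs in `text` wrapped by any configured delimiter pair."""
    for opening, closing in conditions:
        # re.escape keeps quotation marks and symbols from acting as regex syntax.
        pattern = re.escape(opening) + re.escape(term) + re.escape(closing)
        if re.search(pattern, text):
            return True
    return False
```

Passing a different `conditions` sequence stands in for the "application conditions change means" of claim 8; only terms that pass the check would then be collated with the phrase dictionary.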

9. A text-to-speech conversion system for converting a text into a speech waveform, and outputting the speech waveform, said system comprising:
- a conversion processing unit for converting a text inputted from outside into a speech waveform;
  
  a phrase dictionary for previously registering sound-related terms to be expressed as natural sound data of actually recorded sounds; and
  
  a waveform dictionary for previously registering waveform data corresponding to the sound-related terms, obtained from the actually recorded sounds, wherein said conversion processing unit has a function such that in the case where there is a match between a term in the text and a sound-related term registered in said phrase dictionary upon collation of the former with the latter, waveform data corresponding to the relevant sound-related term matching the term in the text, registered in said waveform dictionary, is superimposed on a speech waveform of the text before being outputted.
- Dependent claims (10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 38, 39, 40, 41, 42, 43, 47):
  - 10. A text-to-speech conversion system according to claim 9, further comprising an application determination unit for determining whether or not the term in the text satisfies application conditions for the collation thereof with said phrase dictionary, and reading out only the sound-related term matching the term satisfying the application conditions from said phrase dictionary to said conversion processing unit.
  - 11. A text-to-speech conversion system according to claim 9, wherein said conversion processing unit has a function of adjusting the time length of the waveform data read out from said waveform dictionary.
  - 12. A text-to-speech conversion system according to claim 11, wherein in the case where the time length of the waveform data is longer than that of the speech waveform of the text, the time length is adjusted by truncating the relevant waveform data at the position where the speech waveform of the relevant text ends.
  - 13. A text-to-speech conversion system according to claim 11, wherein in the case where the time length of the waveform data is longer than that of the speech waveform of the text, the time length is adjusted by gradually attenuating the sound volume of the relevant waveform data so as to become zero at the position where the speech waveform of the relevant text ends.
  - 14. A text-to-speech conversion system according to claim 11, wherein in the case where the time length of the waveform data is shorter than that of the speech waveform of the text, the time length is adjusted by coupling together the relevant waveform data repeated in succession.
  - 15. A text-to-speech conversion system according to claim 9, further comprising a controller for editing the registered contents of the sound-related terms registered in said phrase dictionary, and the waveform data registered in said waveform dictionary, respectively.
  - 16. A text-to-speech conversion system according to claim 9, wherein said phrase dictionary is a background sound dictionary for registering background sounds.
  - 17. A text-to-speech conversion system according to claim 10, wherein said application conditions include a condition such that the term in the text is surrounded by quotation marks.
  - 18. A text-to-speech conversion system according to claim 10, wherein said application conditions include a condition such that a specific symbol is provided before and/or after the term in the text.
  - 19. A text-to-speech conversion system according to claim 10, wherein said application conditions include a condition such that in the case where the sound-related terms together with information on the subject thereof are registered in said phrase dictionary, there is a match between the information on the subject and the grammatical subject of the text.
  - 20. A text-to-speech conversion system according to claim 10, further comprising application conditions change means capable of changing said application conditions.
  - 38. A text-to-speech conversion system according to claim 9, wherein the sound-related terms registered in said phrase dictionary include a notation of the relevant sound-related term, and a waveform file name corresponding to the notation, while the waveform data registered in said waveform dictionary are natural sound data of actually recorded sounds, and stored as waveform files.
  - 39. A text-to-speech conversion system according to claim 10, wherein the sound-related terms registered in said phrase dictionary include a notation of the relevant sound-related term, and a waveform file name corresponding to the notation, while the waveform data registered in said waveform dictionary are natural sound data of actually recorded sounds, and stored as waveform files.
  - 40. A text-to-speech conversion system according to claim 9, wherein the sound-related terms registered in said phrase dictionary include a notation of the relevant sound-related term, and a waveform file name corresponding to the notation, while the waveform data registered in said waveform dictionary are natural sound data of actually recorded sounds, and stored as waveform files, said conversion processing unit comprising:
    - an input unit to which the text is inputted;
      
      a pronunciation dictionary for registering pronunciation of respective words;
      
      a text analyzer connected to said input unit, said pronunciation dictionary, and said phrase dictionary, for generating a phonetic/prosodic symbol string of the text by using the waveform file name of the relevant sound-related term registered in said phrase dictionary against a term registered in both said pronunciation dictionary and said phrase dictionary among terms in the text inputted from said input unit, and by using the pronunciation of the respective words registered in said pronunciation dictionary against other terms;
      
      a speech waveform memory for storing speech element data; and
      
      a rule-based speech synthesizer connected to said speech waveform memory, said waveform dictionary, and said text analyzer, for converting respective symbols except said waveform file name, in said phonetic/prosodic symbol string, into a speech waveform with the use of said speech element data while reading out waveform data corresponding to said waveform file name from said waveform dictionary, thereby outputting the speech waveform and the waveform data concurrently.
  - 41. A text-to-speech conversion system according to claim 10, wherein the sound-related terms registered in said phrase dictionary include a notation of the relevant sound-related term, and a waveform file name corresponding to the notation, while the waveform data registered in said waveform dictionary are natural sound data of actually recorded sounds, and stored as waveform files, said conversion processing unit comprising:
    - an input unit to which the text is inputted;
      
      a pronunciation dictionary for registering pronunciation of respective words;
      
      a text analyzer connected to said input unit, said pronunciation dictionary, and said phrase dictionary, for generating a phonetic/prosodic symbol string of the text by using the waveform file name of the relevant sound-related term registered in said phrase dictionary against a term registered in both said pronunciation dictionary and said phrase dictionary among terms in the text inputted from said input unit, and by using the pronunciation of the respective words registered in said pronunciation dictionary against other terms;
      
      a speech waveform memory for storing speech element data; and
      
      a rule-based speech synthesizer connected to said speech waveform memory, said waveform dictionary, and said text analyzer, for converting respective symbols except said waveform file name, in said phonetic/prosodic symbol string, into a speech waveform with the use of said speech element data while reading out waveform data corresponding to said waveform file name from said waveform dictionary, thereby outputting the speech waveform and the waveform data concurrently.
  - 42. A text-to-speech conversion system according to claim 9, wherein said phrase dictionary is a background sound dictionary for registering a notation of respective background sounds, and a waveform file name corresponding to respective notations.
  - 43. A text-to-speech conversion system according to claim 10, wherein said phrase dictionary is a background sound dictionary for registering a notation of respective background sounds, and a waveform file name corresponding to respective notations.
  - 47. A text-to-speech conversion system according to claim 10, wherein said application determination unit comprises a rules dictionary for storing the application conditions, and a condition determination unit for determining whether or not said phrase dictionary is to be applied, interconnecting said conversion processing unit and said phrase dictionary.
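
Claim 9 superimposes recorded background sound on the synthesized speech, and claims 12 to 14 describe three ways to fit the background to the speech length: truncation, a fade to zero, and repetition. A minimal sketch of those rules, assuming sample lists as waveforms, a linear fade shape, and that the repeated waveform is cut at the end of the speech; `fit_background` and `superimpose` are hypothetical names.

```python
from typing import List


def fit_background(background: List[float], target_len: int, fade: bool = False) -> List[float]:
    """Adjust recorded background waveform data to the speech waveform's length."""
    if target_len <= 0:
        return []
    if not background:
        return [0.0] * target_len  # nothing to play: silence (assumption)
    longer = len(background) > target_len
    if len(background) < target_len:
        # Claim 14: couple the waveform data repeated in succession, then cut to length.
        repeats = -(-target_len // len(background))  # ceiling division
        background = background * repeats
    fitted = background[:target_len]  # claim 12: truncate where the speech ends
    if longer and fade:
        # Claim 13: attenuate linearly; for two or more samples the last gain is zero.
        denom = max(target_len - 1, 1)
        fitted = [s * (1 - i / denom) for i, s in enumerate(fitted)]
    return fitted


def superimpose(speech: List[float], background: List[float], fade: bool = False) -> List[float]:
    """Claim 9: mix the length-adjusted background under the speech, sample by sample."""
    fitted = fit_background(background, len(speech), fade)
    return [s + b for s, b in zip(speech, fitted)]
```

Simple addition is used for mixing; a practical system would likely also scale the background level to keep the speech intelligible.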

21. A text-to-speech conversion system for converting a text into a speech waveform, and outputting the speech waveform, said system comprising:
- a conversion processing unit for converting a text containing lyrics, inputted from outside, into a speech waveform;
  
  a song phrase dictionary for previously registering pairs of lyrics and song phonetic/prosodic symbol strings corresponding thereto; and
  
  a song phonetic/prosodic symbol string processing unit for analyzing a song phonetic/prosodic symbol string in order to convert said song phonetic/prosodic symbol string into a synthesized speech waveform of a singing voice, wherein said conversion processing unit has a function such that as for lyrics in the text, matching lyrics registered in said song phrase dictionary upon collation of the former with the latter, a speech waveform of a singing voice, converted on the basis of the song phonetic/prosodic symbol string paired off with registered lyrics that have matched, registered in said song phrase dictionary, is outputted as a speech waveform of the relevant lyrics.
- Dependent claims (22, 23, 24, 25, 26, 44, 48):
  - 22. A text-to-speech conversion system according to claim 21, further comprising an application determination unit for determining whether or not the lyrics in the text satisfy application conditions for the collation thereof with said song phrase dictionary, and reading out the song phonetic/prosodic symbol string paired off with the registered lyrics matching the relevant lyrics satisfying the application conditions from said song phrase dictionary to said conversion processing unit.
  - 23. A text-to-speech conversion system according to claim 21, further comprising a controller for editing the registered contents of the lyrics, and the song phonetic/prosodic symbol string, paired off with the registered lyrics, respectively.
  - 24. A text-to-speech conversion system according to claim 22, wherein said application conditions include a condition such that the lyrics in the text are surrounded by quotation marks.
  - 25. A text-to-speech conversion system according to claim 22, wherein said application conditions include a condition such that a specific symbol is provided before and/or after the lyrics in the text.
  - 26. A text-to-speech conversion system according to claim 22, further comprising application conditions change means capable of changing said application conditions.
  - 44. A text-to-speech conversion system according to claim 21, wherein said conversion processing unit comprises:
    - an input unit to which the text is inputted;
      
      a pronunciation dictionary for registering pronunciation of respective words;
      
      a text analyzer connected to said input unit, said pronunciation dictionary, and said phrase dictionary, for generating a phonetic/prosodic symbol string of the text by using said song phonetic/prosodic symbol string registered in said song phrase dictionary against the lyrics among terms in the text inputted from said input unit, and by using the pronunciation of the respective words registered in said pronunciation dictionary against other terms;
      
      a speech waveform memory for storing speech element data; and
      
      a rule-based speech synthesizer connected to said speech waveform memory, said song phonetic/prosodic symbol string processing unit, and said text analyzer, for converting respective symbols except said song phonetic/prosodic symbol string, in the phonetic/prosodic symbol string, into a speech waveform with the use of said speech element data while collaborating with said song phonetic/prosodic symbol string processing unit and said speech waveform memory for causing said song phonetic/prosodic symbol string processing unit to generate waveform data corresponding to said song phonetic/prosodic symbol string, thereby outputting a synthesized waveform consisting of the speech waveform and the waveform data.
  - 48. A text-to-speech conversion system according to claim 22, wherein said application determination unit comprises a rules dictionary for storing the application conditions, and a condition determination unit for determining whether or not said phrase dictionary is to be applied, interconnecting said conversion processing unit and said phrase dictionary.
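
Claims 21 and 44 split the text into lyrics, which take a registered song phonetic/prosodic symbol string, and other words, which take their pronunciation from the pronunciation dictionary. A minimal sketch of that text-analysis step, assuming whitespace-separated English words (the original targets Japanese, which would need morphological analysis) and longest-match lookup; the dictionaries, the syllable/pitch/duration notation, and `analyze` are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical song phrase dictionary: lyrics -> song phonetic/prosodic symbol string.
# The syllable/pitch/duration notation is an illustrative assumption.
SONG_PHRASE_DICTIONARY: Dict[str, str] = {
    "twinkle twinkle": "twi/C4/4 nkl/C4/4 twi/G4/4 nkl/G4/4",
}

# Hypothetical pronunciation dictionary for ordinary words.
PRONUNCIATION_DICTIONARY: Dict[str, str] = {
    "sing": "siN",
    "now": "naU",
}


def analyze(text: str) -> List[Tuple[str, str]]:
    """Split text into ('song', symbols) and ('speech', symbols) segments."""
    words = text.lower().split()
    # Try longer registered lyrics first so the longest match wins.
    lyrics = sorted(SONG_PHRASE_DICTIONARY, key=lambda k: len(k.split()), reverse=True)
    segments: List[Tuple[str, str]] = []
    i = 0
    while i < len(words):
        for lyric in lyrics:
            n = len(lyric.split())
            if " ".join(words[i:i + n]) == lyric:
                # Hand the registered song symbol string to the song processing unit.
                segments.append(("song", SONG_PHRASE_DICTIONARY[lyric]))
                i += n
                break
        else:
            # No lyrics matched here: use the pronunciation dictionary (unknown words pass through).
            word = words[i]
            segments.append(("speech", PRONUNCIATION_DICTIONARY.get(word, word)))
            i += 1
    return segments
```

The `("song", ...)` segments correspond to what the song phonetic/prosodic symbol string processing unit would render as a singing voice, while the `("speech", ...)` segments go to the rule-based synthesizer.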

27. A text-to-speech conversion system for converting a text into a speech waveform, and outputting the speech waveform, said system comprising:
- a conversion processing unit for converting a text containing a music title, inputted from outside, into a speech waveform;
  
  a music title dictionary for previously registering music titles; and
  
  a musical sound waveform generator for generating a musical sound waveform corresponding to the relevant music title, wherein said musical sound waveform generator comprises a music dictionary for previously registering music data for use in performance, corresponding to the music titles registered in said music title dictionary, and a musical sound synthesizer for converting the relevant music data for use in performance into a musical sound waveform of music, and said conversion processing unit has a function such that as for a music title in the text, matching a music title registered in said music title dictionary upon collation of the former with the latter, the musical sound waveform of music corresponding to the registered music title is superimposed on a speech waveform of the text before being outputted.
- Dependent claims (28, 29, 30, 31, 32, 33, 34, 35, 45, 49):
  - 28. A text-to-speech conversion system according to claim 27, further comprising an application determination unit for determining whether or not the music title in the text satisfies application conditions for the collation thereof with said music title dictionary, and reading out only the registered music title matching the relevant music title satisfying the application conditions from said music title dictionary to said conversion processing unit.
  - 29. A text-to-speech conversion system according to claim 27, wherein said conversion processing unit has a function of adjusting the time length of the musical sound waveform sent from said musical sound synthesizer.
  - 30. A text-to-speech conversion system according to claim 29, wherein in the case where the waveform length, namely, the time length of the musical sound waveform differs from the waveform length of the speech waveform of the text, said time length is adjusted to the longer of the two waveform lengths.
  - 31. A text-to-speech conversion system according to claim 29, wherein in the case where the time length of the musical sound waveform is shorter than that of the speech waveform of the text, said time length is adjusted by coupling together relevant musical sound waveform data repeated in succession.
  - 32. A text-to-speech conversion system according to claim 27, further comprising a controller for editing the contents of music titles registered in said music title dictionary, and the music data for use in performance registered in said music dictionary, respectively.
  - 33. A text-to-speech conversion system according to claim 28, wherein said application conditions include a condition such that the music title in the text is surrounded by quotation marks.
  - 34. A text-to-speech conversion system according to claim 28, wherein said application conditions include a condition such that a specific symbol is provided before and/or after the music title in the text.
  - 35. A text-to-speech conversion system according to claim 28, further comprising application conditions change means capable of changing said application conditions.
  - 45. A text-to-speech conversion system according to claim 27, wherein the music titles registered in said music title dictionary include the notation of the relevant music title, and the music file name corresponding to the notation, while the music data for use in performance, registered in said music dictionary, are stored as waveform files, said conversion processing unit comprising:
    - an input unit to which the text is inputted;
      
      a pronunciation dictionary for registering pronunciation of respective words;
      
      a text analyzer connected to said input unit, said pronunciation dictionary, and said phrase dictionary, for generating a phonetic/prosodic symbol string of the text by using the music file name against the relevant music title among terms in the text inputted from said input unit, and by using the pronunciation of the respective words registered in said pronunciation dictionary against all other terms;
      
      a speech waveform memory for storing speech element data; and
      
      a rule-based speech synthesizer connected to said speech waveform memory, said musical sound waveform generator, and said text analyzer, for converting respective symbols of the phonetic/prosodic symbol string into a speech waveform with the use of said speech element data while reading out the music data for use in performance, corresponding to said music file name from said musical sound waveform generator, thereby concurrently outputting the speech waveform and the music data for use in performance.
  - 49. A text-to-speech conversion system according to claim 28, wherein said application determination unit comprises a rules dictionary for storing the application conditions, and a condition determination unit for determining whether or not said music title dictionary is to be applied, interconnecting said conversion processing unit and said music title dictionary.
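
Claims 30 and 31 give two length rules for superimposing a musical sound waveform on the speech: extend the output to the longer of the two waveforms, or loop a shorter piece of music until the speech ends. A minimal sketch, assuming the music has already been rendered to samples by the musical sound synthesizer and that "adjusted to the longer" means padding the shorter waveform with silence; `mix_with_music` and its `loop_short_music` flag are hypothetical.

```python
from typing import List


def mix_with_music(
    speech: List[float],
    music: List[float],
    loop_short_music: bool = False,
) -> List[float]:
    """Superimpose a musical sound waveform on a speech waveform."""
    if loop_short_music and music and len(music) < len(speech):
        # Claim 31: couple the music repeated in succession, cut at the end of speech.
        repeats = -(-len(speech) // len(music))  # ceiling division
        music = (music * repeats)[:len(speech)]
    # Claim 30 (assumed reading): output spans the longer waveform; pad the shorter with silence.
    length = max(len(speech), len(music))
    padded_speech = speech + [0.0] * (length - len(speech))
    padded_music = music + [0.0] * (length - len(music))
    return [s + m for s, m in zip(padded_speech, padded_music)]
```

With `loop_short_music=False`, a music clip longer than the speech keeps playing after the speech ends; with `True`, a short clip is looped under the whole utterance.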

Current Assignee
LAPIS Semiconductor Co., Ltd. (ROHM Co., Ltd.)
Original Assignee
OKI Electric Industry Company Limited
Inventors
Kamanaka, Hiroki

Granted Patent

US 7,260,533 B2

US Class Current

704/260
CPC Class Codes

G10L 13/04 Details of speech synthesis...

G10L 13/07 Concatenation rules
