Information processing apparatus, information processing method, recording medium, and program

US 20020184004A1
Filed: 05/09/2002
Published: 12/05/2002
Est. Priority Date: 05/10/2001
Status: Active Grant

Abstract

Two types of voice can be set for reading text data of an electronic mail. A user selects a detailed setting button associated with one of the voice types to display a voice setting window, in which the settings for that voice can be made individually. A drop-down list box includes preset voice types such as woman, man, child, robot, and alien, as well as the names of voice types corresponding to phonemes created by the user, any of which can be selected. For the voice selected from the drop-down list box, reading speed, voice pitch, and strength of stress are set according to the positions of setting levers.
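
The per-voice settings described in the abstract can be pictured as one small record per voice type. The sketch below is illustrative only: the class name `VoiceSetting`, the helper `available_voices`, the 0 to 10 lever range, and the default values are assumptions made here, not details taken from the patent.

```python
from dataclasses import dataclass

# Preset voice types named in the abstract.
PRESET_VOICES = ["woman", "man", "child", "robot", "alien"]

# Assumed lever range; the patent text does not give numeric limits.
LEVER_MIN, LEVER_MAX = 0, 10


@dataclass(frozen=True)
class VoiceSetting:
    """One voice configuration: a voice type plus three lever positions."""
    voice_type: str
    speed: int = 5
    pitch: int = 5
    stress: int = 5

    def __post_init__(self):
        # Reject lever positions outside the assumed range.
        for name in ("speed", "pitch", "stress"):
            value = getattr(self, name)
            if not LEVER_MIN <= value <= LEVER_MAX:
                raise ValueError(
                    f"{name} must be in [{LEVER_MIN}, {LEVER_MAX}], got {value}"
                )


def available_voices(user_voices):
    """Drop-down entries: presets first, then user-created voice names."""
    return PRESET_VOICES + [v for v in user_voices if v not in PRESET_VOICES]


# Two independently configured voices, as in the abstract.
voice_1 = VoiceSetting("woman", speed=6, pitch=7, stress=4)
voice_2 = VoiceSetting("robot", speed=4, pitch=2, stress=8)
```

Keeping each voice as an immutable record makes it straightforward to hold two of them side by side and hand either one to a synthesis step.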

14 Claims

1. An information processing apparatus comprising:
- text input means for receiving input of text data;
  
  first display control means for controlling display of a first display screen that aids a user to enter setting for speech synthesis;
  
  first setting input means for receiving input of information representing the setting for speech synthesis, entered by the user with reference to the first display screen, display of which is controlled by said first display control means;
  
  phoneme data holding means for holding at least one kind of phoneme data used for speech synthesis;
  
  generation means for dividing the text data input via said text input means according to a predetermined rule to generate a plurality of text groups; and
  
  speech synthesis means for executing speech synthesis using the phoneme data held in said phoneme data holding means based on the setting for speech synthesis, input via said first setting input means, to generate speech data corresponding to the text data;
  
  wherein said first setting input means receives input of a plurality of settings for speech synthesis, and said speech synthesis means executes speech synthesis to generate speech data of different speech properties for adjacent ones of the plurality of text groups based on the plurality of settings for speech synthesis, input via said first setting input means.
- Dependent claims (2 to 11):
  - 2. An information processing apparatus according to claim 1, further comprising speech output means for outputting the speech data generated by the speech synthesis by said speech synthesis means.
  - 3. An information processing apparatus according to claim 2, further comprising second display control means for controlling display of text corresponding to the speech output by said speech output means.
  - 4. An information processing apparatus according to claim 1, further comprising output means for outputting the speech data generated by the speech synthesis by said speech synthesis means to an external recording apparatus or an external recording medium.
  - 5. An information processing apparatus according to claim 4, further comprising format conversion means for converting the speech data from a first format, in which the speech data is represented, into a second format, which allows recording on the external recording apparatus or the external recording medium, if the first format differs from the second format.
  - 6. An information processing apparatus according to claim 1, wherein the information representing the setting for speech synthesis includes at least one of speed, voice pitch, and strength of stress for reading the phoneme data.
  - 7. An information processing apparatus according to claim 1, wherein said text input means receives input of text data corresponding to a body of an electronic mail, and said generation means generates a plurality of text groups based on whether a predetermined symbol is present at the beginning of each line in the body of the electronic mail.
  - 8. An information processing apparatus according to claim 1, wherein said text input means receives input of text data corresponding to a body of an electronic mail, and said generation means generates a plurality of text groups based on whether a predetermined symbol is present, and the number of occurrences of the symbol, at the beginning of each line in the body of the electronic mail.
  - 9. An information processing apparatus according to claim 1, wherein said text input means receives input of text data corresponding to a body of an electronic mail, and said generation means generates a plurality of text groups based on whether each portion of the body of the electronic mail is a quotation or not.
  - 10. An information processing apparatus according to claim 1, wherein said text input means receives input of text data corresponding to a body of an electronic mail written in a markup language, and said generation means generates a plurality of text groups based on tag information included in the electronic mail.
  - 11. An information processing apparatus according to claim 1, further comprising:
    - third display control means for controlling display of a second display screen that aids the user to set details of the phoneme data;
      
      second setting input means for receiving input of information representing the details of the phoneme data, entered by the user with reference to the second display screen, display of which is controlled by said third display control means; and
      
      registration means for registering the information representing the details of the phoneme data, input via said second setting input means, in said phoneme data holding means.
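
Claims 1 and 7 to 9 together describe dividing an e-mail body into text groups by quotation marker and reading adjacent groups with different settings. The following is a minimal sketch of that idea, assuming `>` is the predetermined symbol and that the configured voices are simply cycled from one group to the next. The function names are hypothetical, and the synthesis step itself is left out.

```python
from itertools import groupby

QUOTE_SYMBOL = ">"  # assumed "predetermined symbol"; the claims do not name one


def quote_depth(line):
    """Count quote symbols at the start of a line, skipping spaces between them."""
    depth = 0
    for ch in line:
        if ch == QUOTE_SYMBOL:
            depth += 1
        elif ch != " ":
            break
    return depth


def strip_quotes(line):
    """Remove leading quote symbols and spaces from a line."""
    return line.lstrip(QUOTE_SYMBOL + " ")


def split_into_groups(body):
    """Group consecutive non-empty lines that share the same quote depth."""
    lines = [l for l in body.splitlines() if strip_quotes(l).strip()]
    groups = []
    for depth, run in groupby(lines, key=quote_depth):
        text = " ".join(strip_quotes(l).strip() for l in run)
        groups.append((depth, text))
    return groups


def assign_voices(groups, voices):
    """Cycle through the voices so that adjacent groups get different ones.

    Needs at least two voices for adjacent groups to differ.
    """
    return [(voices[i % len(voices)], text) for i, (_, text) in enumerate(groups)]


body = (
    "Hi Bob,\n"
    "> Are we meeting today?\n"
    "> At noon?\n"
    "Yes, noon works.\n"
    ">> Old thread line\n"
)
groups = split_into_groups(body)
plan = assign_voices(groups, ["voice 1", "voice 2"])
```

Because blank lines are dropped before grouping, two neighbouring groups always differ in quote depth, and cycling by group index then gives each neighbour a different voice. Choosing the voice by quote depth instead would be an equally plausible reading of claim 8.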

12. An information processing method comprising:
- a text input step of receiving input of text data;
  
  a display control step of controlling display of a display screen that aids a user to enter setting for speech synthesis;
  
  a setting input step of receiving input of information representing the setting for speech synthesis, entered by the user with reference to the display screen, display of which is controlled in said display control step;
  
  a phoneme data holding step of holding at least one kind of phoneme data used for speech synthesis;
  
  a generation step of dividing the text data input in said text input step according to a predetermined rule to generate a plurality of text groups; and
  
  a speech synthesis step of executing speech synthesis using the phoneme data held in said phoneme data holding step based on the setting for speech synthesis, input in said setting input step, to generate speech data corresponding to the text data;
  
  wherein input of a plurality of settings for speech synthesis is received in said setting input step, and speech synthesis is executed in said speech synthesis step to generate speech data of different speech properties for adjacent ones of the plurality of text groups based on the plurality of settings for speech synthesis, input in said setting input step.

13. A recording medium having recorded thereon a computer-readable program comprising:
- a text input step of receiving input of text data;
  
  a display control step of controlling display of a display screen that aids a user to enter setting for speech synthesis;
  
  a setting input step of receiving input of information representing the setting for speech synthesis, entered by the user with reference to the display screen, display of which is controlled in said display control step;
  
  a phoneme data holding step of holding at least one kind of phoneme data used for speech synthesis;
  
  a generation step of dividing the text data input in said text input step according to a predetermined rule to generate a plurality of text groups; and
  
  a speech synthesis step of executing speech synthesis using the phoneme data held in said phoneme data holding step based on the setting for speech synthesis, input in said setting input step, to generate speech data corresponding to the text data;
  
  wherein input of a plurality of settings for speech synthesis is received in said setting input step, and speech synthesis is executed in said speech synthesis step to generate speech data of different speech properties for adjacent ones of the plurality of text groups based on the plurality of settings for speech synthesis, input in said setting input step.

14. A program for having a computer execute a process comprising:
- a text input step of receiving input of text data;
  
  a display control step of controlling display of a display screen that aids a user to enter setting for speech synthesis;
  
  a setting input step of receiving input of information representing the setting for speech synthesis, entered by the user with reference to the display screen, display of which is controlled in said display control step;
  
  a phoneme data holding step of holding at least one kind of phoneme data used for speech synthesis;
  
  a generation step of dividing the text data input in said text input step according to a predetermined rule to generate a plurality of text groups; and
  
  a speech synthesis step of executing speech synthesis using the phoneme data held in said phoneme data holding step based on the setting for speech synthesis, input in said setting input step, to generate speech data corresponding to the text data;
  
  wherein input of a plurality of settings for speech synthesis is received in said setting input step, and speech synthesis is executed in said speech synthesis step to generate speech data of different speech properties for adjacent ones of the plurality of text groups based on the plurality of settings for speech synthesis, input in said setting input step.

Current Assignee: Sony Corporation (Sony Group Corp.)
Original Assignee: Sony Corporation (Sony Group Corp.)
Inventors: Kato, Yasuhiko; Fujimura, Satoshi; Shizuka, Utaha

Granted Patent: US 6,996,530 B2
US Class Current: 704/200
CPC Class Codes: G10L 13/00 Speech synthesis; Text to s...
