Speech synthesis method and apparatus for electronic system

US 9,087,512 B2
Filed: 01/10/2013
Issued: 07/21/2015
Est. Priority Date: 01/20/2012
Status: Active Grant

Abstract

A speech synthesis method for an electronic system and a speech synthesis apparatus are provided. In the speech synthesis method, a speech signal file including text content is received. The speech signal file is analyzed to obtain prosodic information of the speech signal file. The text content and the corresponding prosodic information are automatically tagged to obtain a text tag file. A speech synthesis file is obtained by synthesizing a human voice profile and the text tag file.
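
As a rough illustration of the tagging step described above, the sketch below pairs each unit of recognized text with prosodic values and serializes the result. The JSON layout, the `ProsodyTag` class, the `build_text_tag_file` function, and the field units are all assumptions made for illustration; the abstract does not specify a file format.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class ProsodyTag:
    """Hypothetical prosodic values for one text unit (units are assumed)."""
    pitch_hz: float
    duration_s: float
    intensity_db: float


def build_text_tag_file(words, tags):
    """Pair each recognized word with its prosody and serialize to JSON text."""
    if len(words) != len(tags):
        raise ValueError("each word needs exactly one prosody tag")
    entries = [{"text": w, "prosody": asdict(t)} for w, t in zip(words, tags)]
    return json.dumps({"entries": entries}, indent=2)


# Example values are made up; a real system would obtain them from
# speech recognition and prosody analysis of the recorded speech signal file.
words = ["good", "morning"]
tags = [ProsodyTag(180.0, 0.25, 62.0), ProsodyTag(210.0, 0.40, 65.0)]
text_tag_file = build_text_tag_file(words, tags)
```

Keeping the text and its prosody in one record per unit means the later synthesis step can read both from a single file, which is the role the text tag file plays in the abstract.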

10 Claims

1. A speech synthesis method for an electronic system, the speech synthesis method comprising:
- performing a text tagging process, comprising:
  - receiving a speech signal file, wherein the speech signal file comprises text content and prosodic information, wherein the speech signal file is a recorded file of human voice from a user to recite a text content and received by a voice input unit;
  - analyzing the speech signal file to obtain the prosodic information and the text content of the speech signal file, respectively; and
  - automatically tagging the text content and the corresponding prosodic information to obtain a text tag file; and
- performing a prosody mimicking process, comprising:
  - combining a human voice profile and the text tag file to obtain a speech synthesis file, wherein a speech synthesis sound is produced when the speech synthesis file is broadcasted.
- Dependent claims (2, 3, 4, 5, 6, 7):
  - 2. The speech synthesis method as recited in claim 1, wherein the prosodic information comprises one of intensity, volume, pitch, and duration or a combination thereof.
  - 3. The speech synthesis method as recited in claim 1, wherein the prosody mimicking process further comprises:
    - analyzing the text content and the prosodic information and extracting the text content and the prosodic information from the text tag file.
  - 4. The speech synthesis method as recited in claim 3, after the step of analyzing the text content and the prosodic information and extracting the text content and the prosodic information from the text tag file, the speech synthesis method further comprising:
    - combining the human voice profile, the text content, and the prosodic information to obtain the speech synthesis file.
  - 5. The speech synthesis method as recited in claim 1, wherein the human voice profile comprises a plurality of human voice models.
  - 6. The speech synthesis method as recited in claim 5, wherein the human voice models of the human voice profile are utilized according to different human characters and scenarios in the text content.
  - 7. The speech synthesis method as recited in claim 1, after the step of combining the human voice profile and the text tag file to obtain the speech synthesis file, the speech synthesis method further comprising:
    - outputting the speech synthesis file through an audio output unit.
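
The prosody mimicking steps in claims 1 and 3 to 6 can be sketched as follows: extract the text content and prosodic information from a text tag file, choose a human voice model per character, and combine them into a synthesis plan. The tag file layout, the `VOICE_MODELS` table, the `speaker` field, and the `mimic_prosody` function are hypothetical; no audio is actually rendered, and the claims do not prescribe any of these details.

```python
import json

# Hypothetical human voice profile: several voice models keyed by character.
VOICE_MODELS = {"narrator": "model_neutral", "child": "model_high"}


def mimic_prosody(text_tag_file, voice_models, default="narrator"):
    """Extract text and prosody from the tag file and pair each unit
    with a voice model, yielding a plan a synthesizer could render."""
    entries = json.loads(text_tag_file)["entries"]
    plan = []
    for entry in entries:
        # Fall back to the default model when the character is unknown.
        speaker = entry.get("speaker", default)
        model = voice_models.get(speaker, voice_models[default])
        plan.append({
            "text": entry["text"],
            "voice_model": model,
            "prosody": entry["prosody"],
        })
    return plan


# Made-up tag file content standing in for the output of the tagging process.
tag_file = json.dumps({"entries": [
    {"text": "Once upon a time", "speaker": "narrator",
     "prosody": {"pitch_hz": 120.0, "duration_s": 1.2}},
    {"text": "Hello!", "speaker": "child",
     "prosody": {"pitch_hz": 300.0, "duration_s": 0.5}},
]})
synthesis_plan = mimic_prosody(tag_file, VOICE_MODELS)
```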

8. A speech synthesis apparatus comprising:
- a text tagging apparatus receiving a speech signal file, wherein the speech signal file comprises text content and prosodic information, and the text tagging apparatus comprises:
  - a text recognizer analyzing the speech signal file to obtain the text content of the speech signal file, wherein the speech signal file is a recorded file of human voice from a user to recite a text content and received by a voice input unit;
  - a prosody analyzer analyzing the speech signal file to obtain the prosodic information of the speech signal file; and
  - a tagging device automatically tagging the text content and the corresponding prosodic information to obtain a text tag file; and
- a prosody mimicking apparatus receiving the text tag file and comprising:
  - an analyzer analyzing the text tag file to obtain the text content and the prosodic information; and
  - a speech synthesizer combining a human voice profile, the text content, and the prosodic information to obtain the speech synthesis file, wherein a speech synthesis sound is produced when the speech synthesis file is broadcasted by the speech synthesizer.
- Dependent claims (9, 10):
  - 9. The speech synthesis apparatus as recited in claim 8, wherein the text tagging apparatus further comprises:
    - a user's interface displaying the text content, a plurality of functions being performed through the user's interface, wherein the functions comprise a broadcast function, a recording function, and a learning function,
      - when the recording function is performed, the speech signal file is received,
      - when the learning function is performed, the speech signal file is analyzed to obtain the prosodic information of the speech signal file, the prosodic information corresponding to the text content is automatically tagged to obtain the text tag file, and the speech synthesis file is obtained by combining the human voice profile and the text tag file, and
      - when the broadcast function is performed, the speech synthesis file is broadcast.
  - 10. The speech synthesis apparatus as recited in claim 8, wherein the prosodic information comprises one of intensity, volume, pitch, and duration or a combination thereof.
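
A minimal sketch of the three user interface functions in claim 9 (recording, learning, broadcast) follows. The class name, the injected `recognize`, `analyze_prosody`, and `synthesize` callables, and the in-memory data shapes are assumptions for illustration only; they stand in for the text recognizer, prosody analyzer, and speech synthesizer of claim 8.

```python
class SpeechSynthesisUI:
    """Hypothetical controller exposing record, learn, and broadcast."""

    def __init__(self, recognize, analyze_prosody, synthesize, voice_profile):
        self.recognize = recognize
        self.analyze_prosody = analyze_prosody
        self.synthesize = synthesize
        self.voice_profile = voice_profile
        self.speech_signal = None
        self.text_tag_file = None
        self.synthesis_file = None

    def record(self, speech_signal):
        # Recording function: receive the speech signal file.
        self.speech_signal = speech_signal

    def learn(self):
        # Learning function: analyze, tag, then combine with the voice profile.
        if self.speech_signal is None:
            raise RuntimeError("record a speech signal file first")
        text = self.recognize(self.speech_signal)
        prosody = self.analyze_prosody(self.speech_signal)
        self.text_tag_file = {"text": text, "prosody": prosody}
        self.synthesis_file = self.synthesize(self.voice_profile, self.text_tag_file)

    def broadcast(self):
        # Broadcast function: hand back the synthesis file for playback.
        if self.synthesis_file is None:
            raise RuntimeError("run learn() before broadcast()")
        return self.synthesis_file


# Stub components with made-up values, standing in for real analysis.
ui = SpeechSynthesisUI(
    recognize=lambda signal: "hello",
    analyze_prosody=lambda signal: {"pitch_hz": 200.0},
    synthesize=lambda profile, tags: (profile, tags["text"], tags["prosody"]),
    voice_profile="model_neutral",
)
ui.record(b"\x00\x01")
ui.learn()
result = ui.broadcast()
```

Passing the components in as callables keeps the ordering constraint visible: learning depends on a prior recording, and broadcast depends on a prior learning pass.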

Current Assignee: ASUSTek Computer, Inc.
Original Assignee: ASUSTek Computer, Inc.
Inventors: Chen, Yu-Chieh; Parng, Tai-Ming; Yu, Chih-Kai; Wu, Sung-Shen
Primary Examiner(s): MCFADDEN, SUSAN IRIS
Application Number: US13/737,955
Publication Number: US 20130191130A1
Time in Patent Office: 922 Days
Field of Search: 704/260
US Class Current: 1/1
CPC Class Codes:
- G10L 13/02 Methods for producing synth...
- G10L 13/08 Text analysis or generation...
