Speech synthesis with prosodic phrase boundary information

US 6,996,529 B1
Filed: 03/08/2000
Issued: 02/07/2006
Est. Priority Date: 03/15/1999
Status: Expired due to Term

Abstract

Text-to-speech conversion uses pattern-matching to predict the position of phrase boundaries in spoken output. Input text is analyzed to identify groups of words (known as “chunks”) which are unlikely to contain internal phrase boundaries. Both the chunks and individual words are labeled with their syntactic characteristics. Access is made to a database of sentences which also contains such syntactic labels, together with indications of where a human reader would insert minor and major phrase boundaries. The parts of the database which have the most similar syntactic characteristics are found and phrase boundaries are predicted based on the phrase boundaries found in those parts. Other characteristics may also be used in the pattern-matching process.
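The matching idea in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Reference` type, the integer boundary codes, the sliding-window search, and the position-wise `similarity` score are all assumptions made for illustration. The abstract says only that the most similar parts of the database are found and their boundaries reused.

```python
from typing import List, NamedTuple

# Hypothetical boundary encoding: 0 = no boundary, 1 = minor, 2 = major.
NONE, MINOR, MAJOR = 0, 1, 2


class Reference(NamedTuple):
    tags: List[str]        # syntactic label of each chunk in a database sentence
    boundaries: List[int]  # boundary code that a human reader placed after each chunk


def similarity(a: List[str], b: List[str]) -> int:
    """Count the positions at which two equal-length label sequences agree."""
    return sum(x == y for x, y in zip(a, b))


def predict_boundaries(input_tags: List[str], database: List[Reference]) -> List[int]:
    """Copy the boundaries of the best-matching window found in the database."""
    n = len(input_tags)
    best_score = -1
    best_bounds = [NONE] * n  # fallback when no reference is long enough
    for ref in database:
        # Slide a window of the input's length over each reference sentence.
        for start in range(len(ref.tags) - n + 1):
            score = similarity(input_tags, ref.tags[start:start + n])
            if score > best_score:
                best_score = score
                best_bounds = ref.boundaries[start:start + n]
    return best_bounds


database = [
    Reference(["NP", "VP", "PP", "NP", "VP"], [NONE, MINOR, MAJOR, NONE, MAJOR]),
    Reference(["NP", "VP", "NP"], [NONE, NONE, MAJOR]),
]
predicted = predict_boundaries(["NP", "VP", "PP"], database)
```

Here the input matches the first three chunks of the first reference exactly, so the sketch reuses that window's boundaries. A real system would need a richer similarity measure than exact label agreement.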

10 Claims

1. A method of converting text to speech, said method comprising:
- receiving an input word sequence in the form of text;
  
  comparing said input word sequence with each one of a plurality of reference word sequences, said plurality of reference word sequences including prosodic phrase boundary information;
  
  identifying one or more reference word sequences which most closely match said input word sequence; and
  
  predicting prosodic phrase boundaries for a synthesized spoken version of the input text on the basis of the prosodic phrase boundary information included with said one or more most closely matching reference word sequences.
  - 2. A method as in claim 1 further comprising:
    - identifying clusters of words in the input word sequence which are unlikely to include prosodic phrase boundaries;
      
      wherein:
      
      said plurality of reference word sequences are further provided with information identifying such clusters of words therein; and
      
      said comparison step comprises a plurality of per-cluster comparisons.
  - 3. A method as in claim 2 wherein said per-cluster comparison comprises quantifying the degree of similarity between the syntactic characteristics of the clusters.
  - 4. A method as in claim 2 wherein said per-cluster comparison comprises quantifying the degree of similarity between the syntactic characteristics of the words within the clusters.
  - 5. A method as in claim 2 wherein said per-cluster comparison comprises measuring the difference in the number of words in the clusters being compared.
  - 6. A method as in claim 1 wherein said comparison comprises measuring the similarity in the positions of prosodic phrase boundaries previously predicted for the input word sequence and the positions of the prosodic phrase boundaries in the reference word sequences.
  - 7. A program storage device readable by a computer, said device embodying computer readable code executable by the computer to perform method steps according to claim 1.
  - 8. A signal embodying computer executable code for loading into a computer for the performance of a method according to claim 1.
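The per-cluster comparison of claims 2 to 5 can be sketched as a weighted distance. This is an illustrative reading, not the claimed method itself: the `Cluster` type, the three cost terms, and the weights `w_label`, `w_words`, and `w_length` are hypothetical. The claims state only that similarity of cluster-level syntax, similarity of word-level syntax, and difference in word count may each be measured.

```python
from typing import List, NamedTuple


class Cluster(NamedTuple):
    label: str            # syntactic label of the whole cluster, e.g. "NP"
    word_tags: List[str]  # syntactic label of each word inside the cluster


def cluster_distance(a: Cluster, b: Cluster,
                     w_label: float = 1.0,
                     w_words: float = 1.0,
                     w_length: float = 0.5) -> float:
    """Weighted distance between two clusters; 0.0 means identical."""
    # Claim 3: compare the syntactic characteristics of the clusters.
    label_cost = 0.0 if a.label == b.label else 1.0
    # Claim 4: compare the syntactic characteristics of the words within them,
    # as the fraction of mismatches over the positions both clusters share.
    shared = min(len(a.word_tags), len(b.word_tags))
    mismatches = sum(x != y for x, y in zip(a.word_tags, b.word_tags))
    word_cost = mismatches / shared if shared else 0.0
    # Claim 5: measure the difference in the number of words.
    length_cost = abs(len(a.word_tags) - len(b.word_tags))
    return w_label * label_cost + w_words * word_cost + w_length * length_cost


def sequence_distance(xs: List[Cluster], ys: List[Cluster]) -> float:
    """Sum the per-cluster distances over two equal-length cluster sequences."""
    if len(xs) != len(ys):
        raise ValueError("sequences must contain the same number of clusters")
    return sum(cluster_distance(a, b) for a, b in zip(xs, ys))


np_short = Cluster("NP", ["DT", "NN"])
np_long = Cluster("NP", ["DT", "JJ", "NN"])
vp = Cluster("VP", ["VBD"])
total = sequence_distance([np_short, vp], [np_long, vp])
```

Summing the per-cluster distances gives one score per reference sequence, and the references with the lowest scores would be treated as the closest matches. The weights are arbitrary and would need tuning against real data.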

9. A text to speech conversion apparatus comprising:
- a word sequence store storing a plurality of reference word sequences, said plurality of reference word sequences including prosodic phrase boundary information;
  
  a program store storing a program;
  
  a processor in communication with said program store and said word sequence store;
  
  means for receiving an input word sequence in the form of text;
  
  wherein said program is executable to control said processor to:
  
  compare said input word sequence with each one of a plurality of said reference word sequences;
  
  identify one or more reference word sequences which most closely match said input word sequence; and
  
  derive prosodic phrase boundary information for the input text on the basis of the prosodic phrase boundary information included with said one or more most closely matching reference word sequences.

10. A text to speech conversion apparatus comprising:
- receiving means arranged in operation to receive an input word sequence in the form of text;
  
  a word sequence store storing a plurality of reference word sequences, said plurality of reference word sequences including prosodic phrase boundary information;
  
  comparison means arranged in operation to compare said input text with each one of a plurality of said reference word sequences;
  
  identification means arranged in operation to identify one or more reference word sequences which most closely match said input word sequence; and
  
  prosodic phrase boundary prediction means arranged in operation to predict prosodic phrase boundaries for the input text on the basis of the prosodic phrase boundary information included with said one or more most closely matching reference word sequences.

Current Assignee: British Telecommunications PLC (BT Group PLC)
Original Assignee: British Telecommunications PLC (BT Group PLC)
Inventors: Minnis, Stephen
Primary Examiner(s): Lerner, Martin
Application Number: US09/913,462
Time in Patent Office: 2,162 Days
Field of Search: 704/258, 704/260, 704/266, 704/267, 704/268, 704/269
US Class Current: 704/258
CPC Class Codes:
G10L 13/04 Details of speech synthesis...
G10L 13/10 Prosody rules derived from ...
