Using non-speech sounds during text-to-speech synthesis

US 8,027,837 B2
Filed: 09/15/2006
Issued: 09/27/2011
Est. Priority Date: 09/15/2006
Status: Expired due to Fees

Abstract

Systems, apparatus, methods and computer program products are described for producing text-to-speech synthesis with non-speech sounds. In general, some of the pauses or silences that would otherwise be generated in synthesized speech are instead synthesized as non-speech sounds such as breaths. Non-speech sounds can be identified from pre-recorded speech that can include meta-data such as the grammatical and phrasal structure of words and sounds that precede and succeed non-speech sounds. A non-speech sound can be selected for use in synthesized speech based on the words, punctuation, grammatical and phrasal structure of text from which the speech is being synthesized, or other characteristics.
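
As a rough illustration of the selection step the abstract describes, the Python sketch below picks a recorded non-speech sound for a pause by comparing the pause's context with metadata stored alongside each recording. Everything in it is an assumption made for illustration: the `NonSpeechSegment` fields, the scoring weights, the `threshold`, and the audio identifiers are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class NonSpeechSegment:
    """A recorded non-speech sound plus metadata about where it occurred."""
    audio_id: str           # hypothetical handle to the recording
    kind: str               # e.g. "inhalation" or "lip smack"
    prev_punct: str         # punctuation that preceded the sound in the recording
    next_phrase_words: int  # length of the phrase that followed the sound

def score(seg, prev_punct, next_phrase_words):
    """Higher is better: reward matching punctuation and a similar phrase length."""
    punct_score = 2.0 if seg.prev_punct == prev_punct else 0.0
    length_penalty = 0.1 * abs(seg.next_phrase_words - next_phrase_words)
    return punct_score - length_penalty

def select_segment(inventory, prev_punct, next_phrase_words, threshold=1.0):
    """Return the best-scoring segment, or None to fall back to a plain pause."""
    best = max(inventory,
               key=lambda s: score(s, prev_punct, next_phrase_words),
               default=None)
    if best is None or score(best, prev_punct, next_phrase_words) < threshold:
        return None
    return best

# Hypothetical inventory harvested from pre-recorded speech.
INVENTORY = [
    NonSpeechSegment("breath_long", "inhalation", ".", 12),
    NonSpeechSegment("breath_short", "inhalation", ",", 4),
]
```

With this inventory, a pause after a comma that precedes a five-word phrase selects `breath_short`, while a pause after a semicolon finds no segment above the threshold and would be left as silence.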

13 Claims

1. A method, comprising:
- parsing text into speech units and non-speech units at a first speech unit level;
  
  attempting to match a non-speech unit with a first audio segment;
  
  determining that there are unmatched non-speech units at the first speech unit level;
  
  parsing speech units adjacent to unmatched non-speech units into speech units at a second speech unit level;
  
  attempting to match an unmatched non-speech unit having an adjacent speech unit at the second speech unit level with a second audio segment; and
  
  creating a portion of speech by synthesizing a portion of the text string containing speech units into speech and augmenting the portion of synthesized speech with the first or second audio segment.
  - 2. The method of claim 1, where a non-speech sound includes the sound of one or more of:
    - inhalation;
      
      exhalation;
      
      mouth clicks;
      
      lip smacks;
      
      tongue flicks; and
      
      salivation.
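
The steps of claim 1 can be sketched in Python as a two-pass lookup. This is a minimal sketch under stated assumptions, not the patented implementation: it assumes phrases are the first speech unit level and words the second, treats punctuation marks as the non-speech units, and uses two hypothetical dictionaries (`phrase_index`, `word_index`) in place of a real database of recorded audio segments. Leaving an unmatched gap as silence is also an assumption.

```python
import re

PUNCT = set(",.;:!?")

def parse_units(text):
    """First-level parse: phrases become speech units, punctuation non-speech units."""
    units = []
    for piece in re.split(r"([,.;:!?])", text):
        piece = piece.strip()
        if piece:
            units.append(("nonspeech" if piece in PUNCT else "speech", piece))
    return units

def neighbor(units, i):
    """Text of the speech unit at index i, or '' if there is none."""
    if 0 <= i < len(units) and units[i][0] == "speech":
        return units[i][1]
    return ""

def edge_word(phrase, last):
    """Second-level parse of one phrase: its last or first word, lowercased."""
    words = phrase.lower().split()
    if not words:
        return ""
    return words[-1] if last else words[0]

def build_timeline(text, phrase_index, word_index):
    units = parse_units(text)
    matched = {}
    # Pass 1: try to match each non-speech unit using its neighboring phrases.
    for i, (kind, mark) in enumerate(units):
        if kind == "nonspeech":
            key = (neighbor(units, i - 1), mark, neighbor(units, i + 1))
            if key in phrase_index:
                matched[i] = phrase_index[key]
    # Determine which non-speech units are still unmatched.
    unmatched = [i for i, (kind, _) in enumerate(units)
                 if kind == "nonspeech" and i not in matched]
    # Pass 2: re-parse the adjacent phrases into words and try again.
    for i in unmatched:
        key = (edge_word(neighbor(units, i - 1), last=True),
               units[i][1],
               edge_word(neighbor(units, i + 1), last=False))
        if key in word_index:
            matched[i] = word_index[key]
    # Assemble: speech units go to the synthesizer, matched gaps get recorded audio.
    timeline = []
    for i, (kind, value) in enumerate(units):
        if kind == "speech":
            timeline.append(("tts", value))
        elif i in matched:
            timeline.append(("audio", matched[i]))
        else:
            timeline.append(("silence", value))  # assumed fallback: plain pause
    return timeline
```

For example, with a phrase-level entry for the comma and a word-level entry for the period, `build_timeline("Hello there, my friend. Goodbye", ...)` yields synthesized phrases interleaved with the two recorded segments.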

3. A computer-readable, non-transitory storage medium having instructions stored thereon, which, when executed by a processor, causes the processor to perform operations, comprising:
- parsing text into speech units and non-speech units at a first speech unit level;
  
  attempting to match a non-speech unit with a first audio segment;
  
  determining that there are unmatched non-speech units at the first speech unit level;
  
  parsing speech units adjacent to unmatched non-speech units into speech units at a second speech unit level;
  
  attempting to match an unmatched non-speech unit having an adjacent speech unit at the second speech unit level with a second audio segment; and
  
  creating a portion of speech by synthesizing a portion of the text string containing speech units into speech; and
  
  augmenting the portion of synthesized speech with the first or second audio segment.

4. A system comprising:
- a processor;
  
  memory having instructions stored thereon, which, when executed by the processor, cause the processor to perform operations, comprising:
  
  parsing text into speech units and non-speech units at a first speech unit level;
  
  attempting to match a non-speech unit with a first audio segment;
  
  determining that there are unmatched non-speech units at the first speech unit level;
  
  parsing speech units adjacent to unmatched non-speech units into speech units at a second speech unit level;
  
  attempting to match an unmatched non-speech unit having an adjacent speech unit at the second speech unit level with a second audio segment; and
  
  creating a portion of speech by synthesizing a portion of the text string containing speech units into speech and augmenting the portion of synthesized speech with the first or second audio segment.

5. A method comprising:
- parsing a text string into phrase units and non-speech units;
  
  attempting to match a non-speech unit to a first audio segment;
  
  determining that there are unmatched non-speech units;
  
  parsing phrase units adjacent to unmatched non-speech units into word units;
  
  attempting to match an unmatched non-speech unit having an adjacent word unit to a second audio segment; and
  
  creating a portion of speech by synthesizing a portion of the text string containing speech units into speech and augmenting the portion of synthesized speech with the first or second audio segment.
  - 6. The method of claim 5, further comprising:
    - after attempting to match an unmatched non-speech unit having an adjacent word unit to a second audio segment, determining that there are unmatched non-speech units;
      
      parsing word units adjacent to unmatched non-speech units into subword units;
      
      attempting to match an unmatched non-speech unit having an adjacent subword unit to a third audio segment; and
      
      augmenting the portion of synthesized speech with the third audio segment.
  - 7. The method of claim 5, where a non-speech sound includes the sound of one or more of:
    - inhalation;
      
      exhalation;
      
      mouth clicks;
      
      lip smacks;
      
      tongue flicks; and
      
      salivation.
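
Claims 5 and 6 together describe a cascade from phrase units to word units to subword units. The Python sketch below shows one way such a cascade could be organized as a loop over levels, where only the gaps left unmatched at one level are retried at the next. It is an illustration under assumptions: the `indexes` dictionaries are hypothetical stand-ins for a database of audio segments, and the two-character affixes used as "subword units" are a crude placeholder for whatever subword unit a real system would use.

```python
def edge_words(left, right):
    """Last word of the left phrase and first word of the right phrase."""
    l, r = left.lower().split(), right.lower().split()
    return (l[-1] if l else "", r[0] if r else "")

def phrase_context(left, right):
    return (left.lower(), right.lower())

def word_context(left, right):
    return edge_words(left, right)

def subword_context(left, right, n=2):
    # Assumption: approximate subword units by n-character affixes.
    lw, rw = edge_words(left, right)
    return (lw[-n:], rw[:n])

# Coarsest level first; each later level uses a smaller unit of context.
LEVELS = [
    ("phrase", phrase_context),
    ("word", word_context),
    ("subword", subword_context),
]

def cascade(gaps, indexes):
    """Match each gap (left_phrase, right_phrase) at the coarsest level possible.

    Returns (matched, pending): matched maps gap index -> (level, audio_id),
    pending lists the gap indices that no level could match.
    """
    matched = {}
    pending = list(range(len(gaps)))
    for name, context in LEVELS:
        still_unmatched = []
        for i in pending:
            audio_id = indexes.get(name, {}).get(context(*gaps[i]))
            if audio_id is None:
                still_unmatched.append(i)
            else:
                matched[i] = (name, audio_id)
        pending = still_unmatched
        if not pending:
            break  # every gap has an audio segment; no finer parse needed
    return matched, pending
```

Trying the coarsest level first means a gap is matched with as much surrounding context as the inventory allows, and finer units are parsed only for gaps that remain unmatched.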

8. A computer-readable, non-transitory storage medium having instructions stored thereon, which, when executed by a processor, causes the processor to perform operations, comprising:
- parsing a text string into phrase units and non-speech units;
  
  attempting to match a non-speech unit to a first audio segment;
  
  determining that there are unmatched non-speech units;
  
  parsing phrase units adjacent to unmatched non-speech units into word units;
  
  attempting to match an unmatched non-speech unit having an adjacent word unit to a second audio segment; and
  
  creating a portion of speech by synthesizing a portion of the text string containing speech units into speech and augmenting the portion of synthesized speech with the first or second audio segment.
  - 9. The computer-readable, non-transitory storage medium of claim 8, wherein the instructions include instructions which cause the processor to perform operations, comprising:
    - after attempting to match an unmatched non-speech unit having an adjacent word unit to a second audio segment, determining that there are unmatched non-speech units;
      
      parsing word units adjacent to unmatched non-speech units into subword units;
      
      attempting to match an unmatched non-speech unit having an adjacent subword unit to a third audio segment; and
      
      augmenting the portion of synthesized speech with the third audio segment.
  - 10. The computer-readable, non-transitory storage medium of claim 8, where a non-speech sound includes the sound of one or more of:
    - inhalation;
      
      exhalation;
      
      mouth clicks;
      
      lip smacks;
      
      tongue flicks; and
      
      salivation.

11. A system comprising:
- a processor;
  
  memory having instructions stored thereon, which, when executed by the processor, cause the processor to perform operations, comprising:
  
  parsing a text string into phrase units and non-speech units;
  
  attempting to match a non-speech unit to a first audio segment;
  
  determining that there are unmatched non-speech units;
  
  parsing phrase units adjacent to unmatched non-speech units into word units;
  
  attempting to match an unmatched non-speech unit having an adjacent word unit to a second audio segment; and
  
  creating a portion of speech by synthesizing a portion of the text string containing speech units into speech and augmenting the portion of synthesized speech with the first or second audio segment.
  - 12. The system of claim 11, wherein the instructions include instructions which cause the processor to perform operations, comprising:
    - after attempting to match an unmatched non-speech unit having an adjacent word unit to a second audio segment, determining that there are unmatched non-speech units;
      
      parsing word units adjacent to unmatched non-speech units into subword units;
      
      attempting to match an unmatched non-speech unit having an adjacent subword unit to a third audio segment; and
      
      augmenting the portion of synthesized speech with the third audio segment.
  - 13. The system of claim 11, where a non-speech sound includes the sound of one or more of:
    - inhalation;
      
      exhalation;
      
      mouth clicks;
      
      lip smacks;
      
      tongue flicks; and
      
      salivation.

Current Assignee: Apple Inc.
Original Assignee: Apple Inc.
Inventors: Silverman, Kim E. A.; Neeracher, Matthias
Primary Examiner(s): YEN, ERIC L
Application Number: US11/532,470
Publication Number: US 20080071529A1
Time in Patent Office: 1,838 Days
Field of Search: 704/260, 704/268
US Class Current: 704/268
CPC Class Codes:
G10L 13/02 Methods for producing synth...
G10L 21/0364 for improving intelligibility
