Systems and methods for selective rate of speech and speech preferences for text to speech synthesis

US 8,352,268 B2
Filed: 09/29/2008
Issued: 01/08/2013
Est. Priority Date: 09/29/2008
Status: Active Grant


1 Assignment

0 Petitions

Abstract

Algorithms for synthesizing speech used to identify media assets are provided. Speech may be selectively synthesized from text strings associated with media assets. A text string may be normalized and its native language determined for obtaining a target phoneme for providing human-sounding speech in a language (e.g., dialect or accent) that is familiar to a user. The algorithms may be implemented on a system including several dedicated render engines. The system may be part of a back end coupled to a front end including storage for media assets and associated synthesized speech, and a request processor for receiving and processing requests that result in providing the synthesized speech. The front end may communicate media assets and associated synthesized speech content over a network to host devices coupled to portable electronic devices on which the media assets and synthesized speech are played back.
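The abstract says a text string may be normalized before synthesis but does not give any normalization rules. As a minimal sketch, assume a hypothetical abbreviation table of the kind found in music metadata: the function below expands those abbreviations and collapses stray whitespace so the synthesizer sees speakable words. The table, the name `normalize_text`, and the matching rules are illustrative assumptions; native-language detection and phoneme selection are not shown.

```python
import re

# Hypothetical expansion table; the patent abstract does not list any rules.
ABBREVIATIONS = {"feat.": "featuring", "ft.": "featuring", "vol.": "volume", "&": "and"}

# Match an abbreviation only when it is not attached to a preceding word
# character or period, and is not followed by a word character.
_ABBREV_RE = re.compile(
    r"(?<![\w.])(" + "|".join(re.escape(k) for k in ABBREVIATIONS) + r")(?!\w)",
    re.IGNORECASE,
)


def normalize_text(text: str) -> str:
    """Expand known abbreviations and collapse runs of whitespace."""
    expanded = _ABBREV_RE.sub(lambda m: ABBREVIATIONS[m.group(1).lower()], text)
    return re.sub(r"\s+", " ", expanded).strip()


print(normalize_text("Song  Title (Feat. Artist & Band)"))
# Song Title (featuring Artist and Band)
```

The lookbehind keeps "ft." inside a word such as "Loft." from being expanded.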

663 Citations

30 Claims

1. A method for customizing delivery of synthesized speech, the method comprising:
- generating a speech segment from one or more text strings describing or identifying a media asset having audio data distinct from the generated speech segment;
  
  obtaining user input requesting a variation in speech delivery accompanying the media asset;
  
  in response to the user input, customizing the speech segment by modifying selected portions of the speech segment at a server device, wherein the customizing further comprises:
  
  automatically detecting one or more repeated portions in the speech segment; and
  
  automatically modifying the speech segment by performing one or more of:
  
  (1) omitting at least one of the repeated portions from the speech segment, (2) using faster speech patterns for at least one of the repeated portions, (3) shortening breaks between words in at least one of the repeated portions, and (4) truncating one or more phrases in at least one of the repeated portions; and
  
  providing the customized speech segment from the server device to a user device for playback with the media asset.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9):
  - 2. The method of claim 1 wherein customizing the speech segment by modifying selected portions of the speech segment further comprises:
    - shortening breaks between words within the speech segment to generate the customized speech segment.
  - 3. The method of claim 1 wherein the user input specifies one or more preferred information fields among a plurality of information fields available in the speech segment.
  - 4. The method of claim 1 wherein the user input requests at least one of fast forwarding and skipping playback of speech content at the user device.
  - 5. The method of claim 1 wherein the user input requests omission of repeated information from speech content delivered to the user device.
  - 6. The method of claim 1 wherein customizing the speech segment by modifying selected portions of the speech segment further comprises:
    - including in the customized speech segment respective portions of the speech segment corresponding to one or more user-selected information fields, while omitting at least one field of information in the speech segment from the customized speech segment.
  - 7. The method of claim 1, further comprising:
    - detecting user input fast forwarding or skipping playback of at least a first speech segment previously delivered to the user device; and
      
      in response to the detecting, modifying a delivery rate for a second speech segment to be delivered from the client device to the user device.
  - 8. The method of claim 1, further comprising:
    - detecting user input fast forwarding or skipping playback of at least a first speech segment previously delivered to the user device; and
      
      in response to the detecting, customizing speech delivery for a second speech segment to be delivered from the client device to the user device.
  - 9. The method of claim 8, wherein customizing speech delivery for the second speech segment comprises at least one of:
    - (1) shortening breaks between words within the second speech segment before delivering the second speech segment to the user device, (2) truncating one or more phrases within the second speech segment before delivering the second speech segment to the user device, and (3) omitting delivery of the second speech segment to the user device.
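Claim 1 recites detecting repeated portions of a speech segment and then omitting them, speaking them faster, shortening the breaks between their words, or truncating them. The claim does not say how a segment is represented, so the sketch below assumes a hypothetical pre-rendering model: a segment is a list of `Phrase` records (text, relative speaking rate, inter-word gap) that a renderer would later turn into audio, and a "repeated portion" is taken to be a phrase whose case- and whitespace-normalized text already appeared earlier in the segment. `Phrase`, `Mode`, `find_repeats`, `customize`, and the default factors are illustrative names and values, not taken from the patent.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from enum import Enum, auto


@dataclass(frozen=True)
class Phrase:
    """Hypothetical pre-rendering unit of a speech segment."""
    text: str
    rate: float = 1.0      # relative speaking rate (1.0 = normal)
    word_gap_ms: int = 80  # silence inserted between words in this phrase


class Mode(Enum):
    OMIT = auto()            # option (1) in claim 1
    FASTER = auto()          # option (2)
    SHORTER_BREAKS = auto()  # option (3)
    TRUNCATE = auto()        # option (4)


def find_repeats(segment: list[Phrase]) -> set[int]:
    """Return indices of phrases whose normalized text occurred earlier."""
    seen: set[str] = set()
    repeats: set[int] = set()
    for i, phrase in enumerate(segment):
        key = " ".join(phrase.text.lower().split())
        if key in seen:
            repeats.add(i)
        seen.add(key)
    return repeats


def customize(segment: list[Phrase], modes: set[Mode], rate_factor: float = 1.5,
              gap_factor: float = 0.5, max_words: int = 2) -> list[Phrase]:
    """Apply the selected modifications to repeated phrases only."""
    repeats = find_repeats(segment)
    out: list[Phrase] = []
    for i, phrase in enumerate(segment):
        if i in repeats:
            if Mode.OMIT in modes:
                continue  # drop the repeated phrase entirely
            if Mode.FASTER in modes:
                phrase = replace(phrase, rate=phrase.rate * rate_factor)
            if Mode.SHORTER_BREAKS in modes:
                phrase = replace(phrase, word_gap_ms=int(phrase.word_gap_ms * gap_factor))
            if Mode.TRUNCATE in modes:
                phrase = replace(phrase, text=" ".join(phrase.text.split()[:max_words]))
        out.append(phrase)
    return out


segment = [Phrase("Now playing"), Phrase("Yellow Submarine"), Phrase("by The Beatles"),
           Phrase("Now playing"), Phrase("Help"), Phrase("by The Beatles")]
print([p.text for p in customize(segment, {Mode.OMIT})])
# ['Now playing', 'Yellow Submarine', 'by The Beatles', 'Help']
```

Keying repeats on normalized text is the simplest reading of "repeated portions"; a production system might instead compare field-level or phonetic content, and would apply the changes inside the synthesizer rather than on a text model.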

10. A non-transitory computer-readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by one or more processors, cause the one or more processors to:
- generate a speech segment from one or more text strings describing or identifying a media asset having audio data distinct from the generated speech segment;
  
  obtain user input requesting a variation in speech delivery accompanying the media asset;
  
  in response to the user input, customize the speech segment by modifying selected portions of the speech segment at a server device, wherein the customizing further comprises:
  
  automatically detecting one or more repeated portions in the speech segment; and
  
  automatically modifying the speech segment by performing one or more of:
  
  (1) omitting at least one of the repeated portions from the speech segment, (2) using faster speech patterns for at least one of the repeated portions, (3) shortening breaks between words in at least one of the repeated portions, and (4) truncating one or more phrases in at least one of the repeated portions; and
  
  provide the customized speech segment from the server device to a user device for playback with the media asset.
- Dependent claims (11, 12, 13, 14, 15, 16, 17, 18):
  - 11. The computer-readable storage medium of claim 10 wherein customizing the speech segment by modifying selected portions of the speech segment further comprises:
    - shortening breaks between words within the speech segment to generate the customized speech segment.
  - 12. The computer-readable storage medium of claim 10 wherein the user input specifies one or more preferred information fields among a plurality of information fields available in the speech segment.
  - 13. The computer-readable storage medium of claim 10 wherein the user input requests at least one of fast forwarding and skipping playback of speech content at the user device.
  - 14. The computer-readable storage medium of claim 10 wherein the user input requests omission of repeated information from speech content delivered to the user device.
  - 15. The computer-readable storage medium of claim 10 wherein customizing the speech segment by modifying selected portions of the speech segment further comprises:
    - including in the customized speech segment respective portions of the speech segment corresponding to one or more user-selected information fields, while omitting at least one field of information in the speech segment from the customized speech segment.
  - 16. The computer-readable storage medium of claim 10, wherein the instructions further cause the one or more processors to:
    - detect user input fast forwarding or skipping playback of at least a first speech segment previously delivered to the user device; and
      
      in response to the detecting, modify a delivery rate for a second speech segment to be delivered from the client device to the user device.
  - 17. The computer-readable storage medium of claim 10, wherein the instructions further cause the one or more processors to:
    - detect user input fast forwarding or skipping playback of at least a first speech segment previously delivered to the user device; and
      
      in response to the detecting, customize speech delivery for a second speech segment to be delivered from the client device to the user device.
  - 18. The computer-readable storage medium of claim 17, wherein customizing speech delivery for the second speech segment comprises at least one of:
    - (1) shortening breaks between words within the second speech segment before delivering the second speech segment to the user device, (2) truncating one or more phrases within the second speech segment before delivering the second speech segment to the user device, and (3) omitting delivery of the second speech segment to the user device.
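Claims 12 and 15 (like claims 3 and 6) let the user name preferred information fields and omit the rest from the delivered speech. A minimal sketch, assuming hypothetical field names and a fixed speaking order, and assuming the selection is applied to field text before rendering rather than to already-synthesized audio:

```python
from typing import Dict, Iterable, List

# Hypothetical field names and speaking order; the patent does not enumerate them.
FIELD_ORDER = ("title", "artist", "album", "year")


def select_fields(fields: Dict[str, str], preferred: Iterable[str]) -> List[str]:
    """Keep only the user-preferred fields, in a fixed speaking order."""
    wanted = set(preferred)
    return [fields[name] for name in FIELD_ORDER if name in wanted and name in fields]


def announcement_text(fields: Dict[str, str], preferred: Iterable[str]) -> str:
    """Build the text that would be sent to the synthesizer."""
    return ", ".join(select_fields(fields, preferred))


track = {"title": "Help!", "artist": "The Beatles", "album": "Help!", "year": "1965"}
print(announcement_text(track, ["artist", "title"]))  # Help!, The Beatles
```

Speaking in `FIELD_ORDER` rather than in the order the user listed fields is a design choice here; it keeps announcements consistent across tracks.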

19. A system, comprising:
- one or more processors; and
  
  memory, the memory storing one or more programs, the one or more programs comprising instructions, which when executed by the one or more processors, cause the one or more processors to:
  
  generate a speech segment from one or more text strings describing or identifying a media asset having audio data distinct from the generated speech segment;
  
  obtain user input requesting a variation in speech delivery accompanying the media asset;
  
  in response to the user input, customize the speech segment by modifying selected portions of the speech segment at a server device, wherein the customizing further comprises:
  
  automatically detecting one or more repeated portions in the speech segment; and
  
  automatically modifying the speech segment by performing one or more of:
  
  (1) omitting at least one of the repeated portions from the speech segment, (2) using faster speech patterns for at least one of the repeated portions, (3) shortening breaks between words in at least one of the repeated portions, and (4) truncating one or more phrases in at least one of the repeated portions; and
  
  provide the customized speech segment from the server device to a user device for playback with the media asset.
- Dependent claims (20, 21, 22, 23, 24, 25, 26, 27):
  - 20. The system of claim 19 wherein customizing the speech segment by modifying selected portions of the speech segment further comprises:
    - shortening breaks between words within the speech segment to generate the customized speech segment.
  - 21. The system of claim 19 wherein the user input specifies one or more preferred information fields among a plurality of information fields available in the speech segment.
  - 22. The system of claim 19 wherein the user input requests at least one of fast forwarding and skipping playback of speech content at the user device.
  - 23. The system of claim 19 wherein the user input requests omission of repeated information from speech content delivered to the user device.
  - 24. The system of claim 19 wherein customizing the speech segment by modifying selected portions of the speech segment further comprises:
    - including in the customized speech segment respective portions of the speech segment corresponding to one or more user-selected information fields, while omitting at least one field of information in the speech segment from the customized speech segment.
  - 25. The system of claim 19, wherein the instructions further cause the one or more processors to:
    - detect user input fast forwarding or skipping playback of at least a first speech segment previously delivered to the user device; and
      
      in response to the detecting, modify a delivery rate for a second speech segment to be delivered from the client device to the user device.
  - 26. The system of claim 19, wherein the instructions further cause the one or more processors to:
    - detect user input fast forwarding or skipping playback of at least a first speech segment previously delivered to the user device; and
      
      in response to the detecting, customize speech delivery for a second speech segment to be delivered from the client device to the user device.
  - 27. The system of claim 26, wherein customizing speech delivery for the second speech segment comprises at least one of:
    - (1) shortening breaks between words within the second speech segment before delivering the second speech segment to the user device, (2) truncating one or more phrases within the second speech segment before delivering the second speech segment to the user device, and (3) omitting delivery of the second speech segment to the user device.
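Claims 25 to 27 (like claims 7 to 9 and 16 to 18) react to the user fast-forwarding or skipping earlier speech by raising the delivery rate of a later segment or omitting it altogether. The patent gives no thresholds, so the sketch below assumes a hypothetical per-user `DeliveryPolicy` with an assumed rate step, rate ceiling, and skip count after which speech is no longer sent.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class DeliveryPolicy:
    """Hypothetical per-user state for adapting later speech segments."""
    rate: float = 1.0        # relative delivery rate for the next segment
    skips: int = 0           # fast-forward/skip events observed so far
    rate_step: float = 0.25  # assumed rate increase per skip
    max_rate: float = 2.0    # assumed ceiling on the delivery rate
    omit_after: int = 3      # assumed skip count after which speech is omitted

    def record_skip(self) -> None:
        """Called when the user fast-forwards or skips a delivered segment."""
        self.skips += 1
        self.rate = min(self.max_rate, self.rate + self.rate_step)

    def plan_next(self) -> Tuple[bool, float]:
        """Return (deliver?, rate) for the next speech segment."""
        return self.skips < self.omit_after, self.rate


policy = DeliveryPolicy()
policy.record_skip()
print(policy.plan_next())  # (True, 1.25)
```

Capping the rate keeps sped-up speech intelligible, while the omission threshold corresponds to option (3) of claim 27.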

28. A method for customizing delivery of synthesized speech, the method comprising:
- generating a speech segment from one or more text strings associated with or identifying a media asset;
  
  obtaining user input requesting a variation in speech delivery accompanying the media asset;
  
  in response to the user input, customizing the speech segment by modifying selected portions of the speech segment at a server device; and
  
  providing the customized speech segment from the server device to a user device for playback with the media asset, wherein customizing the speech segment by modifying selected portions of the speech segment further comprises:
  
  automatically detecting one or more repeated portions in the speech segment; and
  
  automatically modifying the speech segment by performing one or more of:
  
  (1) omitting at least one of the repeated portions from the speech segment, (2) using faster speech patterns for at least one of the repeated portions, (3) shortening breaks between words in at least one of the repeated portions, and (4) truncating one or more phrases in at least one of the repeated portions.

29. A non-transitory computer-readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by one or more processors, cause the one or more processors to:
- generate a speech segment from one or more text strings associated with or identifying a media asset;
  
  obtain user input requesting a variation in speech delivery accompanying the media asset;
  
  in response to the user input, customize the speech segment by modifying selected portions of the speech segment at a server device; and
  
  provide the customized speech segment from the server device to a user device for playback with the media asset, wherein customizing the speech segment by modifying selected portions of the speech segment further comprises:
  
  automatically detecting one or more repeated portions in the speech segment; and
  
  automatically modifying the speech segment by performing one or more of:
  
  (1) omitting at least one of the repeated portions from the speech segment, (2) using faster speech patterns for at least one of the repeated portions, (3) shortening breaks between words in at least one of the repeated portions, and (4) truncating one or more phrases in at least one of the repeated portions.

30. A system, comprising:
- one or more processors; and
  
  memory, the memory storing one or more programs, the one or more programs comprising instructions, which when executed by the one or more processors, cause the one or more processors to:
  
  generate a speech segment from one or more text strings associated with or identifying a media asset;
  
  obtain user input requesting a variation in speech delivery accompanying the media asset;
  
  in response to the user input, customize the speech segment by modifying selected portions of the speech segment at a server device; and
  
  provide the customized speech segment from the server device to a user device for playback with the media asset, wherein customizing the speech segment by modifying selected portions of the speech segment further comprises:
  
  automatically detecting one or more repeated portions in the speech segment; and
  
  automatically modifying the speech segment by performing one or more of:
  
  (1) omitting at least one of the repeated portions from the speech segment, (2) using faster speech patterns for at least one of the repeated portions, (3) shortening breaks between words in at least one of the repeated portions, and (4) truncating one or more phrases in at least one of the repeated portions.

Current Assignee
Apple Inc.
Original Assignee
Apple Inc.
Inventors
Naik, DeVang; Silverman, Kim; Bellegarda, Jerome
Primary Examiner(s)
AZAD, ABUL K

Application Number

US12/240,437
Publication Number

US 20100082344A1
Time in Patent Office

1,562 Days
Field of Search

704/258-269, 704/270.1, 704/270
US Class Current

704/258
CPC Class Codes

G10L 13/033 Voice editing, e.g. manipul...
